Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Gracefully handle empty intervals in difference (or other operators) #56

Closed
damien-sornette opened this issue Jun 12, 2023 · 6 comments
Assignees

Comments

@damien-sornette
Copy link

Problem

Currently, passing an empty IntervalArray or IntervalSet to the difference operator will cause an error to be thrown. It's not so much a problem when you just make a difference between two elements, a simple if will protect you. However, it becomes a bit tedious to handle with 3 elements where the second and third can sometimes be empty. The number of possible cases becomes exponentially higher when you add more sets.

Of course you can add the arguments in an array and the iterate over them to "accumulate" the difference, but it seems overcomplicated for something that could be handled gracefully by the lib

Unless there is a reason not to ?

Solution

When an empty argument is passed, it is ignored, and the reference (leftmost argument) interval is returned. And if the left most argument is empty, then return it.

@venaturum
Copy link
Collaborator

Hi @damien-sornette , thanks for opening this issue. I'll look to implement and publish within the next month unless I hit an obstacle. Cheers.

@damien-sornette
Copy link
Author

Great, thanks ! I'll keep an eye on this issue then :)

@venaturum venaturum self-assigned this Jun 27, 2023
@venaturum
Copy link
Collaborator

Hi @damien-sornette,

Are you able to provide a reproducible example. The following works ok for me so maybe I'm not imagining the situation properly

import pandas as pd
import piso
piso.register_accessors()

arr1 = pd.arrays.IntervalArray.from_tuples(
    [(0, 4), (2, 5), (3, 6), (7, 8), (8, 9), (10, 12)],
)

arr2 = pd.arrays.IntervalArray.from_tuples([])

piso.difference(arr1, arr2)
arr1.piso.difference(arr2)

Are you using the latest version of piso?

Cheers.

@damien-sornette
Copy link
Author

damien-sornette commented Jun 27, 2023

I'm currently using piso 0.9.0, it seems to be the last version available on pip.

Your code work for me, but this example doesn't :

import pandas as pd
from pandas.core.arrays import IntervalArray
from piso import difference

if __name__ == '__main__':
    an_interval = pd.Interval(pd.Timestamp('2017-01-01T12'), pd.Timestamp('2018-01-01T12'), closed="left")
    difference(IntervalArray([an_interval]), IntervalArray([]))

I built this case to look as close as possible to my real code.

The main difference I see is that I'm using a different way to build my Interval, and it's a time interval, not an int interval.

Let me know if you need more infos (or if I made a stupid mistake that explains that behaviour) !

@venaturum
Copy link
Collaborator

Thanks @damien-sornette, that's all I need. By default pandas gives empty interval arrays a "right" closed value and non empty interval arrays a default of "left" closed value - not ideal in my mind. Before performing an operation piso checks that all the closed values in an operation involving multiple interval arrays are all the same, and this is what is causing the issue.

difference(IntervalArray([an_interval], closed="right"), IntervalArray([]))  # works
difference(IntervalArray([an_interval]), IntervalArray([], closed="left"))  # works

I have implemented a solution, and just waiting on pandas to publish a fix for an issue which has broken my test pipelines on Mac and Windows before I can publish a new package. pandas-dev/pandas#53858

@venaturum
Copy link
Collaborator

Closed with #58 and release of piso 1.0.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants