-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Gracefully handle empty intervals in difference (or other operators) #56
Comments
Hi @damien-sornette , thanks for opening this issue. I'll look to implement and publish within the next month unless I hit an obstacle. Cheers. |
Great, thanks ! I'll keep an eye on this issue then :) |
Hi @damien-sornette, Are you able to provide a reproducible example. The following works ok for me so maybe I'm not imagining the situation properly import pandas as pd
import piso
piso.register_accessors()
arr1 = pd.arrays.IntervalArray.from_tuples(
[(0, 4), (2, 5), (3, 6), (7, 8), (8, 9), (10, 12)],
)
arr2 = pd.arrays.IntervalArray.from_tuples([])
piso.difference(arr1, arr2)
arr1.piso.difference(arr2) Are you using the latest version of piso? Cheers. |
I'm currently using piso 0.9.0, it seems to be the last version available on pip. Your code work for me, but this example doesn't : import pandas as pd
from pandas.core.arrays import IntervalArray
from piso import difference
if __name__ == '__main__':
an_interval = pd.Interval(pd.Timestamp('2017-01-01T12'), pd.Timestamp('2018-01-01T12'), closed="left")
difference(IntervalArray([an_interval]), IntervalArray([])) I built this case to look as close as possible to my real code. The main difference I see is that I'm using a different way to build my Interval, and it's a time interval, not an int interval. Let me know if you need more infos (or if I made a stupid mistake that explains that behaviour) ! |
Thanks @damien-sornette, that's all I need. By default pandas gives empty interval arrays a "right" closed value and non empty interval arrays a default of "left" closed value - not ideal in my mind. Before performing an operation piso checks that all the closed values in an operation involving multiple interval arrays are all the same, and this is what is causing the issue. difference(IntervalArray([an_interval], closed="right"), IntervalArray([])) # works
difference(IntervalArray([an_interval]), IntervalArray([], closed="left")) # works I have implemented a solution, and just waiting on pandas to publish a fix for an issue which has broken my test pipelines on Mac and Windows before I can publish a new package. pandas-dev/pandas#53858 |
Closed with #58 and release of piso 1.0.0 |
Problem
Currently, passing an empty IntervalArray or IntervalSet to the difference operator will cause an error to be thrown. It's not so much a problem when you just make a difference between two elements, a simple if will protect you. However, it becomes a bit tedious to handle with 3 elements where the second and third can sometimes be empty. The number of possible cases becomes exponentially higher when you add more sets.
Of course you can add the arguments in an array and the iterate over them to "accumulate" the difference, but it seems overcomplicated for something that could be handled gracefully by the lib
Unless there is a reason not to ?
Solution
When an empty argument is passed, it is ignored, and the reference (leftmost argument) interval is returned. And if the left most argument is empty, then return it.
The text was updated successfully, but these errors were encountered: