-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multi-interval set operators #179
Conversation
Closing/re-opening to trigger CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't quite have time for a full review. I do suspect the half-open concept doesn't need to be a special type and could just be a simple ishalfopen
function a call. I'll try to dive deep into this tomorrow. I'll mention the tests for this package a pretty extensive so be sure to lean on them
src/endpoint.jl
Outdated
struct DirectionBound{T} end | ||
const LeftClosed = DirectionBound{:LeftClosed}() | ||
const RightClosed = DirectionBound{:RightClosed}() | ||
struct HalfOpenEndpoint{T, B} <: AbstractEndpoint{T} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intervals are half-open/closed and not endpoints of intervals. This seems to me like maybe you want is a HalfOpenInterval
type. I'll need to take a closer look into how this is used though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal is as follows: during the process of "unbunching" intervals into endpoints, and then "bunching" them back into intervals, I want to preserve the fact that the original array has an all left-closed or all right-closed intervals. That's because
1.) said form leads to type stable arrays
2.) said form can ignore the checks for edge types in mergesets
(hence the track_endpoint
flag).
Maybe there's a better name for this. I did strugle to a way to describe what it is.
I guess another approach could be to have a flag and/or special container that denotes the
"all closed in one direction" property, and use the existing Endpoint type in all cases. It's not obvious to me, at the moment, how to write that in a type stable way though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with the design as I see it is that endpoints are dealing with the left/right side of the interval without knowing anything about the other endpoint in the interval. I'll be able to speak better to alternatives once I read through the rest of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not obvious to me, at the moment, how to write that in a type stable way though.
I'd suggest putting code readability over type stability to start with as otherwise you can optimize yourself into a corner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with the design as I see it is that endpoints are dealing with the left/right side of the interval without knowing anything about the other endpoint in the interval
My idea was to create a subtype of endpoint that is only used in those cases where you do know something about the other endpoints. I do also feel that there is probably a better design, combing back to this now (after some time away). I'll think it over some more and wait for whatever further comments you have.
src/interval.jl
Outdated
#=@show=# t = first_is_less(x, y) ? first(x) : first(y) | ||
#=@show=# x_isless = first_is_less(x, y) | ||
#=@show=# x_equal = first_is_equal(x, y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just call iterate
or Iterators.peel
and then do these checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes one of the iterables is empty; I find this form more readible than introducing the branches to check for nothing
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the other point to make here is that we only want to peel the values when they are "first" (i.e. less than or equal to the value in the other list), so you don't want to peel before that.
I'm still working out a few bits with the larger unit test (outside the tests for sets). Might make sense to wait to review until that happens. I'll mark it as draft again and mark it ready for review when that's resolved. |
Thanks for letting me know. I'll try to keep an eye on this as I think some early course correction could save a bunch of work. |
96ce0e3
to
7732752
Compare
Codecov Report
@@ Coverage Diff @@
## master #179 +/- ##
==========================================
+ Coverage 81.73% 84.63% +2.90%
==========================================
Files 11 12 +1
Lines 624 794 +170
==========================================
+ Hits 510 672 +162
- Misses 114 122 +8
Continue to review full report at Codecov.
|
Looks like there are a few bits I need to add test coverage for, but it should be in good shape to review. |
@omus: Can you help me understand the remaining errors I'm getting in "Julia 1" (I think that is 1.7). It looks like there's something broken about printing that is unrelated to the changes I've made. In other news, I think this is otherwise ready for review. |
There was an unrelated CI failure for "Julia 1" so I restarted the CI jobs
Thanks for letting me know. I don't quite have time to review such a big PR this week but I'll make time next week. |
Thanks! I understand, it's a bit large. I don't think there's any great urgency on my end; I'm making use of the branch directly where I need it in some analyses |
Just a small bump here: just wanted to check in about when a review would be possible and/or if I should do something to make reviewing this easier (break it up?). |
@haberdashPI breaking up a PR would make it easier to review but it may not be necessary. I've been rather swamped lately but I did have this on the agenda for next week. Sorry about the delay on review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't quite finished reviewing yet but I can already see this code needs additional iteration. In it's current state it'll be rather difficult to maintain. I'll do more review soon
src/endpoint.jl
Outdated
struct DirectionBound{T} end | ||
const LeftClosed = DirectionBound{:LeftClosed}() | ||
const RightClosed = DirectionBound{:RightClosed}() | ||
struct HalfOpenEndpoint{T, B} <: AbstractEndpoint{T} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not obvious to me, at the moment, how to write that in a type stable way though.
I'd suggest putting code readability over type stability to start with as otherwise you can optimize yourself into a corner
6115225
to
28cea42
Compare
Alright! I think that addresses the outstanding issues. The one open question is whether you buy my arguments above @omus for switching some of the tests to use |
Which tests assume this? This would definitely be incorrect. The code at the moment only should combine these intervals if you do a union of
You are definitely correct in I made a mistake for this case. I'll review the other corrections |
My bad: I meant something different
|
Thanks for the clarification. I agree with you that both answers are valid but I think for our implementation we should ideally return an interval set where all intervals in the returned vector are disjoint. In the future I can see us making an As the current state of this PR returns the ideal interval set I'm inclined to update the comparison tests to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last changes
Co-authored-by: Curtis Vogt <curtis.vogt@gmail.com>
Alright, placed some unions on the appropriate expected_xor's and that seems to have cleared up the remaining failed tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @haberdashPI for sticking with this PR. It definitely took some time and thought to get here but I think we're now in a good place with the functional interface. I do think there are some internal improvements to be made here yet but now we're in a good place to make those changes without having to worry about deprecations.
I'll wait until Monday to squash and merge this PR. This should allow anyone who was waiting for the dust to settle on this PR the chance to do a final review before we merge this and release version 1.7. |
Agreed! I have had a few thoughts in that respect, that I've been holding back on until this is merged. |
Registration PR: JuliaRegistries/General#61415 |
NOTE: This release actively breaks our code base which relies heavily on the current |
To clarify the specific change that's breaking is that you could previously perform a normal
and this would behave as expected Now when you make the same call you get
To clarify, the latter is probably more correct, but since we don't distinguish an Assuming we don't want to dedicate a type for it then I think we should make these separate function |
Hi @rofinn, First off, sounds like this was pretty disruptive; so sorry for that. A few thoughts on how to move forward:
I think this is not quite right. You can pass the objects as julia> intersect(Set([1..2, 2..3, 3..4, 4..5]), Set([2..3, 3..4]))
Set{Interval{Int64, Closed, Closed}} with 2 elements:
Interval{Int64, Closed, Closed}(2, 3)
Interval{Int64, Closed, Closed}(3, 4) So perhaps, in concept, what is needed here is a deprecation like this? @deprecate intersect(a::AbstractArray, b::AbstractArray) intersect(Set(a), Set(b)) I guess a potential issue there would be that there might be a performance hit for first converting to In the meantime, perhaps the new behavior can be made available by creating a simple version of the |
Yeah, I think at minimum we should either define a minimal |
Also, mostly as a note to myself: would be good to add some tests in here for your use case as well. |
Very reasonable. I apologize for missing this breaking behaviour change. Sounds like we have some additional tests that need to be added to the Intervals.jl test to ensure we don't break expected this behaviour again. Is there a better way to communicate when large changes like this are going to be released? Seems like the typical people-who-care-are-watching isn't working |
I'm in favour of making a minimal |
Yeah, this is a bit of a weird edge case where we'd need to test for a behaviour that wasn't defined here. I've become rather skittish of extending functions if there's concern that the new method doesn't align with the original concept/definition in Base.
Yeah, I don't think anyone was ever assigned to keep an eye on this repo after you left. I guess for now it'd probably be best to just assign myself or @fchorney to larger reviews for now as we're the next biggest contributors. I'll raise this maintenance concern more broadly to see if there's a more sustainable option.
I was hoping that namespacing would be the simpler solution, but given how many things we already extend in base I think you're probably correct. Moving things from the Base to Intervals namespaces would likely be more breaking. |
This defines a set of functions for easily computing intersections, unions etc... of a set of intervals (passed as an array).
cc: @omus, @ericphanson, @kleinschmidt
This is based off of a closed PR to TimeSpans (beacon-biosignals/TimeSpans.jl#11)