Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug in FiniteDateRange.is_disjoint (and union as a result) #99

Merged
merged 1 commit into from
Oct 17, 2023

Conversation

josh-lynch
Copy link
Contributor

@josh-lynch josh-lynch commented Oct 10, 2023

Primarily this change is aimed at changing the behaviour of the
FiniteDateRange.union. Previously FiniteDateRanges would not make
a union with adjacent ranges, despite their inclusive boundaries
meaning the presence of an adjacent range means they cover a
fully contiguous period of time.

This change makes adjacent FiniteDateRanges unionize by no longer
considering ranges on adjacent days disjoint.

@josh-lynch josh-lynch changed the base branch from main to ranges/bugfix-days October 10, 2023 16:22
@josh-lynch josh-lynch changed the title ranges/bugfix union Fix bug in FiniteDateRange.union Oct 10, 2023
@josh-lynch josh-lynch force-pushed the ranges/bugfix-union branch 7 times, most recently from 495d8a1 to cdf8acf Compare October 12, 2023 11:27
@josh-lynch josh-lynch changed the title Fix bug in FiniteDateRange.union Fix bug in FiniteDateRange.is_disjoint (and union as a result) Oct 12, 2023
@josh-lynch
Copy link
Contributor Author

I also can't figure out the typing complaints here and they are driving me insane. Insights welcome.

@josh-lynch josh-lynch requested a review from a team October 12, 2023 11:36
@pydanny
Copy link
Contributor

pydanny commented Oct 12, 2023

I also can't figure out the typing complaints here and they are driving me insane. Insights welcome.

What about being more explicit?

    def is_disjoint(self, other: Range[datetime.date]) -> bool:
        # Adjacent dates should not be considered disjoint, we extend the other
        # range to allow them to be considered adjacent.
        other_start = other.start
        if other._is_right_inclusive:
			if other_start is not None:  # replaces assertion
	            other_start -= datetime.timedelta(days=1)
        other_end = other.end
        if other._is_right_inclusive:
			if other_end is not None:  # replaces assertion
	            other_end += datetime.timedelta(days=1)
        return super().is_disjoint(
            Range(start=other_start, end=other_end, boundaries=other.boundaries)
        )

@delfick
Copy link
Contributor

delfick commented Oct 13, 2023

I also can't figure out the typing complaints here and they are driving me insane. Insights welcome.

var2 = var1.blah
assert var1.blah is not None

When you do this, mypy still doesn't know if var2 is not None or not because var1.blah might be a different value to var2.

so when you do

        other_start = other.start
        if other._is_right_inclusive:
            assert other.start
            other_start -= datetime.timedelta(days=1)

what you actually want is

        other_start = other.start
        if other._is_right_inclusive:
            assert other_start is not None
            other_start -= datetime.timedelta(days=1)

(I have not tested this theory, but I imagine that's what it is)

@frederizia
Copy link

what you actually want is

        other_start = other.start
        if other._is_right_inclusive:
            assert other_start is not None
            other_start -= datetime.timedelta(days=1)

(I have not tested this theory, but I imagine that's what it is)

Yes was just coming here to say this (I have tested it and that sorts it). Other option is doing other_start = other.start - datetime.timedelta(days=1) but that's not as nice.

other_start = other.start
if other._is_right_inclusive:
assert other.start
other_start -= datetime.timedelta(days=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐮 this bug is wider than dates - it also affects any Range where the values are discrete. For example: integers.

I think it could also potentially affect other boundary conditions than inclusive-inclusive, although I guess we'd have to go through and figure it out.

I wonder if a wider solution would be some kind of increment or resolution field? Either directly on the Range itself, or coming from a mapping RANGE_TYPE_RESOLUTIONS = {datetime.Date: datetime.timedelta(days=1), int: 1, ...}.

(I don't want my comment to block this PR. I'm sure date ranges are our biggest problem in Kraken and your fix as it stands would be a very useful win.)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think this might be a wider problem, and some kind of resolution might be a good idea (although since ranges can be for any orderable type, it might not always work).

I agree that we shouldn't block here though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did consider the integer problem here, but I think it's a bit different because if you consider float values then there are plenty of values that exist between 2 and 3.

end=datetime.date(2000, 1, 4),
)

def test_handles_ranges_out_of_order(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐼 I don't follow - isn't this identical to the 2nd case for the test above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think I meant to delete this in a test refactor, my bad.

union = range | other
assert union is None

@pytest.mark.parametrize(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐼 this test is complex enough that it should probably have ids on its cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🆗

Copy link
Contributor

@GlennS GlennS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely a 👍 from me - this has been bugging me for a while.

Copy link

@leamingrad leamingrad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks sensible to me, but I think it might be bugged in its current form so am requesting changes.

xocto/ranges.py Outdated
# range to allow them to be considered adjacent.
other_start = other.start
if other._is_right_inclusive:
assert other.start

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Why does this assertion hold? I would expect (None, '2022-01-01'] to be a valid range.

Maybe this should be ._is_left_inclusive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 You are right, this is a bug, I will review my tests to see why they weren't failing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I was testing this behaviour via union it turns out this method wasn't run in the problematic case, because the union method orders the ranges first and then was calling the is_disjoint method on the other range instead.

I have addressed now with additional tests specifically for the is_disjoint method.

other_start = other.start
if other._is_right_inclusive:
assert other.start
other_start -= datetime.timedelta(days=1)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think this might be a wider problem, and some kind of resolution might be a good idea (although since ranges can be for any orderable type, it might not always work).

I agree that we shouldn't block here though.

@josh-lynch
Copy link
Contributor Author

Ahh @delfick thanks that makes total sense 👌

Primarily this change is aimed at changing the behaviour of the
FiniteDateRange.union. Previously FiniteDateRanges would not make
a union with adjacent ranges, despite their inclusive boundaries
meaning the presence of an adjacent range means they cover a
fully contiguous period of time.

This change makes adjacent FiniteDateRanges unionize by no longer
considering ranges on adjacent days disjoint.
Base automatically changed from ranges/bugfix-days to main October 17, 2023 12:20
Copy link

@leamingrad leamingrad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@josh-lynch josh-lynch merged commit 97a54c4 into main Oct 17, 2023
1 check passed
@josh-lynch josh-lynch deleted the ranges/bugfix-union branch October 17, 2023 12:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants