fix(ir): fix window boundaries being forcefully casted #8400

Merged: 3 commits merged into ibis-project:main on Feb 23, 2024

Conversation

chloeh13q
Contributor

Description of changes

Fix issue #8368

Issues closed

Resolves #8368

Comment on lines 182 to 183
start = self._maybe_cast_boundary(start, dtype)
end = self._maybe_cast_boundary(end, dtype)
Member

Just curious, does removing these lines not address the problem?

Contributor Author

@chloeh13q chloeh13q Feb 20, 2024

I tried doing that, and it breaks the windowing logic for at least some backends. For example, if the window is something like -ibis.interval(seconds=20), 0, where one of the boundaries is an interval and the other is an int (most commonly 0), it errors out when executed on the backend (e.g. pandas). So essentially, instead of baking the casting logic into the IR, we would move it out to the backends where casting is necessary, which is doable but seems like it may be a more complex solution.
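
To illustrate the kind of failure described above, here is a minimal sketch using only the Python standard library. It is an analogy, not the pandas backend's actual code path: `frame_is_ordered` is a hypothetical helper, and `datetime.timedelta` stands in for an interval boundary.

```python
from datetime import timedelta


def frame_is_ordered(start, end):
    """Hypothetical helper: a backend needs to compare the two boundaries."""
    return start <= end


start = -timedelta(seconds=20)  # stand-in for -ibis.interval(seconds=20)
end = 0                         # plain integer boundary

# Without a cast, the interval-like value and the int cannot be compared.
try:
    frame_is_ordered(start, end)
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Casting the integer to the interval type first makes the comparison valid.
cast_ok = frame_is_ordered(start, timedelta(seconds=end))
```

Doing the cast once in the IR avoids repeating this conversion in every backend that compares or combines the two boundaries.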

Member

@kszucs kszucs Feb 22, 2024

I dislike introducing an is_same_base_type() API just to overcome this issue. There are basically two kinds of inputs supported here, numeric and interval. Could we spell out all combinations instead, something like:

def _maybe_cast_boundaries(self, start, end):
    if start.dtype.is_interval() and end.dtype.is_numeric():
        return start, ops.Cast(end, start.dtype)
    elif start.dtype.is_numeric() and end.dtype.is_interval():
        return ops.Cast(start, end.dtype), end
    else:
        return start, end

Also just check for rlz.comparable(start, end) during the WindowFrame validation.
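
The suggestion above can be tried out in isolation with a small standalone sketch. The `DType`, `Literal`, and `Cast` classes below are hypothetical stand-ins invented for illustration, not the ibis classes; only the branching mirrors the snippet in this comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DType:
    """Hypothetical stand-in for a data type."""
    name: str

    def is_interval(self) -> bool:
        return self.name == "interval"

    def is_numeric(self) -> bool:
        return self.name in {"int64", "float64"}


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for a boundary value node."""
    value: object
    dtype: DType


@dataclass(frozen=True)
class Cast:
    """Hypothetical stand-in for a cast node wrapping another node."""
    arg: object
    dtype: DType


def maybe_cast_boundaries(start, end):
    # Only the mixed interval/numeric pairs are cast; the numeric side
    # takes on the interval type of the other boundary.
    if start.dtype.is_interval() and end.dtype.is_numeric():
        return start, Cast(end, start.dtype)
    elif start.dtype.is_numeric() and end.dtype.is_interval():
        return Cast(start, end.dtype), end
    # Matching types (or anything else) pass through untouched.
    return start, end


interval = DType("interval")
int64 = DType("int64")

mixed = maybe_cast_boundaries(Literal(-20, interval), Literal(0, int64))
same = maybe_cast_boundaries(Literal(-2, int64), Literal(0, int64))
```

In this sketch the mixed pair comes back with the integer end wrapped in a `Cast` to the interval type, while the all-integer pair is returned unchanged, so no boundary is cast unless the two sides actually differ.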

Contributor Author

@chloeh13q chloeh13q Feb 22, 2024

Sure, that's fair.

Also just check for rlz.comparable(start, end) during the WindowFrame validation.

I think this would yield the same result as above if we were to remove the _maybe_cast_boundaries() logic entirely. At this point it should be required that window start and window end are both numeric or both intervals.

Member

Indeed, then we should validate for that.
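
A rough sketch of the validation agreed on in this thread, assuming boundaries have already been through the casting step. The function name and the string "kinds" are hypothetical, chosen for illustration; this is not the check that was merged.

```python
ALLOWED_KINDS = {"numeric", "interval"}


def validate_boundaries(start_kind: str, end_kind: str) -> None:
    """Hypothetical check: both boundaries numeric, or both intervals."""
    for kind in (start_kind, end_kind):
        if kind not in ALLOWED_KINDS:
            raise TypeError(f"unsupported window boundary kind: {kind!r}")
    if start_kind != end_kind:
        raise TypeError(
            f"window start is {start_kind} but window end is {end_kind}; "
            "both must be numeric or both must be intervals"
        )


# Matching kinds pass silently.
validate_boundaries("numeric", "numeric")
validate_boundaries("interval", "interval")

# A mixed pair that was never cast is rejected.
try:
    validate_boundaries("interval", "numeric")
    mixed_rejected = False
except TypeError:
    mixed_rejected = True
```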

@chloeh13q chloeh13q marked this pull request as ready for review February 21, 2024 00:35
@chloeh13q
Contributor Author

@cpcloud Thanks for the suggestion! I implemented the changes.

ibis/expr/builders.py (outdated review thread, resolved)
@kszucs kszucs changed the title fix: fix window boundaries being forcefully casted fix(ir): fix window boundaries being forcefully casted Feb 22, 2024
Member

@kszucs kszucs left a comment

LGTM, thanks Chloe!

@cpcloud cpcloud added this to the 9.0 milestone Feb 23, 2024
@cpcloud cpcloud added the bug Incorrect behavior inside of ibis label Feb 23, 2024
@cpcloud cpcloud merged commit 09b6ada into ibis-project:main Feb 23, 2024
75 checks passed

Successfully merging this pull request may close these issues.

bug: when the start and end of window boundaries span across data types, ibis forces casting
3 participants