fix(ir): fix window boundaries being forcefully casted #8400
ibis/expr/builders.py
    start = self._maybe_cast_boundary(start, dtype)
    end = self._maybe_cast_boundary(end, dtype)
Just curious, does removing these lines not address the problem?
I tried that, and it breaks the windowing logic for at least some backends. For example, if the window is something like -ibis.interval(seconds=20), 0, where one boundary is an interval and the other is an int (most commonly 0), it errors out when executed on the backend (e.g. pandas). So essentially, instead of baking the casting logic into the IR, we would move it out to the backends where casting is necessary, which is doable but seems like a more complex solution.
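As a rough analogy for why mixed boundaries fail (this is not the actual code path of any Ibis backend), plain Python refuses to order a `datetime.timedelta` against an `int`. The sketch below uses only the standard library; `in_window` is a hypothetical helper standing in for a backend's range check.

```python
from datetime import timedelta


def in_window(offset, start, end):
    """Return True when `offset` lies within [start, end] (hypothetical helper)."""
    return start <= offset <= end


start = timedelta(seconds=-20)  # plays the role of -ibis.interval(seconds=20)
offset = timedelta(seconds=-5)  # a row 5 seconds before the current one

# Mixed boundaries: interval-like start, plain int end.
try:
    in_window(offset, start, 0)
    mixed_ok = True
except TypeError:
    # timedelta and int cannot be ordered against each other
    mixed_ok = False

# Casting the int boundary to the interval-like type makes the check work.
cast_ok = in_window(offset, start, timedelta(0))
```

Here the mixed comparison raises `TypeError`, while the version with both boundaries as `timedelta` succeeds, which is the same shape of problem the cast in the IR avoids.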
I dislike introducing an is_same_base_type() API just to overcome this issue. Basically two kinds of input are supported here: numeric and interval. Could we spell out all the combinations instead? Something like:
def _maybe_cast_boundaries(self, start, end):
    if start.dtype.is_interval() and end.dtype.is_numeric():
        return start, ops.Cast(end, start.dtype)
    elif start.dtype.is_numeric() and end.dtype.is_interval():
        return ops.Cast(start, end.dtype), end
    else:
        return start, end

Also just check for rlz.comparable(start, end) during the WindowFrame validation.
Sure, that's fair.
> Also just check for rlz.comparable(start, end) during the WindowFrame validation.
I think this would yield the same result as removing the _maybe_cast_boundaries() logic entirely, as discussed above. At that point we should require that the window start and window end are either both numeric or both intervals.
Indeed, then we should validate for that.
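The check agreed on here could be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `DType` and `validate_boundaries` are hypothetical stand-ins, and only the `is_numeric()` / `is_interval()` method names are taken from the snippet above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DType:
    """Hypothetical stand-in for a boundary's data type."""

    kind: str  # "numeric", "interval", or anything else

    def is_numeric(self):
        return self.kind == "numeric"

    def is_interval(self):
        return self.kind == "interval"


def validate_boundaries(start, end):
    """Raise unless both boundary dtypes are numeric or both are intervals."""
    both_numeric = start.is_numeric() and end.is_numeric()
    both_interval = start.is_interval() and end.is_interval()
    if not (both_numeric or both_interval):
        raise TypeError(
            "Window boundaries must both be numeric or both be intervals, "
            f"got {start.kind} and {end.kind}"
        )


# Matching kinds pass silently; a mixed pair is rejected.
validate_boundaries(DType("interval"), DType("interval"))
validate_boundaries(DType("numeric"), DType("numeric"))
```

With a rule like this, a mixed pair such as an interval start and a numeric end has to be cast before validation, otherwise it is rejected up front rather than failing later on the backend.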
82563ac to d607437
@cpcloud Thanks for the suggestion! I implemented the changes.
37405f5 to c7764e0
c7764e0 to b9b20a6
LGTM, thanks Chloe!
Description of changes
Fix issue #8368
Issues closed
Resolves #8368