ENH: Support Table.asof_join #1162
closes ibis-project#1118 with a Pandas implementation
I'm realizing merge_asof may not be available in the versions of pandas used for the Python 2.7 and 3.4 builds.
It's available on 2.7. Python 3.4 is not supported in pandas 0.21, but that release is not out yet. merge_asof should be available in pandas 0.19.2 or later.
Odd. Both the 2.7 build and the 3.4 build fail.
ibis/pandas/tests/test_operations.py
@@ -50,6 +50,32 @@ def test_join(how, left, right, df1, df2):
    tm.assert_frame_equal(result[expected.columns], expected)


def test_asof_join(time_left, time_right, time_df1, time_df2):
I would just skip these tests on the versions of pandas we test that don't have merge_asof.
Something like this?
merge_asof_minversion = pytest.mark.skipif(
    pd.__version__ < '0.19.2',
    reason="at least pandas-0.19.2 required for merge_asof")

@merge_asof_minversion
def test_asof_join...
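One caveat with the snippet above: `pd.__version__ < '0.19.2'` compares strings character by character, so a version such as '0.9.1' would sort after '0.19.2'. A minimal sketch of a numeric comparison follows, assuming a hypothetical helper named `version_tuple` (it is not part of pandas, pytest, or this PR) and using only the standard library:

```python
def version_tuple(version):
    """Convert a dotted version string such as '0.19.2' into a tuple of ints.

    Hypothetical helper for illustration. Only the leading digits of each
    dot-separated piece are used, and parsing stops at the first piece that
    has no leading digits, so '0.21.0rc1' becomes (0, 21, 0).
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


# String comparison gets this wrong: '9' sorts after '1'.
string_says_older = "0.9.1" < "0.19.2"
# Tuple comparison treats each component as a number.
tuple_says_older = version_tuple("0.9.1") < version_tuple("0.19.2")

# Possible use in the skip marker (requires pytest and pandas):
# merge_asof_minversion = pytest.mark.skipif(
#     version_tuple(pd.__version__) < (0, 19, 2),
#     reason="at least pandas-0.19.2 required for merge_asof")
```

For the pandas versions actually tested in this PR the string comparison probably gives the right answer; the tuple form just avoids relying on that.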
LGTM, minor comments.
ibis/pandas/execution.py
merge_asof_args['left_by'] = left_by
merge_asof_args['right_by'] = right_by

return pd.merge_asof(**merge_asof_args)
Any reason not to pass the args in directly? I think it makes it a little more difficult to debug by adding an extra dictionary to look through.
I wasn't sure what the behavior would be if you pass left_by=[] and right_by=[]. I imagine it's a no-op, but I was trying to be safe. Didn't think about the debug trade-off.
Yeah, a quick test tells me that passing empty lists is not the same as passing None. So the choice is either the dict or something like this that gets passed in as args. I'm open to either.
left_by = left_by if left_by else None
right_by = right_by if right_by else None
Now that I look at it without the dict, it looks much better. I'll push up the change.
probably want to do something like:
pd.merge_asof(..., left_by=left_by or None, right_by=right_by or None)
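As a rough illustration of the `or None` pattern discussed above, here is a small sketch that passes the keys directly to `pd.merge_asof`. The wrapper name `asof_join` and the sample frames are made up for this example and are not the ibis implementation:

```python
import pandas as pd


def asof_join(left, right, on, left_by=(), right_by=()):
    """Hypothetical wrapper: empty key lists are normalized to None."""
    return pd.merge_asof(
        left,
        right,
        on=on,
        # An empty list is falsy, so `or None` turns it into None.
        left_by=list(left_by) or None,
        right_by=list(right_by) or None,
    )


# Both frames must be sorted by the `on` column.
left = pd.DataFrame(
    {"time": [1, 5, 10], "key": ["a", "a", "b"], "lv": [1.0, 2.0, 3.0]}
)
right = pd.DataFrame(
    {
        "time": [1, 2, 3, 6, 7],
        "key": ["a", "a", "b", "a", "b"],
        "rv": [10, 20, 30, 60, 70],
    }
)

# Match on the most recent earlier-or-equal time within each key.
with_by = asof_join(left, right, on="time", left_by=["key"], right_by=["key"])
# No grouping keys: match on time alone.
without_by = asof_join(left, right, on="time")
```

Keeping the arguments inline means a debugger or traceback shows the actual values passed to `merge_asof`, which was the concern raised about the intermediate dict.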
ibis/pandas/execution.py

def _validate_columns(orig_columns, *key_lists):
    all_keys = set([item for sublist in key_lists for item in sublist])
    overlapping_columns = orig_columns.difference(all_keys)
If you're using difference you can actually pass the generator in directly:
foo.difference(x for x in ...)
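A short sketch of that suggestion, assuming `orig_columns` can be treated as a plain Python set (the real type in the PR is not shown here). The function name `columns_not_in_keys` is hypothetical:

```python
def columns_not_in_keys(orig_columns, *key_lists):
    """Return the columns that do not appear in any of the key lists.

    Hypothetical helper. set.difference accepts any iterable, so the
    flattened keys can be fed in as a generator without building an
    intermediate set or list first.
    """
    return set(orig_columns).difference(
        key for key_list in key_lists for key in key_list
    )


remaining = columns_not_in_keys(
    {"time", "key", "value"}, ["time"], ["key"]
)
```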
LGTM, merging.
Thanks @toryhaavik!