Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Support Table.asof_join #1162

Closed
wants to merge 5 commits into from

Conversation

toryhaavik
Copy link
Contributor

closes #1118 with a Pandas implementation

@cpcloud cpcloud self-requested a review October 6, 2017 20:22
@cpcloud cpcloud self-assigned this Oct 6, 2017
@cpcloud cpcloud added feature Features or general enhancements pandas The pandas backend labels Oct 6, 2017
@cpcloud cpcloud modified the milestones: 0.12, 0.11.3 Oct 6, 2017
closes ibis-project#1118 with a Pandas implementation
@toryhaavik
Copy link
Contributor Author

I'm realizing merge_asof may not be available in the versions of pandas for 27 and 34.

@jreback
Copy link
Contributor

jreback commented Oct 9, 2017

it’s available in 2.7

3.4 is not supported in pandas 0.21 but not out yet

it should be available in 0.19.2 or after

@toryhaavik
Copy link
Contributor Author

Odd. The 2.7 build fails with AttributeError: 'module' object has no attribute 'merge_asof' and uses pandas==0.18.1. The docs don't have any reference to merge_asof in 0.18.1

the 3.4 build fails with TypeError: merge_asof() got an unexpected keyword argument 'left_by' and uses pandas==0.19.0. Seems left_by was added after 0.19.0

@@ -50,6 +50,32 @@ def test_join(how, left, right, df1, df2):
tm.assert_frame_equal(result[expected.columns], expected)


def test_asof_join(time_left, time_right, time_df1, time_df2):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just skip these tests on the versions of pandas we test that don't have merge_asof.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something like this?

merge_asof_minversion = pytest.mark.skipif(
    pd.__version__ < '0.19.2',
    reason="at least pandas-0.19.2 required for merge_asof")

@merge_asof_minversion
def test_asof_join...

Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor comments.

merge_asof_args['left_by'] = left_by
merge_asof_args['right_by'] = right_by

return pd.merge_asof(**merge_asof_args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to pass the args in directly? I think it makes it a little more difficult to debug by adding an extra dictionary to look through.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure what the behavior would be if you pass left_by=[] and right_by=[]. I imagine it's a no-op, but I was trying to be safe. Didn't think about the debug trade off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, a quick test tells me that passing empty lists is not the same as passing None. So the choice is either the dict or something like this that gets passed in as args. I'm open to either.

left_by = left_by if left_by else None
right_by = right_by if right_by else None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I look at it without the dict, it looks much better. I'll push up the change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably want to do something like:

pd.merge_asof(..., left_by=left_by or None, right_by=right_by or None)


def _validate_columns(orig_columns, *key_lists):
all_keys = set([item for sublist in key_lists for item in sublist])
overlapping_columns = orig_columns.difference(all_keys)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're using difference you can actually pass the generator in directly:

foo.difference(x for x in ...)

@cpcloud
Copy link
Member

cpcloud commented Oct 10, 2017

LGTM, merging.

@cpcloud cpcloud closed this in 91a044c Oct 10, 2017
@cpcloud
Copy link
Member

cpcloud commented Oct 10, 2017

Thanks @toryhaavik!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements pandas The pandas backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

merge_asof for Ibis Tables
3 participants