New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Support Table.asof_join #1162

Closed
wants to merge 5 commits into
base: master
from

Conversation

Projects
None yet
3 participants
@toryhaavik
Contributor

toryhaavik commented Oct 6, 2017

closes #1118 with a Pandas implementation

@cpcloud cpcloud self-requested a review Oct 6, 2017

@cpcloud cpcloud self-assigned this Oct 6, 2017

@cpcloud cpcloud modified the milestones: 0.12, 0.11.3 Oct 6, 2017

@toryhaavik toryhaavik force-pushed the toryhaavik:asof_join branch from 44e8737 to 6bf5f4d Oct 6, 2017

ENH: Support Table.asof_join
closes #1118 with a Pandas implementation

@toryhaavik toryhaavik force-pushed the toryhaavik:asof_join branch from 6bf5f4d to bbba5d8 Oct 9, 2017

@toryhaavik

This comment has been minimized.

Contributor

toryhaavik commented Oct 9, 2017

I'm realizing merge_asof may not be available in the versions of pandas for 27 and 34.

@jreback

This comment has been minimized.

Contributor

jreback commented Oct 9, 2017

it’s available in 2.7

3.4 is not supported in pandas 0.21 but not out yet

it should be available in 0.19.2 or after

@toryhaavik

This comment has been minimized.

Contributor

toryhaavik commented Oct 9, 2017

Odd. The 2.7 build fails with AttributeError: 'module' object has no attribute 'merge_asof' and uses pandas==0.18.1. The docs don't have any reference to merge_asof in 0.18.1

the 3.4 build fails with TypeError: merge_asof() got an unexpected keyword argument 'left_by' and uses pandas==0.19.0. Seems left_by was added after 0.19.0

@@ -50,6 +50,32 @@ def test_join(how, left, right, df1, df2):
tm.assert_frame_equal(result[expected.columns], expected)
def test_asof_join(time_left, time_right, time_df1, time_df2):

This comment has been minimized.

@cpcloud

cpcloud Oct 9, 2017

Member

I would just skip these tests on the versions of pandas we test that don't have merge_asof.

This comment has been minimized.

@toryhaavik

toryhaavik Oct 9, 2017

Contributor

something like this?

merge_asof_minversion = pytest.mark.skipif(
    pd.__version__ < '0.19.2',
    reason="at least pandas-0.19.2 required for merge_asof")

@merge_asof_minversion
def test_asof_join...

toryhaavik added some commits Oct 9, 2017

@cpcloud

LGTM, minor comments.

merge_asof_args['left_by'] = left_by
merge_asof_args['right_by'] = right_by
return pd.merge_asof(**merge_asof_args)

This comment has been minimized.

@cpcloud

cpcloud Oct 9, 2017

Member

Any reason not to pass the args in directly? I think it makes it a little more difficult to debug by adding an extra dictionary to look through.

This comment has been minimized.

@toryhaavik

toryhaavik Oct 9, 2017

Contributor

I wasn't sure what the behavior would be if you pass left_by=[] and right_by=[]. I imagine it's a no-op, but I was trying to be safe. Didn't think about the debug trade off.

This comment has been minimized.

@toryhaavik

toryhaavik Oct 9, 2017

Contributor

Yeah, a quick test tells me that passing empty lists is not the same as passing None. So the choice is either the dict or something like this that gets passed in as args. I'm open to either.

left_by = left_by if left_by else None
right_by = right_by if right_by else None

This comment has been minimized.

@toryhaavik

toryhaavik Oct 9, 2017

Contributor

Now that I look at it without the dict, it looks much better. I'll push up the change.

This comment has been minimized.

@cpcloud

cpcloud Oct 9, 2017

Member

probably want to do something like:

pd.merge_asof(..., left_by=left_by or None, right_by=right_by or None)
def _validate_columns(orig_columns, *key_lists):
all_keys = set([item for sublist in key_lists for item in sublist])
overlapping_columns = orig_columns.difference(all_keys)

This comment has been minimized.

@cpcloud

cpcloud Oct 9, 2017

Member

If you're using difference you can actually pass the generator in directly:

foo.difference(x for x in ...)

toryhaavik added some commits Oct 9, 2017

@cpcloud

This comment has been minimized.

Member

cpcloud commented Oct 10, 2017

LGTM, merging.

@cpcloud cpcloud closed this in 91a044c Oct 10, 2017

@cpcloud

This comment has been minimized.

Member

cpcloud commented Oct 10, 2017

Thanks @toryhaavik!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment