Feature Request: merge_asof with interpolation #22410

Open
JoaoAparicio opened this issue Aug 18, 2018 · 4 comments
Labels: Enhancement, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Comments


JoaoAparicio commented Aug 18, 2018

merge_asof with interpolation would be a new feature. When you merge df1 and df2, the merged columns of df2 would not be the exact "asof" values as in df2, but the interpolated values (for example in a linearly time-weighted fashion).

Define

import datetime
import numpy as np
import pandas as pd

np.random.seed(0)
start = pd.Timestamp("2018-07-23 09:00:00")
df1 = pd.DataFrame(np.random.normal(size=5), index=pd.date_range(start, periods=5, freq='s'), columns=['something'])

df2 = pd.DataFrame(np.random.normal(size=5), index=pd.date_range(start, periods=5, freq='s')+datetime.timedelta(seconds=0.5), columns=['something_else'])
# Drop the third row so that df2 has a gap (DataFrame.append was removed in pandas 2.0)
df2 = pd.concat([df2.iloc[:2], df2.iloc[3:]])

df1:

                     something
2018-07-23 09:00:00   1.764052
2018-07-23 09:00:01   0.400157
2018-07-23 09:00:02   0.978738
2018-07-23 09:00:03   2.240893
2018-07-23 09:00:04   1.867558

df2:

                         something_else
2018-07-23 09:00:00.500       -0.977278
2018-07-23 09:00:01.500        0.950088
2018-07-23 09:00:03.500       -0.103219
2018-07-23 09:00:04.500        0.410599

This is how merge_asof works:

pd.merge_asof(df1, df2, left_index=True, right_index=True)

Returns:

                     something  something_else
2018-07-23 09:00:00   1.764052             NaN
2018-07-23 09:00:01   0.400157       -0.977278
2018-07-23 09:00:02   0.978738        0.950088
2018-07-23 09:00:03   2.240893        0.950088
2018-07-23 09:00:04   1.867558       -0.103219

And so

pd.merge_asof(df1, df2, left_index=True, right_index=True, method="linear")

would return

                     something  something_else
2018-07-23 09:00:00   1.764052             NaN
2018-07-23 09:00:01   0.400157       -0.013595
2018-07-23 09:00:02   0.978738        0.598986
2018-07-23 09:00:03   2.240893        0.247884
2018-07-23 09:00:04   1.867558        0.153690

method="last" could reproduce default behaviour.

This could also be reproduced by doing these operations manually:

pd.merge(df1, df2, left_index=True, right_index=True, how="outer").interpolate("linear").loc[df1.index]
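
One caveat with the manual route: `interpolate("linear")` treats rows as equally spaced and ignores the index, so across the gap in df2 the result is not strictly time-weighted. Below is a minimal sketch of an index-aware variant, assuming both frames have a sorted DatetimeIndex. `interpolate_onto` is a hypothetical helper name, not a pandas function; `method="time"` is the existing `interpolate` option that weights by the index.

```python
import numpy as np
import pandas as pd

# Same data as in the issue description.
np.random.seed(0)
start = pd.Timestamp("2018-07-23 09:00:00")
idx = pd.date_range(start, periods=5, freq="s")
df1 = pd.DataFrame(np.random.normal(size=5), index=idx, columns=["something"])
df2 = pd.DataFrame(
    np.random.normal(size=5),
    index=idx + pd.Timedelta(milliseconds=500),
    columns=["something_else"],
)
df2 = df2.drop(df2.index[2])  # leave a gap at 09:00:02.5


def interpolate_onto(left, right, method="time"):
    """Hypothetical helper: interpolate `right` onto the index of `left`."""
    # Interpolate over the union of both indexes, then keep only left's rows.
    union = left.index.union(right.index)
    filled = right.reindex(union).interpolate(method=method)
    return left.join(filled.reindex(left.index))


by_position = interpolate_onto(df1, df2, method="linear")  # equal row spacing
by_time = interpolate_onto(df1, df2, method="time")        # weighted by timestamps
```

With the data above, `method="linear"` gives 0.598986 at 09:00:02 (matching the table in the proposal), while `method="time"` gives roughly 0.687, because 09:00:02 lies a quarter of the way between the df2 samples at 09:00:01.5 and 09:00:03.5.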

WillAyd (Member) commented Aug 18, 2018

Thanks! I could see some utility here. Not sure how complicated the implementation would be but if you feel up for it PRs are always welcome!

WillAyd added the Reshaping and Resample labels and removed the Reshaping label on Aug 18, 2018

JoaoAparicio (Author) commented Aug 19, 2018

I just had a brief look, and it seems that merge_asof uses _AsOfMerge, which in turn uses _OrderedMerge. The latter already has a fill_method argument, but only the ffill and None values are implemented. It seems that the most straightforward way to implement interpolation would be to do it in _OrderedMerge. The advantage of doing it this way is that we would also get interpolation for merge_ordered, which currently only has ffill. The drawback is that this small change would affect multiple functions. What do you think?
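
For reference, here is a small sketch of what `merge_ordered` does today with `fill_method="ffill"`, next to what an interpolating fill might produce when done by hand afterwards. The toy frames and the column name `time` are made up for illustration; only `fill_method="ffill"` (or `None`) is an existing option, and the interpolation step is a manual stand-in, not a proposed implementation.

```python
import pandas as pd

start = pd.Timestamp("2018-07-23 09:00:00")
left = pd.DataFrame({
    "time": pd.date_range(start, periods=3, freq="s"),
    "something": [1.0, 2.0, 3.0],
})
right = pd.DataFrame({
    "time": pd.date_range(start + pd.Timedelta(milliseconds=500), periods=2, freq="s"),
    "something_else": [10.0, 20.0],
})

# Existing behaviour: ordered outer merge, gaps forward-filled.
ffilled = pd.merge_ordered(left, right, on="time", fill_method="ffill")

# Manual stand-in for an interpolating fill: merge without filling,
# then interpolate the right-hand column against the timestamps.
unfilled = pd.merge_ordered(left, right, on="time")
interpolated = unfilled.set_index("time")["something_else"].interpolate(method="time")

t = start + pd.Timedelta(seconds=1)
ffill_value = ffilled.loc[ffilled["time"] == t, "something_else"].iloc[0]  # last seen value
interp_value = interpolated.loc[t]  # midpoint of the two neighbouring samples
```

At 09:00:01 the forward fill carries 10.0 over from 09:00:00.5, whereas the interpolated value is 15.0, halfway between the samples at 09:00:00.5 and 09:00:01.5.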

Also, hypothetically, what would this entail exactly? Do the change above, plus write tests, plus write the docs for merge_asof, merge_ordered, and all affected functions. Anything else?

WillAyd (Member) commented Aug 20, 2018

Not intimately familiar with the implementation so can’t advise on the former off the top of my head. Probably better served if you just submit a PR for review.

As far as what that entails, you have it covered, along with a what’s new note for 0.25.

jorikdima commented

I would also like to have this feature.

mroeschke added the Reshaping label and removed the Resample label on Jun 22, 2021