BUG: merge_asof only merges by first column of 'by=' list #15676

Closed
seibs opened this Issue Mar 13, 2017 · 1 comment

Comments

Projects
None yet
3 participants

seibs commented Mar 13, 2017

Code Sample, a copy-pastable example if possible

left = pd.DataFrame([
    [pd.to_datetime('20160602'), 1, pd.to_datetime('20150430')],
    [pd.to_datetime('20160602'), 2, pd.to_datetime('20150430')],
    [pd.to_datetime('20160603'), 1, pd.to_datetime('20150501')],
    [pd.to_datetime('20160603'), 2, pd.to_datetime('20150501')],
], columns=['time', 'key_1', 'key_2']).set_index('time')

right = pd.DataFrame([
    [pd.to_datetime('20160502'), 1, pd.to_datetime('20150430'), 1.0],
    [pd.to_datetime('20160502'), 2, pd.to_datetime('20150430'), 2.0],
    [pd.to_datetime('20160503'), 1, pd.to_datetime('20150501'), 3.0],
    [pd.to_datetime('20160503'), 2, pd.to_datetime('20150501'), 4.0],
], columns=['time', 'key_1', 'key_2', 'value']).set_index('time')

expected = pd.DataFrame([
    [pd.to_datetime('20160602'), 1, pd.to_datetime('20150430'), 1.0],
    [pd.to_datetime('20160602'), 2, pd.to_datetime('20150430'), 2.0],
    [pd.to_datetime('20160603'), 1, pd.to_datetime('20150501'), 3.0],
    [pd.to_datetime('20160603'), 2, pd.to_datetime('20150501'), 4.0],
], columns=['time', 'key_1', 'key_2', 'value']).set_index('time')

result = pd.merge_asof(left,
                       right,
                       left_index=True,
                       right_index=True,
                       by=['key_1', 'key_2'],
                       tolerance=pd.Timedelta('31d'))

assert_frame_equal(expected, result)

Problem description

merge_asof does not seem to be working with multiple "by" keys. The output looks like it only takes into account the first key in the list - with key_1 first in the list, the output is the same as by=['key_1'] and with key_2 first in the list, the output is the same as by=['key_2'].

Expected Output

            key_1      key_2  value
time                               
2016-06-02      1 2015-04-30    1.0
2016-06-02      2 2015-04-30    2.0
2016-06-03      1 2015-05-01    3.0
2016-06-03      2 2015-05-01    4.0

Actual Output

            key_1      key_2  value
time                               
2016-06-02      1 2015-04-30    3.0
2016-06-02      2 2015-04-30    4.0
2016-06-03      1 2015-05-01    3.0
2016-06-03      2 2015-05-01    4.0

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.5.2.final.0 python-bits: 64 OS: Linux OS-release: 4.1.35-pv-ts2 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 pandas: 0.19.2 nose: None pip: 9.0.1 setuptools: 27.2.0 Cython: None numpy: 1.11.3 scipy: None statsmodels: None xarray: None IPython: 5.1.0 sphinx: None patsy: None dateutil: 2.6.0 pytz: 2016.10 blosc: None bottleneck: None tables: None numexpr: None matplotlib: None openpyxl: None xlrd: None xlwt: None xlsxwriter: None lxml: None bs4: 4.5.3 html5lib: None httplib2: None apiclient: None sqlalchemy: None pymysql: None psycopg2: None jinja2: 2.9.4 boto: None pandas_datareader: None
Contributor

chrisaycock commented Mar 13, 2017

Interestingly enough, I get the right answer if I .reset_index():

In [21]: pd.merge_asof(left, right, on='time', by=['key_1', 'key_2'])
Out[21]:
        time  key_1      key_2  value
0 2016-06-02      1 2015-04-30    1.0
1 2016-06-02      2 2015-04-30    2.0
2 2016-06-03      1 2015-05-01    3.0
3 2016-06-03      2 2015-05-01    4.0

I will have to dig into this further.

jreback added this to the Next Major Release milestone Mar 13, 2017

@jreback jreback modified the milestone: 0.20.0, Next Major Release Mar 14, 2017

jreback closed this in 2621b31 Mar 14, 2017

@AnkurDedania AnkurDedania added a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017

@AnkurDedania Christopher C. Aycock + AnkurDedania BUG: Allow multiple 'by' parameters in merge_asof() when DataFrames a…
…re indexed (#15676)


closes #15676

Author: Christopher C. Aycock <christopher.aycock@twosigma.com>

Closes #15679 from chrisaycock/GH15676 and squashes the following commits:

965caf2 [Christopher C. Aycock] Verify that 'by' parameters are the same length
4a2cc09 [Christopher C. Aycock] BUG: Allow multiple 'by' parameters in merge_asof() when DataFrames are indexed (#15676)
8f410a4

@mattip mattip added a commit to mattip/pandas that referenced this issue Apr 3, 2017

@mattip Christopher C. Aycock + mattip BUG: Allow multiple 'by' parameters in merge_asof() when DataFrames a…
…re indexed (#15676)


closes #15676

Author: Christopher C. Aycock <christopher.aycock@twosigma.com>

Closes #15679 from chrisaycock/GH15676 and squashes the following commits:

965caf2 [Christopher C. Aycock] Verify that 'by' parameters are the same length
4a2cc09 [Christopher C. Aycock] BUG: Allow multiple 'by' parameters in merge_asof() when DataFrames are indexed (#15676)
793e262
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment