Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

a.merge(b, left_on='name', right_index=True) merges on index whereas a.merge(b, on='name') where name is the index of b merges on column #32067

Open
arobrien opened this issue Feb 17, 2020 · 4 comments
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@arobrien
Copy link

Code Sample

a = pd.DataFrame({'a':[10,11,15,21],'b':[7,6,4,2]}).set_index('a')
b = pd.DataFrame({'b':[7,6,4,2],'c':['a','b','c','d']}).set_index('b')
a.merge(b, on='b')

Problem description

If I want to merge two DataFrame's, where column 'name' of a matches index named 'name' on b, with the old syntax I needed to write a.merge(b, left_on='name', right_index=True). This would produce a new DataFrame with the index of a retained as an index, the shared column/index retained as a column, and other columns of a and b also retained (as shown in expected output).

With recent versions, however, I could just use a.merge(b, on='name'), but it drops the index of a (merging on column mode instead of merging on index).

The documentation states (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html):

The join is done on columns or indexes. If joining columns on columns, the
DataFrame indexes will be ignored. Otherwise if joining indexes on indexes
or indexes on a column or columns, the index will be passed on.

Perhaps this could be clarified in the doc, but the behaviour is at least inconsistent. For me it is more intuitive to perform an index merge internally in this case.

Expected Output

b c
a
10 7 a
11 6 b
15 4 c
21 2 d

Output of pd.show_versions()

[paste the output of pd.show_versions() here below this line]
INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 7
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : None.None

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200210
Cython : 0.29.15
pytest : None
hypothesis : None
sphinx : 2.4.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

@arobrien
Copy link
Author

This example also merges on index:

a = pd.DataFrame({'a':[10,11,15,21],'b':[7,6,4,2]}).set_index('a')
b = pd.DataFrame({'a':[10,11,15,21],'c':['a','b','c','d']}).set_index('a')
a.merge(b, on='a')

@MarcoGorelli MarcoGorelli added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Feb 18, 2020
@MarcoGorelli
Copy link
Member

Thanks @arobrien for the report

Looking into this - in pandas/core/reshape/merge.py, when we get here

            (left_indexer, right_indexer) = self._get_join_indexers()

            if self.right_index:
                if len(self.left) > 0:
                    join_index = self._create_join_index(
                        self.left.index,
                        self.right.index,
                        left_indexer,
                        right_indexer,
                        how="right",
                    )
                else:
                    join_index = self.right.index.take(right_indexer)
                    left_indexer = np.array([-1] * len(join_index))
            elif self.left_index:
                if len(self.right) > 0:
                    join_index = self._create_join_index(
                        self.right.index,
                        self.left.index,
                        right_indexer,
                        left_indexer,
                        how="left",
                    )
                else:
                    join_index = self.left.index.take(left_indexer)
                    right_indexer = np.array([-1] * len(join_index))
            else:
                join_index = Index(np.arange(len(left_indexer)))

we have self.right_index is False, and so the join index is set to Index(np.arange(len(left_indexer)))

@phofl
Copy link
Member

phofl commented May 28, 2020

@MarcoGorelli Is this the desired behavior? I would have expected the same as @arobrien

@MarcoGorelli
Copy link
Member

@phofl I don't think it's expected, IIRC I'd left my comment above because I thought it might be useful to anyone interested in debugging this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

No branches or pull requests

4 participants