DOC/ERR: better error message on no common merge keys #19391

Closed
swyoon opened this issue Jan 25, 2018 · 4 comments · Fixed by #19427
Labels
Docs Error Reporting Incorrect or improved errors from pandas good first issue Reshaping Concat, Merge/Join, Stack/Unstack, Explode

@swyoon
Contributor

swyoon commented Jan 25, 2018

Currently, DataFrame.merge does not merge on the indices by default. When the keyword argument on is not specified and the two DataFrames share no column names, a MergeError is raised.

Code Sample and Problem Statement

In [1]: import pandas as pd
In [2]: a = pd.DataFrame({'a':[1,2]})
In [3]: b = pd.DataFrame({'b':[10,20]})
In [4]: a.merge(b)
---------------------------------------------------------------------------
MergeError                                Traceback (most recent call last)
<ipython-input-6-fbb650d95f99> in <module>()
----> 1 a.merge(b)

/home/swyoon/env/swyoon/local/lib/python2.7/site-packages/pandas/core/frame.pyc in merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
   5368                      right_on=right_on, left_index=left_index,
   5369                      right_index=right_index, sort=sort, suffixes=suffixes,
-> 5370                      copy=copy, indicator=indicator, validate=validate)
   5371
   5372     def round(self, decimals=0, *args, **kwargs):

/home/swyoon/env/swyoon/local/lib/python2.7/site-packages/pandas/core/reshape/merge.pyc in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     55                          right_index=right_index, sort=sort, suffixes=suffixes,
     56                          copy=copy, indicator=indicator,
---> 57                          validate=validate)
     58     return op.get_result()
     59

/home/swyoon/env/swyoon/local/lib/python2.7/site-packages/pandas/core/reshape/merge.pyc in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
    558             warnings.warn(msg, UserWarning)
    559
--> 560         self._validate_specification()
    561
    562         # note this function has side effects

/home/swyoon/env/swyoon/local/lib/python2.7/site-packages/pandas/core/reshape/merge.pyc in _validate_specification(self)
    951                     self.right.columns)
    952                 if len(common_cols) == 0:
--> 953                     raise MergeError('No common columns to perform merge on')
    954                 if not common_cols.is_unique:
    955                     raise MergeError("Data columns not unique: {common!r}"

MergeError: No common columns to perform merge on

For the merge to succeed, we need to pass the rather verbose keyword arguments left_index=True, right_index=True.

In [5]: a.merge(b, left_index=True, right_index=True)
Out[5]:
   a   b
0  1  10
1  2  20
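
The two cases above can also be reproduced as a standalone script. This is only a minimal sketch: it assumes a pandas version that exposes MergeError as pd.errors.MergeError, and the variable names raised, message and merged are illustrative.

```python
import pandas as pd

a = pd.DataFrame({'a': [1, 2]})
b = pd.DataFrame({'b': [10, 20]})

# No shared column names and no keys given: merge has nothing to join on.
try:
    a.merge(b)
    raised = False
    message = ''
except pd.errors.MergeError as exc:
    raised = True
    message = str(exc)

# Explicitly asking for an index-on-index merge succeeds.
merged = a.merge(b, left_index=True, right_index=True)
print(raised)
print(merged)
```

The exact text of the error message depends on the pandas version, so the sketch only records whether the exception was raised.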

Expected Output

What if we could simply do:

In [1]: import pandas as pd
In [2]: a = pd.DataFrame({'a':[1,2]})
In [3]: b = pd.DataFrame({'b':[10,20]})
In [4]: a.merge(b)
Out[4]:
   a   b
0  1  10
1  2  20

A briefer syntax for merging on the index would improve usability.

Output of pd.show_versions()

In [5]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-42-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.22.0
pytest: 3.1.3
pip: 9.0.1
setuptools: 38.2.4
Cython: None
numpy: 1.14.0
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 5.4.1
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.15
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
@jschendel
Member

DataFrame.join merges on index by default, and provides the functionality you're looking for:

In [1]: import pandas as pd

In [2]: a = pd.DataFrame({'a':[1,2]})

In [3]: b = pd.DataFrame({'b':[10,20]})

In [4]: a.join(b)
Out[4]:
   a   b
0  1  10
1  2  20

See the joining on index section of the documentation for additional details.
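
As a rough illustration of how the two calls relate, the sketch below compares a.join(b) with the explicit index merge from the original report. The third frame c and the variable names are made up for this example. One difference to keep in mind is that join defaults to how='left' while merge defaults to how='inner', so the results only coincide when the indices match.

```python
import pandas as pd

a = pd.DataFrame({'a': [1, 2]})
b = pd.DataFrame({'b': [10, 20]})

# With identical indices, join and an index-on-index merge give the same rows.
joined = a.join(b)
merged = a.merge(b, left_index=True, right_index=True)

# Hypothetical frame covering only index label 1.
c = pd.DataFrame({'c': [100]}, index=[1])

# join defaults to how='left': both rows of a are kept, c is missing for label 0.
left_join = a.join(c)

# merge defaults to how='inner': only the shared label 1 survives.
inner_merge = a.merge(c, left_index=True, right_index=True)

print(left_join)
print(inner_merge)
```

Passing how='left' to merge, or how='inner' to join, should bring the two back in line.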

Currently the API reference for DataFrame.merge does not mention DataFrame.join, so perhaps this could be made a bit more explicit there.

@jreback
Contributor

jreback commented Jan 25, 2018

so we would take a PR to update the docstrings to reflect this.

@jreback
Contributor

jreback commented Jan 25, 2018

also, we could update the error message to show the options that were passed, e.g. left_index, right_index, left_on, right_on
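
One way such a message could be assembled is sketched below. The helper no_common_columns_message is hypothetical and is not part of pandas; it only illustrates the idea of echoing the options that were passed, and the wording is an assumption rather than the text of any actual fix.

```python
def no_common_columns_message(left_on=None, right_on=None,
                              left_index=False, right_index=False):
    """Hypothetical helper: build an error message that echoes the merge options."""
    return ('No common columns to perform merge on. '
            'Merge options: left_on={lon}, right_on={ron}, '
            'left_index={lidx}, right_index={ridx}'
            .format(lon=left_on, ron=right_on,
                    lidx=left_index, ridx=right_index))


# Defaults correspond to a bare a.merge(b) call with no keys given.
default_message = no_common_columns_message()
print(default_message)
```

Seeing left_index=False and right_index=False in the message would hint that an index merge is available without the user having to consult the docs.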

@jreback jreback added Docs Reshaping Concat, Merge/Join, Stack/Unstack, Explode Error Reporting Incorrect or improved errors from pandas Effort Low good first issue labels Jan 25, 2018
@jreback jreback added this to the Next Major Release milestone Jan 25, 2018
@jreback jreback changed the title merge on index is not default for DataFrame.merge DOC/ERR: better error message on no common merge keys Jan 25, 2018
@swyoon
Contributor Author

swyoon commented Jan 25, 2018

great. will work on it shortly.

@jreback jreback modified the milestones: Next Major Release, 0.23.0 Feb 1, 2018