pd.merge fails if columns of only one side are hierarchical (even if index is equal) #2024

gerigk opened this issue Oct 5, 2012 · 2 comments


commented Oct 5, 2012

import pandas as pd
import numpy as np
df = pd.DataFrame([(1,2,3), (4,5,6)], columns = ['a','b','c'])
new_df = df.groupby(['a']).agg({'b': [np.mean, np.sum]})
other_df = pd.DataFrame([(1,2,3), (7,10,6)], columns = ['a','b','d'])
other_df.set_index('a', inplace=True)
print new_df
print other_df
pd.merge(new_df, other_df, left_index=True, right_index=True)

TypeError                                 Traceback (most recent call last)
<ipython-input-11-4c9da13e85ff> in <module>()
      7 print new_df
      8 print other_df
----> 9 pd.merge(new_df, other_df, left_index=True, right_index=True)

/usr/local/lib/python2.7/dist-packages/pandas-0.9.0rc2-py2.7-linux-x86_64.egg/pandas/tools/merge.pyc in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy)
     31                          right_index=right_index, sort=sort, suffixes=suffixes,
     32                          copy=copy)
---> 33     return op.get_result()
     34 if __debug__: merge.__doc__ = _merge_doc % '\nleft : DataFrame'

/usr/local/lib/python2.7/dist-packages/pandas-0.9.0rc2-py2.7-linux-x86_64.egg/pandas/tools/merge.pyc in get_result(self)
    182         # this is a bit kludgy
--> 183         ldata, rdata = self._get_merge_data()
    185         # TODO: more efficiently handle group keys to avoid extra consolidation!

/usr/local/lib/python2.7/dist-packages/pandas-0.9.0rc2-py2.7-linux-x86_64.egg/pandas/tools/merge.pyc in _get_merge_data(self)
    271         lsuf, rsuf = self.suffixes
    272         ldata, rdata = ldata._maybe_rename_join(rdata, lsuf, rsuf,
--> 273                                                 copydata=False)
    274         return ldata, rdata

/usr/local/lib/python2.7/dist-packages/pandas-0.9.0rc2-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in _maybe_rename_join(self, other, lsuffix, rsuffix, copydata)
   1116     def _maybe_rename_join(self, other, lsuffix, rsuffix, copydata=True):
-> 1117         to_rename = self.items.intersection(other.items)
   1118         if len(to_rename) > 0:
   1119             if not lsuffix and not rsuffix:

/usr/local/lib/python2.7/dist-packages/pandas-0.9.0rc2-py2.7-linux-x86_64.egg/pandas/core/index.pyc in intersection(self, other)
   2270         Index
   2271         """
-> 2272         self._assert_can_do_setop(other)
   2274         if self.equals(other):

/usr/local/lib/python2.7/dist-packages/pandas-0.9.0rc2-py2.7-linux-x86_64.egg/pandas/core/index.pyc in _assert_can_do_setop(self, other)
   2319             if len(other) == 0:
   2320                 return True
-> 2321             raise TypeError('can only call with other hierarchical '
   2322                             'index objects')

TypeError: can only call with other hierarchical index objects

   mean  sum
1     2    2
4     5    5
    b  d
1   2  3
7  10  6

This also fails:

pd.merge(new_df.reset_index(), other_df.reset_index(), left_index=False, right_index=False, left_on =('a', ''), right_on ='a' )

And so does this:

pd.merge(new_df.reset_index(), other_df.reset_index(), left_index=False, right_index=False, left_on =[('a', '')], right_on =['a'] )

commented Oct 5, 2012

Is it possible to use the agg({'a': [f1, f1], 'b': [f3, f4]}) syntax without creating hierarchical columns?
Of course, after this step I could reset the column index to something like column-name_function-name and then easily join with the other DataFrame, but this feels wrong.
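
For reference, the workaround described above could look roughly like the sketch below. It is only an illustration, not the fix that was applied to pandas: the names `flat_df` and `merged` are made up here, the underscore separator is one possible naming scheme, and the string aggregation names "mean" and "sum" stand in for `np.mean` and `np.sum` on the assumption that a reasonably recent pandas is installed.

```python
import pandas as pd

# Same data as in the report above.
df = pd.DataFrame([(1, 2, 3), (4, 5, 6)], columns=["a", "b", "c"])
other_df = pd.DataFrame([(1, 2, 3), (7, 10, 6)], columns=["a", "b", "d"]).set_index("a")

# Aggregating with a list of functions produces two-level (hierarchical) columns.
new_df = df.groupby("a").agg({"b": ["mean", "sum"]})

# Flatten each (column, function) tuple into a single "column_function" label.
# flat_df is a hypothetical name; the separator is an arbitrary choice.
flat_df = new_df.copy()
flat_df.columns = ["_".join(col) for col in new_df.columns]

# Both frames now have flat columns, so the index-on-index merge is unambiguous.
merged = pd.merge(flat_df, other_df, left_index=True, right_index=True)
print(merged)
```

Flattening keeps the existing column `b` of `other_df` distinct from the aggregated `b_mean` and `b_sum`, so no suffixes are needed in the merge.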

@wesm closed this in 96545d0 Nov 28, 2012


commented Nov 28, 2012

Made joins work. Hierarchical levels become tuples:

In [2]: pd.merge(new_df, other_df, left_index=True, right_index=True)
   (b, mean)  (b, sum)  b  d
1          2         2  2  3