
pd.merge fails if columns of only one side are hierarchical (even if index is equal) #2024

Closed
gerigk opened this issue Oct 5, 2012 · 2 comments

@gerigk

commented Oct 5, 2012

import pandas as pd
import numpy as np
df = pd.DataFrame([(1, 2, 3), (4, 5, 6)], columns=['a', 'b', 'c'])
new_df = df.groupby(['a']).agg({'b': [np.mean, np.sum]})  # columns get two levels: ('b', 'mean'), ('b', 'sum')
other_df = pd.DataFrame([(1, 2, 3), (7, 10, 6)], columns=['a', 'b', 'd'])  # ordinary single-level columns
other_df.set_index('a', inplace=True)
print(new_df)
print(other_df)
pd.merge(new_df, other_df, left_index=True, right_index=True)  # join on the shared 'a' index

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-4c9da13e85ff> in <module>()
      7 print new_df
      8 print other_df
----> 9 pd.merge(new_df, other_df, left_index=True, right_index=True)

/usr/local/lib/python2.7/dist-packages/pandas-0.9.0rc2-py2.7-linux-x86_64.egg/pandas/tools/merge.pyc in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy)
     31                          right_index=right_index, sort=sort, suffixes=suffixes,
     32                          copy=copy)
---> 33     return op.get_result()
     34 if __debug__: merge.__doc__ = _merge_doc % '\nleft : DataFrame'
     35 

/usr/local/lib/python2.7/dist-packages/pandas-0.9.0rc2-py2.7-linux-x86_64.egg/pandas/tools/merge.pyc in get_result(self)
    181 
    182         # this is a bit kludgy
--> 183         ldata, rdata = self._get_merge_data()
    184 
    185         # TODO: more efficiently handle group keys to avoid extra consolidation!

/usr/local/lib/python2.7/dist-packages/pandas-0.9.0rc2-py2.7-linux-x86_64.egg/pandas/tools/merge.pyc in _get_merge_data(self)
    271         lsuf, rsuf = self.suffixes
    272         ldata, rdata = ldata._maybe_rename_join(rdata, lsuf, rsuf,
--> 273                                                 copydata=False)
    274         return ldata, rdata
    275 

/usr/local/lib/python2.7/dist-packages/pandas-0.9.0rc2-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in _maybe_rename_join(self, other, lsuffix, rsuffix, copydata)
   1115 
   1116     def _maybe_rename_join(self, other, lsuffix, rsuffix, copydata=True):
-> 1117         to_rename = self.items.intersection(other.items)
   1118         if len(to_rename) > 0:
   1119             if not lsuffix and not rsuffix:

/usr/local/lib/python2.7/dist-packages/pandas-0.9.0rc2-py2.7-linux-x86_64.egg/pandas/core/index.pyc in intersection(self, other)
   2270         Index
   2271         """
-> 2272         self._assert_can_do_setop(other)
   2273 
   2274         if self.equals(other):

/usr/local/lib/python2.7/dist-packages/pandas-0.9.0rc2-py2.7-linux-x86_64.egg/pandas/core/index.pyc in _assert_can_do_setop(self, other)
   2319             if len(other) == 0:
   2320                 return True
-> 2321             raise TypeError('can only call with other hierarchical '
   2322                             'index objects')
   2323 

TypeError: can only call with other hierarchical index objects

      b     
   mean  sum
a           
1     2    2
4     5    5
    b  d
a       
1   2  3
7  10  6

This also fails:

pd.merge(new_df.reset_index(), other_df.reset_index(), left_index=False, right_index=False, left_on=('a', ''), right_on='a')

And so does this:

pd.merge(new_df.reset_index(), other_df.reset_index(), left_index=False, right_index=False, left_on=[('a', '')], right_on=['a'])
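Both attempts still pass a frame whose columns have two levels (after `reset_index()` the key column is labelled `('a', '')`). A minimal sketch, assuming a current pandas and using the string names `"mean"`/`"sum"` in place of `np.mean`/`np.sum`, that flattens the column labels first so both sides have ordinary single-level columns and a plain `on="a"` merge applies:

```python
import pandas as pd

df = pd.DataFrame([(1, 2, 3), (4, 5, 6)], columns=["a", "b", "c"])
other_df = pd.DataFrame([(1, 2, 3), (7, 10, 6)], columns=["a", "b", "d"])

# A dict of lists in agg() produces two column levels: ('b', 'mean'), ('b', 'sum')
new_df = df.groupby("a").agg({"b": ["mean", "sum"]})

# Collapse each (column, function) tuple into one string, e.g. 'b_mean'
new_df.columns = ["_".join(col) for col in new_df.columns]

# Both frames now have single-level columns, so merging on the 'a' column works
merged = pd.merge(new_df.reset_index(), other_df, on="a")
print(merged)
```

The `"_".join` naming is only one choice; any scheme that yields unique flat labels works the same way.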
@gerigk

Author

commented Oct 5, 2012

Is it possible to use the agg({'a': [f1, f2], 'b': [f3, f4]}) syntax without creating hierarchical columns?
Of course, I could flatten the column index after this step into names like column-name_function-name, and then I could easily join with the other DataFrame, but this feels wrong.
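In later pandas releases (0.25 and up), named aggregation addresses this question directly: each keyword argument to `agg` names one flat output column and maps it to a `(column, function)` pair, so no hierarchical columns are created. A sketch under that version assumption, where `b_mean` and `b_sum` are hypothetical names following the column-name_function-name scheme:

```python
import pandas as pd

df = pd.DataFrame([(1, 2, 3), (4, 5, 6)], columns=["a", "b", "c"])
other_df = pd.DataFrame([(1, 2, 3), (7, 10, 6)], columns=["a", "b", "d"]).set_index("a")

# Named aggregation: keyword = output column name, value = (source column, function)
new_df = df.groupby("a").agg(b_mean=("b", "mean"), b_sum=("b", "sum"))

# new_df has single-level columns, so an index-on-index merge behaves as usual
merged = pd.merge(new_df, other_df, left_index=True, right_index=True)
print(merged)
```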

wesm closed this in 96545d0 on Nov 28, 2012

@wesm

Member

commented Nov 28, 2012

Made joins work. Hierarchical column levels become tuples:

In [2]: pd.merge(new_df, other_df, left_index=True, right_index=True)
Out[2]: 
   (b, mean)  (b, sum)  b  d
a                           
1          2         2  2  3
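A similar tuple-labelled result can also be produced explicitly before merging. A sketch, assuming pandas 0.24 or later where `MultiIndex.to_flat_index()` exists; this is an alternative approach, not the change made in 96545d0:

```python
import pandas as pd

df = pd.DataFrame([(1, 2, 3), (4, 5, 6)], columns=["a", "b", "c"])
other_df = pd.DataFrame([(1, 2, 3), (7, 10, 6)], columns=["a", "b", "d"]).set_index("a")

new_df = df.groupby("a").agg({"b": ["mean", "sum"]})

# Replace the two-level column MultiIndex with a one-level Index of tuples,
# e.g. ('b', 'mean'), so both frames have the same number of column levels
new_df.columns = new_df.columns.to_flat_index()

merged = pd.merge(new_df, other_df, left_index=True, right_index=True)
print(merged)
```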