ERR: better error message on invalid on with multi-index columns #9455

Closed
amelio-vazquez-reina opened this Issue Feb 9, 2015 · 15 comments

Contributor

amelio-vazquez-reina commented Feb 9, 2015

Consider the following:

> a_df

   object_uid       item_uid
0  0FlIRitxKX     0D1dPxep2C
1  0FLPRriacw     0DoYxRkSdr
2  0FUh6FQ4mQ     0DsOBGsYl9
3  0FH1hIfNCL     0DoYxRkSdr
4  0FZVBzi2DX     0DoYxRkSdr
5  0FAJ5vfLQs     0DoYxRkSdr
6  0F7lEk65qa     0DoYxRkSdr
7  0Fqe3TIN7Q     0DoYxRkSdr
8  0FrSgt4Vut     0DsOBGsYl9
9  0FByA7F19x     0DoYxRkSdr

and

> b_df

   object_uid         after        before change_rate
                micro_spend   micro_spend            
0  0F0092Ntoi  2.940774e+09  2.932917e+09    0.267909
1  0F01SZnTfs  3.000654e+07  2.887034e+07    3.935523
2  0F027Rm6rd  6.729106e+07  6.912947e+07   -2.659376
3  0F02BZeTr6  2.072908e+09  2.048737e+09    1.179802
4  0F02Vwd6Ou  1.619934e+08  1.682784e+08   -3.734909

I would like to do an inner join on object_uid. When I do:

pd.merge(a_df, b_df, on='object_uid')

I get:
AttributeError: 'numpy.ndarray' object has no attribute 'start'

This is with:

> pd.__version__
'0.15.2'

The full error trace is below:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-131-5e11878c8b10> in <module>()
----> 1 pd.merge(exp_setup_df.head(10).iloc[:, [0,4]].copy(), spend_diff.reset_index().head(10).copy(), on='flight_uid')

/Users/avazquez/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/tools/merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy)
     36                          right_on=right_on, left_index=left_index,
     37                          right_index=right_index, sort=sort, suffixes=suffixes,
---> 38                          copy=copy)
     39     return op.get_result()
     40 if __debug__:

/Users/avazquez/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/tools/merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy)
    182         (self.left_join_keys,
    183          self.right_join_keys,
--> 184          self.join_names) = self._get_merge_keys()
    185 
    186     def get_result(self):

/Users/avazquez/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/tools/merge.py in _get_merge_keys(self)
    386 
    387         if right_drop:
--> 388             self.right = self.right.drop(right_drop, axis=1)
    389 
    390         return left_keys, right_keys, join_names

/Users/avazquez/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/core/generic.py in drop(self, labels, axis, level, inplace, **kwargs)
   1559                 new_axis = axis.drop(labels, level=level)
   1560             else:
-> 1561                 new_axis = axis.drop(labels)
   1562             dropped = self.reindex(**{axis_name: new_axis})
   1563             try:

/Users/avazquez/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/core/index.py in drop(self, labels, level)
   3734                 inds.append(loc)
   3735             else:
-> 3736                 inds.extend(lrange(loc.start, loc.stop))
   3737 
   3738         return self.delete(inds)

AttributeError: 'numpy.ndarray' object has no attribute 'start'

Also, the following works:

> result_df = pd.concat([a_df.set_index('object_uid'), b_df.set_index('object_uid')], axis=1)

but it flattens my columns:

> result_df.columns
Index(['object_uid', ('after', 'micro_spend'), ('before', 'micro_spend'), 'item_id'], dtype='object')
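For reference, the shape of the two frames can be sketched with toy data (values here are made up; only the column structure matters):

```python
import pandas as pd

# flat columns, like a_df above
a_df = pd.DataFrame({'object_uid': ['x', 'y'], 'item_uid': ['i1', 'i2']})

# two-level columns, like b_df above
b_df = pd.DataFrame(
    [['x', 1.0, 2.0, 0.5], ['y', 3.0, 4.0, -0.5]],
    columns=pd.MultiIndex.from_tuples([
        ('object_uid', ''),
        ('after', 'micro_spend'),
        ('before', 'micro_spend'),
        ('change_rate', ''),
    ]),
)

# a_df has one column level, b_df has two -- this mismatch is what
# trips up pd.merge(a_df, b_df, on='object_uid')
print(a_df.columns.nlevels, b_df.columns.nlevels)
```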
Contributor

jreback commented Feb 9, 2015

show .info() on both frames

Contributor

amelio-vazquez-reina commented Feb 9, 2015

Thanks @jreback

> a_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 2 columns):
object_uid      10 non-null object
item_uid        10 non-null object
dtypes: object(2)
memory usage: 240.0+ bytes

and

> b_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 4 columns):
(object_uid, )           10 non-null object
(after, micro_spend)     10 non-null float64
(before, micro_spend)    10 non-null float64
(change_rate, )          10 non-null float64
dtypes: float64(3), object(1)
memory usage: 400.0+ bytes
Contributor

jreback commented Feb 9, 2015

how did you create the columns ?

Contributor

jreback commented Feb 9, 2015

the columns index, I mean

Contributor

amelio-vazquez-reina commented Feb 9, 2015

I created the multi-index one (b_df) as follows:

b_df            = pd.concat([left_df, right_df], keys=['after', 'before'], axis=1)
b_df.index.name = 'object_uid'
b_df            = b_df.reset_index()

Other than that, a_df and the data in left_df and right_df come from SQL queries and perhaps a few other joins before that.
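That construction can be reproduced with hypothetical stand-ins for left_df and right_df (the real ones come from SQL):

```python
import pandas as pd

# hypothetical stand-ins for left_df / right_df in the snippet above
left_df = pd.DataFrame({'micro_spend': [1.0, 2.0]}, index=['x', 'y'])
right_df = pd.DataFrame({'micro_spend': [3.0, 4.0]}, index=['x', 'y'])

b_df = pd.concat([left_df, right_df], keys=['after', 'before'], axis=1)
b_df.index.name = 'object_uid'
b_df = b_df.reset_index()

# keys= adds an outer column level; reset_index() then inserts the index
# as a column, padding the missing level with '' -- i.e. ('object_uid', '')
print(list(b_df.columns))
```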

Contributor

amelio-vazquez-reina commented Feb 9, 2015

That said @jreback, I have tried using .copy() on both dataframes in the hope that it might recreate them, but with no luck.

Contributor

amelio-vazquez-reina commented Feb 9, 2015

Also, the following (flattening columns for b_df) works:

b_df.columns = ['object_uid', 'after', 'before', 'change_rate']
pd.merge(a_df, b_df, on='object_uid')
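This workaround can be checked end-to-end with toy data (values made up):

```python
import pandas as pd

a_df = pd.DataFrame({'object_uid': ['x', 'y'], 'item_uid': ['i1', 'i2']})
b_df = pd.DataFrame(
    [['x', 1.0, 2.0, 0.5], ['y', 3.0, 4.0, -0.5]],
    columns=pd.MultiIndex.from_tuples([
        ('object_uid', ''), ('after', 'micro_spend'),
        ('before', 'micro_spend'), ('change_rate', ''),
    ]),
)

# collapse the two-level columns to plain labels, then merge normally
b_df.columns = ['object_uid', 'after', 'before', 'change_rate']
merged = pd.merge(a_df, b_df, on='object_uid')
print(merged.shape)  # two matching rows, 2 + 4 - 1 columns
```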
Contributor

jreback commented Feb 10, 2015

When you have multi-index columns, you have to specify the entire column depth in the on, IOW a tuple. The error message could be better though, I think. Want to submit a pull-request to improve that?
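A sketch of what "the entire column depth" looks like, assuming both frames carry two-level columns (toy frames; in the original report a_df has flat columns, so flattening as above is the practical route):

```python
import pandas as pd

cols_a = pd.MultiIndex.from_tuples([('object_uid', ''), ('item_uid', '')])
a2 = pd.DataFrame([['x', 'i1'], ['y', 'i2']], columns=cols_a)

cols_b = pd.MultiIndex.from_tuples([('object_uid', ''), ('after', 'micro_spend')])
b2 = pd.DataFrame([['x', 1.0], ['y', 2.0]], columns=cols_b)

# the key names the full column depth: a tuple, not a bare string
merged = pd.merge(a2, b2, on=[('object_uid', '')])
print(merged.shape)
```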

@jreback jreback added this to the 0.17.0 milestone Feb 10, 2015

@jreback jreback changed the title from Unable to merge single-level with multi-level to ERR: better error message on invalid on with multi-index columns Feb 10, 2015

Contributor

amelio-vazquez-reina commented Feb 11, 2015

Thanks @jreback I wonder if the following problem is related to this:

Consider a dataframe that we populate as follows:

df = pd.DataFrame()
df['col1'] = ...
df['col2'] = ...

idx = pd.IndexSlice
df[idx['higher_col_1', 'col3']] = ...
df[idx['higher_col_2', 'col4']] = ...

I end up with

list(df.columns)
['col1',
 'col2',
 ('higher_col_1', 'col3'),
 ('higher_col_2', 'col4')]

and when I do:

df.sort_index(axis=1)

I get:

TypeError: unorderable types: str() > tuple()

Contributor

amelio-vazquez-reina commented Feb 11, 2015

And on that note, is there any way to recreate the index in the example above automatically? (e.g. promoting any label that is shallower than the axis's maximum depth and filling in the missing levels with empty values?)
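There is no built-in for this as far as I know, but the padding described above can be sketched by hand (pad_label is a hypothetical helper, not a pandas API):

```python
import pandas as pd

df = pd.DataFrame({'col1': [1], 'col2': [2]})
df[('higher_col_1', 'col3')] = 3  # stored as a tuple label in a flat Index

# mixed str/tuple labels cannot be compared under Python 3, so
# sort_index(axis=1) raises TypeError; pad every label to a common
# depth first, then rebuild the axis as a real MultiIndex
def pad_label(label, depth):
    t = label if isinstance(label, tuple) else (label,)
    return t + ('',) * (depth - len(t))

depth = max(len(c) if isinstance(c, tuple) else 1 for c in df.columns)
df.columns = pd.MultiIndex.from_tuples(
    [pad_label(c, depth) for c in df.columns])

df = df.sort_index(axis=1)  # now sorts cleanly
```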

Contributor

jreback commented Feb 11, 2015

what you are doing is not supported, you need to use df.loc[...]
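A sketch of the supported pattern: start with two-level columns from the outset, so every label already has the same depth, and assign through .loc with a full tuple key (labels here are illustrative):

```python
import numpy as np
import pandas as pd

# build the frame with a MultiIndex on the columns up front
cols = pd.MultiIndex.from_tuples([('base', 'col1'), ('base', 'col2')])
df = pd.DataFrame(np.ones((2, 2)), columns=cols)

# assigning with the full tuple keeps the column depth uniform
df.loc[:, ('higher_col_1', 'col3')] = 5.0

df = df.sort_index(axis=1)  # sorts, since all labels share one depth
```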

Contributor

amelio-vazquez-reina commented Feb 11, 2015

Thanks @jreback. That's good to know. I changed all my df[idx[..]] commands to df.loc[:, idx[..]] and, for what it's worth, I get the exact same result (unable to sort the index).

I will definitely try to improve the error message with a PR, but I think the problem I am having in my last two posts is different (unable to sort the index with columns of different depths).

Contributor

jreback commented Feb 11, 2015

@amelio-vazquez-reina you cannot have columns of different depths; you think you do, but they are really all the same depth.
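This point can be checked directly: after a tuple-key assignment the axis is still a flat Index whose labels happen to include a tuple, not a MultiIndex:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1]})
df[('higher_col_1', 'col3')] = 3

# every label lives at depth 1; the tuple is just one opaque label
print(df.columns.nlevels)  # 1
print(isinstance(df.columns, pd.MultiIndex))  # False
```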

Contributor

amelio-vazquez-reina commented Feb 11, 2015

Thanks @jreback. I understand the problem now.

Contributor

nbonnotte commented Feb 3, 2016

Xref #12078, #11640

The initial issue has been solved with PR #12158

I'll do a PR to add some tests

@jreback jreback modified the milestones: 0.18.1, Next Major Release Apr 25, 2016

@jreback jreback closed this in bb9b9c5 Apr 25, 2016

nps added a commit to nps/pandas that referenced this issue May 17, 2016

ERR automatic broadcast for merging different levels
closes pandas-dev#9455

Author: Nicolas Bonnotte <nicolas.bonnotte@gmail.com>

This patch had conflicts when merged, resolved by
Committer: Jeff Reback <jeff@reback.net>

Closes pandas-dev#12219 from nbonnotte/9455-test-for-merge and squashes the following commits:

6532ab2 [Nicolas Bonnotte] ERR automatic broadcast for merging different levels, pandas-dev#9455