Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERR: better error message on invalid on with multi-index columns #9455

Closed
amelio-vazquez-reina opened this issue Feb 9, 2015 · 15 comments
Closed
Labels
Error Reporting Incorrect or improved errors from pandas MultiIndex Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Milestone

Comments

@amelio-vazquez-reina
Copy link
Contributor

Consider the following:

> a_df

   object_uid       item_uid
0  0FlIRitxKX     0D1dPxep2C
1  0FLPRriacw     0DoYxRkSdr
2  0FUh6FQ4mQ     0DsOBGsYl9
3  0FH1hIfNCL     0DoYxRkSdr
4  0FZVBzi2DX     0DoYxRkSdr
5  0FAJ5vfLQs     0DoYxRkSdr
6  0F7lEk65qa     0DoYxRkSdr
7  0Fqe3TIN7Q     0DoYxRkSdr
8  0FrSgt4Vut     0DsOBGsYl9
9  0FByA7F19x     0DoYxRkSdr

and

> b_df

   object_uid         after        before change_rate
                micro_spend   micro_spend            
0  0F0092Ntoi  2.940774e+09  2.932917e+09    0.267909
1  0F01SZnTfs  3.000654e+07  2.887034e+07    3.935523
2  0F027Rm6rd  6.729106e+07  6.912947e+07   -2.659376
3  0F02BZeTr6  2.072908e+09  2.048737e+09    1.179802
4  0F02Vwd6Ou  1.619934e+08  1.682784e+08   -3.734909

I would like to do an inner join in object_uid. When I do:

pd.merge(a_df, b_df, on='object_uid')

I get:
AttributeError: 'numpy.ndarray' object has no attribute 'start'

This is with:

> pd.pandas.__version__
'0.15.2'

The full error trace is below:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-131-5e11878c8b10> in <module>()
----> 1 pd.merge(exp_setup_df.head(10).iloc[:, [0,4]].copy(), spend_diff.reset_index().head(10).copy(), on='flight_uid')

/Users/avazquez/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/tools/merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, s
uffixes, copy)                                                                                                                                                         
     36                          right_on=right_on, left_index=left_index,
     37                          right_index=right_index, sort=sort, suffixes=suffixes,
---> 38                          copy=copy)
     39     return op.get_result()
     40 if __debug__:

/Users/avazquez/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/tools/merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right
_index, sort, suffixes, copy)                                                                                                                                          
    182         (self.left_join_keys,
    183          self.right_join_keys,
--> 184          self.join_names) = self._get_merge_keys()
    185 
    186     def get_result(self):

/Users/avazquez/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/tools/merge.py in _get_merge_keys(self)
    386 
    387         if right_drop:
--> 388             self.right = self.right.drop(right_drop, axis=1)
    389 
    390         return left_keys, right_keys, join_names

/Users/avazquez/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/core/generic.py in drop(self, labels, axis, level, inplace, **kwargs)
   1559                 new_axis = axis.drop(labels, level=level)
   1560             else:
-> 1561                 new_axis = axis.drop(labels)
   1562             dropped = self.reindex(**{axis_name: new_axis})
   1563             try:

/Users/avazquez/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/core/index.py in drop(self, labels, level)
   3734                 inds.append(loc)
   3735             else:
-> 3736                 inds.extend(lrange(loc.start, loc.stop))
   3737 
   3738         return self.delete(inds)

AttributeError: 'numpy.ndarray' object has no attribute 'start'

Also, the following works:

> result_df = pd.concat([a_df.set_index('object_uid'), b_df.set_index('object_uid'], axis=1).columns

but it flattens my columns:

> result_df.columns
Index(['object_uid', ('after','micro_spend'), ('before', 'micro_spend'), 'item_id'], dtype='object']
@jreback
Copy link
Contributor

jreback commented Feb 9, 2015

show .info() on both frames

@amelio-vazquez-reina
Copy link
Contributor Author

Thanks @jreback

> a_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 2 columns):
object_uid      10 non-null object
item_uid        10 non-null object
dtypes: object(2)
memory usage: 240.0+ bytes

and

> b_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 4 columns):
(object_uid, )           10 non-null object
(after, micro_spend)     10 non-null float64
(before, micro_spend)    10 non-null float64
(change_rate, )          10 non-null float64
dtypes: float64(3), object(1)
memory usage: 400.0+ bytes

@jreback
Copy link
Contributor

jreback commented Feb 9, 2015

how did you create the columns ?

@jreback
Copy link
Contributor

jreback commented Feb 9, 2015

the columns index mean

@amelio-vazquez-reina
Copy link
Contributor Author

I created the multi-index one (b_df as follows):

b_df            = pd.concat([left_df, right_df], keys=['after', 'before'], axis=1)
b_df.index.name = 'object_uid'
b_df            = b_df.reset_index()

Other than that, a_df and the data in left_df and right_df comes from SQL queries and perhaps a few other joins before that.

@amelio-vazquez-reina
Copy link
Contributor Author

That said @jreback I have tried using .copy() on both dataframes with the hope that that may perhaps recreate the dataframes with no luck.

@amelio-vazquez-reina
Copy link
Contributor Author

Also, the following (flattening columns for b_df) works:

b_df.columns = ['object_uid', 'after', 'before', 'change_rate']
pd.merge(a_df, b_df, on='object_uid')

@jreback
Copy link
Contributor

jreback commented Feb 10, 2015

When you have a multi-index column, you have to specify the entire column depth in the on, IOW, a tuple. The error message could be better I think though. Want to submit a pull-request to improve that?

@jreback jreback added Error Reporting Incorrect or improved errors from pandas MultiIndex Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Feb 10, 2015
@jreback jreback added this to the 0.17.0 milestone Feb 10, 2015
@jreback jreback changed the title Unable to merge single-level with multi-level ERR: better error message on invalid on with multi-index columns Feb 10, 2015
@amelio-vazquez-reina
Copy link
Contributor Author

Thanks @jreback I wonder if the following problem is related to this:

Consider a dataframe that we populate as follows:

df = pd.DataFrame()
df['col1'] = ...
df['col2'] = ...

idx = pd.IndexSlice
df[idx['higher_col_1', 'col3']] = ...
df[idx['higher_col_2', 'col4']] = ...

I end up with

list(df.columns)
['col1',
'col2',
('higher_col_1', 'col3')
('higher_col_1', 'col4')
]

and when I do:

df.sort_index(axis=1

I get:

TypeError: unorderable types: str() > tuple()

@amelio-vazquez-reina
Copy link
Contributor Author

And on that note, is there any way to recreate the index in the example above automatically? (e.g. pushing any index value with lower depth than the max depth of the axis to the highest and lower level, and creating, empty level values for any depth missing?)

@jreback
Copy link
Contributor

jreback commented Feb 11, 2015

what you are doing is not supported, you need to use df.loc[...]

@amelio-vazquez-reina
Copy link
Contributor Author

Thanks @jreback. That's good to know. I changed all my df[idx[..]] commands to df.loc[:, idx[..]] and, for what is worth, i get the same exact result (unable to sort the index).

I will definitely try to improve the error message with a PR, but I think the problem I am having in my last two posts is different (unable to sort the index with columns with different depths).

@jreback
Copy link
Contributor

jreback commented Feb 11, 2015

@amelio-vazquez-reina you cannot have columns of different depths; you think you do, but they are really all the same depth.

@amelio-vazquez-reina
Copy link
Contributor Author

Thanks @jreback. I understand now the problem.

@nbonnotte
Copy link
Contributor

Xref #12078, #11640

The initial issue has been solved with PR #12158

I'll do a PR to add some tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas MultiIndex Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants