set_index breaks depending on column name or number #2008

Closed
PhE opened this issue Oct 2, 2012 · 4 comments

PhE commented Oct 2, 2012

I'm getting very strange behaviour with the current pandas dev version (tagged as '0.9.0rc2'):
calling set_index() on a DataFrame breaks depending on the column names I use!

This code fails:

from numpy import nan
from pandas import DataFrame

df3 = DataFrame({
    '0001-INTERNAL_PRODUCT_CODE': {('A', 'A0006000', 'nuit'): 'A0006000'},
    '0004-PACKAGING_LANGUAGE_CODE': {('A', 'A0006000', 'nuit'): nan},
    '1050-LOCALISED_DESCRIPTION': {('A', 'A0006000', 'nuit'): nan},
    '0024-PRODUCT_BAR_CODED': {('A', 'A0006000', 'nuit'): nan},
    'Source system': {('A', 'A0006000', 'nuit'): 'A'}, 
    '0027-LOCAL_COMMERCIAL_AUTHORIZATION': {('A', 'A0006000', 'nuit'): nan},
})
df3.set_index(['0001-INTERNAL_PRODUCT_CODE', '1050-LOCALISED_DESCRIPTION']) # fails !

But if I rename the indexed columns to 'a' and 'x', it works:

df3 = DataFrame({
    'a': {('A', 'A0006000', 'nuit'): 'A0006000'},
    '0004-PACKAGING_LANGUAGE_CODE': {('A', 'A0006000', 'nuit'): nan},
    'x': {('A', 'A0006000', 'nuit'): nan},
    '0024-PRODUCT_BAR_CODED': {('A', 'A0006000', 'nuit'): nan},
    'Source system': {('A', 'A0006000', 'nuit'): 'A'}, 
    '0027-LOCAL_COMMERCIAL_AUTHORIZATION': {('A', 'A0006000', 'nuit'): nan},
})
df3.set_index(['a', 'x']) # works !

At first I thought it was related to the column names... but if I keep my exotic column names and remove one column (one not used by set_index), it still works:

df3 = DataFrame({
    '0001-INTERNAL_PRODUCT_CODE': {('A', 'A0006000', 'nuit'): 'A0006000'},
    '1050-LOCALISED_DESCRIPTION': {('A', 'A0006000', 'nuit'): nan},
    '0024-PRODUCT_BAR_CODED': {('A', 'A0006000', 'nuit'): nan},
    'Source system': {('A', 'A0006000', 'nuit'): 'A'}, 
    '0027-LOCAL_COMMERCIAL_AUTHORIZATION': {('A', 'A0006000', 'nuit'): nan},
})
df3.set_index(['0001-INTERNAL_PRODUCT_CODE', '1050-LOCALISED_DESCRIPTION']) # works !
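To re-check the three variants above against a later pandas release, they can be run side by side. This is a minimal sketch assuming a modern pandas and NumPy, not the 0.9.0rc2 build from the report; `make_frame`, `try_variant` and `ROW` are hypothetical helpers introduced here. Because the failure was later traced to Index.summary rather than to the column names, the sketch also formats each result with `repr` and `DataFrame.info` instead of only calling `set_index`.

```python
import io

import numpy as np
import pandas as pd

# Row label from the report: a 3-tuple, so pandas builds a 3-level row MultiIndex.
ROW = ("A", "A0006000", "nuit")


def make_frame(columns):
    """Build a one-row DataFrame; `columns` maps column name -> cell value."""
    return pd.DataFrame({name: {ROW: value} for name, value in columns.items()})


def try_variant(columns, keys):
    """Return True if set_index(keys) and formatting the result both succeed."""
    df = make_frame(columns)
    try:
        result = df.set_index(keys)
        repr(result)                    # display path
        result.info(buf=io.StringIO())  # summary path, which reports the index range
        return True
    except Exception:
        return False


# Variant 1: the exotic column names from the report.
exotic = {
    "0001-INTERNAL_PRODUCT_CODE": "A0006000",
    "0004-PACKAGING_LANGUAGE_CODE": np.nan,
    "1050-LOCALISED_DESCRIPTION": np.nan,
    "0024-PRODUCT_BAR_CODED": np.nan,
    "Source system": "A",
    "0027-LOCAL_COMMERCIAL_AUTHORIZATION": np.nan,
}
# Variant 2: same data, key columns renamed to 'a' and 'x'.
rename = {"0001-INTERNAL_PRODUCT_CODE": "a", "1050-LOCALISED_DESCRIPTION": "x"}
renamed = {rename.get(name, name): value for name, value in exotic.items()}
# Variant 3: original names, one non-key column dropped.
dropped = {n: v for n, v in exotic.items() if n != "0004-PACKAGING_LANGUAGE_CODE"}

keys = ["0001-INTERNAL_PRODUCT_CODE", "1050-LOCALISED_DESCRIPTION"]
results = {
    "exotic names": try_variant(exotic, keys),
    "renamed to a/x": try_variant(renamed, ["a", "x"]),
    "one column dropped": try_variant(dropped, keys),
}
print(results)
```

The broad `except Exception` keeps the comparison to a simple pass/fail per variant; drop it to see the actual traceback when a variant fails.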
lodagro (Contributor) commented Oct 2, 2012

On master:

In [9]: df3.set_index(['0001-INTERNAL_PRODUCT_CODE', '1050-LOCALISED_DESCRIPTION'])
Out[9]: 
                                                       0004-PACKAGING_LANGUAGE_CODE  0024-PRODUCT_BAR_CODED  0027-LOCAL_COMMERCIAL_AUTHORIZATION Source system
0001-INTERNAL_PRODUCT_CODE 1050-LOCALISED_DESCRIPTION                                                                                                         
A0006000                   nan                                                  NaN                     NaN                                  NaN             A

However, this way you end up with NaN in the index, which is a no-no; see also #1971.
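One way to follow that advice is to check the intended key columns for missing values before calling `set_index`, and either stop with an error or substitute an explicit placeholder label. A minimal sketch assuming a modern pandas; `set_index_without_nan` and the `'<missing>'` placeholder are hypothetical, not pandas API, and a placeholder is only appropriate if a made-up label is acceptable for the data.

```python
import numpy as np
import pandas as pd


def set_index_without_nan(df, keys, fill=None):
    """Call df.set_index(keys), refusing missing values in the key columns.

    Hypothetical helper: with fill=None it raises ValueError if any key column
    contains NaN; otherwise NaN in those columns is replaced by `fill` first.
    The input frame is never modified.
    """
    has_na = df[keys].isna().any()    # one bool per key column
    bad = list(has_na[has_na].index)  # names of key columns holding NaN
    if bad and fill is None:
        raise ValueError(f"NaN in index key columns: {bad}")
    if bad:
        df = df.copy()
        for col in bad:
            df[col] = df[col].fillna(fill)
    return df.set_index(keys)


# Usage: the same shape of problem as in the report, with short column names.
df = pd.DataFrame({"code": ["A0006000"], "desc": [np.nan], "src": ["A"]})
indexed = set_index_without_nan(df, ["code", "desc"], fill="<missing>")
print(indexed)
```

When the affected rows are not needed, `df.dropna(subset=keys)` before `set_index` is the other obvious choice.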

wesm (Member) commented Nov 3, 2012

Can this be closed?

changhiskhan (Contributor) commented:

Don't think so. It looks like this is broken on master now: another instance of NA-support issues with MultiIndex.
It doesn't depend on the column names. It's the call to Index.summary that craps out, and the formatting differs between the use cases.
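For context on the "NA support issues with MultiIndex": in current pandas a MultiIndex stores each level as a set of unique labels plus integer codes, and a missing value is not kept as a label at all; it is recorded as code -1. The sketch below assumes a modern pandas (`codes` is the name used since pandas 0.24; older releases called it `labels`) and only illustrates present-day behaviour, not the 0.9-era internals or the fix in 81661af. It suggests one reason code that walks the levels, such as index formatting, has to treat NaN separately.

```python
import numpy as np
import pandas as pd

# Same two key values as the report: a product code and a missing description.
mi = pd.MultiIndex.from_arrays(
    [["A0006000"], [np.nan]],
    names=["0001-INTERNAL_PRODUCT_CODE", "1050-LOCALISED_DESCRIPTION"],
)

print(mi.levels[1])                   # the NaN is not stored as a level label
print(mi.codes[1])                    # it is encoded as -1 instead
print(mi.get_level_values(1).isna())  # but it comes back as NaN on lookup
```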

wesm (Member) commented Nov 4, 2012

I'll have a look. After food, though.

ghost assigned wesm Nov 4, 2012
wesm closed this as completed in 81661af Nov 5, 2012
joaonatali pushed a commit to joaonatali/pandas that referenced this issue Nov 10, 2012