MultiIndex with Heterogenous dtype #2207

Closed
TomAugspurger opened this Issue Nov 9, 2012 · 5 comments

TomAugspurger commented Nov 9, 2012

I'm not sure if this is supposed to work or not.

I'm reading in a CSV file:

df = pd.read_csv('nc201052.dat', index_col=['FLOW', 'PERIOD', 'DECLARANT', 'PARTNER', 'PRODUCT_NC'], nrows=1000)

df.head(5)

                                          STAT_REGIME  VALUE_1000ECU  QUANTITY_TON  SUP_QUANTITY
FLOW PERIOD DECLARANT PARTNER PRODUCT_NC                                                        
1    201052 001       3       01                    4       41818.21       13419.4             0
                      4       01                    4       17667.97        3939.6             0
                      5       01                    4        6956.63        1181.9             0
                      6       01                    4       44011.98        1031.2             0
                      7       01                    4        7141.68         559.0             0

The 'DECLARANT' column's index level has dtype 'object' (the data provider coded one entry as 'EU').
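
You can confirm which index levels came out non-numeric by checking the dtype of each level. This is a minimal reproduction that rests on one assumption: the three CSV rows are invented sample data, not the real `nc201052.dat`, and they are read from memory with `io.StringIO`. Only DECLARANT contains a non-numeric code, so it should be the only non-numeric level. Older pandas reports it as object; newer versions may report a dedicated string dtype instead.

```python
import io

import pandas as pd

# Invented sample rows (not the real file): one DECLARANT code is 'EU',
# so that column cannot be parsed as integers.
CSV = """FLOW,PERIOD,DECLARANT,PARTNER,PRODUCT_NC,VALUE_1000ECU
1,201052,001,3,01,41818.21
1,201052,001,4,01,17667.97
1,201052,EU,3,01,6956.63
"""

df = pd.read_csv(
    io.StringIO(CSV),
    index_col=["FLOW", "PERIOD", "DECLARANT", "PARTNER", "PRODUCT_NC"],
)

# Print the dtype of each MultiIndex level.
for name in df.index.names:
    print(name, df.index.get_level_values(name).dtype)
```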

df.index[0] gives (1, 201052, '001', 3, '01'), but df.ix[1, 201052, '001'] raises an error.
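
`.ix` was deprecated in pandas 0.20 and removed in 1.0. On current pandas, you write the same partial-key lookup with `.loc`. The sketch below uses invented sample rows rather than the original file. Passing a tuple for the first three levels should return the matching rows, with those three levels dropped from the result's index. Sorting the index first is only a precaution that lets pandas use its sorted-lookup path. It is not a diagnosis of the original error.

```python
import io

import pandas as pd

# Invented sample rows; DECLARANT mixes zero-padded codes with 'EU'.
CSV = """FLOW,PERIOD,DECLARANT,PARTNER,PRODUCT_NC,VALUE_1000ECU
1,201052,001,3,01,41818.21
1,201052,001,4,01,17667.97
1,201052,EU,3,01,6956.63
2,201052,001,3,02,7141.68
"""

df = pd.read_csv(
    io.StringIO(CSV),
    index_col=["FLOW", "PERIOD", "DECLARANT", "PARTNER", "PRODUCT_NC"],
).sort_index()  # sorted MultiIndex for efficient label lookups

# Partial key on the first three levels (FLOW, PERIOD, DECLARANT);
# the remaining levels PARTNER and PRODUCT_NC stay in the result's index.
subset = df.loc[(1, 201052, "001"), :]
print(subset)
```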

When I import only the first 10 rows, the parser infers the 'DECLARANT' column as integer, and the slice works. That's why I suspect the dtype of that index level is causing the problem.
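
The effect described here, where the inferred dtype depends on how many rows are read, can be reproduced in a small sketch. You can then pin the dtype with `read_csv`'s `dtype` argument. The sketch makes three assumptions:

- The CSV rows are invented.
- `declarant_values` is a hypothetical helper written only for this illustration.
- 'EU' appears only in the third row, standing in for a code that shows up later in the real file.

When the column is inferred as integer, '001' becomes `1`, so a lookup using the string '001' would not match. Forcing `str` keeps the codes the same whatever `nrows` is.

```python
import io

import pandas as pd

# Invented sample rows: 'EU' first appears in the third row, so whether
# DECLARANT looks numeric depends on how many rows are read.
CSV = """FLOW,PERIOD,DECLARANT,PARTNER,PRODUCT_NC,VALUE_1000ECU
1,201052,001,3,01,41818.21
1,201052,001,4,01,17667.97
1,201052,EU,3,01,6956.63
"""

LEVELS = ["FLOW", "PERIOD", "DECLARANT", "PARTNER", "PRODUCT_NC"]


def declarant_values(nrows, pin_dtype):
    """Hypothetical helper: DECLARANT level values for the first `nrows` rows."""
    df = pd.read_csv(
        io.StringIO(CSV),
        index_col=LEVELS,
        nrows=nrows,
        # Forcing str stops '001' from being parsed as the integer 1.
        dtype={"DECLARANT": str} if pin_dtype else None,
    )
    return list(df.index.get_level_values("DECLARANT"))


print(declarant_values(2, pin_dtype=False))  # inferred as integers
print(declarant_values(2, pin_dtype=True))   # kept as strings
print(declarant_values(3, pin_dtype=False))  # 'EU' keeps the column as strings
```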

Sorry if this isn't actually a bug; I'm still very new to Python. Thanks!

wesm commented Nov 9, 2012

Is it possible to share the first 1000 rows of the file for one of us to have a look?

TomAugspurger commented Nov 9, 2012

Sure. I'm reading into a DataFrame with df2 = pd.read_csv('test.csv', index_col=['FLOW', 'PERIOD', 'DECLARANT', 'PRODUCT_NC', 'PARTNER'])

https://www.dropbox.com/s/w3cijnlblmhv0jj/test.csv

Thanks.

TomAugspurger commented Nov 10, 2012

I'm not sure if this helps or makes things more confusing, but when I pass

index_col=['FLOW', 'PERIOD', 'PRODUCT_NC', 'DECLARANT', 'PARTNER']

(where I've only changed the order of the columns), all the .ix slicing works as expected.
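
You can also change the level order after loading instead of editing `index_col`. The sketch below uses `DataFrame.reorder_levels` on invented sample rows. It pins the code columns to `str` through `dtype` so that codes like '01' keep their leading zero; that pinning is an assumption added for this illustration. The sketch moves PRODUCT_NC ahead of DECLARANT to match the list above, then re-sorts before doing a partial lookup.

```python
import io

import pandas as pd

# Invented sample rows, not the real test.csv.
CSV = """FLOW,PERIOD,DECLARANT,PARTNER,PRODUCT_NC,VALUE_1000ECU
1,201052,001,3,01,41818.21
1,201052,001,4,01,17667.97
1,201052,EU,3,01,6956.63
2,201052,001,3,02,7141.68
"""

df = pd.read_csv(
    io.StringIO(CSV),
    index_col=["FLOW", "PERIOD", "DECLARANT", "PARTNER", "PRODUCT_NC"],
    dtype={"DECLARANT": str, "PRODUCT_NC": str},  # keep leading zeros
)

# Same level order as the index_col list in the comment above.
reordered = df.reorder_levels(
    ["FLOW", "PERIOD", "PRODUCT_NC", "DECLARANT", "PARTNER"]
).sort_index()

print(reordered.index.names)
# Partial key on FLOW, PERIOD and PRODUCT_NC.
print(reordered.loc[(1, 201052, "01"), :])
```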

wesm commented Jan 20, 2013

Sorry, that file's no longer available =/

TomAugspurger commented Jan 20, 2013

I'm not sure, but I think I found the same file. Either way, when I try the indexing from above, both work correctly. I'll close this since I can't reproduce the bug. Thanks.
