
BUG: values array for non-unique, interleaved cols incorrect #4460

Merged (1 commit) on Aug 5, 2013


@Komnomnomnom (Contributor)

I came across this problem while working on #4362. When a frame with mixed dtypes, including Timestamps, has duplicate column labels, the values array it returns is different: the Timestamps come back as raw int64 nanoseconds instead of datetime objects, as illustrated below (master branch):

In [3]: df = pd.DataFrame([[pd.Timestamp('20130101'),3.5],[pd.Timestamp('20130102'),4.5]], columns=['x', 'x'], index=[1,2])

In [4]: df
Out[4]: 
                    x    x
1 2013-01-01 00:00:00  3.5
2 2013-01-02 00:00:00  4.5

In [5]: df.values
Out[5]: 
array([[1356998400000000000L, 3.5],
       [1357084800000000000L, 4.5]], dtype=object)

In [6]: df = pd.DataFrame([[pd.Timestamp('20130101'),3.5],[pd.Timestamp('20130102'),4.5]], columns=['x', 'y'], index=[1,2])

In [7]: df
Out[7]: 
                    x    y
1 2013-01-01 00:00:00  3.5
2 2013-01-02 00:00:00  4.5

In [8]: df.values
Out[8]: 
array([[datetime.datetime(2013, 1, 1, 0, 0), 3.5],
       [datetime.datetime(2013, 1, 2, 0, 0), 4.5]], dtype=object)
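For reference, the integers in Out[5] are the raw datetime64[ns] storage, i.e. nanoseconds since the Unix epoch: 1356998400000000000 ns is 1356998400 s, which is 2013-01-01 00:00:00. The plain-Python sketch below is a hypothetical toy model of "interleaving", not pandas' actual BlockManager code: each made-up "block" holds same-dtype columns plus the column positions they occupy, and the object result is assembled by position, boxing datetime values on the way. Because labels never enter into it, unique and duplicate labels give the same rows, matching Out[8].

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# Hypothetical toy blocks (NOT pandas internals): each block stores
# same-dtype columns and the column *positions* they occupy in the frame.
BLOCKS = [
    {"dtype": "datetime64[ns]", "positions": [0],
     "columns": [[1356998400000000000, 1357084800000000000]]},
    {"dtype": "float64", "positions": [1],
     "columns": [[3.5, 4.5]]},
]


def box(dtype, value):
    """Convert a raw stored value to the Python object an object array should hold."""
    if dtype == "datetime64[ns]":
        # int64 nanoseconds since the epoch -> datetime (sub-microsecond part dropped)
        return EPOCH + timedelta(microseconds=value // 1000)
    return value


def interleave(blocks, ncols):
    """Assemble row-major object 'values' by column position, boxing each block."""
    cols = [None] * ncols
    for blk in blocks:
        for pos, col in zip(blk["positions"], blk["columns"]):
            cols[pos] = [box(blk["dtype"], v) for v in col]
    # Transpose the column lists into rows, like DataFrame.values
    return [list(row) for row in zip(*cols)]


values = interleave(BLOCKS, ncols=2)
# Column labels are never consulted, so ['x', 'x'] and ['x', 'y'] yield the same rows.
print(values)
```

Skipping the box() call for the datetime block in this toy model reproduces the shape of Out[5]: the raw integers end up in the object array.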

The included changes should fix the problem. This is my first time messing about in pandas internals, so any feedback is appreciated!

Possibly related to #4377


df_unique = df.copy()
df_unique.columns = ['x', 'y']
self.assert_((df_unique.values == df.values).all())
Contributor

FYI, I think you can use from numpy.testing import assert_array_equal here (the same goes for pandas objects: the pandas.util.testing functions give more informative error messages :) )
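To illustrate the suggestion, here is a NumPy-only sketch (not the PR's actual test) using hand-built object arrays that mirror Out[8] (expected) and Out[5] (the duplicate-column result before the fix). A bare assert on (a == b).all() can only report False, whereas assert_array_equal raises an AssertionError whose message shows both arrays and the mismatch. The helper name failure_message is hypothetical, introduced only for this illustration.

```python
import datetime

import numpy as np
from numpy.testing import assert_array_equal

# Mirrors Out[8]: unique column labels, datetimes boxed correctly
expected = np.array([[datetime.datetime(2013, 1, 1), 3.5],
                     [datetime.datetime(2013, 1, 2), 4.5]], dtype=object)
# Mirrors Out[5]: duplicate labels, raw int64 nanoseconds (the bug)
buggy = np.array([[1356998400000000000, 3.5],
                  [1357084800000000000, 4.5]], dtype=object)

# Elementwise == on object arrays: datetime vs int is False, float vs float is True
print((expected == buggy).all())  # a bare assert on this gives no detail


def failure_message(actual, desired):
    """Hypothetical helper: return assert_array_equal's message, or None if equal."""
    try:
        assert_array_equal(actual, desired)
    except AssertionError as exc:
        return str(exc)
    return None


message = failure_message(expected, buggy)
print(message)  # starts with "Arrays are not equal" and lists both arrays
```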

Contributor Author
Good point! I'll update it.

@hayd (Contributor) commented Aug 5, 2013

This looks good. Interestingly, in 0.12 this raised an AssertionError (so it's good that it's tested!):

AssertionError: All items must be in block items

@Komnomnomnom (Contributor Author)

I think @jreback fixed the assertion error in #4388.

@jreback (Contributor) commented Aug 5, 2013

Let me look at this a sec.

@jreback (Contributor) commented Aug 5, 2013

@Komnomnomnom this looks good... nice catch!

jreback added a commit that referenced this pull request Aug 5, 2013
BUG: values array for non-unique, interleaved cols incorrect
@jreback merged commit 442b7ee into pandas-dev:master Aug 5, 2013
@Komnomnomnom (Contributor Author)

Awesome, thanks :-)

@Komnomnomnom deleted the frame-dupe-cols branch August 5, 2013 11:44