DataFrame.to_records converts dates wrongly #1908

Closed
ukch opened this Issue Sep 13, 2012 · 7 comments

5 participants

ukch commented Sep 13, 2012

Possibly related to #1720:

When converting a DataFrame to a recarray with df.to_records(), a date (DatetimeIndex) index is converted incorrectly.

>>> import pandas
>>> df = pandas.DataFrame([["one", "two", "three"], ["four", "five", "six"]], index=pandas.date_range("2012-01-01", "2012-01-02"))
>>> df
               0     1      2
2012-01-01   one   two  three
2012-01-02  four  five    six
>>> df.to_records()
rec.array([(datetime.datetime(1970, 1, 16, 224, 0), 'one', 'two', 'three'),
       (datetime.datetime(1970, 1, 16, 248, 0), 'four', 'five', 'six')], 
      dtype=[('index', ('<M8[ns]', {})), ('0', '|O8'), ('1', '|O8'), ('2', '|O8')])

Notice the dates have been converted to 1970, even though the original dates were in 2012.

ukch commented Sep 13, 2012

>>> pandas.__version__
'0.9.0.dev-a83e691'
>>> numpy.__version__
'1.6.2'

ukch commented Sep 13, 2012

I have found that converting the index to Python datetime values first (using index.to_pydatetime()) yields the expected values.
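The workaround above can be sketched on a current pandas as follows. It converts the index with `DatetimeIndex.to_pydatetime()` and then builds plain record tuples by hand; the row-building step and the names `py_dates` and `rows` are illustrative assumptions, not part of the original report.

```python
import datetime

import pandas as pd

# The frame from the report above.
df = pd.DataFrame(
    [["one", "two", "three"], ["four", "five", "six"]],
    index=pd.date_range("2012-01-01", "2012-01-02"),
)

# DatetimeIndex.to_pydatetime() returns an object ndarray of datetime.datetime,
# so the values never pass through NumPy's datetime64 -> datetime conversion.
py_dates = df.index.to_pydatetime()
assert all(isinstance(d, datetime.datetime) for d in py_dates)

# Build plain record tuples: (date, col0, col1, col2).
rows = [(d,) + tuple(vals) for d, vals in zip(py_dates, df.itertuples(index=False))]
print(rows)
```

Tuples like these can be handed to code that expects Python objects rather than a recarray.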

wesm (Owner) commented Sep 13, 2012

Unfortunately, it's a display/repr issue in NumPy 1.6. The underlying nanosecond timestamps have not been altered.
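On a current pandas/NumPy installation, the claim that the stored timestamps are intact can be checked by comparing the raw int64 nanosecond values behind the record array's `index` field with those behind the original index. This is a minimal sketch; it does not reproduce the NumPy 1.6 repr itself, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["one", "two", "three"], ["four", "five", "six"]],
    index=pd.date_range("2012-01-01", "2012-01-02"),
)
recs = df.to_records()

# Normalize both sides to datetime64[ns], then reinterpret the bits as int64
# nanoseconds since 1970-01-01. Equal integers mean the data itself is intact,
# whatever the repr shows.
stored_ns = recs["index"].astype("datetime64[ns]").view("i8")
original_ns = np.asarray(df.index, dtype="datetime64[ns]").view("i8")
print(stored_ns, np.array_equal(stored_ns, original_ns))
```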

ukch commented Sep 13, 2012

I am pretty sure this is not simply a display/repr issue. See the following output:

>>> import datetime
>>> recs = df.to_records()
>>> recs[0][0]
1970-01-16 224:00:00
>>> recs[0][0].astype(datetime.datetime)
datetime.datetime(1970, 1, 16, 224, 0)

I noticed this problem while trying to write a DataFrame to a PostgreSQL table using the psycopg2 library. When passed the datetime objects converted as above, psycopg2 generated values for dates in 1970.
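Since psycopg2 adapts Python `datetime.datetime` objects to PostgreSQL timestamps, one way to hand it correct values is to convert each `datetime64[ns]` scalar explicitly rather than via `astype(datetime.datetime)`. A minimal sketch, assuming a single nanosecond-resolution value like `recs[0][0]`; on current NumPy that `astype` call returns an integer rather than a datetime, so it is not a reliable route either way.

```python
import datetime

import numpy as np
import pandas as pd

# A nanosecond-resolution scalar, like recs[0][0] in the session above.
value = np.datetime64("2012-01-01T00:00:00", "ns")

# datetime.datetime only has microsecond resolution, so NumPy cannot turn a
# datetime64[ns] into one; astype(datetime.datetime) hands back an integer instead.
raw = value.astype(datetime.datetime)

# Option 1: let pandas do the conversion.
via_pandas = pd.Timestamp(value).to_pydatetime()

# Option 2: drop to microsecond resolution first; .item() then returns datetime.datetime.
via_numpy = value.astype("datetime64[us]").item()

print(type(raw).__name__, via_pandas, via_numpy)
```

Option 2 silently truncates any sub-microsecond part, which is harmless for whole-day dates like these.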

wesm (Owner) commented Sep 13, 2012

These are all caused by the same NumPy 1.6 bug. One solution might be to add an option to to_records that sidesteps NumPy and converts the values to datetime.datetime directly.
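A rough sketch of what such an option could do, written as a standalone helper. The name `to_records_pydatetime` is hypothetical and not a pandas API; it assumes the frame has a DatetimeIndex (since `to_pydatetime()` exists only there) and stores the index as an object field of `datetime.datetime` values instead of `datetime64[ns]`.

```python
import datetime

import numpy as np
import pandas as pd


def to_records_pydatetime(df):
    """Hypothetical helper, not a pandas API: like df.to_records(), but the
    DatetimeIndex is stored as datetime.datetime objects (object dtype)
    instead of datetime64[ns], so NumPy's datetime64 conversion is never used."""
    index_values = df.index.to_pydatetime()  # object ndarray of datetime.datetime
    arrays = [index_values] + [np.asarray(df[col], dtype=object) for col in df.columns]
    names = ["index"] + [str(col) for col in df.columns]
    return np.rec.fromarrays(arrays, names=names)


df = pd.DataFrame(
    [["one", "two", "three"], ["four", "five", "six"]],
    index=pd.date_range("2012-01-01", "2012-01-02"),
)
recs = to_records_pydatetime(df)
assert all(isinstance(d, datetime.datetime) for d in recs["index"])
print(recs[0])
```

Using object fields trades the compact datetime64 storage for values that downstream code such as database drivers can consume directly.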

petergx commented Oct 31, 2012

+1

A new contributor who can reproduce this on their system should be able to implement this fairly quickly. (They would need a buggy version of NumPy installed, which may be very common; otherwise they would need to know how to install or switch NumPy versions with pip or a similar tool.)

A question: should skipping NumPy be the default mode?

It should be possible to write a test case checking that NumPy is bypassed when the new argument to to_records is used, so it seems to me that the pull request should include one.
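Since the proposed argument does not exist yet, the sketch below is a pytest-style regression test that checks the observable outcome with the current to_records() instead: every index value must convert back to the original 2012 date. The test name is illustrative; the same assertions could be pointed at the output of a new argument once it exists.

```python
import pandas as pd


def test_to_records_preserves_dates():
    # Regression check for this issue: the index of to_records() output
    # must still refer to the original 2012 dates, not 1970.
    index = pd.date_range("2012-01-01", "2012-01-02")
    df = pd.DataFrame([["one", "two", "three"], ["four", "five", "six"]], index=index)
    recs = df.to_records()

    converted = [pd.Timestamp(v).to_pydatetime() for v in recs["index"]]
    assert converted == list(index.to_pydatetime())
    assert all(d.year == 2012 for d in converted)


if __name__ == "__main__":
    test_to_records_preserves_dates()
    print("ok")
```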
