New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataFrame.to_records() converts DatetimeIndex to datetime object, not np.datetime64 #18160

Closed
jzwinck opened this Issue Nov 8, 2017 · 11 comments

Comments

Projects
None yet
5 participants
@jzwinck
Contributor

jzwinck commented Nov 8, 2017

This code:

df = pd.DataFrame([pd.to_datetime('2017-11-07')], [pd.to_datetime('2017-11-08')])
df.to_records()

Produces a totally different column type for the index vs the data:

rec.array([(datetime.datetime(2017, 11, 8, 0, 0), '2017-11-07T00:00:00.000000000')], 
          dtype=[('index', 'O'), ('0', '<M8[ns]')])

It should produce:

rec.array([('2017-11-08T00:00:00.000000000', '2017-11-07T00:00:00.000000000')], 
          dtype=[('index', '<M8[ns]'), ('0', '<M8[ns]')])

I'm using Pandas 0.20.3, NumPy 1.13.1, and Python 3.5.3.

@jreback

This comment has been minimized.

Contributor

jreback commented Nov 8, 2017

seems buggy
would appreciate a PR

@jreback jreback added this to the Next Major Release milestone Nov 8, 2017

@jzwinck

This comment has been minimized.

Contributor

jzwinck commented Nov 10, 2017

@jreback I just discovered this was broken by #1908 (commit 38333f4) which added the to_records(convert_datetime64) option with the horrible default of True!

It looks like the option was added in order to work around some now-ancient NumPy bug. Setting convert_datetime64=False now works fine (for my test case and for the one in #1908).

What do you think about changing the default from True to False? The current default loses precision and is inefficient.

@jreback

This comment has been minimized.

Contributor

jreback commented Nov 10, 2017

i would be in favor of deprecating that option entirely

@jzwinck

This comment has been minimized.

Contributor

jzwinck commented Nov 10, 2017

Does that require a DeprecationWarning first, then later removal, or should we just remove the option at once?

@jreback

This comment has been minimized.

Contributor

jreback commented Nov 10, 2017

FutureWarning

@jreback jreback modified the milestones: Next Major Release, 0.23.0 Dec 23, 2017

@H0R5E

This comment has been minimized.

Contributor

H0R5E commented Aug 7, 2018

I actually get a bad conversion with to_records() now unless I set convert_datetime64=True. The docs do say "By default, timestamps are converted to datetime.datetime".

>conda info

     active environment : _dtocean_dev
    active env location : C:\Anaconda\envs\_dtocean_dev
            shell level : 1
       user config file : C:\Users\Work\.condarc
 populated config files : C:\Users\Work\.condarc
          conda version : 4.5.9
    conda-build version : 3.0.30
         python version : 2.7.14.final.0
       base environment : C:\Anaconda  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/win-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/win-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/win-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/pro/win-64
                          https://repo.anaconda.com/pkgs/pro/noarch
                          https://repo.anaconda.com/pkgs/msys2/win-64
                          https://repo.anaconda.com/pkgs/msys2/noarch
                          https://conda.anaconda.org/conda-forge/win-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/dataonlygreater/win-64
                          https://conda.anaconda.org/dataonlygreater/noarch
                          https://conda.anaconda.org/t/<TOKEN>/topper/win-64
                          https://conda.anaconda.org/t/<TOKEN>/topper/noarch
          package cache : C:\Anaconda\pkgs
                          C:\Users\Work\AppData\Local\conda\conda\pkgs
       envs directories : C:\Anaconda\envs
                          C:\Users\Work\AppData\Local\conda\conda\envs
                          C:\Users\Work\.conda\envs
               platform : win-64
             user-agent : conda/4.5.9 requests/2.18.4 CPython/2.7.14 Windows/7 W
indows/6.1.7601
          administrator : False
             netrc file : None
           offline mode : False
> conda list
...
numpy                     1.11.3           py27he0c0ee4_9
...
pandas                    0.23.3           py27h39f3610_0
...
@reidy-p

This comment has been minimized.

Contributor

reidy-p commented Aug 17, 2018

@H0R5E is this still an issue for you? If so, can you post a reproducible example and I will try to see if I can find the problem.

@H0R5E

This comment has been minimized.

Contributor

H0R5E commented Aug 18, 2018

@reidy-p, sure here you go:

import numpy as np
import pandas as pd

rng = pd.date_range('1/1/2011', periods=2, freq='H')
data = {"a": np.random.randn(len(rng)),
        "b": np.random.randn(len(rng))}
df = pd.DataFrame(data, index=rng)

type(df.to_records()[0][0]) # returns numpy.datetime64, expecting datetime.datetime
@H0R5E

This comment has been minimized.

Contributor

H0R5E commented Aug 18, 2018

My point is that you've made a significant change of behaviour without correcting your docs. The examples show the expected output to be datetime.datetime. There is also no test for which type it should be.

        By default, timestamps are converted to `datetime.datetime`:
        >>> df.index = pd.date_range('2018-01-01 09:00', periods=2, freq='min')
        >>> df
                             A     B
        2018-01-01 09:00:00  1  0.50
        2018-01-01 09:01:00  2  0.75
        >>> df.to_records()
        rec.array([(datetime.datetime(2018, 1, 1, 9, 0), 1, 0.5 ),
                   (datetime.datetime(2018, 1, 1, 9, 1), 2, 0.75)],
                  dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])

I'm surprised I'm the only one to have been caught out by this.

@TomAugspurger

This comment has been minimized.

Contributor

TomAugspurger commented Aug 18, 2018

@H0R5E could you submit a PR fixing the docs? Thanks.

@TomAugspurger

This comment has been minimized.

Contributor

TomAugspurger commented Aug 18, 2018

You can remove to_records from this list too:

-k"-assign -axes -combine -isin -itertuples -join -nlargest -nsmallest -nunique -pivot_table -quantile -query -reindex -reindex_axis -replace -round -set_index -stack -to_dict -to_records -to_stata -transform"
so that it's tested.

@H0R5E H0R5E referenced this issue Aug 19, 2018

Merged

DOC: fix Dataframe.to_records examples #22419

2 of 4 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment