DataFrame[timedelta64] / timedelta64 or pydatetime has wrong dtype and wrong values #20088

Closed
nmusolino opened this issue Mar 9, 2018 · 8 comments

3 participants
@nmusolino
Contributor

commented Mar 9, 2018

Code Sample

In [1]: import pandas, numpy, datetime

In [2]: df = pandas.DataFrame({'x': numpy.timedelta64(1, 'ms') * numpy.arange(0, 5)})

In [3]: df
Out[3]:
                x
0        00:00:00
1 00:00:00.001000
2 00:00:00.002000
3 00:00:00.003000
4 00:00:00.004000

In [4]: df.dtypes
Out[4]:
x    timedelta64[ns]
dtype: object

In [5]: df['x'] / datetime.timedelta(milliseconds=1)
Out[5]:
0    0.0
1    1.0
2    2.0
3    3.0
4    4.0
Name: x, dtype: float64

In [6]: df[['x']] / datetime.timedelta(milliseconds=1)
Out[6]:
                x
0        00:00:00
1 00:00:00.000000
2 00:00:00.000000
3 00:00:00.000000
4 00:00:00.000000

In [8]: df[['x']] / numpy.timedelta64(1, 'ms')
Out[8]:
                x
0        00:00:00
1 00:00:00.000000
2 00:00:00.000000
3 00:00:00.000000
4 00:00:00.000000

Problem description

When a DataFrame containing timedelta64 values is true-divided by a datetime.timedelta object or a numpy.timedelta64, there are two problems:

  1. the resulting values are incorrect; and
  2. the resulting DataFrame has dtype timedelta64[ns], which is inconsistent with the same operation on a pandas Series or a numpy array. (In those cases, the result is a float64 Series or array.)
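
The inconsistency in point 2 can be written down as a small regression check. This is a sketch, not pandas' own test suite: it rebuilds the reported frame and asserts that dividing the one-column DataFrame gives the same float64 column as dividing the Series. It is intended to pass on a pandas version that includes a fix for this issue; against the 0.19.1 setup reported here it would not pass. The numpy comparison uses only a numpy.timedelta64 divisor.

```python
import datetime

import numpy as np
import pandas as pd

# Rebuild the frame from the report: five timedeltas spaced 1 ms apart.
df = pd.DataFrame({'x': np.timedelta64(1, 'ms') * np.arange(0, 5)})

for divisor in (datetime.timedelta(milliseconds=1), np.timedelta64(1, 'ms')):
    # Series path: the report shows this already returns float64.
    series_result = df['x'] / divisor
    assert series_result.dtype == np.float64

    # DataFrame path: should match the Series result, column for column.
    frame_result = df[['x']] / divisor
    pd.testing.assert_frame_equal(frame_result, series_result.to_frame())

# numpy path with a numpy divisor also yields plain floats.
array_result = df['x'].values / np.timedelta64(1, 'ms')
assert array_result.dtype == np.float64
```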

Expected Output

The DataFrame should contain a float64 column with values equal to df['x'] / numpy.timedelta64(1, 'ms'):

In [11]: df.apply(lambda s: s.values / numpy.timedelta64(1, 'ms'))
Out[11]:
     x
0  0.0
1  1.0
2  2.0
3  3.0
4  4.0
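
The apply-based expression above can be generalized into a small helper for frames with several timedelta columns. `timedelta_columns_to_float` is a hypothetical name, not a pandas API. It assumes unique column names that all hold timedelta64 data, and it routes each column through Series division, which the report shows already returns float64.

```python
import numpy as np
import pandas as pd


def timedelta_columns_to_float(df, unit='ms'):
    """Express every timedelta64 column of df as a float count of `unit`.

    Hypothetical helper: divides column by column via the Series code path.
    Assumes unique column names that all hold timedelta64 data.
    """
    divisor = np.timedelta64(1, unit)
    return pd.DataFrame(
        {col: df[col] / divisor for col in df.columns},
        index=df.index,
        columns=df.columns,  # keep the original column order
    )


# Usage: one column in milliseconds, one in seconds, both reported in ms.
df = pd.DataFrame({
    'x': np.timedelta64(1, 'ms') * np.arange(0, 5),
    'y': np.timedelta64(2, 's') * np.arange(0, 5),
})
result = timedelta_columns_to_float(df, unit='ms')
```

Passing `columns=df.columns` pins the original column order instead of relying on dict ordering, which older pandas and Python versions (such as those in this report) did not preserve. Going through Series division also keeps the original index without relying on DataFrame.apply to reassemble per-column arrays.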

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.24.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: 1.4.8
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.7
blosc: 1.5.0
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.3
html5lib: 0.999
httplib2: 0.9.2
apiclient: None
sqlalchemy: 1.1.3
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: 2.43.0
pandas_datareader: None

@nmusolino nmusolino changed the title DataFrame[timedelta64] / timedelta64 or pydatetime has inconsistent dtype and is numerically wrong DataFrame[timedelta64] / timedelta64 or pydatetime has wrong dtype and wrong values Mar 9, 2018

@nmusolino

Contributor Author

commented Mar 9, 2018

Related issues are collected in the roundup issue #18824, but I did not find this particular issue listed there or by searching.

@jreback

Contributor

commented Mar 9, 2018

You are using an older version of pandas.

Try on 0.22.

@jreback

Contributor

commented Mar 10, 2018

DataFrames haven't really been addressed too much for timedelta operations, I'll let @jbrockmendel add this to his list. closing.

@jreback jreback closed this Mar 10, 2018

@jreback jreback added this to the No action milestone Mar 10, 2018

@jbrockmendel

Member

commented Mar 10, 2018

@nmusolino is this different from #20088? If so, can you please comment in #18824 to clarify how?

@nmusolino

Contributor Author

commented Mar 13, 2018

> @nmusolino is this different from #20088? If so, can you please comment in #18824 to clarify how?

This is issue #20088. I commented in that issue as requested.

@nmusolino

Contributor Author

commented Mar 13, 2018

@jreback I think you are closing this issue incorrectly.

#18824 is a compendium of multiple issues around a theme, but every listed issue there is kept open until it is resolved. For example, all these dataframe-related issues are open, and also listed in the last section of #18824:

pd.DataFrame([pd.NaT]).eq(pd.NaT) returns NaN instead of bool. == is OK. #15697
DataFrame.eq is raising instead of returning bool #13128
DataFrame.sub broadcasting problem #12437
DataFrame[datetime64] - datetime is returning DataFrame[datetime64] instead of DataFrame[timedelta64] #8554
pd.Timestamp('2000-01-01') > pd.DataFrame({'x': range(5)}) returns 5 Trues instead of raising #8932
datetime64 comparisons raise incorrectly #9006
DataFrame[datetime64] != pd.Series([pd.NaT]) raises #17559

I suggest re-opening this issue so that it can be added to the list in #18824.

@nmusolino

Contributor Author

commented Apr 24, 2018

@jbrockmendel I recommend re-opening this issue to be consistent with other issues listed in the same section of omnibus issue #18824. Those other issues are the following, which are all open: #8554, #8932, #9006, #12437, #13128, #15697, and #17559.

@jbrockmendel

Member

commented Apr 24, 2018

@nmusolino This issue is listed (and open) in #18824, so it hasn't been forgotten about. jreback chose to close this in part to make it easier to triage other outstanding issues.
