BUG: timedelta64[s] series constructor isn't equal with alternative constructor using to_timedelta unit='s' #48312

Open
3 tasks done
Tracked by #46587
ntachukwu opened this issue Aug 30, 2022 · 8 comments
Labels
Bug, Non-Nano (datetime64/timedelta64 with non-nanosecond resolution), Timedelta (Timedelta data type)

Comments

@ntachukwu
Contributor

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
from pandas import Series
import pandas._testing as tm

result = Series([1000000, 200000, 3000000], dtype="timedelta64[s]")
expected = Series(pd.to_timedelta([1000000, 200000, 3000000], unit="s"))
tm.assert_series_equal(result, expected)

Issue Description

This code passes:

 result = Series([1000000, 200000, 3000000], dtype="timedelta64[ns]")
 expected = Series(pd.to_timedelta([1000000, 200000, 3000000], unit="ns"))
 tm.assert_series_equal(result, expected)

But with dtype="timedelta64[s]" and unit="s", it raises:

AssertionError: numpy array are different

numpy array values are different (100.0 %)
[index]: [0, 1, 2]
[left]:  [1000000, 200000, 3000000]
[right]: [1000000000000000, 200000000000000, 3000000000000000]
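The left and right values in this output differ by exactly a factor of 10^9, the number of nanoseconds in a second. That is consistent with the left side holding the integers unscaled while the right side holds them converted from seconds to nanoseconds. A minimal pure-Python check of that arithmetic (the variable names are illustrative only, not part of pandas):

```python
# Nanoseconds per second: the scale factor between the two resolutions.
NS_PER_SECOND = 10**9

# Integers as passed to both constructors in the reproducible example.
left = [1000000, 200000, 3000000]

# Reading those integers as seconds and storing them as nanoseconds.
right = [value * NS_PER_SECOND for value in left]

print(right)
```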

Expected Behavior

Both series should be equal.

Installed Versions

INSTALLED VERSIONS

commit : 201cbf6
python : 3.9.10.final.0
python-bits : 64
OS : Darwin
OS-release : 21.3.0
Version : Darwin Kernel Version 21.3.0: Wed Jan 5 21:37:58 PST 2022; root:xnu-8019.80.24~20/RELEASE_ARM64_T8101
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.5.0.dev0+1364.g201cbf6bc1.dirty
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
setuptools : 60.9.3
pip : 22.1.1
Cython : 0.29.32
pytest : 7.1.2
hypothesis : 6.52.3
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.8.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.3.0
pandas_datareader: 0.10.0
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.1
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

@ntachukwu added the Bug and Needs Triage labels on Aug 30, 2022
@phofl
Member

phofl commented Sep 1, 2022

Simple reproducer:

import numpy as np
import pandas as pd
import pandas._testing as tm

result = np.array([1000000, 200000, 3000000], dtype="timedelta64[s]")
result_pandas = pd.Series([1000000, 200000, 3000000], dtype="timedelta64[s]")
tm.assert_numpy_array_equal(result, result_pandas.values)

This should pass, but we seem to ignore the seconds unit and interpret the integers as nanoseconds.

@phofl
Member

phofl commented Sep 1, 2022

This is currently not supported and should raise imo rather than returning buggy conversions

cc @jbrockmendel

@gfyoung added the Timestamp label and removed the Needs Triage label on Sep 2, 2022
@jbrockmendel
Member

This is currently not supported and should raise imo rather than returning buggy conversions

Agreed.

Also will be supported in 2.0, so just need a temporary patch for 1.4.x/1.5.x

@jbrockmendel
Member

cc @mroeschke @jreback this becomes more salient with non-nano support. I think pd.Series([1, 2, 3], dtype="m8[s]") ideally should interpret those integers as seconds, but without an API change it will interpret them as nanoseconds, then cast the result to m8[s]. Interpreting them as seconds would also be consistent with pd.Series([1, 2, 3]).astype("m8[s]").
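To make the difference between the two interpretations concrete, here is a small sketch that uses NumPy only, so it does not depend on which pandas version is installed. It models "interpret as nanoseconds, then cast to m8[s]" against "interpret as seconds"; it illustrates the two readings and is not the pandas constructor code.

```python
import numpy as np

ints = [1, 2, 3]

# Reading 1: treat the integers as nanoseconds, then cast to seconds.
# The cast truncates, so sub-second values collapse to zero.
as_ns_then_cast = np.array(ints, dtype="m8[ns]").astype("m8[s]")

# Reading 2: treat the integers as seconds directly.
as_seconds = np.array(ints, dtype="m8[s]")

print(as_ns_then_cast.astype("int64"))
print(as_seconds.astype("int64"))
```

Under the first reading the small integers are lost entirely, which is why the choice of interpretation matters once non-nanosecond dtypes are supported.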

@jbrockmendel added the Non-Nano label on Oct 12, 2022
@mroeschke
Member

That sounds reasonable; it also makes it effectively similar to to_timedelta([ints], unit="s"), which in spirit mangles "unit" and "reso", but that may not matter.
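For reference, a short sketch of the to_timedelta behaviour mentioned here: unit says how the input integers are read, while the resolution ("reso") is how the result is stored. The check below compares values only, because the storage resolution may differ between pandas versions.

```python
import pandas as pd

# `unit` tells to_timedelta how to read the integers: here, as seconds.
tdi = pd.to_timedelta([1, 2, 3], unit="s")

# Build the expected values explicitly as 1, 2 and 3 seconds.
expected_values = [pd.Timedelta(seconds=n) for n in (1, 2, 3)]

# The storage resolution is a separate property of the result.
print(tdi.dtype)
print(list(tdi) == expected_values)
```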

@jbrockmendel
Member

cc @jreback

@jbrockmendel
Member

Possible deprecation cycles notwithstanding, my preferred behavior would be for pd.Series(some_ints, dtype="m8[unit]").to_numpy() to match np.array(some_ints, dtype="m8[unit]"). I'd do the same for dt64 dtypes.
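One way to probe whether an installed pandas version already behaves this way is sketched below. The helper name series_matches_numpy is hypothetical and not part of pandas, and the result for non-nanosecond units is expected to depend on the pandas version, so no particular outcome is assumed for them.

```python
import numpy as np
import pandas as pd


def series_matches_numpy(some_ints, unit):
    """Hypothetical helper: True if pandas and NumPy agree for this unit."""
    dtype = f"m8[{unit}]"
    expected = np.array(some_ints, dtype=dtype)
    try:
        result = pd.Series(some_ints, dtype=dtype).to_numpy()
    except (TypeError, ValueError):
        # Some versions may reject the dtype instead of converting.
        return False
    # Require both the same resolution and the same values.
    return result.dtype == expected.dtype and bool(np.array_equal(result, expected))


print(pd.__version__, series_matches_numpy([1, 2, 3], "s"))
```

The nanosecond case, which the issue reports as passing, can serve as a sanity check for the helper itself.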

@jreback
Contributor

jreback commented Oct 14, 2022

proposal sounds good

6 participants