misrepresented fractional seconds in timestamps and timedeltas #23059

Open
cbertinato opened this issue Oct 9, 2018 · 2 comments
Labels: Bug, Needs Discussion (requires discussion from core team before further action), Timedelta (Timedelta data type)

Comments

@cbertinato
Contributor

Code Sample

>>> import pandas as pd
>>> timestamp = 1490193630.8
>>> pd.to_timedelta(timestamp, unit='s')
Timedelta('17247 days 14:40:30.799999')

However, this works:

>>> pd.to_timedelta(timestamp*1e9, unit='ns')
Timedelta('17247 days 14:40:30.800000')

Problem description

Depending on the value of the float being converted and on the unit, the resulting Timestamp or Timedelta occasionally misrepresents the fractional part of the input: above, 30.8 s comes out as 30.799999 s.
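A stdlib-only sketch of where the error comes from, assuming IEEE 754 doubles (CPython float) and Python 3.9+ for math.ulp. Near 1.49e9, adjacent doubles are 2**-22 s (about 238 ns) apart, so the literal 1490193630.8 is stored about 48 ns below .8. Multiplying by 1e9 first happens to round that error away, because doubles near 1.49e18 are 256 apart. That is why the unit='ns' call above prints .800000; it is a lucky rounding, not a general fix.

```python
import math
from decimal import Decimal

ts = 1490193630.8

# Exact decimal value of the double actually stored for the literal
stored = Decimal(ts)
print(stored)  # 1490193630.7999999523162841796875

# Spacing between adjacent doubles at this magnitude: 2**-22 s (~238 ns)
print(math.ulp(ts) == 2**-22)  # True

# How far the stored value sits below the intended .8, in nanoseconds
print((Decimal("1490193630.8") - stored) * 10**9)

# Scaling to nanoseconds first rounds the product to the nearest double;
# doubles near 1.49e18 are 256 apart, so the ~48 ns error is rounded away
ns = ts * 1e9
print(ns == 1490193630800000000.0)  # True
print(math.ulp(ns))  # 256.0
```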

Expected Output

Timedelta('17247 days 14:40:30.8')

Output of pd.show_versions()

INSTALLED VERSIONS

commit: a4482db
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0.dev0+716.ga4482db46
pytest: 3.5.1
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: 0.9.0.post1
xarray: 0.10.3
IPython: 6.4.0
sphinx: 1.7.4
patsy: None
dateutil: 2.7.3
pytz: 2018.4
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: 0.8.1
psycopg2: None
jinja2: 2.10
s3fs: 0.1.5
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@cbertinato
Contributor Author

The root of the issue is cast_from_unit in https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/tslibs/timedeltas.pyx#L275, so it could well affect other types too. This is a classic case of floating-point representation error, but I think we should represent the fractional second that was intended.
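A pure-Python approximation of the split-and-round step, assuming cast_from_unit at this commit truncates the float to an integer base, rounds the fractional part to p decimal places, and scales both parts by m (for unit='s', m = 10**9 and p = 9). The name cast_from_unit_sketch is hypothetical and not pandas API. It shows that the intended .8 is already gone before rounding: the fraction seen is 0.79999995..., which rounds to 0.799999952, i.e. 48 ns short, displayed as .799999 in the repr above.

```python
def cast_from_unit_sketch(ts: float, m: int, p: int) -> int:
    """Hypothetical stand-in for the Cython cast_from_unit logic (an assumption).

    m: nanoseconds per unit (10**9 for seconds)
    p: decimal places kept in the fractional part (9 for seconds)
    """
    # Integer part, truncated toward zero like a C cast to int64_t
    base = int(ts)
    # Fractional part: the subtraction is exact, but ts itself is already inexact
    frac = ts - base
    if p:
        # Rounds the *stored* fraction, not the one the user typed
        frac = round(frac, p)
    # Scale both parts to nanoseconds; int() truncates like <int64_t>
    return base * m + int(frac * m)


ns = cast_from_unit_sketch(1490193630.8, 10**9, 9)
print(ns)  # 1490193630799999952
```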

One thought is to use the decimal module here:

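from decimal import Decimal

timestamp = Decimal(str(ts))   # parse the shortest repr, e.g. '1490193630.8'
base = Decimal(<int64_t>ts)    # whole units (the Cython cast truncates toward zero)
frac = timestamp - base        # exact decimal fraction, e.g. Decimal('0.8')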
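A pure-Python version of the Decimal split, with a hypothetical function name and m = 10**9 for seconds. str(ts) gives the shortest string that round-trips to the same double ('1490193630.8'), so the fraction the user typed is recovered exactly. This assumes the shortest repr is what was intended, which is a heuristic: a float produced by arithmetic rather than typed in may have a long repr.

```python
from decimal import Decimal


def cast_from_unit_decimal(ts: float, m: int) -> int:
    """Hypothetical Decimal-based split of a float into nanoseconds.

    m: nanoseconds per unit (10**9 for seconds)
    """
    # Shortest round-tripping repr, e.g. '1490193630.8', parsed exactly
    timestamp = Decimal(str(ts))
    # Whole units; int() truncates toward zero like the int64_t cast
    base = Decimal(int(ts))
    # Exact decimal fraction, Decimal('0.8') for this input
    frac = timestamp - base
    return int(base * m) + int(frac * m)


print(cast_from_unit_decimal(1490193630.8, 10**9))  # 1490193630800000000
```

The extra cost comes from building a string and two Decimal objects per element, which is consistent with the slowdown measured below.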

It gives the intended result, but runtime goes up by roughly 1.6x. Currently:

%timeit pd.to_timedelta(pd.Series([ts]*10000), unit='s')
28.1 ms ± 873 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

versus using Decimal:

%timeit pd.to_timedelta(pd.Series([ts]*10000), unit='s')
44.9 ms ± 1.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

@mroeschke
Member

This may be the same artifact as #19732, which was chalked up to floating-point precision errors. We have a related warning here: https://pandas.pydata.org/pandas-docs/stable/timeseries.html#epoch-timestamps

It would be great to have a more precise calculation here, but the performance impact of using Decimal is not too attractive.

mroeschke added the Timedelta and Needs Discussion labels Oct 9, 2018
mroeschke added the Bug label May 3, 2020