Rounding error in Timestamp.floor() #19206

Closed
frexvahi opened this Issue Jan 12, 2018 · 3 comments

Contributor

frexvahi commented Jan 12, 2018

Code Sample, a copy-pastable example if possible

>>> pd.Timestamp('2117-01-01 00:00:45').floor('15s')
Timestamp('2117-01-01 00:00:30')

>>> pd.Timestamp('1823-01-01 00:00:01').floor('1s')
Timestamp('1823-01-01 00:00:00')

Problem description

For some timestamps more than a hundred years or so in the past or future, Timestamp.floor() rounds an already-round timestamp down to the next round value. I came up against the problem because I was assuming that Timestamp.floor() is idempotent.

I've looked at #14572 and #14440; both have been closed as fixed, but this bug is still present.

Expected Output

>>> pd.Timestamp('2117-01-01 00:00:45').floor('15s')
Timestamp('2117-01-01 00:00:45')

>>> pd.Timestamp('1823-01-01 00:00:01').floor('1s')
Timestamp('1823-01-01 00:00:01')

Output of pd.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.0-42-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.22.0
pytest: 3.3.0
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.27.3
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: 0.10.0
IPython: 6.2.1
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
```

Contributor

jreback commented Jan 12, 2018

Hmm, must be an error in the offset computation. We are essentially doing:

In [1]: from pandas.tseries.frequencies import to_offset

In [2]: divisor = to_offset('1s').nanos

In [4]: pd.Timestamp(np.floor(divisor * pd.Timestamp('1823-01-01 00:00:01').value / divisor))
Out[4]: Timestamp('1823-01-01 00:00:00.999999488')

We have some compensation for nanos freq, but I think that needs to be bigger as we are losing precision
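The precision loss can be reproduced without pandas. A double has a 53-bit significand, so an integer nanosecond count of this size generally cannot be stored exactly once it is converted to a float. A minimal sketch, using the integer that `pd.Timestamp('1823-01-01 00:00:01').value` returns; the variable names are only for illustration:

```python
# Nanoseconds since the epoch for 1823-01-01 00:00:01 (Timestamp.value).
value = -4638902399000000000

# Doubles represent every integer exactly only up to 2**53 in magnitude.
print(abs(value) > 2**53)  # True

as_float = float(value)            # conversion rounds to the nearest double
error_ns = int(as_float) - value   # distance between that double and the true integer

print(error_ns)  # -512
```

The 512 ns error matches the `.999999488` fraction in the output above, since 1000000000 - 512 = 999999488.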

@jreback jreback added this to the Next Major Release milestone Jan 12, 2018


Contributor

jreback commented Jan 12, 2018

PR welcome!


Contributor

cbertinato commented Jan 15, 2018

Definitely a round-off issue with large numbers. Some curious behavior of true division in Python 3. I would have thought that int / float(int) would be equivalent to int / int. It is when numbers are not large, but in this case:

>>> value = pd.Timestamp('1823-01-01 00:00:01').value
>>> value
-4638902399000000000
>>> unit = to_offset('1s').nanos
>>> unit
1000000000
>>> value / unit
-4638902399.0
>>> value / float(unit)
-4638902399.000001

Changing the division in the offset calculation from int / float(int) to int / int solves the problem, but raises issues for Python 2, which __future__ division did not appear to fix, even though it works in the terminal.
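The difference seems to come from where the rounding happens: in Python 3, int / int computes a correctly rounded quotient from the exact integers, whereas int / float first converts the integer to a double, and for this value that conversion is already inexact. A sketch of one way to avoid floats altogether, using integer floor division; `floor_ns` is a hypothetical helper for illustration, not the pandas implementation or the fix that was adopted:

```python
import math

value = -4638902399000000000  # Timestamp('1823-01-01 00:00:01').value
unit = 1000000000             # nanoseconds in one second


def floor_ns(value: int, unit: int) -> int:
    """Floor an integer nanosecond count to a multiple of unit, using only integers."""
    # Python's // rounds toward negative infinity, which is what floor needs.
    return (value // unit) * unit


# Float path: the quotient lands just below the whole number, so floor drops a full unit.
via_float = math.floor(value / float(unit)) * unit
# Integer path: exact, so an already-round value comes back unchanged.
via_int = floor_ns(value, unit)

print(via_float - value)  # -1000000000, one second too early
print(via_int - value)    # 0
```

Because the integer path is exact, applying `floor_ns` twice gives the same result as applying it once, which is the idempotence the original report expected.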

@jreback jreback modified the milestones: Next Major Release, 0.23.0 Jan 15, 2018

@jreback jreback changed the title from Rounding error in `Timestamp.floor()` to Rounding error in Timestamp.floor() Jan 15, 2018
