BUG: to_json returns incorrect timestamps if DatetimeIndex precision is not ns #53686

Closed
3 tasks done
ketakopter opened this issue Jun 15, 2023 · 2 comments · Fixed by #53757
Labels
Bug IO JSON read_json, to_json, json_normalize Non-Nano datetime64/timedelta64 with non-nanosecond resolution

Comments


ketakopter commented Jun 15, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

d = pd.DataFrame({'testcol': pd.Series([12], index=[np.datetime64('2023-01-01T11:22:33.123456')])})
d.index = d.index.astype('datetime64[us]')

d.to_json(date_format='iso')

Issue Description

With a DataFrame whose DatetimeIndex precision has been set to us (microseconds), the JSON representation is incorrect.

With the usual initialization, the DatetimeIndex has precision ns, and it works fine:

>>> import pandas as pd
>>> import numpy as np
>>> d = pd.DataFrame({'testcol': pd.Series([12], index=[np.datetime64('2023-01-01T11:22:33.123456')])})
>>> d.to_json(date_format='iso')
'{"testcol":{"2023-01-01T11:22:33.123":12}}'

If the precision is changed (I tried this as a workaround for issue #53684), the date in the to_json output is incorrect:

>>> d.index = d.index.astype('datetime64[us]')
>>> d
                            testcol
2023-01-01 11:22:33.123456       12

>>> d.to_json(date_format='iso')
'{"testcol":{"1970-01-20T08:36:12.153":12}}'

Passing the date_unit parameter doesn't help; the date is still incorrect.

Other precisions fail too:

>>> d.index = d.index.astype('datetime64[s]')
>>> d
                     testcol
2023-01-01 11:22:33       12

>>> d.to_json(date_format='iso')
'{"testcol":{"1970-01-01T00:00:01.672":12}}'

Expected Behavior

The output of to_json in iso format should be '{"testcol":{"2023-01-01T11:22:33.123456":12}}'
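Until the datetime path is fixed, one possible stopgap is to turn the index into ISO strings before serializing, so that to_json never has to interpret the datetime values. This is only a sketch, not something proposed in this thread: it assumes pandas 2.x, and the names `idx`, `safe` and `result` are made up for illustration.

```python
import json

import pandas as pd

# Build the same one-row frame as in the report, with a microsecond-precision index.
idx = pd.DatetimeIndex(["2023-01-01T11:22:33.123456"]).astype("datetime64[us]")
d = pd.DataFrame({"testcol": [12]}, index=idx)

# Workaround sketch (assumption, not an official fix): format each Timestamp
# as an ISO 8601 string so the JSON writer only sees plain string keys.
safe = d.copy()
safe.index = safe.index.map(lambda ts: ts.isoformat())

result = safe.to_json()
print(result)

# The parsed output should carry the full microsecond timestamp as the key.
parsed = json.loads(result)
```

Because the keys are already strings, date_format and date_unit no longer play any role here, which also means the precision in the output is whatever isoformat() emits.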

Installed Versions

INSTALLED VERSIONS

commit : 965ceca
python : 3.10.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.12.14-195-default
Version : #1 SMP Tue May 7 10:55:11 UTC 2019 (8fba516)
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.0.2
numpy : 1.24.3
pytz : 2023.3
dateutil : 2.8.2
setuptools : 63.2.0
pip : 22.2.2
Cython : None
pytest : 7.3.1
hypothesis : None
sphinx : 7.0.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.13.2
pandas_datareader: None
bs4 : 4.12.2
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

@ketakopter ketakopter added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 15, 2023
@ketakopter (Author)

Just tested the development version. Bug still there.

INSTALLED VERSIONS

commit : 0bc16da
python : 3.10.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.12.14-195-default
Version : #1 SMP Tue May 7 10:55:11 UTC 2019 (8fba516)
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.1.0.dev0+977.g0bc16da1e5
numpy : 2.0.0.dev0+84.g828fba29e
pytz : 2023.3
dateutil : 2.8.2
setuptools : 63.2.0
pip : 22.2.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.14.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

@lithomas1 lithomas1 added IO JSON read_json, to_json, json_normalize Non-Nano datetime64/timedelta64 with non-nanosecond resolution and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 19, 2023
@lithomas1 (Member)

Thanks for the bug report.

I think I see what's going on. In the JSON C code, when converting the numpy datetimes to strings, we assume that the datetime has nanosecond resolution (non-nano support in pandas is still newish).

https://github.com/pandas-dev/pandas/blob/main/pandas/_libs/src/datetime/date_conversions.c#L48

I'll try to get at this sometime soonish.
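The nanosecond assumption described above lines up with the wrong dates in the report. The snippet below is a minimal standard-library sketch, not the actual C code: it takes the raw epoch count stored for the microsecond and second indexes and reads it as if it were nanoseconds. The helper name `misread_as_ns` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def misread_as_ns(raw: int) -> str:
    """Interpret a raw epoch count as nanoseconds, whatever its real unit was."""
    # datetime only resolves microseconds, so drop the sub-microsecond digits.
    wrong = EPOCH + timedelta(microseconds=raw // 1000)
    # Trim to millisecond precision, like the output shown in the report.
    return wrong.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3]


ts = datetime(2023, 1, 1, 11, 22, 33, 123456, tzinfo=timezone.utc)

# Raw integers as stored for datetime64[us] and datetime64[s] respectively.
raw_us = (ts - EPOCH) // timedelta(microseconds=1)
raw_s = (ts - EPOCH) // timedelta(seconds=1)

print(misread_as_ns(raw_us))  # 1970-01-20T08:36:12.153
print(misread_as_ns(raw_s))   # 1970-01-01T00:00:01.672
```

Both results match the strings to_json returned for the us and s indexes, which is consistent with the values being passed through unscaled.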

2 participants