Overflow in to_datetime when using nanoseconds #21383

Open
DieterDePaepe opened this issue Jun 8, 2018 · 2 comments
Labels
Bug, Error Reporting (incorrect or improved errors from pandas), Timeseries

Comments

DieterDePaepe commented Jun 8, 2018

Code Sample

import pandas as pd

nano_time = '1518071940360000000'
print(pd.to_datetime(pd.to_numeric(nano_time), unit="ns"))  # Works
print(pd.to_datetime(nano_time, unit="ns"))  # OverflowError: Python int too large to convert to C long

Problem description

It appears that to_datetime cannot handle very long numbers passed in string form, which is a problem when parsing nanosecond-based times. Converting the string with to_numeric first works, so to_datetime should be smarter about processing its input.

Expected Output

Not an error.

Timestamp('2018-02-08 06:39:00.360000')
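As a cross-check of the expected value, the same conversion can be sketched with the standard library alone. This is only an illustration, not how pandas does it: the helper name `ns_string_to_datetime` is hypothetical, and because `datetime` only has microsecond resolution, any sub-microsecond remainder is dropped.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ns_string_to_datetime(ns_text):
    """Hypothetical helper: parse a nanoseconds-since-epoch string as UTC.

    Python ints are arbitrary precision, so int() cannot overflow here.
    Sub-microsecond digits are truncated (datetime has no nanoseconds).
    """
    ns = int(ns_text)
    seconds, remainder_ns = divmod(ns, 1_000_000_000)
    return EPOCH + timedelta(seconds=seconds, microseconds=remainder_ns // 1000)


nano_time = '1518071940360000000'
print(ns_string_to_datetime(nano_time))
```

The result agrees with the expected Timestamp above, apart from carrying an explicit UTC offset.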

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.0
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.1
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

jreback (Contributor) commented Jun 8, 2018

The error message is actually incorrect: passing a unit requires an integer, not a string.

This is almost a duplicate of #15836.

cc @chris-b1

@DieterDePaepe, if you want to submit a PR to fix this, that would be great.

jreback added the Error Reporting label Jun 8, 2018

jreback (Contributor) commented Jun 8, 2018

Note that, canonically, you can simply do:

In [9]: pd.to_datetime(int(nano_time))
Out[9]: Timestamp('2018-02-08 06:39:00.360000')
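Until the string case is handled, the workaround can be wrapped in a small helper. The following is a minimal sketch, assuming the input is a decimal string of nanoseconds since the epoch; the name `ns_string_to_timestamp` is hypothetical and not part of pandas. It casts to `int` first and passes `unit="ns"` explicitly, and for a Series of strings it reuses the `pd.to_numeric` route that the original report shows working.

```python
import pandas as pd


def ns_string_to_timestamp(ns_text):
    """Hypothetical helper: cast to int before handing the value to pandas."""
    return pd.to_datetime(int(ns_text), unit="ns")


nano_time = '1518071940360000000'
ts = ns_string_to_timestamp(nano_time)
print(ts)

# For a whole column of strings, convert to numbers first, then to datetimes.
series = pd.Series([nano_time, nano_time])
converted = pd.to_datetime(pd.to_numeric(series), unit="ns")
print(converted)
```

Casting explicitly keeps the call on the integer code path, which is the one a unit is meant to be used with according to the comment above.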

jreback added this to the Next Major Release milestone Jun 8, 2018
mroeschke added the Bug label Apr 1, 2020
jbrockmendel added and removed the Non-Nano label (datetime64/timedelta64 with non-nanosecond resolution) Jan 10, 2022
mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022