to_datetime does not ignore the error when there is NaN before wrong datetime when format is %Y%m%d #25512

Closed
ligyxy opened this issue Mar 1, 2019 · 2 comments

Comments

@ligyxy

commented Mar 1, 2019

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

# Doesn't work: a NaN precedes the invalid date '19820001' (raises OverflowError)
pd.to_datetime(pd.Series([np.nan, '19750501', '19820001', '19770501']), format='%Y%m%d', errors='coerce')
pd.to_datetime(pd.Series(['19750501', np.nan, '19820001', '19770501']), format='%Y%m%d', errors='coerce')

# Works: no NaN in the Series
pd.to_datetime(pd.Series(['19750501', '19820001', '19770501']), format='%Y%m%d', errors='coerce')

# Works: the NaN comes after the invalid date
pd.to_datetime(pd.Series(['19750501', '19820001', np.nan, '19770501']), format='%Y%m%d', errors='coerce')

Problem description

With errors='ignore' or errors='coerce', pandas should tolerate the invalid date '19820001' (month 00) instead of raising. However, if a NaN appears before the invalid date in the Series, pandas raises "OverflowError: signed integer is less than minimum".

This only happens when the format is %Y%m%d.
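
For reference, the per-element result the report expects from errors='coerce' can be sketched in plain Python. This is only an illustration using the standard library, not pandas's implementation; `parse_yyyymmdd` is a hypothetical helper name, and `None` stands in for NaT.

```python
from datetime import datetime
import math


def parse_yyyymmdd(value):
    """Hypothetical helper: parse one YYYYMMDD value, or return None.

    Models the expected errors='coerce' outcome for a single element:
    missing values and invalid dates both become a "missing" result.
    """
    # Treat None and float NaN as missing input.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    try:
        return datetime.strptime(str(value), "%Y%m%d")
    except ValueError:
        # e.g. '19820001' has month 00, which is not a valid date
        return None


# Same values as the first failing example, with NaN before the bad date.
values = [float("nan"), "19750501", "19820001", "19770501"]
parsed = [parse_yyyymmdd(v) for v in values]
print(parsed)
```

On an affected pandas version, a helper like this could presumably be applied element by element (for example through `Series.map`) as a slower stopgap; that usage is an assumption and was not proposed in the thread.
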

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_US.UTF-8
pandas: 0.24.1
pytest: None
pip: 19.0.3
setuptools: 40.8.0
Cython: None
numpy: 1.16.1
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: 1.2.17
pymysql: 0.9.3
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@ligyxy ligyxy changed the title to_datetime does not ignore the error when there is NaN before wrong datetime to_datetime does not ignore the error when there is NaN before wrong datetime when format is %Y%m%d Mar 2, 2019

@coderop2


commented Mar 4, 2019

I checked how the code behaves in both scenarios, and it works fine in the datetimes.py file. However, when a NaN is present, the arguments are passed on to the Cython file datetime..., so the error may be raised from the Cython module.
Any idea how I can solve that?

nathalier added a commit to nathalier/pandas that referenced this issue May 29, 2019

BUG: ignore errors for invalid dates in to_datetime with coerce (pandas-dev#25512)

parsing.try_parse_year_month_day() in _attempt_YYYYMMDD() raises not only ValueError but also OverflowError for incorrect dates, so handling for this error was added.
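
The idea behind this commit message can be sketched with the standard library alone. The snippet below is not the pandas code; `try_build_date` is a hypothetical helper that only shows why catching ValueError by itself may not be enough when building dates from integer fields, since in CPython an integer far outside the supported range typically surfaces as OverflowError instead.

```python
from datetime import datetime


def try_build_date(year, month, day):
    """Hypothetical helper: build a datetime, returning None on bad input."""
    try:
        return datetime(year, month, day)
    except (ValueError, OverflowError):
        # ValueError: a field is out of range for a date (e.g. month 0).
        # OverflowError: a field does not fit the integer type used
        # internally (typical for very large or very small integers).
        return None


print(try_build_date(1982, 0, 1))    # invalid month
print(try_build_date(-2**63, 1, 1))  # year far outside the supported range
print(try_build_date(1975, 5, 1))    # valid date
```

Catching both exception types in one clause keeps the fallback path identical for every kind of invalid date.
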
@nathalier

Contributor

commented May 29, 2019

While writing tests for this issue I found a couple of problems with errors='ignore'.

  1. When a series contains an incorrect value, none of the values are converted, not just the incorrect one. I.e.,
    pd.to_datetime(pd.Series(['19750501', '19820001', '19770501']), format='%Y%m%d', errors='ignore')
    returns the initial values unchanged.
    Not sure this is intended.
  2. array_strptime() raises "TypeError: 'int' object is unsliceable" for some incorrect integer values:
    pd.to_datetime(pd.Series([19801222, 20011301, 19991222]), format='%Y%m%d', errors='ignore')
    raises this error.
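
The two observations above can be modelled in plain Python. This is a rough sketch of the behavior as described in this comment, not pandas's implementation; `to_dates` is a hypothetical function, `None` stands in for NaT, and converting each value with `str()` is an assumption about how integer input could be handled.

```python
from datetime import datetime


def to_dates(values, errors="coerce"):
    """Hypothetical pure-Python model of the two error modes discussed above."""
    parsed, failed = [], False
    for v in values:
        try:
            # str() lets integer input such as 19801222 be parsed as text.
            parsed.append(datetime.strptime(str(v), "%Y%m%d"))
        except ValueError:
            parsed.append(None)
            failed = True
    if errors == "ignore" and failed:
        # All-or-nothing: one bad element returns the input unchanged.
        return list(values)
    return parsed


ints = [19801222, 20011301, 19991222]  # the middle value has month 13
coerced = to_dates(ints, errors="coerce")
ignored = to_dates(ints, errors="ignore")
print(coerced)
print(ignored)
```

In this model, 'coerce' replaces only the invalid element, while 'ignore' hands back every original value as soon as one element fails, which matches the behavior item 1 questions.
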

nathalier added a commit to nathalier/pandas that referenced this issue May 29, 2019

@jreback jreback modified the milestones: Contributions Welcome, 0.25.0 May 29, 2019

nathalier added a commit to nathalier/pandas that referenced this issue May 29, 2019

nathalier added a commit to nathalier/pandas that referenced this issue May 30, 2019

jreback added a commit that referenced this issue Jun 1, 2019

vaibhavhrt added a commit to vaibhavhrt/pandas that referenced this issue Jun 6, 2019
