Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: fix TypeError for invalid integer dates %Y%m%d with errors='ignore' #26585

Merged

Conversation

nathalier
Copy link
Contributor

@nathalier nathalier commented May 31, 2019

array_strptime returned TypeError when trying to slice 'too long' integer for the given format %Y%m%d (for example 2121010101).
After parsing date in the first 8 symbols it tried to return the remaining symbols in ValueError message as a slice of integer which in turn caused TypeError.
Converted to string value is now used to make slice for that ValueError message.
In case of 20209911, it tried to parse 20209911 to datetime(2020, 9, 9) and had 11 left unparsed.

@pep8speaks
Copy link

pep8speaks commented May 31, 2019

Hello @nathalier! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-06-01 20:03:56 UTC

@simonjayhawkins simonjayhawkins added Bug Datetime Datetime data dtype labels May 31, 2019
@codecov
Copy link

codecov bot commented May 31, 2019

Codecov Report

Merging #26585 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26585      +/-   ##
==========================================
- Coverage   91.84%   91.83%   -0.01%     
==========================================
  Files         174      174              
  Lines       50644    50644              
==========================================
- Hits        46516    46511       -5     
- Misses       4128     4133       +5
Flag Coverage Δ
#multiple 90.37% <ø> (ø) ⬆️
#single 41.69% <ø> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 97% <0%> (-0.12%) ⬇️
pandas/util/testing.py 90.81% <0%> (-0.11%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4c54dd2...c53b284. Read the comment docs.

@codecov
Copy link

codecov bot commented May 31, 2019

Codecov Report

Merging #26585 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26585      +/-   ##
==========================================
- Coverage   91.84%   91.83%   -0.01%     
==========================================
  Files         174      174              
  Lines       50644    50644              
==========================================
- Hits        46515    46511       -4     
- Misses       4129     4133       +4
Flag Coverage Δ
#multiple 90.37% <ø> (ø) ⬆️
#single 41.69% <ø> (-0.09%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 97% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c6a7cc1...458b5cb. Read the comment docs.

@nathalier nathalier force-pushed the invalid-int-date-todatetime branch 2 times, most recently from 38a65a6 to 1e97e8b Compare May 31, 2019 11:15
@nathalier nathalier changed the title BUG: fix IndexError for invalid integer dates %Y%m%d with errors='ign… BUG: fix TypeError for invalid integer dates %Y%m%d with errors='ign… May 31, 2019
@@ -140,13 +140,13 @@ def array_strptime(object[:] values, object fmt,
iresult[i] = NPY_NAT
continue
raise ValueError("time data %r does not match "
"format %r (match)" % (values[i], fmt))
"format %r (match)" % (val, fmt))
Copy link
Member

@mroeschke mroeschke May 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind converting these to use format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just tried to do it but a couple of tests failed. I don't know the reason yet.

@mroeschke
Copy link
Member

Minor comment, needs a whatsnew, but generally LGTM.

@nathalier nathalier force-pushed the invalid-int-date-todatetime branch from 480df1b to 458b5cb Compare June 1, 2019 01:59
@@ -427,6 +427,7 @@ Datetimelike
- Bug in :class:`Series` and :class:`DataFrame` repr where ``np.datetime64('NaT')`` and ``np.timedelta64('NaT')`` with ``dtype=object`` would be represented as ``NaN`` (:issue:`25445`)
- Bug in :func:`to_datetime` which does not replace the invalid argument with ``NaT`` when error is set to coerce (:issue:`26122`)
- Bug in adding :class:`DateOffset` with nonzero month to :class:`DatetimeIndex` would raise ``ValueError`` (:issue:`26258`)
- Bug in :func:`to_datetime` which raises a ``TypeError`` when called with a too long for a given format integer date with ``errors='ignore'``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you say this is for %Y%m%d format

also not clear about 'too long', maybe length of the integer is != 8

[20129930, 20129930],
[20120011, 20120011],
[2012010101, 2012010101]])
def test_int_to_datetime_format_YYYYMMDD_typeerror(self, int_date,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you try with a too short as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jreback,
Thanks for feedback.
I reworded whatsnew message and added a couple of short dates.

Also I found result can be unpredictable for some short dates (either in integer or string format) with length > 4.

pd.to_datetime(12912, format="%Y%m%d", errors='ignore')
Out[4]: 12912
pd.to_datetime(21213, format="%Y%m%d", errors='ignore')
Out[5]: datetime.datetime(2, 12, 13, 0, 0)

Nevertheless it's not connected to this TypeError.
I'll add these examples to corresponding issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is weird, these should for sure fail, but maybe going down an odd path. ok can address in another issue.

Copy link
Contributor Author

@nathalier nathalier Jun 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I added it to #14487. It's all about how format '%Y%m%d is treated.

@nathalier nathalier force-pushed the invalid-int-date-todatetime branch from 458b5cb to d93d835 Compare June 1, 2019 19:51
…re' (# GH 26583)

array_strptime returned TypeError when trying to slice 'too long' integer for the given format %Y%m%d (for example 2121010101).
After parsing date in the first 8 symbols it tried to return the remaining symbols in ValueError message as a slice of integer which in turn caused TypeError.
Converted to string value is now used to make slice for that ValueError message.
In case of 20209911, it tried to parse 20209911 to datetime(2020, 9, 9) and had 11 unparsed.
@nathalier nathalier force-pushed the invalid-int-date-todatetime branch from d93d835 to 11eddeb Compare June 1, 2019 20:03
@jreback jreback added this to the 0.25.0 milestone Jun 2, 2019
@jreback
Copy link
Contributor

jreback commented Jun 2, 2019

@nathalier you are adding the short digits to an issue? (not this one that is being closed, correct)? if so we can merge this.

@jorisvandenbossche jorisvandenbossche changed the title BUG: fix TypeError for invalid integer dates %Y%m%d with errors='ign… BUG: fix TypeError for invalid integer dates %Y%m%d with errors='ignore' Jun 5, 2019
@jorisvandenbossche jorisvandenbossche merged commit 1d7ad5f into pandas-dev:master Jun 5, 2019
@jorisvandenbossche
Copy link
Member

Thanks @nathalier !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

to_datetime returns TypeError for invalid integer dates with %Y%m%d format
6 participants