Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: to_datetime with non-ISO format, float, and nan fails on main #50237

Closed
3 tasks done
MarcoGorelli opened this issue Dec 13, 2022 · 0 comments · Fixed by #50238
Closed
3 tasks done

REGR: to_datetime with non-ISO format, float, and nan fails on main #50237

MarcoGorelli opened this issue Dec 13, 2022 · 0 comments · Fixed by #50238
Labels
Bug Datetime Datetime data dtype

Comments

@MarcoGorelli
Copy link
Member

MarcoGorelli commented Dec 13, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

ser = Series([198012, 198012] + [198101] * 5)
ser[2] = np.nan

result = to_datetime(ser, format="%Y%m")

Issue Description

This gives

File ~/pandas-dev/pandas/core/tools/datetimes.py:1066, in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
   1064         result = arg.map(cache_array)
   1065     else:
-> 1066         values = convert_listlike(arg._values, format)
   1067         result = arg._constructor(values, index=arg.index, name=arg.name)
   1068 elif isinstance(arg, (ABCDataFrame, abc.MutableMapping)):

File ~/pandas-dev/pandas/core/tools/datetimes.py:442, in _convert_listlike_datetimes(arg, format, name, utc, unit, errors, dayfirst, yearfirst, exact)
    439 require_iso8601 = format is not None and format_is_iso(format)
    441 if format is not None and not require_iso8601:
--> 442     return _to_datetime_with_format(
    443         arg,
    444         orig_arg,
    445         name,
    446         utc,
    447         format,
    448         exact,
    449         errors,
    450     )
    452 result, tz_parsed = objects_to_datetime64ns(
    453     arg,
    454     dayfirst=dayfirst,
   (...)
    461     exact=exact,
    462 )
    464 if tz_parsed is not None:
    465     # We can take a shortcut since the datetime64 numpy array
    466     # is in UTC

File ~/pandas-dev/pandas/core/tools/datetimes.py:543, in _to_datetime_with_format(arg, orig_arg, name, utc, fmt, exact, errors)
    540         return _box_as_indexlike(result, utc=utc, name=name)
    542 # fallback
--> 543 res = _array_strptime_with_fallback(arg, name, utc, fmt, exact, errors)
    544 return res

File ~/pandas-dev/pandas/core/tools/datetimes.py:485, in _array_strptime_with_fallback(arg, name, utc, fmt, exact, errors)
    481 """
    482 Call array_strptime, with fallback behavior depending on 'errors'.
    483 """
    484 try:
--> 485     result, timezones = array_strptime(
    486         arg, fmt, exact=exact, errors=errors, utc=utc
    487     )
    488 except OutOfBoundsDatetime:
    489     if errors == "raise":

File ~/pandas-dev/pandas/_libs/tslibs/strptime.pyx:198, in pandas._libs.tslibs.strptime.array_strptime()
    196         iresult[i] = NPY_NAT
    197         continue
--> 198     raise ValueError(f"time data '{val}' does not match "
    199                      f"format '{fmt}' (match)")
    200 if len(val) != found.end():

ValueError: time data '-9223372036854775808' does not match format '%Y%m' (match)

Expected Behavior

I think I'd still expect it to fail, because 198012.0 doesn't match '%Y%m'. But

ValueError: time data '-9223372036854775808' does not match format '%Y%m' (match)

does look quite mysterious, and not expected


From git bisect, this was caused by #49361 (cc @jbrockmendel sorry for yet another ping!)

https://www.kaggle.com/code/marcogorelli/pandas-regression-example?scriptVersionId=113733639

Labelling as regression as it works on 1.5.2:

In [2]: ser = Series([198012, 198012] + [198101] * 5)
   ...: ser[2] = np.nan
   ...:
   ...: result = to_datetime(ser, format="%Y%m")

In [3]: result
Out[3]:
0   1980-12-01
1   1980-12-01
2          NaT
3   1981-01-01
4   1981-01-01
5   1981-01-01
6   1981-01-01
dtype: datetime64[ns]

Installed Versions

INSTALLED VERSIONS

commit : 749d59d
python : 3.8.15.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.102.1-microsoft-standard-WSL2
Version : #1 SMP Wed Mar 2 00:30:59 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 2.0.0.dev0+915.g749d59db6e
numpy : 1.23.5
pytz : 2022.6
dateutil : 2.8.2
setuptools : 65.5.1
pip : 22.3.1
Cython : 0.29.32
pytest : 7.2.0
hypothesis : 6.61.0
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.1
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : 2.9.3
jinja2 : 3.1.2
IPython : 8.7.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : 1.3.5
brotli :
fastparquet : 2022.12.0
fsspec : 2021.11.0
gcsfs : 2021.11.0
matplotlib : 3.6.2
numba : 0.56.4
numexpr : 2.8.3
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 9.0.0
pyreadstat : 1.2.0
pyxlsb : 1.0.10
s3fs : 2021.11.0
scipy : 1.9.3
snappy :
sqlalchemy : 1.4.45
tables : 3.7.0
tabulate : 0.9.0
xarray : 2022.12.0
xlrd : 2.0.1
zstandard : 0.19.0
tzdata : None
qtpy : None
pyqt5 : None
None

@MarcoGorelli MarcoGorelli added Bug Datetime Datetime data dtype labels Dec 13, 2022
@MarcoGorelli MarcoGorelli changed the title BUG: to_datetime with non-ISO format and nan fails on main REGR: to_datetime with non-ISO format, float, and nan fails on main Dec 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant