pandas.to_datetime raises when given pd.NA #32213

Closed
dsaxton opened this issue Feb 24, 2020 · 5 comments · Fixed by #32214
Labels
Datetime Datetime data dtype NA - MaskedArrays Related to pd.NA and nullable extension arrays

Member

dsaxton commented Feb 24, 2020

pd.to_datetime raises a TypeError when we pass in pd.NA, but I think it should probably convert it to pd.NaT (or at least that would be better than an error)?

In [1]: import pandas as pd

In [2]: pd.to_datetime([pd.NA])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-1af314749017> in <module>
----> 1 pd.to_datetime([pd.NA])

~/pandas/pandas/core/tools/datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
    755             result = _convert_and_box_cache(arg, cache_array)
    756         else:
--> 757             result = convert_listlike(arg, format)
    758     else:
    759         result = convert_listlike(np.array([arg]), format)[0]

~/pandas/pandas/core/tools/datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
    447             errors=errors,
    448             require_iso8601=require_iso8601,
--> 449             allow_object=True,
    450         )
    451 

~/pandas/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   1848             dayfirst=dayfirst,
   1849             yearfirst=yearfirst,
-> 1850             require_iso8601=require_iso8601,
   1851         )
   1852     except ValueError as e:

~/pandas/pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
    479 @cython.wraparound(False)
    480 @cython.boundscheck(False)
--> 481 cpdef array_to_datetime(ndarray[object] values, str errors='raise',
    482                         bint dayfirst=False, bint yearfirst=False,
    483                         object utc=None, bint require_iso8601=False):

~/pandas/pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
    701 
    702     except TypeError:
--> 703         return array_to_datetime_object(values, errors,
    704                                         dayfirst, yearfirst)
    705 

~/pandas/pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime_object()
    839         else:
    840             if is_raise:
--> 841                 raise
    842             return values, None
    843     return oresult, None

~/pandas/pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
    674                         iresult[i] = NPY_NAT
    675                     else:
--> 676                         raise TypeError(f"{type(val)} is not convertible to datetime")
    677 
    678             except OutOfBoundsDatetime:

TypeError: <class 'pandas._libs.missing.NAType'> is not convertible to datetime

In [3]: pd.__version__
Out[3]: '1.1.0.dev0+572.gaa6f241f5'

Expected Output

DatetimeIndex(['NaT'], dtype='datetime64[ns]', freq=None)
@jorisvandenbossche
Member

As mentioned on the PR, I am not fully sure we should convert it to pd.NaT, or at least not until we have a better idea of how we want to handle NA in datetime-like dtypes in the future.

@jorisvandenbossche jorisvandenbossche added NA - MaskedArrays Related to pd.NA and nullable extension arrays Datetime Datetime data dtype labels Feb 24, 2020
Contributor

jreback commented Feb 24, 2020

we should absolutely convert this; all null values correctly convert to the current missing value NaT; changing that is a completely separate discussion
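
For readers who want to check the claim above, here is a minimal sketch (not taken from the thread) showing that the long-standing missing-value markers all parse to NaT. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# None, float NaN and NaT are the missing-value markers that predate pd.NA.
legacy_nulls = [None, np.nan, pd.NaT]

# Each of them is accepted by to_datetime and becomes NaT in the result.
result = pd.to_datetime(legacy_nulls)
print(result)
```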

Member Author

dsaxton commented Feb 24, 2020

Part of my concern is that having errors like these could discourage people from using NA and nullable types in general. For example, suppose someone takes the reasonable steps of storing date strings along with missing values in a string dtype and then wants to convert them using pd.to_datetime. Getting an error here creates an unnecessary edge case that people then may try to avoid by not using the types. It seems you'd want to have it return the "usual" output in the short term even if the plan is to have a special NA sentinel for timestamps in the long term.
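
A sketch of the scenario described above, with a possible stopgap for versions where pd.NA is rejected. The helper name `to_datetime_na_safe` is hypothetical and not part of pandas. It assumes that replacing every missing value with None before parsing is acceptable for the data at hand.

```python
import pandas as pd


def to_datetime_na_safe(values):
    """Hypothetical helper: swap every missing value (pd.NA included) for None, then parse."""
    # pd.isna recognises pd.NA, None, NaN and NaT, so all of them become None here.
    cleaned = [None if pd.isna(v) else v for v in values]
    return pd.to_datetime(cleaned)


# Date strings stored alongside a missing value in the nullable string dtype.
dates = pd.Series(["2020-02-24", pd.NA, "2020-02-25"], dtype="string")
result = to_datetime_na_safe(dates)
print(result)
```

On a pandas version where the conversion is built in, the helper should be unnecessary and `pd.to_datetime(dates)` can be called directly.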

Member

WillAyd commented Feb 24, 2020

+1 to having this convert as well

@jorisvandenbossche
Member

Note that for the conversion the other way around (from nullable dtype to a dtype that uses np.nan), we decided that we (for now) don't want to do implicit conversion of pd.NA to np.nan, since that has different behaviour.
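
To illustrate the opposite direction, here is a small sketch (an editorial example, not from the thread) of the explicit form: converting a nullable integer array to a float NumPy array by naming the replacement for pd.NA. Whether the conversion without `na_value` raises or silently yields NaN is assumed to depend on the pandas version, so only the explicit call is shown.

```python
import numpy as np
import pandas as pd

# A nullable integer array; None is stored as pd.NA.
nullable = pd.array([1, None, 3], dtype="Int64")

# Explicitly request NaN in place of pd.NA rather than relying on implicit conversion.
as_float = nullable.to_numpy(dtype="float64", na_value=np.nan)
print(as_float)
```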

The example case of @dsaxton is a good one though, and I agree it would be nice if that works. As long as we are fine with breaking this behaviour in the near future (to return a datetime-like with pd.NA), I am OK with converting it now.

all null values correctly convert to the current missing value NaT; changing that is a completely separate discussion

I don't think that is a separate discussion. It is now that we have for the first time a missing value indicator that behaves differently than the others, so it is now that such questions about conversions will come up and are relevant.
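
As a rough illustration of what "behaves differently" means here (an editorial sketch, with made-up variable names): comparisons with np.nan return a plain boolean, while comparisons with pd.NA propagate the missing value, and pd.NA refuses to be coerced to a boolean.

```python
import numpy as np
import pandas as pd

# Comparing against NaN gives an ordinary Python bool.
nan_eq = np.nan == 1

# Comparing against pd.NA propagates the missing value instead.
na_eq = pd.NA == 1

# pd.NA has no truth value, so using it in a boolean context raises TypeError.
try:
    bool(pd.NA)
    na_truthiness = "ok"
except TypeError:
    na_truthiness = "ambiguous"

print(nan_eq, na_eq, na_truthiness)
```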
