ENH: Allow Arrow types directly with pd.to_datetime #58220

Open
3 tasks done
WillAyd opened this issue Apr 11, 2024 · 6 comments
Labels
Arrow pyarrow functionality datetime.date stdlib datetime.date support Enhancement

Comments

@WillAyd
Member

WillAyd commented Apr 11, 2024

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

When users just want dates, the first thing they might try is:

ser = pd.Series(["2024-01-01", "2024-01-02"])
pd.to_datetime(ser).dt.date

But that unfortunately returns a Series with dtype=object.
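
A minimal check of this behavior, assuming the default NumPy-backed datetime64 result from pd.to_datetime (the variable names are illustrative only):

```python
import datetime

import pandas as pd

ser = pd.Series(["2024-01-01", "2024-01-02"])
dates = pd.to_datetime(ser).dt.date

# .dt.date yields Python datetime.date objects held in an object-dtype Series
print(dates.dtype)
print(type(dates.iloc[0]))
```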

An arguably better approach would be something like

pd.to_datetime(ser).astype(pd.ArrowDtype(pa.date32()))

But that has the disadvantage of requiring an extra step to get the desired type.

Feature Description

Should we add an arrow backend/family argument to pd.to_datetime? Alternately maybe we need to introduce a new pd.to_date function? @jbrockmendel curious what you might think

Alternative Solutions

n/a

Additional Context

No response

@WillAyd added the Enhancement, Needs Triage, and datetime.date labels and removed the Needs Triage label on Apr 11, 2024
@jbrockmendel
Member

I'm very skeptical of making pd.to_datetime more complicated (both as an API and the implementation).

pd.to_datetime(ser).dt.to_period("D") is effectively a date dtype.
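
For reference, a small sketch of what that expression produces, reusing the example Series from the issue description (`periods` is an illustrative name):

```python
import pandas as pd

ser = pd.Series(["2024-01-01", "2024-01-02"])
periods = pd.to_datetime(ser).dt.to_period("D")

# The result is an extension dtype with daily frequency, not object
print(periods.dtype)
print(periods.iloc[0])
```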

> Alternately maybe we need to introduce a new pd.to_date

Side-note: I've been kicking around the idea of a pd.to.foo namespace to collect all of the to_foo functions, since the top-level namespace is pretty big. ATM to_offset and to_time are buried and could be included.

@mroeschke
Member

mroeschke commented Apr 11, 2024

to_numeric had a dtype_backend keyword added in 2.0 IIRC. I wouldn't be opposed to adding that keyword to to_datetime for symmetry but return pa.timestamp types and not pa.date types.

@mroeschke added the Arrow label on Apr 11, 2024
@jbrockmendel
Member

I wasn't aware of the keyword in to_numeric; I would have been -0.75 on that.

In to_datetime it has the added downside of complicating the return type (not just dtype). The base case to_datetime returns a DatetimeIndex. A keyword would change that to be a base class Index.
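
To illustrate the current return types being referred to, a short sketch (list-like input gives a DatetimeIndex, while Series input gives a Series; the variable names are illustrative):

```python
import pandas as pd

# List-like input: the result is a DatetimeIndex
idx = pd.to_datetime(["2024-01-01", "2024-01-02"])
print(type(idx).__name__)

# Series input: the result is a Series with a datetime64 dtype
parsed = pd.to_datetime(pd.Series(["2024-01-01", "2024-01-02"]))
print(type(parsed).__name__, parsed.dtype.kind)
```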

.convert_dtypes already works for the timestamp dtypes. One Obvious Way.

@WillAyd
Member Author

WillAyd commented Apr 11, 2024

I also think dtype_backend here is a partial solution that doesn't necessarily clarify how to accomplish the task the best way.

> pd.to_datetime(ser).dt.to_period("D") is effectively a date dtype.

That's an interesting idea, but I think it would be really tough to roundtrip and use effectively with our I/O.

AFAIU the only way to specifically get a non-object date in this case is to ser.astype(pd.ArrowDtype(pa.date32())) - is that correct?

@mroeschke
Member

> AFAIU the only way to specifically get a non-object date in this case is to ser.astype(pd.ArrowDtype(pa.date32())) - is that correct?

Correct

@mroeschke
Member

> .convert_dtypes already works for the timestamp dtypes. One Obvious Way.

This is a good point. to_numeric with dtype_backend="pyarrow" essentially astypes from nullable numpy types to arrow types which is covered by convert_dtypes
