Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: DataFrame with scalar tzaware Timestamp #44518

Merged
merged 4 commits into from
Nov 20, 2021

Conversation

jbrockmendel
Copy link
Member

@simonjayhawkins where does the whatsnew for this go?

@simonjayhawkins
Copy link
Member

@simonjayhawkins where does the whatsnew for this go?

wdyt?

sequence_to_dt64ns is called from sequence_to_datetimes which is called from maybe_infer_to_datetimelike and maybe_cast_to_datetime in pandas/core/dtypes/cast.py so this change is not narrowly targeted to the reported issue or reverting to previous behavior.

I have no doubt that this is the correct long term fix, option 2 in #42505 (comment) but not sure if other interactions would expose other latent bugs.

This is fine for 1.4 IMO, but would a more narrowly targeted fix for 1.3.5 be safer, option 3 in #42505 (comment)

We must have little testing of scalars passed to DatetimeArray._from_sequence. Is there a risk that fixing this properly here change any existing behavior?

@simonjayhawkins simonjayhawkins added Constructors Series/DataFrame/Index/pd.array Constructors Regression Functionality that used to work in a prior pandas version Timezones Timezone data dtype labels Nov 19, 2021
@jbrockmendel
Copy link
Member Author

wdyt?

I guess for now put it in the 1.4 file and discuss whether to backport?

We must have little testing of scalars passed to DatetimeArray._from_sequence. Is there a risk that fixing this properly here change any existing behavior?

Only incorrect behavior AFAICT.

but would a more narrowly targeted fix for 1.3.5 be safer, option 3 in #42505 (comment)

I'm not clear on how we can get more targeted than this.

@jreback jreback added this to the 1.3.5 milestone Nov 20, 2021
@jreback
Copy link
Contributor

jreback commented Nov 20, 2021

this is fine, can you add a whatsnew for 1.3.5

@jreback jreback merged commit a327ad1 into pandas-dev:master Nov 20, 2021
@lumberbot-app

This comment has been minimized.

@jreback
Copy link
Contributor

jreback commented Nov 20, 2021

@meeseeksdev backport 1.3.x

@lumberbot-app

This comment has been minimized.

@jreback
Copy link
Contributor

jreback commented Nov 20, 2021

@jbrockmendel if you can push the backport :-<>

jbrockmendel added a commit to jbrockmendel/pandas that referenced this pull request Nov 20, 2021
jbrockmendel added a commit to jbrockmendel/pandas that referenced this pull request Nov 20, 2021
@jbrockmendel jbrockmendel deleted the bug-42505 branch November 20, 2021 23:15
@simonjayhawkins
Copy link
Member

but would a more narrowly targeted fix for 1.3.5 be safer, option 3 in #42505 (comment)

I'm not clear on how we can get more targeted than this.

I was thinking special case in construct_1d_arraylike_from_scalar. The return type from infer_dtype_from_scalar is now a Timestamp whereas it was a int in pandas 1.2.5

So we then have the block... if isinstance(dtype, ExtensionDtype) where we pass the value onto DataTimeArray._from_sequence, so my thinking was to pass value.value if is instance of Timestamp to restore the 1.2.5 behavior explicitly without touching DataTimeArray._from_sequence.

so I was thinking something like...

diff --git a/pandas/core/dtypes/cast.py b/pandas/core/dtypes/cast.py
index 6d5162f3fe..a02e8e085c 100644
--- a/pandas/core/dtypes/cast.py
+++ b/pandas/core/dtypes/cast.py
@@ -1900,6 +1900,8 @@ def construct_1d_arraylike_from_scalar(
 
     """
 
+    from pandas.core.arrays import DatetimeArray
+
     if dtype is None:
         try:
             dtype, value = infer_dtype_from_scalar(value, pandas_dtype=True)
@@ -1908,6 +1910,8 @@ def construct_1d_arraylike_from_scalar(
 
     if isinstance(dtype, ExtensionDtype):
         cls = dtype.construct_array_type()
+        if issubclass(cls, DatetimeArray) and isinstance(value, Timestamp):
+            value = value.value
         subarr = cls._from_sequence([value] * length, dtype=dtype)
 
     else:

with an appropriate code comment about the hack.

The non-extension array block seems to include some special handling for dtype.kind in ["M", "m"] to unbox the values so this location for a fix, unboxing the values from a TimeStamp is maybe not so strange after all?

We must have little testing of scalars passed to DatetimeArray._from_sequence. Is there a risk that fixing this properly here change any existing behavior?

Only incorrect behavior AFAICT.

yes. any change in behavior from doing the proper fix for 1.3.5 can probably be put under the "bug fix" umbrella, so maybe this is fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Constructors Series/DataFrame/Index/pd.array Constructors Regression Functionality that used to work in a prior pandas version Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Regression - AmbiguousTimeError creating DataFrame
3 participants