
BUG: scalar assignment of a tz-aware Timestamp is object dtype #19843

Closed
jreback opened this issue Feb 22, 2018 · 5 comments · Fixed by #19973
Labels: Datetime, Dtype Conversions, Timezones
Comments

@jreback (Contributor) commented Feb 22, 2018

The column created in [3] should be datetime64[ns, UTC], not object:

In [1]: df = pd.DataFrame({'A': [0, 1]})

In [3]: df['now'] = pd.Timestamp('20130101', tz='UTC')

In [4]: df
Out[4]: 
   A                        now
0  0  2013-01-01 00:00:00+00:00
1  1  2013-01-01 00:00:00+00:00

In [5]: df.dtypes
Out[5]: 
A       int64
now    object
dtype: object

In [6]: df['now2'] = pd.DatetimeIndex([pd.Timestamp('20130101', tz='UTC')]).repeat(len(df))

In [7]: df.dtypes
Out[7]: 
A                     int64
now                  object
now2    datetime64[ns, UTC]
dtype: object
jreback added the Datetime, Dtype Conversions, Timezones, and Difficulty Intermediate labels on Feb 22, 2018
jreback added this to the 0.23.0 milestone on Feb 22, 2018
@DylanDmitri (Contributor)

I will try and fix this.

@jreback (Contributor, Author) commented Feb 23, 2018

great!

@DylanDmitri (Contributor) commented Feb 23, 2018

Currently, infer_dtype_from_scalar returns np.datetime64 for datetime/Timestamp objects without a timezone, and falls back to np.object_ for objects with timezones. Fixing this problem means returning something other than np.object_.

Ideally we'd return DatetimeTZDtypeType. However, that crashes on np.empty(shape, dtype=dtype) in cast_scalar_to_array. It seems like this should work, but it doesn't.
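
A quick sketch of the current behavior (my own illustration against the 0.23-dev internals; exact reprs and exception text may differ):

from pandas.core.dtypes.cast import infer_dtype_from_scalar
from pandas.core.dtypes.dtypes import DatetimeTZDtype
import numpy as np
import pandas as pd

# tz-naive scalar -> a plain numpy datetime64 dtype
dtype, _ = infer_dtype_from_scalar(pd.Timestamp('20130101'))
# dtype('<M8[ns]')

# tz-aware scalar -> falls back to np.object_
dtype, _ = infer_dtype_from_scalar(pd.Timestamp('20130101', tz='UTC'))
# <class 'numpy.object_'>

# numpy can't allocate an array of the pandas extension dtype,
# which is where cast_scalar_to_array crashes:
np.empty(2, dtype=DatetimeTZDtype('ns', 'UTC'))
# TypeError: data type not understood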

The quick fix is to return np.datetime64 rather than np.object_. You lose the timezone name, but numpy applies the correct offset before storing, so the underlying values are correct. This change doesn't break any tests, and results in the following behavior:

In [1]: df = pd.DataFrame({'A': [0, 1]})

In [3]: df['now'] = pd.Timestamp('20130101', tz='UTC')

In [5]: df.dtypes
Out[5]: 
A               int64
now    datetime64[ns]
dtype: object

In [6]: df['now2'] = pd.DatetimeIndex([pd.Timestamp('20130101', tz='UTC')]).repeat(len(df))

In [7]: df.dtypes
Out[7]: 
A                     int64
now          datetime64[ns]
now2    datetime64[ns, UTC]
dtype: object

This raises some inconsistencies, though, and potentially problems with mixing in timezone-naive datetimes. Is the quick fix good enough?
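
To make one such inconsistency concrete (my own example, not from the thread): under the quick fix, 'now' is tz-naive while 'now2' is tz-aware, so the two columns hold the same instants but can no longer be compared directly; the comparison raises roughly:

df['now'] == df['now2']
# TypeError: Cannot compare tz-naive and tz-aware datetime-like objects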

@jreback (Contributor, Author) commented Feb 23, 2018

@DylanDmitri you don't ever want numpy to deal with timezones; it gets them completely wrong. infer_dtype_from_scalar has a pandas_dtype parameter that will make this work. We should actually just change this to be the default (though that might break other things)
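
A minimal sketch of that suggested path (same scalar as above, assuming the 0.23-dev internals):

from pandas.core.dtypes.cast import infer_dtype_from_scalar
import pandas as pd

# with pandas_dtype=True the tz-aware scalar keeps its pandas
# extension dtype instead of degrading to np.object_
dtype, _ = infer_dtype_from_scalar(pd.Timestamp('20130101', tz='UTC'),
                                   pandas_dtype=True)
# datetime64[ns, UTC]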

@DylanDmitri (Contributor) commented Mar 2, 2018

Been busy the last week, sorry. Here's the problem code (from line 2874 of frame.py):

# BEFORE
value = cast_scalar_to_array(len(self.index), value)  # broadcasts the scalar into an object array
value = maybe_cast_to_datetime(value, value.dtype)    # no-op, since value.dtype is already object

Main issue: cast_scalar_to_array defaults to the np.object_ dtype, which maybe_cast_to_datetime then leaves untouched. We want to capture the real pandas dtype up front and pass that into maybe_cast_to_datetime, which then converts properly.

# AFTER
from pandas.core.dtypes.cast import infer_dtype_from_scalar
pandas_dtype, _ = infer_dtype_from_scalar(value, pandas_dtype=True)

value = cast_scalar_to_array(len(self.index), value)
value = maybe_cast_to_datetime(value, pandas_dtype)  # now sees datetime64[ns, UTC] and converts

This fixes the problem. Will check tests, and have a PR soon.
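
With that change in place, the original example should come out as follows (a sketch of the expected behavior, mirroring the report above):

import pandas as pd

df = pd.DataFrame({'A': [0, 1]})
df['now'] = pd.Timestamp('20130101', tz='UTC')

df.dtypes
# A                    int64
# now    datetime64[ns, UTC]
# dtype: object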

DylanDmitri added a commit to DylanDmitri/pandas that referenced this issue Mar 2, 2018
jreback modified the milestones: 0.23.0, Next Major Release on Apr 14, 2018
jreback modified the milestones: Contributions Welcome, 0.23.4, 0.24.0 on Aug 2, 2018
minggli added a commit to minggli/pandas that referenced this issue Aug 5, 2018
* master: (47 commits)
  fix: scalar timestamp assignment (pandas-dev#19843) (pandas-dev#19973)
  ...