
BUG: tz-aware datetime with column-wise comparisons failing with np.minimum/maximum #15552

Closed
adbull opened this issue Mar 2, 2017 · 5 comments

Comments

@adbull
Contributor

commented Mar 2, 2017

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>> import pandas as pd

>>> naive = pd.to_datetime(['now'])
>>> utc = naive.tz_localize('UTC')

>>> np.minimum(naive, naive)
>>> np.minimum(utc, utc)
>>> np.minimum(pd.Series(naive), pd.Series(naive))
>>> np.minimum(pd.Series(utc), pd.Series(utc))

  File "bug.py", line 10, in <module>
    np.minimum(pd.Series(utc), pd.Series(utc))
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/series.py", line 498, in __array_prepare__
    op=context[0].__name__))
TypeError: Series with dtype datetime64[ns, UTC] cannot perform the numpy op minimum

Problem description

When a tz-aware datetime is placed in a Series, the numpy operations fmin/fmax/minimum/maximum throw an error, even though these operations work fine on a tz-aware DatetimeIndex.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.8-100.fc24.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C
LANG: C
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.8.0
xarray: 0.9.1
IPython: 4.2.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.2
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: None

@jreback

Contributor

commented Mar 2, 2017

This is related to #15553.

That said, there is only so much that can be done when passing pandas objects directly to numpy functions like this. (The problem is actually not the passing but the returning: some numpy functions are friendly to this and some are not.)

I suppose this could be made to work. Below is a much more idiomatic way to do this; see the second part for the actual issue.

tz-naive:

In [61]: s = Series(pd.date_range('20130101', periods=4))

In [62]: s2 = s[::-1]

In [63]: df = DataFrame({'A':s, 'B':s2.values})

In [64]: df
Out[64]: 
           A          B
0 2013-01-01 2013-01-04
1 2013-01-02 2013-01-03
2 2013-01-03 2013-01-02
3 2013-01-04 2013-01-01

In [65]: df.max(axis=1)
Out[65]: 
0   2013-01-04
1   2013-01-03
2   2013-01-03
3   2013-01-04
dtype: datetime64[ns]

tz-aware: doesn't raise, but gives incorrect results

In [67]: s = Series(pd.date_range('20130101', periods=4, tz='US/Eastern'))

In [68]: s2 = s[::-1]

In [69]: df = DataFrame({'A':s, 'B':s2.values})

In [70]: df
Out[70]: 
                          A                   B
0 2013-01-01 00:00:00-05:00 2013-01-04 05:00:00
1 2013-01-02 00:00:00-05:00 2013-01-03 05:00:00
2 2013-01-03 00:00:00-05:00 2013-01-02 05:00:00
3 2013-01-04 00:00:00-05:00 2013-01-01 05:00:00

In [71]: df.max(axis=1)
Out[71]: 
0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64
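
A note on the tz-aware frame above: `s2.values` returns a plain numpy array, which drops the timezone, so column B ends up tz-naive while column A is tz-aware. Below is a minimal sketch of an element-wise maximum that stays tz-aware, assuming both inputs are Series with the same timezone. The names `a`, `b`, and `row_max` are illustrative, and this is a possible workaround rather than anything proposed in this thread.

```python
import pandas as pd

a = pd.Series(pd.date_range("2013-01-01", periods=4, tz="US/Eastern"))
# Reversed copy with a fresh 0..3 index; this keeps the timezone,
# unlike `.values`, which would return tz-naive numpy data.
b = a[::-1].reset_index(drop=True)

# Element-wise maximum: keep `a` where it is >= `b`, otherwise take `b`.
row_max = a.where(a >= b, b)
```

Comparisons involving NaT evaluate to False, so rows with a missing value on either side take the value from `b`; handle missing values separately if that matters.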

@jreback added this to the Next Major Release milestone Mar 2, 2017

@jreback changed the title from "BUG: tz-aware datetime Series throws error on fmin/fmax/minimum/maximum" to "BUG: tz-aware datetime with column-wise comparisons failing" Mar 2, 2017

@jreback

Contributor

commented Mar 2, 2017

and @adbull

When a tz-aware datetime is placed in a Series, the numpy operations fmin/fmax/minimum/maximum throw an error, even though these operations work fine on a tz-aware DatetimeIndex.

This is not true at all. They naively look like they are working, but because of the same issue as above (numpy has no clue about timezones, let alone missing values), the results are completely wrong: they are tz-shifted incorrectly.

In [78]: i = pd.DatetimeIndex(df.A)
Out[78]: 
0   2013-01-01 00:00:00-05:00
1   2013-01-02 00:00:00-05:00
2   2013-01-03 00:00:00-05:00
3   2013-01-04 00:00:00-05:00
Name: A, dtype: datetime64[ns, US/Eastern]

In [79]: np.maximum(i, i)
Out[79]: DatetimeIndex(['2013-01-01 05:00:00-05:00', '2013-01-02 05:00:00-05:00', '2013-01-03 05:00:00-05:00', '2013-01-04 05:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', name='A', freq='D')
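
To illustrate the point that numpy has no notion of timezones, here is a small sketch (not from the thread; the variable names are illustrative) showing what numpy actually receives from a tz-aware Series: tz-naive values expressed in UTC. If those values are later relabelled as US/Eastern without being converted, midnight shows up as 05:00, which appears to be what happened in the output above.

```python
import numpy as np
import pandas as pd

s = pd.Series(pd.date_range("2013-01-01", periods=4, tz="US/Eastern"))

# `.values` hands back a plain numpy datetime64 array with no timezone attached.
raw = s.values

# Midnight US/Eastern on 2013-01-01 is stored as 05:00 UTC.
first = raw[0]
```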

@mroeschke changed the title from "BUG: tz-aware datetime with column-wise comparisons failing" to "BUG: tz-aware datetime with column-wise comparisons failing with np.minimum/maximum" Jul 26, 2018

@jbrockmendel added this to Reductions in DatetimeArray Refactor Nov 16, 2018

@mroeschke

Member

commented Jan 4, 2019

This works on master now. Could use a test, as always.

@jbrockmendel

Member

commented Jun 18, 2019

It looks like the np.minimum(pd.Series(utc), pd.Series(utc)) case incorrectly returns a tz-naive result on master.
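
For versions where the ufunc result comes back tz-naive, one possible workaround is to do the comparison on the underlying UTC values and re-attach the timezone afterwards. The sketch below assumes both inputs are tz-aware Series with matching indexes; `minimum_keep_tz` is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd


def minimum_keep_tz(left: pd.Series, right: pd.Series) -> pd.Series:
    """Element-wise minimum of two tz-aware Series (hypothetical helper)."""
    tz = left.dt.tz
    # `.values` yields tz-naive datetime64 data expressed in UTC.
    raw = np.minimum(left.values, right.values)
    naive = pd.Series(raw, index=left.index)
    # Re-attach UTC, then convert back to the original timezone.
    return naive.dt.tz_localize("UTC").dt.tz_convert(tz)


left = pd.Series(pd.date_range("2013-01-01", periods=3, tz="US/Eastern"))
right = left[::-1].reset_index(drop=True)
result = minimum_keep_tz(left, right)
```

Checking `isinstance(result.dtype, pd.DatetimeTZDtype)` is a simple way to confirm that a result is still tz-aware.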

@mroeschke

Member

commented Jul 12, 2019

This looks fixed on master again:

In [24]: np.minimum(pd.Series(utc), pd.Series(utc))
Out[24]:
0   2019-07-12 05:14:37.896979+00:00
dtype: datetime64[ns, UTC]

In [25]: pd.__version__
Out[25]: '0.25.0rc0+50.g5a7a8e1de'