Calling unique() on a timezone-aware datetime Series returns a non-timezone-aware result #13565

Closed
paulgueltekin opened this Issue Jul 5, 2016 · 11 comments

paulgueltekin commented Jul 5, 2016 edited

Calling unique() on a timezone-aware datetime Series returns a result that is not timezone-aware.

Code Sample

import pandas as pd
import pytz
import datetime

In [242]: ts = pd.Series([datetime.datetime(2011,2,11,20,0,0,0,pytz.utc), datetime.datetime(2011,2,11,20,0,0,0,pytz.utc), datetime.datetime(2011,2,11,21,0,0,0,pytz.utc)])

In [243]: ts
Out[243]:
0 2011-02-11 20:00:00+00:00
1 2011-02-11 20:00:00+00:00
2 2011-02-11 21:00:00+00:00
dtype: datetime64[ns, UTC]

In [244]: ts.unique()
Out[244]: array(['2011-02-11T20:00:00.000000000', '2011-02-11T21:00:00.000000000'], dtype='datetime64[ns]')

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: de_AT.UTF-8

pandas: 0.18.1
nose: 1.3.4
pip: 8.1.2
setuptools: 22.0.5
Cython: 0.21.1
numpy: 1.11.0
scipy: 0.14.0
statsmodels: None
xarray: None
IPython: 4.2.0
sphinx: 1.2.3
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.2
openpyxl: 2.3.5
xlrd: 0.9.2
xlwt: 0.7.4
xlsxwriter: None
lxml: 3.6.0
bs4: None
html5lib: 1.0b3
httplib2: 0.9
apiclient: None
sqlalchemy: 0.9.8
pymysql: None
psycopg2: None
jinja2: 2.7.3
boto: None
pandas_datareader: None

Contributor

jreback commented Jul 5, 2016

This is correct and as expected, you get a UTC numpy array back, and numpy displays things in your local timezone.

When this eventually returns an Index (see #13395), the timezone can be properly attached.

In [6]: ts.unique()
Out[6]: array(['2011-02-11T20:00:00.000000000', '2011-02-11T21:00:00.000000000'], dtype='datetime64[ns]')

In [7]: Index(ts.unique())
Out[7]: DatetimeIndex(['2011-02-11 20:00:00', '2011-02-11 21:00:00'], dtype='datetime64[ns]', freq=None)

jreback closed this Jul 5, 2016

jorisvandenbossche added this to the No action milestone Jul 5, 2016

Yes, the output is always in UTC, even if the input dates are from a different timezone. That isn't the issue. But as a pandas user I would expect to get timezone-aware datetimes (with the UTC timezone info) if I run unique() on timezone-aware datetimes, which I think is the intuitive expectation.

Anyway, if this is the expected behavior, it should be documented.

Contributor

jreback commented Jul 5, 2016

and I pointed you to the other issue.

@paulgueltekin The problem is that unique returns a numpy array, and numpy does not support timezone aware datetimes.

But indeed, this could be documented somewhere (in the docstring, and in the tutorial docs on date functionality?).
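For readers who need the timezone back, here is a minimal sketch of two possible workarounds. It assumes a pandas version in which `DatetimeIndex.unique()` keeps the timezone; the variable names are illustrative only, and this is not the fix that was adopted for this issue.

```python
import pandas as pd

# Same data as in the report: three tz-aware UTC timestamps, one duplicated.
ts = pd.Series(
    pd.to_datetime(
        ["2011-02-11 20:00:00", "2011-02-11 20:00:00", "2011-02-11 21:00:00"]
    ).tz_localize("UTC")
)

# Workaround 1: deduplicate through an Index, whose dtype can carry the timezone.
unique_aware = pd.DatetimeIndex(ts).unique()

# Workaround 2: if only a naive datetime64 array of UTC instants is at hand
# (which is what numpy can represent), re-attach the timezone explicitly.
naive_utc = unique_aware.tz_localize(None).to_numpy()
restored = pd.DatetimeIndex(naive_utc).tz_localize("UTC")
```

The second workaround is only valid when the naive values are known to be UTC instants; localizing them to any other zone would shift the points in time they represent.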

jorisvandenbossche modified the milestone: 0.18.2, No action Jul 5, 2016

@paulgueltekin Do you want to add a note to the unique docstring?

@jorisvandenbossche Yes, I can do that.

sinhrks referenced this issue Aug 13, 2016

API: change unique to return Index #13979 (merged)

Member

sinhrks commented Aug 13, 2016

After #13979:

idx = pd.DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03'], tz='Asia/Tokyo')
idx.unique()
# DatetimeIndex(['2011-01-01 00:00:00+09:00', '2011-01-02 00:00:00+09:00',
#                '2011-01-03 00:00:00+09:00'],
#               dtype='datetime64[ns, Asia/Tokyo]', freq=None)

pd.Series(idx).unique()
# DatetimeIndex(['2011-01-01 00:00:00+09:00', '2011-01-02 00:00:00+09:00',
#                '2011-01-03 00:00:00+09:00'],
#               dtype='datetime64[ns, Asia/Tokyo]', freq=None)

Let me know if there is anything that should be added to the docstring.

Given the discussion in #13395, options for this issue are:

  • document the current situation (loses the tz information, as a datetime64 array is returned)
  • change to object array of tz aware Timestamp objects

Personally, I don't have a strong preference for either.
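To make the two options concrete, the following sketch builds both result shapes by hand from a tz-aware index. It only illustrates what each return type would look like to a caller; it is not how pandas implements `unique()`, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

idx = pd.DatetimeIndex(["2011-01-01", "2011-01-01", "2011-01-02"], tz="Asia/Tokyo")
uniq = idx.unique()

# Option 1: a plain datetime64 array. The timezone is lost and the values
# are the corresponding UTC instants.
naive = uniq.tz_convert("UTC").tz_localize(None).to_numpy()

# Option 2: an object array of tz-aware Timestamp objects. The timezone is
# kept on every element, at the cost of an object dtype.
aware_obj = np.array(list(uniq), dtype=object)
```

With option 2 each element still knows its zone (for example `aware_obj[0].tz`), while option 1 leaves the caller to remember that the values are in UTC.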

Contributor

jreback commented Aug 19, 2016 edited

comment here; should return an object array of tz-aware Timestamps.

paulgueltekin added a commit to paulgueltekin/pandas that referenced this issue Aug 19, 2016

unique docstring extend #13565 unique datetime tz issue 8a06e2c

Also keep in mind that changing this might break some existing code.

Contributor

jreback commented Aug 19, 2016

@paulgueltekin That's why this is an API change. Furthermore, this will break loudly.
