Call unique() on a timezone aware datetime series returns non timezone aware result #13565

Closed
paulgueltekin opened this issue Jul 5, 2016 · 11 comments · Fixed by #13979

@paulgueltekin

paulgueltekin commented Jul 5, 2016

Calling unique() on a timezone-aware datetime Series returns a result that is not timezone-aware.

Code Sample

import pandas as pd
import pytz
import datetime

In [242]: ts = pd.Series([datetime.datetime(2011,2,11,20,0,0,0,pytz.utc), datetime.datetime(2011,2,11,20,0,0,0,pytz.utc), datetime.datetime(2011,2,11,21,0,0,0,pytz.utc)])

In [243]: ts
Out[243]:
0 2011-02-11 20:00:00+00:00
1 2011-02-11 20:00:00+00:00
2 2011-02-11 21:00:00+00:00
dtype: datetime64[ns, UTC]

In [244]: ts.unique()
Out[244]: array(['2011-02-11T20:00:00.000000000', '2011-02-11T21:00:00.000000000'], dtype='datetime64[ns]')

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: de_AT.UTF-8

pandas: 0.18.1
nose: 1.3.4
pip: 8.1.2
setuptools: 22.0.5
Cython: 0.21.1
numpy: 1.11.0
scipy: 0.14.0
statsmodels: None
xarray: None
IPython: 4.2.0
sphinx: 1.2.3
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.2
openpyxl: 2.3.5
xlrd: 0.9.2
xlwt: 0.7.4
xlsxwriter: None
lxml: 3.6.0
bs4: None
html5lib: 1.0b3
httplib2: 0.9
apiclient: None
sqlalchemy: 0.9.8
pymysql: None
psycopg2: None
jinja2: 2.7.3
boto: None
pandas_datareader: None

@jreback
Contributor

jreback commented Jul 5, 2016

This is correct and as expected: you get a UTC numpy array back, and numpy displays things in your local timezone.

When this eventually returns an Index (see #13395), the timezone can be properly attached.

In [6]: ts.unique()
Out[6]: array(['2011-02-11T20:00:00.000000000', '2011-02-11T21:00:00.000000000'], dtype='datetime64[ns]')

In [7]: Index(ts.unique())
Out[7]: DatetimeIndex(['2011-02-11 20:00:00', '2011-02-11 21:00:00'], dtype='datetime64[ns]', freq=None)
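For readers who need the timezone back in the meantime, a minimal sketch of re-attaching it by hand follows. It assumes the array really does hold UTC values, as stated above, and it builds the tz-naive array directly rather than calling unique(), so it does not depend on a particular pandas version. The `Europe/Vienna` zone is only an illustrative choice.

```python
import numpy as np
import pandas as pd

# A tz-naive datetime64 array holding UTC values, like the one shown above.
naive_utc = np.array(
    ["2011-02-11T20:00:00", "2011-02-11T21:00:00"], dtype="datetime64[ns]"
)

# Wrap the array in a DatetimeIndex and declare that the values are UTC.
aware = pd.DatetimeIndex(naive_utc).tz_localize("UTC")

# Optionally convert to another zone (an arbitrary example zone).
vienna = aware.tz_convert("Europe/Vienna")

print(aware)
print(vienna)
```

Note that `tz_localize` only labels the existing values as UTC, while `tz_convert` shifts the displayed wall time to the target zone without changing the underlying instants.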

@jreback closed this as completed Jul 5, 2016
@jreback added Usage Question Timezones Timezone data dtype labels Jul 5, 2016
@jorisvandenbossche added this to the No action milestone Jul 5, 2016
@paulgueltekin
Author

Yes, the output is always in UTC, even if the input dates are from a different timezone. That isn't the issue. But as a pandas user, I would expect to get timezone-aware datetimes (with the UTC timezone info) when I run unique() on timezone-aware datetimes, which I think is the intuitive expectation.

Anyway, if this is the expected behavior, it should be documented.

@jreback
Contributor

jreback commented Jul 5, 2016

And I pointed you to the other issue.

@jorisvandenbossche
Member

@paulgueltekin The problem is that unique returns a numpy array, and numpy does not support timezone aware datetimes.

But indeed, this could be documented somewhere (in the docstring? and in the tutorial docs on date functionality)
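Because the timezone is lost only when the values leave pandas for a plain numpy datetime64 array, one possible workaround is to stay inside a pandas container. The sketch below uses `Series.drop_duplicates()`, which returns a Series; it is offered as an assumption about what callers might do, not as the fix discussed in this issue.

```python
import pandas as pd

# A tz-aware Series with one duplicated value, similar to the report above.
ts = pd.Series(
    pd.to_datetime(
        ["2011-02-11 20:00", "2011-02-11 20:00", "2011-02-11 21:00"], utc=True
    )
)

# drop_duplicates() returns a Series, so the tz-aware dtype is kept.
unique_series = ts.drop_duplicates()

print(unique_series)
print(unique_series.dt.tz)
```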

@jorisvandenbossche modified the milestones: 0.18.2, No action Jul 5, 2016
@jorisvandenbossche
Member

@paulgueltekin Do you want to add a note to the unique docstring?

@paulgueltekin
Author

@jorisvandenbossche Yes, I can do that.

@sinhrks
Member

sinhrks commented Aug 13, 2016

After #13979:

idx = pd.DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03'], tz='Asia/Tokyo')
idx.unique()
# DatetimeIndex(['2011-01-01 00:00:00+09:00', '2011-01-02 00:00:00+09:00',
#                '2011-01-03 00:00:00+09:00'],
#               dtype='datetime64[ns, Asia/Tokyo]', freq=None)

pd.Series(idx).unique()
# DatetimeIndex(['2011-01-01 00:00:00+09:00', '2011-01-02 00:00:00+09:00',
#                '2011-01-03 00:00:00+09:00'],
#               dtype='datetime64[ns, Asia/Tokyo]', freq=None)

Let me know if there is anything that should be added to the docstring.

@jorisvandenbossche
Member

Given the discussion in #13395, options for this issue are:

  • document the current situation (loses the tz information, as a datetime64 array is returned)
  • change to object array of tz aware Timestamp objects

Personally, I don't have a strong preference for either.
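To make the second option concrete, here is a rough sketch of what an object array of tz-aware Timestamp objects looks like when built by hand. It illustrates the shape of the proposed return value only; it is not the implementation, and the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# A tz-aware index with one duplicated value.
idx = pd.DatetimeIndex(["2011-01-01", "2011-01-01", "2011-01-02"], tz="Asia/Tokyo")

# De-duplicate inside pandas, then collect the Timestamp objects
# into a numpy array with dtype=object so each element keeps its tz.
unique_objects = np.array(list(idx.drop_duplicates()), dtype=object)

print(unique_objects.dtype)
print(unique_objects[0], unique_objects[0].tzinfo)
```

Each element is a full `pd.Timestamp` carrying its own tzinfo, at the cost of losing the compact datetime64 storage and vectorized numpy datetime operations.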

@jreback
Contributor

jreback commented Aug 19, 2016

comment here; should return an object array of tz-aware Timestamps.

@paulgueltekin
Author

Also keep in mind that changing this might break some existing code.

@jreback
Contributor

jreback commented Aug 19, 2016

@paulgueltekin That's why this is an API change. Furthermore, this will break loudly.
