Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Timezones are lost in pivot_table #14948

Closed
gte620v opened this issue Dec 21, 2016 · 6 comments

Comments

@gte620v
Copy link
Contributor

commented Dec 21, 2016

Code Sample, a copy-pastable example if possible

# table to pivot
df = pd.DataFrame([{'uid': u'aa',
  'ts': pd.Timestamp('2016-08-12 13:00:00-0700', tz='US/Pacific')},
 {'uid': u'aa',
  'ts': pd.Timestamp('2016-08-12 08:00:00-0700', tz='US/Pacific')},
 {'uid': u'aa',
  'ts': pd.Timestamp('2016-08-12 14:00:00-0700', tz='US/Pacific')},
 {'uid': u'aa',
  'ts': pd.Timestamp('2016-08-25 11:00:00-0700', tz='US/Pacific')},
 {'uid': u'aa',
  'ts': pd.Timestamp('2016-08-25 13:00:00-0700', tz='US/Pacific')}])

# not sure why I need this, but it doesn't work without it....
df = df.set_index('ts').reset_index()

# Series of day timestamps we want to pivot over
mins = df.ts.map(lambda x: x.replace(hour=0, minute=0,
                                                    second=0, microsecond=0))

# pivot
pd.pivot_table(df.set_index('ts').reset_index(),
               values='ts', index=['uid'], columns=[mins], aggfunc=np.min) 
# prints in UTC time and discards timezones

# do the aggregation portion of the pivot
df.groupby(mins).agg(np.min) # works as expected where timezones are maintained

image

Problem description

pivot_table strips timezone information from the aggregated values.

I traced back in the code and it seems that the culprit is unstack (

table = agged.unstack(to_unstack)
).

Specifically, either of these numpy operations will strip the tz:

values = np.array(values)
values = values[:, np.newaxis]

Expected Output

Would expect aggregated values to maintain timezone like they do with groupby.

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 2.7.12.final.0 python-bits: 64 OS: Linux OS-release: 4.4.11-23.53.amzn1.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: None.None

pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 32.1.0.post20161217
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.0
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext)
jinja2: 2.8
boto: 2.38.0
pandas_datareader: None

@jreback

This comment has been minimized.

Copy link
Contributor

commented Dec 21, 2016

hmm, yes that looks like some older code paths. Need to use is_datetime64tz_dtype in there to handle slightly differently.

want to do a PR?

@gte620v

This comment has been minimized.

Copy link
Contributor Author

commented Dec 21, 2016

Sure; but it might be a few weeks. If anyone else wants to tackle sooner, go for it.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Dec 21, 2016

@gte620v no problem.

@jreback jreback changed the title Timezones are lost in `pivot_table` BUG: Timezones are lost in pivot_table` Dec 22, 2016

@jreback jreback changed the title BUG: Timezones are lost in pivot_table` BUG: Timezones are lost in pivot_table Dec 22, 2016

@jreback jreback modified the milestones: 0.20.0, Next Major Release Mar 23, 2017

@jreback

This comment has been minimized.

Copy link
Contributor

commented Nov 29, 2017

this looks fixed in master if someone wants to add a validation test.

@jreback jreback modified the milestones: Next Major Release, 0.22.0 Nov 29, 2017

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.23.0, Next Major Release Mar 29, 2018

@mroeschke

This comment has been minimized.

Copy link
Member

commented Jun 29, 2018

This still looks like an outstanding issue:

In [26]: pd.pivot_table(df.set_index('ts').reset_index(),
    ...:                values='ts', index=['uid'], columns=[mins], aggfunc=min)
    ...:
    ...:
Out[26]:
ts  2016-08-12 00:00:00-07:00 2016-08-25 00:00:00-07:00
uid
aa        2016-08-12 15:00:00       2016-08-25 18:00:00 <---these values lost tz information
@mroeschke

This comment has been minimized.

Copy link
Member

commented Jul 28, 2018

Once #19420 is solved, this should be fixed as well. The main cause of the conversion to UTC is an .unstack() call as described in the linked issue.

@jreback jreback added this to the 0.25.0 milestone Jul 1, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.