Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Timezones are lost in pivot_table #14948

Closed
gte620v opened this issue Dec 21, 2016 · 6 comments · Fixed by #27144
Closed

BUG: Timezones are lost in pivot_table #14948

gte620v opened this issue Dec 21, 2016 · 6 comments · Fixed by #27144
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timezones Timezone data dtype
Milestone

Comments

@gte620v
Copy link
Contributor

gte620v commented Dec 21, 2016

Code Sample, a copy-pastable example if possible

# table to pivot
df = pd.DataFrame([{'uid': u'aa',
  'ts': pd.Timestamp('2016-08-12 13:00:00-0700', tz='US/Pacific')},
 {'uid': u'aa',
  'ts': pd.Timestamp('2016-08-12 08:00:00-0700', tz='US/Pacific')},
 {'uid': u'aa',
  'ts': pd.Timestamp('2016-08-12 14:00:00-0700', tz='US/Pacific')},
 {'uid': u'aa',
  'ts': pd.Timestamp('2016-08-25 11:00:00-0700', tz='US/Pacific')},
 {'uid': u'aa',
  'ts': pd.Timestamp('2016-08-25 13:00:00-0700', tz='US/Pacific')}])

# not sure why I need this, but it doesn't work without it....
df = df.set_index('ts').reset_index()

# Series of day timestamps we want to pivot over
mins = df.ts.map(lambda x: x.replace(hour=0, minute=0,
                                                    second=0, microsecond=0))

# pivot
pd.pivot_table(df.set_index('ts').reset_index(),
               values='ts', index=['uid'], columns=[mins], aggfunc=np.min) 
# prints in UTC time and discards timezones

# do the aggregation portion of the pivot
df.groupby(mins).agg(np.min) # works as expected where timezones are maintained

image

Problem description

pivot_table strips timezone information from the aggregated values.

I traced back in the code and it seems that the culprit is unstack (

table = agged.unstack(to_unstack)
).

Specifically, either of these numpy operations will strip the tz:

values = np.array(values)
values = values[:, np.newaxis]

Expected Output

Would expect aggregated values to maintain timezone like they do with groupby.

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 2.7.12.final.0 python-bits: 64 OS: Linux OS-release: 4.4.11-23.53.amzn1.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: None.None

pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 32.1.0.post20161217
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.0
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext)
jinja2: 2.8
boto: 2.38.0
pandas_datareader: None

@jreback
Copy link
Contributor

jreback commented Dec 21, 2016

hmm, yes that looks like some older code paths. Need to use is_datetime64tz_dtype in there to handle slightly differently.

want to do a PR?

@jreback jreback added Bug Difficulty Intermediate Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timezones Timezone data dtype labels Dec 21, 2016
@jreback jreback added this to the 0.20.0 milestone Dec 21, 2016
@gte620v
Copy link
Contributor Author

gte620v commented Dec 21, 2016

Sure; but it might be a few weeks. If anyone else wants to tackle sooner, go for it.

@jreback
Copy link
Contributor

jreback commented Dec 21, 2016

@gte620v no problem.

@jreback jreback changed the title Timezones are lost in pivot_table BUG: Timezones are lost in pivot_table` Dec 22, 2016
@jreback jreback changed the title BUG: Timezones are lost in pivot_table` BUG: Timezones are lost in pivot_table Dec 22, 2016
@jreback jreback modified the milestones: 0.20.0, Next Major Release Mar 23, 2017
@jreback
Copy link
Contributor

jreback commented Nov 29, 2017

this looks fixed in master if someone wants to add a validation test.

@jreback jreback modified the milestones: Next Major Release, 0.22.0 Nov 29, 2017
@jorisvandenbossche jorisvandenbossche added Testing pandas testing functions or related to the test suite good first issue and removed Difficulty Intermediate labels Nov 30, 2017
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.23.0, Next Major Release Mar 29, 2018
@mroeschke
Copy link
Member

This still looks like an outstanding issue:

In [26]: pd.pivot_table(df.set_index('ts').reset_index(),
    ...:                values='ts', index=['uid'], columns=[mins], aggfunc=min)
    ...:
    ...:
Out[26]:
ts  2016-08-12 00:00:00-07:00 2016-08-25 00:00:00-07:00
uid
aa        2016-08-12 15:00:00       2016-08-25 18:00:00 <---these values lost tz information

@mroeschke mroeschke removed Testing pandas testing functions or related to the test suite good first issue labels Jun 29, 2018
@mroeschke
Copy link
Member

Once #19420 is solved, this should be fixed as well. The main cause of the conversion to UTC is an .unstack() call as described in the linked issue.

@jreback jreback removed this from the Contributions Welcome milestone Jul 1, 2019
@jreback jreback added this to the 0.25.0 milestone Jul 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants