Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

to_excel Mishandles Mixing of tz-aware datetimes #27008

Closed
shashurup opened this issue Jun 23, 2019 · 10 comments · Fixed by #27129
Closed

to_excel Mishandles Mixing of tz-aware datetimes #27008

shashurup opened this issue Jun 23, 2019 · 10 comments · Fixed by #27129
Labels
Bug IO Excel read_excel, to_excel Timezones Timezone data dtype

Comments

@shashurup
Copy link

shashurup commented Jun 23, 2019

Code Sample, a copy-pastable example if possible

In [14]: import pandas as pd
In [14]: ew = pd.ExcelWriter('test.xlsx')

In [15]: df = pd.DataFrame([pd.to_datetime('2019-06-22 01:11'), pd.to_datetime('2019-06-22 01:11
    ...: Z'), pd.to_datetime('2019-06-22 01:11MSK')])

In [16]: df
Out[16]:
                           0
0        2019-06-22 01:11:00
1  2019-06-22 01:11:00+00:00
2  2019-06-22 01:11:00+03:00

In [17]: df.to_excel(ew, sheet_name='sh1')

In [18]: ew.save()

In [19]: ew.close()

(MSK is +03:00)

Problem description

ExcelWriter seems to incorrectly convert timestamps with timezones. Time part seems to be converted correctly, however date part looks like it was unchanged.

Expected excel sheet content

0 | 2019-06-22 01:11:00
1 | 2019-06-22 01:11:00
2 | 2019-06-21 22:11:00

Actual excel sheet content

0 | 2019-06-22 01:11:00
1 | 2019-06-22 01:11:00
2 | 2019-06-22 22:11:00

pandas is 0.24.2

@WillAyd
Copy link
Member

WillAyd commented Jun 23, 2019

I think this is happening because you are mixing tz-aware and tz-naive timestamps. This seems to work:

df = pd.DataFrame([pd.to_datetime('2019-06-22 01:11Z'), pd.to_datetime('2019-06-22 01:11Z'), pd.to_datetime('2019-06-22 01:11:00+03')])
df.to_excel('foo.xlsx')

If you'd like to take a look would certainly take a PR

@WillAyd WillAyd added Bug IO Excel read_excel, to_excel Timezones Timezone data dtype labels Jun 23, 2019
@WillAyd WillAyd added this to the Contributions Welcome milestone Jun 23, 2019
@WillAyd WillAyd changed the title ExcelWriter incorrecty handles time zones to_excel Mishandles Mixing of tz-aware and tz-naive datetimes Jun 23, 2019
@shashurup
Copy link
Author

Unfortunately no.

pd.DataFrame([pd.to_datetime('2019-06-22 01:11MSK')])

this one, with timezone, also gives incorrect result - 2019-06-22 22:11:00.

@WillAyd
Copy link
Member

WillAyd commented Jun 23, 2019

Hmm OK. Interestingly enough if you use the xlsxwriter this will fail:

>>> df.to_excel('~/Desktop/foo.xlsx', engine='xlsxwriter')
TypeError: Excel doesn't support timezones in datetimes. Set the tzinfo in the datetime/time object to None or use the 'remove_timezone' Workbook() option

Is your expectation to just have everything converted to UTC?

@WillAyd WillAyd changed the title to_excel Mishandles Mixing of tz-aware and tz-naive datetimes to_excel Mishandles Mixing of tz-aware datetimes Jun 23, 2019
@shashurup
Copy link
Author

Initially I wanted it in my local timezone so it seems that the solution would be to remove tz information so that no conversion takes place.

@WillAyd
Copy link
Member

WillAyd commented Jun 24, 2019

Initially I wanted it in my local timezone so it seems that the solution would be to remove tz information so that no conversion takes place.

The catch is that in your original example not all Timestamps have a timezone, so conversion to a local timezone wouldn't be entirely applicable.

@mroeschke do we have a standard way of dealing with IO conversions for timestamps when timezone information isn't supported?

@mroeschke
Copy link
Member

For SQL, we export datetime64[tz, ns] as naive local time for databases that don't support timezones (I cannot confidently say this is the rule across all IO formats). For the mixed use case in the issue, I agree that conversion to local would result in loss of information and that's not ideal. I would say exporting as text/string is almost the best option in that edge case.

xref Another excel timezone export issue: #7056
xref Generally, parsing non-pytz timezone abbreviations (e.g. MSK) is not fully supported: #18702

@shashurup
Copy link
Author

shashurup commented Jun 25, 2019

Just to clarify - the problem reproduces not only in the mixed use case I've mentioned initially. Just one cell with tz-aware datetime produces incorrect result.

@WillAyd
Copy link
Member

WillAyd commented Jun 26, 2019

Thanks for the ref @mroeschke . Per this comment #7056 (comment) I think this should raise

@shashurup what do you think about that? The problem here is that the intention is rather ambiguous so by raising it would at least indicate that to the end user that further steps would need to be taken to make the intention clearer

@shashurup
Copy link
Author

shashurup commented Jun 26, 2019

I agree that the intention must be clarified, i.e. the timezone we want Excel to display must be specified somehow.
However, first the incorrect timezone conversion must be fixed
2019-06-22 01:11MSK != 2019-06-22 22:11:00UTC
the correct UTC conversion is 2019-06-21 22:11:00UTC !!!!!

@mroeschke
Copy link
Member

Note, the date you are passing gets dispatched to dateutil, and while pandas's current minimum dateutil version supports parsing MSK as a timezone (though dateutil just drops the timezone), this behavior is getting deprecated:

In [5]: pd.DataFrame([pd.to_datetime('2019-06-22 01:11MSK')])
   ...:
   ...:
/anaconda3/envs/pandas-dev/lib/python3.7/site-packages/dateutil/parser/_parser.py:1206: UnknownTimezoneWarning: tzname MSK identified but not understood.  Pass `tzinfos` argument in order to correctly return a timezone-aware datetime.  In a future version, this will raise an exception.
  category=UnknownTimezoneWarning)
Out[5]:
                    0
0 2019-06-22 01:11:00

# A more recent dateutil version
In [6]: dateutil.__version__
Out[6]: '2.8.0'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO Excel read_excel, to_excel Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants