Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

to_excel can't handle index with time zones #7056

Closed
sboehler opened this issue May 6, 2014 · 17 comments

Comments

@sboehler
Copy link

commented May 6, 2014

Hi

Something is wrong when trying to save a dataframe with tz-aware timestamps to xlsx, using pd.Dataframe.to_excel:

df = pd.DataFrame([1], index=[pd.Timestamp('2014-05-02', tz='CET')])
df.to_excel('test.xlsx')

yields an exception on my system:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-4e18a4be2a71> in <module>()
----> 1 df.to_excel('test.xlsx')

/home/silvio/prod34/lib/python3.4/site-packages/pandas/core/frame.py in to_excel(self, excel_writer, sheet_name, na_rep, float_format, cols, header, index, index_label, startrow, startcol, engine, merge_cells)
   1202         formatted_cells = formatter.get_formatted_cells()
   1203         excel_writer.write_cells(formatted_cells, sheet_name,
-> 1204                                  startrow=startrow, startcol=startcol)
   1205         if need_save:
   1206             excel_writer.save()

/home/silvio/prod34/lib/python3.4/site-packages/pandas/io/excel.py in write_cells(self, cells, sheet_name, startrow, startcol)
    771                 wks.write(startrow + cell.row,
    772                           startcol + cell.col,
--> 773                           cell.val, style)
    774 
    775     def _convert_to_style(self, style_dict, num_format_str=None):

/home/silvio/prod34/lib/python3.4/site-packages/xlsxwriter/worksheet.py in cell_wrapper(self, *args, **kwargs)
     55             if len(args):
     56                 int(args[0])
---> 57             return method(self, *args, **kwargs)
     58         except ValueError:
     59             # First arg isn't an int, convert to A1 notation.

/home/silvio/prod34/lib/python3.4/site-packages/xlsxwriter/worksheet.py in write(self, row, col, *args)
    374         # Write datetime objects.
    375         if isinstance(token, date_types):
--> 376             return self.write_datetime(row, col, *args)
    377 
    378         # Write number types.

/home/silvio/prod34/lib/python3.4/site-packages/xlsxwriter/worksheet.py in cell_wrapper(self, *args, **kwargs)
     55             if len(args):
     56                 int(args[0])
---> 57             return method(self, *args, **kwargs)
     58         except ValueError:
     59             # First arg isn't an int, convert to A1 notation.

/home/silvio/prod34/lib/python3.4/site-packages/xlsxwriter/worksheet.py in write_datetime(self, row, col, date, cell_format)
    666 
    667         # Convert datetime to an Excel date.
--> 668         number = self._convert_date_time(date)
    669 
    670         # Add the default date format.

/home/silvio/prod34/lib/python3.4/site-packages/xlsxwriter/worksheet.py in _convert_date_time(self, dt_obj)
   3265     def _convert_date_time(self, dt_obj):
   3266         # Convert a datetime object to an Excel serial date and time.
-> 3267         return datetime_to_excel_datetime(dt_obj, self.date_1904)
   3268 
   3269     def _options_changed(self):

/home/silvio/prod34/lib/python3.4/site-packages/xlsxwriter/utility.py in datetime_to_excel_datetime(dt_obj, date_1904)
    577 
    578     # Convert a Python datetime.datetime value to an Excel date number.
--> 579     delta = dt_obj - epoch
    580     excel_time = (delta.days
    581                   + (float(delta.seconds)

/home/silvio/prod34/lib/python3.4/site-packages/pandas/tslib.cpython-34m.so in pandas.tslib._Timestamp.__sub__ (pandas/tslib.c:11918)()

TypeError: can't subtract offset-naive and offset-aware datetimes

Using Python 3.4.0 on Arch Linux, list of installed packages:

~% pip freeze
Cython==0.20.1
Jinja2==2.7.2
MarkupSafe==0.21
Pygments==1.6
SQLAlchemy==0.9.4
XlsxWriter==0.5.3
ecdsa==0.11
ipdb==0.8
ipython==2.0.0
ipython-sql==0.3.1
lxml==3.3.5
matplotlib==1.3.1
mysql-connector-python==1.1.6
nose==1.3.1
numpy==1.8.1
openpyxl==1.8.5
pandas==0.13.1
paramiko==1.13.0
patsy==0.2.1
prettytable==0.7.2
psycopg2==2.5.2
pycrypto==2.6.1
pyodbc==3.0.7
pyparsing==2.0.2
python-dateutil==2.2
pytz==2014.2
requests==2.2.1
scipy==0.13.3
six==1.6.1
sqlparse==0.1.11
statsmodels==0.6.0
tornado==3.2
xlrd==0.9.3

@jreback jreback added Bug labels May 6, 2014

@jreback jreback added this to the 0.15.0 milestone May 6, 2014

@jreback

This comment has been minimized.

Copy link
Contributor

commented May 6, 2014

cc @jmcnamara

does excel support tz's in dates even?

not sure what this should do

@jreback jreback added the API Design label May 6, 2014

@jmcnamara

This comment has been minimized.

Copy link
Contributor

commented May 6, 2014

does excel support tz's in dates even?

No. Excel doesn't have any support for timezones.

I don't really know how this should be handled.

@jreback

This comment has been minimized.

Copy link
Contributor

commented May 6, 2014

ok....maybe should just report a better error for now

@sboehler what do you think it should do?

@sboehler

This comment has been minimized.

Copy link
Author

commented May 6, 2014

I'd expect either localized or UTC timestamps (both without tz information)
to be written to the file. Personally I vote for localized timestamps, as
the user by localizing his series explicitly opted for a certain timezone,
and this should be respected. If one wanted UTC output, a simple
tz_localize of the data before writing out solved the issue.

Agree?

2014-05-06 18:36 GMT+02:00 jreback notifications@github.com:

ok....maybe should just report a better error for now

@sboehler https://github.com/sboehler what do you think it should do?


Reply to this email directly or view it on GitHubhttps://github.com//issues/7056#issuecomment-42326246
.

@jreback

This comment has been minimized.

Copy link
Contributor

commented May 6, 2014

that makes sense, so choices I think are:

  • raise with helpful message (unambiguous)
  • provide a convert_timezones='local'|'utc' keyword; works but is 'silent', and could be ambiguous if you have multiple-columns/indicies that may need conversion
  • could do a local translation and add another column with the tz
@sboehler

This comment has been minimized.

Copy link
Author

commented May 6, 2014

or:

  • local translation without tz-information, relying on Excel users to know what they're doing (which is generally assumed, given Excel's poor support for date/time handling). If Excel is ever tz-aware, the obvious migration is to add back the tz information.

The global "convert_timezones" argument is problematic with several columns of timestamps - a user might want to output multiple columns with local timestamps and UTC timestamps to address Excels lack of tz handling.

@jreback

This comment has been minimized.

Copy link
Contributor

commented May 6, 2014

ok...

@sboehler so this is pretty straightfoward....PR?

@sboehler

This comment has been minimized.

Copy link
Author

commented May 6, 2014

I just noticed, the bug does not occur when I uninstall xlsxwriter! Pandas must be internally choosing the xlsx engine, depending on what's available in terms of packages, and use something else if xlsxwriter is not present.

The default seems to be to output the localized timestamp, omitting the timezone info.

@jreback As you can see from my github profile I have no experience with developing pandas, but I am happy to give it a go. This could take a few days, however...

@jreback

This comment has been minimized.

Copy link
Contributor

commented May 6, 2014

you can specify engine='openpyxl' to get the same behavior (yes , the default is xlsxwriter as its much faster)

give it a go and let us know any issues

@kay1793

This comment has been minimized.

Copy link

commented Oct 13, 2014

from #7060

That sounds right, but then the error message is not informative and it should still be possible to   
output the dates as a text string with tz offset, the same way they appear when they output to the 
terminal. MIgrating pandas types into excel types when outputting is nice but it's not a requirement.

It's not only a xlsxwriter problem actually, xlwt and xlsxwriter fail with this error and openpyxl 
creates the file but throws away the timezone information so it actually silently corrupts the data. 
This is a little broken all around.
@jreback

This comment has been minimized.

Copy link
Contributor

commented Oct 13, 2014

to clarify here, @kay1793 what would an API for this be like?

e.g. need to have 2 ways of converting a timezone-aware Series/Index (e.g. raise if the underlying engine does not support, which most/all doen't ATM, or stringify to a isoformat).

@kay1793

This comment has been minimized.

Copy link

commented Oct 14, 2014

I can't say about the API, stringify to iso would work perfect for my needs and can still be sorted. Coercing to UTC would also be ok if there was a big fat warning came with it.

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

@dacoex

This comment has been minimized.

Copy link
Contributor

commented Jan 5, 2016

using
engine='openpyxl'
creates the excel file. But all times have the same hour:
01:00:00

@dacoex

This comment has been minimized.

Copy link
Contributor

commented Jan 5, 2016

Could be an intermediate solution to add the warning and a note on the docs?

@jorisvandenbossche jorisvandenbossche changed the title pd.Dataframe.to_excel (xlsx) can't handle index with time zones to_excel can't handle index with time zones Jan 5, 2016

@misterjoa

This comment has been minimized.

Copy link

commented Sep 22, 2016

Hello everyone,

delta = dt_obj - epoch

A solution to this bug would be to localize epoch to the timezone of dt_obj, so that the timedelta is correct whatever the time zone is, and time-zoned datetimes can be handled, even if there's no notion of timezone in the resulting xlsx file.

what do you think about this ?

(this bug also appears with normal columns that are not indexes)

@jreback

This comment has been minimized.

Copy link
Contributor

commented Sep 22, 2016

@misterjoa you actually just want naive zones, IOW, .tz_localize(None).

I maybe could see an option .to_excel(...., tz='local|utc') which would either by default localize or .tz_convert('UTC').tz_localize(None) for tz-aware (which we typically do in other situations, e.g. when going to a numpy array).

@misterjoa

This comment has been minimized.

Copy link

commented Sep 23, 2016

@jreback

thank you for your answer.

In fact, I want the user of an excel file to be shown the correct date and time in an arbitrary time zone.

To do so and avoid the current issue, I do the following before exporting to xlsx using xlsxwriter, using a column with strings containing UTC timestamp data :

(Paris is the target zone)

df['quote_calculation_time'] = pd.to_datetime(
    df['quote_calculation_time'],utc=True) \
    .dt.tz_localize('utc') \
    .dt.tz_convert('Europe/Paris') \
    .dt.tz_localize(None)

So having an option like you say would be really great

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.