to_excel incorrectly sets time to midnight for exported datetime column #9139

scls19fr · 2014-12-23T17:18:39Z

Hello,

I try this:

import requests
from pandas.io.json import json_normalize
import json
import pandas as pd
response = requests.get("http://api.openweathermap.org/data/2.5/history/station?start=1356220800&end=1356307200&type=hour&id=5530")
df = json_normalize(json.loads(response.text)['list'])
df['dt'] = pd.to_datetime(df['dt'], unit='s')
#df = df.set_index('dt')
print(df)
df.to_excel("out.xls")

It displays:

                    dt  humidity.c      ...       pressure.v  temp.c  temp.ma  \
0  2012-12-23 00:00:00           2      ...           1017.0       2   286.15
1  2012-12-23 01:00:00           2      ...           1017.0       2   286.15
2  2012-12-23 02:00:00           2      ...           1017.0       2   285.15
3  2012-12-23 03:00:00           2      ...           1017.0       2   284.15
4  2012-12-23 04:00:00           2      ...           1017.0       2   283.15
...
19 2012-12-23 19:00:00           1      ...           1016.0       1   282.15
20 2012-12-23 20:00:00           1      ...           1015.0       1   281.15
21 2012-12-23 21:00:00           1      ...           1015.0       1   282.15
22 2012-12-23 22:00:00           1      ...           1015.0       1   281.15
23 2012-12-23 23:00:00           1      ...           1005.0       1   286.15
24 2012-12-24 00:00:00           2      ...           1013.5       2   280.15

but if we look at Excel file dt column looks like

dt
2012-12-23 00:00:00
2012-12-23 00:00:00
2012-12-23 00:00:00
2012-12-23 00:00:00
2012-12-23 00:00:00
...
2012-12-23 00:00:00
2012-12-23 00:00:00
2012-12-23 00:00:00
2012-12-23 00:00:00
2012-12-24 00:00:00

to_csv method is ok

In [1]: pd.__version__
Out[1]: '0.15.2'

Kind regards

The text was updated successfully, but these errors were encountered:

shoyer · 2014-12-24T06:52:31Z

@scls19fr Thanks for the report!

Pandas supports a couple of external modules for writing Excel modules, so this might also depend on which one you're using.

Here is where you can find the relevant source code if you're interested in trying your hand at a fix: https://github.com/pydata/pandas/blob/master/pandas/io/excel.py

jorisvandenbossche · 2014-12-24T11:03:02Z

This was also reported here: http://stackoverflow.com/questions/27548833/pandas-to-excel-in-version-0-15-2-not-working-for-datetime-objects

But, as I posted over there, I cannot reproduce this. @scls19fr Can you show pd.show_versions()? @shoyer can you reproduce this?

scls19fr · 2014-12-24T12:53:12Z

pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Darwin
OS-release: 14.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8

pandas: 0.15.2
nose: 1.3.4
Cython: 0.21.1
numpy: 1.9.1
scipy: 0.14.1rc1
statsmodels: 0.6.1
IPython: 2.3.1
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.3
pytz: 2014.9
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.2
openpyxl: 2.0.3
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.6.4
lxml: 3.4.1
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.8
apiclient: None
rpy2: None
sqlalchemy: 0.9.8
pymysql: 0.6.2.None
psycopg2: None

scls19fr · 2014-12-24T13:04:57Z

I did

import requests
from pandas.io.json import json_normalize
import json
import pandas as pd
response = requests.get("http://api.openweathermap.org/data/2.5/history/station?start=1356220800&end=1356307200&type=hour&id=5530")
df = json_normalize(json.loads(response.text)['list'])
df['dt'] = pd.to_datetime(df['dt'], unit='s')
#df = df.set_index('dt')
df = df[['dt', 'main.temp.ma']]
print(df)


for (ext, engine) in [('xlsx', 'openpyxl'), ('xlsx', 'xlsxwriter'), ('xls', 'xlwt')]:
    with pd.ExcelWriter("out_{}.{}".format(engine, ext), engine=engine) as writer:
        df.to_excel(writer)

out_openpyxl.xlsx is ok but out_xlsxwriter.xlsx and out_xlwt.xls are bad (midnight issue).

jorisvandenbossche · 2014-12-24T13:22:18Z

@scls19fr Thanks, I can now confirm the same: working for openpyxl, wrong for xlsxwriter and xlwt

jorisvandenbossche · 2014-12-24T13:23:00Z

cc @jtratner @neirbowj

jmcnamara · 2014-12-29T18:00:57Z

I am the author of XlsxWriter. Here is my analysis of what is happening in this issue starting with the smaller example above:

import requests
from pandas.io.json import json_normalize
import json
import pandas as pd
response = requests.get("http://api.openweathermap.org/data/2.5/history/station?start=1356220800&end=1356307200&type=hour&id=5530")

df = json_normalize(json.loads(response.text)['list'])
df['dt'] = pd.to_datetime(df['dt'], unit='s')
df = df[['dt', 'main.temp.ma']]

If we take one of these values as an example:

>>> timestamp = df['dt'][23]
>>> timestamp
Timestamp('2012-12-23 23:00:00')

An Excel date is stored as a delta of days from an epoch (the integer part of the number) and a percentage of the day for the time in milliseconds (the float part). XlsxWriter and afaik Xlwt use a calculation like the following:

>>> import datetime
>>> epoch = datetime.datetime(1899, 12, 31, 0, 0, 0)
>>> excel_datetime = timestamp - epoch
>>> excel_datetime
Timedelta('41265 days 23:00:00')
>>> excel_datetime.days()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not callable
>>> excel_datetime.days
41265
>>> excel_datetime.seconds
0
>>> excel_datetime.microseconds
0

As you can see the seconds part of the Timedelta() doesn't look right and the seconds and microseconds are 0. Hence the 0 times in the Excel files.

For comparison here is how a datetime.datatime object behaves:

>>> mydate = datetime.datetime(2012, 12, 23, 23, 0, 0)
>>> excel_datetime = mydate - epoch
>>> excel_datetime
datetime.timedelta(41265, 82800)
>>> excel_datetime.days
41265
>>> excel_datetime.seconds
82800
>>> excel_datetime.microseconds
0

XlsxWriter performs this epoch calculation because the Timestamp object meets the following criteria:

>>> isinstance(timestamp, datetime.datetime)
True

So for me it looks like the issue here is that the pandas Timestamp() object doesn't handle the epoch calculation correctly.

There are a few workable solutions but I'll hold off until someone familiar with the Timestamp class has a chance to look at this and respond.

John

jreback · 2014-12-29T18:35:09Z

So the problem is the conversion is relying upon an implementation detail of timedelta, meaning that .days gets you the days and .seconds the seconds as the remainder. This was asof 0.15.0.
I noted this specifically here: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#new-features.

There are many ways to do this.

In [26]: ts = Timestamp('2012-12-23 23:00:00')

In [27]: epoch = datetime.datetime(1899, 12, 31, 0, 0, 0)

In [28]: td = ts - epoch

In [29]: td
Out[29]: Timedelta('41265 days 23:00:00')

In [32]: (td-Timedelta(days,unit='d')).total_seconds()
Out[32]: 82800.0

In [33]: td.components
Out[33]: Components(days=41265, hours=23, minutes=0, seconds=0, milliseconds=0, microseconds=0, nanoseconds=0)


In [35]: td.hours*3600+td.minutes*60+td.seconds
Out[35]: 82800

In [36]: td.to_pytimedelta().days
Out[36]: 41265

In [37]: td.to_pytimedelta().seconds
Out[37]: 82800

shoyer · 2014-12-29T18:43:39Z

@jmcnamara This is very helpful, thanks!

For an explanation of how pandas.Timedelta works see this section of the docs (especially the warning):
http://pandas.pydata.org/pandas-docs/stable/timedeltas.html#attributes

This is certainly an unfortunate side-effect of overriding super-class behavior in a subclass. I recall raising this issue in the pull request adding Timedelta: #8184. It looks like @jreback convinced me that the overriding the super-class behavior was not so bad, but now I'm not so sure (since it clearly is relied in in external packages).

The simplest fix would be to use to_pydatetime() for coercing datetime64 columns (prior to passing off to the various Excel writing backends)

shoyer · 2014-12-29T18:49:03Z

FWIW, I think pandas should fix this issue on our side (since we broke it with our strange timedelta subclass) rather than asking xlsxwriter and xlwt to fix this upstream.

jorisvandenbossche · 2014-12-29T21:23:44Z

@shoyer @jreback Open a new issue to discuss this behaviour of the timedelta attributes again?

jreback · 2014-12-29T21:33:44Z

this is not an upstream issue
rather a bug in the pandss openpyxl engine

neirbowj · 2014-12-29T23:34:04Z

@jreback wait... wat?

jmcnamara · 2014-12-31T03:13:33Z

I've added a PR for this based on coercing the Timestamp object to a datetime object for xlwt and xlsxwriter with a test.

Even if you choose a different solution you can use the testcase.

I was surprised that the existing testcases didn't pick up this issue but none of them actually test a roundtrip of a date and time. o_O But I am not in a position to throw the first stone on that one.

jreback · 2015-01-02T15:58:03Z

@neirbowj sorry, I meant xlwt/xlsxwriter

drorata · 2015-02-17T14:02:11Z

Was it actually fixed? It seems like I still get this issue. For example:

ts = pd.Timestamp('2015-02-04 12:45:50.240999')
df = pd.DataFrame([('foo', ts),])
writer = ExcelWriter('timestamp-test.xlsx', engine = 'xlsxwriter')
df.to_excel(writer, 'Test')

and the result is:

As you can see the exported timestamp is different than the given one.

BTW: using openpyxl as suggested here solves the issue.

jreback · 2015-02-17T14:03:47Z

well if you use master it is. This will be in the forthcoming 0.16.0 release.

shoyer changed the title ~~to_excel doesn't export Timestamp correctly~~ to_excel incorrectly sets time to midnight for exported datetime column Dec 24, 2014

shoyer added Bug IO Excel read_excel, to_excel labels Dec 24, 2014

shoyer added this to the 0.16.0 milestone Dec 24, 2014

This was referenced Dec 24, 2014

to_excel leaves ExcelWriter Handle Open #9145

Closed

Python Pandas DataFrame output marians/openweather#9

Open

jmcnamara mentioned this issue Dec 31, 2014

BUG: Fix for Timestamp handling in xlwt and xlsxwriter engines. #9175

Closed

shoyer mentioned this issue Jan 2, 2015

API: restore timedelta compability for Timedelta components? #9185

Closed

jmcnamara mentioned this issue Jan 3, 2015

Datetime export not working correctly from pandas jmcnamara/XlsxWriter#207

Closed

jreback mentioned this issue Jan 15, 2015

API: restore full datetime.timedelta compat with Timedelta w.r.t. seconds/microseconds accessors (GH9185, GH9139) #9257

Merged

jreback closed this as completed in #9257 Jan 16, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

to_excel incorrectly sets time to midnight for exported datetime column #9139

to_excel incorrectly sets time to midnight for exported datetime column #9139

scls19fr commented Dec 23, 2014

shoyer commented Dec 24, 2014

jorisvandenbossche commented Dec 24, 2014

scls19fr commented Dec 24, 2014

scls19fr commented Dec 24, 2014

jorisvandenbossche commented Dec 24, 2014

jorisvandenbossche commented Dec 24, 2014

jmcnamara commented Dec 29, 2014

jreback commented Dec 29, 2014

shoyer commented Dec 29, 2014

shoyer commented Dec 29, 2014

jorisvandenbossche commented Dec 29, 2014

jreback commented Dec 29, 2014

neirbowj commented Dec 29, 2014

jmcnamara commented Dec 31, 2014

jreback commented Jan 2, 2015

drorata commented Feb 17, 2015

jreback commented Feb 17, 2015

to_excel incorrectly sets time to midnight for exported datetime column #9139

to_excel incorrectly sets time to midnight for exported datetime column #9139

Comments

scls19fr commented Dec 23, 2014

shoyer commented Dec 24, 2014

jorisvandenbossche commented Dec 24, 2014

scls19fr commented Dec 24, 2014

scls19fr commented Dec 24, 2014

jorisvandenbossche commented Dec 24, 2014

jorisvandenbossche commented Dec 24, 2014

jmcnamara commented Dec 29, 2014

jreback commented Dec 29, 2014

shoyer commented Dec 29, 2014

shoyer commented Dec 29, 2014

jorisvandenbossche commented Dec 29, 2014

jreback commented Dec 29, 2014

neirbowj commented Dec 29, 2014

jmcnamara commented Dec 31, 2014

jreback commented Jan 2, 2015

drorata commented Feb 17, 2015

jreback commented Feb 17, 2015