Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't read excel decimal seconds #5945

Closed
dershow opened this issue Jan 15, 2014 · 8 comments
Closed

Can't read excel decimal seconds #5945

dershow opened this issue Jan 15, 2014 · 8 comments
Labels
Bug IO Excel read_excel, to_excel Timedelta Timedelta data type
Milestone

Comments

@dershow
Copy link

dershow commented Jan 15, 2014

related to #4332

I have an excel spread sheet (.xls) that contains a time column. The time is displayed in Excel as minutes:seconds.tenths of seconds. Such as "50:59.2" "50:59.4". The raw data contains hours:minutes:seconds.decimalseconds.

It seems that Pandas uses xldate_as_tuple() which apparently rounds all seconds, so the decimal part is dropped. So the above two data points both import as "50:59".

I suggest using a different conversion method, at least as an option. That way the data will not be dropped, as it is now.
I did also post this same issue to xlrd.

@jmcnamara
Copy link
Contributor

Here is some background from my answer to this question on SO:

Pandas used xlrd to read Excel files and the xlrd.xldate_as_tuple() function to get the date components to feed into datetime.time().

However, xlrd.xldate_as_tuple() only returns seconds and not microseconds so that information is lost to pandas.

For example, say you have an Excel file like this (Number is the same as time but without a format):

Time            Number
0:17:51.000     0.012395833
0:17:51.200     0.012398148
0:17:51.400     0.012400463
0:17:51.600     0.012402778
0:17:52.800     0.012416667
0:17:53.000     0.012418981

Then, if you read the data with the following program:

import xlrd

workbook = xlrd.open_workbook('minutes.xls')
worksheet = workbook.sheet_by_name('Sheet1')

cell =  worksheet.cell(2,0)

# Print the A2 cell value as a number.
print cell.value

# Print the seconds part of the A2 cell value.
print (cell.value * (24*60*60)) % 60

# Print the xldate_as_tuple output.
print xlrd.xldate_as_tuple(cell.value, workbook.datemode)

You get the following output:

0.0123981481481
51.2
(0, 0, 0, 0, 17, 51)

So, the decimal part of the seconds is read (51.2) but not returned by xldate_as_tuple() and thus not available to pandas.

This is the documented behaviour of xldate_as_tuple() so it will have to be fixed in xlrd. It could be worked around in Pandas but Excel dates and times are a collection of edge cases so it wouldn't be worth doing.

I'll submit a PR to xlrd with a fix.

@jmcnamara
Copy link
Contributor

I've submitted a PR to xlrd (python-excel/xlrd/pull/77).

However, it may need an API change to fix properly in which case pandas/io/excel.py may need a minor change to accommodate it. I'll update this issue when there is some progress on the xlrd side.

@jmcnamara
Copy link
Contributor

Submitted updated patch to xlrd for this: python-excel/xlrd/pull/78

@jreback
Copy link
Contributor

jreback commented Jan 21, 2014

@jmcnamara

look at the install script here on a 3.3 build. I believe xlsxwriter had a setup error (which didn't cause it to ultimately fail), - could be random too: https://travis-ci.org/jreback/pandas/jobs/17360736

@jmcnamara
Copy link
Contributor

@jreback

It could be a random error or an error in an unrelated thread.

Looking at the full log doesn't make it any clearer.

XlsxWriter has it's own Travis CI build and I haven't seen any issues there. However, if you see another one in Pandas let me know.

@jreback
Copy link
Contributor

jreback commented Jan 21, 2014

will do

@jmcnamara
Copy link
Contributor

Before the patch in #6934:

>>> import pandas as pd
>>> pd.read_excel('decimal_seconds_1900.xls', 'Sheet1')
       Time
0  00:17:51
1  00:17:51
2  00:17:51
3  00:17:52
4  00:17:52
5  00:17:52

After the patch:

>>> import pandas as pd
>>> pd.read_excel('decimal_seconds_1900.xls', 'Sheet1')
              Time
0         00:17:51
1  00:17:51.200000
2  00:17:51.400000
3  00:17:51.600000
4  00:17:51.800000
5         00:17:52

@jreback jreback modified the milestones: 0.14.0, 0.15.0 Apr 23, 2014
@jreback
Copy link
Contributor

jreback commented Apr 24, 2014

closed by #6934

@jreback jreback closed this as completed Apr 24, 2014
jeffreystarr pushed a commit to jeffreystarr/pandas that referenced this issue Apr 28, 2014
Fix to allow decimal seconds to be read from Excel dates and times
into datetime objects. pandas-dev#5945.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO Excel read_excel, to_excel Timedelta Timedelta data type
Projects
None yet
Development

No branches or pull requests

3 participants