pandas.ExcelFile ignore parse_dates=False #10001

a59 · 2015-04-27T19:58:18Z

I am trying to read an Excel file which someone else created and the wrongly formatted a column as "date" when it is not. It has a large integer in it, which triggers an error

OverflowError: normalized days too large to fit in a C int

But I have "parse_dates=False" so I thought pandas.ExcelFile would not try to parse the dates and return a string instead. Is this a bug?

jreback · 2015-04-30T10:09:12Z

you would have to show an example and pd.show_versions()

a59 · 2015-05-03T22:42:26Z

In Excel (2013 Windows 7), I created a new Workbook. In Sheet1, I entered A in A1 and 10000000 in B1. I then formatted B1 to be a Short Date which displays the cell as #################. I saved the files as 'test.xlsx'.

I then ran the following python code

import pandas as pd
pd.show_versions()
xl_file = pd.ExcelFile('test.xlsx')
sheet = xl_file.parse('Sheet1',parse_dates=False)

which gives me the following output

INSTALLED VERSIONS

commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Darwin
OS-release: 14.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.16.0
nose: 1.3.6
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 3.1.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.2
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: 2.0.2
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.2
lxml: 3.4.3
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.1
pymysql: None
psycopg2: None
Traceback (most recent call last):
File "test.py", line 5, in
sheet = xl_file.parse('Sheet1',parse_dates=False)
File "/Users/myname/anaconda/lib/python2.7/site-packages/pandas/io/excel.py", line 287, in parse
**kwds)
File "/Users/myname/anaconda/lib/python2.7/site-packages/pandas/io/excel.py", line 418, in _parse_excel
row.append(_parse_cell(value,typ))
File "/Users/myname/anaconda/lib/python2.7/site-packages/pandas/io/excel.py", line 342, in _parse_cell
epoch1904)
File "/Users/myname/anaconda/lib/python2.7/site-packages/xlrd/xldate.py", line 130, in xldate_as_datetime
return epoch + datetime.timedelta(days, seconds, 0, milliseconds)
OverflowError: date value out of range

jreback · 2015-05-04T11:12:09Z

parse_dates is not implemented. But I think could be in the _should_parse function. pull-requests are welcome.

kwantopia · 2016-03-29T11:55:46Z

Hi, I am adding this patch here, in case it's useful for those who do not want to parse dates from excel file by setting parse_dates=False. Please review. I had trouble parsing the following excel file from Crunchbase Excel Export which had really old dates which gave OverflowError.

From aae19c65e2a4b3a965f91bdffa5bd4595b0b7d7b Mon Sep 17 00:00:00 2001
From: Kwan Lee <kwan@openviewpartners.com>
Date: Tue, 29 Mar 2016 14:52:03 +0300
Subject: [PATCH] parse_dates=False in read_excel should prevent from parsing
 dates.  It was throwing overflowerror when the dates were too old. update
 #10001

---
 pandas/io/excel.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/pandas/io/excel.py b/pandas/io/excel.py
index 5656c36..b4b8996 100644
--- a/pandas/io/excel.py
+++ b/pandas/io/excel.py
@@ -321,11 +321,11 @@ class ExcelFile(object):

         epoch1904 = self.book.datemode

-        def _parse_cell(cell_contents, cell_typ):
+        def _parse_cell(cell_contents, cell_typ, parse_dates=True):
             """converts the contents of the cell into a pandas
                appropriate object"""

-            if cell_typ == XL_CELL_DATE:
+            if cell_typ == XL_CELL_DATE and parse_dates:
                 if xlrd_0_9_3:
                     # Use the newer xlrd datetime handling.
                     cell_contents = xldate.xldate_as_datetime(cell_contents,
@@ -405,7 +405,7 @@ class ExcelFile(object):
                         should_parse[j] = self._should_parse(j, parse_cols)

                     if parse_cols is None or should_parse[j]:
-                        row.append(_parse_cell(value, typ))
+                        row.append(_parse_cell(value, typ, parse_dates))
                 data.append(row)

             if sheet.nrows == 0:
-- 
2.7.2

TomAugspurger · 2016-03-29T12:52:59Z

@kwantopia It'll be easier to review that if you put it up as a pull request. Then we can comment inline.

What's the desired behavior here? read_csv seems to silently "fail" to parse the columns that can't be represented as datetime64s.

In [5]: !cat foo.csv
date,val
1500-01-01,1
1600-01-02,2
1700-01-01,3
1800-01-01,4
1900-01-01,5
2000-01-01,6

In [1]: pd.read_csv('foo.csv', parse_dates='date')
Out[1]:
         date  val
0  1500-01-01    1
1  1600-01-02    2
2  1700-01-01    3
3  1800-01-01    4
4  1900-01-01    5
5  2000-01-01    6

jreback · 2016-03-29T13:30:15Z

actually maybe @jorisvandenbossche can comment here. IIRC the parse_dates kw only matters if in excel its NOT a date already (and in your case these are out-of-range, so they are strings and will be parsed to object even in pandas).

jreback · 2016-03-29T13:32:13Z

this is related to the issue in #11544 and looks to be a dupe of these (there is a somewhat convoluted chain as to what the original issues actually though). maybe someone can figure this chain out and we can create a master issue so its more clear.

kwantopia · 2016-03-29T15:07:42Z

@TomAugspurger it's a problem in read_excel, but I guess I was also misunderstanding parse_dates field. I was assuming that parse_dates=True means parse the dates and parse_dates=False means do not parse the dates for pandas.read_excel

ffranchina · 2019-02-23T01:24:09Z

Hello, sorry for writing here but it seems a quite common wish the ability of disabling date parsing from the function pd.read_excel(). Yet another stackoverflow question without a proper answer.
As @kwantopia underlined, I also would expect that setting the flag parse_dates=False could help me to achieve the desired result but, instead, seems to be ignored or to work in a different (and hence obscure) way.
Can anyone tell me which is the situation of this issue and if actually it's an issue? And, in case everything it's regular, can I have a definitive answer that can allow me to read an .xls without having the dates parsed?
If there's need for changing or fixing some code, I'm very willing and ready to help, I'd just like to be pointed in the right direction.
Thank you in advance!

maratbakiev2 · 2019-12-12T11:56:31Z

Hi guys, I've also run into this issue. One workaround for this (given that the name of columns that you don't want to convert or their column positions are fixed) is that you can provide dictionary of names/column numbers as keys and desired type as value. So, if you set "str" as a value, columns in dataframe will have "object" type and won't get parsed.
pd.read_excel(**you_kwargs, dtype={"col_name1": "str", "col_name2": "str"})
btw, you may provide desired column types only for some columns, it's not required for all of them

maratbakiev2 · 2019-12-12T11:58:03Z

@a59 @ffranchina

jreback added the IO Excel read_excel, to_excel label Apr 30, 2015

jreback added the Bug label May 4, 2015

jreback added this to the Next Major Release milestone May 4, 2015

jreback mentioned this issue Nov 7, 2015

ERR: raise NotImplemented error if keywords are passed to read_excel which are not supported #11544

Closed

jreback removed this from the Next Major Release milestone Mar 29, 2016

kordek mentioned this issue Apr 30, 2016

BUG: Convert non-dates in xls date cells to number #13042

Closed

jreback added this to the Next Major Release milestone Apr 30, 2016

jreback modified the milestones: 0.18.2, Next Major Release May 6, 2016

jreback closed this as completed in 1296ab3 May 6, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pandas.ExcelFile ignore parse_dates=False #10001

pandas.ExcelFile ignore parse_dates=False #10001

a59 commented Apr 27, 2015

jreback commented Apr 30, 2015

a59 commented May 3, 2015

INSTALLED VERSIONS

jreback commented May 4, 2015

kwantopia commented Mar 29, 2016

TomAugspurger commented Mar 29, 2016

jreback commented Mar 29, 2016

jreback commented Mar 29, 2016

kwantopia commented Mar 29, 2016

ffranchina commented Feb 23, 2019

maratbakiev2 commented Dec 12, 2019

maratbakiev2 commented Dec 12, 2019

pandas.ExcelFile ignore parse_dates=False #10001

pandas.ExcelFile ignore parse_dates=False #10001

Comments

a59 commented Apr 27, 2015

jreback commented Apr 30, 2015

a59 commented May 3, 2015

INSTALLED VERSIONS

jreback commented May 4, 2015

kwantopia commented Mar 29, 2016

TomAugspurger commented Mar 29, 2016

jreback commented Mar 29, 2016

jreback commented Mar 29, 2016

kwantopia commented Mar 29, 2016

ffranchina commented Feb 23, 2019

maratbakiev2 commented Dec 12, 2019

maratbakiev2 commented Dec 12, 2019