read_excel fails to read excel file when last sheet is empty and sheetname=None #11711

Closed
BreitA opened this Issue Nov 27, 2015 · 7 comments

Comments

Projects
None yet
2 participants

BreitA commented Nov 27, 2015

Pandas fails to load an excel file as a dict fo dataframe when the last sheet is empty when sheetname=None.
Deleting the last sheet manually fix the problem.
Maybe Pandas should be improved to be robust to this common case.
Also I post this to raise awareness of this issue in case someone is scratching his head wondering why pandas is not loading his excel file

In [151]:

df_dict=pd.read_excel('D:\trash_test.xlsx',sheetname=None)

print df_dict.keys() # pandas fails to load the file

df=pd.read_excel('D:\trash_test.xlsx',sheetname='a_test')

print df # pandas loads first sheet normally

df=pd.read_excel('D:\trash_test.xlsx',sheetname='a_test')

print df # pandas loads second sheet normally

df=pd.read_excel('D:\trash_test.xlsx',sheetname='not_default')

print df # pandas fails to load 'not_default' the last sheet which is empty (to be expected?)

Index([], dtype='object')
Empty DataFrame
Columns: [some_stuff, 1]
Index: []
Empty DataFrame
Columns: [some_stuff, 1]
Index: []


XLRDError Traceback (most recent call last)
in ()
6 df=pd.read_excel('D:\trash_test.xlsx',sheetname='a_test')
7 print df # pandas loads second sheet normally
----> 8 df=pd.read_excel('D:\trash_test.xlsx',sheetname='not_default')
9 print df # pandas fails to load 'not_default' the last sheet which is empty (to be expected?)

C:\Anaconda\lib\site-packages\pandas\io\excel.pyc in read_excel(io, sheetname, header, skiprows, skip_footer, index_col, parse_cols, parse_dates, date_parser, na_values, thousands, convert_float, has_index_names, converters, engine, *_kwds)
168 date_parser=date_parser, na_values=na_values, thousands=thousands,
169 convert_float=convert_float, has_index_names=has_index_names,
--> 170 skip_footer=skip_footer, converters=converters, *_kwds)
171
172 class ExcelFile(object):

C:\Anaconda\lib\site-packages\pandas\io\excel.pyc in _parse_excel(self, sheetname, header, skiprows, skip_footer, index_col, has_index_names, parse_cols, parse_dates, date_parser, na_values, thousands, convert_float, verbose, **kwds)
370
371 if isinstance(asheetname, compat.string_types):
--> 372 sheet = self.book.sheet_by_name(asheetname)
373 else: # assume an integer if not a string
374 sheet = self.book.sheet_by_index(asheetname)

C:\Anaconda\lib\site-packages\xlrd\book.pyc in sheet_by_name(self, sheet_name)
439 sheetx = self._sheet_names.index(sheet_name)
440 except ValueError:
--> 441 raise XLRDError('No sheet named <%r>' % sheet_name)
442 return self.sheet_by_index(sheetx)
443

XLRDError: No sheet named <'not_default'>

Contributor

jreback commented Nov 27, 2015

pls show pd.show_versions() and use an example that is created / known to exist with 1 or more sheets (e.g. create it programatically).

sheetname=None returns ALL the sheets in a dict (but IIRC this is after 0.16.0), which should be indicated in the doc-string but is not.

jreback added the IO Excel label Nov 27, 2015

BreitA commented Nov 27, 2015

Sorry it's my first time reporting issues on GitHub here is a full example with :

1)- a case where I generate a file with 2 sheets with data and an empty one at the end
2)- a case where I generate a file with 2 sheets with data and NO empty one at the end.
in case 2) the read_excel function works well, but in case 1) it returns an empty dict as I mentionned earlier.
Here is the full code and the print of the output (with pd.show_versions)

import pandas as pd
import numpy as np
from pandas import ExcelWriter

​# generate a file with an empty sheet at the end
path='D:\\test.xlsx'
writer = pd.ExcelWriter(path)
df=pd.DataFrame(columns=['A','B'],data=np.ones([1,2]))
df.to_excel(writer,'sheet1')
df=pd.DataFrame(columns=['C','D'],data=2*np.ones([1,2]))
df.to_excel(writer,'sheet2')
df=pd.DataFrame()
df.to_excel(writer,'empty_sheet_at_end')
writer.save()

df_dict=pd.read_excel(path,sheetname=None)
print 'df_dict.keys:', df_dict.keys() # pandas fails to load the file
df=pd.read_excel(path,sheetname='sheet1')
print 'df/sheet1:', df# pandas loads first sheet normally
df=pd.read_excel(path,sheetname='sheet2')
print 'df/sheet2:',df # pandas loads second sheet normally
df=pd.read_excel(path,sheetname='empty_sheet_at_end')
print 'df/empty_sheet_at_end:', df # pandas loads the empty sheets that is empty

# generate a file WITHOUT an empty sheet at the end
path='D:\\test2.xlsx'
writer = pd.ExcelWriter(path)
df=pd.DataFrame(columns=['A','B'],data=np.ones([1,2]))
df.to_excel(writer,'sheet1')
df=pd.DataFrame(columns=['C','D'],data=2*np.ones([1,2]))
df.to_excel(writer,'sheet2')
writer.save()
df_dict=pd.read_excel(path,sheetname=None)
print 'df_dict.keys:', df_dict.keys() # pandas fails to load the file
df=pd.read_excel(path,sheetname='sheet1')
print 'df/sheet1:', df# pandas loads first sheet normally
df=pd.read_excel(path,sheetname='sheet2')
print 'df/sheet2:',df # pandas loads second sheet normally
df_dict.keys: Index([], dtype='object')
Contributor

jreback commented Nov 27, 2015

df/sheet1:    A  B
0  1  1
df/sheet2:    C  D
0  2  2
df/empty_sheet_at_end: Empty DataFrame
Columns: []
Index: []
df_dict.keys: [u'sheet1', u'sheet2']
df/sheet1:    A  B
0  1  1
df/sheet2:    C  D
0  2  2
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.0
nose: 1.3.7
pip: 7.1.2
setuptools: 18.4
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 4.0.0
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.6
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.4.3
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
Contributor

jreback commented Nov 27, 2015

can you check if saving an empty frame in-the-middle (e.g. not the end) work?

BreitA commented Nov 27, 2015

import pandas as pd
import numpy as np
from pandas import ExcelWriter

print 'empty sheet end'
# generate a file with an empty sheet at the end
path='D:\\test.xlsx'
writer = pd.ExcelWriter(path)
df=pd.DataFrame(columns=['A','B'],data=np.ones([1,2]))
df.to_excel(writer,'sheet1')
df=pd.DataFrame(columns=['C','D'],data=2*np.ones([1,2]))
df.to_excel(writer,'sheet2')
df=pd.DataFrame()
df.to_excel(writer,'empty_sheet_at_end')

writer.save()

df_dict=pd.read_excel(path,sheetname=None)
print 'df_dict.keys:', df_dict.keys() # pandas fails to load the file
df=pd.read_excel(path,sheetname='sheet1')
print 'df/sheet1:', df# pandas loads first sheet normally
df=pd.read_excel(path,sheetname='sheet2')
print 'df/sheet2:',df # pandas loads second sheet normally
df=pd.read_excel(path,sheetname='empty_sheet_at_end')
print 'df/empty_sheet_at_end:', df # pandas loads the empty sheets that is empty

# generate a file with an empty sheet in the middle
print 'empty sheet middle'
path='D:\\testmiddle.xlsx'
writer = pd.ExcelWriter(path)
df=pd.DataFrame(columns=['A','B'],data=np.ones([1,2]))
df.to_excel(writer,'sheet1')
df=pd.DataFrame()
df.to_excel(writer,'empty_sheet_at_middle')
df=pd.DataFrame(columns=['C','D'],data=2*np.ones([1,2]))
df.to_excel(writer,'sheet2')

writer.save()

df_dict=pd.read_excel(path,sheetname=None)
print 'df_dict.keys:', df_dict.keys() # pandas fails to load the file
df=pd.read_excel(path,sheetname='sheet1')
print 'df/sheet1:', df# pandas loads first sheet normally
df=pd.read_excel(path,sheetname='sheet2')
print 'df/sheet2:',df # pandas loads second sheet normally
df=pd.read_excel(path,sheetname='empty_sheet_at_middle')
print 'df/empty_sheet_at_middle:', df # pandas loads the empty sheets that is empty


# generate a file with an empty sheet at the start
print 'empty sheet start'
path='D:\\teststart.xlsx'
writer = pd.ExcelWriter(path)
df=pd.DataFrame()
df.to_excel(writer,'empty_sheet_at_start')
df=pd.DataFrame(columns=['A','B'],data=np.ones([1,2]))
df.to_excel(writer,'sheet1')
df=pd.DataFrame(columns=['C','D'],data=2*np.ones([1,2]))
df.to_excel(writer,'sheet2')

writer.save()

df_dict=pd.read_excel(path,sheetname=None)
print 'df_dict.keys:', df_dict.keys() # pandas fails to load the file
df=pd.read_excel(path,sheetname='sheet1')
print 'df/sheet1:', df# pandas loads first sheet normally
df=pd.read_excel(path,sheetname='sheet2')
print 'df/sheet2:',df # pandas loads second sheet normally
df=pd.read_excel(path,sheetname='empty_sheet_at_start')
print 'df/empty_sheet_at_start:', df # pandas loads the empty sheets that is empty

print 'without empty sheet'
# generate a file WITHOUT an empty sheet at the end
path='D:\\test2.xlsx'
writer = pd.ExcelWriter(path)
df=pd.DataFrame(columns=['A','B'],data=np.ones([1,2]))
df.to_excel(writer,'sheet1')
df=pd.DataFrame(columns=['C','D'],data=2*np.ones([1,2]))
df.to_excel(writer,'sheet2')

writer.save()

df_dict=pd.read_excel(path,sheetname=None)
print 'df_dict.keys:', df_dict.keys() # pandas fails to load the file
df=pd.read_excel(path,sheetname='sheet1')
print 'df/sheet1:', df# pandas loads first sheet normally
df=pd.read_excel(path,sheetname='sheet2')
print 'df/sheet2:',df # pandas loads second sheet normally
print pd.show_versions()
# This is the Output 

empty sheet end
df_dict.keys: Index([], dtype='object')
df/sheet1:    A  B
0  1  1
df/sheet2:    C  D
0  2  2
df/empty_sheet_at_end: Empty DataFrame
Columns: []
Index: []
empty sheet middle
df_dict.keys: Index([], dtype='object')
df/sheet1:    A  B
0  1  1
df/sheet2:    C  D
0  2  2
df/empty_sheet_at_middle: Empty DataFrame
Columns: []
Index: []
empty sheet start
df_dict.keys: Index([], dtype='object')
df/sheet1:    A  B
0  1  1
df/sheet2:    C  D
0  2  2
df/empty_sheet_at_start: Empty DataFrame
Columns: []
Index: []
without empty sheet
df_dict.keys: [u'sheet1', u'sheet2']
df/sheet1:    A  B
0  1  1
df/sheet2:    C  D
0  2  2
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.0
nose: 1.3.7
pip: 7.1.2
setuptools: 18.4
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 4.0.0
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.6
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.4.3
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
None
# Comment 

Ok so it appears the problem touch any excel file with an empty sheet 
Contributor

jreback commented Nov 29, 2015

@BreitA pls use triple-back ticks to format the output as this is very difficult to read.

I guess this is a bug, pull-requests to put in a proper test and fix are welcomed.

jreback added this to the Next Major Release milestone Nov 29, 2015

@jreback jreback modified the milestone: 0.18.0, Next Major Release Dec 11, 2015

@jreback jreback added a commit that referenced this issue Dec 12, 2015

@jreback jreback Merge pull request #11819 from sxwang/read_excel_empty
BUG: read_excel fails when empty sheets exist and sheetname=None #11711
4979f7a
Contributor

jreback commented Jan 30, 2016

closed by #11819

jreback closed this Jan 30, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment