Feature Request: read_excel sheet_name argument #19842

trespitt · 2018-02-22T15:08:37Z

I would love to be able to only read in un-hidden sheets. this could be a new possible value that can be passed to the sheet_name argument.

alternatively, what would also solve my problem is some way to filter the columns (again, probably for sheet_name). ideally be able to use regex or string comparison to choose which sheets to read.

also, what about being able to specify which sheets to be ignored?

The text was updated successfully, but these errors were encountered:

celsopneto · 2018-02-22T15:22:03Z

Wouldn't usecols solve your problem?

usecols : int or list, default None

If None then parse all columns,
If int then indicates last column to be parsed
If list of ints then indicates list of column numbers to be parsed
If string then indicates comma separated list of Excel column letters and column ranges (e.g. “A:E” or “A,C,E:F”). Ranges are inclusive of both sides.

then you can define a function to pass the list or string as you want.

trespitt · 2018-02-22T15:32:39Z

i dont think that would solve my problem. the equivalent 'usesheets' would, however.

My problem is that one of the sheets, a hidden one, has different columns than all the others. (this prevents the use of names=[x,y,..] in the initial call to read_excel(), forcing me to iterate through the dict of DFs and rename the columns individually

let me know if there's something i'm missing. thanks for your suggestion

WillAyd · 2018-02-22T18:45:52Z

sheet_name does accept a list returning an OrderedDict of DataFrame objects - does that not cover your use case?

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html

trespitt · 2018-02-22T21:43:12Z

@WillAyd thanks for the suggestion

My problem is that one of the sheets, a hidden one, has different columns than all the others. (this prevents the use of names=[x,y,..] in the initial call to read_excel(), forcing me to iterate through the dict of DFs and rename the columns individually

WillAyd · 2018-02-22T22:01:55Z

I'm not entirely clear on what you are looking for, but I think you have two options:

Option 1
This will essentially union all column names encountered, filling in NaN where appropriate

# Change sheet_name as required
df_dict = pd.read_excel('your_file.xlsx', sheet_name=None)  
df = pd.concat(df_dict)

Option 2
This will ignore the first row of data altogether, allowing you to set the names you want.

df_dict = pd.read_excel('test.xlsx', sheet_name=None, header=None, skiprows=1, 
                        names=['foo', 'bar', 'baz'])
df = pd.concat(df_dict)

gfyoung · 2018-02-23T09:03:02Z

@tres-pitt : Thanks for the report! Let me dissect this a bit:

Checking for whether a sheet is hidden would be tricky unless we were somehow able to surface an attribute for visibility. Not sure if that's something we have available, but open to investigation.
Your suggestion of regex wouldn't be particularly hard to add, though I'm not sure how great the use-case would be for such an enhancement. Regex is not a very user-friendly option for most.
I'm more a fan of your suggestion to skip sheets. We do this with columns, so I don't see why we couldn't do the same with Excel sheets.

So IMO 1 and 3 would be the best ways to go, with 3 being the easiest to implement.

AlistairMills · 2018-04-11T14:04:36Z

This is probably not the correct place to report what seems to me like a bug. Please direct me to the appropriate place.

I am having trouble with read_excel sheet_name. Pandas reads the spreadsheet, but it always reads sheet 0. When I try sheet_name=None, it reads the excel file too.

WillAyd · 2018-04-11T17:02:08Z

@AlistairMills if you are seeing what you think is a bug then you should open a separate issue for it. Below is documentation on the most effective way to do that:

https://github.com/pandas-dev/pandas/blob/master/.github/CONTRIBUTING.md#filing-issues

AlistairMills · 2018-04-12T08:55:15Z

Thank you for this advice. I shall do as you suggest.

ahawryluk · 2021-06-09T03:32:22Z

I'd be interested in tackling the original issue by adding a new argument to ExcelFile and read_excel: ignore_hidden=False. I sometimes have to work with a large collection of spreadsheets with hidden sheets, so it's a feature I would absolutely use. (I'm already checking for hidden xlrd/openpyxl sheets to save time in my particular use case.)

Do you think this would be worth adding yet another argument? If so, is that a good name?

ahawryluk · 2021-07-05T01:25:15Z

@rhshadrach curious for your opinion on this one. Would it be worth adding a default argument to pd.read_excel ignore_hidden=False? The benefit is more control over which sheets are loaded and (at least in my own case) no time wasted on unwanted sheets. The cost is another argument and a bit more code. The xlrd and openpyxl engines would get just a couple of lines extra, while the ods engine might be a little more involved. The xlsb engine wouldn't support this option without changes upstream.

rhshadrach · 2021-07-13T04:44:19Z

@ahawryluk - I believe one can perform introspection into the contents of an excel workbook using a third party engine (e.g. openpyxl) to determine the sheets and which are hidden. Doing so, a user can specify which sheets they want to read using the existing API. If this is correct, then it seems to me it'd be more appropriate for the cookbook, maintaining a separation of what is in pandas vs third party excel-engine libraries. In particular, the line I think this is crossing is an introspection into sheets within a workbook and their state when determining what to read.

ahawryluk · 2021-08-03T20:40:19Z

@rhshadrach yes, at least for .xls and .xlsx/.xlsm files, checking for hidden sheets is really easy ahead of time:

wb = xlrd.open_workbook('myfile.xls')
sheets = [sheet.name for sheet in wb.sheets() if sheet.visibility == 0]
dfs = pd.read_excel(wb, sheets)

wb = openpyxl.open('myfile.xlsx', read_only=True, data_only=True)
sheets = [sheet.title for sheet in wb if sheet.sheet_state == 'visible']
dfs = pd.read_excel(wb, sheets)
wb.close()

I like your cookbook suggestion; I'll make a very small PR.

gfyoung added IO Excel read_excel, to_excel Experimental Enhancement and removed Experimental labels Feb 23, 2018

WillAyd added this to the Contributions Welcome milestone Jan 21, 2019

ahawryluk mentioned this issue Aug 4, 2021

DOC: add cookbook link about hidden sheets #42874

Merged

2 tasks

jreback closed this as completed in #42874 Aug 4, 2021

ahawryluk mentioned this issue Jan 23, 2022

ENH: Support for Excel features when writing #45572

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature Request: read_excel sheet_name argument #19842

Feature Request: read_excel sheet_name argument #19842

trespitt commented Feb 22, 2018

celsopneto commented Feb 22, 2018

trespitt commented Feb 22, 2018

WillAyd commented Feb 22, 2018

trespitt commented Feb 22, 2018

WillAyd commented Feb 22, 2018

gfyoung commented Feb 23, 2018 •

edited

Loading

AlistairMills commented Apr 11, 2018

WillAyd commented Apr 11, 2018

AlistairMills commented Apr 12, 2018

ahawryluk commented Jun 9, 2021

ahawryluk commented Jul 5, 2021

rhshadrach commented Jul 13, 2021

ahawryluk commented Aug 3, 2021 •

edited

Loading

Feature Request: read_excel sheet_name argument #19842

Feature Request: read_excel sheet_name argument #19842

Comments

trespitt commented Feb 22, 2018

celsopneto commented Feb 22, 2018

trespitt commented Feb 22, 2018

WillAyd commented Feb 22, 2018

trespitt commented Feb 22, 2018

WillAyd commented Feb 22, 2018

gfyoung commented Feb 23, 2018 • edited Loading

AlistairMills commented Apr 11, 2018

WillAyd commented Apr 11, 2018

AlistairMills commented Apr 12, 2018

ahawryluk commented Jun 9, 2021

ahawryluk commented Jul 5, 2021

rhshadrach commented Jul 13, 2021

ahawryluk commented Aug 3, 2021 • edited Loading

gfyoung commented Feb 23, 2018 •

edited

Loading

ahawryluk commented Aug 3, 2021 •

edited

Loading