Some problems with list in read_excel() #15133

ArtyomKaltovich · 2017-01-15T12:32:06Z

Code Sample, a copy-pastable example if possible

train = read_excel(
    'Wilson_Schistosome Esophagus_differentially expressed transcripts.xlsx',
    sheetname=3, header=[0, 1], skip_footer=11)

OK

train = read_excel(
    'Wilson_Schistosome Esophagus_differentially expressed transcripts.xlsx',
    sheetname=3, header=[0, 1], skip_footer=11,  names=[0] + list(range(2, 12)))

ValueError: Length of new names must be 1, got 2

train = read_excel('Wilson_Schistosome Esophagus_differentially expressed transcripts.xlsx',
                   sheetname=2, header=[0, 1],  skip_footer=6)

ValueError: Length of new names must be 1, got 2

train = read_excel('Wilson_Schistosome Esophagus_differentially expressed transcripts.xlsx',
                   sheetname=[1, 3], header=[0, 1],  skip_footer=3)

ValueError: Length of new names must be 1, got 2

Problem description

I am trying to read excel file ( https://github.com/ArtyomKaltovich/ikantam/blob/master/lab1/Wilson_Schistosome%20Esophagus_differentially%20expressed%20transcripts.xlsx ). But sometimes pandas does not accept list as parameters for read_excel.

Expected Output

Output of `pd.show_versions()`

INSTALLED VERSIONS ------------------ commit: None python: 3.5.2.final.0 python-bits: 64 OS: Linux OS-release: 4.4.0-21-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 32.3.1
Cython: None
numpy: 1.11.3
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.3
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
httplib2: 0.9.2
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

The text was updated successfully, but these errors were encountered:

TomAugspurger · 2017-01-16T12:53:42Z

Sorry, I don't entirely understand the issue. Could you include the tracebacks so we can see what's happening? Can you narrow down what the problem is?

ArtyomKaltovich · 2017-01-17T17:08:13Z

train = read_excel(
... 'Wilson_Schistosome Esophagus_differentially expressed transcripts.xlsx',
... sheetname=3, header=[0, 1], skip_footer=11)
OK

train = read_excel(
'Wilson_Schistosome Esophagus_differentially expressed transcripts.xlsx',
sheetname=3, header=[0, 1], skip_footer=11, names=[0] + list(range(2, 12)))
Traceback (most recent call last):
File "", line 3, in
File "/home/my/.local/lib/python3.5/site-packages/pandas/io/excel.py", line 200, in read_excel
**kwds)
File "/home/my/.local/lib/python3.5/site-packages/pandas/io/excel.py", line 511, in _parse_excel
asheetname].columns.set_names(header_names)
File "/home/my/.local/lib/python3.5/site-packages/pandas/indexes/base.py", line 1010, in set_names
idx._set_names(names, level=level)
File "/home/my/.local/lib/python3.5/site-packages/pandas/indexes/base.py", line 950, in _set_names
len(values))
ValueError: Length of new names must be 1, got 2

train = read_excel('Wilson_Schistosome Esophagus_differentially expressed transcripts.xlsx',
... sheetname=2, header=[0, 1], skip_footer=6)
Traceback (most recent call last):
File "", line 2, in
File "/home/my/.local/lib/python3.5/site-packages/pandas/io/excel.py", line 200, in read_excel
**kwds)
File "/home/my/.local/lib/python3.5/site-packages/pandas/io/excel.py", line 511, in _parse_excel
asheetname].columns.set_names(header_names)
File "/home/my/.local/lib/python3.5/site-packages/pandas/indexes/base.py", line 1010, in set_names
idx._set_names(names, level=level)
File "/home/my/.local/lib/python3.5/site-packages/pandas/indexes/base.py", line 950, in _set_names
len(values))
ValueError: Length of new names must be 1, got 2

ArtyomKaltovich · 2017-01-17T17:09:47Z

As documentation says I can use list as argument in read_excel().
Am I right? Or are there some restriction?

ArtyomKaltovich · 2017-01-19T10:39:17Z

Hello. Do tracebacks help understand problem?
As i understand sometimes MultiIndex does not create, but why?

TomAugspurger · 2017-01-19T13:53:04Z

IIRC, to pass sheetname=[list of sheets] requires that each sheet have the same structure, since the header, skipfooter, etc. are applied to each sheet. Is that true of your data?

An easy way to test is to see if

train = read_excel(
    'Wilson_Schistosome Esophagus_differentially expressed transcripts.xlsx',
    sheetname=[3], header=[0, 1], skip_footer=11)

works, since sheetname is a list with one element.

dorvak · 2017-05-04T17:33:34Z

I got this error as well, this seems to be due to some changes in version 0.19., because the code works in version 0.18.0

jeanbaptisteb · 2017-10-17T11:56:17Z

I got this error too, no matter how I set the header and sheetname parameters. I also tried on a one-sheet file, the same error is raised.

billtubbs · 2018-10-15T17:03:21Z

I have this error too. Here is a demonstration of the error with a simple spreadsheet. The error only occurs when there is an empty column to the left of the first index col.

Test 1 (fails):

File 'excel_test1.xlsx' looks like this.

input_data_filename = 'excel_test1.xlsx'
df = pd.read_excel(
    os.path.join(data_dir, input_data_filename), 
    header=[3, 4],
    index_col=[1, 2]
)

Traceback:

-----------------------------------------------------------------------
ValueError                            Traceback (most recent call last)
<ipython-input-56-a7da75b33a63> in <module>()
      3     os.path.join(data_dir, input_data_filename),
      4     header=[3, 4],
----> 5     index_col=[1, 2]
      6 )

/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/util/_decorators.pyc in wrapper(*args, **kwargs)
    176                 else:
    177                     kwargs[new_arg_name] = new_arg_value
--> 178             return func(*args, **kwargs)
    179         return wrapper
    180     return _deprecate_kwarg

/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/util/_decorators.pyc in wrapper(*args, **kwargs)
    176                 else:
    177                     kwargs[new_arg_name] = new_arg_value
--> 178             return func(*args, **kwargs)
    179         return wrapper
    180     return _deprecate_kwarg

/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/io/excel.pyc in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, **kwds)
    327         skipfooter=skipfooter,
    328         convert_float=convert_float,
--> 329         **kwds)
    330 
    331 

/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/io/excel.pyc in parse(self, sheet_name, header, names, index_col, usecols, squeeze, converters, true_values, false_values, skiprows, nrows, na_values, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, **kwds)
    454                                  skipfooter=skipfooter,
    455                                  convert_float=convert_float,
--> 456                                  **kwds)
    457 
    458     def _should_parse(self, i, usecols):

/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/io/excel.pyc in _parse_excel(self, sheet_name, header, names, index_col, usecols, squeeze, dtype, true_values, false_values, skiprows, nrows, na_values, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, **kwds)
    690                 if not squeeze or isinstance(output[asheetname], DataFrame):
    691                     output[asheetname].columns = output[
--> 692                         asheetname].columns.set_names(header_names)
    693             except EmptyDataError:
    694                 # No Data, return an empty DataFrame

/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/core/indexes/base.pyc in set_names(self, names, level, inplace)
   1419         else:
   1420             idx = self._shallow_copy()
-> 1421         idx._set_names(names, level=level)
   1422         if not inplace:
   1423             return idx

/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/core/indexes/base.pyc in _set_names(self, values, level)
   1350         if len(values) != 1:
   1351             raise ValueError('Length of new names must be 1, got %d' %
-> 1352                              len(values))
   1353 
   1354         # GH 20527

ValueError: Length of new names must be 1, got 2

Test 2 (works):

File 'excel_test2.xlsx' looks like this.

input_data_filename = 'excel_test2.xlsx'
df = pd.read_excel(
    os.path.join(data_dir, input_data_filename), 
    header=[3, 4],
    index_col=[0, 1]
)

Output of test 2 is here.

Would be happy to see a solution to this or advice on how to fix!

mroeschke · 2021-05-08T00:27:39Z

Unfortunately it appears that this issue no longer has a excel file available for testing this bug. (Would be great if a reproducible example could be generated using pandas itself!). Going to close for now, but happy to reopen once we have a reproducible example

billtubbs · 2021-05-08T01:05:26Z

I uploaded a couple of Excel files called excel_test1.xls and excel_test2.xls (I used .xls files this time instead of .xlsx because xlrd has removed support for .xlsx files) to my GitHub here* and the test scripts (test-read-excel-1.py and test-read-excel-2.py).

I can still reproduce this error with

Pandas 1.1.1 and Python 3.8.3
Pandas 1.2.4 and Python 3.9.0

* Note: the folder in the GitHub repo linked above should not be named 'pandas' rename it locally to something else.

skannan-maf · 2023-07-28T17:47:22Z

I have as similar issue in 1.1.5 pandas version. But 1.5.3 works fine.
I dont pass "names" parameter though

eyaler · 2024-03-11T18:16:08Z

i think the issue happened in the case the first row is missing entries for some columns. in that case using the default index_col=None would get confused. you can overcome this by setting index_col=0 and later retrieving the index as your first column

TomAugspurger added the IO Excel read_excel, to_excel label Jan 16, 2017

mroeschke added the Bug label May 7, 2020

mroeschke closed this as completed May 8, 2021

mroeschke added the Needs Info Clarification about behavior needed to assess issue label May 8, 2021

mroeschke removed the Needs Info Clarification about behavior needed to assess issue label May 8, 2021

mroeschke reopened this May 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some problems with list in read_excel() #15133

Some problems with list in read_excel() #15133

ArtyomKaltovich commented Jan 15, 2017

TomAugspurger commented Jan 16, 2017

ArtyomKaltovich commented Jan 17, 2017

ArtyomKaltovich commented Jan 17, 2017 •

edited

ArtyomKaltovich commented Jan 19, 2017 •

edited

TomAugspurger commented Jan 19, 2017

dorvak commented May 4, 2017

jeanbaptisteb commented Oct 17, 2017

billtubbs commented Oct 15, 2018

mroeschke commented May 8, 2021

billtubbs commented May 8, 2021 •

edited

skannan-maf commented Jul 28, 2023

eyaler commented Mar 11, 2024

Some problems with list in read_excel() #15133

Some problems with list in read_excel() #15133

Comments

ArtyomKaltovich commented Jan 15, 2017

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of pd.show_versions()

TomAugspurger commented Jan 16, 2017

ArtyomKaltovich commented Jan 17, 2017

ArtyomKaltovich commented Jan 17, 2017 • edited

ArtyomKaltovich commented Jan 19, 2017 • edited

TomAugspurger commented Jan 19, 2017

dorvak commented May 4, 2017

jeanbaptisteb commented Oct 17, 2017

billtubbs commented Oct 15, 2018

mroeschke commented May 8, 2021

billtubbs commented May 8, 2021 • edited

skannan-maf commented Jul 28, 2023

eyaler commented Mar 11, 2024

Output of `pd.show_versions()`

ArtyomKaltovich commented Jan 17, 2017 •

edited

ArtyomKaltovich commented Jan 19, 2017 •

edited

billtubbs commented May 8, 2021 •

edited