Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some problems with list in read_excel() #15133

Open
ArtyomKaltovich opened this issue Jan 15, 2017 · 12 comments
Open

Some problems with list in read_excel() #15133

ArtyomKaltovich opened this issue Jan 15, 2017 · 12 comments
Labels
Bug IO Excel read_excel, to_excel

Comments

@ArtyomKaltovich
Copy link

Code Sample, a copy-pastable example if possible

train = read_excel(
    'Wilson_Schistosome Esophagus_differentially expressed transcripts.xlsx',
    sheetname=3, header=[0, 1], skip_footer=11)

OK

train = read_excel(
    'Wilson_Schistosome Esophagus_differentially expressed transcripts.xlsx',
    sheetname=3, header=[0, 1], skip_footer=11,  names=[0] + list(range(2, 12)))

ValueError: Length of new names must be 1, got 2

train = read_excel('Wilson_Schistosome Esophagus_differentially expressed transcripts.xlsx',
                   sheetname=2, header=[0, 1],  skip_footer=6)

ValueError: Length of new names must be 1, got 2

train = read_excel('Wilson_Schistosome Esophagus_differentially expressed transcripts.xlsx',
                   sheetname=[1, 3], header=[0, 1],  skip_footer=3)

ValueError: Length of new names must be 1, got 2

Problem description

I am trying to read excel file ( https://github.com/ArtyomKaltovich/ikantam/blob/master/lab1/Wilson_Schistosome%20Esophagus_differentially%20expressed%20transcripts.xlsx ). But sometimes pandas does not accept list as parameters for read_excel.

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.5.2.final.0 python-bits: 64 OS: Linux OS-release: 4.4.0-21-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 32.3.1
Cython: None
numpy: 1.11.3
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.3
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
httplib2: 0.9.2
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

@TomAugspurger
Copy link
Contributor

Sorry, I don't entirely understand the issue. Could you include the tracebacks so we can see what's happening? Can you narrow down what the problem is?

@TomAugspurger TomAugspurger added the IO Excel read_excel, to_excel label Jan 16, 2017
@ArtyomKaltovich
Copy link
Author

train = read_excel(
... 'Wilson_Schistosome Esophagus_differentially expressed transcripts.xlsx',
... sheetname=3, header=[0, 1], skip_footer=11)
OK

train = read_excel(
'Wilson_Schistosome Esophagus_differentially expressed transcripts.xlsx',
sheetname=3, header=[0, 1], skip_footer=11, names=[0] + list(range(2, 12)))
Traceback (most recent call last):
File "", line 3, in
File "/home/my/.local/lib/python3.5/site-packages/pandas/io/excel.py", line 200, in read_excel
**kwds)
File "/home/my/.local/lib/python3.5/site-packages/pandas/io/excel.py", line 511, in _parse_excel
asheetname].columns.set_names(header_names)
File "/home/my/.local/lib/python3.5/site-packages/pandas/indexes/base.py", line 1010, in set_names
idx._set_names(names, level=level)
File "/home/my/.local/lib/python3.5/site-packages/pandas/indexes/base.py", line 950, in _set_names
len(values))
ValueError: Length of new names must be 1, got 2

train = read_excel('Wilson_Schistosome Esophagus_differentially expressed transcripts.xlsx',
... sheetname=2, header=[0, 1], skip_footer=6)
Traceback (most recent call last):
File "", line 2, in
File "/home/my/.local/lib/python3.5/site-packages/pandas/io/excel.py", line 200, in read_excel
**kwds)
File "/home/my/.local/lib/python3.5/site-packages/pandas/io/excel.py", line 511, in _parse_excel
asheetname].columns.set_names(header_names)
File "/home/my/.local/lib/python3.5/site-packages/pandas/indexes/base.py", line 1010, in set_names
idx._set_names(names, level=level)
File "/home/my/.local/lib/python3.5/site-packages/pandas/indexes/base.py", line 950, in _set_names
len(values))
ValueError: Length of new names must be 1, got 2

@ArtyomKaltovich
Copy link
Author

ArtyomKaltovich commented Jan 17, 2017

As documentation says I can use list as argument in read_excel().
Am I right? Or are there some restriction?

@ArtyomKaltovich
Copy link
Author

ArtyomKaltovich commented Jan 19, 2017

Hello. Do tracebacks help understand problem?
As i understand sometimes MultiIndex does not create, but why?

@TomAugspurger
Copy link
Contributor

IIRC, to pass sheetname=[list of sheets] requires that each sheet have the same structure, since the header, skipfooter, etc. are applied to each sheet. Is that true of your data?

An easy way to test is to see if

train = read_excel(
    'Wilson_Schistosome Esophagus_differentially expressed transcripts.xlsx',
    sheetname=[3], header=[0, 1], skip_footer=11)

works, since sheetname is a list with one element.

@dorvak
Copy link

dorvak commented May 4, 2017

I got this error as well, this seems to be due to some changes in version 0.19., because the code works in version 0.18.0

@jeanbaptisteb
Copy link

I got this error too, no matter how I set the header and sheetname parameters. I also tried on a one-sheet file, the same error is raised.

@billtubbs
Copy link

I have this error too. Here is a demonstration of the error with a simple spreadsheet. The error only occurs when there is an empty column to the left of the first index col.

Test 1 (fails):

File 'excel_test1.xlsx' looks like this.

input_data_filename = 'excel_test1.xlsx'
df = pd.read_excel(
    os.path.join(data_dir, input_data_filename), 
    header=[3, 4],
    index_col=[1, 2]
)

Traceback:

-----------------------------------------------------------------------
ValueError                            Traceback (most recent call last)
<ipython-input-56-a7da75b33a63> in <module>()
      3     os.path.join(data_dir, input_data_filename),
      4     header=[3, 4],
----> 5     index_col=[1, 2]
      6 )

/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/util/_decorators.pyc in wrapper(*args, **kwargs)
    176                 else:
    177                     kwargs[new_arg_name] = new_arg_value
--> 178             return func(*args, **kwargs)
    179         return wrapper
    180     return _deprecate_kwarg

/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/util/_decorators.pyc in wrapper(*args, **kwargs)
    176                 else:
    177                     kwargs[new_arg_name] = new_arg_value
--> 178             return func(*args, **kwargs)
    179         return wrapper
    180     return _deprecate_kwarg

/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/io/excel.pyc in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, **kwds)
    327         skipfooter=skipfooter,
    328         convert_float=convert_float,
--> 329         **kwds)
    330 
    331 

/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/io/excel.pyc in parse(self, sheet_name, header, names, index_col, usecols, squeeze, converters, true_values, false_values, skiprows, nrows, na_values, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, **kwds)
    454                                  skipfooter=skipfooter,
    455                                  convert_float=convert_float,
--> 456                                  **kwds)
    457 
    458     def _should_parse(self, i, usecols):

/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/io/excel.pyc in _parse_excel(self, sheet_name, header, names, index_col, usecols, squeeze, dtype, true_values, false_values, skiprows, nrows, na_values, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, **kwds)
    690                 if not squeeze or isinstance(output[asheetname], DataFrame):
    691                     output[asheetname].columns = output[
--> 692                         asheetname].columns.set_names(header_names)
    693             except EmptyDataError:
    694                 # No Data, return an empty DataFrame

/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/core/indexes/base.pyc in set_names(self, names, level, inplace)
   1419         else:
   1420             idx = self._shallow_copy()
-> 1421         idx._set_names(names, level=level)
   1422         if not inplace:
   1423             return idx

/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/core/indexes/base.pyc in _set_names(self, values, level)
   1350         if len(values) != 1:
   1351             raise ValueError('Length of new names must be 1, got %d' %
-> 1352                              len(values))
   1353 
   1354         # GH 20527

ValueError: Length of new names must be 1, got 2

Test 2 (works):

File 'excel_test2.xlsx' looks like this.

input_data_filename = 'excel_test2.xlsx'
df = pd.read_excel(
    os.path.join(data_dir, input_data_filename), 
    header=[3, 4],
    index_col=[0, 1]
)

Output of test 2 is here.

Would be happy to see a solution to this or advice on how to fix!

@mroeschke mroeschke added the Bug label May 7, 2020
@mroeschke
Copy link
Member

Unfortunately it appears that this issue no longer has a excel file available for testing this bug. (Would be great if a reproducible example could be generated using pandas itself!). Going to close for now, but happy to reopen once we have a reproducible example

@mroeschke mroeschke added the Needs Info Clarification about behavior needed to assess issue label May 8, 2021
@billtubbs
Copy link

billtubbs commented May 8, 2021

I uploaded a couple of Excel files called excel_test1.xls and excel_test2.xls (I used .xls files this time instead of .xlsx because xlrd has removed support for .xlsx files) to my GitHub here* and the test scripts (test-read-excel-1.py and test-read-excel-2.py).

I can still reproduce this error with

  • Pandas 1.1.1 and Python 3.8.3
  • Pandas 1.2.4 and Python 3.9.0

* Note: the folder in the GitHub repo linked above should not be named 'pandas' rename it locally to something else.

@mroeschke mroeschke removed the Needs Info Clarification about behavior needed to assess issue label May 8, 2021
@mroeschke mroeschke reopened this May 8, 2021
@skannan-maf
Copy link

I have as similar issue in 1.1.5 pandas version. But 1.5.3 works fine.
I dont pass "names" parameter though

@eyaler
Copy link

eyaler commented Mar 11, 2024

i think the issue happened in the case the first row is missing entries for some columns. in that case using the default index_col=None would get confused. you can overcome this by setting index_col=0 and later retrieving the index as your first column

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO Excel read_excel, to_excel
Projects
None yet
Development

No branches or pull requests

8 participants