bug using `read_excel` and `index_col` option #18792

Leobouloc · 2017-12-15T14:50:59Z

Code Sample, a copy-pastable example if possible

Example 1 (weird column/value assignment)

import pandas as pd

# Writing excel file with no column name in first column
tab = pd.DataFrame([['a', 'x'], ['b', 'y']], columns=['col1', 'col2'])
print('Original:\n', tab)
tab.to_excel('tmp_can_del.xlsx')

# The table read using read_excel assigns values of col1 to col2
new_tab = pd.read_excel('tmp_can_del.xlsx', index_col=2)
print('Read:\n', new_tab)

Example 2 (index_col=None seems not to work)

import pandas as pd

# Writing excel file with no column name in first column
tab = pd.DataFrame([['a', 'x'], ['b', 'y']], columns=['col1', 'col2'], index=['i1', 'i2'])
print('Original:\n', tab)
tab.to_excel('tmp_can_del.xlsx')

# The table read using read_excel should have three columns (`Unnamed: 0`, `col1`, `col2`) but just has two
new_tab = pd.read_excel('tmp_can_del.xlsx', index_col=None)
print('Read:\n', new_tab)
print(new_tab.columns)

Problem description

The index_col option in pd.read_excel seems not to behave properly when the first column name is missing (for example, when a file is written using to_excel(..., index=True)).

Example 1
We end up with a DataFrame where the data for col2 was originally the data below col1. This does not seem intuitive.

Original:
   col1 col2
0    a    x
1    b    y
Read:
    col1 col2
x     0    a
y     1    b

Example 2
It seems like the index_col=None option does not work. The first column of the file is treated as the DataFrame index.

Original:
    col1 col2
i1    a    x
i2    b    y
Read:
    col1 col2
i1    a    x
i2    b    y
Index(['col1', 'col2'], dtype='object')

Expected Output

Example 1
If column names are not specified, I would expect the data assigned to col1 to be that below col1 in the original excel file (same for col2). Something like this:

Read:
    col1 col2
x     a    x
y     b    y

Example 2
I would expect the newly read tab to look somewhat like this (as is the case when doing the same operations using write_csv and read_csv):

   Unnamed: 0 col1 col2
0           i1    a    x
1           i2    b    y

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.8.0-58-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8

pandas: 0.21.1
pytest: 3.1.2
pip: 9.0.1
setuptools: 37.0.0
Cython: 0.25.2
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.7.3
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

The text was updated successfully, but these errors were encountered:

Leobouloc · 2018-02-15T15:16:52Z

Re-read my issue; I believe this should be tagged as a bug.

Cheers!

Closes pandas-devgh-18792.

slavanorm · 2021-05-24T14:49:19Z

hi guys, i have this issue, but only when running coverage.

i used [ python -m pytest ] which eventually reads attached file without index imported from file.

then i tried to use [ coverage -m pytest ], which does import it from file.
Example:

df = pd.read_excel('index_bug.xlsx')
try:
df.drop([0]) # index i awaited
except:
raise Exception(df.index)

Result:

E Exception: Index(['Sr', 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
E 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
E 24, 25, 26, 27, 28, 29],
E dtype='object')

pip freeze:
pandas=1.1.0
xlrd=1.1.0
coverage=5.1

index_bug.xlsx

gfyoung added the IO Excel read_excel, to_excel label Dec 15, 2017

Leobouloc mentioned this issue Dec 15, 2017

File upload with missing columns entrepreneur-interet-general/the-magical-csv-merge-machine#59

Open

Leobouloc changed the title ~~read_excel weird behavior using index_col~~ bug using read_excel and index_col option Feb 15, 2018

chris-b1 mentioned this issue May 14, 2018

Pandas read_excel uses blank column heading as index #21032

Closed

gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 23, 2018

TST: Test unnamed columns with index_col for Excel

6245b59

Closes pandas-devgh-18792.

gfyoung added Bug Testing pandas testing functions or related to the test suite and removed Bug labels Nov 23, 2018

gfyoung mentioned this issue Nov 23, 2018

TST: Test unnamed columns with index_col for Excel #23874

Merged

jreback added this to the 0.24.0 milestone Nov 23, 2018

gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 23, 2018

TST: Test unnamed columns with index_col for Excel

26729ba

Closes pandas-devgh-18792.

gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 23, 2018

TST: Test unnamed columns with index_col for Excel

f622d45

Closes pandas-devgh-18792.

gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 25, 2018

TST: Test unnamed columns with index_col for Excel

9238348

Closes pandas-devgh-18792.

gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 28, 2018

TST: Test unnamed columns with index_col for Excel

00d369b

Closes pandas-devgh-18792.

gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 28, 2018

TST: Test unnamed columns with index_col for Excel

7153e5f

Closes pandas-devgh-18792.

jreback closed this as completed in #23874 Nov 28, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bug using `read_excel` and `index_col` option #18792

bug using `read_excel` and `index_col` option #18792

Leobouloc commented Dec 15, 2017

Leobouloc commented Feb 15, 2018

slavanorm commented May 24, 2021 •

edited

Loading

bug using read_excel and index_col option #18792

bug using read_excel and index_col option #18792

Comments

Leobouloc commented Dec 15, 2017

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

Leobouloc commented Feb 15, 2018

slavanorm commented May 24, 2021 • edited Loading

bug using `read_excel` and `index_col` option #18792

bug using `read_excel` and `index_col` option #18792

Output of `pd.show_versions()`

slavanorm commented May 24, 2021 •

edited

Loading