Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug using read_excel and index_col option #18792

Closed
Leobouloc opened this issue Dec 15, 2017 · 2 comments
Closed

bug using read_excel and index_col option #18792

Leobouloc opened this issue Dec 15, 2017 · 2 comments
Labels
IO Excel read_excel, to_excel Testing pandas testing functions or related to the test suite
Milestone

Comments

@Leobouloc
Copy link

Code Sample, a copy-pastable example if possible

Example 1 (weird column/value assignment)

import pandas as pd

# Writing excel file with no column name in first column
tab = pd.DataFrame([['a', 'x'], ['b', 'y']], columns=['col1', 'col2'])
print('Original:\n', tab)
tab.to_excel('tmp_can_del.xlsx')

# The table read using read_excel assigns values of col1 to col2
new_tab = pd.read_excel('tmp_can_del.xlsx', index_col=2)
print('Read:\n', new_tab)

Example 2 (index_col=None seems not to work)

import pandas as pd

# Writing excel file with no column name in first column
tab = pd.DataFrame([['a', 'x'], ['b', 'y']], columns=['col1', 'col2'], index=['i1', 'i2'])
print('Original:\n', tab)
tab.to_excel('tmp_can_del.xlsx')

# The table read using read_excel should have three columns (`Unnamed: 0`, `col1`, `col2`) but just has two
new_tab = pd.read_excel('tmp_can_del.xlsx', index_col=None)
print('Read:\n', new_tab)
print(new_tab.columns)

Problem description

The index_col option in pd.read_excel seems not to behave properly when the first column name is missing (for example, when a file is written using to_excel(..., index=True)).

Example 1
We end up with a DataFrame where the data for col2 was originally the data below col1. This does not seem intuitive.

Original:
   col1 col2
0    a    x
1    b    y
Read:
    col1 col2
x     0    a
y     1    b

Example 2
It seems like the index_col=None option does not work. The first column of the file is treated as the DataFrame index.

Original:
    col1 col2
i1    a    x
i2    b    y
Read:
    col1 col2
i1    a    x
i2    b    y
Index(['col1', 'col2'], dtype='object')

Expected Output

Example 1
If column names are not specified, I would expect the data assigned to col1 to be that below col1 in the original excel file (same for col2). Something like this:

Read:
    col1 col2
x     a    x
y     b    y

Example 2
I would expect the newly read tab to look somewhat like this (as is the case when doing the same operations using write_csv and read_csv):

   Unnamed: 0 col1 col2
0           i1    a    x
1           i2    b    y

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.8.0-58-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8

pandas: 0.21.1
pytest: 3.1.2
pip: 9.0.1
setuptools: 37.0.0
Cython: 0.25.2
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.7.3
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@gfyoung gfyoung added the IO Excel read_excel, to_excel label Dec 15, 2017
@Leobouloc Leobouloc changed the title read_excel weird behavior using index_col bug using read_excel and index_col option Feb 15, 2018
@Leobouloc
Copy link
Author

Re-read my issue; I believe this should be tagged as a bug.

Cheers!

gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 23, 2018
@gfyoung gfyoung added Bug Testing pandas testing functions or related to the test suite and removed Bug labels Nov 23, 2018
@jreback jreback added this to the 0.24.0 milestone Nov 23, 2018
gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 23, 2018
gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 23, 2018
gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 25, 2018
gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 28, 2018
gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 28, 2018
@slavanorm
Copy link

slavanorm commented May 24, 2021

hi guys, i have this issue, but only when running coverage.

i used [ python -m pytest ] which eventually reads attached file without index imported from file.

then i tried to use [ coverage -m pytest ], which does import it from file.
Example:

df = pd.read_excel('index_bug.xlsx')
try:
df.drop([0]) # index i awaited
except:
raise Exception(df.index)

Result:

E Exception: Index(['Sr', 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
E 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
E 24, 25, 26, 27, 28, 29],
E dtype='object')

pip freeze:
pandas=1.1.0
xlrd=1.1.0
coverage=5.1

index_bug.xlsx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel Testing pandas testing functions or related to the test suite
Projects
None yet
Development

No branches or pull requests

4 participants