Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug using `read_excel` and `index_col` option #18792

Closed
Leobouloc opened this issue Dec 15, 2017 · 1 comment

Comments

Projects
None yet
3 participants
@Leobouloc
Copy link

commented Dec 15, 2017

Code Sample, a copy-pastable example if possible

Example 1 (weird column/value assignment)

import pandas as pd

# Writing excel file with no column name in first column
tab = pd.DataFrame([['a', 'x'], ['b', 'y']], columns=['col1', 'col2'])
print('Original:\n', tab)
tab.to_excel('tmp_can_del.xlsx')

# The table read using read_excel assigns values of col1 to col2
new_tab = pd.read_excel('tmp_can_del.xlsx', index_col=2)
print('Read:\n', new_tab)

Example 2 (index_col=None seems not to work)

import pandas as pd

# Writing excel file with no column name in first column
tab = pd.DataFrame([['a', 'x'], ['b', 'y']], columns=['col1', 'col2'], index=['i1', 'i2'])
print('Original:\n', tab)
tab.to_excel('tmp_can_del.xlsx')

# The table read using read_excel should have three columns (`Unnamed: 0`, `col1`, `col2`) but just has two
new_tab = pd.read_excel('tmp_can_del.xlsx', index_col=None)
print('Read:\n', new_tab)
print(new_tab.columns)

Problem description

The index_col option in pd.read_excel seems not to behave properly when the first column name is missing (for example, when a file is written using to_excel(..., index=True)).

Example 1
We end up with a DataFrame where the data for col2 was originally the data below col1. This does not seem intuitive.

Original:
   col1 col2
0    a    x
1    b    y
Read:
    col1 col2
x     0    a
y     1    b

Example 2
It seems like the index_col=None option does not work. The first column of the file is treated as the DataFrame index.

Original:
    col1 col2
i1    a    x
i2    b    y
Read:
    col1 col2
i1    a    x
i2    b    y
Index(['col1', 'col2'], dtype='object')

Expected Output

Example 1
If column names are not specified, I would expect the data assigned to col1 to be that below col1 in the original excel file (same for col2). Something like this:

Read:
    col1 col2
x     a    x
y     b    y

Example 2
I would expect the newly read tab to look somewhat like this (as is the case when doing the same operations using write_csv and read_csv):

   Unnamed: 0 col1 col2
0           i1    a    x
1           i2    b    y

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.8.0-58-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8

pandas: 0.21.1
pytest: 3.1.2
pip: 9.0.1
setuptools: 37.0.0
Cython: 0.25.2
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.7.3
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@gfyoung gfyoung added the IO Excel label Dec 15, 2017

@Leobouloc Leobouloc changed the title read_excel weird behavior using index_col bug using `read_excel` and `index_col` option Feb 15, 2018

@Leobouloc

This comment has been minimized.

Copy link
Author

commented Feb 15, 2018

Re-read my issue; I believe this should be tagged as a bug.

Cheers!

gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 23, 2018

@gfyoung gfyoung added Bug Testing and removed Bug labels Nov 23, 2018

@jreback jreback added this to the 0.24.0 milestone Nov 23, 2018

gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 23, 2018

gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 23, 2018

gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 25, 2018

gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 28, 2018

gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 28, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.