KeyError shows incorrect column name when DataFrame has duplicate columns #13822

Wilfred · 2016-07-27T19:02:37Z

>>> import pandas as pd
>>> df = pd.DataFrame({'x': [1.], 'y': [2.], 'z': [3.]})
>>> df.columns = ['x', 'x', 'z']
>>> df[['x', 'y', 'z']]
KeyError: "['z'] not in index"

I expected to see KeyError: "['y'] not in index".
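
As a cross-check that does not rely on the error message, the labels that are actually absent can be computed directly from the column index. This is only a sketch of the expectation above; the names `requested` and `truly_missing` are made up for illustration, and the text of the `KeyError` printed at the end depends on the pandas version in use.

```python
import pandas as pd

# Same frame as in the report: two columns labelled 'x', none labelled 'y'.
df = pd.DataFrame([[1.0, 2.0, 3.0]], columns=['x', 'x', 'z'])
requested = ['x', 'y', 'z']

# Membership tests against the column index are unaffected by duplicates.
truly_missing = [label for label in requested if label not in df.columns]
print(truly_missing)

try:
    df[requested]
except KeyError as exc:
    # The label named in this message is what the report is about.
    print(exc)
```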

I've tested this on the latest code in master (and on 0.16):

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.18-400.1.1.el5
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB
LOCALE: None.None

pandas: 0.18.1+279.g31f8e4d
nose: None
pip: 1.3.1
setuptools: 0.6
Cython: 0.22
numpy: 1.9.2
scipy: None
statsmodels: None
xarray: None
IPython: 3.2.0-1
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.6
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

TomAugspurger · 2016-07-27T19:26:09Z

Thanks for the report.

The bug is here. The code assumes that they're the same length, which is true if there aren't any duplicates.

Shouldn't be too hard to fix.
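
The linked code is not reproduced in this thread, so the following is only an illustration of the kind of length mismatch described, not the pandas source. It assumes the lookup goes through something like `Index.get_indexer_non_unique`, which returns one indexer entry per match (so duplicate labels make the indexer longer than the list of requested keys) plus a separate array of positions in the keys that were not found.

```python
import numpy as np
import pandas as pd

columns = pd.Index(['x', 'x', 'z'])
keys = np.array(['x', 'y', 'z'], dtype=object)

# indexer has one entry per match; missing holds positions in keys.
indexer, missing = columns.get_indexer_non_unique(list(keys))
print(len(keys), len(indexer))

# Assumption for illustration: a mask built from indexer is treated as if
# its positions lined up with keys, which only holds without duplicates.
mask = indexer == -1
position_in_indexer = int(np.flatnonzero(mask)[0])
label_from_mask = keys[position_in_indexer]

# Using the positions that refer to keys avoids the misalignment.
label_from_missing = list(keys[missing])
print(label_from_mask, label_from_missing)
```

With duplicates present, the position of the `-1` in the indexer is shifted by the extra match, so reading it as a position in the keys names the wrong label, while `missing` still points at the right one.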

annkia · 2018-09-08T12:39:42Z

I've got the same problem as you. I've observed that the error occurs if I edit the data in .csv format in OpenOffice. When I instead downloaded the data from the Internet and edited it in the plain Notepad++ editor, it worked normally. I know this may not help in your case, but maybe you should change the text editor or program you use for .csv files.

nsharma981 · 2020-02-14T10:34:14Z

Hi, I am new to Python and I am using Python 3.7.3 64-bit | Qt 5.9.6 | PyQt5 5.9.2 | Windows 10
with pandas 0.24.2.
I was practicing data and file handling. After loading a file, I wanted to format float columns to 2 decimal places. I searched and found the code below, which works:
pd.options.display.float_format = lambda x : '{:.0f}'.format(x) if round(x,0) == x else '{:,.2f}'.format(x)
But after running this code, even when I open a new file and load new data, all float data is shown in the 2-decimal format by default. Is there a setting I have to change in Spyder to restore the default? Or what should I do?
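
For what it's worth, `display.float_format` is a pandas option that stays set for the rest of the Python session, so it is probably not a Spyder setting. A minimal sketch, assuming the goal is either to limit the formatter to one block of code or to go back to the default; the helper name `compact_float` and the sample frame are made up for illustration.

```python
import pandas as pd

def compact_float(x):
    # Same rule as the lambda above: whole numbers without decimals,
    # everything else with two decimals and a thousands separator.
    return '{:.0f}'.format(x) if round(x, 0) == x else '{:,.2f}'.format(x)

df = pd.DataFrame({'value': [1.0, 2.34567]})

# Option 1: apply the formatter only inside this block.
with pd.option_context('display.float_format', compact_float):
    scoped = df.to_string()

# Option 2: after setting it for the session, restore the default.
pd.options.display.float_format = compact_float
pd.reset_option('display.float_format')
default = df.to_string()

print(scoped)
print(default)
```

Restarting the kernel also clears the option, since it only lives in the running session.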

TomAugspurger added the Indexing, Error Reporting, and Effort Low labels Jul 27, 2016

TomAugspurger added this to the 0.19.0 milestone Jul 27, 2016

shawnheide mentioned this issue Jul 29, 2016

BUG: fixes 13822, incorrect KeyError string with non-unique columns w… #13845

Closed

4 tasks

jreback closed this as completed in 768bf49 Aug 2, 2016
