
KeyError shows incorrect column name when DataFrame has duplicate columns #13822

Closed
Wilfred opened this issue Jul 27, 2016 · 3 comments
Labels: Error Reporting (incorrect or improved errors from pandas), Indexing (related to indexing on series/frames, not to indexes themselves)
Wilfred (Contributor) commented Jul 27, 2016

>>> import pandas as pd
>>> df = pd.DataFrame({'x': [1.], 'y': [2.], 'z': [3.]})
>>> df.columns = ['x', 'x', 'z']
>>> df[['x', 'y', 'z']]
KeyError: "['z'] not in index"

I expected to see KeyError: "['y'] not in index".

I've tested this on the latest code in master (and on 0.16):

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.18-400.1.1.el5
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB
LOCALE: None.None

pandas: 0.18.1+279.g31f8e4d
nose: None
pip: 1.3.1
setuptools: 0.6
Cython: 0.22
numpy: 1.9.2
scipy: None
statsmodels: None
xarray: None
IPython: 3.2.0-1
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.6
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

TomAugspurger added the Indexing, Error Reporting, and Effort Low labels Jul 27, 2016

TomAugspurger (Contributor) commented

Thanks for the report.

The bug is here. The code assumes that they're the same length, which is true if there aren't any duplicates.

Shouldn't be too hard to fix.
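
The length mismatch described above can be reproduced with the public `Index.get_indexer_non_unique` method. This is a minimal sketch, not the pandas internals themselves: the variable names are made up for illustration, and it only assumes that a lookup against a non-unique index returns one entry per match rather than one entry per requested key.

```python
import numpy as np
import pandas as pd

# Same shape as the report: a duplicated 'x' column and no 'y' column.
columns = pd.Index(['x', 'x', 'z'])
keys = np.array(['x', 'y', 'z'], dtype=object)

# For a non-unique index the indexer has one entry per match, so the
# duplicated 'x' contributes two entries and the result is longer than `keys`.
indexer, missing = columns.get_indexer_non_unique(keys)

# Treating "indexer == -1" as a mask over `keys` points at the wrong key:
# the -1 sits at position 2 of the indexer, while 'y' sits at position 1 of `keys`.
naive_position = int(np.flatnonzero(indexer == -1)[0])
wrongly_reported = keys[naive_position]

# The second return value holds positions into `keys`, so it names the
# key that is actually missing.
missing_keys = keys[missing]

print(len(keys), len(indexer))
print(wrongly_reported, list(missing_keys))
```

Under these assumptions, reporting `keys[missing]` instead of masking `keys` with the indexer would produce the message the report expected.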

annkia commented Sep 8, 2018

I ran into the same problem. I noticed that the error occurs when I edit the .csv data in OpenOffice. When I instead downloaded the data again and edited it in a plain editor (Notepad++), it worked normally. This may not help in your case, but it might be worth trying a different text editor or program for handling .csv files.

nsharma981 commented

Hi, I am new to Python and am using Python 3.7.3 64-bit | Qt 5.9.6 | PyQt5 5.9.2 | Windows 10 with pandas 0.24.2.
I was practicing data/file handling. After loading a file, I wanted to display the float columns with 2 decimal places. I searched and found the code below, which works:
pd.options.display.float_format = lambda x : '{:.0f}'.format(x) if round(x,0) == x else '{:,.2f}'.format(x)
But after running this code, all float data is shown in the 2-decimal format by default, even when I open a new file and load new data. Is there a setting I have to change in Spyder to restore the default, or what should I do?
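
For context on the question above: `display.float_format` is a session-wide pandas option, so it stays in effect for every DataFrame printed afterwards until it is reset. The sketch below shows two ways to limit it, using `pd.reset_option` and `pd.option_context`; the sample frame and variable names are made up for illustration, and the behavior is assumed to match current pandas rather than checked against 0.24.2 specifically.

```python
import pandas as pd

# The formatter from the comment above, bound to a name for reuse.
fmt = lambda x: '{:.0f}'.format(x) if round(x, 0) == x else '{:,.2f}'.format(x)

# Option 1: set the option globally, then restore the default when done.
pd.options.display.float_format = fmt
pd.reset_option('display.float_format')
after_reset = pd.get_option('display.float_format')  # back to the default

# Option 2: apply the formatter only inside a block.
df = pd.DataFrame({'a': [1.0, 2.345]})  # hypothetical sample data
with pd.option_context('display.float_format', fmt):
    scoped = df.to_string()    # rendered with the custom formatter
unscoped = df.to_string()      # rendered with the default formatting

print(scoped)
print(unscoped)
```

Restarting the Python console in Spyder should also clear the setting, since the option lives only in the running session.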
