read_csv clobbers values of columns with duplicate names #9424
xref #10577 (has test for duplicates with empty data)
I don't expect this is the correct behavior, although it's always possible I'm doing something wrong. Importing data using the
    from StringIO import StringIO
    import pandas as pd

    data = """a,1
    b,2
    c,3"""
    names = ['field', 'field']

    print pd.read_csv(StringIO(data), names=names, mangle_dupe_cols=True)
    print pd.read_csv(StringIO(data), names=names, mangle_dupe_cols=False)
However, this produces the correct result:
    df = pd.read_csv(StringIO(data), header=None)
    df.columns = names
    print df
Interestingly, it works if the field names are in the header:
    data_with_header = "field,field\n" + data
    print pd.read_csv(StringIO(data_with_header))
Is this a bug or am I doing something wrong?
I've come across something similar. When using
    import pandas as pd
    from StringIO import StringIO

    data = """A,A,B,B,B
    1,2,3,4,5
    6,7,8,9,10
    11,12,13,14,15
    """

    df1 = pd.read_table(StringIO(data), sep=',', mangle_dupe_cols=True)
    df2 = pd.read_table(StringIO(data), sep=',', mangle_dupe_cols=False)
which has the original data but non-duplicate column names;
which has duplicate column names, but their respective data has been overridden.
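For anyone who needs to avoid the clobbering in the meantime, here is a minimal sketch of making the names unique before passing them as `names=`. It imitates the `A`, `A.1`, `B`, `B.1`, `B.2` pattern that mangling produces. The helper `mangle_names` is a hypothetical name of mine, not a pandas function, and it does not guard against a name such as `A.1` already being present in the input.

```python
from collections import Counter


def mangle_names(names):
    """Return a copy of names with repeats suffixed as .1, .2, ...

    Hypothetical helper for illustration; not part of pandas.
    """
    seen = Counter()  # how many times each name has appeared so far
    result = []
    for name in names:
        count = seen[name]
        # First occurrence keeps its name; later ones get a numeric suffix.
        result.append(name if count == 0 else "{0}.{1}".format(name, count))
        seen[name] += 1
    return result


print(mangle_names(["A", "A", "B", "B", "B"]))
# ['A', 'A.1', 'B', 'B.1', 'B.2']
```

The resulting list can be passed as `names=` to `read_csv`, so that every column has a distinct label while the data is parsed. If the duplicate labels are really wanted, they can be restored afterwards by assigning `df.columns`, as in the workaround above.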
Reproducible bug in IPython notebook: http://nbviewer.ipython.org/github/yoavram/ipython-notebooks/blob/master/pandas%20duplicate%20column%20bug.ipynb
Pandas version 0.16.0. Python 2.7.