read_table / read_csv with duplicate names leads to column duplication #10496

jens-k · 2015-07-03T13:35:55Z

When reading a table with duplicate column names (say, two distinct names repeated), pandas 0.16.1 copies the last two columns of the data over and over again.

I opened a thread on this here:
http://stackoverflow.com/questions/31207560/pandas-read-table-with-duplicate-names

Is this a bug or an intended behavior?

jreback · 2015-07-03T13:38:59Z

pls show a self-contained example reproducing the behavior

jens-k · 2015-07-03T14:00:02Z

tbl.csv:
https://klinzing.blaucloud.de/index.php/s/gfkz8Za41tQ6E12

[In:]
df = pd.read_table('tbl.csv', header=0, names=['one','two','one','two','one']) 
df

...gives you:

[Out:]
        one       two       one       two       one
0  0.132846  0.120522  0.132846  0.120522  0.132846

...rather than:

[Out:]
        one       two       one       two      one
0  0.117766  0.058881  0.127572  0.120522  0.13286

i.e. it repeats the last N columns of the data, where N is the number of unique names given, filling in from the right.

jreback · 2015-07-03T14:06:04Z

@forodin23 by self-contained I mean simple code, not having to download a file etc.

In [3]: import pandas as pd; from pandas import DataFrame

In [4]: df = DataFrame({'A' : [1,2], 'B' : [3,4], 'C' : [5,6]})

In [5]: df.to_csv('test.csv',mode='w')     

In [6]: pd.read_csv('test.csv',index_col=0)
Out[6]: 
   A  B  C
0  1  3  5
1  2  4  6

In [7]: pd.read_csv('test.csv',index_col=0,names=['one','two','one'])
Out[7]: 
    one two one
NaN   C   B   C
 0    5   3   5
 1    6   4   6

In answer to your question, there is nothing pandas can do about this. It's not obvious that what you are doing is wrong.

If you want to actually have a column hierarchy, much better to use a MultiIndex, or if you really really want duplicate columns (this is not recommended and use at your own risk), then simply assign them after.
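As a rough sketch of the two alternatives mentioned above: the sample data below is a hypothetical stand-in for `test.csv`, and the second-level labels `"a"` and `"b"` are made up for illustration. The idea is to read the file with unique names first, then relabel the columns.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for test.csv from the example above.
csv_text = ",A,B,C\n0,1,3,5\n1,2,4,6\n"

# Read with the file's own (unique) header so each column is parsed once.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Alternative 1: a two-level column index keeps repeated top-level labels distinct.
# The second-level labels "a" and "b" are illustrative only.
hier = df.copy()
hier.columns = pd.MultiIndex.from_tuples(
    [("one", "a"), ("two", "a"), ("one", "b")]
)

# Alternative 2 (not recommended, use at your own risk):
# assign the duplicate labels after reading, purely by position.
dup = df.copy()
dup.columns = ["one", "two", "one"]
```

In both cases the underlying data is untouched; only the labels change, so the third column still holds the values from `C`.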

jens-k · 2015-07-03T14:07:34Z

Why don't you just give an error message if someone tries to use duplicate names? The problem here is that pandas silently changes your data.
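Until the reader itself complains, a caller can guard against this on their side. The helper below is a minimal sketch; `check_unique_names` is a hypothetical name, not a pandas function. It rejects a `names` list that contains repeats before it is ever passed to `read_csv` / `read_table`.

```python
from collections import Counter


def check_unique_names(names):
    """Return names as a list, or raise ValueError if any name repeats.

    Hypothetical helper for illustration; not part of pandas.
    """
    counts = Counter(names)
    duplicates = sorted(name for name, count in counts.items() if count > 1)
    if duplicates:
        raise ValueError(f"duplicate column names: {duplicates}")
    return list(names)
```

Calling `check_unique_names(['one', 'two', 'one', 'two', 'one'])` raises a `ValueError` that names `'one'` and `'two'`, instead of letting the data be changed silently.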

jreback · 2015-07-03T14:14:12Z

In [1]: import pandas as pd; from pandas import DataFrame

In [2]: df = DataFrame({'A' : [1,2], 'B' : [3,4], 'C' : [5,6]})

In [3]: df.columns=['one','two','one']

In [4]: df.to_csv('test.csv',mode='w')

In [5]: !cat test.csv
,one,two,one
0,1,3,5
1,2,4,6

In [8]: pd.read_csv('test.csv',index_col=0,names=['one','two','one'],header=0)
Out[8]: 
   one  two  one
0    5    3    5
1    6    4    6

hmm also a problem here.

Ok this might be an older bug. I suppose a case could be made for raising here (or rather assigning by position if the names match up). There are several cases to investigate.

pull-request?
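For reference, "assigning by position" could look roughly like the sketch below. `read_csv_by_position` is a hypothetical wrapper, not an existing pandas API, and it assumes the first column of the file is the index. It drops the file's header row, checks that the number of names matches the number of data columns, and then labels the columns in order.

```python
import io

import pandas as pd


def read_csv_by_position(buffer, names):
    """Hypothetical helper: ignore the file's header and label columns by position."""
    # header=None plus skiprows=1 discards the original header line,
    # so no name-based matching takes place while parsing.
    df = pd.read_csv(buffer, header=None, skiprows=1, index_col=0)
    if len(names) != df.shape[1]:
        raise ValueError(f"expected {df.shape[1]} names, got {len(names)}")
    df.columns = list(names)  # duplicates are allowed here, assigned in order
    df.index.name = None
    return df


# Same content as the test.csv shown above.
csv_text = ",one,two,one\n0,1,3,5\n1,2,4,6\n"
df = read_csv_by_position(io.StringIO(csv_text), ["one", "two", "one"])
```

With this approach the first `one` column keeps the values 1 and 2 and the second keeps 5 and 6, rather than both showing the last column.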

jens-k · 2015-07-03T14:20:54Z

A pull request by me? Sorry, I've only been using Python for about a day (coming from MATLAB). I don't feel equipped to fix that myself :)

jreback · 2015-07-03T14:24:12Z

ok, np. always a good way to learn though :)

see here for guidelines

jreback · 2015-07-17T16:15:55Z

dupe of #9424

jreback closed this as completed Jul 3, 2015

jreback added the Usage Question and IO CSV (read_csv, to_csv) labels Jul 3, 2015

jreback reopened this Jul 3, 2015

jreback added the Difficulty Novice and Error Reporting (Incorrect or improved errors from pandas) labels and removed the Usage Question label Jul 3, 2015

jreback added this to the Next Major Release milestone Jul 3, 2015

jreback closed this as completed Jul 17, 2015
