BUG: read_csv() does strange things if data rows have trailing delimiter #14124

@jzwinck

Given a file t.txt like this:

A,B,C
1,4,7,
2,5,8,
3,6,9,

pandas.read_csv('t.txt') produces:

   A  B   C
1  4  7 NaN
2  5  8 NaN
3  6  9 NaN

In other words, read_csv() treats the first field of each row as an unnamed index, so the header names shift one column to the right and C is filled with NaN. I would expect it to do the same as this code:

pandas.read_csv('t.txt', index_col=False).set_index('A')

Which is:

   B  C
A      
1  4  7
2  5  8
3  6  9

The first example gives the same result even if you specify index_col=0, which is confusing: using the first column as the index is exactly what the second example does.
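The two calls above can be compared side by side with a minimal sketch like the one below. It reads the sample data from an in-memory string rather than from t.txt, and the variable names (`data`, `default`, `fixed`) are illustrative only. It assumes a pandas version in which `index_col=False` drops the empty trailing field, as the pandas documentation describes for files with delimiters at the end of each line.

```python
import io

import pandas as pd

# Same content as t.txt: a 3-column header, 4 fields per data row.
data = "A,B,C\n1,4,7,\n2,5,8,\n3,6,9,\n"

# Default call: the first field of each row becomes an unnamed index,
# and the header names are applied to the remaining three fields.
default = pd.read_csv(io.StringIO(data))

# Workaround from this report: disable the implicit index, then
# promote column A to the index explicitly.
fixed = pd.read_csv(io.StringIO(data), index_col=False).set_index("A")

print(default)
print(fixed)
```

With this sketch, `default` should carry the all-NaN C column described above, while `fixed` keeps B and C aligned with their header names.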

Files like this are commonly produced by vendors such as Bloomberg, whose formats we cannot expect to change.

Pandas 0.18.1.
