Dates are parsed with read_csv thousand separator #4678

Closed
hayd opened this issue Aug 26, 2013 · 5 comments · Fixed by #4945

Comments

@hayd
Contributor

commented Aug 26, 2013

When reading a csv with a date column, the date is sometimes parsed as a number:

In [1]: s = '06.02.2013;13:00;1.000,215;0,215;0,185;0,205;0,00'

In [2]: pd.read_csv(StringIO(s), sep=';', header=None, parse_dates={'Dates': [0, 1]}, index_col=0, decimal=',', thousands='.')
Out[2]:
                        2      3      4      5  6
Dates
6022013 13:00   1.000,215  0.215  0.185  0.205  0

Here 06.02.2013 is read as the number 6022013 (the . is stripped as a thousands separator) before the date is parsed, so date parsing fails. I think dates are sometimes written this way on the continent (along with . thousands).
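
A minimal sketch of the failure mode, written in plain Python rather than taken from pandas' parser code: if the thousands separator is stripped from every token before type inference, a dotted date collapses into a run of digits and is accepted as an integer. The helper name `naive_convert` is hypothetical and only illustrates the idea.

```python
def naive_convert(token, thousands=".", decimal=","):
    """Hypothetical converter (not pandas code): strip the thousands
    separator, normalize the decimal mark, then try numeric types."""
    cleaned = token.replace(thousands, "").replace(decimal, ".")
    for cast in (int, float):
        try:
            return cast(cleaned)
        except ValueError:
            pass
    return token  # not numeric: keep the original string


print(naive_convert("06.02.2013"))                  # date token becomes an integer
print(naive_convert("1.000,215"))                   # intended use: European-formatted number
print(naive_convert("13:00"))                       # no separators to strip, stays a string
print(naive_convert("06-02-2013", thousands="-"))   # same collapse with '-' as separator
```

The date and the number go through exactly the same substitution, which is why the date column ends up numeric before any date parsing is attempted.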

This was found in #4322 (but that issue was more about . being ignored). I guess another test case would be with -:

In [3]: s = '06-02-2013;13:00;1.000,215;0,215;0,185;0,205;0,00'

In [4]: pd.read_csv(StringIO(s), sep=';', header=None, parse_dates={'Dates': [0, 1]}, decimal=',', thousands='-')
Out[4]: 
           Dates          2      3      4      5  6
0  6022013 13:00  1.000,215  0.215  0.185  0.205  0

@jreback suggests:

but it should ignore dates columns entirely (for thousands parsing...)

cc #4598 @guyrt

@guyrt

Contributor

commented Aug 26, 2013

I'm not an expert on this IO code just yet, but it would seem that maybe the numeric parser is running first? In that case, we wouldn't even try the datetime converter, would we?

https://github.com/pydata/pandas/blob/master/pandas/parser.pyx#L1648

@jreback

Contributor

commented Aug 26, 2013

Things are parsed (with thousands/decimal substitutions) and then passed to the dtype converter (and NA converter), so I think this would have to change depending on whether parse_dates is set for a particular column; might be tricky (or not).
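
One way to picture the change being discussed is a conversion step that skips separator substitution for any column listed as a date column. This is a rough sketch under stated assumptions, not the fix that went into pandas: `convert_row` and `date_cols` are hypothetical names, and the day-first format string is an assumption about the sample data.

```python
from datetime import datetime


def convert_row(tokens, date_cols, thousands=".", decimal=","):
    """Hypothetical per-row conversion that leaves date columns untouched."""
    out = []
    for i, token in enumerate(tokens):
        if i in date_cols:
            out.append(token)  # keep the raw string for the date parser
            continue
        cleaned = token.replace(thousands, "").replace(decimal, ".")
        try:
            out.append(float(cleaned))
        except ValueError:
            out.append(token)  # not numeric: keep the original string
    return out


row = "06.02.2013;13:00;1.000,215;0,215;0,185;0,205;0,00".split(";")
converted = convert_row(row, date_cols={0, 1})

# Combine the two untouched date columns and parse them.
# Assumption: the date is day-first (6 February 2013).
stamp = datetime.strptime(" ".join(converted[:2]), "%d.%m.%Y %H:%M")
print(stamp, converted[2:])
```

Because columns 0 and 1 never pass through the thousands substitution, the date string reaches the date parser intact while the remaining columns are still converted as European-formatted numbers.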

@jreback

Contributor

commented Sep 21, 2013

@guyrt having a look at this?

@guyrt

Contributor

commented Sep 23, 2013

@jreback I am. Got sidetracked on a few other things, but I'll carve out some time to look at it over the next few days. What I know so far is that the second example works on the Python parser. It's not clear yet what is causing it to fail on the C parser, but I'll keep digging.

The first example is a problem with the date parser, which doesn't parse the day part correctly.

@guyrt

Contributor

commented Sep 23, 2013

Fix for C parser submitted, but I found an error in Python parser as well. That one will come in next commit.

#4945

guyrt added a commit to guyrt/pandas that referenced this issue Sep 23, 2013
BUG: Conflict between thousands sep and date parser.
Fixes issue where thousands separator could conflict with date
parsing.

This is only fixed in the C parser.

Closes issue pandas-dev#4678
guyrt added a commit to guyrt/pandas that referenced this issue Sep 24, 2013