read_csv: European numbers do not work with dates #14066

dacoex · 2016-08-22T21:42:16Z

Code Sample, a copy-pastable example if possible

With the following reader call, the dates are not parsed, so the resulting DataFrame has no date index:

```python
import pandas as pd
# file_ is the path to the CSV file being read
df = pd.read_csv(file_, index_col=0,
                 header=4, sep=';', quoting=1, parse_dates=True,
                 encoding='latin-1', dayfirst=True,
                 decimal=',', thousands='.')
```

Expected Output

A DataFrame with the numeric data parsed as Python floats and the date column as the index.
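For reference, the intended conversion of a single row can be sketched with the standard library. This is a simplified model, assuming `.` as the thousands separator, `,` as the decimal mark and day-first `DD.MM.YYYY` dates; `parse_european_number` and `parse_row` are hypothetical helpers, not how pandas parses internally.

```python
from datetime import datetime


def parse_european_number(text: str) -> float:
    """Hypothetical helper: '1.247.752,81' -> 1247752.81."""
    # Drop the '.' thousands separators, then turn the ',' decimal mark into '.'.
    return float(text.replace(".", "").replace(",", "."))


def parse_row(line: str, sep: str = ";"):
    """Hypothetical helper: split one 'day;value' row into (datetime, float)."""
    day, value = line.split(sep)
    # Day-first date such as 04.08.2016 -> 4 August 2016.
    return datetime.strptime(day, "%d.%m.%Y"), parse_european_number(value)


when, amount = parse_row("04.08.2016;33043,5217026458")
print(when.date(), amount)  # 2016-08-04 33043.5217026458
```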

output of `pd.show_versions()`


```
pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-31-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.1
scipy: 0.17.1
statsmodels: None
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None
```


TomAugspurger · 2016-08-22T21:44:29Z

Can you post some example data, the actual read_csv call, and the output you're getting? You can use the StringIO module https://docs.python.org/3/library/io.html#io.StringIO to paste the data in. Thanks.

dacoex · 2016-08-23T20:42:14Z

@TomAugspurger thanks for the response.

See also:
https://github.com/balzer82/DKB-Kontoauszug-Visualizer/blob/master/DKB-Kontoauszug-Visualizer.py#L35

Here is the example:

```python
import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO

import pandas as pd


TESTDATA = StringIO("""
 ; 
 ; 
 ; 
 ; 
 ; 
 ; 
day;transaction value (EUR) anon
04.08.2016;33043,5217026458
03.08.2016;52542,924322369
01.08.2016;2470,86389648009
01.08.2016;124462,590366738
01.08.2016;82283,2571462275
01.08.2016;5508278,98359258
01.08.2016;154264,547309073
01.08.2016;38068,8061252655
01.08.2016;10444,4806476402
01.08.2016;13958,4753522405
01.08.2016;4834682,43989406
29.07.2016;-1247752,81935378
29.07.2016;157975,468518703
    """)

df = pd.read_csv(TESTDATA, index_col=0,
                 header=4, sep=';', quoting=1, parse_dates=True,
                 encoding='latin-1', dayfirst=True,
                 decimal=',',
                 thousands='.',
                 )
```

gfyoung · 2016-08-24T06:27:31Z

A couple of things:

- `from pandas.compat import StringIO` takes care of the Python 2/3 import for you, FYI.
- I believe you have the `header` parameter wrong. I think it should be `header=6`.
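A minimal check of the header row index, assuming a recent pandas and `io.StringIO`, on a trimmed copy of the test data. The leading empty line is skipped because `skip_blank_lines=True` is the default, while the six ` ; ` lines are not blank and count as rows 0 to 5, so the column names sit on row 6. The `thousands` argument is left out here so the date bug does not interfere.

```python
from io import StringIO

import pandas as pd

# Trimmed copy of the test data: an empty line, six " ; " filler rows,
# the header line and two data rows.
raw = (
    "\n"
    + " ; \n" * 6
    + "day;transaction value (EUR) anon\n"
    + "04.08.2016;33043,5217026458\n"
    + "03.08.2016;52542,924322369\n"
)

# header counts rows after blank lines are skipped, so the filler rows are 0-5.
df = pd.read_csv(StringIO(raw), sep=";", header=6, decimal=",")
print(df.columns.tolist())  # ['day', 'transaction value (EUR) anon']
```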

However, even with that parameter fixed, the index column is not parsed as a date because `thousands='.'` makes the parser treat the dots inside the dates as thousands separators, so the dates are read as numbers instead.
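The mechanism can be modeled in a few lines of plain Python. This is a deliberately simplified sketch with a hypothetical helper `strip_thousands`, not pandas' tokenizer code: once every `.` is treated as a thousands separator, a dotted date collapses into an integer, which matches the `4152016` index in the minimal example.

```python
from datetime import datetime


def strip_thousands(token: str, thousands: str = ".") -> str:
    """Simplified model of thousands handling: drop every separator character."""
    return token.replace(thousands, "")


token = "04.15.2016"  # the same month-first string used in the minimal example
as_number = int(strip_thousands(token))         # 4152016: the date is gone
as_date = datetime.strptime(token, "%m.%d.%Y")  # what parse_dates should yield
print(as_number, as_date.date())  # 4152016 2016-04-15
```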

That is a bug, so thank you for pointing it out!

The issue actually has nothing to do with European date formats. You can see the bug surface with a much simpler example:

```python
>>> from pandas import read_csv
>>> from pandas.compat import StringIO
>>>
>>> data = 'a\n04.15.2016'
>>> read_csv(StringIO(data), index_col=0, parse_dates=True, thousands='.')
Empty DataFrame
Columns: []
Index: [4152016]  # WRONG
>>>
>>> read_csv(StringIO(data), index_col=0, parse_dates=True)
Empty DataFrame
Columns: []
Index: [2016-04-15 00:00:00]  # RIGHT
```

Note that this bug does not affect non-index columns:

```python
>>> read_csv(StringIO(data), parse_dates=['a'], thousands='.')
           a
0 2016-04-15
```
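Because non-index columns are unaffected, one possible workaround on versions with this bug (the report uses pandas 0.18.1) is to parse the date as an ordinary column and promote it to the index afterwards. A sketch, assuming `io.StringIO` and a made-up two-row sample that does contain `.` thousands separators:

```python
from io import StringIO

import pandas as pd

# Made-up sample in European format: '.' thousands separator, ',' decimal mark.
data = "day;value\n04.08.2016;33.043,52\n29.07.2016;-1.247.752,82\n"

df = pd.read_csv(
    StringIO(data),
    sep=";",
    decimal=",",
    thousands=".",
    parse_dates=["day"],  # parse as an ordinary column, not via index_col
    dayfirst=True,
)
df = df.set_index("day")  # then make the parsed dates the index
print(df)
```

The design choice is simply to keep the date column out of `index_col` during parsing, which is the code path shown above to behave correctly.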

The same behavior can be observed with the Python parser (`engine='python'`).

dacoex · 2016-08-24T06:43:05Z

@gfyoung thanks for the confirmation.

A minor clarification:

> The issue actually has nothing to do with European date formats.

I was referring to European data formats (number notation), as described under "Quoting, Compression, and File Format".

I have no idea how to proceed from here, but it looks like the processing priorities in the parser need to be changed.

gfyoung · 2016-08-24T06:44:37Z

@dacoex : Ah, okay. Good to know that my minimal example is capturing the issue you were seeing!


When a thousands parameter is specified, if the index column data contains that thousands value for date purposes (e.g. '.'), do not interpret those characters as the thousands parameter. Closes gh-14066.
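The rule in this commit message can be illustrated with a small, hypothetical converter for index values. It only models the decision order (with `parse_dates=True`, the token bypasses thousands handling and is tried as a date); the actual fix in #14077 lives inside pandas' parsers and is not reproduced here. `convert_index_value` and its `date_format` default are illustrative assumptions.

```python
from datetime import datetime
from typing import Union


def convert_index_value(token: str, parse_dates: bool, thousands: str = ".",
                        date_format: str = "%m.%d.%Y") -> Union[datetime, int, str]:
    """Hypothetical model of the fixed behavior for a single index token."""
    if parse_dates:
        # Do not treat the dots as thousands separators; try a date first.
        try:
            return datetime.strptime(token, date_format)
        except ValueError:
            return token  # keep the raw string if it is not a date
    # Without parse_dates, apply thousands handling and try an integer.
    try:
        return int(token.replace(thousands, ""))
    except ValueError:
        return token


print(convert_index_value("04.15.2016", parse_dates=True))   # 2016-04-15 00:00:00
print(convert_index_value("04.15.2016", parse_dates=False))  # 4152016
```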

dacoex · 2016-08-27T14:18:10Z

Big thanks to @jorisvandenbossche & @gfyoung! FOSS is great!

dacoex · 2016-08-27T14:22:43Z

BTW, do we need to add this to the whatsnew page?
http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#v0-19-0-august-2016

jreback · 2016-08-27T14:23:48Z

9d10b76

jreback · 2016-08-27T14:24:10Z

It takes a while to actually generate those docs.

dacoex · 2016-09-13T19:45:06Z

I tested with the v0.19 RC and it works with the original data. Thanks again!

TomAugspurger added the IO CSV read_csv, to_csv label Aug 22, 2016

gfyoung mentioned this issue Aug 24, 2016

BUG: Don't parse index column as numeric when parse_dates=True #14077

Merged

jreback added the Bug label Aug 25, 2016

jreback added this to the 0.19.0 milestone Aug 25, 2016

jorisvandenbossche closed this as completed in #14077 Aug 27, 2016

This issue was closed.
