BUG: pd.read_csv not correctly parsing date/month field when set parse_date = ['column name'] #13063

itsmesaranya · 2016-05-03T07:22:43Z

I ran in to this bug while trying to parse the few dates through parse_dates of pandas.read_csv(). In the following code snippet, I'm trying to parse dates that have format dd/mm/yy which is resulting me an improper conversion. For some cases, the date field is considered as month and vice versa.

To keep it simple, for some cases dd/mm/yy get converted to yyyy-dd-mm instead of yyyy-mm-dd.

Case 1:

      04/10/96 is parsed as 1996-04-10, which is wrong.

Case 2:

      15/07/97 is parsed as 1997-07-15, which is correct.

Case 3:

      10/12/97 is parsed as 1997-10-12, which is wrong.

Code Sample

import pandas as pd

df = pd.read_csv('date_time.csv') 
print 'Data in csv:'
print df
print df['start_date'].dtypes

print '----------------------------------------------'

df = pd.read_csv('date_time.csv', parse_dates = ['start_date'])
print 'Data after parsing:'
print df
print df['start_date'].dtypes

Current Output

----------------------
Data in csv:
----------------------
  start_date
0   04/10/96
1   15/07/97
2   10/12/97
3   06/03/99
4     //1994
5   /02/1967
object
----------------------
Data after parsing:
----------------------
  start_date
0 1996-04-10
1 1997-07-15
2 1997-10-12
3 1999-06-03
4 1994-01-01
5 1967-02-01
datetime64[ns]

Expected Output

----------------------
Data in csv:
----------------------
  start_date
0   04/10/96
1   15/07/97
2   10/12/97
3   06/03/99
4     //1994
5   /02/1967
object
----------------------
Data after parsing:
----------------------
  start_date

0 1996-10-04
1 1997-07-15
2 1997-12-10
3 1999-03-06
4 1994-01-01
5 1967-02-01
datetime64[ns]

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 8.1
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US

pandas: 0.17.1
nose: 1.3.7
pip: 7.1.2
setuptools: 18.5
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.16.0
statsmodels: None
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.0
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
Jinja2: None

More Comments:

I could use date_parser or pandas.to_datetime() to specify the proper format for date. But in my case, I have few date fields like ['//1997', '/02/1967'] for which I need to convert ['01/01/1997','01/02/1967']. The parse_dates helps me converting those type of date fields to the expected format without making me to write extra line of code.

parse_date_bug.zip

The text was updated successfully, but these errors were encountered:

jorisvandenbossche · 2016-05-03T09:02:37Z

I suppose this related to the SO question: http://stackoverflow.com/questions/36998540/pd-read-csv-not-correctly-parsing-date-month-field-when-set-parse-date-colum

As was commented over there by ajcr, this is not a bug. The date format you are using ("dd/mm/yy") is not an international standard format (but a European one), and the default of the parser is to try to parse such dates as "mm/dd/yy" (more an American format I think). However, if this gives an error, it will try the other format. This is the reason you get inconsistent results between the rows in the csv.

You can pass dayfirst=True to to_datetime, and then the preference will be given to "dd/mm/yy" instead of to "mm/dd/yy".

Typically you can also use format="%d/%m/%y", however (as you note) this will not work with the "//1994" dates (since this is another format).

itsmesaranya changed the title ~~read_csv not correctly parsing date/month field when set parse_date = ['column name']~~ pd.read_csv not correctly parsing date/month field when set parse_date = ['column name'] May 3, 2016

itsmesaranya changed the title ~~pd.read_csv not correctly parsing date/month field when set parse_date = ['column name']~~ BUG: pd.read_csv not correctly parsing date/month field when set parse_date = ['column name'] May 3, 2016

jorisvandenbossche added Timeseries Usage Question labels May 3, 2016

jorisvandenbossche modified the milestones: 0.18.1, No action May 3, 2016

jorisvandenbossche closed this as completed May 3, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: pd.read_csv not correctly parsing date/month field when set parse_date = ['column name'] #13063

BUG: pd.read_csv not correctly parsing date/month field when set parse_date = ['column name'] #13063

itsmesaranya commented May 3, 2016

jorisvandenbossche commented May 3, 2016 •

edited

BUG: pd.read_csv not correctly parsing date/month field when set parse_date = ['column name'] #13063

BUG: pd.read_csv not correctly parsing date/month field when set parse_date = ['column name'] #13063

Comments

itsmesaranya commented May 3, 2016

Case 1:

Case 2:

Case 3:

Code Sample

Current Output

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

More Comments:

jorisvandenbossche commented May 3, 2016 • edited

Output of `pd.show_versions()`

jorisvandenbossche commented May 3, 2016 •

edited