You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I ran in to this bug while trying to parse the few dates through parse_dates of pandas.read_csv(). In the following code snippet, I'm trying to parse dates that have format dd/mm/yy which is resulting me an improper conversion. For some cases, the date field is considered as month and vice versa.
To keep it simple, for some cases dd/mm/yy get converted to yyyy-dd-mm instead of yyyy-mm-dd.
Case 1:
04/10/96 is parsed as 1996-04-10, which is wrong.
Case 2:
15/07/97 is parsed as 1997-07-15, which is correct.
I could use date_parser or pandas.to_datetime() to specify the proper format for date. But in my case, I have few date fields like ['//1997', '/02/1967'] for which I need to convert ['01/01/1997','01/02/1967']. The parse_dates helps me converting those type of date fields to the expected format without making me to write extra line of code.
The text was updated successfully, but these errors were encountered:
itsmesaranya
changed the title
read_csv not correctly parsing date/month field when set parse_date = ['column name']
pd.read_csv not correctly parsing date/month field when set parse_date = ['column name']
May 3, 2016
itsmesaranya
changed the title
pd.read_csv not correctly parsing date/month field when set parse_date = ['column name']
BUG: pd.read_csv not correctly parsing date/month field when set parse_date = ['column name']
May 3, 2016
As was commented over there by ajcr, this is not a bug. The date format you are using ("dd/mm/yy") is not an international standard format (but a European one), and the default of the parser is to try to parse such dates as "mm/dd/yy" (more an American format I think). However, if this gives an error, it will try the other format. This is the reason you get inconsistent results between the rows in the csv.
You can pass dayfirst=True to to_datetime, and then the preference will be given to "dd/mm/yy" instead of to "mm/dd/yy".
Typically you can also use format="%d/%m/%y", however (as you note) this will not work with the "//1994" dates (since this is another format).
I ran in to this bug while trying to parse the few dates through
parse_dates
ofpandas.read_csv()
. In the following code snippet, I'm trying to parse dates that have formatdd/mm/yy
which is resulting me an improper conversion. For some cases, the date field is considered as month and vice versa.To keep it simple, for some cases
dd/mm/yy
get converted toyyyy-dd-mm
instead ofyyyy-mm-dd
.Case 1:
Case 2:
Case 3:
Code Sample
Current Output
Expected Output
Output of
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 8.1
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US
pandas: 0.17.1
nose: 1.3.7
pip: 7.1.2
setuptools: 18.5
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.16.0
statsmodels: None
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.0
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
Jinja2: None
More Comments:
I could use
date_parser
orpandas.to_datetime()
to specify the proper format for date. But in my case, I have few date fields like ['//1997', '/02/1967'] for which I need to convert ['01/01/1997','01/02/1967']. Theparse_dates
helps me converting those type of date fields to the expected format without making me to write extra line of code.parse_date_bug.zip
The text was updated successfully, but these errors were encountered: