make easier to parse european-style dates in to_csv #854

Closed
adamklein opened this issue Mar 4, 2012 · 7 comments
Labels
Datetime Datetime data dtype Enhancement

Comments

@adamklein
Contributor

Again from mailing list:

Example:
data1 = '''date;value
01/05/2010;1
15/05/2010;2
31/05/2010;3
'''

df = pd.read_csv(StringIO(data1),sep=";",converters={'date':pd.datetools.dateutil.parser.parse})
print df.to_string()

This parses the first row incorrectly, as 5 January instead of 1 May:
                  date  value
0  2010-01-05 00:00:00      1
1  2010-05-15 00:00:00      2
2  2010-05-31 00:00:00      3

df = pd.read_csv(StringIO(data1),sep=";",converters={'date':lambda x: pd.datetools.dateutil.parser.parse(x, dayfirst=True)})
print df.to_string()

This returns the correct output:
                  date  value
0  2010-05-01 00:00:00      1
1  2010-05-15 00:00:00      2
2  2010-05-31 00:00:00      3

With index_col and parse_dates, the result can be a mix: some dates are parsed correctly and others are not.

df = pd.read_csv(StringIO(data1),sep=";",index_col=[0],parse_dates=True)
print df.to_string()

Here the first row is also wrong:
            value
date
2010-01-05      1
2010-05-15      2
2010-05-31      3
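
One way to remove the ambiguity without a custom converter is to give the reader itself a day-first hint. The following is a minimal sketch, assuming a pandas version whose `read_csv` accepts the `dayfirst` keyword and using Python 3's `io.StringIO` instead of the Python 2 idioms in the report above. The name `parsed_dates` is only there for illustration.

```python
from io import StringIO

import pandas as pd

data1 = """date;value
01/05/2010;1
15/05/2010;2
31/05/2010;3
"""

# dayfirst=True asks the date parser to read 01/05/2010 as 1 May 2010
# instead of 5 January 2010.
df = pd.read_csv(
    StringIO(data1),
    sep=";",
    index_col=0,
    parse_dates=True,
    dayfirst=True,
)

# Collect the parsed index as ISO strings so every row can be checked.
parsed_dates = [ts.strftime("%Y-%m-%d") for ts in df.index]
print(parsed_dates)
```

With the hint in place, all three rows should land in May, including the first one that is ambiguous on its own.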

@timmie
Contributor

timmie commented Mar 5, 2012

Sometimes the separator is "." or "-" instead of "/", but I assume that doesn't matter here.

@timmie
Contributor

timmie commented Mar 5, 2012

Do you want us to supply more date formats as examples to enhance the parser?

@adamklein
Contributor Author

@timmie, absolutely, any examples you want supported would be great

@timmie
Contributor

timmie commented Mar 7, 2012

Here are some commonly encountered formats:

1/1/2012,0:00:00,180.8
1/1/2012,0:04:00,180.6
1/1/2012,0:08:00,180.8
1/1/2012,0:12:00,180.8
1/1/2012,0:16:00,180.6
1/1/2012,0:20:00,180.5
1/1/2012,0:24:00,180.2
1/1/2012,0:28:00,180.5
1/1/2012,0:32:00,180.2
1/1/2012,0:36:00,180.1



#begindata
date "YYYY-MM-DD" time "hh:mm"
2012-01-01 00:00     0.0     0.0     0.0     0.0     0.0     0.0 
2012-01-01 01:00     0.0     0.0     0.0     0.0     0.0     0.0 
2012-01-01 02:00     0.0     0.0     0.0     0.0     0.0     0.0 
2012-01-01 03:00     0.0     0.0     0.0     0.0     0.0     0.0 
2012-01-01 04:00     0.0     0.0     0.0     0.0     0.0     0.0 
2012-01-01 05:00     0.0     0.0     0.0     0.0     0.0     0.0 
2012-01-01 06:00     0.0     0.0     0.0     0.0     0.0     0.0 
2012-01-01 07:00     0.0     0.0     0.0     0.0     0.0     0.0 
2012-01-01 08:00     0.0     0.0     0.0     0.0     0.0     0.0 
2012-01-01 09:00     0.0     0.0     0.0     0.0     0.0     0.0




dd.mm.yy;hh:mm:ss;
********;**************;*******;
13.1.2012;18:00:00;0;
13.1.2012;18:10:00;0;
13.1.2012;18:20:00;0;
13.1.2012;18:30:00;0;
13.1.2012;18:40:00;0;
13.1.2012;18:50:00;0;
13.1.2012;19:00:00;0;
13.1.2012;19:10:00;0;
13.1.2012;19:20:00;0;
13.1.2012;19:30:00;0;
13.1.2012;19:40:00;0;
13.1.2012;19:50:00;0;



Year;Month;Day;Hour;Value;
2012;3;1;01.0000;0;
2012;3;1;03.0000;0;
2012;3;1;03.0000;0;
2012;3;1;04.0000;0;
2012;3;1;05.0000;0;
2012;3;1;06.0000;0;
2012;3;1;07.0000;1;
2012;3;1;08.0000;1;
2012;3;1;09.0000;2;
2012;3;1;10.0000;3;
2012;3;1;11.0000;3;

@timmie
Contributor

timmie commented Mar 7, 2012

I also often see data stored with hours numbered 1-24 instead of 0-23 (as in Python's datetime):

07.03.12 23:20:00      
07.03.12 23:30:00      
07.03.12 23:40:00      
07.03.12 23:50:00      
07.03.12 24:00:00      
08.03.12 00:03:00      
08.03.12 00:20:00      
08.03.12 00:30:00      

The reason is that people want to show that the value belongs to the end of the described interval, not to its beginning or middle.
This matters in some natural science observations and in engineering, where the data represents information integrated over the interval (e.g. a 10-minute period).
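
Python's `datetime.strptime` rejects hour 24, so end-of-interval stamps like the one above need a small pre-processing step. The following is a minimal sketch, assuming that "24:00:00" on a given date means midnight at the start of the next day; the helper name `parse_end_of_interval` is hypothetical.

```python
from datetime import datetime, timedelta


def parse_end_of_interval(text):
    """Parse 'dd.mm.yy HH:MM:SS', accepting hour 24 as midnight of the next day."""
    date_part, time_part = text.split()
    hour, rest = time_part.split(":", 1)
    rollover = hour == "24"
    if rollover:
        # strptime rejects hour 24, so parse as 00 and add a day afterwards.
        time_part = "00:" + rest
    stamp = datetime.strptime(date_part + " " + time_part, "%d.%m.%y %H:%M:%S")
    return stamp + timedelta(days=1) if rollover else stamp


samples = ["07.03.12 23:50:00", "07.03.12 24:00:00", "08.03.12 00:20:00"]
for s in samples:
    print(s, "->", parse_end_of_interval(s))
```

Whether the rolled-over stamp should then be shifted back to label the interval start is a separate decision that depends on how the data is used.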

@wesm wesm closed this as completed in fc56b64 Apr 15, 2012
@timmie
Contributor

timmie commented May 7, 2012

Is #854 (comment) an extra issue?

@timmie
Contributor

timmie commented Jun 3, 2012

see also #1296
