Infer datetime format #6021

.. _io.dayfirst:

Inferring Datetime Format
~~~~~~~~~~~~~~~~~~~~~~~~~
If you have ``parse_dates`` enabled for some or all of your columns, and your
datetime strings are all formatted the same way, you may get a large speed-up
by setting ``infer_datetime_format=True``. If set, pandas will attempt to
guess the format of your datetime strings, and then use a faster means of
parsing the strings. In some cases this has been observed to increase parsing
speed by 5-10x. pandas will fall back to the usual parsing if either the
format cannot be guessed or the guessed format cannot correctly parse the
entire column of strings, so enabling ``infer_datetime_format`` should
generally have no negative consequences.
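
The guess-then-fall-back strategy described above can be sketched in plain
Python. This is a minimal illustration, not pandas' implementation: the
``CANDIDATE_FORMATS`` list and the helpers ``guess_format``, ``slow_parse`` and
``parse_column`` are hypothetical, and ``slow_parse`` only stands in for the
general (slower) per-string parser pandas uses when no single format fits.

```python
from datetime import datetime

# Hypothetical candidate formats for this sketch; a real guesser is more general.
CANDIDATE_FORMATS = [
    "%Y%m%d",
    "%Y/%m/%d",
    "%Y%m%d %H:%M:%S",
    "%m/%d/%Y %H:%M:%S",
    "%d/%b/%Y %H:%M:%S",
    "%d/%B/%Y %H:%M:%S",
]


def guess_format(sample):
    """Return the first candidate format that fully parses ``sample``, else None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(sample, fmt)
        except ValueError:
            continue
        return fmt
    return None


def slow_parse(text):
    """Hypothetical stand-in for the general parser: re-guess for every string."""
    fmt = guess_format(text)
    if fmt is None:
        raise ValueError(f"cannot parse {text!r}")
    return datetime.strptime(text, fmt)


def parse_column(strings):
    """Guess a format from the first string and apply it to the whole column,
    falling back to per-string parsing if the guess fails anywhere."""
    fmt = guess_format(strings[0]) if strings else None
    if fmt is not None:
        try:
            return [datetime.strptime(s, fmt) for s in strings]
        except ValueError:
            pass  # the guessed format does not fit every string
    return [slow_parse(s) for s in strings]
```

A wrong guess costs one extra pass over the column before the fallback takes
over, which is why enabling the option is cheap when it does not help.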

Here are some examples of datetime strings that can be guessed (all
representing December 30th, 2011 at 00:00:00):

- "20111230"
- "2011/12/30"
- "20111230 00:00:00"
- "12/30/2011 00:00:00"
- "30/Dec/2011 00:00:00"
- "30/December/2011 00:00:00"
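
To make the examples above concrete, the snippet below pairs each string with
a ``strptime`` format of the kind pandas attempts to guess, and checks that
every pair yields the same timestamp. The formats were chosen by hand for this
illustration, and ``%b``/``%B`` assume English month names (the default C
locale).

```python
from datetime import datetime

# Hand-picked strptime formats for each example string (illustrative only).
EXAMPLES = {
    "20111230": "%Y%m%d",
    "2011/12/30": "%Y/%m/%d",
    "20111230 00:00:00": "%Y%m%d %H:%M:%S",
    "12/30/2011 00:00:00": "%m/%d/%Y %H:%M:%S",
    "30/Dec/2011 00:00:00": "%d/%b/%Y %H:%M:%S",  # %b: abbreviated month name
    "30/December/2011 00:00:00": "%d/%B/%Y %H:%M:%S",  # %B: full month name
}

EXPECTED = datetime(2011, 12, 30, 0, 0, 0)

for text, fmt in EXAMPLES.items():
    parsed = datetime.strptime(text, fmt)
    # Every example should represent December 30th, 2011 at 00:00:00.
    assert parsed == EXPECTED, (text, fmt, parsed)
    print(f"{text!r:30} {fmt!r:22} {parsed}")
```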

``infer_datetime_format`` is sensitive to ``dayfirst``. With ``dayfirst=True``,
it will guess "01/12/2011" to be December 1st. With ``dayfirst=False``
(the default) it will guess "01/12/2011" to be January 12th.
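
The effect of ``dayfirst`` on an ambiguous string amounts to choosing which of
two formats is tried. The sketch below shows that choice with the standard
library; ``parse_slash_date`` is a hypothetical helper, not part of pandas.

```python
from datetime import datetime


def parse_slash_date(text, dayfirst=False):
    """Parse an ambiguous NN/NN/YYYY string (hypothetical helper).

    With dayfirst=True the first field is read as the day, otherwise
    as the month, mirroring the behaviour described above.
    """
    fmt = "%d/%m/%Y" if dayfirst else "%m/%d/%Y"
    return datetime.strptime(text, fmt)


print(parse_slash_date("01/12/2011", dayfirst=True).date())  # 2011-12-01
print(parse_slash_date("01/12/2011").date())                 # 2011-01-12
```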

.. ipython:: python

   # Try to infer the format for the index column
   df = pd.read_csv('foo.csv', index_col=0, parse_dates=True,
                    infer_datetime_format=True)

International Date Formats
~~~~~~~~~~~~~~~~~~~~~~~~~~
While US date formats tend to be MM/DD/YYYY, many international formats use

result
result.loc[:,:,'ItemA']

- Added optional `infer_datetime_format` to `read_csv`, `Series.from_csv` and
  `DataFrame.from_csv` (:issue:`5490`)

  If `parse_dates` is enabled and this flag is set, pandas will attempt to
  infer the format of the datetime strings in the columns and, if it can be
  inferred, switch to a faster method of parsing them. In some cases this can
  increase the parsing speed by ~5-10x.

  .. ipython:: python

     # Try to infer the format for the index column
     df = pd.read_csv('foo.csv', index_col=0, parse_dates=True,
                      infer_datetime_format=True)

Experimental
~~~~~~~~~~~~