to_datetime 1000x slower for timezone-aware strings vs timezone-agnostic #9714
Comments
Indeed, this is a known issue: pandas does not have a timezone aware
This is actually a different issue. No TZ specified:
What you gave; this falls back to dateutil parsing, which is why it's so slow, going
Try this
The space before the fixed TZ designation throws this off. It's actually an easy fix if you want to look (see
Further, it is slowed down relative to non-TZ strings because the TZ has to be interpreted for each string. This could actually be cached, so this is point 2 of the speedups.
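The caching idea above can be illustrated outside pandas. What follows is a minimal standard-library sketch, not pandas' implementation: the helper names `offset_to_tz` and `parse_with_cached_offset`, the regular expression, and the default format string are all assumptions made for illustration. It interprets each distinct offset once and reuses the result for every later string carrying the same offset.

```python
import re
from datetime import datetime, timedelta, timezone
from functools import lru_cache

# Assumption: the offset is a trailing fixed designation such as
# " -0800", "-0800" or "+05:30", optionally preceded by whitespace.
_OFFSET_RE = re.compile(r"\s*([+-])(\d{2}):?(\d{2})$")


@lru_cache(maxsize=None)
def offset_to_tz(sign: str, hours: str, minutes: str) -> timezone:
    """Interpret an offset once; repeated offsets are served from the cache."""
    delta = timedelta(hours=int(hours), minutes=int(minutes))
    return timezone(-delta if sign == "-" else delta)


def parse_with_cached_offset(s: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> datetime:
    """Hypothetical helper: parse the naive part, then attach the cached tz."""
    m = _OFFSET_RE.search(s)
    if m is None:
        return datetime.strptime(s, fmt)  # no offset found: naive datetime
    naive = datetime.strptime(s[: m.start()], fmt)
    return naive.replace(tzinfo=offset_to_tz(*m.groups()))


first = parse_with_cached_offset("2015-03-24 10:00:00 -0800")
second = parse_with_cached_offset("2015-03-24 11:00:00 -0800")
print(first.isoformat(), second.isoformat())
print(offset_to_tz.cache_info())  # the second call reuses the cached offset
```

In a column where nearly every row shares the same offset, almost all lookups become cache hits, which is the saving the comment describes.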
jreback added the Enhancement, Performance, and Timezones labels on Mar 24, 2015
jreback added this to the Next Major Release milestone on Mar 24, 2015
wetchler commented on Mar 24, 2015
Interesting, thanks for the tips. Removing the space works, though for now I'll just have to be vigilant about what format CSVs are automatically dumped to (in my case I believe the dataset is from a MySQL dump). Cheers.
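For dumps that cannot be changed at the source, the space can be removed in a preprocessing step before the strings reach `to_datetime`. This is a small sketch under stated assumptions: every string ends in a fixed offset such as ` -0800`, and `strip_space_before_offset` is a hypothetical helper name, not a pandas function.

```python
import re

# Assumption: the offset is the last token, e.g. " -0800" or " +05:30".
_SPACE_BEFORE_OFFSET = re.compile(r"\s+([+-]\d{2}:?\d{2})$")


def strip_space_before_offset(s: str) -> str:
    """Drop whitespace between the time and a trailing fixed offset."""
    return _SPACE_BEFORE_OFFSET.sub(r"\1", s)


raw = ["2015-03-24 10:00:00 -0800", "2015-03-24 11:30:00 -0800"]
cleaned = [strip_space_before_offset(s) for s in raw]
print(cleaned)
```

The cleaned strings can then be passed to `to_datetime` as usual. Whether that avoids the slow path depends on the pandas version in use, so it is worth timing on a sample first.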
wetchler commented on Mar 24, 2015
When converting a string date column to datetime, if the string has a timezone offset suffix (e.g. "-0800"), it takes 1000x longer to parse:
Note microseconds vs milliseconds. 3 orders of magnitude... seems unnecessary. This can make loading CSVs into correctly-typed dataframes very, very, very slow for large datasets.