datetime.strptime slow #59533

Closed
LarsNordin mannequin opened this issue Jul 11, 2012 · 4 comments
Labels
extension-modules C modules in the Modules dir performance Performance or resource usage

Comments

@LarsNordin
Mannequin

LarsNordin mannequin commented Jul 11, 2012

BPO 15328
Nosy @abalkin, @bitdancer

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2012-07-13.20:40:40.896>
created_at = <Date 2012-07-11.14:00:20.692>
labels = ['extension-modules', 'performance']
title = 'datetime.strptime slow'
updated_at = <Date 2012-07-14.12:50:28.054>
user = 'https://bugs.python.org/LarsNordin'

bugs.python.org fields:

activity = <Date 2012-07-14.12:50:28.054>
actor = 'eric.araujo'
assignee = 'none'
closed = True
closed_date = <Date 2012-07-13.20:40:40.896>
closer = 'r.david.murray'
components = ['Extension Modules']
creation = <Date 2012-07-11.14:00:20.692>
creator = 'Lars.Nordin'
dependencies = []
files = []
hgrepos = []
issue_num = 15328
keywords = []
message_count = 4.0
messages = ['165256', '165257', '165258', '165418']
nosy_count = 4.0
nosy_names = ['belopolsky', 'r.david.murray', 'tshepang', 'Lars.Nordin']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue15328'
versions = ['Python 3.4']

@LarsNordin
Mannequin Author

LarsNordin mannequin commented Jul 11, 2012

datetime.strptime works well enough for me; it is just slow.

I recently added a date comparison to a log-parsing script so that it skips log lines earlier than a set date. After that change, the script ran much slower.
I am processing 4,784,212 log lines in 1,746 files.

Using Linux "time", the measured run time is:
real 5m12.884s
user 4m54.330s
sys 0m2.344s

Altering the script to reuse the cached datetime object when the date string is unchanged reduces the run time to:
real 1m3.816s
user 0m49.635s
sys 0m1.696s

# --- code snippet ---
# start_dt calculated at script start
...
day_dt = datetime.datetime.strptime(day_str, "%Y-%m-%d")
if day_dt < start_dt:
...
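
The caching change described above might look roughly like the following. This is only a sketch in Python 3 syntax, not the actual script: the helper name `parse_day`, the sample lines, and the cutoff date are assumptions made for illustration. It relies on consecutive log lines usually sharing the same date string, so only the most recent result is kept.

```python
import datetime

# Most recently parsed date string and its datetime (hypothetical module-level cache).
_last_day_str = None
_last_day_dt = None


def parse_day(day_str):
    """Parse a YYYY-MM-DD string, reusing the previous result if the string repeats."""
    global _last_day_str, _last_day_dt
    if day_str != _last_day_str:
        # Only pay for strptime when the date string actually changes.
        _last_day_dt = datetime.datetime.strptime(day_str, "%Y-%m-%d")
        _last_day_str = day_str
    return _last_day_dt


# Illustrative data: the cutoff and log lines are made up for this sketch.
start_dt = datetime.datetime(2012, 7, 1)
lines = ["2012-06-30 a", "2012-07-01 b", "2012-07-01 c"]
kept = [line for line in lines if parse_day(line[:10]) >= start_dt]
```

With log files written in chronological order, strptime is then called about once per distinct day instead of once per line.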

$ python
import platform
print 'Version      :', platform.python_version()
print 'Version tuple:', platform.python_version_tuple()
print 'Compiler     :', platform.python_compiler()
print 'Build        :', platform.python_build()

Version : 2.7.2+
Version tuple: ('2', '7', '2+')
Compiler : GCC 4.6.1
Build : ('default', 'Oct 4 2011 20:03:08')

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 11.10
Release:        11.10
Codename:       oneiric

@LarsNordin LarsNordin mannequin added the performance Performance or resource usage label Jul 11, 2012
@LarsNordin
Mannequin Author

LarsNordin mannequin commented Jul 11, 2012

Running the script without any timestamp comparison (and parsing more log lines) gives these performance numbers:

log lines: 7,173,101

time output:
real 1m9.892s
user 0m53.563s
sys 0m1.592s

@bitdancer
Member

Thanks for the report. However, do you have a patch to propose? Otherwise I'm not sure there is a reason to keep this issue open. One can always say various things are slow; that by itself is not a bug. Performance enhancement patches are welcome, though.

If you are proposing adding an LRU cache, I think that is best left to the application, as you did in your case. I'm not convinced there would be enough general benefit to make it worth adding to the stdlib, since the characteristics of date parsing workloads probably vary widely.
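
For reference, an application-level LRU cache can be sketched with `functools.lru_cache`, which exists in Python 3.2 and later (so not in the reporter's 2.7). The wrapper name `parse_day` and the `maxsize` value are assumptions chosen for illustration, not a recommendation for any particular workload.

```python
import datetime
import functools


@functools.lru_cache(maxsize=1024)  # maxsize is an arbitrary illustrative choice
def parse_day(day_str):
    """Parse a YYYY-MM-DD string; repeated strings are served from the cache."""
    return datetime.datetime.strptime(day_str, "%Y-%m-%d")


first = parse_day("2012-07-11")   # cache miss: strptime runs
second = parse_day("2012-07-11")  # cache hit: the stored datetime is returned
```

Whether this helps depends on how often date strings repeat in the input; `parse_day.cache_info()` reports hits and misses, which makes it easy to check for a given workload.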

@bitdancer
Member

If someone wants to propose a patch we can reopen the issue.

@merwok merwok added the extension-modules C modules in the Modules dir label Jul 14, 2012
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022