Merged
2 changes: 2 additions & 0 deletions Lib/tokenize.py
@@ -27,6 +27,7 @@
 from builtins import open as _builtin_open
 from codecs import lookup, BOM_UTF8
 import collections
+import functools
 from io import TextIOWrapper
 import itertools as _itertools
 import re
@@ -95,6 +96,7 @@ def _all_string_prefixes():
                 result.add(''.join(u))
     return result
 
+@functools.lru_cache
pablogsal (Member) commented on Jan 24, 2021:
I think we should explicitly limit the max size of the cache in this call (I am aware there is a default of 128) so this doesn't blow up in some super-intensive cases.

pablogsal (Member):
In the re module we use the same trick, and there the max size of the cache is 512.
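
For illustration only, a minimal sketch of what the suggested explicit bound would look like; the 512 below mirrors the figure quoted from re and is not what the merged patch ended up using:

```python
import functools
import re

# Sketch of the suggestion: bound the cache explicitly instead of
# relying on lru_cache's implicit default of 128 entries.
@functools.lru_cache(maxsize=512)
def _compile(expr):
    return re.compile(expr, re.UNICODE)
```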

Anthony Sottile (Contributor, PR author):
I commented on the other PR: there are only 5 possible regex strings that go through this call.
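
That claim is easy to probe: lru_cache-wrapped functions expose cache_info(), so after tokenizing something one can see how many distinct patterns actually reached _compile. A sketch, assuming the patched module (note that _compile is a private name, used here purely for illustration):

```python
import io
import tokenize

# Tokenize a small source buffer, then inspect the cache.
source = b"x = 1\ns = 'text'\n"
for _ in tokenize.tokenize(io.BytesIO(source).readline):
    pass

# Only a handful of distinct pattern strings ever reach _compile,
# so currsize stays tiny regardless of any maxsize.
print(tokenize._compile.cache_info())
```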

pablogsal (Member):
Roger. Then it's fine, although I would have preferred to manually pre-compile those and avoid importing functools.
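
Manual pre-compilation, as preferred here, would look roughly like the following; the pattern strings are hypothetical stand-ins for the module's real ones:

```python
import re

# Hypothetical stand-ins for the ~5 pattern strings the module uses.
_PATTERNS = (r"\d+", r"\w+")

# Eager compilation at import time: no functools needed, but every
# pattern is paid for up front whether or not it is ever used.
_COMPILED = {p: re.compile(p, re.UNICODE) for p in _PATTERNS}

def _compile(expr):
    return _COMPILED[expr]
```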

Anthony Sottile (Contributor, PR author):
That was my original patch; it increased the import time of this module by ~20 ms.

pablogsal (Member):
How much does importing functools cost? Also, we can still go that way if we do the compilation lazily, but I don't think it's worth it.
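
The lazy variant floated here would defer each compile to first use, keeping import cheap without functools; a sketch under the same hypothetical names as above:

```python
import re

_cache = {}

def _compile(expr):
    # Compile on first use and memoize by hand; nothing happens at
    # import time, and later calls are a single dict lookup.
    try:
        return _cache[expr]
    except KeyError:
        compiled = _cache[expr] = re.compile(expr, re.UNICODE)
        return compiled
```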

Anthony Sottile (Contributor, PR author):
It is already imported via tokenize => re => functools, so the cost is 0.

pablogsal (Member):
Oh, then I retract my concern. This is absolutely the better approach. Thanks for following up with me :)

 def _compile(expr):
     return re.compile(expr, re.UNICODE)
@@ -0,0 +1 @@
+Improve performance of :mod:`tokenize` by 20-30%. Patch by Anthony Sottile.