tokenize spends a lot of time in re.compile(...)
#87180
Comments
I did some profiling (attached a few files here with svgs) of running this script:

```python
import io
import tokenize

# picked as the second longest file in cpython
with open('Lib/test/test_socket.py', 'rb') as f:
    bio = io.BytesIO(f.read())

def main():
    for _ in range(10):
        bio.seek(0)
        for _ in tokenize.tokenize(bio.readline):
            pass

if __name__ == '__main__':
    exit(main())
```

The first profile is before the optimization, the second is after the optimization. The optimization takes the execution from ~6300ms to ~4500ms on my machine (representing a 28% - 39% improvement depending on how you calculate it).

(I'll attach the pstats and svgs after creation; it seems I can only attach one file at once.)
Admittedly anecdotal, but here's another data point in addition to the profiles attached.

test.test_tokenize suite before:

```
$ ./python -m test.test_tokenize
..............................................................................
Ran 78 tests in 77.148s

OK
```

test.test_tokenize suite after:

```
$ ./python -m test.test_tokenize
..............................................................................
Ran 78 tests in 61.269s

OK
```
Attached out3.pstats / out3.svg, which represent the optimization using lru_cache instead.
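For context, memoizing pattern compilation with `functools.lru_cache` looks roughly like the sketch below. This is an illustration of the technique the comment refers to, not necessarily the exact patch; the `_compile` name simply mirrors tokenize's internal compile helper.

```python
import re
from functools import lru_cache

@lru_cache(maxsize=None)
def _compile(expr):
    # Memoize compiled token regexes by pattern string, so repeated calls
    # return the cached pattern object directly instead of going through
    # re.compile()'s own cache machinery on every call.
    return re.compile(expr, re.UNICODE)
```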
Just for the record:

The correct answer is 28%, which uses the initial value as the base: (6300 - 4500) / 6300 ≈ 28%. You are starting at 6300ms and speeding it up by 28%:

```python
>>> 6300 - 28/100*6300
4536.0
```

Using 4500 as the base would only make sense if you were calculating a slowdown from 4500ms to 6300ms: we started at 4500 and *increased* the time by 39%:

```python
>>> 4500 + 39/100*4500
6255.0
```

Hope this helps.
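The same arithmetic as a quick check, using the rounded ~6300 ms / ~4500 ms figures from the first comment (with these inputs the speedup comes out near 40%, in the same ballpark as the 39% quoted):

```python
before, after = 6300, 4500   # rounded timings from the first comment, in ms

time_reduction = (before - after) / before   # fraction of the original runtime saved
speedup = (before - after) / after           # how much faster the new run is

print(f"time reduced by {time_reduction:.1%}")   # 28.6% -- the "28%" figure
print(f"speed increased by {speedup:.1%}")       # 40.0% -- close to the "39%" figure
```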
re.compile() already uses caching, but it is less efficient for a few reasons. To Steven: the time is *reduced* by 28%, but the speed is *increased* by 39%.
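To make the "less efficient" point concrete, here is a small micro-benchmark sketch: re.compile() does hit an internal cache, but each call still pays for the cache lookup and argument handling, which an lru_cache-memoized helper or a pre-compiled pattern avoids. The pattern and timings below are illustrative only and will vary by machine and Python version.

```python
import re
import timeit
from functools import lru_cache

PATTERN = r'\w+|\s+'                                   # arbitrary illustrative pattern
precompiled = re.compile(PATTERN)                      # compiled once, up front
cached_compile = lru_cache(maxsize=None)(re.compile)   # memoized wrapper, as in the optimization

# re.compile() hits its internal cache here, but still pays for the lookup on every call.
print('re.compile each call:', timeit.timeit(
    lambda: re.compile(PATTERN).match('spam eggs'), number=100_000))

# lru_cache returns the cached pattern object with a cheaper lookup.
print('lru_cache wrapper:    ', timeit.timeit(
    lambda: cached_compile(PATTERN).match('spam eggs'), number=100_000))

# A pre-compiled pattern skips any per-call lookup entirely.
print('precompiled pattern:  ', timeit.timeit(
    lambda: precompiled.match('spam eggs'), number=100_000))
```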