TextIOWrapper.tell extremely slow #55323
file.tell() has become extremely slow in version 3.2, both rc1 and rc2. This problem did not exist in version 2.7.1 or in version 3.1. It can be reproduced on both Mac and Windows XP.
Do you have a benchmark program you can post?
Correction: the problem also exists in version 3.1. I created a benchmark program and ran it on my machine (iMac, Snow Leopard 10.6), with the following results: ------------------------------------------ Removing the line containing "tell" gives the following results: ------------------------------------------ (Apparently, reading a file became a lot faster from 3.1 to 3.2.) Conclusion: executing file.tell() makes the program about 10000 times slower. Remark: the file mdutch.txt is a dummy text file containing 1000 lines with one word on each line.
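The benchmark program and its numbers are not preserved in this thread. As a hypothetical reconstruction of that kind of test, assuming a file of 1000 one-word lines like mdutch.txt (the file name `words.txt`, the generated words, and both function names are made up), the sketch below times a text-mode `readline()` loop with and without a `tell()` call after each line. It uses `readline()` rather than `for line in infile`, because in Python 3 iterating a text file disables `tell()` until iteration stops.

```python
import os
import tempfile
import time


def make_word_file(path: str, n_lines: int = 1000) -> None:
    """Write a dummy text file with one word per line (stand-in for mdutch.txt)."""
    with open(path, "w", encoding="utf-8") as f:
        for i in range(n_lines):
            f.write(f"woord{i}\n")


def time_read(path: str, call_tell: bool) -> tuple[int, float]:
    """Read the file line by line in text mode, optionally calling tell()
    after every line. Returns (number of lines read, elapsed seconds)."""
    lines = 0
    start = time.perf_counter()
    with open(path, "r", encoding="utf-8") as infile:
        while True:
            line = infile.readline()  # not `for line in infile`: next() disables tell()
            if line == "":  # text mode: EOF is the empty str
                break
            if call_tell:
                infile.tell()
            lines += 1
    return lines, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "words.txt")
        make_word_file(path)
        for call_tell in (False, True):
            n, secs = time_read(path, call_tell)
            print(f"tell={call_tell}: {n} lines in {secs:.6f} s")
```

The absolute timings depend on the machine and Python version; only the ratio between the two runs is of interest here.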
I found that adding "infile._CHUNK_SIZE = 20" makes the test much faster: 'only' 5 times slower than 2.7.
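A minimal sketch of where that setting goes, assuming a UTF-8 text file (the helper name `line_start_positions` is hypothetical). `_CHUNK_SIZE` is a private, undocumented attribute of CPython's `TextIOWrapper`, not public API, so this is a diagnostic trick rather than something to rely on. The function records the `tell()` value at the start of each line, with or without the smaller chunk size.

```python
from typing import List, Optional


def line_start_positions(path: str, chunk_size: Optional[int] = None) -> List[int]:
    """Return the text-mode tell() value at the start of every line of a UTF-8 file.

    If chunk_size is given, it is assigned to the private attribute
    TextIOWrapper._CHUNK_SIZE before reading, as in the workaround in this thread.
    """
    positions = []
    with open(path, "r", encoding="utf-8") as infile:
        if chunk_size is not None:
            # Private, undocumented CPython attribute: may change without notice.
            infile._CHUNK_SIZE = chunk_size
        while True:
            pos = infile.tell()
            if infile.readline() == "":  # text mode: EOF is the empty str
                break
            positions.append(pos)
    return positions
```

Whatever the chunk size, the returned values should be treated as opaque cookies and only passed back to `seek()` on a text handle for the same file.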
With a similar setup (OS X 10.6) I see the same problem. It seems to go away if the file is opened in binary mode for reading. @laurens, can you confirm?
(By "go away" I mean "stop being pathological", not "stop differing": I still see a factor of 2.)
That's expected: seek() and tell() on text (unicode) files are slow by construction. You should open your file in binary mode instead if you want to do any seeking. Maybe I should add a note to http://docs.python.org/dev/library/io.html#performance
That said, I think it is possible to make algorithmic improvements to TextIOWrapper.tell() so that at least performance becomes acceptable.
First of all, thanks to all for your cooperation, it is very much appreciated. I made some minor changes to the benchmark program. Conclusions are:
Hint: b'' is not equal to '' ;)
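The hint presumably points at a loop that was switched to binary mode but still tested for end of file with `''`. In binary mode `readline()` returns `b''` at EOF, and in Python 3 `b''` never compares equal to `''`, so such a loop never terminates. A minimal sketch of the corrected loop (the function name `count_lines` is hypothetical; `io.BytesIO` stands in for a file opened with `"rb"`):

```python
import io


def count_lines(f: io.BufferedIOBase) -> int:
    """Count the lines of a file object opened in binary mode."""
    count = 0
    while True:
        line = f.readline()
        # Binary readline() returns b"" at EOF. A text-mode check such as
        # `if line == "":` is never true here, so the loop would spin forever.
        if line == b"":
            break
        count += 1
    return count


print(count_lines(io.BytesIO(b"een\ntwee\ndrie\n")))  # 3
```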
Here is a proof-of-concept patch for the pure Python version of TextIOWrapper.tell(). It turns the O(CHUNK_SIZE) operation into an O(1) operation most of the time (it is still O(CHUNK_SIZE) in the worst case: weird decoders and/or crazy input).
Actually, that's wrong. The patch is still O(CHUNK_SIZE) but with a
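Roughly, a text-mode tell() has to turn "the start of the current chunk plus N characters decoded from it" back into a byte position, and finding how many bytes of the chunk produce exactly those N characters is where the O(CHUNK_SIZE) cost comes from. The sketch below is a simplified, hypothetical model of that search, not the patch itself: it first guesses the offset from the chunk's average bytes-per-character ratio and checks the guess with a single `decode()` call, and only falls back to feeding the decoder one byte at a time when the guess fails. Decoding the prefix still touches O(CHUNK_SIZE) bytes, but the number of `decode()` calls drops to one in the common case; this loosely resembles the heuristic in CPython's pure-Python `_pyio.TextIOWrapper.tell()`. The function name `byte_offset` and its signature are made up for illustration.

```python
import codecs


def byte_offset(chunk: bytes, chars_to_skip: int, encoding: str) -> int:
    """Return how many leading bytes of `chunk` decode to exactly
    `chars_to_skip` characters, starting from a fresh decoder state."""
    # A real TextIOWrapper could record this when the chunk is first decoded;
    # here it is computed on the spot to keep the sketch self-contained.
    total_chars = len(codecs.getincrementaldecoder(encoding)().decode(chunk))
    if chars_to_skip > total_chars:
        raise ValueError("chars_to_skip exceeds the decoded length of chunk")

    # Fast path: estimate the offset from the bytes-per-character ratio and
    # verify it with one decode() call. Exact for single-byte encodings.
    if total_chars:
        guess = int(len(chunk) / total_chars * chars_to_skip)
        decoder = codecs.getincrementaldecoder(encoding)()
        n = len(decoder.decode(chunk[:guess]))
        pending, _flags = decoder.getstate()
        if n == chars_to_skip and not pending:
            return guess

    # Slow path, O(len(chunk)) decode() calls: feed one byte at a time until
    # exactly `chars_to_skip` characters came out and nothing is buffered.
    decoder = codecs.getincrementaldecoder(encoding)()
    chars = 0
    for i in range(len(chunk) + 1):
        if chars == chars_to_skip and not decoder.getstate()[0]:
            return i
        if i < len(chunk):
            chars += len(decoder.decode(chunk[i:i + 1]))
    raise ValueError("no clean character boundary found")
```

For ASCII text the first guess is always right; for UTF-8 text containing multi-byte characters the guess can land in the middle of a character, and the byte-by-byte loop takes over.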
New patch, which also optimizes the C version. tell() can be more than 100x faster now (still much slower than binary tell()).
All, thanks for your help. Opening the file in binary mode worked immediately in the toy program (that is, the benchmark code I sent you). (Antoine, thanks for the hint.) In my real-world program, I solved the problem by reading a line from a binary input file and decoding it explicitly and immediately to a string. The performance is acceptable now, and I propose to close this issue. Thanks again, cheers, Laurens
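A minimal sketch of that workaround, assuming UTF-8 input (the helper name `lines_with_offsets` is hypothetical): the file is opened in binary mode, each line is read as bytes and decoded right away, and the byte offset from the binary `tell()` can later be passed to `seek()` on a binary handle for the same file.

```python
from typing import Iterator, Tuple


def lines_with_offsets(path: str, encoding: str = "utf-8") -> Iterator[Tuple[int, str]]:
    """Yield (byte offset, decoded line) pairs from a file read in binary mode."""
    with open(path, "rb") as infile:
        while True:
            offset = infile.tell()  # a plain byte position in binary mode
            raw = infile.readline()
            if raw == b"":  # EOF: binary readline() returns b"", not ""
                break
            yield offset, raw.decode(encoding)
```

Because binary mode performs no newline translation, lines from files with Windows line endings keep their `\r\n`; `line.rstrip("\r\n")` handles both styles.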
Committed in r88607 (3.3). |