-
-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-44691: efficient backward read operations for IOBase [WIP] #120728
base: main
Are you sure you want to change the base?
Conversation
@nineteendo Ok, I won't force-push anymore :') Heavens told me that it wasn't worth the effort :D |
This comment was marked as resolved.
This comment was marked as resolved.
I would just document to use from io import SEEK_END
file.seek(0, SEEK_END)
data = file.backread() |
Mmh... ok! that's what I'm using currently in the tests but I'm fine with that (though, maybe people would not be really happy in the usability?) |
NB: Test failures are known (it's just because I don't have the C-implementation yet) |
Ah sure. I'm actually writing incrementally and only pushing to save the work. I always forget to update the global objects also... |
The functionalities that need to be implemented in C will be a bit more complicate, so for now, I'll leave the PR as it is. I'd like to have some feedback on the functionality in Python and its implementation so that I don't implement something in C that is not in line with should be decided. @pitrou You were the one that suggested implementing simpler primitives. Here, I decided to implement the same primitives as the |
Lib/_pyio.py
Outdated
bytes_rev_res = bytes(rev_res)[::-1] | ||
if eol is None: | ||
return bytes_rev_res | ||
# reverse the characters in the line, except the new line character |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already reversed.
bytes_rev_res = bytes(rev_res)[::-1] | |
if eol is None: | |
return bytes_rev_res | |
# reverse the characters in the line, except the new line character | |
# reverse the characters in the line, except the new line character | |
bytes_rev_res = bytes(rev_res)[::-1] | |
if eol is None: | |
return bytes_rev_res |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oupsi, the comment was not updated!
The implementation of |
Per Guido's comment (#44691 (comment)):
This PR is a WIP where I wanted to revive an old issue, namely reverse reading for IO objects in general (I've added a minimal blurb entry just for the CI/CD but I'll update it when the PR is done).
While the original issue is only for lines, #44691 (comment) suggested to have the interface on
BufferedReader
rather thanTextIOWrapper
since [for] full-speed scanning of log files, you probably want to open them in binary mode. Similarly, it was suggested, instead of a full-blown iterator, to implement simpler primitives.It might make sense to have the possibility for backward iteration independently of whether it is a FileIO, BytesIO or BufferedIO. The Python implementation is complete and documented (at the code level) and the C interface is complete only for simple primitives.
The design choices are as follows:
The
backread
method behaves likeread
but reads bytes from right to left (e.g.,abcdef -> fedbca
). It should be roughly equivalent to read the whole buffer in memory, and read from the end (obviously, I don't read the whole buffer first...). A similar observation holds for thebackreadinto
method.There is an equivalent of
readall
that I calledbackreadall
which simply reads the whole buffer from the right. It's mostly used as a shortcut to avoid duplications, e.g., when you callbackread()
without a limit.There is also an equivalent of
readline()
namedbackreadline()
which is a bit different thanbackread()
andbackreadall()
. I'll illustrate the behaviour with an example. Say that the text is"abc\n1234\nthis is the last line"
. Then,backreadline(9)
would return"last line"
andbackreadline()
would return"this is the last line\n"
. As you can see, it's similar to read characters from the right and put them in a LIFO queue, stopping just before we encounter a new line character, process that queue and print '\n' if needed.In practice,
backreadline()
would be used to retrieve the lines of a log file for instance, so I think it would make sense that it behaves like that. I don't really know whether it's preferrable, when halting in the middle of the line, to return the last bytes of the line or to return the first bytes. In the first case, I don't need to read the entire line if the number of bytes I want to read are less than the line's length but in the second case, I need to first know where my line ended in order to read its first n bytes.Tasks
backread()
& co (seek, cursor behavior, etc)1io.IOBase.backreadline
io.IOBase.__reversed__
io.RawIOBase.backread
(should use the abstract backreadinto())io.RawIOBase.backreadall
(should use the abstract backreadinto())io.RawIOBase.backreadinto
(should be abstract)io.BufferedIOBase.backread
(should be abstract)io.BufferedIOBase.backreadinto
(should use the abstract backread())io.BytesIO.backread
(specialized)io.BytesIO.backreadinto
(specialized)io.BytesIO.backreadline
(optional?)io.BufferedReader.backread
(specialized)io.BufferedReader.backreadinto
(specialized)io.BufferedRandom.backread
(should by handled by insideBufferedReader.backread
by flushing writable streams)io.BufferedRandom.backreadinto
(should be handled byBufferedReader.backreadinto()
by flushing writable streams)io.FileIO.backread
(should work only if seekable, specialized)io.FileIO.backreadall
(should work only if seekable, specialized)io.FileIO.backreadinto
(should work only if seekable, specialized)io.TextIOBase.backread
(should be un-supported for now)io.TextIOBase.backreadline
(should be un-supported for now)io.TextIOWrapper.backread
(should be un-supported for now)io.TextIOWrapper.backreadline
(should be un-supported for now)📚 Documentation preview 📚: https://cpython-previews--120728.org.readthedocs.build/
Footnotes
https://github.com/python/cpython/pull/120728#discussion_r1646047980 ↩