Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BufReader should provide a way to seek without dumping the buffer #31100

Open
taralx opened this issue Jan 22, 2016 · 30 comments

Comments

Projects
None yet
@taralx
Copy link
Contributor

commented Jan 22, 2016

BufReader's seek() implementation always dumps the buffer. This is not good for performance, where users may want to skip a variable amount of buffered data -- only BufReader really knows if it's reasonable to move the pointer or not.

Additionally, only BufReader can know if the pointer can be reversed or not -- so there's absolutely no cheap way to "unget" data from BufReader, even if you know it's seekable.

I'd recommend removing this behavior and moving it to unwrap() or another method (sync?), but it's now baked into a stable API.

What are the options now?

@taralx

This comment has been minimized.

Copy link
Contributor Author

commented Jan 22, 2016

(I'm in the process of writing my own BufReader implementation that doesn't do this.)

@steveklabnik steveklabnik added the A-libs label Jan 26, 2016

@neokril

This comment has been minimized.

Copy link

commented Jan 27, 2016

Currently BufReader saves only the offset within buffered data.
As I understand to keep buffer during seek it needs to save "global offset" within inner Reader and update it in all read/seek operations.
It needs this "global offset" because seek operation should return new position in underlying Reader which is not currently saved anywhere.

Another variant is to make a separate function (e.g. buffered_seek) which will use only buffered data, will move only relative to current position and return true/false (true if it was able to move current position inside the buffer; false - nothing happened and user needs to make a separate call to "seek" to actually move the pointer).

@taralx

This comment has been minimized.

Copy link
Contributor Author

commented Jan 27, 2016

I'm not trying to avoid seeking the underlying stream, I'm trying to avoid dumping the buffer. In my implementation, small seeks that fit in the existing buffer cause a call to self.inner.seek(SeekFrom::Current(0)).

@taralx taralx changed the title BufReader's seek implementation is somewhat irritating for performance BufReader should provide a way to seek without dumping the buffer Jan 27, 2016

@neokril

This comment has been minimized.

Copy link

commented Jan 28, 2016

@taralx Could you please share your implementation?

You use self.inner.seek(SeekFrom::Current(0)) to get current file offset. Am I correct?

@taralx

This comment has been minimized.

Copy link
Contributor Author

commented Jan 28, 2016

Not without getting source approval from my employer, sorry.

Yes, I implement pub fn pos(&mut self) -> io::Result<u64> and then in seek I have snippets like:

let p = try!(self.pos());
self.pos -= back;
return p - back;
@gcarq

This comment has been minimized.

Copy link

commented Oct 22, 2016

Some time ago I had a similiar requirement, I extracted the implementation into a own library: seek_bufread, maybe its useful for you.

@lolgesten

This comment has been minimized.

Copy link

commented Oct 14, 2017

I stumbled upon this today. When decoding mp4 movies, the decoder will do small seeks forward to skip in the region of 100 bytes at a time. I used a BufReader with 65k capacity to avoid my underlying reader doing http requests for every seek, only to find that BufReader isn't helping at all.

I think BufReader could be more useful if it worked from the buffer also when seek fits.

@dtolnay

This comment has been minimized.

Copy link
Member

commented Nov 18, 2017

I would like to see a PR that implements seek(SeekFrom::Current(n)) without dumping the buffer if the offset n is within the buffered data. I may be missing something but I don't think BufReader would need to store a global offset in order to make this work.

bors added a commit that referenced this issue Jan 14, 2018

Auto merge of #46832 - Diggsey:bufread-cheaper-seek, r=alexcrichton
BufRead: Only flush the internal buffer if seeking outside of it.

Fixes #31100

r? @dtolnay

@bors bors closed this in #46832 Jan 14, 2018

@taralx

This comment has been minimized.

Copy link
Contributor Author

commented Jan 14, 2018

FWIW, this solution requires R:Seek. A try_seek_relative function would not have such a requirement.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

commented Mar 16, 2018

This should not be closed, the new method is still unstable. It’s been a couple months though:

@rfcbot fcp merge

@SimonSapin SimonSapin reopened this Mar 16, 2018

@alexcrichton

This comment has been minimized.

Copy link
Member

commented Mar 16, 2018

@SimonSapin to confirm, this is for the seek_relative method, right?

@SimonSapin

This comment has been minimized.

Copy link
Contributor

commented Mar 17, 2018

Yes, that is the one item with issue = "31100".

@SimonSapin

This comment has been minimized.

Copy link
Contributor

commented Mar 17, 2018

Hmm, maybe I needed to reopen the issue before making the command:

@rfcbot fcp merge

@rfcbot

This comment has been minimized.

Copy link

commented Mar 17, 2018

Team member @SimonSapin has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot

This comment has been minimized.

Copy link

commented Mar 18, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot

This comment has been minimized.

Copy link

commented Mar 28, 2018

The final comment period is now complete.

@jethrogb

This comment has been minimized.

Copy link
Contributor

commented May 30, 2018

I agree with @dtolnay and don't think a separate method is needed for this. This can just be the behavior when calling seek(SeekFrom::Current(n)).

@tmccombs

This comment has been minimized.

Copy link
Contributor

commented Jun 3, 2018

I agree with @dtolnay and don't think a separate method is needed for this. This can just be the behavior when calling seek(SeekFrom::Current(n)).

I think it could be done something like this:

fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        let result: u64;
        if let SeekFrom::Current(n) = pos {
            if let Some(new_pos) = (self.pos as i64).checked_add(n) {
                if new_pos >= 0 && new_pos <= self.cap {
                    self.pos = new_pos as usize;
                    return self.inner.seek(SeekFrom::Current(0)).map(|p| p + (self.cap - self.pos));
                }
            }
            // .. the rest is the same as before
@abonander

This comment has been minimized.

Copy link
Contributor

commented Nov 17, 2018

I don't have any real cases in mind but changing the existing seek semantics could cause subtle bugs for people who are relying on it. E.g. if the user is expecting the data at an in-buffer offset to change in the underlying reader and is using seek to refresh the buffer, this optimization would cause it to yield stale data.

@t-rapp

This comment has been minimized.

Copy link
Contributor

commented Dec 5, 2018

I stumbled over this ticket when implementing a library that parses media files. What makes me uneasy about using seek_relative (besides being available in nightly only) is the fact that often I do not have the current position. From reading the source code of BufReader seek it seems that the internal buffer is flushed even when doing seek(SeekFrom::Current(0)) to get the current position.

It would be great to add an optimization path for this case (get current position) to the Seek trait implementation of BufReader.

@funchal

This comment has been minimized.

Copy link

commented Feb 9, 2019

Hi I'm new to rust, I'm surprised by this behaviour. Can I ask why BufReader should flush the buffer on seek within the buffer, like ever? I understand this is documented behaviour of a stable api, but I can't think of any reason to write code that would rely on this intentionally, and also can't think of any code that could break unintentionally. I could be wrong but it feels like this was perhaps excessive documentation that now constrains the behaviour of this api to remain inefficient.

@taralx

This comment has been minimized.

Copy link
Contributor Author

commented Feb 9, 2019

Because that's the way to sync the BufReader's idea of where the file pointer is with the underlying reader. The other alternative would be to make into_inner do it, but it can't because it doesn't depend on R:Seek.

@cgburgess

This comment has been minimized.

Copy link

commented Apr 20, 2019

Because that's the way to sync the BufReader's idea of where the file pointer is with the underlying reader. The other alternative would be to make into_inner do it, but it can't because it doesn't depend on R:Seek.

It seems like the major issue then is that we are worried people using seek_relative will lose track of where they are in the file/stream and try to seek past the end? The function already returns a result so why is this really an issue? This is my first time stumbling upon a nightly feature so I am just curious.

@taralx

This comment has been minimized.

Copy link
Contributor Author

commented Apr 20, 2019

No, the major issue is one of compatibility guarantees. Right now, if you want to sync the BufReader's position with the underlying Reader and extract it, you seek and then into_inner. So that pattern needs to continue to work.

@funchal

This comment has been minimized.

Copy link

commented Apr 20, 2019

Can't we just leave seek alone then and add a separate api seek2 or whatever which has the little enhancement of not losing the buffer when you seek within it?

@sfackler

This comment has been minimized.

Copy link
Member

commented Apr 20, 2019

@taralx

This comment has been minimized.

Copy link
Contributor Author

commented Apr 20, 2019

I've been advocating for try_seek_relative because the existing method requires R: Seek where that's not necessary for a BufReader.

@czipperz

This comment has been minimized.

Copy link
Contributor

commented May 25, 2019

@taralx do you want to write up a pr? I doubt it would be very difficult. Just refactor seek_relatives code that already does this out into the method?

@czipperz

This comment has been minimized.

Copy link
Contributor

commented May 25, 2019

I do find it mildly ironic that the original implementation explicitly documented that it would always flush the buffer and why.

@taralx

This comment has been minimized.

Copy link
Contributor Author

commented May 26, 2019

Life is a little busy right now, so if someone else is excited to do it, go for it. Otherwise I'll see if I can't find some time to put together a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.