Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

file:position wipes read_ahead buffer unconditionally #4706

Open
lhoguin opened this issue Apr 2, 2021 · 8 comments
Open

file:position wipes read_ahead buffer unconditionally #4706

lhoguin opened this issue Apr 2, 2021 · 8 comments
Assignees
Labels
enhancement help wanted Issue not worked on by OTP; help wanted from the community team:VM Assigned to OTP team VM

Comments

@lhoguin
Copy link
Contributor

lhoguin commented Apr 2, 2021

Is your feature request related to a problem? Please describe.

When using file:position on a handle that has read_ahead enabled, and moving to a position that would land inside the read_ahead buffer (for example, {cur, 128} likely would), the buffer is wiped and a system seek is done instead.

position(Fd, {cur, Offset}) ->
try
%% Adjust our current position according to how much we've read ahead.
#{ r_buffer := RBuf } = get_fd_data(Fd),
position_1(Fd, cur, Offset - prim_buffer:size(RBuf))
catch
error:badarg -> {error, badarg}
end;
position(Fd, {Mark, Offset}) ->
try
position_1(Fd, Mark, Offset)
catch
error:badarg -> {error, badarg}
end;
position(Fd, cur) -> position(Fd, {cur, 0});
position(Fd, bof) -> position(Fd, {bof, 0});
position(Fd, eof) -> position(Fd, {eof, 0});
position(Fd, Offset) -> position(Fd, {bof, Offset}).
position_1(Fd, Mark, Offset) ->
#{ handle := FRef, r_buffer := RBuf } = get_fd_data(Fd),
prim_buffer:wipe(RBuf),
seek_nif(FRef, Mark, Offset).

(Click the link for the full context.)

Describe the solution you'd like

The buffer should not be wiped. Only the beginning of the buffer should be skipped via prim_buffer:skip.

Describe alternatives you've considered

Reimplementing much of the same functionality in my project.

Additional context

Something curious I've noticed while looking if read_ahead would be benefitial to my use case. I may send a PR if I am moving forward with this solution for my implementation.

@rickard-green rickard-green added the team:VM Assigned to OTP team VM label Apr 12, 2021
@rickard-green rickard-green added the help wanted Issue not worked on by OTP; help wanted from the community label Apr 12, 2021
@jhogberg
Copy link
Contributor

We'd like to have this but don't feel we have the time to finish it before OTP 24, so a PR would be welcome.

@essen
Copy link
Contributor

essen commented Apr 12, 2021

What's the timeline for OTP-24? How much time do I have to do this?

I am close to the point where I can work on this but I would like to be sure I can make it on time.

@jhogberg
Copy link
Contributor

jhogberg commented Apr 12, 2021

How much time do I have to do this?

The final release candidate will be frozen on Friday so we'll need to have it by Wednesday evening, but there's no need to rush. We can always add it in OTP 24.1 instead.

@lhoguin
Copy link
Contributor Author

lhoguin commented Apr 13, 2021

I've written lhoguin@7d95212 but I am not happy with it.

It would be possible to get an even better result by attaching the position of the start of the buffer to the fd. That way we do not need to do a seek call to return the new position, and we could avoid wipe/seek calls in more scenarios than just {cur,Offset}. For example an absolute position might still fall within the buffer, and in that case we could skip and return the new position without calling seek. And absolute file:position is what I need, really, so the patch I wrote is not ideal.

I think it would be possible to use an atomics of size 1 that would sit in the fd data, and keep the position there. Then we can add/sub whenever the buffer changes. We would then use this position to decide whether we fall within the buffer (if {bof,Pos} then it'd be Pos >= BufPos, Pos < BufPos+BufSize.

What do you think?

@lhoguin
Copy link
Contributor Author

lhoguin commented Apr 13, 2021

Also allowing pread to non-destructively read from the buffer, and pwrite to avoid wiping the buffer if writes happen outside of it, would be more possibilities enabled by this. But I don't think this can make it into OTP-24. Happy to look at that later on when I will be doing optimisations.

The one about pwrite would be especially useful for my case because I could benefit from read_ahead when reading and do my small writes over the entries I have already finished reading. Only in some specific cases would the pwrites occur in the read buffer, and only in some specific cases would the reads require a seek before reading again.

@jhogberg
Copy link
Contributor

jhogberg commented Apr 14, 2021

It would be possible to get an even better result by attaching the position of the start of the buffer to the fd. That way we do not need to do a seek call to return the new position, and we could avoid wipe/seek calls in more scenarios than just {cur,Offset}. For example an absolute position might still fall within the buffer, and in that case we could skip and return the new position without calling seek. And absolute file:position is what I need, really, so the patch I wrote is not ideal.

Yes, though that requires a seek(2) when we read into the buffer which adds unnecessary overhead for most uses of read_ahead, and I don't think that's ideal either. (IIRC there are some nasty edge cases that prevent us from merely bumping the saved position every time we read.)

I think it would be possible to use an atomics of size 1 that would sit in the fd data, and keep the position there. Then we can add/sub whenever the buffer changes. We would then use this position to decide whether we fall within the buffer (if {bof,Pos} then it'd be Pos >= BufPos, Pos < BufPos+BufSize.

What do you think?
...
Also allowing pread to non-destructively read from the buffer, and pwrite to avoid wiping the buffer if writes happen outside of it, would be more possibilities enabled by this. But I don't think this can make it into OTP-24. Happy to look at that later on when I will be doing optimisations.

The one about pwrite would be especially useful for my case because I could benefit from read_ahead when reading and do my small writes over the entries I have already finished reading. Only in some specific cases would the pwrites occur in the read buffer, and only in some specific cases would the reads require a seek before reading again.

I fear it might make things a bit too complicated. Do you think something along the lines of a memory-mapped file would work well for you? (Leaving read_ahead to the OS)

@lhoguin
Copy link
Contributor Author

lhoguin commented Apr 14, 2021

One of the goals is to allow using as little memory as possible so mmap sounds inadequate. Anyway I have to first implement a write buffer so that there aren't so many read/write calls, allowing reading from that write buffer directly, and then see what would make sense to do with the read/write that remain. I'll revisit this when that time comes. Cheers!

@lhoguin
Copy link
Contributor Author

lhoguin commented May 7, 2021

Happy to open a PR for lhoguin@7d95212 if you think it's worth it. On my end I have a lot of pread calls now so I am unlikely to be needing it, but it's still an improvement. If you don't think the linked commit is good then we can close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement help wanted Issue not worked on by OTP; help wanted from the community team:VM Assigned to OTP team VM
Projects
None yet
Development

No branches or pull requests

4 participants