New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/added fsync fdsync and flush #85
Conversation
@@ -189,6 +189,12 @@ def seek(self, offset: int) -> None: | |||
def tell(self) -> int: | |||
return self._offset | |||
|
|||
async def flush(self, sync_metadata: bool = False) -> None: | |||
if sync_metadata: | |||
await self.file.fsync() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't checked the under the hood implementation of these (here or in caio
lib), but my high level comments would be that flush and f[d]sync are different operations. Under normal write
the data first hits the user-space buffers before eventually hitting the kernel space buffers. The fsync
asks the OS to persist the contents of the kernel space buffer for the given file descriptor to the disk. However if the data hasn't made it there, nothing would be fsync'd
. That is why it's a 2 step process. flush
helps flush the contents of the user-space buffer linked to the file descriptor to the OS. Then fsync
works as expected. Which is why the python docs also claim that you should call flush
before calling fsync
IIRC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These calls are guaranteed by the underlying library. You were right, there was a bug in caio, it's fixed now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, thanks for that. What I think should happen now is that this flush
API should end up calling the underlying fflush
(or its async equivalent) and fsync
should end up calling the underlying fsync
.
They serve very different purpose - flush
will flush the contents of user-space buffers to OS and fsync
will sync the OS buffers to the disk (or equivalent). So this async flush
should not call fsync
inside it - it should call whatever ends up calling fflush
.
Currently the diff above shows that you are calling fsync
from a flush
. That will lead to incorrect behaviour because fsync
will not touch the user space buffers (which is what fflush
is meant to do).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aio.h
has no user-space buffer, all operations that are marked as performed are already in kernel space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, what I meant was the user-space buffers used by your library - for eg. like the C std-libs fwrite
normally use buffers in the user-space before doing the actual write
syscall to the kernel. Does your library use user-space buffers that need to be flushed (in which case flush
and fsync
need to be different operations)?
If that is not the case, ie. there's no buffering involved in your library, then all good and I'm marking this as approved.
Can you please clarify one last thing (probably also in your documentation/README): The aio_fsync page says that the fsync
request is only enqueued/requested and not actually performed. I'm guessing that's the behaviour to be expected of this library as well. If so, is there anyway to wait for when it's done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aio.h
calls just sends requiests as a syscall, and in my implementation the operation completion is triggering by the kernel using special EventFD file descriptor, fsync is not exludes this.
When to be honest caio
contains three another implemetations of asynchronous IO. This discussion is relevant only for linux-specific aio.h
, excludes another two.
I'd be glad if you could do a review in caio
regarding thread-pool based and pure-python implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, if/when I get a chance in the future I'll take a look.
(Just as a note I've not used anything but linux for as long as I can remember so if there's any code that needs an understanding of other OSes, I'll be a total noob there and likely not be able to contribute much).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved with comments - please clarify before merging if possible.
Also thank you so much for engaging and doing all this - really appreciate it :)
@ustulation please review