Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/added fsync fdsync and flush #85

Merged
merged 3 commits into from Aug 22, 2023

Conversation

mosquito
Copy link
Owner

@ustulation please review

@@ -189,6 +189,12 @@ def seek(self, offset: int) -> None:
def tell(self) -> int:
return self._offset

async def flush(self, sync_metadata: bool = False) -> None:
if sync_metadata:
await self.file.fsync()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't checked the under the hood implementation of these (here or in caio lib), but my high level comments would be that flush and f[d]sync are different operations. Under normal write the data first hits the user-space buffers before eventually hitting the kernel space buffers. The fsync asks the OS to persist the contents of the kernel space buffer for the given file descriptor to the disk. However if the data hasn't made it there, nothing would be fsync'd. That is why it's a 2 step process. flush helps flush the contents of the user-space buffer linked to the file descriptor to the OS. Then fsync works as expected. Which is why the python docs also claim that you should call flush before calling fsync IIRC.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These calls are guaranteed by the underlying library. You were right, there was a bug in caio, it's fixed now.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, thanks for that. What I think should happen now is that this flush API should end up calling the underlying fflush (or its async equivalent) and fsync should end up calling the underlying fsync.

They serve very different purpose - flush will flush the contents of user-space buffers to OS and fsync will sync the OS buffers to the disk (or equivalent). So this async flush should not call fsync inside it - it should call whatever ends up calling fflush.

Currently the diff above shows that you are calling fsync from a flush. That will lead to incorrect behaviour because fsync will not touch the user space buffers (which is what fflush is meant to do).

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aio.h has no user-space buffer, all operations that are marked as performed are already in kernel space.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, what I meant was the user-space buffers used by your library - for eg. like the C std-libs fwrite normally use buffers in the user-space before doing the actual write syscall to the kernel. Does your library use user-space buffers that need to be flushed (in which case flush and fsync need to be different operations)?

If that is not the case, ie. there's no buffering involved in your library, then all good and I'm marking this as approved.

Can you please clarify one last thing (probably also in your documentation/README): The aio_fsync page says that the fsync request is only enqueued/requested and not actually performed. I'm guessing that's the behaviour to be expected of this library as well. If so, is there anyway to wait for when it's done?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aio.h calls just sends requiests as a syscall, and in my implementation the operation completion is triggering by the kernel using special EventFD file descriptor, fsync is not exludes this.

When to be honest caio contains three another implemetations of asynchronous IO. This discussion is relevant only for linux-specific aio.h, excludes another two.

I'd be glad if you could do a review in caio regarding thread-pool based and pure-python implementation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, if/when I get a chance in the future I'll take a look.

(Just as a note I've not used anything but linux for as long as I can remember so if there's any code that needs an understanding of other OSes, I'll be a total noob there and likely not be able to contribute much).

Copy link

@ustulation ustulation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with comments - please clarify before merging if possible.

Also thank you so much for engaging and doing all this - really appreciate it :)

@mosquito mosquito merged commit eeb401c into master Aug 22, 2023
19 of 20 checks passed
@mosquito mosquito deleted the feature/added-fsync-fdsync-and-flush branch August 22, 2023 12:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants