ReFS parallel file cloning silent failure bug #12

Closed
erikmav opened this issue Sep 15, 2022 · 1 comment

erikmav commented Sep 15, 2022

Windows-only.

Symptoms: When cloning a single source file in parallel (multiple threads or processes) to multiple destinations, sometimes success is returned but the file region assignment is never completed. This results in the destination file having all zeroes for its content.

Cause: When the source file is not completely flushed to disk at the time the region clone operation starts, ReFS first tries to flush it to disk. A race between threads allows a failed flush to be ignored, so the region clone proceeds anyway.

Workarounds:

  • Serialize cloning system-wide per source path. This library takes this approach by default for single-process cloning by using an in-memory dictionary. You can opt into system-wide serialization using kernel mutexes by specifying useCrossProcessLocksWhereApplicable = true when calling CopyOnWriteFilesystemFactory.GetInstance().
  • Ensure the source file is completely flushed to disk before cloning. This can be accomplished through one of the approaches below. Note that if you use these approaches, you can increase performance of cloning by using CloneFileFlags.NoSerializedCloning on your CloneFile calls.
    • Use the FlushFileBuffers API to force the file to be flushed from memory. Call it at the end of writing the source file to disk, while the file write handle is still open. Alternatively, call it on a new handle to the file opened with GENERIC_WRITE.
    • When writing the source file, open the file handle with FILE_FLAG_NO_BUFFERING. However, note this requires the code writing to the file to deal with writing chunks aligned with the sector size of the underlying volume, and using chunks that are a multiple of the sector size.
    • When writing the source file, open the file handle with FILE_FLAG_WRITE_THROUGH. This forces a flush on every write, which can decrease performance significantly.

Resolution: We currently have only workarounds (see above) and are awaiting resolution with the Windows team.

This issue tracks resolution in the Windows codebase and other approaches to work around the problem. Related to #1. PRs that have added workarounds:


erikmav commented Jan 30, 2023

Does not reproduce on Windows 11 22H2 with the January 2023 patches. See #20 for removing the serialization paths.
