In-place staging mode #146
Comments
That makes sense. It would require a bit of modification to the way Mutagen's change application algorithm works (and some failure handling, because it could leave files in a corrupt state in the event of a network disconnect), but it's certainly theoretically possible (and I can see the value for large files). I'll put it on the roadmap for a future release.
This issue is indirectly related to guard/guard#924, in the sense that implementing this would likely solve that issue, though that issue might also be solved in other ways.
Just a minor update here: although it's not an "in-place" staging mode, Mutagen v0.13 did add an
tl;dr: I've been looking for a real-time bidirectional syncing solution to use between two ZFS pools (with automated ZFS snapshotting on one pool). However, because ZFS operates on a block level (copy-on-write), most syncing solutions negate the storage efficiency of ZFS snapshots (as they update via copy/rename instead of in-place).
This example is with ZFS & Syncthing, but I believe the same principle applies with Mutagen:
From my research, there isn’t a clear bidirectional file syncing solution that’s compatible with CoW filesystems.
@matthewtraughber Thanks for the additional discussion points and links. I definitely think your argument is one of the strongest motivating factors for an in-place staging mode (though, of course, there are many other valid reasons). In fact, CoW filesystems are actually one of the few places where I think it might be "trivial" to implement in-place staging, because you can make a cheap copy of the base file using

For other filesystems (e.g. ext4), you have to keep track of which rsync blocks in a file have been invalidated by being overwritten or shifted, and making mid-file insertions work efficiently is extremely difficult, since you have to watch for overlapping writes/reads, update the tracking of block indices, and so on. In fact, I think that somewhere in the rsync documentation there's a caveat stating that the

Anyway, I agree, though I still think it will be really tough to implement and validate.
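For context on the CoW fast path described above, here's a minimal sketch (not Mutagen code; names are hypothetical) of cloning a base file cheaply on Linux via the `FICLONE` ioctl, falling back to an ordinary byte copy when the filesystem doesn't support reflinks:

```python
import fcntl
import shutil

# Linux FICLONE ioctl number (from <linux/fs.h>); requires a reflink-capable
# filesystem such as Btrfs or XFS. ZFS and APFS have their own cloning paths.
FICLONE = 0x40049409

def clone_or_copy(src: str, dst: str) -> bool:
    """Try a CoW clone of src into dst; fall back to a byte copy.

    Returns True if the clone succeeded, False if we fell back.
    """
    with open(src, "rb") as s, open(dst, "wb") as d:
        try:
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
            return True
        except OSError:
            # EOPNOTSUPP, EXDEV, etc.: filesystem can't reflink; copy bytes.
            shutil.copyfileobj(s, d)
            return False
```

On a reflink-capable filesystem the clone shares all blocks with the base file, so staging into the clone and then applying the delta in place preserves block sharing with snapshots.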
Appreciate the additional context; I wasn't aware of

If the level of effort is significantly greater for non-CoW filesystems, I'd propose (albeit somewhat selfishly) that there's benefit in a "CoW

In full disclosure though, this is quite out of my area of expertise; I just wanted to add more data points on why this functionality would be needed.
@matthewtraughber Understood. And your links are definitely appreciated! For reference, can you tell me what types of files you're looking to sync on these filesystems? Are these large, append-only files like logs? Code? Media files? I'm curious about the use-case drivers, because they might inform some of the heuristics for doing in-place staging in a more optimal fashion.
Realistically, it would be any file type (I know that's not particularly helpful): text (code), media (H.265/H.264 primarily), containers, etc.

My use case is still being fleshed out, but essentially: I have a local server that hosts numerous applications and acts as a central backup for all devices on my network/VPN. The data on the server resides on multiple ZFS mirror pools, with regular snapshots taken by Sanoid. I'm transitioning from using the server directly for all computing needs to a new laptop (an M1 MacBook), and I'd like to retain R/W access to all data on the server as if it were on the local machine (the laptop).

Initially I was looking into FUSE/SSHFS as a solution for accessing data on the primary server. However, there are bandwidth constraints for large files / streaming local media (and it requires a constant network connection). Naturally, the alternative is to maintain a copy of the data on both devices. If synchronization were one-way, then

Hopefully that gives a bit more context to the workflow I'm envisioning.
Feature request here.

Wondering if you would consider adding an in-place option, similar to rsync's `--inplace` option.

This would be a really useful feature for handling large files and for handling real-time files, like logs.
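To illustrate what such a mode would change, here's a rough sketch (hypothetical; not Mutagen's actual algorithm or rsync's) of applying a delta directly at offsets within the existing file, rather than staging a full copy and renaming it over the target:

```python
import os

def apply_delta_in_place(path: str, patches: list[tuple[int, bytes]]) -> None:
    """Write changed regions directly into the existing file.

    patches: (offset, data) pairs for regions that differ from the source.
    Unlike the stage-and-rename approach, unchanged blocks are never
    rewritten, so a CoW filesystem keeps sharing them with snapshots.
    Note: a crash or disconnect mid-application leaves the file in a mixed
    state, which is exactly the failure-handling concern raised above.
    """
    with open(path, "r+b") as f:
        for offset, data in patches:
            f.seek(offset)
            f.write(data)
        f.flush()
        os.fsync(f.fileno())
```

For an append-only log, the delta degenerates to a single patch at the old end-of-file, which is why this pattern is especially attractive for logs and other large, mostly-stable files.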