Ability to add files to the compressed filesystem without having to decompress/recompress the whole thing #18

Open
MasterDuke17 opened this issue Dec 6, 2020 · 7 comments
Labels: enhancement (New feature or request)

Comments

@MasterDuke17

I'm experimenting with using DwarFS for a use case very similar to yours (a build of each Rakudo commit). However, since there are several new commits every day, creating a new DwarFS image from scratch each time doesn't make sense. Would it be possible to add new data to the compressed filesystem without having to decompress/recompress the whole thing?

@mhx
Owner

mhx commented Dec 6, 2020

I'm not saying it wouldn't make sense, but doing it in a good way would likely be hard.

Ideally, when adding a new set of files, you'd want to end up with a result similar to what you'd get by recompressing the whole file system. But that would mean inserting the new files in between files that are already in the file system, so that the available redundancy can be put to good use. I'm not saying this can't be done, just that it's probably too much work to get done any time soon.
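
To illustrate why placement matters, here is a toy Python sketch that uses zlib as a stand-in compressor (DwarFS's own segmenter and compressors work differently). Deflate can only reference data within a 32 KiB window, so a new file stored right next to a near-identical old one costs almost nothing, while the same file appended after 100 KB of unrelated data gets no benefit at all. The "files" and their sizes are made up for the example.

```python
import random
import zlib

# Seeded random bytes are incompressible on their own, so any savings
# below come only from redundancy *between* the toy "files".
rng = random.Random(0)
old_build = rng.randbytes(20_000)                  # file already in the image
new_build = old_build[:-100] + rng.randbytes(100)  # new, nearly identical file
unrelated = rng.randbytes(100_000)                 # other data in the image


def packed_size(*chunks: bytes) -> int:
    """Compressed size when the chunks are stored back to back in one stream."""
    return len(zlib.compress(b"".join(chunks), 9))


# New file placed next to its similar neighbour (what a full rebuild can do).
adjacent = packed_size(old_build, new_build, unrelated)
# New file simply appended at the end, outside deflate's 32 KiB window.
appended = packed_size(old_build, unrelated, new_build)

print(f"adjacent: {adjacent} bytes, appended: {appended} bytes")
```

With the new file adjacent, nearly all of its 20 KB is encoded as back-references; appended at the end, it is stored almost verbatim.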

What I'd probably do is something like:

  • create an initial DwarFS image
  • create a writable overlay and install new builds to this overlay
  • once per week/month or so, build a new DwarFS image directly from the overlay so it includes both old and new builds
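
The three steps above could be scripted roughly as follows. This is a minimal sketch that only builds and prints the command lines rather than running them. The directory layout under `/srv/builds` is hypothetical; it assumes the `mkdwarfs -i INPUT -o IMAGE` and `dwarfs IMAGE MOUNTPOINT` invocations plus Linux overlayfs, and it assumes the kernel accepts the FUSE-mounted image as an overlay lower layer (fuse-overlayfs would be an alternative). Mounting needs suitable privileges, and possibly the FUSE `allow_other` option if the overlay is mounted by a different user than the image.

```python
import shlex
from pathlib import Path

# Hypothetical layout; every path below is an assumption for illustration.
BASE = Path("/srv/builds")


def initial_image(src: Path, image: Path) -> list[str]:
    # Step 1: compress an existing tree of builds into a DwarFS image.
    return ["mkdwarfs", "-i", str(src), "-o", str(image)]


def mount_image(image: Path, lower: Path) -> list[str]:
    # Mount the read-only image through the dwarfs FUSE driver.
    return ["dwarfs", str(image), str(lower)]


def mount_overlay(lower: Path, upper: Path, work: Path, merged: Path) -> list[str]:
    # Step 2: writable overlay; new builds land in `upper`.
    # overlayfs requires `work` to be an empty dir on the same fs as `upper`.
    opts = f"lowerdir={lower},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", opts, str(merged)]


def rebuild_image(merged: Path, new_image: Path) -> list[str]:
    # Step 3 (weekly/monthly): re-pack old + new builds into a fresh image.
    return ["mkdwarfs", "-i", str(merged), "-o", str(new_image)]


plan = [
    initial_image(BASE / "staging", BASE / "builds.dwarfs"),
    mount_image(BASE / "builds.dwarfs", BASE / "lower"),
    mount_overlay(BASE / "lower", BASE / "upper", BASE / "work", BASE / "merged"),
    rebuild_image(BASE / "merged", BASE / "builds-new.dwarfs"),
]
for argv in plan:
    print(shlex.join(argv))
```

After the new image is built, the overlay and the old image can be unmounted, the upper directory emptied, and the new image mounted in place of the old one.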

I don't know if that would help in your case, but it's something that could probably be automated quite easily.

mhx added the enhancement (New feature or request) label Dec 6, 2020
@MasterDuke17
Author

Yeah, I didn't imagine it was a five-minute fix. If we do end up using DwarFS in the near future, we'd very likely do something pretty similar to what you suggest. However, we already have an automated system with zstd archives of single recent commits and lrzip archives of bundles of older commits. The appeal of DwarFS (assuming its compression ratio is close enough) is that we could rip out all the code we have for decompressing the different versions when we want to use a specific commit, and just run something at a known path in the filesystem.

Thanks for the quick response, and I wish we'd known about DwarFS back in 2016 when we were first creating our system!

@AlexDaniel

Just to clarify the last comment: the reason lrzip was chosen is that zstd didn't have a long-range mode back then. Now it does, and I even have some code to do the transition from lrzip to zstd only, but dwarfs looks so cool… :)

FWIW here's our journey: Raku/whateverable#23

@mhx
Owner

mhx commented Dec 6, 2020

> Thanks for the quick response, and I wish we'd known about DwarFS back in 2016 when we were first creating our system!

Well, in 2016 it was still sitting on my laptop; sadly, I didn't have the time (or energy) to publish it back then.

@mhx
Owner

mhx commented Nov 15, 2022

Just a quick update: I'm planning to add support for "snapshots" (or whatever the feature will ultimately be called), which would definitely address this issue; in fact, it will go a lot further. You'll be able to not only mount/extract the latest update, but also all previous updates. Each update would only store the changes relative to the previous state, which would e.g. allow you to use DwarFS for incremental backups. There's no timeline, so don't hold your breath, but it'll hopefully happen before v1.0.0. :)
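
A purely hypothetical illustration of the "store only the changes" idea, not a description of how DwarFS will implement snapshots: each snapshot maps paths to content hashes, and file contents live in a shared, content-addressed store, so files that didn't change cost nothing in later snapshots, and every earlier snapshot stays restorable. A real implementation would likely deduplicate at a much finer granularity than whole files. The names `take_snapshot` and `restore` are invented for this sketch.

```python
import hashlib

# Hypothetical types: a snapshot maps each path to the SHA-256 of its contents,
# and the store maps each hash to the contents themselves.
Snapshot = dict[str, str]
Store = dict[str, bytes]


def take_snapshot(files: dict[str, bytes], store: Store) -> Snapshot:
    """Record a snapshot, adding only contents the store has not seen yet."""
    snap: Snapshot = {}
    for path, data in files.items():
        key = hashlib.sha256(data).hexdigest()
        store.setdefault(key, data)  # unchanged contents are stored only once
        snap[path] = key
    return snap


def restore(snap: Snapshot, store: Store) -> dict[str, bytes]:
    """Rebuild the full file set of any earlier snapshot."""
    return {path: store[key] for path, key in snap.items()}


store: Store = {}
v1 = take_snapshot({"bin/raku": b"build 1", "lib/core": b"core"}, store)
v2 = take_snapshot({"bin/raku": b"build 2", "lib/core": b"core"}, store)
print(len(store))                     # 3: "core" is shared by both snapshots
print(restore(v1, store)["bin/raku"]) # b'build 1'
```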

mhx added this to the v0.9.0 milestone Nov 15, 2022
mhx self-assigned this Nov 15, 2022
@Phantop

Phantop commented Nov 15, 2022

> Just a quick update: I'm planning to add support for "snapshots" (or whatever the feature will ultimately be called), which would definitely address this issue; in fact, it will go a lot further. You'll be able to not only mount/extract the latest update, but also all previous updates. Each update would only store the changes relative to the previous state, which would e.g. allow you to use DwarFS for incremental backups. There's no timeline, so don't hold your breath, but it'll hopefully happen before v1.0.0. :)

If this gets implemented, what's the intention with regard to writability? Obviously, incremental backups would require providing both a source directory and an existing DwarFS image, but would mounting and modifying an existing image be a consideration, too? If so, would that become the default behavior, or would images remain read-only by default?

@mhx
Owner

mhx commented Nov 15, 2022

> If this gets implemented, what's the intention with regards to writeability? Obviously incremental backups would require providing both a source directory and existing DwarFS image, but would mounting and modifying an existing one be a consideration, too? Would that become the default behavior if that were the case or would images remain read-only by default?

I have no plans for making the file system writable at this point.
