FUSE: reflect deduplication in allocated blocks #184

Open: wants to merge 1 commit into master

dnnr commented Jan 23, 2015

Instead of giving all files a fixed block count of 1, this assigns each
deduplicated chunk to exactly one file. In effect, the cumulative
allocated size (st_blocks) shown in the mount point accurately reflects
the amount of actual disk space needed for the repository (barring
metadata overhead).
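A minimal Python sketch of the assignment idea, not the code in this PR. It assumes each file is described by a list of `(chunk_id, stored_size)` pairs, where `stored_size` is the chunk's (compressed) size in the repository, and that the first file to be stat'ed claims every chunk that has no owner yet. `ChunkOwnership` and its `st_blocks` method are hypothetical names used only for illustration.

```python
import math
from dataclasses import dataclass, field

BLOCK_SIZE = 512  # st_blocks is counted in 512-byte units


@dataclass
class ChunkOwnership:
    """Hypothetical helper that remembers which inode owns each chunk."""

    owner: dict = field(default_factory=dict)  # chunk_id -> inode

    def st_blocks(self, inode, chunks):
        """Return st_blocks for a file made of (chunk_id, stored_size) pairs.

        A chunk with no owner yet is claimed by this inode; chunks already
        owned by another inode contribute nothing. Repeated references to
        the same chunk within one file are counted once.
        """
        owned = 0
        for chunk_id, size in dict(chunks).items():  # unique chunk ids only
            if self.owner.setdefault(chunk_id, inode) == inode:
                owned += size
        # Round the owned byte count up to whole 512-byte units.
        return math.ceil(owned / BLOCK_SIZE)
```

For example, if file 1 references `c1` (1024 bytes) and `c2` (512 bytes) and is stat'ed first, it gets 3 blocks; file 2, referencing `c2` and `c3` (2048 bytes), then gets 4, because `c2` is already owned. Rounding up per file means the grand total can exceed the unique stored bytes by at most one partial block per file.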

Although the choice of which file a chunk is assigned to is arbitrary
(it depends on the user's access pattern), the sizes are consistent
within the entire mount point. This makes it possible to use tools like
du and ncdu to inspect the actual disk usage of a repository, rather
than just the original, uncompressed, non-deduplicated file sizes.
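Since du essentially adds up `st_blocks`, the following Python sketch computes both the allocated total (what du reports by default) and the apparent total (like GNU `du --apparent-size`) for the regular files under a directory, counting hard links once. Pointed at the FUSE mount point, the first number is the one this change is meant to make meaningful. It assumes a Unix-like system, since `st_blocks` is not available on Windows; `disk_usage` is a hypothetical helper name.

```python
import os
import stat


def disk_usage(root):
    """Return (allocated_bytes, apparent_bytes) for regular files under root.

    allocated_bytes sums st_blocks * 512 (roughly what `du` shows);
    apparent_bytes sums st_size (roughly `du --apparent-size`).
    """
    allocated = apparent = 0
    seen = set()  # (device, inode) pairs, so hard links are counted once
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, devices, sockets, etc.
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            allocated += st.st_blocks * 512
            apparent += st.st_size
    return allocated, apparent
```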

ThomasWaldmann (Contributor)

Can we get some opinions on this PR?

Is there a chance that this might confuse users, if the block counts look more or less random compared to the original file size?

dnnr (Author) commented Mar 6, 2015

On the one hand, yes. But on the other hand, those values are currently simply set to 1, i.e., they're mostly wrong and meaningless anyway. More importantly, I'd say that the semantics of that field are actually correct this way. It's supposed to represent the "size used on disk", and can therefore differ arbitrarily from the nominal file size, precisely because of the effects of compression, deduplication, sparse files, or whatever else is going on in the underlying file system.

So of course someone might claim to be confused by those values, but I actually can't think of any better way of populating st_blocks that wouldn't be at least equally confusing. At least this way it's consistent.
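To illustrate why the result is consistent even though the assignment is arbitrary, this small Python simulation (hypothetical data and function name, assuming files are lists of `(chunk_id, stored_size)` pairs and each chunk goes to the first file that references it) assigns ownership in two different access orders. The per-file numbers change, but the total always equals the stored size of the unique chunks.

```python
def assign_owners(files, order):
    """Simulate chunk ownership for a given access order.

    files maps a file name to a list of (chunk_id, stored_size) pairs;
    order is the sequence in which files are first stat'ed. Each chunk is
    charged to the first file that references it. Returns the number of
    stored bytes charged to each file.
    """
    owner = {}                           # chunk_id -> owning file
    owned = {name: 0 for name in files}  # file -> bytes charged to it
    for name in dict.fromkeys(order):    # visit each file once, in order
        for chunk_id, size in dict(files[name]).items():  # unique chunks only
            if owner.setdefault(chunk_id, name) == name:
                owned[name] += size
    return owned


files = {
    "a": [("c1", 1024), ("c2", 512)],
    "b": [("c2", 512), ("c3", 2048)],
}
print(assign_owners(files, ["a", "b"]))  # {'a': 1536, 'b': 2048}
print(assign_owners(files, ["b", "a"]))  # {'a': 1024, 'b': 2560}
```

Both orders add up to 3584 bytes, the size of `c1`, `c2` and `c3` stored once each, which is what du would report for the whole tree (modulo rounding to 512-byte units).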
