restic stats: print uncompressed size in mode raw-data #3915
10c7e0d to 837b816
The PR could also use a changelog entry once we've finished deciding on what exactly to count for TotalUncompressedSize.
b38a06a to 42a94e0
While writing the changelog entry, I realised that printing the number of compressed blobs could be useful too, especially when a repository has been migrated from version 1 to version 2 but not yet repacked to compress the old data, or to track progress when it is being compressed in several steps. Printing the number of compressed blobs unconditionally clutters the output a little, though: in some cases now, and in most cases in the future (since compression is on by default for new repositories), the number of compressed blobs equals the total number of blobs. I therefore added a condition that skips it in that case. We should probably discuss it, though.
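The condition described above could look roughly like the following sketch. The type `blobCounts`, the function `formatBlobCounts`, and the output labels are hypothetical names chosen for illustration; they are not taken from the restic code base.

```go
package main

import (
	"fmt"
	"strings"
)

// blobCounts is a hypothetical container for the two counters discussed
// in the comment; the field names are illustrative only.
type blobCounts struct {
	Total      uint64 // all blobs that were counted
	Compressed uint64 // blobs that are stored compressed
}

// formatBlobCounts renders the blob statistics and omits the compressed
// count when every blob is already compressed, since the line would only
// repeat the total.
func formatBlobCounts(c blobCounts) string {
	var b strings.Builder
	fmt.Fprintf(&b, "Total Blob Count: %d\n", c.Total)
	if c.Compressed != c.Total {
		fmt.Fprintf(&b, "Compressed Blob Count: %d\n", c.Compressed)
	}
	return b.String()
}

func main() {
	// Partially compressed repository: both lines are printed.
	fmt.Print(formatBlobCounts(blobCounts{Total: 100, Compressed: 40}))
	// Fully compressed repository: only the total is printed.
	fmt.Print(formatBlobCounts(blobCounts{Total: 100, Compressed: 100}))
}
```

The trade-off is that the output shape then depends on the repository state, which is part of what the comment puts up for discussion.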
Wouldn't the size of the remaining uncompressed blobs be more useful to judge how much work is left to fully compress a repository? E.g.
I considered that, but since the compressed size of a pack is generally unrelated to its uncompressed size, the ratio of compressed to uncompressed size didn't feel as informative as the ratio of the number of compressed blobs to the total number of blobs. Also, "total size" and "total uncompressed size" can already be a little confusing, and adding "total size of uncompressed blobs" wouldn't make it any better. It might be better to add a new mode for all the compression-related statistics, because many of them could be useful to a user who specifically cares about them but not to everybody. I'm thinking about at least some of:
Some of these only make sense if a repository/snapshot is only partially compressed. Better names would probably be needed, too.
But what does that ratio tell me? As blob sizes can vary wildly, it's a bit random whether that ratio is representative or not. My feeling is that the information a user is interested in is how much data still has to be compressed, or which fraction of the overall data size has already been compressed (1 - size of still uncompressed blobs / total uncompressed size). That information might be more useful in the prune command, though. Btw, the repack size given to
I'd prefer to keep the number of different statistics somewhat limited. After all, each new value means more code to maintain.
A partially compressed repository is intended to be a transitional state, thus we shouldn't add too many statistics which are only useful for such repositories. Judging from the discussion in https://forum.restic.net/t/how-to-check-if-files-were-compressed/5392 However, I'm still not convinced by
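To make the objection to a count-based ratio concrete, here is a small sketch that compares progress measured by blob count with progress measured by uncompressed size. The `blob` type and both helper functions are hypothetical and only serve the illustration; they do not mirror restic's index structures.

```go
package main

import "fmt"

// blob is a hypothetical, simplified record of a single blob.
type blob struct {
	UncompressedSize uint64 // plaintext size in bytes
	Compressed       bool   // whether the blob is stored compressed
}

// progressByCount returns the fraction of blobs that are compressed.
func progressByCount(blobs []blob) float64 {
	if len(blobs) == 0 {
		return 0
	}
	n := 0
	for _, b := range blobs {
		if b.Compressed {
			n++
		}
	}
	return float64(n) / float64(len(blobs))
}

// progressBySize returns the fraction of the total uncompressed size
// that belongs to blobs which are already compressed.
func progressBySize(blobs []blob) float64 {
	var total, done uint64
	for _, b := range blobs {
		total += b.UncompressedSize
		if b.Compressed {
			done += b.UncompressedSize
		}
	}
	if total == 0 {
		return 0
	}
	return float64(done) / float64(total)
}

func main() {
	// Three small compressed blobs and one large uncompressed blob.
	blobs := []blob{
		{UncompressedSize: 1024, Compressed: true},
		{UncompressedSize: 1024, Compressed: true},
		{UncompressedSize: 1024, Compressed: true},
		{UncompressedSize: 13312, Compressed: false},
	}
	fmt.Printf("by count: %.2f%%\n", 100*progressByCount(blobs))
	fmt.Printf("by size:  %.2f%%\n", 100*progressBySize(blobs))
}
```

With this made-up input, the count-based figure is 75% while the size-based figure is 18.75%, which shows how far the two can diverge when blob sizes vary.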
4470a19 to 4d2dcbe
Yeah, it's better to display only what most users will want to see. I just pushed some new commits that calculate and display the compression progress, defined as the percentage of data in the repository that has been compressed. I also added the compression ratio, which was not easy to obtain because it must be calculated on the data that is actually compressed and we were not providing the necessary information, and the space saving, calculated on the actual state of the repository. The latter is derived directly from the uncompressed and compressed data sizes, but we might as well print it and save users some time.
This is the output on a partially compressed test repository:
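As a rough illustration of the three statistics described in this comment (this is neither the PR's code nor its output), the sketch below computes them from four size totals. The `sizes` type, its field names, and the exact formulas are assumptions based on the conventional definitions: compression ratio as uncompressed size divided by stored size for the compressed blobs only, progress as the share of uncompressed data that is already compressed, and space saving as one minus stored size over uncompressed size for the whole repository.

```go
package main

import "fmt"

// sizes holds hypothetical byte totals; the field names are illustrative.
type sizes struct {
	Stored                 uint64 // bytes stored for all blobs
	Uncompressed           uint64 // plaintext bytes of all blobs
	CompressedStored       uint64 // bytes stored for compressed blobs only
	CompressedUncompressed uint64 // plaintext bytes of compressed blobs only
}

// compressionRatio is computed only over blobs that are actually
// compressed, so uncompressed leftovers do not dilute it.
func compressionRatio(s sizes) float64 {
	if s.CompressedStored == 0 {
		return 0
	}
	return float64(s.CompressedUncompressed) / float64(s.CompressedStored)
}

// compressionProgress is the percentage of plaintext data that is
// already stored in compressed form.
func compressionProgress(s sizes) float64 {
	if s.Uncompressed == 0 {
		return 0
	}
	return float64(s.CompressedUncompressed) / float64(s.Uncompressed) * 100
}

// spaceSaving is the percentage saved relative to storing everything
// uncompressed, based on the current state of the repository.
func spaceSaving(s sizes) float64 {
	if s.Uncompressed == 0 {
		return 0
	}
	return (1 - float64(s.Stored)/float64(s.Uncompressed)) * 100
}

func main() {
	// Half of the data (512 of 1024 bytes) has been compressed to 256
	// bytes; the other 512 bytes are still stored uncompressed.
	s := sizes{
		Stored:                 768,
		Uncompressed:           1024,
		CompressedStored:       256,
		CompressedUncompressed: 512,
	}
	fmt.Printf("ratio: %.2fx\n", compressionRatio(s))
	fmt.Printf("progress: %.2f%%\n", compressionProgress(s))
	fmt.Printf("space saving: %.2f%%\n", spaceSaving(s))
}
```

For these made-up totals the ratio is 2, the progress is 50%, and the space saving is 25%. Keeping the ratio restricted to compressed blobs is what requires the two extra totals, which matches the remark that the necessary information was not previously available.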
4d2dcbe to 4e35e2b
The information provided to the user looks fine now. However, I'd like to reduce the number of new JSON fields a bit (see comments). Adding 6 new fields with varying degrees of redundancy is too much.
4e35e2b to 8fe1b9c
Calculate and display compression ratio, space saving and progress
8fe1b9c to a6f83e0
LGTM. Thanks for keeping up with all the review comments!
What does this PR change? What problem does it solve?
This PR prints the uncompressed size of a snapshot/repository when restic stats is called in mode raw-data. For example:
Was the change previously discussed in an issue or on the forum?
No
Checklist
changelog/unreleased/ that describes the changes for our users (see template).
gofmt on the code in all commits.