Print out the backup size when listing snapshots (enhancement) #693

Open
yhafri opened this Issue Dec 10, 2016 · 14 comments

yhafri commented Dec 10, 2016

Output of restic version

Any.

Expected behavior

Adding an extra column to list the size of the backup (in bytes) can be very useful.
It'll help distinguish between different backups just by checking their size.

$ restic snapshots
ID        Date                 Host        Tags        Directory    Size
--------------------------------------------------------------------------
5b969a0e  2016-12-09 15:10:32  localhost               myfile       390865

Actual behavior

$ restic snapshots
ID        Date                 Host        Tags        Directory
----------------------------------------------------------------------
5b969a0e  2016-12-09 15:10:32  localhost               myfile

Member

fd0 commented Dec 10, 2016

Thanks for the suggestion. What would you expect the size to be? Since all data is deduplicated, a "size" for a particular snapshot is not that easy to determine. Would that be the size of all data referenced in that snapshot? Or the data that was not yet stored in the repo when the snapshot was taken (new data)?
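To make the two candidate definitions concrete, here is a minimal sketch in Python. It assumes a toy model in which a snapshot is just a set of blob IDs and the repository maps blob IDs to sizes; the names `blob_sizes`, `referenced_size`, and `new_data_sizes` are made up for illustration and are not restic's data structures or API.

```python
# Toy model (an assumption, not restic's repository format): the repo maps
# blob IDs to sizes in bytes, and a snapshot is the set of blob IDs it uses.
blob_sizes = {"a": 100, "b": 250, "c": 40, "d": 500}

# Snapshots in the order they were taken, oldest first.
snapshots = [
    ("snap1", {"a", "b"}),
    ("snap2", {"a", "b", "c"}),
    ("snap3", {"a", "c", "d"}),
]


def referenced_size(blobs):
    """Definition 1: size of all distinct blobs referenced by the snapshot."""
    return sum(blob_sizes[b] for b in blobs)


def new_data_sizes(snaps):
    """Definition 2: size of blobs not yet in the repo when the snapshot was taken."""
    seen = set()
    result = {}
    for name, blobs in snaps:
        result[name] = sum(blob_sizes[b] for b in blobs - seen)
        seen |= blobs
    return result


sizes_total = {name: referenced_size(blobs) for name, blobs in snapshots}
sizes_new = new_data_sizes(snapshots)
print(sizes_total)  # {'snap1': 350, 'snap2': 390, 'snap3': 640}
print(sizes_new)    # {'snap1': 350, 'snap2': 40, 'snap3': 500}
```

In this model the two numbers agree only for the first snapshot; afterwards the "new data" figure depends on what was already stored.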

fd0 added the feature label Dec 10, 2016


Member

zcalusic commented Dec 10, 2016

This is a very good proposal. The number on the right should be the cumulative size of blobs added to the repo. It is the most interesting quantitative parameter of any backup run.

How much space did my incremental waste tonight? Oops, it's 10x more than last night; I left some junk somewhere (or forgot to add some excludes), so I'd better clean it up. ;)

yhafri commented Dec 10, 2016

+1 for @zcalusic's suggestion


Member

fd0 commented Dec 11, 2016

The problem with the size of "new" blobs (added by that particular snapshot) is that it becomes less relevant over time, because those blobs will be referenced by later snapshots. In addition, when earlier snapshots are removed, the number of blobs referenced by a particular snapshot will grow.

I think it's valuable to print this information right after the backup is complete, and we can also record it in the snapshot data structure in the repo. I've planned to add some kind of 'detail' view for a particular snapshot, and I think it is a good idea to display the number and size of new blobs there, but in the overview (command snapshots) it's not relevant enough. There, I think restic should display the whole size of a particular snapshot (what you get if you were to restore it), because that doesn't change.
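As a rough sketch of that split between an overview and a detail view, the following Python models a snapshot record that optionally carries size statistics. Every field and function name here (`SnapshotRecord`, `restore_size`, `blobs_added`, `bytes_added`, `overview_row`, `detail_view`) is hypothetical and chosen for illustration; none of it is restic's actual snapshot format or output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnapshotRecord:
    # Hypothetical fields, not restic's snapshot format.
    short_id: str
    time: str
    host: str
    path: str
    restore_size: Optional[int] = None  # bytes you would get on restore; does not change
    blobs_added: Optional[int] = None   # recorded once, when the backup completes
    bytes_added: Optional[int] = None   # recorded once, when the backup completes


def overview_row(s):
    """One line for the overview list: only the stable restore size is shown."""
    size = "" if s.restore_size is None else str(s.restore_size)
    return f"{s.short_id}  {s.time}  {s.host}  {s.path}  {size}".rstrip()


def detail_view(s):
    """Detail view for one snapshot: also shows what the backup run added."""
    size = "unknown" if s.restore_size is None else f"{s.restore_size} bytes"
    lines = [f"snapshot {s.short_id}", f"  restore size: {size}"]
    if s.blobs_added is not None and s.bytes_added is not None:
        lines.append(f"  new blobs:    {s.blobs_added} ({s.bytes_added} bytes)")
    return "\n".join(lines)
```

Making the statistics optional lets snapshots written before such fields existed still be listed, just without a size.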

mgumz commented May 9, 2017

I was instantly reminded of the statistics flag of rdiff-backup (see https://www.systutorials.com/docs/linux/man/1-rdiff-backup-statistics/ ). Sometimes it's nice to see some sort of delta between two snapshots.


Member

fd0 commented May 14, 2017

Indeed, but that's a different thing: It's computed live and compares two snapshots. We may add something like that, but doing that for the snapshots overview list is too expensive (at least with the information we have available in the data structures right now).

bj0 commented Oct 20, 2017

It could be useful to know the size of the data 'unique' to the snapshot vs. the total size (including dedup'd data) of the snapshot.

alexeymuranov commented Nov 13, 2017

IMO it would be quite useful to have an idea of how much extra space was used for a new snapshot. This could even be just the physical storage space, computed during backup and stored in the snapshot's metadata. If some snapshot is removed, this metadata should then be invalidated in all later snapshots.

I think I would appreciate such a feature even if nothing else is done in this direction. However, an option to recalculate this "extra size" after some previous backups were removed would also be nice. I think this is what BackupLoupe does for Time Machine on Mac OS. (The deduplication in Time Machine is very basic, but the problem of defining the "size of a snapshot" is the same.)


Contributor

rawtaz commented Feb 6, 2018

The most fundamental thing I'd like to know off the bat is how much disk space the contents of snapshot X would consume on the target disk if I restored it.

Preferably I would also be able to get this information for only a subset of the files, e.g. if there was a size command that took the same type of include/exclude options as the restore command. Or if the restore command had an option that makes it just report statistics like this instead of actually restoring.
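A minimal sketch of such a filtered size report, in Python. It assumes the snapshot's file list is available as `(path, size)` pairs and uses shell-style globs from the standard library; the function name `restore_size` and the pattern semantics are assumptions for illustration and may not match how restic's own include/exclude options behave.

```python
from fnmatch import fnmatchcase


def restore_size(files, include=(), exclude=()):
    """Sum the sizes of files that would be restored under the given filters.

    files:   iterable of (path, size_in_bytes) pairs
    include: if non-empty, a path must match at least one of these globs
    exclude: a path matching any of these globs is skipped
    """
    total = 0
    for path, size in files:
        if include and not any(fnmatchcase(path, pat) for pat in include):
            continue
        if any(fnmatchcase(path, pat) for pat in exclude):
            continue
        total += size
    return total


# Made-up file list for illustration.
files = [
    ("/home/user/notes.txt", 1200),
    ("/home/user/photos/a.jpg", 350000),
    ("/home/user/photos/b.jpg", 420000),
    ("/home/user/.cache/tmp.bin", 9000000),
]

print(restore_size(files))                                   # 9771200
print(restore_size(files, exclude=["*/.cache/*"]))           # 771200
print(restore_size(files, include=["/home/user/photos/*"]))  # 770000
```

This only walks metadata, so nothing would need to be written to the target disk to get the number.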

larsks commented Feb 6, 2018

Thanks @rawtaz for pointing me at this issue.

I'm storing backups in metered storage (Backblaze B2). I want to know how much new data I'm creating every time I run a backup. It seems like this ought to be easy to calculate during the backup process; I would be happy if restic would simply log that as part of concluding a backup...but it seems like it might also be useful to store this as an attribute of the snapshot (so it can be queried in the future).

I am not really interested in anything that requires extensive re-scanning of the repository, since that will simply incur additional charges.
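The idea of counting new data during the backup itself, with no later re-scan, can be sketched as follows in Python. This is a toy under stated assumptions: `ToyRepo` is a made-up class, chunks are fixed-size rather than content-defined, and packing, encryption overhead, and metadata are ignored.

```python
import hashlib


class ToyRepo:
    """Toy content-addressed store; not restic's repository implementation."""

    def __init__(self):
        self.blobs = {}  # blob ID (SHA-256 hex digest) -> size in bytes

    def backup(self, data, chunk_size=4):
        """Store data in fixed-size chunks and return (blobs_added, bytes_added).

        The counters are updated on the write path, so reporting them at the
        end of the run needs no extra reads from the repository.
        """
        blobs_added = 0
        bytes_added = 0
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            blob_id = hashlib.sha256(chunk).hexdigest()
            if blob_id not in self.blobs:  # only unseen chunks cost storage
                self.blobs[blob_id] = len(chunk)
                blobs_added += 1
                bytes_added += len(chunk)
        return blobs_added, bytes_added


repo = ToyRepo()
print(repo.backup(b"aaaabbbbcccc"))  # (3, 12): everything is new
print(repo.backup(b"aaaabbbbdddd"))  # (1, 4): only the last chunk is new
```

The returned pair could be logged when the backup finishes and stored alongside the snapshot so it can be queried later.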

er1z commented Feb 13, 2018

Any news?

simeydk commented Jul 4, 2018

Hello

I would like to second this suggestion. In addition to 'How big would this snapshot be if I restored it' for any existing snapshot and 'how much did this snapshot add' when a snapshot is created, I have a third suggestion:

It would also help to be able to answer the question: 'By how much would my repo size reduce if I remove the following snapshot(s)?' This would be useful in restic forget --prune --dry-run when deciding whether to drop snapshots. For example, I recently dropped 20 of the 40 snapshots in a repo, and it reduced the size from 1.1GB to 1.0GB. Had I known this would only have saved 100MB, I likely would have kept the older snapshots.
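One way to picture that estimate is as a set difference: the bytes that would be freed belong to blobs referenced only by the snapshots being dropped. The Python sketch below assumes a toy model (snapshots as sets of blob IDs, sizes in a dict); `reclaimable_bytes` is a hypothetical helper, not a restic function, and a real repository would also depend on how blobs are packed.

```python
def reclaimable_bytes(blob_sizes, snapshots, to_forget):
    """Bytes referenced only by the snapshots in to_forget.

    blob_sizes: dict of blob ID -> size in bytes
    snapshots:  dict of snapshot name -> set of blob IDs it references
    to_forget:  set of snapshot names that would be removed
    """
    kept = set()
    dropped = set()
    for name, blobs in snapshots.items():
        if name in to_forget:
            dropped |= blobs
        else:
            kept |= blobs
    # A blob can be deleted only if no remaining snapshot still needs it.
    return sum(blob_sizes[b] for b in dropped - kept)


# Made-up data for illustration.
blob_sizes = {"a": 100, "b": 250, "c": 40, "d": 500}
snapshots = {
    "snap1": {"a", "b"},
    "snap2": {"a", "b", "c"},
    "snap3": {"a", "c", "d"},
}

print(reclaimable_bytes(blob_sizes, snapshots, {"snap1"}))           # 0
print(reclaimable_bytes(blob_sizes, snapshots, {"snap1", "snap2"}))  # 250
print(reclaimable_bytes(blob_sizes, snapshots, {"snap3"}))           # 500
```

Dropping `snap1` alone frees nothing here because `snap2` still references the same blobs, which mirrors the situation where removing half the snapshots barely shrinks the repo.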


Contributor

dimejo commented Jul 4, 2018

@mholt made #1729 to show some stats. Maybe he can chime in to say something about the progress of this PR.


Contributor

mholt commented Jul 4, 2018

@dimejo It's done -- just waiting for it to be reviewed/merged. :)
