
Feedback on actual backup data I/O #1033

Open
lloeki opened this Issue Jun 15, 2017 · 7 comments

@lloeki
Contributor

lloeki commented Jun 15, 2017

Output of restic version

restic 0.6.1

How did you start restic exactly? (Include the complete command line)

restic backup / --exclude-file $HOME/.restic.excluded

What backend/server/service did you use?

S3

Expected behavior

Restic should output statistics about data actually sent (and possibly received) in addition to processed data. This could be live feedback, a summary (rsync style), or both.

Example:

[16:16] 100.00%  7.349 MiB/s  6.983 GiB / 6.983 GiB  548143 / 548144 items  17MiB in  123.4MiB out  0 errors  ETA 0:02

Actual behavior

[16:16] 100.00%  7.349 MiB/s  6.983 GiB / 6.983 GiB  548143 / 548144 items  0 errors  ETA 0:02

Restic outputs progress based on data processed, but lacks any indication of data actually sent. When evaluating restic as a backup solution I had to set up third-party monitoring to estimate the actual savings and eventual bandwidth cost. As it is, restic gives the initial impression that it does a full backup every time.

Steps to reproduce the behavior

Do a backup.

@fd0

Member

fd0 commented Jun 15, 2017

Thanks for raising this issue explicitly. I'm aware of the problem, and it'll be addressed when I get around to reworking the archiver code. I'll then build a new reporting framework so we can have better stats.

@fd0 fd0 added the enhancement label Jun 15, 2017

@alphapapa

alphapapa commented Aug 17, 2017

Something similar to Obnam's summary would be nice:

Backed up 12002 files (of 269574 found), containing 29.0 GiB.
Uploaded 389.0 MiB file data in 1h5m15s at 101.9 KiB/s average speed.
Total download amount 450.0 MiB.
Total upload amount 430.0 MiB. Overhead was 491.0 MiB (114.1 %).
@kurin

Contributor

kurin commented Oct 23, 2017

This just came up in the IRC channel. I'm wondering if there's anything that can be put into place prior to a new reporting framework, but which the framework can take advantage of.

@kurin

Contributor

kurin commented Oct 23, 2017

It looks like all the interaction between restic and the backend happens through a Repository type:

https://godoc.org/github.com/restic/restic/internal/restic#Repository

I don't know the life cycle of a Repository but it seems reasonable to me that a single invocation of restic might talk to the same Repository for two or more logically different tasks (e.g. backup AND prune), even if that's not the norm.

So as an interface, it makes sense to me that what we'd want is something like a Session type, so that bandwidth usage can be apportioned to the appropriate session. A few ways to do this occur to me:

  • Add a new method to the Repository interface and type that allows for stats introspection.
    • Least intrusive change.
    • Conflates "repo" and "session".
  • Add a wrapper type that takes a Repository and returns a Session, and use the Session everywhere.
    • Requires more changes, but separates responsibility.
    • Repository and Session are still pretty tightly coupled.
    • Repo/Session are 1:1.
  • Create a Session type independently, and then use it to create Repository types.
    • Fairly de-coupled implementation.
    • 1:N repo/session mapping.
  • Create a Session type and attach it to a Repository as needed.
    • N:M repo/session mapping.

I'm leaning toward the last one myself. I don't think switching Session types midstream would buy us anything; just make a new Repository instead.

Under the covers, all the backends would need to be modified to use the Session to create custom net.Conn types. This is fairly trivial for the backends that are HTTP clients, since http.Transport has facilities for that, but it wouldn't shock me if there are less flexible backends.

@kurin

Contributor

kurin commented Oct 23, 2017

Er, that should read "toward the third one".

@kurin

Contributor

kurin commented Oct 24, 2017

I have a working implementation. It's a little rough though.

kurin@peridot:~/src/github.com/restic/restic$ ./restic -r b2:rustic2 backup .
scan [/home/kurin/src/github.com/restic/restic]
scanned 1060 directories, 4849 files in 0:00
[0:21] 100.00%  8.304 MiB/s  174.391 MiB / 174.391 MiB (up: 176.165 MiB down: 292.854 KiB)  5909 / 5909 ite... ETA 0:00 
duration: 0:21, 7.98MiB/s
snapshot 02500797 saved
kurin@peridot:~/src/github.com/restic/restic$ ./restic -r b2:rustic2 backup .
using parent snapshot 02500797
scan [/home/kurin/src/github.com/restic/restic]
scanned 1060 directories, 4849 files in 0:00
[0:01] 100.00%  174.391 MiB/s  174.391 MiB / 174.391 MiB (up: 0B down: 0B)  5909 / 5909 items  0 errors  ETA 0:00 
duration: 0:01, 88.16MiB/s
snapshot 770ba416 saved
kurin@peridot:~/src/github.com/restic/restic$ echo what > okay
kurin@peridot:~/src/github.com/restic/restic$ ./restic -r b2:rustic2 backup .
using parent snapshot 770ba416
scan [/home/kurin/src/github.com/restic/restic]
scanned 1060 directories, 4850 files in 0:00
[0:03] 100.00%  58.130 MiB/s  174.391 MiB / 174.391 MiB (up: 6.916 KiB down: 22.178 KiB)  5910 / 5910 items... ETA 0:00 
duration: 0:03, 54.99MiB/s
snapshot 221d49cf saved
@fd0

Member

fd0 commented Oct 29, 2017

FYI, I'm currently reworking the archiver code, let's wait until that is done.

fd0 added a commit that referenced this issue Jan 28, 2018

Remove archiver progress "data processed" bandwidth
This commit removes the bandwidth displayed during the backup process. It
is misleading and seldom correct, because it is neither the "read
bandwidth" (only for the very first backup) nor the "upload bandwidth".
Many users are (rightly) confused by it, cf. #1581, #1033, #1591.

We'll eventually replace this display with something more relevant when
 #1494 is done.