Feature: Backup data read from stdin #255

Closed
fd0 opened this Issue Aug 10, 2015 · 14 comments

7 participants
@fd0
Member

fd0 commented Aug 10, 2015

This issue tracks implementing the feature that restic reads data from stdin and saves it to the repo (with dedup etc).

I can imagine several use cases, e.g.:

  • Saving SQL dumps directly: mysqldump db | restic backup -
  • Saving complete devices: dd if=/dev/sda | restic backup -

Before implementing this, we need to decide how to save this in the JSON data structures. I'd like restic to behave consistently here, e.g. when the repo is mounted with the fuse backend, data read from stdin should appear as a file. I'd also like to be able to write this data to a file.

Any other thoughts or use cases that haven't been mentioned so far? Do you like the syntax (restic backup -), or should we rather use restic backup --stdin?

@bchapuis

Contributor

bchapuis commented Aug 10, 2015

The shorthand - for stdin is good, but what about making it optional, with something like mysqldump db | restic backup? That's closer to pipe chains such as cat foo | grep bar | sort | less.

@cfcs

cfcs commented Aug 11, 2015

I'm a fan of - for explicitness. If a shell script uses a variable to specify the target, and that target turns out to be empty for some reason (by mistake or due to filesystem errors or whatever), I'd like an error message and a non-zero exit code rather than having restic block waiting for input and upload that.
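
To make the argument for an explicit - concrete, here is a minimal sketch of target handling that refuses a missing or empty target instead of silently falling back to stdin. The names `resolve_targets`, `reads_stdin` and `TargetError` are hypothetical and only illustrate the idea; this is not restic's argument parser.

```python
from typing import List


class TargetError(ValueError):
    """Raised when a backup target is missing or empty (hypothetical)."""


def resolve_targets(args: List[str]) -> List[str]:
    """Validate backup targets; '-' is the explicit marker for stdin.

    An unquoted empty shell variable yields no argument at all, a quoted
    one yields an empty string. Both cases are rejected here.
    """
    if not args:
        raise TargetError("no backup target given; use '-' to read from stdin")
    for arg in args:
        if arg == "":
            raise TargetError("empty backup target (unset shell variable?)")
    return args


def reads_stdin(targets: List[str]) -> bool:
    """Return True only if stdin was requested explicitly with '-'."""
    return "-" in targets
```

With this shape, a script that expands an unset variable fails fast with an error, whereas an implicit stdin default would block waiting for input.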

@bchapuis

Contributor

bchapuis commented Aug 28, 2015

I just had a look at this issue and at the way chunking is implemented. Restic reads files twice, once for chunking and once for persisting. ATM, if we want to persist data from stdin without refactoring the chunker, the best solution would probably be to save stdin to a temporary file and back up that file. WDYT?

@rawtaz

Contributor

rawtaz commented Aug 28, 2015

Sounds like a lot of writing to do, and it requires disk space. Compare that with just the normal reads and no disk space :)

@bchapuis

Contributor

bchapuis commented Aug 28, 2015

Yes, but I don't think it's possible to read from stdin twice without writing to a temporary location. If the typical use case is a mysql dump, then keeping everything in memory until the second read may not be feasible.
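
For illustration, the temporary-file workaround discussed above could look roughly like the sketch below. The helper name `spool_to_tempfile` is made up for this example; the point is only that a pipe can be read once, so a second pass needs a seekable copy, which costs disk space proportional to the input.

```python
import io
import shutil
import tempfile
from typing import BinaryIO


def spool_to_tempfile(stream: BinaryIO, bufsize: int = 1 << 20) -> BinaryIO:
    """Copy a non-seekable stream into an anonymous temporary file.

    The copy is done in bufsize pieces, so memory use stays bounded even
    for a large dump. The returned file is positioned at offset 0.
    """
    tmp = tempfile.TemporaryFile()
    shutil.copyfileobj(stream, tmp, bufsize)
    tmp.seek(0)
    return tmp


# Usage with an in-memory stand-in for stdin (sys.stdin.buffer in practice).
demo = spool_to_tempfile(io.BytesIO(b"-- SQL dump --\n" * 4))
first_pass = demo.read()   # e.g. the chunking pass
demo.seek(0)
second_pass = demo.read()  # e.g. the persisting pass
demo.close()
```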

@fw42

Member

fw42 commented Aug 28, 2015

if we want to (...) without refactoring the chunker

Why do we not want to refactor the chunker? If that's the cleaner solution, let's do that.

@fd0

Member

fd0 commented Aug 30, 2015

Absolutely! If the chunker interface needs to be refactored, let's do that! It's our own component, not a holy cow ;)

@pvgoran

pvgoran commented Oct 14, 2015

What I would like to see is the ability to back up (and then restore!) special files in general, not just stdin. My immediate use case is backing up a VM image residing in an LVM logical volume. Currently I don't see a way of backing up my image with restic, except for copying it to a temporary file.

Out of three similar backup programs (attic, borg and restic), only borg supports this much-needed and obvious feature, with the --read-special flag on backup and the --stdout flag on restore.

@pvgoran

pvgoran commented Oct 14, 2015

Regarding reading files twice. I think this is a bad thing for at least two reasons:

  • Reading data twice slows things down. In many cases this won't be noticeable due to caching, or because read speed is not a limiting factor. On the other hand, I can easily imagine how it can become a limiting factor for a multi-core machine with not-so-fast disks doing backups of large files.
  • If a file is being changed during backup, the second read pass may see different data, which could confuse the backup algorithms to the point of a crash, an invalid backup or repository corruption.
    • These are just speculative worst cases. But even if the algorithms are explicitly protected from changing data, this protection may be incomplete, or introduce some undesired side effects.
    • Of course, in a perfect world we would only back up frozen data, but in the real world enforcing this requirement would be too cumbersome in many cases (personally I'm only prepared to go as far as creating a snapshot for my VM volume, but not snapshotting and remounting all of my filesystems), which would violate the first design goal.
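
The second point can be shown with a toy model. It is a sketch under stated assumptions, not a description of restic's internals: `two_pass`, `single_pass` and `is_consistent` are hypothetical helpers, and a `bytearray` stands in for a file that changes between reads.

```python
import hashlib
from typing import Callable, Optional, Tuple


def two_pass(source: bytearray,
             mutate: Optional[Callable[[bytearray], None]] = None) -> Tuple[str, bytes]:
    """Pass 1 computes the ID, pass 2 re-reads the data to store it."""
    chunk_id = hashlib.sha256(bytes(source)).hexdigest()
    if mutate is not None:
        mutate(source)  # the file is modified between the two passes
    data = bytes(source)
    return chunk_id, data


def single_pass(source: bytearray) -> Tuple[str, bytes]:
    """Read once, then hash and store the very same buffer."""
    data = bytes(source)
    return hashlib.sha256(data).hexdigest(), data


def is_consistent(chunk_id: str, data: bytes) -> bool:
    """Check that the stored bytes really hash to the recorded ID."""
    return hashlib.sha256(data).hexdigest() == chunk_id
```

In the two-pass variant a concurrent write can leave an ID that does not match the stored bytes. In the single-pass variant the ID is always derived from exactly what gets stored, even if that is a torn view of the file.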
@fd0

Member

fd0 commented Oct 14, 2015

I think going forward and refactoring restic to do single-pass is a good thing, including refactoring the chunker interface. This simplifies the design, gives us a better handle on resource usage (especially memory) by limiting the number of large buffers available, and makes reading from e.g. stdin possible.
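
As a rough picture of what single-pass means here, the sketch below chunks a stream while reading it and stores each chunk under its hash, so the input never has to be seekable. It is not restic's chunker: the rolling hash is a simplified Gear-style stand-in, the dict is a stand-in for the repository, and all names and size parameters are assumptions chosen for the example.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Iterator, List

# Deterministic per-byte table for the toy rolling hash (assumption).
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big")
        for i in range(256)]


def chunk_stream(stream: BinaryIO, mask: int = 0x3F, min_size: int = 16,
                 max_size: int = 1024, bufsize: int = 4096) -> Iterator[bytes]:
    """Yield content-defined chunks while reading the stream exactly once."""
    h = 0
    chunk = bytearray()
    while True:
        block = stream.read(bufsize)
        if not block:
            break
        for b in block:
            chunk.append(b)
            h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
            at_boundary = len(chunk) >= min_size and (h & mask) == 0
            if at_boundary or len(chunk) >= max_size:
                yield bytes(chunk)
                chunk.clear()
                h = 0
    if chunk:
        yield bytes(chunk)  # trailing partial chunk


def backup_stream(stream: BinaryIO, store: Dict[str, bytes]) -> List[str]:
    """Store each chunk under its SHA-256; known chunks are not stored again."""
    ids = []
    for chunk in chunk_stream(stream):
        chunk_id = hashlib.sha256(chunk).hexdigest()
        store.setdefault(chunk_id, chunk)
        ids.append(chunk_id)
    return ids


def restore(ids: List[str], store: Dict[str, bytes]) -> bytes:
    """Reassemble the original data from the ordered list of chunk IDs."""
    return b"".join(store[i] for i in ids)
```

Only one chunk plus one read buffer is held in memory at a time, and because chunks are addressed by content, backing up the same stream twice adds nothing new to the store.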

Is it still necessary to have something like borg's --read-special? Or is reading from stdin sufficient?

@pvgoran

pvgoran commented Oct 14, 2015

A separate flag is definitely worth it:

  • We may need stdin for other purposes (password entry, anyone?).
  • Specifying a special file on the command line will yield a sensible filename/target recorded in the snapshot.
@fd0

Member

fd0 commented Oct 14, 2015

Your second argument is convincing.

@fd0 fd0 removed the rfc label May 8, 2016

@fd0 fd0 referenced this issue May 8, 2016

Merged

Allow reading data from stdin #509

2 of 2 tasks complete

@fd0 fd0 self-assigned this May 8, 2016

@fd0 fd0 closed this in #509 May 9, 2016

@angristan

angristan commented Jun 10, 2018

Hello,

Does deduplication currently work with stdin snapshots?

@fd0

Member

fd0 commented Jun 10, 2018

Yes
