Show history of file #3073

Open
aawsome opened this issue Nov 8, 2020 · 6 comments

Comments

@aawsome
Contributor

aawsome commented Nov 8, 2020

Output of restic version

restic 0.11.0 (v0.11.0-42-g9e4e0077) compiled with go1.14.7 on linux/amd64

What should restic do differently? Which functionality do you think we should add?

Add a way to show some kind of "history" for one or more files. One option would be to add a flag to restic find.

E.g. restic find --history --long /data/my_file.txt might produce something like:

Found matching entries in snapshot 3e8ff4a9 from 2020-02-04 03:53:08 (+3 subsequent snapshots)
-rw-r--r--  1000  1000      6 2020-02-04 03:41:48 /data/my_file.txt

Found matching entries in snapshot b41a0aa2 from 2020-02-04 04:10:27 
-rw-r--r--  1000  1000      6 2020-02-04 03:58:10 /data/my_file.txt

Found matching entries in snapshot c3d8da1e from 2020-02-04 04:17:20 (+1 subsequent snapshot)
-rw-r--r--  1000  1000      6 2020-02-04 04:15:23 /data/my_file.txt

Of course, if this is built on find, the snapshots should also be sorted/grouped by path and date. I just realized that find does not currently do this.

What are you trying to do? What problem would this solve?

If a file is backed up by an automated procedure, it will usually be contained in many snapshots.
Now imagine you need this file and realize it has been "damaged" (e.g. by a user trying to work on it). You may want to get the last "undamaged" version from your backup.

However, restic so far can only produce a list of snapshots in which the file is contained, and you have to go through all of them manually to find the right version. This applies even if the file was changed only a few times. So it would be handy to have restic determine how many different "versions" of this file really exist in the backup and which snapshots can be used to access them.
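
A minimal sketch of the grouping this would require, written in Python purely for illustration (restic itself is written in Go). It assumes the per-snapshot metadata for the file has already been collected; the `Entry` type and the `group_versions` function are hypothetical names, not part of restic, and treating "same mtime and size" as "same version" is an assumption that trusts the recorded metadata.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Entry:
    """One occurrence of a file in a snapshot (hypothetical structure)."""
    snapshot: str  # short snapshot ID
    time: str      # snapshot time, formatted so that it sorts as a string
    mtime: str     # modification time of the file in that snapshot
    size: int      # file size in bytes


def group_versions(entries):
    """Collapse runs of consecutive snapshots holding the same file version.

    Returns a list of (first_entry, subsequent_count) pairs, one per run,
    ordered by snapshot time.
    """
    ordered = sorted(entries, key=lambda e: e.time)
    result = []
    # groupby only merges neighbours, so a file that reverts to an older
    # state later on starts a new run instead of joining the old one.
    for _, run in groupby(ordered, key=lambda e: (e.mtime, e.size)):
        run = list(run)
        result.append((run[0], len(run) - 1))
    return result
```

Each returned pair corresponds to one "Found matching entries" block in the proposed output: the first snapshot of the run plus the number of subsequent snapshots that contain the same version.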

Did restic help you today? Did it make you happy in any way?

Backing up shared directories (where many users can write files) with restic makes me feel much more relaxed. Having many users with write access greatly increases the risk of accidental errors. I'm happy to have a very good backup utility in restic here which simply works!

@greatroar
Contributor

This is potentially very useful, but I'm not convinced it warrants the exact feature proposed here. If you mount your repository, you can already script something very similar with ease:

cd "$restic_mountpoint/hosts/laptop"

# [^l]* skips "latest"
for f in [^l]*/data/myfile; do
    stat "$f"
    sha256sum "$f"
done
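
The same idea can be sketched in Python so that snapshots with identical content are grouped together. This is only an illustration under assumptions: the directory layout is the one used in the loop above (one directory per snapshot plus a "latest" entry), and `versions_in_mount` is a hypothetical helper name. Like the shell loop, it reads every copy of the file in full.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def versions_in_mount(host_dir, relative_path):
    """Map each SHA-256 digest to the snapshot directories containing it."""
    versions = defaultdict(list)
    for snap_dir in sorted(Path(host_dir).iterdir()):
        # "latest" duplicates the newest snapshot, so skip it.
        if snap_dir.name == "latest" or not snap_dir.is_dir():
            continue
        candidate = snap_dir / relative_path
        if not candidate.is_file():
            continue  # the file is not present in this snapshot
        digest = hashlib.sha256()
        with candidate.open("rb") as fh:
            # Read in 1 MiB chunks to keep memory use bounded.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        versions[digest.hexdigest()].append(snap_dir.name)
    return dict(versions)
```

Calling `versions_in_mount(mountpoint + "/hosts/laptop", "data/myfile")` would return one dictionary entry per distinct file content, each listing the snapshot directories that hold it.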

@cfbao

cfbao commented Nov 8, 2020

@aawsome
Contributor Author

aawsome commented Nov 12, 2020

I know that, if you trust the mtime, it is also sufficient to use

restic find --long  /data/my_file.txt | grep -v "Found matching\|^$" | uniq

which would produce in my example

-rw-r--r--  1000  1000      6 2020-02-04 03:41:48 /data/my_file.txt
-rw-r--r--  1000  1000      6 2020-02-04 03:58:10 /data/my_file.txt
-rw-r--r--  1000  1000      6 2020-02-04 04:15:23 /data/my_file.txt

(You would still need to find the snapshots in which those "versions" are referenced.)

And with some more scripting, you could of course get the desired result using restic mount as suggested by @greatroar, or with an enhanced version of restic find or restic ls where the SHA256 of the files is computed for each snapshot.

But this feature proposal is about adding this functionality to restic directly, as I can imagine that this is something users ask themselves quite often, and those users may already use something like restic find.

Moreover, I don't think that using file hashes is a good way to go. We would need to load all blobs for each snapshot to hash them. For small files the cache in mount would do, but for large files this can be very slow. I would simply use the Contents plus other metadata of the restic.Node to check for differences. This would allow getting the results by reading only tree blobs.
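
A rough sketch of such a metadata-only comparison, in Python for illustration. The dictionary keys ("content", "size", "mtime", "mode") are assumptions about how a restic.Node might look once decoded from a tree blob, and `version_key` and `same_version` are hypothetical helpers, not restic functions.

```python
def version_key(node):
    """Build a comparable key for a file node without reading file data.

    `node` is a dict assumed to mirror a decoded restic.Node:
    "content" is the list of data blob IDs, the rest is plain metadata.
    """
    return (
        tuple(node.get("content") or ()),  # blob IDs identify the file data
        node.get("size"),
        node.get("mtime"),
        node.get("mode"),
    )


def same_version(a, b):
    """Two nodes are the same version if content IDs and metadata match."""
    return version_key(a) == version_key(b)
```

Because the key is built only from what is stored in the tree, comparing two snapshots never touches the data blobs themselves. Including mtime and mode in the key means a pure metadata change also counts as a new version; dropping those fields would compare content only.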

Thanks @cfbao for your references. Didn't find the discussion in the forum. Maybe this is some sort of duplicate of #2072.

@lorenz
Contributor

lorenz commented Jun 20, 2021

I want to develop a small graphical utility similar to what Déjà Dup/Nautilus provide, where you can right-click a file and restore old versions or right-click in a directory to restore files. What I'd like to present to the user is exactly what's being asked for here, i.e. a list of versions and the times at which they were snapshotted. Any pointers on how I should do this and what I need to consider?

@deliciouslytyped

deliciouslytyped commented Mar 5, 2022

This would be nice for managing files. I'm not sure mtime can be trusted in all cases, and re-hashing a multi-terabyte remote repository is not an option.

@mfschumann

mfschumann commented Jul 27, 2023

I was looking for file history functionality and ended up using restic find --json with some jq magic:

restic find "$search_string" --json | jq '[.[] | .snapshot as $snapshot | .matches[] | {snapshot: $snapshot, mtime: .mtime, path: .path}] | group_by(.path)[] | unique_by(.mtime)[] | (.mtime + " " + .snapshot + " " + .path)'

This outputs a list of all the historic modifications of the files matching $search_string side-by-side with the id of the first snapshot recording each modification. For a situation where fileA and fileB each have a single historic version recorded in the same snapshot, and fileC and fileD each have two historic versions, the output looks like this:

"2022-06-14T07:46:40+02:00 e5d69d00ead5366d8b1cc066fbed1e802483799f3f541704922512ea679f5a09 /fileA"
"2022-05-28T20:11:23+02:00 e5d69d00ead5366d8b1cc066fbed1e802483799f3f541704922512ea679f5a09 /fileB"
"2023-06-20T20:09:22+02:00 e5d69d00ead5366d8b1cc066fbed1e802483799f3f541704922512ea679f5a09 /fileC"
"2023-07-27T06:46:14+02:00 16c01993a608df36a25b4c3981a86d75388a11998b0aff9d689729f97b7cfbe7 /fileC"
"2023-06-22T20:46:47+02:00 838071291ecb5a6629b70bdcc74b28c4fd9236a3e90883ee9d6fd1e26c03f331 /fileD"
"2023-07-03T19:39:42+02:00 e5d69d00ead5366d8b1cc066fbed1e802483799f3f541704922512ea679f5a09 /fileD"

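For anyone who prefers a script over a jq filter, the same logic can be sketched in Python. It relies only on the fields the jq filter above already uses ("snapshot", "matches", "path", "mtime"); `first_snapshot_per_version` is a hypothetical helper name, and which snapshot counts as "first" depends on the order in which restic lists the results.

```python
def first_snapshot_per_version(find_output):
    """Return (mtime, snapshot, path) for the first snapshot of each version.

    `find_output` is the parsed JSON list produced by `restic find --json`:
    objects with a "snapshot" ID and a "matches" list whose items carry
    "path" and "mtime".
    """
    seen = {}
    for result in find_output:
        for match in result.get("matches", []):
            key = (match["path"], match["mtime"])
            # Keep the first snapshot listed for each (path, mtime) pair.
            seen.setdefault(key, result["snapshot"])
    # Sort by path, then by mtime, like group_by/unique_by in the jq filter.
    return [(mtime, snap, path) for (path, mtime), snap in sorted(seen.items())]
```

The input can be obtained by piping the restic output into a script and parsing it with `json.load(sys.stdin)`. As with the jq version, this trusts the mtime to distinguish versions.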