Add content hash to ls --json #2870

wojas · 2020-08-04T10:06:56Z

Output of `restic version`

restic 0.9.6 (v0.9.6-337-g0b21ec44-dirty)

What should restic do differently? Which functionality do you think we should add?

Add a content hash to the ls --json output.

When the file is stored in a single chunk, the real SHA-256 hash is available and the reported hash will have the format "sha256:...".

When the file was split into multiple chunks, it is not possible to show a real content hash, because restic does not currently store this information. We can however construct a hash out of the chunk hashes. To distinguish this from a real sha256 of the contents, this hash will have the format "multi:...".

This 'multi' hash can only be compared within a single repo, as different repos will split files in different locations.

I have a PR ready that I will add in a moment.

What are you trying to do?

I am trying to figure out if any original pictures have succumbed to bitrot since my initial restic backup of them.

Did restic help you today? Did it make you happy in any way?

Keeping the full backup history of my pictures in an efficient way makes it possible to recover from bitrot in the future.

aawsome · 2020-08-04T15:22:59Z

What you are basically looking for is a way to check whether your local files match the state of a backup snapshot. I agree that this would be a nice and useful extension to restic. It is similar to #2011.

However, I don't agree that listing hashes in ls is a good way to solve this. As you already pointed out, hashes for files in the repository that have been split into more than one chunk are not available to restic.
Honestly, I really dislike the "hack" of creating an artificial hash from the chunk hashes.

A solution using ls would need to print all chunk hashes together with the sizes of the chunks. This would allow an external program to split the files itself into the same chunks and check the hashes of these chunks.

But I would prefer to have this implemented in restic itself, either as an extension to the restore command or as a new command, e.g. verify.

greatroar · 2020-08-05T10:05:44Z

I share @aawsome's objection. The suggestion is to introduce a hash that doesn't correspond to anything in the restic object model and is also not the hash of a file on disk, so its usefulness is very limited. There must be a cleaner way to compare files that also works with a file outside the repo.

wojas · 2020-08-05T10:22:45Z

I'm also not too happy with the 'multi' solution. Ideally restic would store the full file hash in the metadata, but this would impose a performance cost during backup. I don't think that cost is really worth it just for this.

My original approach just included the list of content hashes in the output. Unfortunately these content hashes are not very useful by themselves, unless of course you want to fetch the content. In order to check any local files, you would need to either know the chunk sizes or know the split secret and apply the same algorithm yourself.

This size information is currently unfortunately not available. Perhaps this is something we could add to the metadata and then print both Content and ContentSize slices? This would only work for newer snapshots, but that is OK I guess.

For my purposes of comparing files between different snapshots, just having the list of content hashes would suffice. Would my PR be acceptable if it simply exposed the list of Content IDs in the JSON and got rid of the weird multi hash? I guess this is useful anyway, because it allows you to reconstruct the file contents.
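
A minimal sketch of that comparison, assuming each JSON line for a file carries a `content` array of chunk IDs. That field is the proposal under discussion, not something the thread confirms exists, and the `node` type and `sameContent` function are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// node models only the fields this sketch needs from one JSON line.
type node struct {
	Path    string   `json:"path"`
	Content []string `json:"content"` // ASSUMPTION: proposed list of chunk IDs
}

// parseNode decodes a single JSON line into a node.
func parseNode(line string) (node, error) {
	var n node
	err := json.Unmarshal([]byte(line), &n)
	return n, err
}

// sameContent reports whether two files consist of the same chunks in the
// same order. This is only meaningful within a single repository.
func sameContent(a, b node) bool {
	if len(a.Content) != len(b.Content) {
		return false
	}
	for i := range a.Content {
		if a.Content[i] != b.Content[i] {
			return false
		}
	}
	return true
}

func main() {
	older, _ := parseNode(`{"path":"/pics/a.jpg","content":["aa11","bb22"]}`)
	newer, _ := parseNode(`{"path":"/pics/a.jpg","content":["aa11","cc33"]}`)
	fmt.Println(sameContent(older, newer))
}
```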

I can also add a new issue to discuss the addition of a ContentSizes slice.

aawsome · 2020-08-05T11:37:27Z

> This size information is currently unfortunately not available. Perhaps this is something we could add to the metadata and then print both Content and ContentSize slices? This would only work for newer snapshots, but that is OK I guess.

The chunk size information is available to restic, it is saved in the index. Simply use something like

```go
// Look up the stored length of each content chunk in the repository index.
for i, id := range node.Content {
	size[i] = repo.Index().Lookup(id, restic.DataBlob)[0].Length
}
```

However, I would still prefer to have a verify command or something similar within restic...

wojas · 2020-08-05T14:08:19Z

> The chunk size information is available to restic, it is saved in the index. Simply use something like

Thanks! I will try this and see how it performs.

> However, I would still prefer to have a verify command or something similar within restic...

I agree that this would be very useful. I currently do not have enough time to make any promises, but I may have a look at how much effort that would take.

This does not, however, preclude making the ls JSON output more useful for those who want to do some restic data mining.

MichaelEischer · 2020-10-10T09:28:46Z

@wojas Your use case sounds a lot like #805 to me. Storing a hash covering a complete file was already requested in #1620.

wojas · 2022-01-06T10:26:13Z

I have closed the PR implementing this clunky solution and will close this issue. #805 and #1620 would both solve the original issue I wanted to address. Thanks!

wojas mentioned this issue Aug 4, 2020: Add content_hash to ls --json output #2871 (Closed)

MichaelEischer added the "state: need feedback" and "state: need triaging" labels Oct 10, 2020

wojas closed this as completed Jan 6, 2022
