restic prune failing - tree XXX not found in repository #2700

Closed
robertfoss opened this issue Apr 21, 2020 · 22 comments · Fixed by #3246

Comments

@robertfoss

Output of restic version

restic 0.9.6 compiled with go1.12.12 on linux/amd64

How did you run restic exactly?

$ sudo -E restic -r /media/auto/nas_robertfoss/restic prune
enter password for repository: 
repository 237fdd10 opened successfully, password is correct
counting files in repo
building new index for repo
[3:52:27] 100.00%  440951 / 440951 packs
incomplete pack file (will be removed): 01e37cc714147b7ec051e8eb4cf045e71c69b1b10e18eb83ed9fe9338c2d24d6
incomplete pack file (will be removed): 02166c70e25a51cb5b5b44d3b9702c0a540f6ca842c19e4f78bd1774ca782957
incomplete pack file (will be removed): 02b80c6de7f81142a5e096891069209b88a4da91e1c84ae316a40f2b5238bcd4
incomplete pack file (will be removed): 0be78f6a34de32f800ad05cd49347b27ca16ccc1185e7858481faf785aa9992a
incomplete pack file (will be removed): 0d3bddd90c2748e6f1acd70ed286283d346f17d8c012c2ed606dd1aa6bac2f06
incomplete pack file (will be removed): 11a70cf10e5fc08fe6875b8c988c17af3447c99c6082f82006e9dd14a26b635e
incomplete pack file (will be removed): 11c8c8ef5e80cd654115769b87a3bf7c1d4fefaf5a4ec226209f3627b7159969
incomplete pack file (will be removed): 1efc58add323f3e1e2aae1d751d1be6f26b90d1c5a408cee3edffe1de02e4b4b
incomplete pack file (will be removed): 2d1763064c0a39abf21377bcdf056ad6f4e12a7b93cd28b3dbb98002dd91813d
incomplete pack file (will be removed): 31c299e7f80a3836a394da0795e8ed65d7277dfed9e3863db013bc0b4d99d9c7
incomplete pack file (will be removed): 3557c70b0a76e3303ad51ac755a2ab577d6c041640cd015da345308f492b563b
incomplete pack file (will be removed): 36745b2ec4fffe49ecf6a9483d74a7a7c352e077ec7b830d4d162b0d16739a6c
incomplete pack file (will be removed): 44090b5084dbe0746a68151a5980d5078666b8cd92b889032a99a4297ceca7c3
incomplete pack file (will be removed): 4433545485727ef6406307e1d18ded514ea59e6230fba5deeba28f82436cbf71
incomplete pack file (will be removed): 4b0be68275743cdf4e69dd795840050d038eb69d3f6d0f3c36cfc5d7d01d991b
incomplete pack file (will be removed): 4c4dd542a0300096250e01b86f71dd2f5a2e9e7e4b2cb9242df953ab432814cf
incomplete pack file (will be removed): 4df36fec011b6add8541e9c20064322b2292297f931fe525f46c88c69db8d752
incomplete pack file (will be removed): 4f654a34453ba4ff32aabd25013081c660ee9895cfe1482d764fa53d31f392e3
incomplete pack file (will be removed): 4fa5793656a1514e2846a4ff57aa2357333f92eb29617682760d42b72771cd8b
incomplete pack file (will be removed): 512ba62d659a52d26ce0db2e4bf09da509cac1f418fe9db151bf2a5ac830cd5d
incomplete pack file (will be removed): 59a07eafe7a187400ab744521edd9800dede1064260dad1b3f32080d090c2b5a
incomplete pack file (will be removed): 60719e8a0247033dcd8bc1c89dfd34cdf67499e423145ab568b32ac9b209e00d
incomplete pack file (will be removed): 625243d5fee5a4bdd8dbb33d9864a375b424a15c9fd7d190d244f2f1eb4e8ec3
incomplete pack file (will be removed): 64ecbbf7d6365c73a1e431b64697eed338fc8e130800dc261f268611b07b8447
incomplete pack file (will be removed): 684040807b3515b1336cc22d5ebe750e3dd0fa7a12e99d5808b1ee834a784f81
incomplete pack file (will be removed): 6a4e046d6de8f89f4e9a7a87dba5bcbf43e1b150148646a20f67130e9421ade4
incomplete pack file (will be removed): 6b6d7766ac57c254e8ed539fcb1f74d2722d024fc231abe7e5b8d26e3367ef01
incomplete pack file (will be removed): 6e05c04f66e9e8335dd4b3f5217c0edf389ce3f2cd53362273edddc7b2026c76
incomplete pack file (will be removed): 6e3407887327ac22b1271d22096b0fa817701166da8a0d7b1200b6795fd0ca3d
incomplete pack file (will be removed): 70c4133e9a9262d537e1c46eb6b09df5c9c04b8028108f3ca7d1a4b9e614f293
incomplete pack file (will be removed): 70ce970d142d08dd86bf4f4e9e1f7c2b6d5dca3655e72bea9a74c56de0608d69
incomplete pack file (will be removed): 74dac8ca2412961d27289c47562af19b62251ba83db41655a809a2aa3e04d1c4
incomplete pack file (will be removed): 780de5f377e4db9d7b5d383b530bd8e87768268f4e36d8182e9b8dcfd62f1292
incomplete pack file (will be removed): 784ace35191fc681b3a90fba99654d6b3983dc9e224f5ee96ca50101e1cc3c57
incomplete pack file (will be removed): 7e6fd087f8bcb9d77d97284c05568f0bb79db6425129a4b6796b27c115196629
incomplete pack file (will be removed): 7fe7e27c4aa423d015e85a9eb8abe7939af51ef24a0fb0c39c46d7d1f4260b8a
incomplete pack file (will be removed): 84cac8089d827106faf06a5c8bc9446573c75f0ac233749130c81b7e3538b0b6
incomplete pack file (will be removed): 899ff4b1a0d24b818b3b106d950804a747ba2a7f72676b7bd9b81c48306eaadd
incomplete pack file (will be removed): 9397dbad057de654ed96524e16db2a7101811352ebcb08d153ab96a2628720fe
incomplete pack file (will be removed): 946ad13a3cec6953448fc8947bd3f418ea1be1a9d38ac39b80dd108ad7a797f9
incomplete pack file (will be removed): 9ae6b1f02d3e4f2008bf9411213b51e697a703bfddfb5383165b1b17fceaf3bc
incomplete pack file (will be removed): 9b7662aeab336fef8dfba60399631c92069fd8514aea97d1175d92f7c8aeade8
incomplete pack file (will be removed): a0b52520a5c2b9f4f477e8c4155e269a24aaad5ac1c12c36fbf56072b28f2036
incomplete pack file (will be removed): a15b2d9db7b0955b1a0ebf4f76364b4a87ed0c830adf539be8aa592922e19ba7
incomplete pack file (will be removed): a6f12ba5da6420a4171b866a5c114ab7e558820331dfa82c5fbc909371a1c96c
incomplete pack file (will be removed): a889ced3aaba4a3b6c7536a4262b703c90244d842e0808f942819443fdd8049d
incomplete pack file (will be removed): b574b68758b3ad8f3eaf16f26c1af4e127b1a93af17e6f9b1ec5a17ad322452c
incomplete pack file (will be removed): bc36699cfcc6babb8670c6a27eef2ed23acc7f091e82bbc7102017ffab792dc6
incomplete pack file (will be removed): bd7b650d802217450570976e56e43185b36910b0a12ff5b73bf04a07a95c3c50
incomplete pack file (will be removed): bf384616635115743e79b6c25ebd0e65bc7c8b3fa27090e3719837811b753f9b
incomplete pack file (will be removed): c55c9b36a53c3953306fd4f003750b9c164c3c95d634becdd68b36a9343c45d7
incomplete pack file (will be removed): c5d74ba19f733815d2f26cf21a23cde22328d20d162bf16f8a9ff3cfa3f7a1d4
incomplete pack file (will be removed): c6d47d52ac7cb2f67c59226da1f488bcd77cb4e539be0069c9131585b2da8d64
incomplete pack file (will be removed): ce6f710e03a54ce10f747531efcdf3d6e54edbab42de62ddfb8eb955b3b0caaf
incomplete pack file (will be removed): d44fa38cf9ffb990049db2bbf233e7dee56678304326b834bd67259f71989112
incomplete pack file (will be removed): d8832cf344b82f26c10a68c65694ebd1a21e51ac117d874cdc82c84f1eb1dfd2
incomplete pack file (will be removed): dbe65af86c8eea1ec694c25180e9a2ca46f0047d7c65f4b48789eb48cafd0934
incomplete pack file (will be removed): dc6adc442ffd91c8841ef40133b45d0e27ebe2f45eda0a5ca5b698daf828261d
incomplete pack file (will be removed): de6d33643239da4809e9ff6ccdbb66f49ca31df8ea33e8378ba60fee00c751f7
incomplete pack file (will be removed): e4015a03a9b69818f2ba631f37d0cc713a3a4c8e73c1aba92327cf0bbead69b9
incomplete pack file (will be removed): ea3bf0acf2326a2c730e01fc233916686e4c1243ea327110bce1b0639b78238a
incomplete pack file (will be removed): ea896919de15181889c941a093021f90ef5b8340bc9cd894b9a91ca93b75751f
incomplete pack file (will be removed): ee0af7c8d44a05b74083bf20d9116ddb80f93f3f907da8a7132c6d5861df03b2
incomplete pack file (will be removed): ef6c72c5c9e34bb96456e53e8f646376416a6194db2260200c6764a1e9626a9f
incomplete pack file (will be removed): ef6e0530aec863e408e09860ff91121df08f3870d0a251c78b2b2cbff05a6f9b
incomplete pack file (will be removed): f003cd6ebfa976771f8d7e0a76eb41d14c92d62d962c3c74fcc67dc87769d209
incomplete pack file (will be removed): f095ad5213e7b4bae17fb627db806f41f644f8412f9ae89b8dddb1e2c57cc69c
incomplete pack file (will be removed): f460b877db2fe4148cb67826c70f757dedbd0f4be44da59536e17ca2863f82fe
incomplete pack file (will be removed): fcfc1622fe2967de167c5a6b2f1838117daf1b26dc3d11a9731d16ef04cb5188
repository contains 440882 packs (21714968 blobs) with 2.009 TiB
processed 21714968 blobs: 1546276 duplicate blobs, 53.897 GiB duplicate
load all snapshots
find data that is still in use for 42 snapshots
tree 51a481d9674fe804a2900e71f9a95ea8c1083d2167629db4549e0c9693ecf79b not found in repository
github.com/restic/restic/internal/repository.(*Repository).LoadTree
	github.com/restic/restic/internal/repository/repository.go:713
github.com/restic/restic/internal/restic.FindUsedBlobs
	github.com/restic/restic/internal/restic/find.go:11
main.pruneRepository
	github.com/restic/restic/cmd/restic/cmd_prune.go:191
main.runPrune
	github.com/restic/restic/cmd/restic/cmd_prune.go:85
main.glob..func18
	github.com/restic/restic/cmd/restic/cmd_prune.go:25
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/cobra/command.go:762
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/cobra/command.go:850
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/cobra/command.go:800
main.main
	github.com/restic/restic/cmd/restic/main.go:86
runtime.main
	/usr/lib/go-1.12/src/runtime/proc.go:200
runtime.goexit
	/usr/lib/go-1.12/src/runtime/asm_amd64.s:1337

What backend/server/service did you use to store the repository?

Local filesystem (backed by CIFS)

Expected behavior

Prune not failing

Actual behavior

Prune failing

Steps to reproduce the behavior

restic -r /media/auto/nas_robertfoss/restic prune

Do you have any idea what may have caused this?

Likely some corruption in the repository. Not that I have any particular reasons to think it is corrupt.

Do you have an idea how to solve the issue?

No

Did restic help you today? Did it make you happy in any way?

Not today, but most other days :)

@MichaelEischer
Member

restic is currently not able to find some of the directories referenced in one of the snapshots. Either they are just missing from the index, or the data is actually missing. The repository contains quite a lot of incomplete pack files, which suggests that several of your backup runs got interrupted for some reason?

Did you run prune or check before? If yes, did they complete successfully (probably yes?)? Please try to locate the affected snapshots by running restic find --tree 51a481d9674fe804a2900e71f9a95ea8c1083d2167629db4549e0c9693ecf79b. It would be interesting to know whether this is for example the latest snapshot or maybe some random older one.

Please post the output of ls -la for some of the incomplete packs. Pack fcfc1622fe2967de167c5a6b2f1838117daf1b26dc3d11a9731d16ef04cb5188 would be located in /media/auto/nas_robertfoss/restic/data/fc/fcfc1622fe2967de167c5a6b2f1838117daf1b26dc3d11a9731d16ef04cb5188.
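
The path rule described here (a data folder, then a subfolder named after the first two hex digits of the pack ID, then the full ID) can be sketched in a few lines of Python. This is only an illustration for a local-filesystem repository; the function name pack_path is made up for this example and is not part of restic.

```python
from pathlib import PurePosixPath


def pack_path(repo_root: str, pack_id: str) -> PurePosixPath:
    """Return where a pack with the given full ID would live in a local repo.

    Hypothetical helper for illustration; restic itself offers no such function.
    """
    pack_id = pack_id.lower()
    if len(pack_id) != 64 or any(c not in "0123456789abcdef" for c in pack_id):
        raise ValueError("expected a full 64-character hex pack ID")
    # Packs are sharded into subfolders named after the first two hex digits.
    return PurePosixPath(repo_root) / "data" / pack_id[:2] / pack_id
```

Calling it with the repository root and one of the IDs from the prune output gives the path to pass to ls -la.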

@robertfoss
Author

robertfoss commented Apr 22, 2020

I just ran prune, I'll try check right away and return with some logs.

restic find --tree

$ sudo -E restic -r /media/auto/nas_robertfoss/restic find --tree 51a481d9674fe804a2900e71f9a95ea8c1083d2167629db4549e0c9693ecf79b
enter password for repository: 
repository 237fdd10 opened successfully, password is correct
Unable to load tree 51a481d9674fe804a2900e71f9a95ea8c1083d2167629db4549e0c9693ecf79b
 ... which belongs to snapshot 10fb5a8e6383ea0d0df0be97d0eee4b4f9dec202d8267bce2c36a9e543a140de.
Unable to load tree f020d28451380b6016988b68aedd3c651f77714d79ede165fafa1b625ade3705
 ... which belongs to snapshot 1249e139297b7bd14d1bfbf8dacebbbccebbcbad479d3ed242bb7169e0b87d4f.
Unable to load tree b7dff0cb6f9623b90391b17058073e46d883ae8d69a9233c1d4fc7521b36368c
 ... which belongs to snapshot 1249e139297b7bd14d1bfbf8dacebbbccebbcbad479d3ed242bb7169e0b87d4f.
Unable to load tree b7dff0cb6f9623b90391b17058073e46d883ae8d69a9233c1d4fc7521b36368c
 ... which belongs to snapshot 1249e139297b7bd14d1bfbf8dacebbbccebbcbad479d3ed242bb7169e0b87d4f.
Unable to load tree b7dff0cb6f9623b90391b17058073e46d883ae8d69a9233c1d4fc7521b36368c
 ... which belongs to snapshot 1249e139297b7bd14d1bfbf8dacebbbccebbcbad479d3ed242bb7169e0b87d4f.
Unable to load tree 82daf085b93d2ffa9d20ced9c35407e8a4adc6d9207f50de4278299a30d1a170
 ... which belongs to snapshot 1249e139297b7bd14d1bfbf8dacebbbccebbcbad479d3ed242bb7169e0b87d4f.
Unable to load tree 82daf085b93d2ffa9d20ced9c35407e8a4adc6d9207f50de4278299a30d1a170
 ... which belongs to snapshot 1249e139297b7bd14d1bfbf8dacebbbccebbcbad479d3ed242bb7169e0b87d4f.
Unable to load tree 82daf085b93d2ffa9d20ced9c35407e8a4adc6d9207f50de4278299a30d1a170
 ... which belongs to snapshot 1249e139297b7bd14d1bfbf8dacebbbccebbcbad479d3ed242bb7169e0b87d4f.
Unable to load tree 82daf085b93d2ffa9d20ced9c35407e8a4adc6d9207f50de4278299a30d1a170
 ... which belongs to snapshot 1249e139297b7bd14d1bfbf8dacebbbccebbcbad479d3ed242bb7169e0b87d4f.
Unable to load tree 82daf085b93d2ffa9d20ced9c35407e8a4adc6d9207f50de4278299a30d1a170
 ... which belongs to snapshot 1249e139297b7bd14d1bfbf8dacebbbccebbcbad479d3ed242bb7169e0b87d4f.
Unable to load tree 32bcf4b2bdf9797bdc143caa6ed083531aa8906ef6e1b2a0160401b958e8a7fb
 ... which belongs to snapshot 1249e139297b7bd14d1bfbf8dacebbbccebbcbad479d3ed242bb7169e0b87d4f.
Unable to load tree 32bcf4b2bdf9797bdc143caa6ed083531aa8906ef6e1b2a0160401b958e8a7fb
 ... which belongs to snapshot 1249e139297b7bd14d1bfbf8dacebbbccebbcbad479d3ed242bb7169e0b87d4f.
Unable to load tree 32bcf4b2bdf9797bdc143caa6ed083531aa8906ef6e1b2a0160401b958e8a7fb
 ... which belongs to snapshot 1249e139297b7bd14d1bfbf8dacebbbccebbcbad479d3ed242bb7169e0b87d4f.
[followed by thousands of similar lines]

ls -la

$ ls -la /media/auto/nas_robertfoss/restic/data/fc/fcfc1622fe2967de167c5a6b2f1838117daf1b26dc3d11a9731d16ef04cb5188 
-rwxr-xr-x 1 robertfoss root 5865472 Aug  6  2019 /media/auto/nas_robertfoss/restic/data/fc/fcfc1622fe2967de167c5a6b2f1838117daf1b26dc3d11a9731d16ef04cb5188

@MichaelEischer
Member

MichaelEischer commented Apr 22, 2020

Hmm, the size of the pack file seems somewhat reasonable. Can you run shasum -a 256 <filename> for that pack file?
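
For context on why the checksum is useful: restic names a pack file after the SHA-256 hash of its contents, so an intact pack hashes to its own file name. A minimal Python sketch of the same check that shasum -a 256 allows by hand is shown below; the helper names are invented for this example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large packs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pack_matches_name(path):
    """True if the file's SHA-256 equals its file name (the pack ID)."""
    return sha256_of_file(path) == Path(path).name
```

A False result means the bytes read from storage are not the bytes restic originally wrote under that name; it does not say where the damage happened.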

check will also output a very long list of complaints, probably even more than restic find did. The output of the latter command looks like the problem affects multiple snapshots and not just snapshot 1249e1392? Or is that snapshot the problem most of the time?

As a next step it would probably be best to run restic rebuild-index (please create a copy of the index-folder in the repository first) to reconstruct the index which might let the files show up again.

@robertfoss
Author

robertfoss commented Apr 22, 2020

I ran restic rebuild-index previously while trying to fix these issues. I don't have any logs, but it didn't change the behavior of restic prune.

$ shasum -a 256 /media/auto/nas_robertfoss/restic/data/fc/fcfc1622fe2967de167c5a6b2f1838117daf1b26dc3d11a9731d16ef04cb5188  
d9b630df7bca16b3a873ed17d61be57923d5c8a2234687ef19c88671f168b7f3  /media/auto/nas_robertfoss/restic/data/fc/fcfc1622fe2967de167c5a6b2f1838117daf1b26dc3d11a9731d16ef04cb5188

As for whether other snapshots are failing: the restic find --tree run above mentioned more snapshots (but I had to cut the logs short).
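
With thousands of lines, the question of which snapshots are affected is easier to answer by grouping the find --tree output. The sketch below assumes the two-line format shown in the log above ("Unable to load tree ..." followed by "... which belongs to snapshot ..."); the function name is made up for this example.

```python
import re
from collections import defaultdict

TREE_RE = re.compile(r"Unable to load tree ([0-9a-f]{64})")
SNAPSHOT_RE = re.compile(r"which belongs to snapshot ([0-9a-f]{64})")


def missing_trees_by_snapshot(lines):
    """Map each snapshot ID to the set of tree IDs that could not be loaded."""
    result = defaultdict(set)
    pending_tree = None
    for line in lines:
        tree = TREE_RE.search(line)
        if tree:
            # Remember the tree until the following line names its snapshot.
            pending_tree = tree.group(1)
            continue
        snapshot = SNAPSHOT_RE.search(line)
        if snapshot and pending_tree is not None:
            result[snapshot.group(1)].add(pending_tree)
            pending_tree = None
    return dict(result)
```

Feeding it the saved log (for example via open("find.log")) yields one entry per affected snapshot, with repeated tree IDs collapsed.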

@robertfoss
Author

robertfoss commented Apr 23, 2020

restic check

$ restic check /media/auto/nas_robertfoss/restic
using temporary cache in /tmp/restic-check-cache-150523380
created new cache in /tmp/restic-check-cache-150523380
create exclusive lock for repository
load indexes
check all packs
pack efa872a4: not referenced in any index
pack fa8f1f68: not referenced in any index
pack f25d7900: not referenced in any index
pack f9800651: not referenced in any index
pack ff8e0e0d: not referenced in any index
pack f71586d6: not referenced in any index
pack f509a9e3: not referenced in any index
pack f3268676: not referenced in any index
pack f614b4f9: not referenced in any index
pack fbcf100f: not referenced in any index
pack e84bf949: not referenced in any index
pack f558c503: not referenced in any index
pack f2bf5627: not referenced in any index
[...]

error for tree ce295984:
  tree ce295984: file "boards.h" blob 0 size could not be found
  tree ce295984: file "bsp.c" blob 0 size could not be found
  tree ce295984: file "bsp.h" blob 0 size could not be found
  tree ce295984: file "bsp_btn_ant.c" blob 0 size could not be found
  tree ce295984: file "bsp_btn_ant.h" blob 0 size could not be found
  tree ce295984: file "bsp_btn_ble.c" blob 0 size could not be found
  tree ce295984: file "bsp_btn_ble.h" blob 0 size could not be found
  tree ce295984: file "n5_starterkit.h" blob 0 size could not be found
  tree ce295984: file "nrf6310.h" blob 0 size could not be found
  tree ce295984: file "pca10000.h" blob 0 size could not be found
  tree ce295984: file "pca10001.h" blob 0 size could not be found
  tree ce295984: file "pca10003.h" blob 0 size could not be found
  tree ce295984: file "pca10028.h" blob 0 size could not be found
  tree ce295984: file "pca10031.h" blob 0 size could not be found
  tree ce295984: file "pca10036.h" blob 0 size could not be found
  tree ce295984: file "pca10040.h" blob 0 size could not be found
  tree ce295984: file "pca20006.h" blob 0 size could not be found
  tree ce295984: file "wt51822.h" blob 0 size could not be found
  tree ce295984, blob 7950188b: not found in index
  tree ce295984, blob 5f6f99e6: not found in index
  tree ce295984, blob 4503e8c6: not found in index
  tree ce295984, blob 6d86cb57: not found in index
  tree ce295984, blob 2a5e90eb: not found in index
  tree ce295984, blob 521e6984: not found in index
  tree ce295984, blob 7f2e2584: not found in index
  tree ce295984, blob 4c00c613: not found in index
  tree ce295984, blob 4c56e9f0: not found in index
  tree ce295984, blob 616cf24f: not found in index
  tree ce295984, blob 65cabc05: not found in index
  tree ce295984, blob 0a5639f2: not found in index
  tree ce295984, blob 92e5201b: not found in index
  tree ce295984, blob f643e389: not found in index
  tree ce295984, blob 62339edb: not found in index
  tree ce295984, blob 1a89ffc6: not found in index
  tree ce295984, blob c8798dd0: not found in index
  tree ce295984, blob 140be077: not found in index
error for tree c5328a11:
  tree c5328a11: file "bsp_btn_ble.c" blob 0 size could not be found
  tree c5328a11: file "wt51822.h" blob 0 size could not be found
  tree c5328a11, blob 521e6984: not found in index
  tree c5328a11, blob 140be077: not found in index

@MichaelEischer
Member

Do you know whether some of your backup runs were interrupted, maybe by network problems? 67 incomplete packs should not appear out of nowhere. Unless this is a problem with the underlying storage, it should take several interrupted backup runs to accumulate that many incomplete packs.

Could you run ls -la for a few other incomplete pack files? I wonder whether these were all created around the same time or are spread across a longer period of time.
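
Instead of reading many ls -la lines by eye, the modification times can be bucketed per day. This is a rough sketch: it uses the file's mtime in UTC as a stand-in for "when the pack was created", which is an assumption, and the function name is invented for this example.

```python
import os
from collections import Counter
from datetime import datetime, timezone


def packs_per_day(paths):
    """Count files per modification date (UTC), sorted by date."""
    days = Counter()
    for path in paths:
        mtime = os.stat(path).st_mtime
        day = datetime.fromtimestamp(mtime, tz=timezone.utc).date().isoformat()
        days[day] += 1
    return dict(sorted(days.items()))
```

One or two dates holding most of the files would point to a single incident, while an even spread over months would fit many separately interrupted backup runs.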

You could run restic backup --force <backup-set> to let restic reread all files and maybe recover some of the missing blobs. Did you cut off the output from check? It just seems to complain about data blobs and not about missing tree blobs as prune did.

@robertfoss
Author

Some runs have probably been interrupted. I've run daily backups using restic for ~2 years, and 67 interrupted backups does not surprise me.

Yeah, I truncated the restic check output. Here are the full logs: restic_check.log & restic_rebuild-index.log.

I tried looking up some random IDs from the restic find --tree run above using ls -la, but none of them seem to exist. Where do I find the file paths to the incomplete packs?

@MichaelEischer
Member

The check log scares me a bit. It claims that 10% of all packs in your repository are not contained in the index. This means that either your repository index is totally broken (but that should have been fixed by rebuild-index by now), that CIFS from time to time fails to list all files (prune and rebuild-index reported the same number of packs, so that's unlikely), that 10% of all packs are damaged (I hope that's not the case), or that there were some problems with reading the pack files or the index. Did the check run for the logfile happen before or after the rebuild-index run? If it ran before, could you run check again just to see whether it still complains about thousands of unreferenced pack files?
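
A share like this can be recomputed from a saved check log by counting the "not referenced in any index" lines and dividing by the pack count that prune or rebuild-index reports. The sketch below assumes the line format shown in the check output earlier in this thread; the function name is made up for this example.

```python
import re

UNREFERENCED_RE = re.compile(r"^pack ([0-9a-f]+): not referenced in any index")


def unreferenced_fraction(check_lines, total_packs):
    """Return (count, fraction) of packs that check reports as unindexed."""
    if total_packs <= 0:
        raise ValueError("total_packs must be positive")
    # Count lines rather than unique IDs: check prints shortened IDs, so two
    # different packs could in principle share the same printed prefix.
    count = sum(1 for line in check_lines if UNREFERENCED_RE.match(line.strip()))
    return count, count / total_packs
```

The total has to come from another command's output, since the check log itself does not state it.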

Then to rule out the first two theories of what's happening:

  • Please run restic rebuild-index -v using the current master branch, either build it yourself or use the binary from https://beta.restic.net/?sort=time&order=desc . That version of rebuild-index will report which pack files could not be read and are therefore not contained in the index. My hope is that it will only complain about the 67 packs about which the initial prune run also complained.
  • Locate the repo cache using restic cache (last line of the output) and empty the index folder in the directory whose name starts with 237fdd10 (the repository id printed by the prune run). Then run a command that uses the index, e.g. restic find --tree ... which refills the cache folder. The index folder in the cache should now contain the same files as the index folder in the repository (except for the intermediate directories). To get a sorted list of the files in the cached index you could use find /path/to/cache/index -type f -exec basename '{}' \; | sort. That command also works for listing the files in the repository. Then use diff to check for differences. The pack lists should be identical.
  • Does check still complain about thousands of packs that are not contained in an index?
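
The comparison in the second step (sorted basenames of both index folders, then diff) can also be sketched in Python. This is just an equivalent of the find/sort/diff pipeline under the same assumption that only file names matter and intermediate directories are ignored; the function names are invented for this example.

```python
from pathlib import Path


def file_names(root):
    """Collect the base names of all regular files below root."""
    return {p.name for p in Path(root).rglob("*") if p.is_file()}


def compare_listings(repo_index_dir, cache_index_dir):
    """Return (only_in_repo, only_in_cache) as sorted lists of file names."""
    repo = file_names(repo_index_dir)
    cache = file_names(cache_index_dir)
    return sorted(repo - cache), sorted(cache - repo)
```

Two empty lists correspond to an empty diff, meaning the cache saw exactly the index files that the repository folder lists.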

The find --tree command outputs snapshots and tree ids. The snapshots are just files with that name in the snapshot folder of the repository. For the trees it's far more complicated: restic groups many trees into a pack file which is then stored in the data folder. The command restic find --show-pack-id --tree <...> could normally be used to locate the storage location of a tree. However, that only works if the index contains the location of the tree. And that is not the case as otherwise check or prune would not fail.

@robertfoss
Author

The restic check was run both before and after a restic rebuild-index

$ restic version
restic 0.9.6 (v0.9.6-193-g070d43e2) compiled with go1.13.8 on linux/amd64
$ rm -rf ~/.cache/restic/237fdd1007c6bdb78b699bdc8beca127076634ff4f07e461ade53c519748f502/index/*
$ find ~/.cache/restic/237fdd1007c6bdb78b699bdc8beca127076634ff4f07e461ade53c519748f502/index/ -type f -exec basename '{}' \; | sort > ~/tmp/restic_index_cache.log
$ find /media/auto/nas_robertfoss/restic/index/ -type f -exec basename '{}' \; | sort > ~/tmp/restic_index_repo.log
$ diff ~/tmp/restic_index_repo.log ~/tmp/restic_index_cache.log | wc -l 
0

Running restic rebuild-index -v and saving the logs now.

@robertfoss
Author

robertfoss commented Apr 30, 2020

restic rebuild-index -v

log: restic_rebuild-index.log

@robertfoss
Author

restic check

log: restic_check.log

@MichaelEischer
Member

The log for rebuild-index looks reasonable, so it seems that just the 67 packs are damaged and everything else is still there and readable. Could you run check again to see about which trees and blobs it complains?

I currently don't see any obvious reason for the missing trees/blobs. You could start repairing the repository as a next step using restic backup --force ... and then remove snapshots that are still damaged afterwards.

@robertfoss
Author

I ran restic check after rebuild-index already, would you still like me to run it again?

I'll go ahead and run backup --force then, and then verify it using rebuild-index.

@MichaelEischer
Member

MichaelEischer commented May 2, 2020

Oh, sorry I somehow didn't see the log for the check run. Since check has reported that there are no errors in the repository, you should be able to just run prune as you initially wanted to.

This starts to look a bit like CIFS/SMB temporarily "forgetting" to list some of the pack files in the repository. At least that's the most reasonable explanation I have for why it helped to run rebuild-index several times...

Just as an explanation: Without missing data in the repository backup --force will just behave like a slow normal backup run. If data were missing from the repository, you could run check afterwards (after backup --force) to see if that fixed some of the errors. Running rebuild-index is not necessary, that command is mostly used as the first step when recovering repositories in order to ensure that the repository index matches the actual content.

@robertfoss
Author

I've run restic backup --force too, but restic prune is still failing.

restic_prune.log

@robertfoss
Author

Maybe trying to figure out what went wrong will be impossible, and the better approach is just to migrate the data. Does there exist a tool for migrating snapshots from this broken repo, to a new repo?

@MichaelEischer
Member

The prune log is cut off. At least the part up to the incomplete line that warns about an incomplete file does not show any errors.

You could use #2606 to copy snapshots to a new repository.

@drzraf

drzraf commented Aug 14, 2020

Very similar scenario with 0.9.6 but I had the opportunity to log all output of the various steps.
Note: My connection is unreliable during backup but I expected the backups not to be corrupted anyway.

01-backup.txt
02-check.txt
03-prune.txt
04-rebuild-index.txt
05-check-2nd.txt
06-prune-2nd.txt
07-find.txt

I followed the procedure suggested above and still see corruption/not-found warnings. What would help?

@MichaelEischer
Member

MichaelEischer commented Aug 16, 2020

@drzraf Could you run rebuild-index again and check whether that solves the check/prune issues? For some reason only the first prune run complains that bac665f6 is an incomplete pack file. The second prune run no longer prints that warning, which would mean that rebuild-index should also be able to read that pack file and fix the tree errors.

Running the backup again and then removing old snapshots should also fix the errors you're seeing.

For future reference, here's a summary of the important log parts:

  • 01-backup.txt: Save(<data/bac665f66a>) returned error, retrying after 432.240701ms: client.PutObject: Timeout when reading or writing data. This is the pack file that later on seems to be causing problems.
  • The check run 02-check.txt works as expected, and 03-prune.txt complains about incomplete pack file (will be removed): bac665f66a0005b7596780fb862380b3b1666427bbe14c6a79595feb5963af4a and aborts a bit later to avoid damaging the repository.
  • rebuild-index currently doesn't warn about invalid pack files (the current master branch will print a warning when called with -v). But as 05-check-2nd.txt shows, bac665f6 is no longer contained in the index: pack bac665f6: not referenced in any index. This causes the missing tree errors shown by check and prune.
  • The last prune run 06-prune-2nd.txt fails, but no longer complains about bac665f6 being an incomplete pack file.

So it looks like restic received the incomplete pack from the first failed upload attempt up to log 05-check-2nd.txt. The prune run afterwards seems to have received the properly uploaded version.
Edit: What we might be seeing here is some effect caused by the eventual consistency of Swift. I don't know enough about the precise consistency guarantees (at the moment) to tell whether that is actually the case or whether we can prevent such an upload error from causing problems.

@drzraf

drzraf commented Aug 26, 2020

The subsequent rebuild-index worked, and together with a backup it seems to have fixed this.
But when I prune again, I got this error:

will delete 16 packs and rewrite 120 packs, this frees 302.299 MiB
hash does not match id: want e1ee8d8a29fd16633ae622e2c18c993ae3984d77d93d5641917e6eb0cf7a6687, got 97b186d7fdc6a7cf237f803ce233702fd90d6ea6e672d556c07190dda8bbafa2
[with a stacktrace]

I'll redo the same step, but I wonder if something couldn't be improved about backup integrity out-of-the-box (or if there are issues already tracking that behavior)?

@rawtaz
Contributor

rawtaz commented Aug 26, 2020

I'll redo the same step, but I wonder if something couldn't be improved about backup integrity out-of-the-box (or if there are issues already tracking that behavior)?

Can you clarify what you mean? Out of the box, restic produces backups that do have integrity and whose integrity you can verify.

@MichaelEischer
Member

MichaelEischer commented Aug 26, 2020

hash does not match id: want e1ee8d8a29fd16633ae622e2c18c993ae3984d77d93d5641917e6eb0cf7a6687, got 97b186d7fdc6a7cf237f803ce233702fd90d6ea6e672d556c07190dda8bbafa2

This error message tells me two things: Firstly the affected pack file was fully uploaded to the storage backend (as prune would otherwise have failed during the initial reindexing) and secondly the file was somehow damaged.

Can you download the file from data/e1/e1ee8d8a29fd16633ae622e2c18c993ae3984d77d93d5641917e6eb0cf7a6687 manually (restic cat pack e1ee8d8a29fd16633ae622e2c18c993ae3984d77d93d5641917e6eb0cf7a6687 should also work) and run shasum -a256 <file> on it? If this still yields 97b186d... then the file was probably damaged before/during the upload. As your logfiles indicate that the backend connection runs over HTTPS, that would leave two locations where the bitflip (?) could have happened: on your local computer or somewhere in the Swift proxy/storage nodes in the backend.
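
The decision described here can be written down as a small check on the downloaded file. The sketch below hashes the file and compares the result with the "want" and "got" values from the error message; the function name and the three labels are invented for this example, and only the "same-mismatch" interpretation comes from the comment above, the other two are assumptions.

```python
import hashlib


def classify_download(path, want, got):
    """Compare a downloaded pack's SHA-256 with the IDs from the error message.

    Hypothetical helper; returns one of three illustrative labels.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    actual = digest.hexdigest()
    if actual == want:
        # Assumption: the stored file is fine and the earlier read was faulty.
        return "matches-id"
    if actual == got:
        # Per the comment above: probably damaged before or during the upload.
        return "same-mismatch"
    # Assumption: repeated reads return different data, e.g. unstable storage.
    return "different-mismatch"
```

Here want would be e1ee8d8a... and got would be 97b186d7..., both copied in full from the prune error.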

The "hash does not match id" error was already discussed in great depth in #1999, and also in #1596 (and probably many more). There are also a few ideas to salvage blobs from damaged packs, e.g. #1727. I'm not aware right now of an issue to track adding end-to-end checksums for the backends (see #804 (comment) for some elaboration on that topic). For the download step there's #2302 with a suggestion to improve handling of bit-flips.

[Edit] I've noticed that my interpretation of the error message is only fully accurate for the latest master branch. With restic 0.9.6 there is a chance that the initial rebuild-index check does not fail. However, if I understood you correctly, then the check command did complete successfully after rebuilding the index? In that case we know for sure that all relevant pack files have a valid pack header (i.e. they were fully uploaded) and then my previous reasoning applies unchanged. [/Edit]

4 participants