repository corrupted after prune from host where the rest-server is #1871

Closed
infestdead opened this issue Jul 2, 2018 · 13 comments · Fixed by restic/rest-server#195
Labels
backend: rest, category: resilience, state: need implementing, type: bug

Comments

@infestdead

infestdead commented Jul 2, 2018

Output of restic version

restic 0.9.1 compiled with go1.10.2 on linux/amd64

How did you run restic exactly?

I'm making the backups via:

/usr/bin/restic -r rest:http://user:pass@192.168.1.3:8000/cirrus backup /home/ipm/ --exclude-file="/home/ipm/restic/exclude" -p /home/ipm/restic/pw

then I'm forgetting/pruning on the host where rest-server is running (192.168.1.3) with:

/usr/bin/restic -r /media/ccf12954-b340-4a28-accb-eca692ec4612/restic/cirrus/ -p /root/restic/cirrus-pw forget --prune --keep-daily 30 --keep-monthly 6 --keep-yearly 3

I'm doing this locally because rest-server is running in append-only mode.

What backend/server/service did you use to store the repository?

rest-server 0.9.7

Expected behavior

After prune, I expect the repository to continue working properly.

Actual behavior

However, after pruning I can no longer back up. I get the following when I run my original backup command from above:

open repository
repository e2e469c9 opened successfully, password is correct
lock repository
load index files
Load(<index/0dc6389d8c>, 0, 0) returned error, retrying after 513.516072ms: <index/0dc6389d8c> does not exist
Load(<index/aa054c5f90>, 0, 0) returned error, retrying after 734.704668ms: <index/aa054c5f90> does not exist
Load(<index/0dc6389d8c>, 0, 0) returned error, retrying after 1.070671441s: <index/0dc6389d8c> does not exist
Load(<index/aa054c5f90>, 0, 0) returned error, retrying after 392.08203ms: <index/aa054c5f90> does not exist
Load(<index/aa054c5f90>, 0, 0) returned error, retrying after 1.656041247s: <index/aa054c5f90> does not exist

If I work with the repository locally on the host where rest-server is running, via restic -r /path/to/repo, everything looks good: rebuild-index, check, etc. all work.

Steps to reproduce the behavior

  1. back up via rest: a couple of times
  2. go to the server where rest-server is running
  3. run forget --prune locally in the folder where the repo is
  4. try to back up from the original remote host again via rest:

Do you have any idea what may have caused this?

No idea.

Do you have an idea how to solve the issue?

No.

Did restic help you or made you happy in any way?

Yes, great software, quick and intuitive :)

@fd0
Member

fd0 commented Jul 2, 2018

Hm, thanks for the report. I think this is a bug within restic, but I don't have any idea what's going on yet. Can you please rebuild restic and create a debug log (instructions) and attach it to the issue? Please make sure to replace any sensitive details included in there (authentication tokens, passwords etc) before posting it.

Ideally, you would reproduce this (without a debug log) on a small repo, and once it happens, you switch on the debug log and only attach the run where it fails with the index errors. That'd be the most helpful.

Thanks!

fd0 added the type: bug, state: need triaging, backend: rest and state: need feedback labels Jul 2, 2018
@infestdead
Author

Hey, I'm having trouble building from source. I'm on Fedora 29, and when I try to build from source with debug support I get:

go build: when using gccgo toolchain, please pass compiler flags using -gccgoflags, not -gcflags
go build: when using gccgo toolchain, please pass linker flags using -gccgoflags, not -ldflags
# github.com/restic/restic/internal/restic
github.com/restic/restic/internal/restic/node_linux.go:22:18: error: incompatible type for field 1 in struct construction (cannot use type syscall.Timespec_sec_t as type int64)
   {Sec: utimes[0].Sec, Nsec: utimes[0].Nsec},
                  ^
github.com/restic/restic/internal/restic/node_linux.go:22:39: error: incompatible type for field 2 in struct construction (cannot use type syscall.Timespec_nsec_t as type int64)
   {Sec: utimes[0].Sec, Nsec: utimes[0].Nsec},
                                       ^
github.com/restic/restic/internal/restic/node_linux.go:23:18: error: incompatible type for field 1 in struct construction (cannot use type syscall.Timespec_sec_t as type int64)
   {Sec: utimes[1].Sec, Nsec: utimes[1].Nsec},
                  ^
github.com/restic/restic/internal/restic/node_linux.go:23:39: error: incompatible type for field 2 in struct construction (cannot use type syscall.Timespec_nsec_t as type int64)
   {Sec: utimes[1].Sec, Nsec: utimes[1].Nsec},
                                       ^
build failed: exit status 2
exit status 1

Any suggestions? I'm very new to Go.

@fd0
Member

fd0 commented Jul 3, 2018

I know what's going on. There are two different compilers (= implementations) for Go: the Go project's official compiler, called go, and the GCC port of it, gccgo. It seems you've installed gccgo, and we only support and test with the official Go compiler.

For Fedora, the correct package should be golang. Can you try installing it (and maybe remove gccgo)?

@infestdead
Author

Thanks, that got it to work!

restic-debug.log

Here's the debug log. My setup is local and for testing only so far, so even if I've exposed some sensitive data, that's not a problem.

Before running with debug logging, I cleaned the local .cache/restic/ folder just in case and restarted the rest-server backend. Then I ran restic backup again, and after a few lines of "does not exist" errors I pressed Ctrl-C; that's the debug log.

@infestdead
Copy link
Author

Hm, I think I found the issue (myself :) ).

The rest-server is running as user files, and when I ran forget/prune I used root by default without thinking, so the indexes were recreated owned by root. When I then tried to use the REST backend again, I got "does not exist", which confused me; in reality it's "permission denied", but it probably shows up as not found because the server can't read the files. After a recursive chown to the user rest-server runs as (files), everything works as expected now.

@fd0
Member

fd0 commented Jul 8, 2018

Ah, hm. So the error message is wrong. This is an issue in rest-server then: it should have returned a different error code and logged the error...

fd0 added the state: need implementing label and removed the state: need feedback and state: need triaging labels Jul 8, 2018
@garrmcnu
Contributor

rest-server does read the OS error (and logs "permission denied" to the console if --debug is specified), but it doesn't use it and always returns HTTP StatusNotFound (404). One option would be to map the OS error to a more specific HTTP status code (e.g. os.IsPermission() -> HTTP StatusForbidden (403)).
With that change, the current version of restic logs "unexpected HTTP response (403): 403 Forbidden". This is because restic checks for HTTP StatusNotFound (and logs "does not exist") but logs "unexpected HTTP response" for any other error. Should we add an additional check for HTTP StatusForbidden?

@fd0
Copy link
Member

fd0 commented Jul 29, 2018

Yes, I think we need to rework error handling in the REST server, badly :/

@garrmcnu
Copy link
Contributor

The changes needed to handle this specific issue are fairly minor. Do you have any thoughts for further improvements?

@fd0
Copy link
Member

fd0 commented Jul 31, 2018

I have in mind that the REST server code could use some cleanups in terms of error handling and logging. It uses a custom log format at the moment (for debug log output, I mean), and it does not contain enough information for my taste. I'd like to change it (eventually) to a format which prints one line per request, including the requested repo/path, the username, the action, and the result (similar to the Common Log Format used by Apache/nginx). At the moment each log entry spans multiple lines, making it hard to read.

@garrmcnu
Copy link
Contributor

garrmcnu commented Aug 2, 2018

That makes sense. In addition to the log format, I think we would still need to pass the status back to the client. I'm not familiar with the nginx log format, but it sounds like a good starting point.
Thanks

@meus
Copy link

meus commented Aug 20, 2019

I just ran into the same issue. Thanks to the hint with the wrong permissions I could resolve it pretty fast!
Still it would be nice to have a better error message in this scenario.
btw. thanks for all the work on restic!

@diditopher
Copy link

I also just ran into this issue. Thanks to this thread I was able to resolve it quickly, but I feel that running prune as root should not change ownership of the repository. At the very least it should output a warning. And the documentation at https://restic.readthedocs.io/en/latest/060_forget.html should caution against running prune as root.
