Basic REST backend #253

Closed
wants to merge 22 commits into from

Conversation

bchapuis
Contributor

@bchapuis bchapuis commented Aug 4, 2015

I started working on a very basic REST backend (#23) and test server (server.go). At this point, the test server must be started in the background before running the tests. I'm not really familiar with Go, and I don't know how to stop the server once the tests have finished. Feedback appreciated... ;)

Connects #23

@fd0
Member

fd0 commented Aug 4, 2015

Hey, thanks for your contribution! We need to get this PR in shape before it can be merged, but that shouldn't be too hard. I'll have a look later today.

@bchapuis
Contributor Author

bchapuis commented Aug 4, 2015

It wasn't really intended for a direct merge at this point. The current PR doesn't handle HTTP auth, which is probably the minimal requirement unless the backend is for demo purposes only. Do you have something in mind regarding this point?

@fd0
Member

fd0 commented Aug 4, 2015

I don't have the time to look at the code directly. The server component should be split out as far as possible and the tests could be done with a mock-server (should be easy to implement). I'll have a look at this later. HTTP auth is a good idea.

@bchapuis
Contributor Author

bchapuis commented Aug 5, 2015

Ok, the tests are now passing. Regarding basic authentication, it seems possible to put the credentials in the URL (http://username:password@host/) instead of the HTTP headers. That is probably a good option to keep things as simple as possible.
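As a rough sketch of the credentials-in-URL idea, the snippet below parses such a URL with Go's standard library and moves the user and password into a Basic Authorization header. The helper name newRequest and the example URL are assumptions made for illustration; this is not the code from the PR.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newRequest (hypothetical helper) builds a request for rawurl. If the URL
// carries credentials, they are removed from the URL and sent as HTTP Basic
// authentication instead.
func newRequest(method, rawurl string) (*http.Request, error) {
	u, err := url.Parse(rawurl)
	if err != nil {
		return nil, err
	}

	user := u.User
	u.User = nil // do not keep the credentials in the request URL

	req, err := http.NewRequest(method, u.String(), nil)
	if err != nil {
		return nil, err
	}

	if user != nil {
		password, _ := user.Password()
		req.SetBasicAuth(user.Username(), password)
	}
	return req, nil
}

func main() {
	req, err := newRequest("GET", "http://username:password@host/config")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	user, password, ok := req.BasicAuth()
	fmt.Println(req.URL.String(), user, password, ok)
}
```

Keeping the credentials in the repository URL means no extra command-line options are needed, at the cost of the password being visible wherever the URL is printed or logged.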

@fd0
Member

fd0 commented Aug 5, 2015

Cool, sounds good. I didn't yet have the time to look at your code (just to give you a heads-up).

@fd0
Member

fd0 commented Aug 5, 2015

In the following, I'll dump my thoughts I had today:

Could you give a high-level overview of what the REST server should look like and what it should do? E.g., what are the paths, methods etc.? We should document this somewhere below doc/ so people can easily reimplement the server part in something other than Go. In addition, I'd like to support different 'paths', so it should be possible to specify the REST server URL like this: http://user:password@host/foo/bar/repopath, so that e.g. the config can be accessed at the path /foo/bar/repopath/config.
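To make the path-prefix idea concrete, here is a minimal sketch that derives the URL of a resource such as config from a repository URL containing a prefix. The function name resourceURL is hypothetical, and the actual backend may build its URLs differently.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// resourceURL (hypothetical helper) returns the URL of a resource, for
// example "config", below the repository prefix contained in base.
func resourceURL(base string, elem ...string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	// path.Join cleans the result and collapses duplicate slashes. The
	// leading "/" makes an empty base path behave like the root.
	u.Path = path.Join(append([]string{"/", u.Path}, elem...)...)
	return u.String(), nil
}

func main() {
	s, err := resourceURL("http://user:password@host/foo/bar/repopath", "config")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```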

I'm thinking about adding a 'server' command to restic that can later be used to start different servers, e.g. REST, a remote server for stdin/stdout etc. Are you also interested in implementing the REST server part? Like, run restic server --repo /tmp/repo --bind-address 127.0.0.1 --port 8080 --user foo --password bar or so, which starts an HTTP server that serves the repo located at /tmp/repo via REST (but doesn't try to open the repo, just serving), so that this can be started somewhere remote and accessed by a local restic instance. (This can also be done in a different PR)

@fd0
Member

fd0 commented Aug 5, 2015

Ok, first round of comments is done 😁

@@ -108,6 +108,7 @@ func testBackend(b backend.Backend, t *testing.T) {
OK(t, err)

found, err := b.Test(tpe, id.String())

Member

What's this empty line?

@bchapuis
Contributor Author

bchapuis commented Aug 6, 2015

I'm not sure the cleanup is necessary, since the router already extracts the variables from the path. In this case, a relative path cannot be injected into the variable.

@fd0
Member

fd0 commented Aug 6, 2015

Even if it isn't necessary, I'd like to implement it. This approach follows the "defense-in-depth" or "defensive programming" paradigm, and I think it's a good idea. It's not much additional code (a call to filepath.Clean() and possibly a check that the filename is a valid ID), so we can just write a function for that.
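A defensive check of this kind could look roughly like the sketch below. The function name cleanName is hypothetical; it only covers the filepath.Clean() part and rejects any name that still refers to something other than a single path element.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// cleanName (hypothetical helper) normalizes a file name taken from a request
// and rejects anything that is not a single path element.
func cleanName(name string) (string, error) {
	cleaned := filepath.Clean(name)
	if cleaned == "." || cleaned == ".." ||
		strings.ContainsRune(cleaned, filepath.Separator) {
		return "", errors.New("invalid file name")
	}
	return cleaned, nil
}

func main() {
	for _, name := range []string{"abc", "../../etc/passwd", "..", "/"} {
		cleaned, err := cleanName(name)
		fmt.Printf("%q -> %q, %v\n", name, cleaned, err)
	}
}
```

Even when the router already prevents such names from reaching the handler, a check like this keeps the file system access safe if the routing code changes later.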

@fd0
Member

fd0 commented Aug 6, 2015

Another thought I had today: The REST backend can (with minor modifications) directly be used to restore a backup that is served via HTTP, e.g. by an Apache web server. In this case locking wouldn't work (because adding new files doesn't work), but we could ignore that with a command-line flag. Very cool!

@bchapuis
Contributor Author

bchapuis commented Aug 7, 2015

From a backend perspective the more defensive approach would be to test the types and the blobIDs. Since we know the types and the blobIDs are sha256 sums it shouldn't be too hard to implement very strong checks on the server side.

I just tried an approach that validates the "type" and the "blobID" but it seems that the generic tests are not using valid sha256 sums. I don't want to change the tests at the moment since it will have an impact on the other backends. A different PR would be more suitable. What do you think?

var blobTypes = []string{"data", "snapshot", "index", "lock", "key", "temp"}

func isBlobType(blobType string) bool {
    for _, t := range blobTypes {
        if blobType == t {
            return true
        }
    }
    return false
}

func isBlobID(blobID string) bool {
    if len(blobID) != sha256.Size {
        return false
    }
    if _, e := hex.DecodeString(blobID); e != nil {
        return false
    }
    return true
}

@bchapuis
Contributor Author

bchapuis commented Aug 7, 2015

Regarding the command line for starting a server in restic, I think it would make sense to have a very small read only http backend over a local directory. However, I wonder if it is the right place to build a bigger backend, since it will probably come with other dependencies. What do you think?

@fd0
Member

fd0 commented Aug 7, 2015

> From a backend perspective the more defensive approach would be to test the types and the blobIDs. Since we know the types and the blobIDs are sha256 sums it shouldn't be too hard to implement very strong checks on the server side.

I agree. We could just use filepath.Clean additionally (even if it isn't required).

> I just tried an approach that validates the "type" and the "blobID" but it seems that the generic tests are not using valid sha256 sums.

Oh, they are valid, you just have a small bug in your code: In the representation as a string, the string length is 2*32 == 64, each byte is represented by two characters in hex.

The function isBlobType() should be implemented in https://github.com/restic/restic/blob/master/backend/interface.go#L8-L15 (because that's where the blob types are defined), and it should be exported (as IsBlobType()), or maybe move the blob type definition and this function to its own file. When comparing, use backend.IDSize instead of sha256.Size https://github.com/restic/restic/blob/master/backend/id.go#L11

isBlobID() is nearly the same as ParseID() https://github.com/restic/restic/blob/master/backend/id.go#L17 so we can just use that.
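For illustration, the length check with the hex encoding taken into account might look like the sketch below. The constant idSize is a local stand-in for backend.IDSize, and the code is not the ParseID() implementation from the repository.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// idSize is a local stand-in for backend.IDSize (the size of a SHA-256 sum).
const idSize = sha256.Size

// isBlobID reports whether blobID is the hex representation of an ID.
func isBlobID(blobID string) bool {
	// Each byte is written as two hex characters, so the expected string
	// length is hex.EncodedLen(idSize) == 2*32 == 64.
	if len(blobID) != hex.EncodedLen(idSize) {
		return false
	}
	_, err := hex.DecodeString(blobID)
	return err == nil
}

func main() {
	id := hex.EncodeToString(make([]byte, idSize))
	fmt.Println(len(id), isBlobID(id), isBlobID(id[:idSize]))
}
```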

> Regarding the command line for starting a server in restic, I think it would make sense to have a very small read only http backend over a local directory. However, I wonder if it is the right place to build a bigger backend, since it will probably come with other dependencies. What do you think?

I haven't made up my mind about that. The read-only HTTP backend has at least the problem that locking isn't possible, because a blob of type lock must be created for that, so we would need a special case. But I think a basic server (read and write) isn't much code at all, and I don't think we need many dependencies outside the stdlib. I agree that the server part should be added in a different PR, so we can concentrate on the backend here.

What server implementation do you use for testing at the moment?

return nil, err
}

req.Header.Add("Range", "bytes="+string(offset)+"-"+string(offset+length))
Member

This doesn't work, see http://play.golang.org/p/uVA6JAf1qD

You can just use fmt.Sprintf("bytes=%d-%d", offset, offset+length).

I wonder why the GetReader() isn't tested at all, and I've added #254 to track adding such a test.
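The conversion fails because string() applied to an integer yields the character with that code point rather than the decimal digits. A minimal sketch of the fmt.Sprintf approach follows; the helper name rangeHeader is hypothetical. One adjustment that was not discussed in this thread: HTTP byte ranges include both end positions, so this sketch uses offset+length-1 as the last position to request exactly length bytes, and it assumes length is greater than zero.

```go
package main

import (
	"fmt"
	"net/http"
)

// rangeHeader (hypothetical helper) returns the value of a Range header that
// requests length bytes starting at offset. It assumes length > 0.
func rangeHeader(offset, length uint) string {
	// Both positions of an HTTP byte range are inclusive.
	return fmt.Sprintf("bytes=%d-%d", offset, offset+length-1)
}

func main() {
	req, err := http.NewRequest("GET", "http://localhost:8000/data/abc", nil)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	req.Header.Add("Range", rangeHeader(10, 5))
	fmt.Println(req.Header.Get("Range"))
}
```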

@fd0
Member

fd0 commented Aug 7, 2015

Thanks again for your contribution! I like it, and the documentation for the REST server interface is good for now. 😃


rb.final = true

// Check key does not already exist.
Member

Hm, do you think we can just skip the HEAD request here and just execute the POST below? In the (rather unlikely) event that a blob already exists, the POST request returns an error. This way we could save a round-trip (just one HTTP request), but maybe upload a large amount of data before receiving an error...

Contributor Author

I don't know the internals of the golang http server. However, if the call on HandleFunc is performed immediately when the http headers are received, it may be possible to cancel the POST request before receiving the full Body of the request (multipart). It would be something similar to the HEAD request with one round-trip. I will remove this call for now.

Member

I agree. This saves a round-trip for each newly added blob. Because the error condition we're trying to avoid here (uploading a duplicate blob) is unlikely, this should speed up the regular case (new blob) greatly.
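The single-request variant could be sketched as follows. The thread does not say which status code the server returns for a duplicate, so 409 Conflict is an assumption here, as are the names saveBlob, errBlobExists and newMockServer and the use of a plain request body instead of a multipart upload.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// errBlobExists is a hypothetical sentinel error for duplicate uploads.
var errBlobExists = errors.New("blob already exists")

// saveBlob uploads data with a single POST and no preceding HEAD request.
// Assumption: the server answers 200 on success and 409 for a duplicate.
func saveBlob(client *http.Client, url string, data []byte) error {
	resp, err := client.Post(url, "application/octet-stream", bytes.NewReader(data))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	// Drain the body so the connection can be reused.
	io.Copy(io.Discard, resp.Body)

	switch resp.StatusCode {
	case http.StatusOK:
		return nil
	case http.StatusConflict:
		return errBlobExists
	default:
		return fmt.Errorf("unexpected HTTP response: %s", resp.Status)
	}
}

// newMockServer returns a test server that accepts each path once and
// rejects repeated uploads with 409 Conflict.
func newMockServer() *httptest.Server {
	var mu sync.Mutex
	seen := make(map[string]bool)
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		if seen[r.URL.Path] {
			w.WriteHeader(http.StatusConflict)
			return
		}
		seen[r.URL.Path] = true
	}))
}

func main() {
	srv := newMockServer()
	defer srv.Close()

	url := srv.URL + "/data/abc"
	fmt.Println(saveBlob(srv.Client(), url, []byte("blob")))
	fmt.Println(saveBlob(srv.Client(), url, []byte("blob")))
}
```

The mock server also shows how the backend tests could run without a separately started server process, since httptest starts and stops the server from within the test itself.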

@fd0
Member

fd0 commented Aug 7, 2015

Other minor things I noticed:

  • Usually, git commit messages have a very terse summary in the first line of the commit message, followed by an empty line, followed by a more verbose description or a list of changed things. For examples, see http://chris.beams.io/posts/git-commit/
  • If you change/add multiple different things that aren't related at all, try to make several smaller commits (e.g. one commit for the documentation changes, one for adding the methods for the blobs etc.). This is much easier to review. Using git add -p allows staging and committing only some changes.

("Minor things" here means "you don't need to change existing commits, but try to apply the tips next time")

@fd0
Member

fd0 commented Sep 10, 2015

I'll have a look in the evening.

@fd0
Member

fd0 commented Sep 11, 2015

I had a look, the commit indeed caches trees aggressively, but this needs A LOT of RAM (I cancelled my test backup after ~6GB)... So let's finish the REST backend in this PR and I'll deal with the tree cache in another.

@bchapuis
Contributor Author

Ok, that looks pretty bad. I only tested it with small backups (~3GB) and didn't notice this memory issue. This is strange: in that case the files loaded in memory represent only 35MB. Let's focus on the REST backend for now.

@fd0
Member

fd0 commented Sep 13, 2015

I had a look, and the backend works really well. I came across the following issues:

  • REST server URLs are only accepted with at least one slash / character in the path. It was very irritating that http://localhost:4567 wasn't accepted as a valid HTTP URL.
  • When accessing and listing things, the singular form of the type is required, i.e. /key/ instead of /keys/. That's different from the directory structure and not obvious at first. We should either use the plural form or add this to the REST documentation. What about '/key' (without the trailing slash)?
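Both points could be handled when the listing URL is built, roughly as in the sketch below. The mapping from blob types to plural path elements and the helper name listURL are assumptions for illustration, not the documented REST layout.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// pluralNames maps blob types to plural path elements (assumed mapping).
var pluralNames = map[string]string{
	"data":     "data",
	"snapshot": "snapshots",
	"index":    "index",
	"lock":     "locks",
	"key":      "keys",
}

// listURL (hypothetical helper) returns the URL used to list all blobs of
// type tpe. A base URL without any path is accepted.
func listURL(base, tpe string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	name, ok := pluralNames[tpe]
	if !ok {
		return "", fmt.Errorf("unknown type %q", tpe)
	}
	// An empty path ("http://localhost:4567") is treated like "/".
	u.Path = strings.TrimSuffix(u.Path, "/") + "/" + name + "/"
	return u.String(), nil
}

func main() {
	for _, base := range []string{"http://localhost:4567", "http://localhost:4567/repo/"} {
		s, err := listURL(base, "key")
		fmt.Println(s, err)
	}
}
```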

I tested the REST backend against a quickly thrown-together REST server in ruby, see https://gist.github.com/d83e9da3ab9364a1d912

Could you remove the aggressive caching commit and rebase against the current master? Then I'll merge the PR.

@fd0
Member

fd0 commented Sep 13, 2015

Next issue: when the REST server is not reachable, a nil pointer is dereferenced:

./restic -r http://localhost:2323/ backup ~/shared 
debug enabled
parsed url http://localhost:2323/
enter password for repository: 
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x596b5c]

goroutine 1 [running]:
github.com/restic/restic/backend/rest.(*Rest).List(0xc8200a9260, 0x912750, 0x3, 0xc820076300, 0x0)
    /tmp/restic-build-809839120/src/github.com/restic/restic/backend/rest/rest.go:189 +0x37c
github.com/restic/restic/repository.SearchKey(0xc820073960, 0xc820079ad0, 0x3, 0x0, 0x0, 0x0)
    /tmp/restic-build-809839120/src/github.com/restic/restic/repository/key.go:102 +0xe4
github.com/restic/restic/repository.(*Repository).SearchKey(0xc820073960, 0xc820079ad0, 0x3, 0x0, 0x0)
    /tmp/restic-build-809839120/src/github.com/restic/restic/repository/repository.go:559 +0x5f
main.GlobalOptions.OpenRepository(0x7fff8f6def70, 0x16, 0x0, 0x0, 0x0, 0xc820079ad0, 0x3, 0x7f7688531198, 0xc82008e008, 0x7f7688531198, ...)
    /tmp/restic-build-809839120/src/github.com/restic/restic/cmd/restic/global.go:122 +0x344
main.CmdBackup.Execute(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xb62500, 0xc820074480, 0x1, 0x4, ...)
    /tmp/restic-build-809839120/src/github.com/restic/restic/cmd/restic/cmd_backup.go:253 +0x48c
main.(*CmdBackup).Execute(0xc820074380, 0xc820074480, 0x1, 0x4, 0x0, 0x0)
    <autogenerated>:12 +0xc9
github.com/jessevdk/go-flags.(*Parser).ParseArgs(0xc820090870, 0xc820090010, 0x4, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0)
    /tmp/restic-build-809839120/src/github.com/jessevdk/go-flags/parser.go:278 +0x8e3
github.com/jessevdk/go-flags.(*Parser).Parse(0xc820090870, 0x0, 0x0, 0x0, 0x0, 0x0)
    /tmp/restic-build-809839120/src/github.com/jessevdk/go-flags/parser.go:154 +0x9b
main.main()
    /tmp/restic-build-809839120/src/github.com/restic/restic/cmd/restic/main.go:26 +0x1bf

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /usr/lib/go/src/runtime/asm_amd64.s:1696 +0x1

goroutine 20 [syscall]:
os/signal.loop()
    /usr/lib/go/src/os/signal/signal_unix.go:22 +0x18
created by os/signal.init.1
    /usr/lib/go/src/os/signal/signal_unix.go:28 +0x37

goroutine 21 [chan receive]:
github.com/restic/restic.init.1.func1.1()
    /tmp/restic-build-809839120/src/github.com/restic/restic/lock.go:262 +0x184
created by github.com/restic/restic.init.1.func1
    /tmp/restic-build-809839120/src/github.com/restic/restic/lock.go:265 +0x2b

goroutine 3 [select, locked to thread]:
runtime.gopark(0x9e8b08, 0xc82002ef28, 0x9139c0, 0x6, 0x44c618, 0x2)
    /usr/lib/go/src/runtime/proc.go:185 +0x163
runtime.selectgoImpl(0xc82002ef28, 0x0, 0x18)
    /usr/lib/go/src/runtime/select.go:392 +0xa64
runtime.selectgo(0xc82002ef28)
    /usr/lib/go/src/runtime/select.go:212 +0x12
runtime.ensureSigM.func1()
    /usr/lib/go/src/runtime/signal1_unix.go:227 +0x353
runtime.goexit()
    /usr/lib/go/src/runtime/asm_amd64.s:1696 +0x1

goroutine 22 [chan receive]:
main.CleanupHandler(0xc8200760c0)
    /tmp/restic-build-809839120/src/github.com/restic/restic/cmd/restic/cleanup.go:58 +0x71
created by main.init.1
    /tmp/restic-build-809839120/src/github.com/restic/restic/cmd/restic/cleanup.go:25 +0x12a
Command exited with non-zero status 2

@fd0
Member

fd0 commented Sep 13, 2015

And the fuse mount with a REST backend doesn't work (at least with my test server, but I don't see any errors):

ls -al ~/mnt/test/snapshots
ls: cannot access 2015-07-05T11:10:44+02:00: Input/output error
ls: cannot access 2015-09-13T15:46:34+02:00: Input/output error
ls: cannot access 2015-05-14T23:31:09+02:00: Input/output error
ls: cannot access 2015-07-10T22:53:04+02:00: Input/output error
ls: cannot access 2015-07-07T00:18:01+02:00: Input/output error
ls: cannot access 2015-06-30T21:42:28+02:00: Input/output error
ls: cannot access 2015-07-19T12:02:09+02:00: Input/output error
ls: cannot access 2015-07-05T23:38:32+02:00: Input/output error
ls: cannot access 2015-07-12T17:31:55+02:00: Input/output error
ls: cannot access 2015-07-05T12:27:54+02:00: Input/output error
ls: cannot access 2015-07-09T00:01:11+02:00: Input/output error
ls: cannot access 2015-05-14T18:30:54+02:00: Input/output error
ls: cannot access 2015-05-14T18:08:49+02:00: Input/output error
ls: cannot access 2015-07-19T00:19:21+02:00: Input/output error
ls: cannot access 2015-07-10T22:44:36+02:00: Input/output error
ls: cannot access 2015-07-13T23:54:27+02:00: Input/output error
ls: cannot access 2015-07-12T22:28:06+02:00: Input/output error
ls: cannot access 2015-07-10T22:55:54+02:00: Input/output error
total 0
d????????? ? ? ? ?            ? 2015-05-14T18:08:49+02:00
d????????? ? ? ? ?            ? 2015-05-14T18:30:54+02:00
d????????? ? ? ? ?            ? 2015-05-14T23:31:09+02:00
d????????? ? ? ? ?            ? 2015-06-30T21:42:28+02:00
d????????? ? ? ? ?            ? 2015-07-05T11:10:44+02:00
d????????? ? ? ? ?            ? 2015-07-05T12:27:54+02:00
d????????? ? ? ? ?            ? 2015-07-05T23:38:32+02:00
d????????? ? ? ? ?            ? 2015-07-07T00:18:01+02:00
d????????? ? ? ? ?            ? 2015-07-09T00:01:11+02:00
d????????? ? ? ? ?            ? 2015-07-10T22:44:36+02:00
d????????? ? ? ? ?            ? 2015-07-10T22:53:04+02:00
d????????? ? ? ? ?            ? 2015-07-10T22:55:54+02:00
d????????? ? ? ? ?            ? 2015-07-12T17:31:55+02:00
d????????? ? ? ? ?            ? 2015-07-12T22:28:06+02:00
d????????? ? ? ? ?            ? 2015-07-13T23:54:27+02:00
d????????? ? ? ? ?            ? 2015-07-19T00:19:21+02:00
d????????? ? ? ? ?            ? 2015-07-19T12:02:09+02:00
d????????? ? ? ? ?            ? 2015-09-13T15:46:34+02:00

@bchapuis
Contributor Author

Thank you for the feedback. I will be very busy during the next few days, but I will try to solve these issues one evening by the end of the week.

@fd0
Member

fd0 commented Sep 14, 2015

Sure, no need to rush anything. Thanks for your contribution!

@bchapuis
Contributor Author

Regarding the singular form I mainly used it to avoid the mapping between the directories and the types. The same approach is adopted by the S3 backend. But it's not an issue to change this behavior, what do you think?

Regarding the trailing slashes, it seems common to use them to distinguish a group of resources from a specific resource.

The fuse mount works with my server implementation, but I will check what is happening in your case.

@fw42
Member

fw42 commented Sep 15, 2015

👍 to using plural, that's more common

@fd0
Member

fd0 commented Sep 16, 2015

Hm, changing the behavior of the s3 backend should be backwards-compatible. I'd suggest leaving the s3 backend alone for now, but still use the plural form for the REST backend.

@bchapuis
Contributor Author

I made some changes that address all the comments. Regarding fuse, it seems to work with the Go server, but I haven't been able to identify what goes wrong with the Ruby server. I also completely removed the dependency on gorilla mux.

@fd0
Member

fd0 commented Sep 22, 2015

Thanks a lot!

Just a quick heads-up: I'm up to my neck in other work, I don't have the time to look at it right now ;)

@fd0 fd0 mentioned this pull request Feb 21, 2016
1 task
@fd0
Member

fd0 commented Feb 21, 2016

I've started porting the REST backend to the current code, this PR is therefore superseded by #464

@fd0 fd0 closed this Feb 21, 2016
fd0 added a commit that referenced this pull request Feb 21, 2016
This is a port of the original work by @bchapuis in
#253