Refactor API client #779

Closed
technoweenie opened this issue Oct 23, 2015 · 10 comments
@technoweenie (Contributor)

There's a lot of cruft in the lfs package for dealing with both LFS APIs. This can make it difficult to follow the code path, and really difficult to add support for alternate protocols. This is what I'd like to see:

  • Simple Go interface with methods like DownloadObjects() and UploadObjects().
  • Public, exported package at github.com/github/git-lfs/client.
  • Consistent usage in git lfs fetch, git lfs pull, and git lfs smudge.
@sinbad (Contributor) commented Oct 27, 2015

I've previously (#378) had a go at abstracting the APIs so that SSH could be used for a complete API round-trip. That needs revisiting completely now, but were you happy with the general principles? If so, I could take this on at the same time as revisiting SSH.

@technoweenie (Contributor, Author)

I think your interface is too big. Of course, this was before the batch API was finalized and all that. But I'd like to see something like:

type ApiContext interface {
    // Get the endpoint this context was constructed from
    Endpoint() Endpoint
    // Close the context & any resources it's using
    Close() error

    Upload(objects []*ObjectResource) ([]*ObjectResource, *WrappedError)
    Download(objects []*ObjectResource) ([]*ObjectResource, *WrappedError)
}

I don't know if we'll need *Check too. Ideally, all of the commands just use this high-level interface. I think it'll be nice to get all the HTTP client and batching code out of this catch-all lfs package.
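
For illustration, a command consuming this high-level interface might look roughly like the sketch below; client.NewApiContext and the surrounding names are assumptions, not part of the proposal:

ctx := client.NewApiContext(endpoint) // hypothetical constructor for the configured endpoint
defer ctx.Close()

// e.g. `git lfs fetch`: hand over the whole set, get completed objects back
downloaded, wErr := ctx.Download(objects)
if wErr != nil {
    // surface the wrapped error to the user
}
// ... write `downloaded` into the working tree ...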

@sinbad (Contributor) commented Oct 28, 2015

Yeah, most of that is because of the previous need to support individual objects as well as batches. Whether we need *Check depends on whether this interface is supposed to complete the upload/download or just return resources through which the actual upload/download could subsequently be performed - which is what your interface suggests (it's akin to batch in my previous one).

That would be fine, but I think there needs to be the potential to abstract the actual upload and download as well, so they don't necessarily have to be across HTTP (but can be - so you can have an SSH API that gives back HTTP upload/download links or ones which use the SSH pipe). That suggests an interface like this:

type ApiContext interface {
    // Get the endpoint this context was constructed from
    Endpoint() Endpoint
    // Close the context & any resources it's using
    Close() error

    // Perform API request for upload/download & return resources / errors related to that batch
    RequestUpload(pointers []*Pointer) ([]*ObjectResource, *WrappedError)
    RequestDownload(pointers []*Pointer) ([]*ObjectResource, *WrappedError)
    // Perform the actual upload/download - async completion
    Upload(objects []*ObjectResource, c chan *TransferResult) error
    Download(objects []*ObjectResource, c chan *TransferResult) error
}
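
To make the intended flow concrete, a caller might drive this two-phase interface roughly as follows; the channel sizing and the one-result-per-object behaviour are assumptions:

// First phase: ask the API which resources to transfer.
resources, wErr := ctx.RequestDownload(pointers)
if wErr != nil {
    // handle API error
}

// Second phase: perform the transfers, consuming results as they complete.
// Buffered on the assumption of exactly one TransferResult per object.
results := make(chan *TransferResult, len(resources))
if err := ctx.Download(resources, results); err != nil {
    // handle transfer startup error
}
for i := 0; i < len(resources); i++ {
    result := <-results
    // act on each object as soon as its transfer finishes
    _ = result
}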

@sinbad (Contributor) commented Oct 30, 2015

Scratch that, I think we need 2 separate interfaces, one for the API and one for the content operations (actual upload/downloads). A package (say, core, with one implementation for HTTP and one for SSH initially) can provide both if it wants. Also, it should be perfectly acceptable to call the API over SSH and download over HTTP, or vice versa.

I'd still like to call the API functions RequestUpload and RequestDownload rather than Upload/Download, to make clear that they're not performing the actual transfer.

I think in both cases the implementation should be derived from the protocol of the link - for the API interface that's from the LFS URL, and for the content provider that's from the resource URLs. So the SSH API could return HTTP links for example, and custom protocols could be used to extend to other systems if required later (just like in browsers).
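
Pulling that together, one possible shape for the split - with the implementation picked from the URL scheme as described above - might be the following; ContentProvider, the picker, and the concrete providers are illustrative names only, and net/url is assumed for scheme parsing:

// API interface: talks to the LFS API only; chosen from the LFS URL scheme.
type ApiContext interface {
    Endpoint() Endpoint
    Close() error
    RequestUpload(pointers []*Pointer) ([]*ObjectResource, *WrappedError)
    RequestDownload(pointers []*Pointer) ([]*ObjectResource, *WrappedError)
}

// Content interface: performs the actual transfers; chosen per resource URL.
type ContentProvider interface {
    Upload(objects []*ObjectResource, c chan *TransferResult) error
    Download(objects []*ObjectResource, c chan *TransferResult) error
}

// Scheme-based selection, as in browsers; sshProvider/httpProvider are hypothetical.
func providerFor(rawURL string) ContentProvider {
    u, err := url.Parse(rawURL) // net/url
    if err == nil && u.Scheme == "ssh" {
        return &sshProvider{}
    }
    return &httpProvider{}
}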

@technoweenie (Contributor, Author)

Scratch that, I think we need 2 separate interfaces, one for the API and one for the content operations (actual upload/downloads).

Do we really need the second interface? All the commands should care about is whether their objects were uploaded or downloaded. It shouldn't matter to them if that requires a single batch API request and individual content requests, multiple (legacy) API and content requests, or a single SSH connection.

I suppose one benefit of this is that an SSH API adapter could take advantage of the existing batch transfer code for HTTP downloads. Is that what you're thinking?

@sinbad (Contributor) commented Oct 30, 2015

Do we really need the second interface? All the commands should care about is whether their objects were uploaded or downloaded.

Two reasons. Firstly, there are cases where you need to know whether an object is on the remote without actually downloading it, for example:

  • push - to determine whether it's OK that objects referred to in the graph are missing locally
  • prune - to confirm objects really are on the remote before deleting them

Secondly, the API and content transfer are orthogonal sets of operations. Just because you're using SSH to access the API doesn't mean you have to use it for the content; you could be using SSH to authenticate with your Git repo but S3 for content storage. Or HTTP API and SCP to some custom content host. There's no reason to make an implementation of the interface implement the NxM matrix of all combinations of API serving and upload/download route.

Separating the API & content transfer interfaces achieves both with one approach.
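
Composed that way, mixing transports is just a matter of picking a different implementation for each half; a sketch reusing the hypothetical names from the earlier snippet:

// SSH for the API, HTTP (e.g. S3 links) for the content - chosen independently.
api := newSSHApiContext(endpoint) // hypothetical SSH API implementation
resources, wErr := api.RequestDownload(pointers)
if wErr == nil && len(resources) > 0 {
    content := providerFor(resources[0].DownloadURL()) // hypothetical accessor
    results := make(chan *TransferResult, len(resources))
    _ = content.Download(resources, results)
}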

@technoweenie (Contributor, Author)

firstly there are cases where you need to know if an object is on the remote, without actually downloading it

This seems more like a case for RequestUpload() and RequestDownload() on ApiContext. And IIRC, both of those only check the LFS API and not the content API (S3).

Secondly, the API and content transfer are orthogonal sets of operations. Just because you're using SSH to access the API doesn't mean you have to use it for the content; you could be using SSH to authenticate with your Git repo but S3 for content storage. Or HTTP API and SCP to some custom content host. There's no reason to make an implementation of the interface implement the NxM matrix of all combinations of API serving and upload/download route.

The HTTP and SSH API objects can share code internally, but I don't think any of this should be exposed to the commands package. There should be no extra configuration of an LFS server beyond its endpoint if we can avoid it.

I do like the idea of having a separate API context object for each major Git remote type: HTTPS, SSH, maybe even file://. This means that we'll be able to remove the current SSH workaround, and that the SSH adapter will have to support git-lfs-authenticate AND whatever the new command is.

Also, all of this is out of scope for this specific issue, which is concerned with defining an interface and moving that code to a separate Go package. After that's done, you can start experimenting with a better SSH adapter. If we really need content storage API interfaces, they could be internal to this extracted Go package.

@sinbad (Contributor) commented Nov 2, 2015

This seems more like a case for RequestUpload() and RequestDownload() on ApiContext. And IIRC, both of those only check the LFS API and not the content API (S3).

Yep, which is why I think they're a separate interface. In practice, the server side of the API may be checking the content store (unless it caches that info), but the client only talks to the API.

The HTTP and SSH API objects can share code internally, but I don't think any of this should be exposed to the commands package. There should be no extra configuration of an LFS server beyond its endpoint if we can avoid it.

Sorry if I wasn't clear on that, but yeah, I had no intention of the split being visible to the code in the commands package. This is why I was saying that the code path should be picked entirely based on the URLs - mainly the LFS API URL, but also the content URL, which may use a separate protocol - with the switchover invisible to commands when performing a full download/upload.

What I meant was that the abstraction of the protocols should be separate between the LFS API and the content transfers. The choice of implementation would always need to be fronted by utilities that pick the correct path to use, but internally those two interfaces - one for the LFS API and one for the content transfers - would be a bit more flexible, without exposing any of that to the regular commands.

Also, all of this is out of scope for this specific issue, which is concerned with defining an interface and moving that code to a separate Go package.

Yeah, a multi-stage approach makes sense: first move the existing code mostly as-is, then refactor as a second stage. I think we were talking about two different things - you about the facade that commands would see, and I about the future internals behind it.

In that case, I think the initial step is a client package with just a set of top-level functions (not an interface, because this level won't need to be overridden). Behind those top-level functions can sit the ApiContext-style approaches, and eventually the split between API and content. I don't think the top-level functions can be quite as simple as just DownloadObjects() with a return collection, since some commands want to receive results through a channel in order to do useful work on the fly. So maybe a channel-based function as the primary implementation, plus a convenience wrapper that returns collections of results when progressive results aren't needed.
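
A sketch of that two-tier facade; the names and the exactly-one-result-per-pointer behaviour are assumptions, not settled API:

package client

// Channel-based primary entry point: results are delivered progressively so
// commands can do useful work on the fly.
func DownloadObjects(pointers []*Pointer, results chan<- *TransferResult) error {
    // ApiContext-style internals (and eventually the API/content split) go here.
    return nil
}

// Convenience wrapper for callers that don't need progressive results. Assumes
// DownloadObjects has sent exactly one result per pointer by the time it returns.
func DownloadAllObjects(pointers []*Pointer) ([]*TransferResult, error) {
    results := make(chan *TransferResult, len(pointers))
    if err := DownloadObjects(pointers, results); err != nil {
        return nil, err
    }
    all := make([]*TransferResult, 0, len(pointers))
    for i := 0; i < len(pointers); i++ {
        all = append(all, <-results)
    }
    return all, nil
}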

@technoweenie (Contributor, Author)

@sinbad: Sounds good. I hadn't thought about the client being a concrete object that only uses the LFS API/Content API interfaces internally.

So maybe a channel-based function as the primary implementation, and a convenience wrapper which returns collections of results if progressive results aren't needed.

👍

@ttaylorr (Contributor) commented Aug 4, 2017

See: #1839.

ttaylorr closed this as completed Aug 4, 2017