Refactor API client #779

Closed
technoweenie opened this issue Oct 23, 2015 · 10 comments
@technoweenie (Contributor)

There's a lot of cruft in the lfs package for dealing with both LFS APIs. This can make it difficult to follow the code path, and really difficult to add support for alternate protocols. This is what I'd like to see:

  • Simple Go interface with methods like DownloadObjects() and UploadObjects().
  • Public, exported package at github.com/github/git-lfs/client.
  • Consistent usage in git lfs fetch, git lfs pull, and git lfs smudge.
@sinbad (Contributor) commented Oct 27, 2015

I've previously (#378) had a go at abstracting the APIs so that SSH could be used for a complete API round-trip. That needs revisiting completely now, but were you happy with the general principles? If so, I could take this on at the same time as revisiting SSH.

@technoweenie (Contributor, Author)

I think your interface is too big. Of course, this was before the batch API was finalized and all that. But I'd like to see something like:

type ApiContext interface {
    // Get the endpoint this context was constructed from
    Endpoint() Endpoint
    // Close the context & any resources it's using
    Close() error

    Upload(objects []*ObjectResource) ([]*ObjectResource, *WrappedError)
    Download(objects []*ObjectResource) ([]*ObjectResource, *WrappedError)
}

I don't know if we'll need *Check too. Ideally, all of the commands just use this high-level interface. I think it'll be nice to get all the HTTP client and batching code out of this catch-all lfs package.
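
For illustration, a command consuming this high-level interface might look roughly like the sketch below; client.NewApiContext and the surrounding names are assumptions, not part of the proposal:

ctx := client.NewApiContext(endpoint) // hypothetical constructor for the configured endpoint
defer ctx.Close()

// e.g. `git lfs fetch`: hand over the whole set, get completed objects back
downloaded, wErr := ctx.Download(objects)
if wErr != nil {
    // surface the wrapped error to the user
}
// ... write `downloaded` into the working tree ...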

@sinbad (Contributor) commented Oct 28, 2015

Yeah, most of that is because of the previous need to support individual objects as well as batches. Whether we need *Check depends on whether this interface is supposed to complete the upload/download or just return resources through which the actual upload/download could subsequently be performed - which is what your interface suggests (it's akin to batch in my previous one).

That would be fine, but I think there needs to be the potential to abstract the actual upload and download as well, so they don't necessarily have to be across HTTP (but can be - so you can have an SSH API that gives back HTTP upload/download links or ones which use the SSH pipe). That suggests an interface like this:

type ApiContext interface {
    // Get the endpoint this context was constructed from
    Endpoint() Endpoint
    // Close the context & any resources it's using
    Close() error

    // Perform API request for upload/download & return resources / errors related to that batch
    RequestUpload(pointers []*Pointer) ([]*ObjectResource, *WrappedError)
    RequestDownload(pointers []*Pointer) ([]*ObjectResource, *WrappedError)
    // Perform the actual upload/download - async completion
    Upload(objects []*ObjectResource, c chan *TransferResult) error
    Download(objects []*ObjectResource, c chan *TransferResult) error
}
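
To make the intended flow concrete, a caller might drive this two-phase interface roughly as follows; the channel sizing and the one-result-per-object behaviour are assumptions:

// First phase: ask the API which resources to transfer.
resources, wErr := ctx.RequestDownload(pointers)
if wErr != nil {
    // handle API error
}

// Second phase: perform the transfers, consuming results as they complete.
// Buffered on the assumption of exactly one TransferResult per object.
results := make(chan *TransferResult, len(resources))
if err := ctx.Download(resources, results); err != nil {
    // handle transfer startup error
}
for i := 0; i < len(resources); i++ {
    result := <-results
    // act on each object as soon as its transfer finishes
    _ = result
}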

@sinbad (Contributor) commented Oct 30, 2015

Scratch that, I think we need 2 separate interfaces, one for the API and one for the content operations (actual upload/downloads). A package (say, core, with one implementation for HTTP and one for SSH initially) can provide both if it wants. Also, it should be perfectly acceptable to call the API over SSH and download over HTTP, or vice versa.

I'd still like to call the API functions RequestUpload and RequestDownload rather than Upload/Download, to make clear that they're not performing the actual transfer.

I think in both cases the implementation should be derived from the protocol of the link - for the API interface that's from the LFS URL, and for the content provider that's from the resource URLs. So the SSH API could return HTTP links for example, and custom protocols could be used to extend to other systems if required later (just like in browsers).
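
Pulling that together, one possible shape for the split - with the implementation picked from the URL scheme as described above - might be the following; ContentProvider, the picker, and the concrete providers are illustrative names only, and net/url is assumed for scheme parsing:

// API interface: talks to the LFS API only; chosen from the LFS URL scheme.
type ApiContext interface {
    Endpoint() Endpoint
    Close() error
    RequestUpload(pointers []*Pointer) ([]*ObjectResource, *WrappedError)
    RequestDownload(pointers []*Pointer) ([]*ObjectResource, *WrappedError)
}

// Content interface: performs the actual transfers; chosen per resource URL.
type ContentProvider interface {
    Upload(objects []*ObjectResource, c chan *TransferResult) error
    Download(objects []*ObjectResource, c chan *TransferResult) error
}

// Scheme-based selection, as in browsers; sshProvider/httpProvider are hypothetical.
func providerFor(rawURL string) ContentProvider {
    u, err := url.Parse(rawURL) // net/url
    if err == nil && u.Scheme == "ssh" {
        return &sshProvider{}
    }
    return &httpProvider{}
}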

@technoweenie (Contributor, Author)

Scratch that, I think we need 2 separate interfaces, one for the API and one for the content operations (actual upload/downloads).

Do we really need the second interface? All the commands should care about is whether their objects were uploaded or downloaded. It shouldn't matter to them if that requires a single batch API request and individual content requests, multiple (legacy) API and content requests, or a single SSH connection.

I suppose one benefit of this is that an SSH API adapter could take advantage of the existing batch transfer code for HTTP downloads. Is that what you're thinking?

@sinbad (Contributor) commented Oct 30, 2015

Do we really need the second interface? All the commands should care about is whether their objects were uploaded or downloaded.

Two reasons. Firstly, there are cases where you need to know whether an object is on the remote without actually downloading it, for example:

  • push - to determine whether it's OK that objects referred to in the graph are missing locally
  • prune - to confirm objects really are on the remote before deleting them

Secondly, the API and content transfer are orthogonal sets of operations. Just because you're using SSH to access the API doesn't mean you have to use it for the content; you could be using SSH to authenticate with your Git repo but S3 for content storage. Or HTTP API and SCP to some custom content host. There's no reason to make an implementation of the interface implement the NxM matrix of all combinations of API serving and upload/download route.

Separating the API & content transfer interfaces achieves both with one approach.
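
Composed that way, mixing transports is just a matter of picking a different implementation for each half; a sketch reusing the hypothetical names from the earlier snippet:

// SSH for the API, HTTP (e.g. S3 links) for the content - chosen independently.
api := newSSHApiContext(endpoint) // hypothetical SSH API implementation
resources, wErr := api.RequestDownload(pointers)
if wErr == nil && len(resources) > 0 {
    content := providerFor(resources[0].DownloadURL()) // hypothetical accessor
    results := make(chan *TransferResult, len(resources))
    _ = content.Download(resources, results)
}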

@technoweenie (Contributor, Author)

firstly there are cases where you need to know if an object is on the remote, without actually downloading it

This seems more like a case for RequestUpload() and RequestDownload() on ApiContext. And IIRC, both of those only check the LFS API and not the content API (S3).

Secondly, the API and content transfer are orthogonal sets of operations. Just because you're using SSH to access the API doesn't mean you have to use it for the content; you could be using SSH to authenticate with your Git repo but S3 for content storage. Or HTTP API and SCP to some custom content host. There's no reason to make an implementation of the interface implement the NxM matrix of all combinations of API serving and upload/download route.

The HTTP and SSH API objects can share code internally, but I don't think any of this should be exposed to the commands package. There should be no extra configuration of an LFS server beyond its endpoint if we can avoid it.

I do like the idea of having a separate API context object for each major Git remote type: HTTPS, SSH, maybe even file://. This means that we'll be able to remove the current SSH workaround, and that the SSH adapter will have to support git-lfs-authenticate AND whatever the new command is.

Also, all of this is out of scope for this specific issue, which is concerned with defining an interface and moving that code to a separate Go package. After that's done, you can start experimenting with a better SSH adapter. If we really need content storage API interfaces, they could be internal to this extracted Go package.

@sinbad (Contributor) commented Nov 2, 2015

This seems more like a case for RequestUpload() and RequestDownload() on ApiContext. And IIRC, both of those only check the LFS API and not the content API (S3).

Yep, which is why I think they're a separate interface. In practice, the server side of the API may be checking the content store (unless it caches that info), but the client only talks to the API.

The HTTP and SSH API objects can share code internally, but I don't think any of this should be exposed to the commands package. There should be no extra configuration of an LFS server beyond its endpoint if we can avoid it.

Sorry if I wasn't clear on that, but yeah, I had no intention of the split being visible to the code in the commands package. This is why I was saying that the code path should be picked entirely based on the URLs - mainly the LFS API URL, but also the content URL, which may use a separate protocol - with the switchover invisible to commands when performing a full download/upload.

What I meant was that the abstraction of the protocols should be separate between the LFS API and the content transfers. The choice of implementation would always need to be fronted by utilities that pick the correct path to use, but internally those two interfaces - one for the LFS API and one for the content transfers - would be a bit more flexible, without exposing any of that to the regular commands.

Also, all of this is out of scope for this specific issue, which is concerned with defining an interface and moving that code to a separate Go package.

Yeah, a multi-stage approach makes sense: first move the existing code mostly as-is, then refactor as a second stage. I think we were talking about two different things - you about the facade that commands would see, and I about the future internals behind it.

In that case, I think the initial step is a client package with just a set of top-level functions (not an interface, because this level won't need to be overridden). Behind those top-level functions can sit the ApiContext-style approaches, and eventually the split between API and content. I don't think the top-level functions can be quite as simple as just DownloadObjects() with a return collection, since some commands want to receive results through a channel in order to do useful work on the fly. So maybe a channel-based function as the primary implementation, plus a convenience wrapper that returns collections of results when progressive results aren't needed.
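
A sketch of that two-tier facade; the names and the exactly-one-result-per-pointer behaviour are assumptions, not settled API:

package client

// Channel-based primary entry point: results are delivered progressively so
// commands can do useful work on the fly.
func DownloadObjects(pointers []*Pointer, results chan<- *TransferResult) error {
    // ApiContext-style internals (and eventually the API/content split) go here.
    return nil
}

// Convenience wrapper for callers that don't need progressive results. Assumes
// DownloadObjects has sent exactly one result per pointer by the time it returns.
func DownloadAllObjects(pointers []*Pointer) ([]*TransferResult, error) {
    results := make(chan *TransferResult, len(pointers))
    if err := DownloadObjects(pointers, results); err != nil {
        return nil, err
    }
    all := make([]*TransferResult, 0, len(pointers))
    for i := 0; i < len(pointers); i++ {
        all = append(all, <-results)
    }
    return all, nil
}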

@technoweenie (Contributor, Author)

@sinbad: Sounds good. I hadn't thought about the client being a concrete object that only uses the LFS API/Content API interfaces internally.

So maybe a channel-based function as the primary implementation, and a convenience wrapper which returns collections of results if progressive results aren't needed.

👍

@ttaylorr (Contributor) commented Aug 4, 2017

See: #1839.

ttaylorr closed this as completed Aug 4, 2017