Introduce prototype for upload API, based on S3 #5
Conversation
@nathanegillett This isn't ready for proper review yet; I need other PRs to land first. But I pushed it so you can have a look in case you're also prototyping APIs. However, the way things are done here isn't always a great example, because the API is intended to be compatible with S3, which doesn't follow the preferred conventions of FastAPI.
Manually published docs at https://rohanpm.github.io/exodus-gw/ so they can be reviewed.
@rohanpm In order to make exodus-gw access localstack (for integration tests), I made some changes like this.
It works with the
This API can be used to upload objects. Rather than inventing our own API for this, let's use a subset of the true S3 API. This allows usage of existing clients (such as boto) to upload content through exodus-gw. The examples/s3-upload script can be used to demonstrate this.

This implementation is prototype-level and comes with a TODO list. Some major missing pieces are:

- performance testing to verify that this doesn't perform notably worse than using S3 directly
- a proper way to handle configuration (it currently allows the boto library to use whatever config is discovered by default, meaning only a single environment is supported)

This commit also experiments a bit more on how to document these APIs. There is now a separate API user guide and reference, as it's not clear if there's a decent way of including general usage info into the redoc-hosted reference.
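To illustrate the intent, a stock boto3 client can simply be pointed at the gateway. A minimal sketch, assuming a hypothetical deployment URL and bucket name (neither value is defined in this PR):

```python
# Sketch: uploading through exodus-gw with an unmodified boto3 client.
# The endpoint URL, bucket, and key below are illustrative assumptions,
# not values taken from this repository.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://exodus-gw.example.com",  # hypothetical exodus-gw deployment
)

# Because exodus-gw implements a subset of the real S3 API, the usual
# high-level helpers work unchanged.
s3.upload_file("local-file.iso", "my-bucket", "path/to/object")
```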
Previously, all request bodies were awaited in full before starting any requests to S3. That's unnecessarily slow and also means that all data from the user's request has to be held in memory. Fix it up so that requests can be streamed properly (which requires a bit of a tweak to make botocore and aiobotocore work together).

Note that this does require the caller to always provide the Content-MD5 and Content-Length headers. The higher-level boto methods such as upload_file already do this by default, so this shouldn't be cumbersome (and we probably would have required Content-MD5 by policy anyway).

In my test environment with 20MB chunk size and 10 threads, this reduced the typical overhead of using exodus-gw for uploads from about 10-20% to essentially zero.
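For callers using the low-level API rather than the high-level helpers, those headers would have to be supplied by hand. A minimal sketch with placeholder endpoint/bucket/key values (Content-MD5 is the base64-encoded MD5 digest of the body):

```python
# Sketch: a low-level put_object call supplying the now-required headers
# explicitly; upload_file/upload_fileobj handle this automatically.
import base64
import hashlib

import boto3

s3 = boto3.client("s3", endpoint_url="https://exodus-gw.example.com")  # hypothetical

data = b"some content"
md5_b64 = base64.b64encode(hashlib.md5(data).digest()).decode()

s3.put_object(
    Bucket="my-bucket",        # placeholder
    Key="path/to/object",      # placeholder
    Body=data,
    ContentMD5=md5_b64,        # required by exodus-gw per this change
    ContentLength=len(data),   # required by exodus-gw per this change
)
```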
To assist with testing, make it possible to override the default S3 endpoint URL by setting EXODUS_GW_S3_ENDPOINT_URL. It can be used to (e.g.) test against localstack. Note that, at some point, we expect to support multiple target environments and have some method of managing their configuration; most likely this env var will be dropped at that time.
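As a rough sketch of how such an override could be consumed on the server side (the wiring below is an assumption for illustration, not the code from this change):

```python
# Sketch: honoring EXODUS_GW_S3_ENDPOINT_URL when creating the S3 client.
# The surrounding client setup is an assumption, not the actual implementation.
import asyncio
import os

import aiobotocore.session

async def list_buckets_via_configured_endpoint():
    session = aiobotocore.session.get_session()
    # When EXODUS_GW_S3_ENDPOINT_URL is unset, endpoint_url=None lets
    # botocore fall back to its normal S3 endpoint resolution, so
    # production behaviour is unchanged.
    async with session.create_client(
        "s3",
        endpoint_url=os.environ.get("EXODUS_GW_S3_ENDPOINT_URL"),
    ) as client:
        return await client.list_buckets()

if __name__ == "__main__":
    asyncio.run(list_buckets_via_configured_endpoint())
```

Pointing the variable at a localstack instance would then exercise the full upload path without touching real S3.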
Added two commits just now:
OK to merge this? I think it'd be good to get something in here that others can start building on, even if the API is expected to change later.
@rohanpm This is a large PR in the sense that it contains a lot of code changes. Is it possible to break it into smaller, atomic PRs?
In general I'm very much in support of that, and I do tend to make small PRs with focused goals, but in a case like this - early development of a new project - I'm struggling to find an efficient workflow for it. Maybe I'm missing some option or tool in GitHub, but I don't see any way to make one pull request depend on another. Taking this PR as an example, one way I could have split it up is:
The problem is the dependencies between them... (2) arguably depends on (1), and (3) and (4) depend on (2). A simple way to fix this at the cost of time is to wait for review of each PR to be completed before asking for review on the next one... but that's extremely inefficient. From my point of view, every review takes at least overnight (or longer if it doesn't pass on the first attempt). That means one bigger PR has a chance of getting merged in a day or two, but splitting it into 5 smaller dependent PRs would need at least a full week to get them all merged even if everything goes perfectly. I'm interested in solutions to this problem (and not only for this project), so if you have experience or suggestions on how to manage dependencies between PRs, please let me know.
@rohanpm You have raised great discussion points. Let's continue the discussion over an email thread, or even better, a meeting. For this PR specifically, I do not have the tools (yet) to provide an efficient code review.