Skip to content
This repository has been archived by the owner on Apr 14, 2022. It is now read-only.

[API design] Streaming upload API: push-style or pull-style? #199

Open
njsmith opened this issue Jan 28, 2020 · 3 comments
Open

[API design] Streaming upload API: push-style or pull-style? #199

njsmith opened this issue Jan 28, 2020 · 3 comments

Comments

@njsmith
Copy link
Member

njsmith commented Jan 28, 2020

[This is partially discussed in #124, but that's a huge omnibus issue so I'm pulling it out into a more focused one. This is also capturing some notes from a voice chat @sethmlarson and I had last week.]

Say you're uploading some data as a request body, and you want to generate it on the fly. There are two basic API approaches that hip could offer. In a "push" style API, the user pushes us data as it becomes available:

request_object = await hip.start_request(...)
await request_object.send_all(data1)
await request_object.send_all(data2)
response = await request_object.send_eof()

In a "pull" style API, the user provides an (async) iterable, and hip pulls data out of it:

async def generate_body_parts():
    yield data1
    yield data2
await hip.post(..., body=generate_body_parts())

These are kind of duals of each other. Generally, Trio's style prefers the "push" approach, for two reasons: First, keeping the loop in user code makes the control flow is more explicit and transparent, so I think it's easier to reason about. And second, if you have a push-style API, then it's easy to implement a pull-style API, but going the other direction is much more awkward:

# Pull on top of push is just a for loop:
async for data in generate_body_parts():
    await request_object.send_all(data)

# Push on top of pull needs more complicated machinery:
send_side, receive_side = trio.open_memory_channel()
async with trio.open_nursery() as nursery:
    nursery.start_soon(partial(hip.post, ..., body=receive_side))
    await send_side.send(data1)
    await send_side.send(data2)
    # FIXME: also have to retrieve the return value from hip.post somehow

Push-style APIs are also much more natural if we want to support bidirectional HTTP, i.e. letting people send a bit of the request, then read a bit of the response, then send a bit more of the request, etc. (I'm not sure we do want to do this, but still, it's a trade-off to consider.)

However, for hip's purposes, the push-style approach has some significant downsides:

  • For HTTP uploads specifically, it's very common to already have some kind of object that represents the body (a byte-string, some JSON, an open file, an async generator, etc.), and it's natural to want to just pass this object as a simple kwarg like body=. We probably don't want to make people explicitly write a for loop every time. So, if we supported a push-style API, then we'd also end up adding a pull-style option as sugar on top of it, and then we have two APIs.

  • Push-style APIs expose a lot more states in the API. With a classic requests.get API, you call the function, and it doesn't return until at least the whole request body has been delivered and the response headers have been received. With a push-style API, I guess we'd need to... first return an object that represents the request-in-progress, and then the user would push data through it for a while, and then when they mark the request body complete then that would need to return a second object that has the response headers etc.? And what happens if the server starts responding early? This makes the API much more complex to design and use.

    • In particular, the "early response from server" case is important for bidi HTTP: for normal standard-conformant HTTP, if the server does an early response then you want to hard-abort sending the request body. The whole point of bidi HTTP though is that you don't do this. So even if we did have a push-style API, that wouldn't be enough on its own to support bidi HTTP; we'd still need some extra API surface area beyond that.
  • Push-style APIs are even more awkward if you want to handle redirects and retries, where you might have to rewind the body and re-send it. So now we need some way for hip to tell the user that they've been redirected and need to start over on pushing the body, etc.

Putting this all together, I think the conclusion is that we should follow requests instead of Trio, and expose a pull-style API instead of a push-style API.

If we do decide that bidi HTTP is an important use case (e.g. so grpc clients can use hip), then another possibility would be to make the API look similar to an Upgrade/CONNECT. For Upgrade/CONNECT, I imagine that our API will involve sending a request, getting a response object, and then if it succeeded calling some method on the response object to get a bidi byte stream for the new protocol to use. For bidi HTTP, we could use a similar API pattern, where during the request you set some special flag that says "I want to do bidi HTTP", and then we use some special internal processing that doesn't send any request body immediately, but then lets you use the pull-out-a-byte-stream method to get a byte stream referring to the request and response bodies. I think that will end up being much more natural.

@sethmlarson
Copy link
Contributor

I'm in complete agreement with the conclusion.

I wonder if there's a way we could include a push-based approach for a lower-level API, something that we've talked about a couple times about two layers of API: one that's basically ahip.request(...) (and all config objects like ahip.Retries()) for the 95% use-case and another for basically if grpc or the like want to use Hip. That's a far-future feature though.

@njsmith
Copy link
Member Author

njsmith commented Jan 29, 2020

Yeah, it's possible we'll want to make different decisions in a lower-level API. I'm not 100% sure it will make sense though. A low-level API doesn't have to be as comfortable to use or support retries, so those downsides don't apply. But we'd still need to figure out how to deal with the state complexity, and need some special-case mechanism to avoid aborting the request body when we get an early response.

We'll also need to double-check what grpc actually needs before committing to anything... my clever idea to make bidi HTTP look like an Upgrade/CONNECT is clever and all, but it still assumes that bidi HTTP users are OK with waiting for the response headers before they switch into bidi mode, and I'm actually not 100% sure whether that's true of grpc or not. (@standy66, do you know?)

Also, grpc only cares about bidi support over HTTP/2, and with HTTP/2 we don't need to abort the response body on early responses, so that whole issue goes away...

@agronholm
Copy link
Contributor

The way I thought about this was like how websockets are usually implemented. You write stuff to the request, read responses then. If you want to alternate between the two (extreme case: multipart request and response), you can. The server responding early is not a problem as I see it. If the user doesn't stop sending data after an early response, we can raise an exception when they try to send more.

As this implies, I strongly favor push-style over pull-style.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants