This repository was archived by the owner on Jan 13, 2021. It is now read-only.

Better 'concurrency'. #65

@Lukasa

Right now hyper doesn't read frames until the user attempts to read data or our connection window closes. This is obviously a problem:

  1. We will send all the data on a stream before we get a chance to read the RST_STREAM frame that the server sent after our HEADERS frame. This kind of behaviour will likely cause a server to kill our connection.
  2. More generally, we don't find out about changes in connection state until a read event. This is a bit troubling: we'll be slow to respond to SETTINGS with ACKs, for example.
  3. We allow clients to put themselves into a position where they can shoot themselves in the foot, e.g. by sending large numbers of requests with small (or no) bodies before reading anything. In this case, hyper never finds out about anything the server has sent in the meantime.
  4. Our TCP buffer will fill and push back on the TCP connection, hurting throughput unnecessarily. HTTP/2 flow control exists to avoid this problem.

We need some solutions to this. Proposals:

  1. Have a separate thread that reads from the socket to handle grabbing control frames, one per connection. This will work. Downsides: should libraries launch their own threads? Also, this will cause problems if the main thread is heavily CPU loaded. There's also a perf overhead for threads in Python.
  2. Have a method that will read N frames off the connection each time we send a frame. This method avoids the problem of having concurrency. Problem: if N is too small we have the same problem we have now; if N is too large we can severely slow down sending operations.
  3. A variation on (1): have those threads be explicitly launched by users. At least that way we can blame them for all the bad stuff.
  4. Have a design that puts HTTP/2 connections into separate processes and uses queues to send requests/responses between them. Downsides: there's overhead here (SO MUCH MEMORY COPYING), we can't force it to be used, and it would require all users to launch extra processes.
  5. Accept that this is terrible, have a port for asyncio and basically tell people that HTTP in non-async is a bad idea.
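
To make proposal (2) a bit more concrete, here is a minimal sketch of "read up to N frames after every send". It is not hyper's API: `recv_exact`, `read_pending_frames`, `send_and_poll` and `build_frame` are hypothetical names, and the 9-byte frame header layout is an assumption taken from the final HTTP/2 spec (RFC 7540), which may differ from the draft framing in use when this issue was written. It only reads frames that are already waiting in the socket buffer, so a quiet connection costs one zero-timeout `select` per send.

```python
import select
import socket
import struct

FRAME_HEADER_LEN = 9  # assumption: RFC 7540 frame header size
RST_STREAM = 0x3      # frame type codes from RFC 7540
SETTINGS = 0x4


def recv_exact(sock, n):
    """Read exactly n bytes, raising if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-frame")
        buf += chunk
    return buf


def read_pending_frames(sock, max_frames):
    """Read at most max_frames frames that have already started arriving.

    Returns a list of (type, flags, stream_id, payload) tuples. If nothing
    is readable we return immediately. Once a frame has started arriving we
    do block until the rest of that frame is in.
    """
    frames = []
    while len(frames) < max_frames:
        readable, _, _ = select.select([sock], [], [], 0)  # zero timeout: poll only
        if not readable:
            break
        header = recv_exact(sock, FRAME_HEADER_LEN)
        len_hi, len_lo, frame_type, flags, stream_id = struct.unpack("!BHBBI", header)
        length = (len_hi << 16) | len_lo      # 24-bit payload length
        payload = recv_exact(sock, length)
        stream_id &= 0x7FFFFFFF               # drop the reserved high bit
        frames.append((frame_type, flags, stream_id, payload))
    return frames


def send_and_poll(sock, data, max_frames=4):
    """Send data, then opportunistically read up to max_frames waiting frames."""
    sock.sendall(data)
    return read_pending_frames(sock, max_frames)


def build_frame(frame_type, flags, stream_id, payload=b""):
    """Helper for experiments: serialise a frame with the assumed header layout."""
    length = len(payload)
    header = struct.pack("!BHBBI", length >> 16, length & 0xFFFF,
                         frame_type, flags, stream_id)
    return header + payload
```

With something like this, a caller sending DATA frames could check the result of each `send_and_poll` for an `RST_STREAM` on the stream it is writing to and stop early. The trade-off described in (2) is visible in the `max_frames` parameter: a small value leaves frames unread on a busy connection, while a large one makes every send pay for draining the buffer.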

Any other ideas? /cc @shazow, @sigmavirus24, @alekstorm, I need your wealth of experience. If you know anyone else with good ideas in this area I'd love to hear from them.
