Skip to content
Find file
Fetching contributors…
Cannot retrieve contributors at this time
197 lines (159 sloc) 8.09 KB

PEP: 3153 Title: Asynchronous IO support Version: $Revision$ Last-Modified: $Date$ Author: Laurens Van Houtven <_@lvh.cc> Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 29-May-2011 Post-History: TBD

Abstract

This PEP describes protocol and transport abstraction for the Python standard library.

The goal is to reach an abstraction that can be implemented by many different IO backends, both synchronous and asynchrous, and provides a target for library developers to write code portable between those different backends.

Rationale

The current state of the art for asynchronous IO libraries lies squarely outside the Python standard library. While each of these libraries has their own merit, there is a lack of portability between them, resulting in a lot of duplicated effort.

The immediate goal is to specify a protocol abstraction, with some basic tools for creating real protocols. This will hopefully become a widely supported standard, so that many different libraries with very different internals can use the same protocol implementations.

An eventual added goal would be for standard library implementations of wire and network protocols to evolve towards being real protocol implementations, as opposed to standalone libraries that do everything including calling recv() blockingly. This means they could be easily reused for both synchronous and asynchronous code.

Communication abstractions

Transports

Transports provide a uniform API for reading bytes from and writing bytes to different kinds of connections. Transports in this PEP are always ordered, reliable, bidirectional, stream-oriented two-endpoint connections. This might be a TCP socket, an SSL connection, a pipe (named or otherwise), a serial port... It may abstract a file descriptor on POSIX platforms or a Handle on Windows or some other data structure appropriate to a particular platform. It encapsulates all of the particular implementation details of using that platform data structure and presents a uniform interface for application developers.

Transports talk to two things: the other side of the connection on one hand, and a protocol on the other. It's a bridge between the specific underlying transfer mechanism and the protocol. Its job can be described as allowing the protocol to just send and receive bytes, taking care of all of the magic that needs to happen to those bytes to be eventually sent across the wire.

The primary feature of a transport is sending bytes to a protocol and receiving bytes from the underlying protocol. Writing to the transport is done using the write and write_sequence methods. The latter method is a performance optimization, to allow software to take advantage of specific capabilities in some transport mechanisms. Specifically, this allows transports to use writev instead of write or send, also known as scatter/gather IO.

A transport can be paused and resumed. A paused transport buffers all input and output it gets, rather than sending them to their proper places. When resumed, it will flush the buffers. Whether this will cause the same amount of data_received calls to the protocol as if it was never paused is implementation-specific and not guaranteed.

A transport can also be closed, half-closed and aborted. A closed transport will finish writing all of the data queued in it to the underlying mechanism, and will then stop reading or writing data. Aborting a transport stops it, closing the connection without sending any data that is still queued.

Further writes will result in exceptions being thrown. A half-closed transport may not be written to anymore, but will still accept incoming data.

Protocols

Protocols are exactly what you'd expect them to be: HTTP, IRC, SMTP... are all examples of something that would be implemented in a protocol. While they usually have some form of "socket" for a transport, it may sometimes be useful to use a separate transport: maybe a file for unit testing purposes, maybe a serial port for interfacing with legacy systems.

The shortest useful definition of a protocol is a (usually two-way) bridge between the transport and the rest of the application logic. A protocol will receive bytes from a transport and translates that information into some behavior, typically resulting in some method calls on an object. Similarly, application logic calls some methods on the protocol, which the protocol translates into bytes and communicates to the transport.

If a protocol wants to send data to the other side, it will call write on its transport attribute.

When a transport receives data, it calls the data_received method on the protocol. The protocol cannot request new data, it has to wait until it has new data. In practice, this means that applications cannot read from a socket and then process the data: they always have to process the data as it comes in.

Composability example: line-based protocols

One of the simplest protocols is a line-based protocol, where data is delimited by \r\n. The protocol will receive bytes from the transport and buffer them until there is at least one complete line. Once that's done, it will pass this line along to some object. Ideally that would be accomplished using a callable or even a completely separate object composed by the protocol, but it could also be implemented by subclassing (as is the case with Twisted's LineReceiver). For the other direction, the protocol could have a write_line method, which adds the required \r\n and passes the new bytes buffer on to the transport.

This PEP suggests a generalized LineReceiver called ChunkProtocol, where a "chunk" is a message in a stream, delimited by the specified delimiter. Instances take a delimiter and a callable that will be called with a chunk of data once it's received (as opposed to Twisted's subclassing behavior). ChunkProtocol also has a write_chunk method analogous to the write_line method described above.

Why separate protocols and transports?

This separation between protocol and transport often confuses people who first come across it. In fact, the standard library itself does not make this distinction in many cases, particularly not in the API it provides to users.

It is nonetheless a very useful distinction. In the worst case, it simplifies the implementation by clear separation of concerns. However, it often serves the far more useful purpose of being able to reuse protocols across different transports.

Consider a simple RPC protocol. The same bytes may be transferred across many different transports, for example pipes or sockets. To help with this, we separate the protocol out from the transport. The protocol just makes sense of specific bytes, and doesn't really care what mechanism is used to eventually transfer bytes.

This also allows for protocols to be stacked or nested easily, allowing for even more code reuse. A common example of this is JSON-RPC: according to the specification, it can be used across both sockets and HTTP[#jsonrpc]_ . In practice, it tends to be primarily encapsulated in HTTP. The protocol-transport abstraction allows us to build a stack of protocols and transports that allow you to use HTTP as if it were a transport. For JSON-RPC, that might get you a stack somewhat like this:

  1. TCP socket transport
  2. HTTP protocol
  3. HTTP-based transport
  4. JSON-RPC protocol
  5. Application code

References

[1]Sections 2.1 and 2.2 .

Copyright

This document has been placed in the public domain.

Jump to Line
Something went wrong with that request. Please try again.