CPS transform for Python
Python
Latest commit b3a3e2d Nov 9, 2012 @samrushing k == return

README.rst

CPS (Continuation-Passing Style) transform for Python.

motivation

Event-driven systems like asyncore and Twisted must arrange for callbacks to be called in response to I/O. This style of programming is difficult to write, difficult to read, and makes combining multiple components together almost impossible. This module provides an automated CPS transform that essentially converts 'normal' code into 'callback' or event-driven code.

requirements

Since the transform relies on the PEP 3104 'nonlocal' extension, the code is written for and generates Python 3.

programming

The transformer currently performs a source->source conversion, i.e. it takes normal python code and emits transformed python code. When traversing a body of code, it needs to identify functions that need converting, and it needs to identify function calls that can be expected to follow the CPS calling 'protocol'.

This is currently done using a 'cps_' prefix on function names. If a call is made to a function with a name starting with 'cps_', it will be transformed into a CPS call.

The current transformer also assumes that function definitions that start with 'cps_' also want to be converted.

In order to interface normally with code following the protocol, it's necessary to provide a way to define functions that obey the protocol, but do not need conversion. This is done with a '@cps_manual' decorator, which is removed by the conversion process.

protocol

The CPS 'protocol' is currently very simple: aside from the required prefix in the name, a CPS function takes an extra argument (the first one), commonly called 'k', which represents the rest of the computation. Here's an example, hand-written cps_print() function:

@cps_manual
def cps_print (k, v):
    print (v)
    k()

In this case the continuation function itself takes no args. This is referred to in the code as a 'dead' continuation (it does not expect to receive a value).

A silly example of one accepting a value:

@cps_manual
def cps_double (k, x):
    k (x * 2)

An easy way to think about this function is to consider it a replacement for the 'return' statement.

Note: you don't normally provide or concern yourself with this argument, unless you are writing code that will interface with CPS'd code.

the transform

The transformer first converts a Python AST to a CPS Node structure. CPS is often used in functional languages as an intermediate representation, similar to the SSA form used by most modern compilers. [in fact CPS and SSA have been proven equivalent]. Looking at a CPS expression, you'll notice that it seems 'inside-out' compared to a normal AST. The outermost layers of a CPS tree represent the innermost layers of a computation. In that sense CPS looks more like assembly or byte code.

The reason for the inside-out nature of CPS is that we need to be able to capture 'the rest of the computation' at any point in case we need to package it up into a callback function.

The code generated by this transform is fairly primitive, with lots of room for optimizations and improvements, but I hope it demonstrates the feasibility of the approach.

Here's an example input:

# -*- Mode: Python -*-

@cps_manual
def cps_print (k, v):
    print (v)
    k()

def cps_fact (n):
    if n == 1:
        return 1
    else:
        return n * cps_fact (n-1)

cps_print (cps_fact (5))

And the output of the transform:

def cps_print(k, v):
    print(v)
    k()
def cps_fact (k, n):
    v13 = n
    v14 = 1
    v4 = v13 == v14
    if v4:
        v5 = 1
        k (v5)
    else:
        v7 = n
        def kf2 (v8):
            v6 = v7 * v8
            k (v6)
        v11 = n
        v12 = 1
        v9 = v11 - v12
        v10 = cps_fact
        v10 (kf2, v9)
def kf0 ():
    pass
def kf1 (v0):
    v1 = cps_print
    v1 (kf0, v0)
v2 = 5
v3 = cps_fact
v3 (kf1, v2)

With some simple optimizations (not yet implemented) it might look like this:

def cps_print(k, v):
    print(v)
    k()
def cps_fact (k, n):
    if n == 1:
        k (1)
    else:
        v7 = n
        def kf2 (v8):
            k (v7 * v8)
        cps_fact (kf2, n - 1)
def kf0 ():
    pass
def kf1 (v0):
    cps_print (kf0, v0)
cps_fact (kf1, 5)

trampoline

One problem with using CPS in Python is that it will quickly result in a stack overflow. CPS functions never actually return, they always just invoke another function (called the 'continuation', and often labeled simply 'k'). This will result in a never-ending accumulation of frames on the stack. A great demonstration of this can be had with the classic lisp 'tak' benchmark, which makes 63,609 recursive function calls before returning. Even after raising sys.recursionlimit to over 10,000 it is unable to execute without overflowing the stack.

However, the purpose of this module is to generate code that will work within an event-driven scheduler system, where callbacks will be stuffed into a data structure somewhere for later execution.

The transformer can be 'hooked' to schedule a continuation to be invoked later by such a scheduler, solving the stack overflow problem while also making the system actually useful.

The technique of handing a continuation off for later invocation is called trampolining.

I've provided a simple example scheduler and trampoline invocation scheme in the module trampoline.py. With this change the tak benchmark executes with no trouble.

exceptions

The transformer is by no means complete. It implements a small subset of Python's grammar - enough to hopefully give a proof of concept. One major missing piece is support for exceptions. I believe that it should not be too difficult to support exceptions, using a modification to CPS called 'exception-passing style'. This approach passes around two continuations at all times, the 'normal' continuation and an 'exception' continuation. If an exception is raised by any of the converted code, it will invoke the exception continuation. In terms of an event scheduler, each 'callback' will then consist of two functions.

timeouts

It should be possible to implement lots of nice thread-like features around this when combined with an event scheduler, including stuff like a with_timeout() function.

bytecode

A better version of this transform could probably be done utilizing bytecode output rather than source. I think this could also target Python 2.

CPS transform in C

I wrote some notes a while ago about applying the CPS transform to C code, readers may find it helpful/interesting.