\documentclass[acmsmall, nonacm, screen]{acmart}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{listings}
\usepackage{caption}
\usepackage{subcaption}
\usepackage{float}
\usepackage{xcolor}
\usepackage{stmaryrd}
\usepackage{anyfontsize}
\newif\ifdraft\draftfalse
\definecolor{green}{HTML}{298a33}
\definecolor{orange}{HTML}{995d02}
\newcommand{\outline}[1]{
\ifdraft
{\color{red}{#1}}
\fi
}
\newcommand\doverline[1]{%
\setbox0=\hbox{$\overline{#1}$}%
\ht0=\dimexpr\ht0-.30ex\relax% CHANGE .15 TO AFFECT SPACING
\overline{\copy0}%
}
\newcommand{\kw}[1]{\textsf{\color{ACMDarkBlue} #1}}
\newcommand{\ifThenElse}[3]{\textsf{\color{ACMDarkBlue}if}~#1~\textsf{\color{ACMDarkBlue}then}~#2~\textsf{\color{ACMDarkBlue}else}~#3}
\newcommand{\caseOf}[2]{\textsf{\color{ACMDarkBlue} case}~#1~\textsf{\color{ACMDarkBlue}of}~\{~#2~\}}
\newcommand{\letIn}[3]{\textsf{\color{ACMDarkBlue}let}~#1 = #2~\textsf{\color{ACMDarkBlue}in}~#3}
\newcommand{\shift}[2]{\textsf{\color{ACMDarkBlue}shift}~#1~\textsf{\color{ACMDarkBlue}in}~#2}
\newcommand{\callcc}[2]{\textsf{\color{ACMDarkBlue}call/cc}~#1~\textsf{\color{ACMDarkBlue}in}~#2}
\newcommand{\reset}[1]{\kw{reset}~#1}
\newcommand{\lambdaE}[2]{\lambda #1.\, #2}
\newcommand{\just}[1]{\textsf{Just}~#1}
\newcommand{\nothing}{\textsf{Nothing}}
\newcommand{\map}[3]{\textsf{map}^{\textsf{#1}}~#2~#3}
\newcommand{\unit}[2]{\textsf{unit}^{\textsf{#1}}~#2}
\newcommand{\join}[2]{\textsf{join}^{\textsf{#1}}~#2}
\newcommand{\cps}[1]{\mathcal{C}\llbracket #1 \rrbracket}
\newcommand{\cpsm}[1]{\mathcal{C}'\llbracket #1 \rrbracket}
\newcommand{\cpsmc}[1]{\mathcal{C}''\llbracket #1 \rrbracket}
\newcommand{\denote}[1]{\mathcal{E}\llbracket #1 \rrbracket}
\newcommand{\stringE}[1]{\textsf{\color{green} ``#1''}}
\newcommand{\quoteE}[1]{{\color{orange} \ulcorner #1 \urcorner}}
\newcommand{\unquoteE}[1]{{\color{black} \llparenthesis #1 \rrparenthesis }}
\lstset{ %
backgroundcolor=\color{white},
commentstyle=\color{ACMGreen},
keywordstyle=\color{ACMDarkBlue},
stringstyle=\color{ACMPurple},
basicstyle=\ttfamily
}
\lstdefinestyle{hs}{
language=Haskell
}
\lstdefinestyle{rkt}{
language=lisp,
deletekeywords={get},
morekeywords={define},
literate=*{(}{{\textcolor{gray}{(}}}{1}
{)}{{\textcolor{gray}{)}}}{1}
}
\AtBeginDocument{%
\providecommand\BibTeX{{%
\normalfont B\kern-0.5em{\scshape i\kern-0.25em b}\kern-0.8em\TeX}}}
\acmBooktitle{}
\begin{document}
\noindent {\bf Delimited Continuations and Monads: Exploring Abstractions from First Principles}
\noindent Submitted for WPE-II examination April 5, 2021 by Harrison Goldstein.
\bigskip
\noindent Committee: Stephanie Weirich (chair), Benjamin Pierce (adviser), Rajeev Alur
\bigskip
This report is intended for a general computer science research audience (including early PhD
students), and assumes basic familiarity with programming languages concepts (e.g., CIS 500 or
equivalent).
\vfill
\pagebreak
\title{Delimited Continuations and Monads}
\subtitle{Exploring Abstractions from First Principles}
\titlenote{
This report was compiled for part of the University of Pennsylvania's WPE-II Exam. The
accompanying talk is available on the author's website.
}
\author{Harrison Goldstein}
\email{hgo@seas.upenn.edu}
\orcid{0000-0001-9631-1169}
\affiliation{%
\institution{University of Pennsylvania}
\city{Philadelphia, PA}
\country{USA}
}
\renewcommand{\shortauthors}{Goldstein}
\begin{abstract}
In 1990, two programming abstractions were introduced independently: {\em delimited
continuations} and {\em monads}. These ideas quickly gained popularity, and over the course of
30 years they have permeated programming languages literature. Modern researchers often assume
knowledge of these abstractions when writing papers, a practice which streamlines exposition
but increases barriers to entry.
This report presents both delimited continuations and monads from first principles by following
the papers that originally popularized the ideas. First I discuss {\em Abstracting Control} by
\citet{danvy1990abstracting}, and then I explore {\em Comprehending Monads} by
\citet{wadler1990comprehending}. Though the two abstractions initially seem to have little to
do with one another, delimited continuations and monads actually have a lot in common: they can
implement many of the same design patterns, and their meta-theories are surprisingly
compatible. I leverage these connections to develop intuition and provide context for future
reading.
\end{abstract}
\maketitle
\section{Introduction} \label{sec:introduction}
Abstractions are the most powerful tools in a computer scientist's toolbox. They separate
high-level ideas from low-level details, highlighting the novel and interesting parts of a
program. Sometimes a particular abstraction becomes so ingrained in a community that members
naturally think in terms of the abstraction and use it without explanation. Broadly, this is
good---a powerful common language makes it easier to communicate ideas---but it does make it more
difficult for outsiders and new researchers to understand a body of work. This report sets out to
explain a couple of these abstractions for researchers familiar with, but not steeped in, the
programming languages literature.
The notion of {\em delimited continuations} is one such abstraction that is ingrained in the
programming languages community. These operators were first presented in {\em Abstracting
Control} by \citet{danvy1990abstracting}, and they are one of the many tools used to manipulate
program continuations.
A {\em continuation} is a first-class representation of the {\em context} around a computation.
For example, when evaluating the sub-expression ``$2$'' in the program
\[ \letIn{i}{84}{\letIn{j}{2}{i / j}}, \]
we say the context is
\[ \letIn{i}{84}{\letIn{j}{\_}{i / j}}, \]
and the continuation, $k$, is
\[ k = \lambdaE{x}{\letIn{i}{84}{\letIn{j}{x}{i / j}}}. \]
The continuation captures the context as a function, which can be used as a first-class value.
Danvy and Filinski present two operators for manipulating continuations: ``$\shift{k}{e}$'' and
``$\reset{e}$.'' The \kw{reset} operation delimits a context, and \kw{shift} packages that
context as a continuation. For example, we can use \kw{shift} and \kw{reset} to extract the
continuation from above:
\begin{align*}
& \reset{(\letIn{i}{84}{\letIn{j}{(\shift{k}{k~2})}{i / j}})} \Rightarrow \\
& \letIn{k}{(\lambdaE{x}{\letIn{i}{84}{\letIn{j}{x}{i / j}}})}{k~2}
\end{align*}
We replaced the expression $2$ with $\shift{k}{k~2}$, which says ``capture the context as $k$ and
then apply that to $2$,'' and we wrapped the whole computation in \kw{reset} to delimit the
context. The result is the same as the original expression, but the continuation is made
explicit. In that example, the \kw{shift} body just applies $k$, but it can do anything. In
particular, if $k$ is not used at all,
\[ \reset{(\letIn{i}{84}{\letIn{j}{(\shift{k}{\textsf{Nothing}})}{i / j}})} \]
the expression essentially throws an {\em exception}: the context is discarded and the result of
the entire computation is just \textsf{Nothing}. This flexibility makes delimited continuations
powerful but also notoriously difficult for newcomers to reason about.
Another abstraction that is deeply embedded in programming languages literature is the {\em
monad}. Monads are notorious for being explained poorly, to the point where people jokingly quote
``a monad is just a monoid in the category of endofunctors''---a sentence which is
incomprehensible even to most community insiders~\cite{iry_2009,mac2013categories}. The internet
is filled with monad tutorials that compare the mathematical structure to burritos (tasty, but
unhelpful) and other ``more accessible'' concepts, but few provide useful explanations. To avoid
falling into this trap of explaining monads badly, I follow here one of the earliest and clearest
explanations of monads for programming, {\em Comprehending Monads} by
\citet{wadler1990comprehending}.
Wadler presents monads using an analogy to list comprehensions like
\[
[ (x,\,y) \mid x \leftarrow [1,\, 2] ;\ y \leftarrow [3,\, 4] ],
\]
which evaluates to the Cartesian product of the lists $[1,\, 2]$ and $[3,\, 4]$. Wadler observes
that comprehension syntax can be used with more than just lists---in fact it can be used with any
monad. One of the simplest monads, called \textsf{Maybe}, is defined as \[ \textsf{data
Maybe}~\alpha = \textsf{Nothing} \mid \textsf{Just}~\alpha \] in languages like Haskell. We can
write a {\em \textsf{Maybe} comprehension} like,
\[
[ i / j \mid i \leftarrow [84]^{\textsf{Maybe}} ;\ j \leftarrow \ifThenElse{n = 0}{\textsf{Nothing}}{[n]^{\textsf{Maybe}}} ]^{\textsf{Maybe}}
\]
which uses comprehension syntax with the \textsf{Maybe} monad to abstract over {\em error
handling}. In the expression, we first bind $i$ to the value $84$. Then we check if $n$ is $0$:
if it is, the whole expression results in \textsf{Nothing}, otherwise we bind $j$ to $n$.
Finally, if the computation has not failed, we compute $i / j$. Just as list comprehensions
provide a way to sequence lists, \textsf{Maybe} comprehensions provide a way to sequence
computations that might fail. There are dozens of monads that programmers use in practice, each
of which gives a different interpretation of comprehension syntax.
Both delimited continuations and monads are interesting on their own, but why present them
together? It is not immediately clear that manipulating continuations has anything to do with
interpreting comprehension syntax; one could be forgiven for assuming that these ideas are
largely orthogonal. But we have already seen that delimited continuations and monads can both be
used to simulate exceptions, and it turns out that there is significantly more overlap. Design
patterns like state, nondeterminism, and more can be implemented using both delimited
continuations and monads. Furthermore, continuations actually form a monad, and Wadler shows that
this fact can be exploited to recover results from Danvy and Filinski.
In this report I make three contributions:
\begin{itemize}
\item I explain delimited continuations and monads by following two influential papers: {\em
Abstracting Control} by Danvy and Filinski (\S~\ref{sec:danvy}) and {\em Comprehending Monads}
by Wadler (\S~\ref{sec:wadler}).
\item I develop intuition for programming with delimited continuations by implementing a number
of the design patterns that Wadler presents as monads. This solidifies the close relationship
between the two language features (\S~\ref{sec:patterns}).
\item I present the {\em continuation monad} and show that, by performing a monad-agnostic
transformation given by Wadler, we can recover the ``extended continuation passing style''
transformation presented by Danvy and Filinski (\S~\ref{sec:contmonad}). This is a small
extension of Wadler's original result.
\end{itemize}
I conclude with references to further reading that builds on these abstractions
(\S~\ref{sec:conclusion}).
\subsection*{Notation} \label{sec:notation}
My source papers differ slightly in notation and conventions. I have chosen a middle ground and
use one unified set of conventions to discuss both papers. Throughout this report, programs are
written in a lambda calculus with syntax defined by:
\begin{align*}
e \Coloneqq &\ x \mid \lambdaE{x}{e} \mid e_1~e_2 \mid \letIn{x}{e_1}{e_2} \\
| &\ \kw{true} \mid \kw{false} \mid \ifThenElse{e_1}{e_2}{e_3} \mid n \mid e_1 + e_2 \\
| &\ (e_1,\, e_2) \mid \kw{fst}~e \mid \kw{snd}~e \mid \cdots
\end{align*}
The core calculus is made up of variables, $x$, lambda abstractions, $\lambdaE{x}{e}$, and
function application $e_1~e_2$. Let-binding makes presentation clearer, so I assume we have that
too, although formally I assume that
\[ \letIn{x}{e_1}{e_2} \]
is implemented as
\[ (\lambdaE{x}{e_2})~e_1, \]
so I often ignore it in meta-theory. Danvy and Filinski use Booleans and natural numbers to
illustrate delimited continuations; Wadler uses pairs and lightweight data structure definitions
to illustrate monads---I also use these features as needed. Since we are almost exclusively
concerned with dynamic semantics, I do not formalize a type system.
Built-in language operators are written in \kw{blue sans-serif} and defined functions are written
in \textsf{black sans-serif}.
\section{{\em Abstracting Control}} \label{sec:danvy}
Before explaining delimited continuations, I should explain some background on continuations and
first-class continuation operators. Continuations first appeared as meta-theoretical
tools~\cite{strachey2000continuations}, but before long they were introduced into concrete
languages via the {\em continuation-passing style} or CPS translation. Here is a call-by-value
CPS translation, $\cps{\cdot}$, for our basic lambda calculus with Booleans and if-statements:
\begin{align*}
\cps{x} &= \lambdaE{\kappa}{\kappa~x} \\
\cps{\lambdaE{x}{e}} &= \lambdaE{\kappa}{\kappa~(\lambdaE{x}{\cps{e}})} \\
\cps{e_1~e_2} &= \lambdaE{\kappa}{\cps{e_1}~(\lambdaE{f}{\cps{e_2}~(\lambdaE{x}{f~x~\kappa})})} \\
\cps{\kw{true}} &= \lambdaE{\kappa}{\kappa~\kw{true}} \\
\cps{\ifThenElse{e_1}{e_2}{e_3}} &= \lambdaE{\kappa}{\cps{e_1}~(\lambdaE{b}{\ifThenElse{b}{\cps{e_2}~\kappa}{\cps{e_3}~\kappa}})}
\end{align*}
This translation extracts every evaluation context---essentially every individual sub-term of a
program---as an explicit continuation function. The resulting terms do not ``return'' in the
traditional sense: when these terms finish computing, they pass the resulting value to a
continuation, $\kappa$, which is provided as an argument. A translated term can be run by
applying it to the trivial continuation, and $\cps{e}~(\lambdaE{x}{x})$ should have the same
semantics as $e$. In other words, this translation is semantics-preserving---it only changes the
structure of the program.
The CPS translation has been used extensively in compilers and interpreters, where it gives
fine-grained control over evaluation. This is because the translation makes evaluation order
explicit. This particular CPS translation is clearly call-by-value, since in the rule for
application,
\[ \cps{e_1~e_2} = \lambdaE{\kappa}{\cps{e_1}~(\lambdaE{f}{\cps{e_2}~(\lambdaE{x}{f~x~\kappa})})} \]
the argument $e_2$ is evaluated before $f$ is applied.
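
To make the translation concrete, here is a small hand-written Haskell sketch of what
CPS-translated arithmetic looks like; the names (\texttt{constC}, \texttt{addC},
\texttt{runCPS}) are my own and do not appear in the paper:
\begin{lstlisting}[style=hs]
-- A CPS-translated value takes its continuation as an extra argument.
type CPS r a = (a -> r) -> r

-- Hand-written translations of a constant and of addition.
constC :: a -> CPS r a
constC x = \kappa -> kappa x

addC :: CPS r Int -> CPS r Int -> CPS r Int
addC m n = \kappa -> m (\x -> n (\y -> kappa (x + y)))

-- Running a translated term: apply it to the trivial continuation.
runCPS :: CPS a a -> a
runCPS m = m id

-- runCPS (addC (constC 1) (constC 2))  ==>  3
\end{lstlisting}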
As CPS translations became more popular, researchers began to consider first-class abstractions
for working with continuations. The most famous of these early abstractions, ``call with current
continuation,'' or \kw{call/cc}~\cite{haynes1984continuations}, can be added as an operator in
our source language and implemented as an extension to the CPS translation:
\[ \cps{\callcc{k}{e}} = (\lambdaE{\kappa}{\cps{e}~\kappa})[k \mapsto
\lambdaE{x}{\lambdaE{\kappa'}{\kappa~x}}], \]
where $e[x \mapsto v]$ means $e$ with $v$ substituted for free instances of $x$.
The \kw{call/cc} operator is extremely powerful. The expression $\callcc{k}{e}$ captures the current
program context as $k$ and then runs $e$, allowing the programmer to arbitrarily pause and resume
evaluation as they see fit. For example,
\[ \callcc{k}{1} \]
aborts the computation, returning the value $1$, and
\[ \callcc{k}{(\kw{print}~\stringE{foo};\ k~10)} \]
pauses the computation, prints $\stringE{foo}$, and then resumes computation with the value 10.
Unfortunately, many have argued that \kw{call/cc} is {\em too} powerful. Consider the popular
Yin-Yang Puzzle:
\begin{align*}
& \kw{let}\ \textsf{yin}\ =\ (\lambdaE{c}{(\kw{print}~\stringE{@};\ c)})~(\callcc{k}{k})~\kw{in} \\
& \kw{let}\ \textsf{yang}\ =\ (\lambdaE{c}{(\kw{print}~\stringE{*};\ c)})~(\callcc{k}{k})~\kw{in} \\
& \textsf{yin}~\textsf{yang}
\end{align*}
What does this confusing mess of continuations do? Apparently it counts. This program prints
increasing numbers in a unary representation,
\[ \stringE{@*@**@***@****@*****...}, \]
but it is extremely difficult to understand exactly why that is.
\citet{felleisen1988theory} reined in the power of \kw{call/cc} with operators that he
called \kw{prompt} and \kw{control}. These allowed programmers to delimit the
effects of \kw{call/cc} with {\em scopes}. Unfortunately, Felleisen's scopes were dynamic,
and his approach did not admit a straightforward translation into a standard lambda calculus.
This was the impetus for Danvy and Filinski's statically delimited continuations.
\subsection{Delimited Continuations and ECPS}
In {\em Abstracting Control}, Danvy and Filinski introduce the \kw{shift} and \kw{reset} operations
mentioned in Section \ref{sec:introduction}, which provide a more usable alternative to
\kw{call/cc}. We extend our lambda calculus:
\[
e \Coloneqq \cdots \mid \shift{k}{e} \mid \reset{e}
\]
Like \kw{call/cc}, these operators can be interpreted via a modified CPS translation. Danvy
and Filinski call the new translation {\em extended continuation-passing style} (ECPS):
\begin{align*}
\cps{\shift{k}{e}} &= (\lambdaE{\kappa}{\cps{e}~\textsf{id}})[k \mapsto \lambdaE{x}{\lambdaE{\kappa'}{\kappa'~(\kappa~x)}}] \\
\cps{\reset{e}} &= \lambdaE{\kappa}{\kappa~(\cps{e}~\textsf{id})}
\end{align*}
The equation for $\reset{e}$ translates $e$ and extracts its final value by applying it to
\textsf{id}. This means that any continuation manipulation done by $e$ does not leak outside of
the scope of the \kw{reset}, so we say that \kw{reset} ``delimits'' the continuation. The
translation for \kw{shift} gives $e$ access to its continuation, $\kappa$, via the variable $k$.
Together, \kw{shift} and \kw{reset} provide an elegant interface for working with continuations.
For example, take the following expression and its evaluation:
\begin{align*}
& 1 + \reset{(10 + \shift{k}{k~(k~100)})} \Rightarrow \\
& 1 + (10 + (10 + 100)) \Rightarrow \\
& 121
\end{align*}
The continuation, $k = \lambdaE{x}{10 + x}$, is captured by the \kw{shift} operator, since
``$10 + \shift{k}{\dots}$'' is within a context delimited by \kw{reset}. The outer ``$1\ +$'' is
not captured because it is outside of the \kw{reset}. The function $k$ is applied twice to $100$,
resulting in $120$, and then we add $1$ to get $121$.
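
The same computation can be reproduced with a hand-rolled continuation monad in Haskell. The
sketch below mirrors the ECPS equations but is my own code, not Danvy and Filinski's:
\begin{lstlisting}[style=hs]
-- A minimal continuation monad with shift and reset (hypothetical names).
newtype Cont r a = Cont { runCont :: (a -> r) -> r }

instance Functor (Cont r) where
  fmap f (Cont m) = Cont (\k -> m (k . f))

instance Applicative (Cont r) where
  pure x = Cont (\k -> k x)
  Cont mf <*> Cont mx = Cont (\k -> mf (\f -> mx (k . f)))

instance Monad (Cont r) where
  Cont m >>= f = Cont (\k -> m (\x -> runCont (f x) k))

-- reset delimits a context; shift captures that context as a function.
reset :: Cont a a -> Cont r a
reset (Cont m) = Cont (\k -> k (m id))

shift :: ((a -> r) -> Cont r r) -> Cont r a
shift f = Cont (\k -> runCont (f k) id)

-- 1 + reset (10 + shift k in k (k 100))
example :: Int
example = 1 + runCont (reset (fmap (10 +) (shift (\k -> pure (k (k 100)))))) id
-- example ==> 121
\end{lstlisting}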
\subsection{A Denotational Semantics}
Danvy and Filinski point out that the ECPS translation, with rules for \kw{shift} and \kw{reset},
is no longer truly in continuation-passing style. In particular, the rule for \kw{shift} means
that the resulting program might not enforce strict call-by-value semantics, since the
translation of \kw{shift} introduces a redex $\kappa'~(\kappa~x)$. This could be solved by
translating the result a second time, but we can use this as an opportunity to take a step back
and look at another way of defining the semantics of our calculus with \kw{shift} and \kw{reset}.
In addition to the ECPS translation, Danvy and Filinski give a denotational translation. Whereas
the ECPS translation is purely syntactic, this transformation is {\em semantic} and maps every
program to a mathematical object in a given {\em domain}. In this case the domain contains
Booleans (since we have \kw{true} and \kw{false} as
base values) and computable functions. This is common practice in programming languages
literature and hearkens back to the origin of continuations~\cite{strachey2000continuations}.
Assuming $\textsf{Ans}$ is a suitable domain of final answers, we define:
\begin{center}
\begin{tabular}{llr}
$\rho \in \textsf{Env}$ & $=$ & $\textsf{Var} \rightharpoonup \textsf{Val}$ \\
$\gamma \in \textsf{MCont}$ & $=$ & $\textsf{Val} \to \textsf{Ans}$ \\
$\kappa \in \textsf{Cont}$ & $=$ & $\textsf{Val} \to \textsf{MCont} \to \textsf{Ans}$ \\
$\mathcal{E}$ & $:$ & $\textsf{Exp} \to \textsf{Env} \to \textsf{Cont} \to \textsf{MCont} \to \textsf{Ans}$
\end{tabular}
\end{center}
The variable environment, $\rho$, maps variables to domain values, $\gamma$ is a
``meta''-continuation that captures the continuation {\em outside} of the closest enclosing
\kw{reset}, and $\kappa$ is the continuation that we are used to (inside of the closest
\kw{reset}). The denotational semantics is given by the equations:
\begin{align*}
\denote{x}~\rho~\kappa~\gamma &= \kappa~(\rho[x])~\gamma \\
\denote{\lambdaE{x}{e}}~\rho~\kappa~\gamma &= \kappa~(\lambdaE{v}{\denote{e}~(\rho[x \mapsto v])})~\gamma \\
\denote{e_1~e_2}~\rho~\kappa~\gamma &=
\denote{e_1}~\rho~(\lambdaE{f}{\denote{e_2}~\rho~(\lambdaE{x}{f~x~\kappa})})~\gamma \\
\denote{\kw{true}}~\rho~\kappa~\gamma &= \kappa~\kw{true}~\gamma \\
\denote{\ifThenElse{e_1}{e_2}{e_3}}~\rho~\kappa~\gamma &=
\denote{e_1}~\rho~(\lambdaE{b}{\ifThenElse{b}{\denote{e_2}~\rho~\kappa}{\denote{e_3}~\rho~\kappa}})~\gamma \\
\denote{\shift{k}{e}}~\rho~\kappa~\gamma &=
\denote{e}~(\rho[k \mapsto \lambdaE{x}{\lambdaE{\kappa'}{\lambdaE{\gamma'}{\kappa~x~(\lambdaE{w}{\kappa'~w~\gamma'})}}}])~(\lambdaE{x}{\lambdaE{\gamma''}{\gamma''~x}})~\gamma \\
\denote{\reset{e}}~\rho~\kappa~\gamma &= \denote{e}~\rho~(\lambdaE{x}{\lambdaE{\gamma'}{\gamma'~x}})~(\lambdaE{x}{\kappa~x~\gamma})
\end{align*}
It is safe to ignore the $\gamma$ arguments for all rules other than the ones for \kw{shift} and
\kw{reset}, since $\gamma$ can be $\eta$-reduced away ($\lambdaE{x}{f~x}$ is equivalent to
$f$). Without $\gamma$, the first five equations look a lot like the standard CPS translation,
just lifted to a semantic domain. The meta-continuations are used in the \kw{shift} and
\kw{reset} rules to capture the outer continuation as a computable function. Remember that the
constructions on the right-hand side are part of a meta-language and represent mathematical
objects rather than concrete syntax. Our continuations are embedded in the meta-theory.
All of this is important for theoretical work on this language, but the syntactic ECPS
transformation is more useful for concrete implementations. Next, we return to the ECPS
translation and explore ways that it can be made more efficient.
\subsection{Metacircular Interpreters}
A {\em metacircular interpreter} is a powerful tool for expressing a language's
semantics~\cite{reynolds1972definitional}. In the case of our \kw{shift}--\kw{reset} language,
the metacircular interpreter looks a lot like the ECPS translation, but it does a bit more work
by translating constructs into an executable meta-language rather than generating syntax directly.
The meta-language for this interpreter is the same lambda calculus we have been working with,
but without \kw{shift} or \kw{reset}. We use a simple {\em quoting} mechanism to specify which
parts of the output are executable code and which are concrete syntax. The code in black is the
executable interpreter, while the $\quoteE{\text{quoted}}$ code is concrete syntax output. We use
$\unquoteE{\text{unquote brackets}}$ to splice computations into quoted segments. The translation
is defined as:
\begin{align*}
\cpsm{x} &= \lambdaE{\kappa}{\kappa~x} \\
\cpsm{\lambdaE{x}{e}} &=
\lambdaE{\kappa}{\kappa~\quoteE{\lambdaE{\unquoteE{x}}{\lambdaE{k}{\unquoteE{\cpsm{e}~(\lambdaE{a}{\quoteE{k~\unquoteE{a}}})}}}}} \\
\cpsm{e_1~e_2} &= \lambdaE{\kappa}{\cpsm{e_1}~(\lambdaE{f}{\cpsm{e_2}~(\lambdaE{x}{\quoteE{\unquoteE{f}~\unquoteE{x}~(\lambdaE{t}{\unquoteE{\kappa~\quoteE{t}}})}})})} \\
\cpsm{\kw{true}} &= \lambdaE{\kappa}{\kappa~\kw{true}} \\
\cpsm{\ifThenElse{e_1}{e_2}{e_3}} &= \lambdaE{\kappa}{\cpsm{e_1}~(\quoteE{\lambdaE{b}{\ifThenElse{b}{\unquoteE{\cpsm{e_2}~\kappa}}{\unquoteE{\cpsm{e_3}~\kappa}}}})} \\
\cpsm{\shift{k}{e}} &= (\lambdaE{\kappa}{\cpsm{e}~\textsf{id}})[k \mapsto \quoteE{\lambdaE{x}{\lambdaE{\kappa'}{\kappa'~\unquoteE{\kappa~\quoteE{x}}}}}] \\
\cpsm{\reset{e}} &= \lambdaE{\kappa}{\kappa~(\cpsm{e}~\textsf{id})}
\end{align*}
and the final ECPS result is given by applying $\cpsm{e}$ to the identity function.
As in the denotational semantics, the continuations are all meta-functions; there are no
$\kappa$s in the resulting programs. The difference is that the meta-functions are executed to
produce concrete syntax, rather than interpreted as abstract functions. For example, the rule for
if-statements will produce a concrete term of the form $\lambdaE{b}{\ifThenElse{\_}{\_}{\_}}$,
rather than a mathematical object. Representing continuations at the meta-level drastically
reduces the size of the resulting programs---there are no unnecessary redexes in the output.
It turns out that we can do even better if we include \kw{shift} and \kw{reset} in our
meta-language. This final ECPS translation,
\begin{align*}
\cpsmc{x} &= x \\
\cpsmc{\lambdaE{x}{e}} &= \quoteE{\lambdaE{\unquoteE{x}}{\lambdaE{\kappa}{\unquoteE{\reset{(\quoteE{\kappa~\unquoteE{\cpsmc{e}}})}}}}} \\
\cpsmc{e_1~e_2} &= \shift{\kappa}{\quoteE{\unquoteE{\cpsmc{e_1}}~\unquoteE{\cpsmc{e_2}}~(\lambdaE{t}{\unquoteE{\kappa~\quoteE{t}}})}} \\
\cpsmc{\kw{true}} &= \kw{true} \\
\cpsmc{\ifThenElse{e_1}{e_2}{e_3}} &= \kw{shift}~\kappa~\kw{in} \\
&\ \ \quoteE{\ifThenElse{\unquoteE{\cpsmc{e_1}}}{\unquoteE{\reset{(\kappa~\cpsmc{e_2})}}}{\unquoteE{\reset{(\kappa~\cpsmc{e_3})}}}} \\
\cpsmc{\shift{k}{e}} &= \shift{\kappa}{(\reset{\cpsmc{e}})[k \mapsto \quoteE{\lambdaE{x}{\lambdaE{\kappa'}{\kappa'~\unquoteE{\kappa~\quoteE{x}}}}}]} \\
\cpsmc{\reset{e}} &= \reset{\cpsmc{e}}
\end{align*}
also avoids unnecessary $\eta$-expansion (i.e., it always produces $f$ instead of
$\lambdaE{x}{f~x}$), resulting in a pleasingly compact final representation.
This final interpreter is essentially written in the same language that it interprets---this is
why metacircular interpreters are called ``meta''---and that poses a bootstrapping problem.
Luckily, we already have both the ECPS translation and our denotational semantics as descriptions
of this meta-language. The efficient interpreter can be written in this meta-language and then
that interpreter can itself be ECPS transformed into a standard lambda calculus.
\subsection{Use Case: Nondeterministic Programming} \label{sec:danvy:nondet}
We can understand this machinery better by examining a use-case for delimited continuations that
Danvy and Filinski mention in their paper: nondeterministic computations.
We start by defining operators \textsf{fail} and \textsf{flip}, which form the basis of
nondeterministic computation:
\begin{align*}
\textsf{fail}~() &= \shift{k}{\stringE{failure}} \\
\textsf{flip}~() &= \shift{k}{(k~\kw{true};\ k~\kw{false};\ \textsf{fail}~())}
\end{align*}
We can think of \textsf{flip} as flipping a coin and \textsf{fail} as signaling a dead-end
outcome.
Concretely the \textsf{fail} operation acts as an exception. It throws away the continuation and
returns the string $\stringE{failure}$. The \textsf{flip} operation calls its continuation on
{\em both} \kw{true} and \kw{false}, which simulates a nondeterministic choice between the two
options. This works because the result of the nondeterministic computation is actually a sequence
of values---all of the possible outcomes. When we call \textsf{flip}, the possible options are
\kw{true} and \kw{false}, so we continue with both. If later in the computation one of those
values is undesirable, we can call \textsf{fail}.
The \textsf{choice} operation builds on \textsf{flip} to simulate a nondeterministic choice of
positive integers up to (and including) a given value.
\[
\textsf{choice}~n = \ifThenElse{n < 1}{\textsf{fail~()}}{\ifThenElse{\textsf{flip}~()}{\textsf{choice}~(n - 1)}{n}}
\]
Calling $\textsf{choice}~2$ will continue with $1$, then $2$, and finally produce
$\stringE{failure}$ once both options are exhausted.
Finally, Danvy and Filinski implement the function
\begin{align*}
\textsf{triple}~n~s =& \\
& \kw{let}~i = \textsf{choice}~n~\kw{in} \\
& \kw{let}~j = \textsf{choice}~(i - 1)~\kw{in} \\
& \kw{let}~k = \textsf{choice}~(j - 1)~\kw{in} \\
& \ifThenElse{i + j + k = s}{(i,\, j,\, k)}{\textsf{fail}~()},
\end{align*}
which finds all triples of distinct positive integers that sum to a given integer $s$. The
complexity is pushed into \textsf{choice}, so this function is nice and readable. Evaluating the
expression
\[ \reset{(\kw{print}~(\textsf{triple}~9~15))} \]
prints all triples of integers up to $9$ that sum to $15$.
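
For readers who want to run this example, here is a sketch in Haskell, assuming the
\texttt{transformers} package for \texttt{ContT}, \texttt{shiftT}, and \texttt{evalContT}. The
names are mine, and I simplify slightly: \textsf{fail} returns unit rather than the string
$\stringE{failure}$, and \textsf{flip} omits the trailing \textsf{fail}:
\begin{lstlisting}[style=hs]
import Control.Monad.Trans.Cont (ContT, evalContT, shiftT)
import Control.Monad.Trans.Class (lift)

-- flip continues with both answers; fail discards its continuation.
flipC :: ContT () IO Bool
flipC = shiftT (\k -> lift (k True >> k False))

failC :: ContT () IO a
failC = shiftT (\_ -> return ())

choice :: Int -> ContT () IO Int
choice n
  | n < 1     = failC
  | otherwise = flipC >>= \b -> if b then choice (n - 1) else return n

triple :: Int -> Int -> ContT () IO (Int, Int, Int)
triple n s =
  choice n       >>= \i ->
  choice (i - 1) >>= \j ->
  choice (j - 1) >>= \k ->
  if i + j + k == s then return (i, j, k) else failC

-- Corresponds to reset (print (triple 9 15)); prints every solution.
main :: IO ()
main = evalContT (triple 9 15 >>= lift . print)
\end{lstlisting}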
The \textsf{triple} example is mostly a toy based on an early benchmark, but these primitives can
be practically useful. In their paper, Danvy and Filinski show an evaluator for {\em
nondeterministic finite automata} (NFAs), which is straightforward to implement thanks to
operators like \textsf{flip} and \textsf{fail}. The implementation is not especially instructive,
so I do not repeat it here.
We return to continuations later on, but for now we can take a break from continuations and look
at monads.
\section{{\em Comprehending Monads}} \label{sec:wadler}
As mentioned in Section \ref{sec:introduction}, monads are as popular as they are poorly
understood. They were originally introduced to the programming languages literature by
\citet{moggi1991notions}, who was working in {\em category theory}. Luckily, monads can be
understood without learning a new branch of mathematics, and {\em Comprehending Monads} by
\citet{wadler1990comprehending} is a great place to start.
A monad is essentially just a data structure (e.g., \textsf{List} or \textsf{Maybe}) with certain
associated operations. First, a monad must have an operation \textsf{map} that applies a function
to the ``elements'' of the data structure. For example,
\[ \map{List}{f}{\overline{x}} \]
applies $f$ to all elements of $\overline{x}$ and
\[ \map{Maybe}{f}{\overline{x}} \]
applies $f$ to the value contained in $\overline{x}$, if one exists. (I adopt Wadler's convention
of writing monadic values with a bar over the variable name.) The \textsf{map} operation must
obey two laws:
\begin{align*}
\textsf{map}~\textsf{id} &= \textsf{id} \\
\textsf{map}~(g \circ f) &= \textsf{map}~g \circ \textsf{map}~f
\end{align*}
The first law says that mapping the identity function over a structure does nothing---this is
necessary to make sure that \textsf{map} does not change the structure itself, only the elements
in the structure. The second law says that \textsf{map} is well-behaved with respect to function
composition.
A data structure with a valid \textsf{map} operation is called a {\em functor}. In order for a
functor to be a monad, it needs two more operations: \textsf{unit} and \textsf{join}. Applying
\textsf{unit} to a value injects that value into the monad structure. In the case of
\textsf{List},
\[ \unit{List}{x} = \textsf{singleton}~x = [x], \]
which is the simplest way to inject a value into a list. Likewise,
\[ \unit{Maybe}{x} = \textsf{Just}~x. \]
The \textsf{join} operation works on ``doubled up'' instances of the monad. It takes a doubled
value like
\begin{center}
$[[1, 2], [3, 4]]$\hspace{5mm}or\hspace{5mm}$\textsf{Just}~(\textsf{Just}~5)$
\end{center}
and flattens it down to a single application of the monad like
\begin{center}
$[1, 2, 3, 4]$\hspace{5mm}or\hspace{5mm}$\textsf{Just}~5$.
\end{center}
To be precise:
\begin{center}
\begin{tabular}{lll}
$\join{List}{\doverline{x}}$ & $=$ & $\textsf{flatten}~\doverline{x}$ \\
$\join{Maybe}{\doverline{x}}$ & $=$ & $\caseOf{\doverline{x}}{\textsf{Just}~(\textsf{Just}~x) \to \textsf{Just}~x;\ \_ \to \textsf{Nothing}}$
\end{tabular}
\end{center}
These operations must obey certain laws in order for the structure to truly be a monad:
\begin{align}
\textsf{map}~f \circ \textsf{unit} &= \textsf{unit} \circ f \\
\textsf{map}~f \circ \textsf{join} &= \textsf{join} \circ \textsf{map}~(\textsf{map}~f) \\
\textsf{join} \circ \textsf{unit} &= \textsf{id} \\
\textsf{join} \circ \textsf{map}~\textsf{unit} &= \textsf{id} \\
\textsf{join} \circ \textsf{join} &= \textsf{join} \circ \textsf{map}~\textsf{join}
\end{align}
Laws (1) and (2) enforce that \textsf{unit} and \textsf{join} are well-behaved with respect to
\textsf{map}. In particular, (1) is a strong statement that says that \textsf{unit} cannot {\em
do} very much---it must commute with an arbitrary function $f$ (modulo the monad structure). Law
(3) says that applying \textsf{unit} to a monadic value and then \textsf{join}-ing is a no-op.
Law (4) says the same as (3) except we apply \textsf{unit} {\em under} the monad using
\textsf{map}. Finally, law (5) says that for a triple-stacked monad (e.g., $[[[1]]]$), either way
of flattening it to a single monad (e.g., $[1]$) is the same.
Importantly, the monad laws require that \textsf{unit} and \textsf{join} are the {\em simplest}
ways to do their respective tasks. For example, suppose we tried to define:
\[ \unit{List}{x} = [x,\, x,\, x,\, x,\, x] \]
Technically this does inject a value into a list, but it is not the simplest way to do so and
\[
(\textsf{join}^{\textsf{List}} \circ \textsf{unit}^{\textsf{List}})~[3] = [3,\, 3,\, 3,\, 3,\, 3] \neq \textsf{id}~[3].
\]
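
The failure of law (3) for this \textsf{unit} is easy to check in Haskell with the built-in list
monad (the names below are my own):
\begin{lstlisting}[style=hs]
import Control.Monad (join)

-- Law (3) requires join . unit = id; the "bad" unit breaks it.
goodUnit, badUnit :: a -> [a]
goodUnit x = [x]
badUnit  x = [x, x, x, x, x]

checkGood, checkBad :: [Int]
checkGood = join (goodUnit [3])   -- ==> [3]
checkBad  = join (badUnit  [3])   -- ==> [3,3,3,3,3], so law (3) fails
\end{lstlisting}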
All of this background on monads and their operations is important, but in practice using these
operations directly does not make for particularly elegant code. In the next section, I explore
Wadler's use of {\em comprehensions} to make programming with monads much more intuitive.
\subsection{Comprehensions}
{\em Comprehending Monads} does more than just explain monads. As the name cleverly suggests, the
paper develops a notation for monadic computations derived from list comprehensions.
List comprehensions are based on set-builder notation (e.g., $\{n + 1 \mid n \in \mathbb{N}\}$)
and were first introduced to programming by the SETL language in the
1960s~\cite{schwartz2012programming}. Since then, list comprehensions have become ubiquitous,
appearing in many modern languages including Racket, Python, and even C++ (with some abuse of
operator overloading). Comprehensions are a convenient way of building lists from other lists,
for example a construction like
\[ [(x,\, y) \mid x \leftarrow \overline{x};\ y \leftarrow \overline{y}]^{\textsf{List}} \]
computes the Cartesian product of $\overline{x}$ and $\overline{y}$. The analogous set-builder
construction looks like
\[ \{(x,\, y) \mid x \in X \wedge y \in Y\}. \]
List comprehensions have the form $[e \mid q]$, where $q$ is called a {\em qualifier}. Wadler
defines the syntax of qualifiers as:
\[ q \Coloneqq \Lambda \mid x \leftarrow e \mid (q_1;\ q_2) \]
Qualifiers can be empty (denoted $\Lambda$), a binder for a variable $x$, or a composition of two
qualifiers. This notation can be a bit heavy, so Wadler usually writes ``$[e]$'' instead of ``$[e
\mid \Lambda]$,'' and he proves that qualifier composition is associative so he can write ``$[e
\mid q_1;\ q_2]$'' instead of $[e \mid (q_1;\ q_2)]$.
{\em Comprehending Monads} centers around the observation that lists are not the only structure
for which comprehensions make sense. We can characterize the behavior of list comprehensions with
the following rules:
\begin{align*}
[e \mid \Lambda] &= \textsf{singleton}~e \\
[e_1 \mid x \leftarrow e_2] &= \textsf{map}~(\lambdaE{x}{e_1})~e_2 \\
[e \mid (q_1;\ q_2)] &= \textsf{flatten}~[[e \mid q_2] \mid q_1]
\end{align*}
Notice that \textsf{singleton} is $\textsf{unit}^{\textsf{List}}$ and \textsf{flatten} is
$\textsf{join}^{\textsf{List}}$. Would using \textsf{unit} and \textsf{join} for other monads
give a reasonable way to re-interpret comprehensions?
Yes! Wadler defines a lambda calculus with generic {\em monad comprehension} syntax:
\begin{align*}
e &\Coloneqq \cdots \mid [e \mid q]^{\textsf{M}} \\
q &\Coloneqq \Lambda \mid x \leftarrow e \mid (q_1;\ q_2)
\end{align*}
For any monad \textsf{M} the monad comprehension behaves according to the following rules:
\begin{align*}
[e \mid \Lambda]^{\textsf{M}} &= \unit{M}{e} \\
[e_1 \mid x \leftarrow e_2]^{\textsf{M}} &= \map{M}{(\lambdaE{x}{e_1})}{e_2} \\
[e \mid (q_1;\ q_2)]^{\textsf{M}} &= \join{M}{[[e \mid q_2]^{\textsf{M}} \mid q_1]^{\textsf{M}}}
\end{align*}
The empty qualifier simply injects $e$ into the monad using \textsf{unit}. The binding qualifier
defines a function: $e_2$ produces some monadic value, and the result of the comprehension is
$\lambdaE{x}{e_1}$ mapped over that value. Finally, qualifier composition translates to a nested
comprehension which is flattened using \textsf{join}. Note that while Wadler opts to extend the
language with comprehensions, they could instead be implemented as syntactic sugar---in other
words, they can be implemented as surface syntax and omitted from the formal language.
Comprehensions can be used for computations like the \textsf{Maybe} comprehension from Section
\ref{sec:introduction},
\[
[ i / j \mid i \leftarrow [84]^{\textsf{Maybe}} ;\ j \leftarrow \ifThenElse{n = 0}{\textsf{Nothing}}{[n]^{\textsf{Maybe}}} ]^{\textsf{Maybe}}
\]
which cleanly handles the fact that the expression binding $j$ might fail, without introducing
significant syntactic overhead.
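
To make the desugaring concrete, here is that comprehension expanded by hand in Haskell, with
\texttt{return}, \texttt{fmap}, and \texttt{join} playing the roles of \textsf{unit},
\textsf{map}, and \textsf{join} (the function name is my own):
\begin{lstlisting}[style=hs]
import Control.Monad (join)

-- [ i / j | i <- [84]^Maybe ; j <- if n = 0 then Nothing else [n]^Maybe ]^Maybe,
-- desugared by hand following the three comprehension rules.
maybeComprehension :: Int -> Maybe Int
maybeComprehension n =
  join (fmap (\i -> fmap (\j -> i `div` j)
                         (if n == 0 then Nothing else return n))
             (return 84))

-- maybeComprehension 2 ==> Just 42
-- maybeComprehension 0 ==> Nothing
\end{lstlisting}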
\subsection{Examples of Monads} \label{sec:monad-examples}
\textsf{List} and \textsf{Maybe} are far from the only monads, and while the formal definitions
are important, the best way to understand monads is by example. Wadler presents a whole zoo of
monads in his paper; exploring those structures provides valuable intuition for how monads work
and what they can do.
For each of these monads, I define the monad data structure (with a fairly standard syntax,
similar to Haskell) and the three operations, \textsf{map}, \textsf{unit}, and \textsf{join}. I
assume that we have case-statements in our language for destructuring these data structures.
\subsubsection{List and Maybe} \label{sec:monad-examples:maybe} \hfill {\bf The Running Examples}
\begin{center}
\framebox[\textwidth]{
\begin{tabular}{lll}
$\textsf{data List}$~$\alpha$ & $=$ & $\textsf{Nil} \mid \textsf{Cons}~\alpha~(\textsf{List}~\alpha)$ \\
$\map{List}{f}{\overline{x}}$ & $=$ & $\caseOf{\overline{x}}{\textsf{Nil} \to \textsf{Nil};\ \textsf{Cons}~y~\overline{y} \to \textsf{Cons}~(f~y)~(\map{List}{f}{\overline{y}})}$ \\
$\unit{List}{x}$ & $=$ & $\textsf{singleton}~x$ \\
$\join{List}{\doverline{x}}$ & $=$ & $\textsf{flatten}~\doverline{x}$ \\
\\
$\textsf{data Maybe}$~$\alpha$ & $=$ & $\textsf{Nothing} \mid \textsf{Just}~\alpha$ \\
$\map{Maybe}{f}{\overline{x}}$ & $=$ & $\caseOf{\overline{x}}{\textsf{Nothing} \to \textsf{Nothing};\ \textsf{Just}~y \to \textsf{Just}~(f~y)}$ \\
$\unit{Maybe}{x}$ & $=$ & $\textsf{Just}~x$ \\
$\join{Maybe}{\doverline{x}}$ & $=$ & $\caseOf{\doverline{x}}{\textsf{Just}~(\textsf{Just}~x) \to \textsf{Just}~x;\ \_ \to \textsf{Nothing}}$
\end{tabular}
}
\end{center}
\medskip
We have already seen the definitions for \textsf{List} and \textsf{Maybe}: the \textsf{List}
monad coincides exactly with list comprehensions, and the \textsf{Maybe} monad can be used to
sequence computations that might fail (returning \textsf{Nothing}).
\subsubsection{Identity and Strictness} \label{sec:monad-examples:identity} \hfill {\bf The Degenerate Cases}
\begin{center}
\framebox[\textwidth]{
\begin{tabular}{lll}
$\textsf{data Id}$~$\alpha$ & $=$ & $\alpha$ \\
$\map{Id}{f}{\overline{x}}$ & $=$ & $f~\overline{x}$ \\
$\unit{Id}{x}$ & $=$ & $x$ \\
$\join{Id}{\doverline{x}}$ & $=$ & $\doverline{x}$ \\
\\
$\textsf{data Str}$~$\alpha$ & $=$ & $\alpha$ \\
$\map{Str}{f}{\overline{x}}$ & $=$ & $\ifThenElse{\overline{x} \neq \bot}{f~\overline{x}}{\bot}$ \\
$\unit{Str}{x}$ & $=$ & $x$ \\
$\join{Str}{\doverline{x}}$ & $=$ & $\doverline{x}$
\end{tabular}
}
\end{center}
\medskip
\textsf{List} and \textsf{Maybe} are relatively simple monads, but there are a couple that are even
simpler. The most basic monad is just an identity---it does nothing. This monad is almost not
worth mentioning, but it does have one cute use-case: comprehension syntax for the identity monad
can be used instead of \kw{let} binding. We can write
\[ [e_3 \mid x \leftarrow e_1;\ y \leftarrow e_2]^{\textsf{Id}} \]
instead of
\[ \letIn{x}{e_1}{\letIn{y}{e_2}{e_3}}. \]
There is no real reason to prefer this syntax, so I will continue to use \kw{let}, but it is a
fun observation.
A slight variation on the identity monad, the {\em strictness} monad, does actually have some
interesting uses. Assuming a call-by-name base language, the strictness monad provides
control over evaluation order by altering the definition of $\textsf{map}$ to force the
evaluation of the argument $\overline{x}$. The expression
\[ [e_3 \mid x \leftarrow e_1;\ y \leftarrow e_2]^{\textsf{Str}} \]
works like
\[ \letIn{!x}{e_1}{\letIn{!y}{e_2}{e_3}} \]
where $\letIn{!x}{e_1}{e_2}$ forces the evaluation of $e_1$ before assigning to $x$. \textsf{Str}
comprehensions give access to call-by-value semantics as needed, without changing the underlying
language.
\subsubsection{State} \label{sec:monad-examples:state} \hfill {\bf Pure ``Imperative'' Programming}
\begin{center}
\framebox[\textwidth]{
\begin{tabular}{lll}
$\textsf{data State}$~$\alpha$ & $=$ & $\sigma \to (\alpha,\, \sigma)$ \\
$\map{State}{f}{\overline{x}}$ & $=$ & $\lambdaE{s}{\letIn{(x,\, s')}{\overline{x}~s}{(f~x,\, s')}}$ \\
$\unit{State}{x}$ & $=$ & $\lambdaE{s}{(x,\, s)}$ \\
$\join{State}{\doverline{x}}$ & $=$ & $\lambdaE{s}{\letIn{(\overline{x},\, s')}{\doverline{x}~s}{\overline{x}~s'}}$
\end{tabular}
}
\end{center}
\medskip
The \textsf{State} monad enables code that {\em looks} stateful, even though there are no actual
references under the hood. Assuming a fixed type $\sigma$ of states, the type $\sigma \to
(\alpha,\, \sigma)$ says that each stateful computation takes a state as input and produces a
{\em result} of type $\alpha$ along with a new state.
Since \textsf{State} is such an important and interesting monad, we should look at the operations
in a bit of detail. As usual, \textsf{unit} is the simplest; it simply injects a value into a
stateful computation. The state stays the same, and the result is the argument passed to
\textsf{unit}. The \textsf{map} operation takes a computation, runs it, and then applies a
function to the result value. Finally, \textsf{join} runs a computation which results in {\em
another} computation, and then it runs that computation to get a final result. These operations
together mean that state comprehensions act like imperative code, with state handled implicitly.
As with many monads, interesting programs in the \textsf{State} monad make use of some auxiliary
operations:
\begin{center}
\begin{tabular}{lll}
$\textsf{get}$ & $=$ & $\lambdaE{s}{(s,\, s)}$ \\
$\textsf{put}~s'$ & $=$ & $\lambdaE{s}{((),\, s')}$
\end{tabular}
\end{center}
The \textsf{get} operation accesses the state, and the \textsf{put} operation sets the state.
These can be used to write a program like
\[
p = [k \mid i \leftarrow \textsf{get};\ \_ \leftarrow \textsf{put}~(i + 3);\ j \leftarrow
\textsf{get};\ \_ \leftarrow \textsf{put}~(j * 7);\ k \leftarrow \textsf{get}]^{\textsf{State}},
\]
which simulates a simple stateful computation. We can call $p~3$ (choosing $3$ as the initial
state) to get the pair $(42,\, 42)$. If we expand the comprehension syntax and monad operation
definitions,
\begin{align*}
\lambda s_0.\ & \kw{let}~(i,\, s_1) = \textsf{get}~s_0~\kw{in} \\
& \kw{let}~((),\, s_2) = \textsf{put}~(i + 3)~s_1~\kw{in} \\
& \kw{let}~(j,\, s_3) = \textsf{get}~s_2~\kw{in} \\
& \kw{let}~((),\, s_4) = \textsf{put}~(j * 7)~s_3~\kw{in} \\
& \kw{let}~(k,\, s_5) = \textsf{get}~s_4~\kw{in} \\
& (k,\, s_5),
\end{align*}
we can see that the state parameters $s_0$, $s_1$, etc. are being passed from one expression to
the next, tracking the state as it evolves.
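
For reference, here is the same monad and the program $p$ as a hand-rolled Haskell sketch
(Haskell's standard libraries ship an equivalent \texttt{Control.Monad.State}); do-notation
stands in for the comprehension syntax:
\begin{lstlisting}[style=hs]
-- The State monad, exactly as defined in the box above.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State m) = State (\s -> let (x, s') = m s in (f x, s'))

instance Applicative (State s) where
  pure x = State (\s -> (x, s))
  State mf <*> State mx =
    State (\s -> let (f, s') = mf s in let (x, s'') = mx s' in (f x, s''))

instance Monad (State s) where
  State m >>= f = State (\s -> let (x, s') = m s in runState (f x) s')

get :: State s s
get = State (\s -> (s, s))

put :: s -> State s ()
put s' = State (\_ -> ((), s'))

-- The program p from above, written with do-notation.
p :: State Int Int
p = do { i <- get; put (i + 3); j <- get; put (j * 7); get }

-- runState p 3 ==> (42, 42)
\end{lstlisting}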
\subsubsection{Reader} \label{sec:monad-examples:reader} \hfill {\bf AKA ``State Reader''}
\begin{center}
\framebox[\textwidth]{
\begin{tabular}{lll}
$\textsf{data Reader}$~$\alpha$ & $=$ & $\rho \to \alpha$ \\
$\map{Reader}{f}{\overline{x}}$ & $=$ & $\lambdaE{r}{f~(\overline{x}~r)}$ \\
$\unit{Reader}{x}$ & $=$ & $\lambdaE{r}{x}$ \\
$\join{Reader}{\doverline{x}}$ & $=$ & $\lambdaE{r}{(\doverline{x}~r)~r}$
\end{tabular}
}
\end{center}
\medskip
The \textsf{Reader} monad is a simplification of the \textsf{State} monad that does not allow the
state to be modified. Despite its relative simplicity, the \textsf{Reader} monad is quite
powerful. After defining the auxiliary operation,
\begin{center}
\begin{tabular}{lll}
$\textsf{ask}$ & $=$ & $\lambdaE{r}{r}$,
\end{tabular}
\end{center}
\textsf{Reader} can be used to keep track of configuration values, environment information, and
any other static value that would otherwise be cumbersome to pass around explicitly. Readers are
also safer and more flexible than global variables or constants, since they are implemented as
functions and thus have a clearly delimited scope.
\subsubsection{Nondeterminism} \label{sec:monad-examples:nondet} \hfill {\bf The (Power) Set Monad}
\begin{center}
\framebox[\textwidth]{
\begin{tabular}{lll}
$\textsf{data ND}$~$\alpha$ & $=$ & $\mathcal{P}(\alpha)$ \\
$\map{ND}{f}{\overline{x}}$ & $=$ & $\{f~x \mid x \in \overline{x}\}$ \\
$\unit{ND}{x}$ & $=$ & $\{x\}$ \\
$\join{ND}{\doverline{x}}$ & $=$ & $\bigcup \doverline{x}$
\end{tabular}
}
\end{center}
\medskip
The nondeterminism monad enables direct-style programming of nondeterministic algorithms, much
like the \textsf{triple} example from Section \ref{sec:danvy:nondet}. A nondeterministic value is
really a set of values, corresponding to each of the possible outcomes of a nondeterministic
computation. We define the same operators that we saw before:
\begin{center}
\begin{tabular}{lll}
$\textsf{fail}$ & $=$ & $\varnothing$ \\
$\textsf{flip}$ & $=$ & $\{\kw{true},\, \kw{false}\}$,
\end{tabular}
\end{center}
We can \textsf{fail} by returning the empty set---there are no possible outcomes---and we can
also \textsf{flip} between \kw{true} and \kw{false} as expected.
The monad operations keep track of all possible outcomes at a given point in the program, but
programs can be written as if there is only one path.
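
Here is a sketch of these operators and the \textsf{triple} function from Section
\ref{sec:danvy:nondet} in Haskell, with lists standing in for sets so the outcomes are easy to
inspect; the definitions are mine, not Wadler's:
\begin{lstlisting}[style=hs]
-- Nondeterminism via the list monad: a value is the list of possible outcomes.
failND :: [a]
failND = []

flipND :: [Bool]
flipND = [True, False]

choice :: Int -> [Int]
choice n
  | n < 1     = failND
  | otherwise = flipND >>= \b -> if b then choice (n - 1) else return n

triple :: Int -> Int -> [(Int, Int, Int)]
triple n s =
  choice n >>= \i ->
  choice (i - 1) >>= \j ->
  choice (j - 1) >>= \k ->
  if i + j + k == s then return (i, j, k) else failND

-- choice 9    ==> [1,2,3,4,5,6,7,8,9]
-- triple 9 15 ==> [(6,5,4),(7,5,3),(7,6,2),(8,4,3),(8,5,2),(8,6,1),(9,4,2),(9,5,1)]
\end{lstlisting}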
\subsubsection{Parser} \hfill {\bf Top-Down Parser Combinators}
\begin{center}
\framebox[\textwidth]{
\begin{tabular}{lll}
$\textsf{data Parser}$~$\alpha$ & $=$ & $\textsf{String} \to \textsf{List}~(\alpha,\, \textsf{String})$ \\
$\map{Parser}{f}{\overline{x}}$ & $=$ & $\lambdaE{i}{[(f~x,\, i') \mid (x,\, i') \leftarrow \overline{x}~i]^{\textsf{List}}}$ \\
$\unit{Parser}{x}$ & $=$ & $\lambdaE{i}{[(x, i)]^{\textsf{List}}}$ \\
$\join{Parser}{\doverline{x}}$ & $=$ & $\lambdaE{i}{[(x,\, i'') \mid (\overline{x},\, i') \leftarrow \doverline{x}~i;\ (x,\ i'') \leftarrow \overline{x}~i']^{\textsf{List}}}$
\end{tabular}
}
\end{center}
\medskip
Finally, here is a more complicated monad: \textsf{Parser}. Parsers can be seen as a combination
of two monads: \textsf{List} and \textsf{State}. Intuitively, the \textsf{String} argument is the
parser input, and the result is a list of potential parses, each of which is made up of a result
of type $\alpha$ and the \textsf{String} that remains after parsing.
There are a number of auxiliary operations that are useful for \textsf{Parser} (often called {\em
parser combinators}), including \textsf{satisfy} which parses a character matching a given
predicate and \textsf{many} which applies a parser multiple times. An amazing amount of research
has been done on the \textsf{Parser} monad and parser combinators~\cite{hutton1996monadic,
leijen2001parsec}, and it is still an active area~\cite{willis2020staged}.
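
To give a flavor, here is a minimal Haskell sketch of the \textsf{Parser} monad with
\textsf{satisfy} and one small derived parser; the code is my own simplification, not Wadler's
definition or any particular library's:
\begin{lstlisting}[style=hs]
import Data.Char (isDigit)

-- A parser consumes part of the input and returns all possible parses.
newtype Parser a = Parser { runParser :: String -> [(a, String)] }

instance Functor Parser where
  fmap f (Parser p) = Parser (\i -> [ (f x, rest) | (x, rest) <- p i ])

instance Applicative Parser where
  pure x = Parser (\i -> [(x, i)])
  Parser pf <*> Parser px =
    Parser (\i -> [ (f x, i'') | (f, i') <- pf i, (x, i'') <- px i' ])

instance Monad Parser where
  Parser p >>= f =
    Parser (\i -> [ r | (x, i') <- p i, r <- runParser (f x) i' ])

-- Parse a single character satisfying a predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser (\i -> case i of { (c:cs) | ok c -> [(c, cs)] ; _ -> [] })

-- A two-digit number, built by sequencing parsers monadically.
twoDigits :: Parser Int
twoDigits = satisfy isDigit >>= \d1 ->
            satisfy isDigit >>= \d2 ->
            return (read [d1, d2])

-- runParser twoDigits "42abc" ==> [(42, "abc")]
\end{lstlisting}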
\subsection{Other Formulations of Monads}
Before moving on, I want to address some other formulations of monads that have become more
popular since {\em Comprehending Monads} was published.
A monad's \textsf{join} operation can be used to define a special kind of function composition
called {\em Kleisli composition}. Given a monad \textsf{M} and two functions $f: \beta \to
\textsf{M}~\gamma$ and $g: \alpha \to \textsf{M}~\beta$, we can write a function
\[ f \circ_\textsf{M} g = \textsf{join}^\textsf{M} \circ \textsf{map}^\textsf{M}~f
\circ g \]
that composes them together while properly keeping track of $\textsf{M}$. This is useful for
programming because it means that monadic operations like $f$ and $g$ can be built up
individually and composed together to build larger programs. A very similar construction can also
yield the \kw{bind} (or \textsf{>>=}) definition of monads that is more common in modern
programming languages.
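
Both constructions are one-liners in Haskell; the sketch below uses names of my own choosing:
\begin{lstlisting}[style=hs]
import Control.Monad (join)

-- Kleisli composition and bind, defined from join and fmap as in the text.
composeK :: Monad m => (b -> m c) -> (a -> m b) -> (a -> m c)
composeK f g = join . fmap f . g

bind :: Monad m => m a -> (a -> m b) -> m b
bind m f = join (fmap f m)

-- With the Maybe monad, for example:
half :: Int -> Maybe Int
half n = if even n then Just (n `div` 2) else Nothing

-- (half `composeK` half) 84 ==> Just 21
-- (half `composeK` half) 2  ==> Nothing
\end{lstlisting}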
The \textsf{bind} definition of monads admits syntactic sugar called {\em do--notation}, which is
popular in Haskell. Do--notation is actually very similar to monad comprehension syntax:
\[
\kw{do}\ \{\ i \leftarrow \unit{Maybe}{84};\ j \leftarrow
(\ifThenElse{n = 0}{\nothing}{\unit{Maybe}{n}});\ \unit{Maybe}{i/j}\ \}
\]
The distinctions between comprehension syntax and do--notation are mostly cosmetic. When using
do--notation, programmers write ``$\unit{Maybe}{e}$'' (or more idiomatically
``$\textsf{return}~e$'') instead of ``$[e]^{\textsf{M}}$.'' Also rather than put a return value at
the front of the comprehension, do--blocks evaluate to their final expression. For the remainder
of this paper I stick to comprehension syntax, but programming-focused monad examples usually use
something closer to do--notation.
Next, we examine examples of monads that are defined using delimited continuations.
\section{Common Design Patterns} \label{sec:patterns}
The monadic design patterns that Wadler presents are also a great showcase for delimited
continuations. We have already seen delimited continuations implement exceptions (\textsf{Maybe})
and nondeterminism (\textsf{ND}), but it turns out that they can be used to implement all of the
monads from the previous section.
This presentation was inspired by a blog post by \citet{xia_2019}, which shows how to implement a
variety of monads using continuation-passing. Following Xia's definitions, I present each pattern
by defining (1) the primitive operations for the pattern and (2) a \textsf{run} function that
delimits the scope of the computation using \kw{reset}.
\subsubsection{Maybe} \hfill {\bf See Section \ref{sec:monad-examples:maybe}}
\begin{center}
\framebox[\textwidth]{
\begin{tabular}{lll}
$\textsf{abort}$ & $=$ & $\shift{k}{\textsf{Nothing}}$ \\
$\textsf{run}^{\textsf{Maybe}}~c$ & $=$ & $\reset{(\textsf{Just}~c)}$
\end{tabular}
}
\end{center}
\medskip
We have already seen exceptions in the contexts of both continuations and monads,
although I have been imprecise when ``throwing an exception'' using \kw{shift}. This construction
faithfully simulates the \textsf{Maybe} monad, since its computations either fail and return
$\nothing$ or succeed with a value $v$ and return $\just{v}$.
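
As a sketch, the same construction can be written with the \texttt{Cont} monad's \texttt{shift}
and \texttt{reset} (here assumed to come from the \texttt{transformers} package); \texttt{abort}
and \texttt{runMaybe} are my names for the operations in the box:
\begin{lstlisting}[style=hs]
import Control.Monad.Trans.Cont (Cont, evalCont, reset, shift)

-- abort discards the delimited continuation; run wraps successes in Just.
abort :: Cont (Maybe r) a
abort = shift (\_ -> return Nothing)

runMaybe :: Cont (Maybe a) a -> Maybe a
runMaybe c = evalCont (reset (fmap Just c))

-- runMaybe (fmap (`div` 2) (return 84))  ==> Just 42
-- runMaybe (abort >> return 0)           ==> Nothing
\end{lstlisting}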
\pagebreak
\subsubsection{Identity and Strictness} \hfill {\bf See Section \ref{sec:monad-examples:identity}}
\begin{center}
\framebox[\textwidth]{
\begin{tabular}{lll}
$\textsf{noop}~e$ & $=$ & $\shift{k}{k~e}$ \\
$\textsf{run}^{\textsf{Id}}~c$ & $=$ & $\reset{c}$ \\
\\
$\textsf{force}~e$ & $=$ & $\shift{k}{k~(\ifThenElse{e \neq \bot}{e}{\bot})}$ \\
$\textsf{run}^{\textsf{St}}~c$ & $=$ & $\reset{c}$
\end{tabular}
}
\end{center}
\medskip
Identity can be encoded without using continuations at all, of course, but this presentation is
instructive. Here the \textsf{noop} operation shows how to do nothing using delimited
continuations---just call the continuation. All \textsf{run} needs to do is delimit the
continuation scope.
Just like with monads, we can also encode strictness (provided our source language is
non-strict). We do the same trick as before, forcing $e$ and then calling the continuation.
\subsubsection{State} \hfill {\bf See Section \ref{sec:monad-examples:state}}
\begin{center}
\framebox[\textwidth]{
\begin{tabular}{lll}
$\textsf{get}$ & $=$ & $\shift{k}{\lambdaE{s}{k~s~s}}$ \\
$\textsf{put}~s'$ & $=$ & $\shift{k}{\lambdaE{s}{k~()~s'}}$ \\
$\textsf{run}^{\textsf{State}}~c~i$ & $=$ & $\reset{((\lambdaE{v}{\lambdaE{s}{v}})~c)}~i$
\end{tabular}
}
\end{center}
\medskip
As with the state monad, encoding state with continuations is about capturing state in function
parameters and return values. Our continuations take two parameters: the first is the computation
result, and the second is the new state. The \textsf{get} operation takes in the current state
and continues with that state as both the result and the new state. The \textsf{put} operation
ignores the current state and continues with unit as the result and $s'$ as the new state. The
\textsf{run} function takes an initial state and passes it in {\em outside} the continuation
scope.
As with the monadic version, these operations can be used to simulate stateful code. For example,
this program, which is essentially the same program from Section \ref{sec:monad-examples:state},
\begin{align*}
\textsf{run}^{\textsf{State}}~(& \kw{let}~i = \textsf{get}~\kw{in} \\
& \textsf{put}~(i + 3); \\
& \kw{let}~j = \textsf{get}~\kw{in} \\
& \textsf{put}~(j * 7); \\
& \textsf{get})~3
\end{align*}
returns 42 as expected. In this case the control flow is a bit more complicated, but the
intuition is the same as for the monadic version.
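
The same encoding can be sketched in Haskell, again assuming \texttt{transformers}; as in the box
above, the answer type of the delimited computation is a function expecting the current state:
\begin{lstlisting}[style=hs]
import Control.Monad.Trans.Cont (Cont, evalCont, reset, shift)

-- get, put, and run^State from the box above (my names).
getS :: Cont (s -> r) s
getS = shift (\k -> return (\s -> k s s))

putS :: s -> Cont (s -> r) ()
putS s' = shift (\k -> return (\_ -> k () s'))

runStateC :: Cont (s -> a) a -> s -> a
runStateC c = evalCont (reset (fmap (\v _ -> v) c))

-- runStateC (getS >>= \i -> putS (i + 3) >> getS >>= \j -> putS (j * 7) >> getS) 3
--   ==> 42
\end{lstlisting}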
\subsubsection{Reader} \hfill {\bf See Section \ref{sec:monad-examples:reader}}
\begin{center}
\framebox[\textwidth]{
\begin{tabular}{lll}
$\textsf{ask}$ & $=$ & $\textsf{get}$ \\
$\textsf{run}^{\textsf{Reader}}$ & $=$ & $\textsf{run}^{\textsf{State}}$
\end{tabular}
}
\end{center}
\medskip
\textsf{Reader} is strictly simpler than \textsf{State}, but we can use the same definitions
anyway. We just rename \textsf{get} to \textsf{ask} and don't provide a \textsf{put} operation.
\subsubsection{Nondeterminism} \hfill {\bf See Section \ref{sec:monad-examples:nondet}}
\begin{center}
\framebox[\textwidth]{
\begin{tabular}{lll}
$\textsf{fail}$ & $=$ & $\shift{k}{\varnothing}$ \\
$\textsf{flip}$ & $=$ & $\shift{k}{k~\kw{true} \cup k~\kw{false}}$ \\
$\textsf{run}^{\textsf{ND}}~c$ & $=$ & $\reset{((\lambdaE{v}{\{v\}})~c)}$
\end{tabular}
}
\end{center}
\medskip
We already saw one way of encoding nondeterminism using continuations, but here is another one
that is a bit closer to the monadic presentation. We again make use of sets to define the
primitive operations \textsf{fail} and \textsf{flip}. Our implementation of \textsf{run} lifts
results into singleton sets, and our operators call the continuation on whatever values should be
considered---no values at all in the case of \textsf{fail}, which simply discards the
continuation, or both \kw{true} and \kw{false} in the case of \textsf{flip}.
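
Here is the corresponding Haskell sketch, once more with lists standing in for sets and names of
my own choosing:
\begin{lstlisting}[style=hs]
import Control.Monad.Trans.Cont (Cont, evalCont, reset, shift)

-- fail, flip, and run^ND from the box above, with lists in place of sets.
failND :: Cont [r] a
failND = shift (\_ -> return [])

flipND :: Cont [r] Bool
flipND = shift (\k -> return (k True ++ k False))

runND :: Cont [a] a -> [a]
runND c = evalCont (reset (fmap (: []) c))

-- runND (flipND >>= \b -> if b then return 1 else return 2)  ==> [1,2]
-- runND (flipND >>= \b -> if b then return 1 else failND)    ==> [1]
\end{lstlisting}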
\subsection{Other Monad Implementations}
A couple of Wadler's monad examples are missing here, but not for any fundamental reason.
\textsf{List} can be implemented using delimited continuations without issues, but there are a
number of subtly different ways to do it and the technicalities are not informative.
\textsf{Parser}s can also be done using \kw{shift} and \kw{reset}, but the construction is
complex and, again, not informative. It would seem that anything that can be done with monads can
also be done with delimited continuations.
It turns out that with enough effort, this is roughly true! A few years after {\em Abstracting
Control}, \citet{filinski1994representing} published another paper showing that delimited
continuations can represent all {\em pure} monads (roughly those whose \textsf{map}, \textsf{join},
and \textsf{bind} operations can be implemented in our base lambda calculus). While the
constructions in Section \ref{sec:patterns} are ad-hoc and dependent on the particular monad in
question, Filinski managed to give a translation that represents monadic patterns with
continuations once-and-for-all. Still, the ad-hoc connections are valuable: they provide more
granular intuition than Filinski's presentation, and they are all entirely pure, while Filinski's
formulation relies on a single reference cell.
\section{The Continuation Monad} \label{sec:contmonad}
Up to this point we have viewed delimited continuations and monads in parallel, bouncing back and
forth between the abstractions and showing how they compare. But there is one final insight from
Wadler's paper that I skipped over, and it serves as a bridge between continuations and monads.
Towards the end of {\em Comprehending Monads}, Wadler presents the {\em continuation monad}.
Given a result type, $\rho$, the continuation monad is defined as: