-
Notifications
You must be signed in to change notification settings - Fork 0
/
motivation.tex
361 lines (308 loc) · 16.5 KB
/
motivation.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
\newcommand{\codett}[1]{{\small\tt #1}}
Our work focuses on the theories underlying two practical gradual/migratory
typing systems: Typed Racket~\cite{tf-popl-2008, tfffgksst-snapl-2017} and
Transient Reticulated Python~\cite{vss-popl-2017,v-thesis-2019}.
Each adds a type system to an existing untyped language, satisfies a
non-trivial type soundness theorem, and satisfies
graduality~\cite{svcb-snapl-2015,na-icfp-2018} for simply-typed programs.
Figure~\ref{fig:tr-example} displays a Typed Racket program that consists
of three modules.
The module on the left represents an untyped library.
The module in the middle is a typed adapter module; it imports identifiers from
the library, specifies their types, and immediately re-exports them.
Adapters make libraries accessible to typed code, but untyped clients such as
the module on the right can also use them.
The creators of such modules are likely to rely on the type specifications;
the types in this adaptor, however, are faulty.
The purpose of this example is to illustrate how Typed Racket protects untyped
code from incorrect type annotations (\sectionref{sec:protects}).
\Figureref{fig:retic-example} is a three-module Reticulated program that
uses types in the same manner as the example in \figureref{fig:tr-example}.
The purpose of this example is to show how a type-sound first-order approach to runtime
checking can yield misleading error messages even when the types are correct
(\sectionref{sub:blame}).
Complete monitoring explains the difference between the two approaches.
Based on this insight, \sectionref{sub:results}
surveys our contributions and
\sectionref{sub:semantics-preview} motivates the Amnesic semantics.
%% -----------------------------------------------------------------------------
\subsection{What Protects the Untyped Code} \label{sec:protects}
\Figref{fig:tr-example} represents a highly simplified real-world
scenario. The module on the left, \codett{net/url}, is an excerpt from an
untyped library that has been part of Racket for two
decades.
The typed module in the middle defines type annotations for the untyped library.
Lastly, the module on the right represents the untyped prototype of a \codett{client}
application. It imports the typed library, possibly because the developer
% prefers type specifications over informal documentation or
intends to switch to Typed Racket eventually.
%% -----------------------------------------------------------------------------
\def\fname#1#2{$\underline{\hbox to #2in{#1}}$\\[-2ex]}
\begin{figure}[htb]\footnotesize
\vspace{-1mm}
\begin{minipage}[t]{1.90in}
\begin{alltt}\fname{net/url}{1.8}
#lang racket
_ _ _ 600 lines of code _ _ _
_ _ _ plus dependencies _ _ _
(define (call/input-url url c h)
;; connect to the url via c
;; process the data via h
_ _ _)
\end{alltt}
\end{minipage}\begin{minipage}[t]{1.90in}
\begin{alltt}\fname{typed/net/url}{1.7}
#lang typed/racket
(define-type URL _ _ _)
(require/typed/provide
;; from this library
net/url
;; import the following:
[string->url
(-> String URL)]
[call/input-url
(\(\forall\) (A)
(-> URL
(-> String Input-Port)
(-> Input-Port A)
A))])
\end{alltt}\end{minipage}\begin{minipage}[t]{1.65in}
\begin{alltt}\fname{client}{1.65}
#lang racket
(require html typed/net/url)
;; connect to url, read html
(define (main)
(call/input-url
URL
(\(\lambda\) (str) _ _ _)
read-html))
;; constants:
(define URL-str
"https://sr.ht")
(define URL
(string->url URL-str))
\end{alltt}\end{minipage}
\Description[Three Typed Racket code snippets.]{Three code snippets: an untyped library, a typed API, and an
untyped client. The client supplies a function to the API according to the
types, but the types are incorrect. The question is what happens at run
time.}
\caption{Using Typed Racket to define an API} \label{fig:tr-example}
\vspace{-1mm}
\end{figure}
%% -----------------------------------------------------------------------------
The \codett{main} function on the right calls the typed function to open a
network connection and read HTML. Semantically, the function
\codett{call/input-url} flows from \codett{net/url} to the typed module and then to
\codett{client}; the call itself sends \codett{main}'s arguments to the untyped
library code via the typed module. The application of \codett{call/input-url} clearly
relies on the type specification from \codett{typed/net/url}: the first
argument is a URL structure, the second a function that accepts a string,
and the third a function that maps an input port to an HTML
representation. The type declaration in \figref{fig:tr-example} is buggy,
however; the first callback of \codett{call/input-url} demands a URL, not a string.
Fortunately for the developer, Typed Racket compiles types to contracts and
thereby catches the mismatch. Here, the compilation of
\codett{typed/net/url} generates a higher-order function contract for
\codett{call/input-url}. The generated contract ensures that the untyped client
provides three type-matching argument values and that the library applies
the callback to a string. When the \codett{net/url} library eventually applies
the callback function to a URL structure, the function contract for the
callback discovers the mismatch and blames the boundary between \codett{client}
and \codett{typed/net/url}. The blame message says that \codett{net/url} broke the
contract on the back-channel from it to \codett{client}, but warns the
developer on the last line with ``assuming the contract is correct.''
A quick look confirms that the contract---that is, the type
from which the contract is derived---is wrong. Typed Racket is unusual in this regard;
other mixed-typed languages assume that ``well-typed programs can't be
blamed''~\cite{wf-esop-2009}.
One such language is Transient Reticulated.
Its compiler inserts runtime checks to protect typed code against ill-shaped
values from untyped code.
Technically, these checks compare the constructor of a value to the constructor
of the expected type at every typed elimination form.
Although this strategy guarantees that typed operations always receive inputs
within their domains, it may not discover when untyped code and type
annotations clash.
%% -----------------------------------------------------------------------------
\subsection{How Precise Can Blame Be When Gradual Typing Fails to Monitor All Channels}
\label{sub:blame}
% - ability to run code with un-checked types is key to GT usefulness,
% but when something goes wrong need to audit these assumptions
% - TR code, bug here, error later, error points to boundary. Good.
% - Retic code, bug here, error much different does not report boundary,
% again no better after adding types
% - turning blame on does not actually help but in principle would report
% all paths ... no filter for spurious paths
Figure~\ref{fig:retic-example} presents an arrangement of three Transient
Reticulated modules, similar to the code in figure~\ref{fig:tr-example}.
The module on the left exports a function that retrieves data from a URL; this
function accepts several optional and keyword arguments. The typed adapter
module in the middle formulates types for one valid use of the function; a
client may supply a URL as a string and a timeout as a pair of floats.
% pair = (connect timeout, read timeout)
These types are correct.
The rest of this subsection explains via the code on the right what happens
when a client module supplies a tuple that contains an integer and a string.
\begin{figure}[htb]\footnotesize
\vspace{-1mm}
\begin{minipage}[t]{1.8in}
\begin{alltt}\fname{requests}{1.7}
_ _ _ +2,000 lines of code _ _
def get(url, *args, **kwargs):
# Sends a GET request.
_ _ _
\end{alltt}
\end{minipage}\begin{minipage}[t]{1.8in}
\begin{alltt}\fname{typed_requests}{1.7}
import requests as r
def get(url:Str,
to:Tuple(float, float)):
return r.get(url, to)
\end{alltt}\end{minipage}\begin{minipage}[t]{1.8in}
\begin{alltt}\fname{client}{1.7}
from typed_requests import get
wait_times = (2, "zero")
get("https://sr.ht",
wait_times)
\end{alltt}\end{minipage}
\Description[Three Reticulated code snippets.]{Three code snippets: an untyped library, a typed API, and an
untyped client. This time the types are correct, but the untyped client sends
an incorrect value across. Type soundness does not require the typed API to
detect the problem, but if the types do not find the bug, what will?}
\caption{Using Reticulated to define an API} \label{fig:retic-example}
\vspace{-1mm}
\end{figure}
Reticulated's runtime checks ensure that the typed \codett{get} receives a
string and a tuple, but do {\em not\/} validate the tuple's contents. Next,
these same arguments pass to the untyped \codett{get} function in the
\codett{requests} module. When the untyped \codett{get} eventually uses the string
\codett{"zero"} as a float, Python's runtime system raises an exception that
originates from the \codett{requests} module.
In this example, the programmer is lucky. The call to the typed version of
\codett{get} is still visible in the stack trace because Python fails to
implement tail calls properly. The presence of typed \codett{get} provides a hint
that it might be at fault. If the maintainers of Python ever changed their mind
about tail calls, this hint would disappear. Even worse, if the code implemented the
channel from \codett{client} to \codett{requests} via \codett{typed\_requests} with OO
callbacks instead of pairs, the intermediate typed call may not be on the
stack no matter how the compiler handles tail calls.
In sum, types in Transient Reticulated
do not monitor all channels of communication among modules.
Consequently, errors within one module might be due to false type assumptions.
Reticulated attempts to address this problem with a global map
from heap addresses to type obligations.
Based on the complete-monitor analysis framework, the technical part of the
paper demonstrates, however, that even this map may provide misleading blame
assignments.
%% -----------------------------------------------------------------------------
\subsection{Informal Overview of Results} \label{sub:results}
Our {\bf first contribution} is the introduction of {\em complete monitors for
gradual types\/}. The starting point is the notion of complete monitoring for
contract systems. Roughly speaking, a contract system is complete if (1) it is
possible to attach a contract to every channel of communication between
program components and (2) the discovery of a contract violation points to a
mismatch between the obligations on a channel between components and a value
that passes through this channel. A mixed-typed language satisfies complete
monitoring if the translation of types into runtime checks monitors
every channel of communication between components.
The first-order approach of Transient Reticulated fails to protect all
channels. As the example in the preceding section illustrates, it does not
protect channels of communication through a pair.
A precise statement of complete monitoring depends on a notion of
\emph{ownership\/} of a value by a component.
The notion of ownership in this paper is standard~\cite{dfff-popl-2011,dtf-esop-2012}.
Technically, a semantics (of a mixed-typed language) satisfies complete monitoring
if every reduction step yields a \emph{consistent}\/ ownership assignment in
which every expression has a unique owner.
When a value traverses
a communication channel, the obligations are checked. If all of them can be
checked off, ownership is transferred; otherwise the receiving component must
share ownership with the sending one.
Only a wrapping semantics, such as Typed Racket's, implements
the ownership consistency requirement of complete monitoring.
Our {\bf second contribution} is a technique to assess the quality of blame
assignments in systems that fail complete monitoring.
If a value may have multiple owners and a blame assignment may
point to multiple components, there are two possibilities:
\begin{itemize}
\item A mixed-typed language's blame system is {\em sound\/} if all
reported blame labels are a {\em subset\/} of the ownership labels of
the witness value.
\item It is {\em complete\/} if all reported blame labels are a {\em superset\/}
of the ownership labels of this value.
\end{itemize}
Our {\bf third contribution} is a semantics that satisfies
the same type soundness property as Typed Racket but checks the same first-order
properties as Transient.
This proof-of-concept semantics is dubbed \Aname{}; it is inspired by
forgetful and heedful contracts~\cite{g-popl-2015}.
In sum, our formal results justify the entries in \tableref{tbl:informal-results}. The
rows align with the four identified
properties: type soundness, complete monitoring, blame soundness, and blame
completeness.
\begin{wraptable}{r}{1.7in}\footnotesize
\vspace{-10pt}
\caption{\footnotesize Informal summary} \label{tbl:informal-results}
\vspace{-10pt}
{\newcommand{\Y}{\bigcheckmark}
\newcommand{\N}{\scalebox{0.8}{\bigxmark}}
\begin{tabular}{l@{ }|@{\hspace{1.5mm}}c@{\hspace{1.5mm}}c@{\hspace{1.5mm}}c@{\hspace{3mm}}c}
& $\hsfsym$ & $\tsfsym$ & $\asfsym$ & $\esfsym$ \\\hline
type sound & $\sidproj$ & $\stagproj$ & $\sidproj$ & $\sdynproj$ \\
complete monitor & \Y & \N & \N & \N \\
sound blame & \Y & \N & \Y & \N \\
complete blame & \Y & \N & \Y & \N \\
\end{tabular}}
\vspace{-10pt}
\end{wraptable}%
The columns represent four semantics:
$\hsfsym$\/ (\Nname{}) for the Typed Racket approach;
$\tsfsym$\/ for Transient Reticulated;
$\asfsym$\/ for Amnesic; and
$\esfsym$\/ for Erasure, the semantics of optional type systems.
As the first row shows, \Nname{} and \Aname{} satisfy the {\em same\/} type
soundness property, while the properties for \Tname{} and Erasure differ
from those and each other. (The notation is explained in
\secref{sec:soundness}; suffice it to say that $\sidproj$ promises the
strongest guarantees for types and $\sdynproj$ the weakest.)
Otherwise, only Typed Racket satisfies all properties while Amnesic is
blame-sound and blame-complete.
%% -----------------------------------------------------------------------------
\subsection{Informal Overview of the \Aname{} Semantics} \label{sub:semantics-preview}
Our three semantics are based on different strategies for mixing typed and
untyped program components at runtime.
% Their differences arise from the
% constraints that strong guarantees place on an implementation.
The \Nname{} semantics strictly enforces the boundaries between typed and
untyped code. A value may cross a boundary if all properties can be checked.
If not, a \emph{monitor wrapper}\/ provides controlled access to the value.
A client may send input to the monitor, which: checks the input,
forwards input to the value, and checks the result before returning to the client.
This well-known
wrapping strategy thus guarantees the required ownership consistency guarantee.
The \Tname{} semantics enforces the outermost constructor of types with
shallow \emph{tag}\/ checks in statically-typed code. These checks guard every
untyped-to-typed boundary and every typed elimination form (function
application, pair projection). If a check succeeds, \Tname{} records the
fact in a global blame map from heap addresses to sets of boundaries. If a
later check fails, the map provides some information about what may have
caused the failure.
The contrast between \Nname{} and \Tname{} is striking. While the former may
wrap a value in an unbounded number of proxy layers,\footnote{Proxies may be
encoded to save space~\cite{stw-pldi-2015,htf-hosc-2010,g-popl-2015}.}
the latter uses no
wrappers and still protects typed code. The only significant drawback to
\Tname{} appears to be the imprecise blame map.
The \Aname{} semantics is a compromise between these two extremes. \Aname{}
performs the same tag checks as \Tname{}. Instead of a global map, though, it
attaches blame metadata to values. In our model, a \emph{trace} wrapper records
the boundaries that a value has previously crossed.
If an untyped function enters a typed component, \Aname{} wraps the
function in a monitor. If the function travels back to untyped code, \Aname{} replaces the
monitor with a trace wrapper that records two boundaries.
Future round-trips extend the trace.
Conversely, a typed function that flows to untyped code and back $N{+}1$ times gets
three wrappers: an outer monitor to protect its
current typed client, a middle trace to record its last $N$ trips, and an
inner monitor to protect its body.
Thus \Aname{} limits the depth of wrappers and tracks relevant blame information.