-
Notifications
You must be signed in to change notification settings - Fork 3
/
limitations.tex
368 lines (320 loc) · 16 KB
/
limitations.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
\chapter{Limitations and future directions}
\label{sec:limitations}
\section{Metatheory}
\label{sec:metatheory}
We have not rigorously done proofs on \Backpack{}'s metatheory; as such,
we can only conjecture that in the fragment of Haskell \emph{without type
classes and type families}, successful separate typechecking of
components implies successful linking as well. Conventionally,
soundness proofs for module systems are done by elaboration to an
appropriate internal language; however, it is difficult to say what an
internal language which supports all of GHC Haskell's features would
look like, and in any case, a proof about this elaboration would say
little about the actual typechecking algorithm that GHC
implements. \OldBackpack{}'s proof
of soundness gives evidence that \Backpack{} is sound, but we leave
an actual formalization to future work.
Note that the soundness of \emph{compiled} \Backpack{} code reduces to
the soundness of GHC Haskell, because we retypecheck and compile every
instantiation of a component. Thus, at the end of the day, compilation
of \Backpack{} code reduces to ordinary Haskell
compilation.\footnote{This is similar to the situation with Java
generics, where erasure at the JVM level means that an unsound type
checking algorithm for generics will only result in an invalid cast
exception at runtime; unlike Java, any errors in \Backpack{} are caught
at instantiation-time.} However, one should still ask tough questions
about the soundness of recursive modules, which is a feature in its own
right; and indeed, there is at least one known bug where \verb|hs-boot|
files can cause
unsoundness.\footnote{\url{https://ghc.haskell.org/trac/ghc/ticket/9562}}
\subsection{Type classes and type families}
Note that we stated that \Backpack{} is sound \emph{without} type classes
and type families. The soundness result we desire is not true for full
Haskell, and in fact, it cannot be made to be true as Haskell exists today.
Ordinary proofs of soundness rely on weakening: the ability of a signature
to hide information about its implementation.
Open type families~\cite{schrijvers+:typefamilies}
introduce axioms to Haskell which cannot be hidden by signatures:
\begin{lstlisting}
signature A where
f :: F Bool
module B where
import A
type instance F Bool = Int -> Bool
x = f 2
\end{lstlisting}
%
Now, should the following module be a permissible implementation of the
signature?
\begin{lstlisting}
module A where
type instance F Bool = Int
f :: F Bool
f = 42
\end{lstlisting}
%
Comparing only the signature and the module, it might seem that, yes,
the module is a permissible implementation, as it provides all of the
declarations required in the signature. But if we do allow it,
then \verb|B| is clearly type unsafe, attempting to use an integer
as a function.
Is this problem fixable? One ``fix'' is to require signatures to
specify every type family instance that an implementation could possibly provide:
but it should be clear that such signatures would contain many ``irrelevant''
instances to the parametrization at hand. Another fix is to change the rules for
permissible type family instances, so that instances defined in separate
modules are guaranteed not to conflict, no matter what. But this would
constitute a fundamental change to how open type families work in
Haskell and rule out existing Haskell code.
So, why isn't the lack of modularity with respect to type classes a
total disaster for \Backpack{}? Why won't Haskell programmers riot in
the streets when their programs typecheck separately, but fail to link
together? Based on our experiences writing programs with
\Backpack{}, there are two reasons:
\begin{itemize}
\item By in large, the Haskell ecosystem tries very hard to
\emph{not} have conflicting instances in the modules we define,
eschewing \emph{orphan instances} which are the primary mechanism
by which conflicts can arise.\footnote{A non-orphan instance is one that
is defined along side the type or class it corresponds to, and can
be thought of as an ``intrinsic'' property of the type/class it was
deifned with, making it difficult---though not impossible---to
create a conflict.}
Generally, instantiating a \Backpack{} package with a module will
not result in an instance conflict, in the same way that adding an
import to a module will not result in an instance conflict.
\item Assuming that there are no conflicting instances (a \emph{big}
assumption, but one that seems to be true in practice), the only thing
that a parametrized package must do is ensure that it is indifferent
to the addition of new instances (rather than assert that we can thin
them out.) In practice, these means following the same best practices
which prevent conflicts from occurring---e.g., avoiding orphan
instances:
\begin{lstlisting}
signature A where
data T
module B where
import A
instance Show T where ... -- bad!
\end{lstlisting}
%
This is not difficult to do!
\end{itemize}
%
Thus, while we cannot give \emph{technical} rules which ensure the modularity
of \Backpack{}, the \emph{social} rules of Haskell allow us to maintain
modularity and separate compilation.
One final word: there is an unrelated gotcha related to type refinement
and instance declarations:
\begin{lstlisting}
signature A where
data T
data S
module B where
import A
class K a
instance K T
instance K S
\end{lstlisting}
%
This could fail to typecheck if T and S refine to the same type. Arguably,
GHC should be able to tell when an overlap could occur, but it does not today.
To fix the code, you can either newtype T and S, or hoist K out of the
parametrized package.
% The upshot is that, without changing how open type families work, we
% must allow for the possibility that linking can fail, even if the
% components typechecked separately.
% Trouble: declaring a local QuickCheck instances
% First thrust:
% - Define appropriate orphanhoodness check sufficient to
% prevent conflicting instances from being defined
% - Trouble:
% signature A where
% module M where
% import A
% f = show True
% Then suppose that A is implemented with a module which has
% incoherent instances for Show Bool. Need to *eagerly*
% report conflict. But now you're in the same situation
% as applying the eagerness globally (effectively)
% - Suppose that "somehow conflicts don't happen" (this is
% the case for type families.) You can still get in trouble
% if instantiation causes incompatible universes to collide:
% signature A where
% module M where
% import Inst1
% -- ... use instance
% if A is filled with something using (incompatible) Inst2,
% once again we blow up when we instantiate
\section{Mutually recursive packages}
We have not described how to handle mutual recursion in \Backpack{},
although we believe that \Backpack{} can be extended to accommodate it,
based off of GHC's existing support for \verb|hs-boot| files.
Indeed, we showed in \cref{sec:recursive-uids} that recursive
\uid{}s can be handled in the same way as they were done in \OldBackpack{}.
One challenge that we face is how to handle the ``double vision
problem'' in the presence of type synonyms. \OldBackpack{} solved the
double vision problem for original names via shaping pass; however,
languages like MixML which permit transparent type synonyms also have
required a type pre-computation pass to to solve double-vision at the
type level.
Interestingly, GHC's type checking algorithm is already designed in
such a way that it can solve the double vision problem in the following
way: an abstract type is opaque except within the \emph{implementing
module} (and any module which imports this module) at which point one
sees the value of the type synonym.
\begin{lstlisting}[language=Haskell]
-- A.hs-boot
module A where
data T
f :: T -> T
-- B.hs
module B(f) where
import {-# SOURCE #-} A
-- A.hs
module A(T, f) where
import qualified B
type T = Int
f :: Int -> Int
f x = B.f x
\end{lstlisting}
%
When typechecking \verb|A.hs|, GHC resolves the type \verb|T| within
\verb|B.f| to be the type synonym \verb|type T = Int|, \emph{in the very
same module it is typechecking}. In a sense, GHC has ``solved'' the
double vision problem, though it's worth noting that systems like RMC
and MixML address a stronger version of the double vision problem which
is inapplicable to Haskell due to the lack of sealing (see
Appendix~\ref{sec:double-vision}). (In reality, GHC rejects \verb|A.hs|
because it does not allow abstract data to be implemented with type
synonyms, but in principle this restriction could be relaxed.)
\section{The signature language}
It is reasonably well understood how to reuse \emph{code}: functions and
values that need to be used multiple times are bundled into modules, and then
used by their downstream clients. Less clear is how one goes about
reusing \emph{signatures}.
In \Backpack{}, we offer some limited facilities for signature reuse.
The most basic, essential form of signature reuse in a package language
based on mix-ins is \emph{signature merging}; this functionality is
essential for composing libraries with requirements, since without it,
we would have to manually rewrite the signature every time we put two
such libraries together. In degenerate form, this allows us to place a
signature in a ``signature package'', to be reused by other packages
which depend on this package.
However, we have noticed that there are other operations on signatures
which end users might find useful:
\begin{enumerate}
\item Given an implementation of a library, we might like to \emph{infer}
the signature representing the API provided by this library,
without having to copy paste the signatures of the library into a
signature package of their own.
\item A signature package may provide more functions than a client
needs. In this case, the client would prefer to \emph{thin} the
functions from the signature package, rather than have to copy paste
the subset they need into a signature of their own (especially if
some of the types are quite large!)
\item There might be multiple implementations of a particular
interface; a client might be interested in taking the
\emph{intersection} of these signatures, using only functionality
that is available from all implementations.
\end{enumerate}
%
Each of these operations poses design questions which I do not have
satisfactory answers to. For example, what should the syntax for signature
thinning be? We might turn to the module export mechanism in Haskell today
for some ideas. For example, here is the syntax for reexporting all the
exports of \verb|B|, except \verb|f|:
\begin{lstlisting}
module A (module B) where
import B hiding (f)
\end{lstlisting}
%
But this cannot be straightforwardly be adapted to signatures, where the
``reexports'' of inherited signatures happens implicitly: there is no
import statement to reexport! Additionally, if a type is thinned from a
signature, we must transitively thin out any declarations which refer to
that type, and so forth. Should this thinning occur automatically, or
should the user be obligated to explicitly thin them out?
We think there is an opportunity to design a more sophisticated language
of signatures, and we leave this as future work for \Backpack{}.
\label{sec:relaxed-matching}
\section{Relaxing signature matching}
When the type signature \verb|[Char] -> Bool| is written in a signature,
the type of the implementing function must be exactly \verb|[Char] -> Bool|;
a more polymorphic function (e.g., \verb|[a] -> Bool|) will be rejected
by \Backpack{}.
In practice, this can be an annoying limitation; for example, \verb|null|
is a polymorphic function in the standard library that can be used
perfectly well on \verb|[Char]|, but it cannot be used directly; instead,
a user must first monomorphize the function to the correct type in the
implementing module.
How might we go about relaxing this restriction? One plausible rule is
that we should accept an implementation with type $\tau$ for a signature
type $\tau'$, so long as there exists a \emph{coercion} from $\tau$ to
$\tau'$ and augment our subtyping relation accordingly. However, this will
not work with our merging rules as stated in \cref{typing:merging}.
Presently, merging picks an arbitrary type to add to the context; the reasoning
is that if all the types were actually type equal, it doesn't matter which
one we pick. With subtyping on Haskell level type definitions, this no
longer holds: we must at least pick the most general type. Even then,
merging would be terribly incomplete without the ability to compute
greatest lower bounds of the types in question.
In short, merging signatures at the Haskell type signature level is difficult
when the source language was not designed with the subtyping lattice (grestest
lower bounds) in mind. Perhaps there may be more limited fixes which can
alleviate the annoyance described above (for example, only allow relaxed
matching when instantiating a signature).
\label{sec:sealing}
\section{Sealing}
In contrast to ML-like module systems, both \Backpack{} and \OldBackpack{} do not
support \emph{sealing}, the capability of taking a module and making
types opaque, so that their underlying representation is not visible
to clients of the module. If a client wants to treat a module abstractly,
their only choice is to depend on a signature (unfortunately, there is no
way to say, ``And also, this signature must be implemented with $M$.'')
An interesting feature which would help Haskell users independently of
\Backpack{} would be the ability to take a function like:
\begin{lstlisting}
f :: (Int -> Int) -> Int
\end{lstlisting}
%
and create a sealed \verb|f| with the following signature:
\begin{lstlisting}
newtype T = T Int
f :: (T -> T) -> T
\end{lstlisting}
%
implemented as a series of coercions on \verb|f|. This would help
eliminate a bit of newtype wrapping and unwrapping boilerplate
which is required when defining a \verb|newtype| of an existing type.
\label{sec:beyond-haskell}
\section{Beyond Haskell}
The defining characteristic of \Backpack{} is the abstraction barrier
between the package manager and the compiler; in particular, we have
stated that \emph{mixin linking}, which occurs in the package manager,
is indifferent to Haskell source code. This raises the question: what
do we \emph{really} need from a programming language to implement
a system like \Backpack{}?
The most instructive comparison you can make here is to apply the
Backpack design to ML, which already has a module system and is
thus very close to supporting a Backpack-like design. For example,
unit identifiers translate directly into functor applications in
ML's module system; furthermore, the applicative nature of
Backpack packages can be simulated in a generative module system
by having the package manager instantiate any given functor once.
Hoewever, there is one major feature that \Backpack{} relies
upon, which traditional ML module systems lack: signature merging.
While module-to-signature linking translates into functor application,
signature-to-signature linking requires some extra language support
which we think would be useful even in a traditional, non-mixin-linked
module language.
Moving beyond ML, it is clear that to support \Backpack{}, the host
language must some sort of functor application and type checking
against interfaces. These represent new additions to the programming
language (even if they do not come with any \emph{syntactic} changes),
but we have seen that, at least for Haskell, we were able to add these
features as a set of orthogonal extensions to the compiler,
without needing to make major changes to how GHC was fundamentally
structured. Thus, the application of the \Backpack{} design to
non-Haskell languages is a promising topic, albeit one not explored
in this thesis.