-
Notifications
You must be signed in to change notification settings - Fork 0
/
2019-03-03-the-core-language.lhs
294 lines (244 loc) · 10 KB
/
2019-03-03-the-core-language.lhs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
---
title: "Implementing A Functional Language Part I: The Core Language"
publish: true
---
Implementing a Functional Language I: The Core Language
=======================================================
This is part 1 of a series in implementing a functional language. The introduction is
[here][intro]. This is a literate Haskell file - you can download the source
[here](https://github.com/hmac/hmac.github.io/blob/src/posts/2019-03-03-the-core-language.lhs).
To load it into GHCi and play around, you can use the following command:
```
stack --resolver lts-12.2 \
ghci --package prettyprinter \
--package text \
2019-03-03-the-core-language.lhs
```
---
> {-# LANGUAGE OverloadedStrings #-}
> module Core.Language where
> import Data.Text (Text)
The Core Language
=================
Let's look at the language we're going to compile. It should look familiar if you've used
Haskell or another ML-like language before.
This is a simple Core program that evaluates to 6:
```
main = addTwo 4 ;
addTwo n = n + 2
```
This is a more complex program showcasing all of Core's features:
```
main = let f = g Pack{1,0}
in f 4 ;
-- Lines starting like this are comments
g a b = case a of
<1> -> b + 1 ;
<2> -> b + 2
```
We'll briefly describe each in turn.
Expressions and application
---------------------------
All valid constructs are expressions - they evaluate to a result. Functions are applied to
arguments via juxtaposition: `f 1 2`. A Core program consists of a series of
_definitions_. Definitions are delineated by a semicolon `;`. Semicolons are also used
where necessary in other constructs. You'll see examples of this as we go on.
Global definitions
------------------
Functions can be defined at the global level, like `g` in the example above. The syntax is the
same as any ML-like language.
```
functionName arg1 ... argN = functionBody
```
where `arg1 ... argN` are bound in `functionBody`.
Values can be defined similarly, resembling nullary functions.
```
five = 5 ;
seven = five + 2
```
A valid Core program must have a nullary top level definition called `main`. This is the
entrypoint of the program.
Local definitions
-----------------
`let` statements allow defining values in a local scope.
```
f = let a = 4 ;
double x = x + x
in double a
```
`letrec` statements are similar, but allow recursive definitions. We make the distinction
between the two because non-recursive `let`s are often easier to implement and we can take
an approach that leads to more performant code than the recursive case[^1].
Data Types
----------
In Haskell, a data type might be defined as follows:
```
data Colour = Red | Green | Blue
data Coord = MkCoord Int Int
data Tree a
= Leaf a
| Branch (Tree a)
(Tree a)
```
The first case defines a type with three distinct nullary constructors. The second
case defines a type with a single constructor taking two arguments. The third case
combines these two, and is also parameterised over an abstract type `a`.
In Core we assume that typechecking has already happened, and so we don't need to
distinguish between constructors of different types. The constructor names are also not
important[^2]. Core doesn't support polymorphism, so type parameters don't need to be
considered. As such, we can simplify data type declaration. In Core, all type constructors
take the form `Pack{t,n}` where `t` is the _tag_ of the constructor and `n` is the _arity_
of the constructor. The tag is an integer starting at 1 and distinguishes between
constructors of the same type. The arity of a constructor is the number of arguments it
takes. For example,
```
data Colour = Red | Green | Blue
```
can be represented as
```
Red = Pack{1,0} ;
Green = Pack{2,0} ;
Blue = Pack{3,0}
```
Similarly,
```
data Tree
= Leaf Int
| Branch Tree Tree
```
translates to
```
Leaf = Pack{1,1} ;
Branch = Pack{2,2}
```
Case expressions
----------------
We construct data with `Pack`, and deconstruct it with `case`. This is the only way to
inspect data in Core, and subsumes all forms of pattern matching in the higher level
language (e.g. multiple function definitions)[^3]. Case expressions look like this:
```
isLeaf tree = case tree of
<1> v -> True ;
<2> t1 t2 -> False
```
A `case` expression consists of a _scrutinee_ (`tree` in the example) followed by one or more
_alternatives_ separated by a semicolon. Alternatives take the form
```
<t> x1 ... xn -> e
```
where `t` is the tag of the constructor and `x1 ... xn` are the arguments to the
constructor. The number of arguments must match the arity of the constructor. If the tag
of the constructor on the scrutinee matches the tag on the alternative, the whole case
expression evaluates to `e`. Every constructor must have a matching alternative - the
behaviour when a scrutinee's tag has no corresponding alternative is not defined (our
program will probably just crash).
Built-in primitives
-------------------
Core has built in support for integers, and a small set of arithmetic and boolean
operations defined as binary functions. The representation of Booleans themselves will
vary between implementations - often we will map them to the integers 0 and 1.
Operation Symbol
---------- -------
Addition `+`
Subtraction `-`
Multiplication `*`
Division `/`
Equality `==`
Greater than `>`
Greater than or equal to `>=`
Less than `<`
Less than or equal to `<=`
And `&`
Or `|`
Integer negation is performed via the primitive function `negate`.
Lambda abstractions
-------------------
Notably absent from Core is _lambda abstraction_, or anonymous functions. Functions are
always supercombinators. This is an explicit choice to simplify the implementation, and we
do not lose any expressive power as a result. Lambda abstractions can be mechanically
removed from a program through a process known as _lambda lifting_. We will cover this in
more detail in later section.
Modelling the language
======================
Each implementation of the compiler will take as input a representation of a Core program,
so we need to define what that is. Everything else will be built around this type.
> data Expr a
> = EVar Name -- variables
> | ENum Int -- numbers
> | EConstr -- constructor
> Int -- tag
> Int -- arity
> | EAp (Expr a) -- applications
> (Expr a)
> | ELet -- let(rec) expressions
> Recursive -- recursive (letrec) or nonrecursive (let)
> [(a, Expr a)] -- definitions
> (Expr a) -- body
> | ECase -- case expressions
> (Expr a) -- expression to scrutinise
> [Alter a] -- alternatives
> deriving (Show)
>
> data Recursive = Recursive | NonRecursive
> deriving (Show)
>
> type Name = Text
`Expr` describes an expression in the Core language. It is parameterised over the type of
its _binders_ - a binder is a name given to a variable on the left hand side of a
`let(rec)` expression or function definition. This will allow us to model variable binding
more sophisticatedly in later parts without changing the definition of `Expr`. For now we
can make do with simple binders, so we define a type synonym for convenience.
> type CoreExpr = Expr Name
Function application is modelled by the `EAp` constructor. Applications of more than one
argument are transformed into nested `EAp` nodes.
```
f 1 2
-- becomes
EAp (EAp (EVar "f") (ENum 1)) (ENum 2)
```
Case expressions consist of a scrutinee and a list of alternatives, modelled by `Alter`,
which is a tuple of the constructor tag, the constructor arguments and the result
expression.
> type Alter a = (Int, [a], Expr a)
> -- 0 1 2
>
> -- case s of
> -- [0] [1] [2]
> -- <1> x y -> e
A _supercombinator_ is a function with no free variables - in Core all global definitions
are supercombinators.
> type ScDefn a = (Name, [a], Expr a)
> type CoreScDefn = ScDefn Name
A Core program is a collection of supercombinator definitions, one of which is called
"main".
> type Program a = [ScDefn a]
> type CoreProgram = Program Name
We can now define a complete Core program, as an example.
> -- main = double 21 ;
> -- double x = x + x
> exampleProgram :: CoreProgram
> exampleProgram
> = [
> ("main", [], EAp (EVar "double") (ENum 21))
> , ("double", ["x"], EAp (EAp (EVar "+") (EVar "x")) (EVar "x"))
> ]
Parsing and Pretty Printing
===========================
We want to be able to parse and pretty print Core, but this is not a parsing or pretty
printing tutorial. Instead of going through this in detail, we will rely on the
[`Parse`][parse] and [`Print`][print] modules. Each of these are defined in literate
Haskell files and you can read them if you want to. However I'd recommend seeking out a
proper introduction to these topics if you're not familiar with them. If I come across or
recall any good ones I may link to them here for reference. The [book][book] also has an
extensive section on each of these topics, though the style is somewhat outdated.
[^1]: The higher-level source language might have a single unified `let` construct, with
an analysis phase to determine which cases must be converted to `letrec` in Core.
[^2]: Except for providing useful error messages. Since you'd have to build this support
into the compiler for your higher level language, we don't consider it here.
[^3]: Compiling more expressive pattern matching into case expressions is a well studied
problem and there are established algorithms for doing so. For more information on this,
see Chapter 5 of [The Implementation of Functional Programming Languages][book].
[intro]: 2019-03-02-implementing-a-functional-language.html
[book]: https://www.microsoft.com/en-us/research/publication/the-implementation-of-functional-programming-languages/
[parse]: 2019-03-03-parsing-core.html
[print]: 2019-03-03-printing-core.html