% $Date: 92/01/13 16:22:55 $
% $Revision: 1.21 $
% (c) 1991 Simon Peyton Jones & David Lester.
% Note: some definitions for functions have been indented differently
% from the original file, because otherwise Gofer would reject them.
H3-7> {-# LANGUAGE NPlusKPatterns #-}
H1> module Gm1 where
H2> module Gm2 where
H3> module Gm3 where
H4> module Gm4 where
H5> module Gm5 where
H6> module Gm6 where
H7> module Gm7 where
H> import Language
H> import Utils
\chapter{The G-machine\index{G-machine}}
\label{sect:g-machine}
In this chapter we introduce our first compiler-based implementation,
the G-machine, which was developed at the Chalmers
Institute of Technology, G\"oteborg, Sweden, by Augustsson and
Johnsson. The material in this chapter is based on their series of
papers \cite{Augustsson:LFP84,Johnsson:CC84} culminating in their Ph.D.
theses \cite{Augustsson:thesis87,Johnsson:thesis87}.
\section{Introduction to the G-machine}
The fundamental operation of the template instantiation machine was to
construct an instance of a supercombinator body, implemented by the
@instantiate@ function. This is a rather slow operation, because
@instantiate@ must recursively traverse the template {\em each time an
instantiation is performed}. When we think of the machine instructions
that are executed by @instantiate@, we see that they will be of two
kinds: those concerned with traversing the template, and those
concerned with actually constructing the instance.
The `Big Idea' of the G-machine\index{G-machine!the Big Idea}, and other
compiled implementations, is this:
\begin{important}
Before running the program, translate each supercombinator body to a
sequence of instructions which, when executed, will construct an
instance of the supercombinator body.
\end{important}
Executing this code should be faster than calling an instantiation
function, because all the instructions are concerned with constructing
the instance. There are no instructions required to traverse the
template, because all of that traversal has already been done during
the translation process. Running a program is thereby split into two
stages. In the
first stage a compiler is used to produce some intermediate form of the
program; this is referred to as {\em compile-time}\index{compile-time}.
In the second stage the intermediate form is executed; this is called
{\em run-time}\index{run-time}.
Since all we ever do to a supercombinator is to instantiate it, we can
discard the original supercombinators once the translation is done,
keeping only the compiled code.
In principle, then, we use a {\em G-machine compiler\/}\index{G-machine
compiler} to turn a program in our source language into a sequence of
{\em machine language instructions}\index{machine language
instructions}. Because we may wish to implement our language on many
different pieces of hardware (68000-based, or VAX, etc.), it is
useful to have an {\em abstract machine}\index{abstract machine}. A
good abstract machine has two properties: firstly, it can be easily
translated into any concrete machine code (for example 68000
assembler); secondly, it is easy to generate the abstract machine code
from the source.
Notice that we are faced with a trade-off here. We can ideally satisfy
the first property (easy concrete code generation) by making the
abstract machine the {\em same\/} as the real machine. But this makes
the second property much harder to fulfil. An abstract machine is
therefore a stepping-stone between the source language and a
particular machine code.
\subsection{An example}
Here is a small example of the G-machine compiler in
action\index{Example execution!of G-machine}. Consider the function
\begin{verbatim}
f g x = K (g x)
\end{verbatim}
This would be compiled to the sequence of G-code instructions:
\begin{verbatim}
Push 1
Push 1
Mkap
Pushglobal K
Mkap
Slide 3
Unwind
\end{verbatim}
In Figure~\ref{gm:fg:1example}, we show how this code will execute. On
the left-hand side of each diagram is the stack, which grows
downwards. The remainder of each diagram is the heap. The application
nodes are represented by an @@ character, expressions are labelled
with lower-case letters, and supercombinators are labelled with
upper-case letters.
%\begin{figure}[htbp]
\begin{figure} % \raggedright
\input{gm_ex}
\vspace{0.25in}
\input{gm_exa}
\caption{Execution of code for the @f@ supercombinator}\label{gm:fg:1example}
\end{figure}
In Figure~\ref{gm:fg:1example}, diagram (a), we see the state
of the machine before executing the sequence of instructions for @f@.
The spine has been unwound, just as it was in the template machine.
The top two items on the stack are pointers to the application nodes,
whose right-hand parts are the expressions to be bound to @g@ and
@x@.
The @Push@ instruction uses addressing relative to the top of the
stack. Ignoring the pointer to the supercombinator node @f@, the first
stack item is numbered @0@, the next is numbered @1@ and so on. The
next diagram (b) shows the changed stack, after executing a @Push 1@
instruction. This pushes a pointer to the expression @x@ onto the
stack, @x@ being two stack items down the stack. After another @Push 1@
we have a pointer to @g@ on top of the stack; again this is two
stack items down the stack, because the previous instruction pushed a
new pointer onto the stack. The new diagram is (c).
Diagram (d) shows what happens when a @Mkap@ instruction is
executed. It takes two pointers from the stack and makes an
application node from them; leaving a pointer to the result on the
stack. In diagram (e) we execute a @Pushglobal K@ instruction,
with the effect of pushing a pointer to the @K@ supercombinator.
Another @Mkap@ instruction completes the instantiation of the body of
@f@, as shown in diagram (f).
We can now replace the original expression, @f g x@, with the newly
instantiated body: @K (g x)@. In the first version of the G-machine
-- which is not lazy -- we simply slide the body down three places on the
stack, discarding the three pointers that were there. This is achieved
by using a @Slide 3@ instruction, as shown in diagram (g). The final
@Unwind@ instruction
will cause the machine to continue to evaluate.
This concludes a brief overview of the execution of the G-machine.
\subsection{Further optimisations}
A modest performance gain can be achieved by eliminating the
interpretive overhead of traversing the template\index{interpretive
overhead!template traversal}, as we have discussed. However, it turns
out that compilation also opens the door to a whole host of short-cuts
and optimisations which are simply not available to the
template instantiation machine. For example, consider the following
definition:
\begin{verbatim}
f x = x + x
\end{verbatim}
The template machine would evaluate @x@ twice; on the second occasion
it would of course find that it was already evaluated. A compiled
implementation can spot at compile-time that @x@ will already be
evaluated, and omit the evaluation step.
\section{Code sequences for building templates}
\label{background-gm}
We recall that the template instantiator operates in the following
way:
\begin{itemize}
\item The machine has {\em terminated\/}\index{termination condition!of
G-machine} when the single item on top of the stack is a pointer to an
integer.
\item If this is not the case then we {\em unwind\/}\index{unwind!in
G-machine} any application nodes we come across until we reach a
supercombinator node. We then {\em instantiate\/}\index{instantiation!in
G-machine} a copy of the supercombinator body, making substitutions
for its arguments.
\end{itemize}
At the heart of the Mark~1 template machine are the two functions
@scStep@~and~@instantiate@, which are defined on pages
\pageref{page:sc-step}~and~\pageref{page:instantiate}. If we take a
look at the definitions of @scStep@ and @instantiate@, we can give the
following description to the operation of instantiating a
supercombinator:
\begin{enumerate}
\item Construct a {\em local environment\/}\index{local environment!of
G-machine} of variable names to addresses in the heap.
\item Using this local environment, make an instance of the
supercombinator body in the heap. Variables are not copied; instead
the corresponding address is used.
\item Remove the pointers to the application nodes and the
supercombinator node from the stack.
\item Push the address of the newly created instance of the
supercombinator onto the stack.
\end{enumerate}
In the template instantiator, making an instance of a supercombinator
involves traversing the tree structure of the expression which is the
body of the supercombinator. Because expressions are defined
recursively, the tree-traversal function @instantiate@ is defined
recursively. For example, look at the definition of @instantiate@ -- on
page~\pageref{page:instantiate} -- for the case of @EAp@~@e1@~@e2@. First
we call @instantiate@ for @e1@ and then for @e2@, holding on to the
addresses of the graph for each sub-expression. Finally we combine
the two addresses by building an application node in the graph.
We would like to compile a {\em linear sequence of
instructions\/}\index{linear sequence of instructions} to perform the
operation of instantiating an expression.
\subsection{Postfix evaluation of arithmetic\index{postfix
evaluation!of arithmetic expressions}}
The desire to construct a linear sequence of instructions to
instantiate an expression is reminiscent of the postfix evaluation of
arithmetic expressions. We explore this analogy further before
returning to the G-machine.
The language of arithmetic expressions consists of: numbers, addition
and multiplication. We can represent this language as the type
@aExpr@.
M0> aExpr ::= Num num | Plus aExpr aExpr | Mult aExpr aExpr
GH0> data AExpr = Num Int
GH0> | Plus AExpr AExpr
GH0> | Mult AExpr AExpr
It is intended that the language should have an `obvious' meaning;
we can give this using the function @aInterpret@.
M0> aInterpret :: aExpr -> num
GH0> aInterpret :: AExpr -> Int
0> aInterpret (Num n) = n
0> aInterpret (Plus e1 e2) = aInterpret e1 + aInterpret e2
0> aInterpret (Mult e1 e2) = aInterpret e1 * aInterpret e2
Alternatively, we can compile the expression into a postfix sequence
of operators (or instructions). To evaluate the expression we use the
compiled operators and a stack of values. For example, the arithmetic
expression $2+3\times 4$ would be represented as the sequence
\[
[@INum@~2,\,@INum@~3,\,@INum@~4,\,@IMult@,\,@IPlus@]
\]
We can give the instructions for our postfix machine as the type
@aInstruction@.
M0> aInstruction ::= INum num | IPlus | IMult
GH0> data AInstruction = INum Int
GH0> | IPlus
GH0> | IMult
\par
The state of the evaluator is a pair: a sequence of operators
and a stack of numbers. The meaning of a code sequence is then given
in the following transition rules.
\aerule{\aestate{[]}{[n]}}{\aestate{{ }}{n}}
\aerule{\aestate{@INum@~n:i}{ns}}{\aestate{i}{n:ns}}
\aerule{\aestate{@IPlus@:i}{n_0:n_1:ns}}{\aestate{i}{(n_1+n_0):ns}}
\aerule{\aestate{@IMult@~:i}{n_0:n_1:ns}}{\aestate{i}{(n_1\times n_0):ns}}
Translating these transition rules into Miranda gives:
M0> aEval :: ([aInstruction], [num]) -> num
GH0> aEval :: ([AInstruction], [Int]) -> Int
0> aEval ([], [n]) = n
0> aEval (INum n:is, s) = aEval (is, n: s)
0> aEval (IPlus: is, n0:n1:s) = aEval (is, n1+n0:s)
0> aEval (IMult: is, n0:n1:s) = aEval (is, n1*n0:s)
\par
To generate the sequence of postfix code for an expression we must
define a compiler. This takes an expression and delivers a sequence of
instructions, which when executed will compute the value of the
expression.
M0> aCompile :: aExpr -> [aInstruction]
GH0> aCompile :: AExpr -> [AInstruction]
0> aCompile (Num n) = [INum n]
0> aCompile (Plus e1 e2) = aCompile e1 ++ aCompile e2 ++ [IPlus]
0> aCompile (Mult e1 e2) = aCompile e1 ++ aCompile e2 ++ [IMult]
The key idea here is captured by the type of the @aCompile@ function:
it returns a list of instructions.
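For instance, hand-evaluating the definitions just given on the
expression $2+3\times 4$ used above reproduces the postfix sequence
and its value:
\begin{verbatim}
aCompile (Plus (Num 2) (Mult (Num 3) (Num 4)))
  ==  [INum 2, INum 3, INum 4, IMult, IPlus]

aEval (aCompile (Plus (Num 2) (Mult (Num 3) (Num 4))), [])
  ==  14
\end{verbatim}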
\begin{important}
The postfix representation of expressions is a way of {\em
flattening}\index{flattening, of a tree} or {\em
linearising}\index{linearising, of a tree} an expression tree, so that
the expression can be represented by a flat sequence of operators.
\end{important}
\begin{exercise}\label{gm:X:structural-induction}
Using structural induction, or otherwise, prove that the postfix evaluation
of arithmetic expressions results in the same answer as the tree evaluation
of expressions. That is: prove that for all expressions @e@ of type @aExpr@,
\[
@aInterpret@~@e@ = @aEval@~(@aCompile@~@e@,~[])
\]
This is an example of a {\em congruence proof}\index{congruence proof, of compiler correctness}.
\end{exercise}
\begin{exercise}\label{gm:X:structural-induction-hard}
Extend the functions @aInterpret@, @aCompile@ and @aEval@ to handle
@let@ expressions. Prove that for all expressions in @e@ of type @aExpr@, these
new functions satisfy the relation:
\[
@aInterpret@~@e@ = @aEval@~(@aCompile@~@e@,~[])
\]
Can you extend the language to even more complicated expressions,
e.g. @letrec@ expressions? Can you prove that you have correctly
implemented these extensions?
\end{exercise}
\subsection{Using postfix code to construct graphs\index{postfix
code!to construct graphs}}
We can use the same technique to create an instance of a
supercombinator body. In this case the `values' on the stack will be
addresses of parts of the expression being instantiated.
The operations of the template construction instructions will be
different from those we saw in the arithmetic example above, in that
the instructions generally have the side-effect of allocating nodes in
the heap. As an example, consider introducing an @Mkap@ instruction.
This instruction makes an application node, in the heap, from the top
two addresses on the stack. It leaves a pointer to this new node on
the stack upon completion.
There is no reason to invent a new evaluation stack of addresses, as
our template instantiation machine already has such a stack. However,
there is an important point to remember if we do make use of this
stack:
\begin{important}
The map of the {\em stack locations\/}\index{stack locations!in G-machine}
corresponding to variable names will change as we pop and push objects
from the stack. We must therefore keep track of this when we are
compiling expressions.
\end{important}
Our access to items in the stack is relative to the top of the stack.
So, whenever an item is pushed, the offset needed to reach each item
already on the stack increases by one; similarly, when an item is
popped, each offset decreases by one.
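The code for @f g x = K (g x)@ in the introduction already shows this
effect: both arguments are fetched with a @Push 1@ instruction, even
though @g@ and @x@ start at different depths, because the first @Push@
adds a new item and so increases the offset needed to reach @g@. A
sketch of the first three instructions:
\begin{verbatim}
Push 1    -- x: two stack items down
Push 1    -- g: also now two items down, because the
          --   previous Push added a new item on top
Mkap      -- combine g and x into an application node
\end{verbatim}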
\subsection{What happens after an instantiation has been made?}
Once the instantiation of the supercombinator body has been made we
must tidy up the stack, and arrange the continuation of the evaluation
process. On completing the evaluation of the postfix sequence for a
supercombinator with $n$ arguments, the stack will have the following
form:
\begin{itemize}
\item On top there will be the address in heap of the newly
instantiated body, @e@.
\item Next there are the $n+1$ pointers. From these we can access the
arguments used in the instantiation process.
\item The last of the $n+1$ pointers points to the root of the
expression we have just instantiated.
\end{itemize}
This is shown in Figure~\ref{gm:fg:stack}.
\begin{figure} %\centering
\input{gm_stack}
\caption{The stack layout for the Mark~1 machine\index{G-machine stack
layout!Mark 1}}\label{gm:fg:stack} \end{figure}
We must replace the redex with the newly instantiated body, and pop
off $n$ items from the stack, using the @Slide@ instruction. To find
the next supercombinator we must now start unwinding again, using the
@Unwind@ instruction. By adding operations to do the tidying and
unwinding to the postfix operator sequence, we have transformed the
template instantiator into our Mark~1 G-machine.
The code for the function @f x1 ... xn = e@ is:
\begin{verbatim}
<code to construct an instance of e>
Slide n+1
Unwind
\end{verbatim}
\section{Mark 1: A minimal G-machine\index{G-machine!Mark 1}}
\label{minimal-gm}
We now present the code for a complete G-machine and its compiler. It
does not perform updates (which are introduced in
Section~\ref{gm:sc:mark2}) or arithmetic (which is introduced in
Section~\ref{gm:sc:primitives}).
\subsection{Overall structure\index{G-machine!toplevel}}
At the top level the G-machine is very similar to the template instantiator;
as usual the whole system is knitted together with a @run@ function.
M> run :: [char] -> [char]
M> run = showResults . eval . compile . parse
GH> -- The function run is already defined in Gofer's standard.prelude
GH> runProg :: [Char] -> [Char]
GH> runProg = showResults . eval . compile . parse
The parser data structures and functions are included because we will
need access to them.
M> %include "language" || parser data types
GH> -- :a language.lhs -- parser data types
\subsection{Data type definitions}
Fundamental to graph reduction implementation techniques is the graph.
We use the @heap@ data type, amongst others, from the utilities
provided in Appendix~\ref{sect:utils}.
M> %include "utils" || heap data type and other library functions
GH> -- :a util.lhs -- heap data type and other library functions
The Mark~1 G-machine uses the five-tuple, @gmState@, as its state. A
@gmState@ holds all the information that we need during the execution
of the compiled program.
M1-3> gmState == (gmCode, || Current instruction stream
M1-3> gmStack, || Current stack
M1-3> gmHeap, || Heap of nodes
M1-3> gmGlobals, || Global addresses in heap
M1-3> gmStats) || Statistics
GH1-3> type GmState
GH1-3> = (GmCode, -- Current instruction stream
GH1-3> GmStack, -- Current stack
GH1-3> GmHeap, -- Heap of nodes
GH1-3> GmGlobals, -- Global addresses in heap
GH1-3> GmStats) -- Statistics
In describing the G-machine, we will make use of state {\em access
functions\/}\index{access functions} to access the components of a
state. The advantage of this approach is that when we modify the state
to accommodate new components, we may reuse most of the original code
we have written. We will use the prefix @get@ to denote an access
function that gets a component from a state, and the prefix @put@ to
replace a component in a state.
We consider the type definitions of each of the five components of the
state, and their access functions, in turn.
\begin{itemize}
\item The instruction stream is of type @gmCode@ and is simply a list
of @instruction@s.
M> gmCode == [instruction]
GH> type GmCode = [Instruction]
To get convenient access to the code, when the state is later
augmented with extra components, we define two functions: @getCode@ and
@putCode@.
M> getCode :: gmState -> gmCode
GH> getCode :: GmState -> GmCode
1-3> getCode (i, stack, heap, globals, stats) = i
M> putCode :: gmCode -> gmState -> gmState
GH> putCode :: GmCode -> GmState -> GmState
1-3> putCode i' (i, stack, heap, globals, stats)
1-3> = (i', stack, heap, globals, stats)
\par
There are only six instructions initially. We will describe these in
more detail in subsection~\ref{gm:ss:eval1}.
M1> instruction ::= Unwind
M1> | Pushglobal name
M1> | Pushint num
M1> | Push num
M1> | Mkap
M1> | Slide num
GH1> data Instruction
GH1> = Unwind
GH1> | Pushglobal Name
GH1> | Pushint Int
GH1> | Push Int
GH1> | Mkap
GH1> | Slide Int
GH1> instance Eq Instruction
GH1> where
GH1> Unwind == Unwind = True
GH1> Pushglobal a == Pushglobal b = a == b
GH1> Pushint a == Pushint b = a == b
GH1> Push a == Push b = a == b
GH1> Mkap == Mkap = True
GH1> Slide a == Slide b = a == b
GH1> _ == _ = False
\item The G-machine stack @gmStack@ is a list of addresses in the heap.
M> gmStack == [addr]
GH> type GmStack = [Addr]
To get convenient access to the stack, when the state is later
augmented with extra components, we define two functions: @getStack@ and
@putStack@.
M> getStack :: gmState -> gmStack
GH> getStack :: GmState -> GmStack
1-3> getStack (i, stack, heap, globals, stats) = stack
M> putStack :: gmStack -> gmState -> gmState
GH> putStack :: GmStack -> GmState -> GmState
1-3> putStack stack' (i, stack, heap, globals, stats)
1-3> = (i, stack', heap, globals, stats)
\item Just as we did in the case of the template instantiator, we use
the heap data structure from @utils@ to implement heaps.
M> gmHeap == heap node
GH> type GmHeap = Heap Node
Again, to access this component of the state we define access
functions.
M> getHeap :: gmState -> gmHeap
GH> getHeap :: GmState -> GmHeap
1-3> getHeap (i, stack, heap, globals, stats) = heap
M> putHeap :: gmHeap -> gmState -> gmState
GH> putHeap :: GmHeap -> GmState -> GmState
1-3> putHeap heap' (i, stack, heap, globals, stats)
1-3> = (i, stack, heap', globals, stats)
In the minimal G-machine there are only three types of nodes: numbers,
@NNum@; applications, @NAp@; and globals, @NGlobal@.
M1> node ::= NNum num || Numbers
M1> | NAp addr addr || Applications
M1> | NGlobal num gmCode || Globals
GH1> data Node
GH1> = NNum Int -- Numbers
GH1> | NAp Addr Addr -- Applications
GH1> | NGlobal Int GmCode -- Globals
Number nodes contain the relevant number; application nodes apply the
function at the first address to the expression at the second address.
The @NGlobal@ node contains the number of arguments that the global
expects and the code sequence to be executed when the global has
enough arguments. This replaces the @NSupercomb@ nodes of the template
instantiator, which held a template instead of the arity and code.
\item Because we will later be making a lazy implementation, it is
important that there is only one node for each global. The address of
a global can be determined by looking up its name in the association
list @gmGlobals@. This corresponds to the @tiGlobals@ component
of the template machine.
M> gmGlobals == assoc name addr
GH> type GmGlobals = ASSOC Name Addr
The access function we use is @getGlobals@; in the Mark~1 machine, this
component is constant so we do not need a corresponding put function.
M> getGlobals :: gmState -> gmGlobals
GH> getGlobals :: GmState -> GmGlobals
1-3> getGlobals (i, stack, heap, globals, stats) = globals
\item The statistics component of the state is implemented as an
abstract data type.
M> abstype gmStats
M> with statInitial :: gmStats
M> statIncSteps :: gmStats -> gmStats
M> statGetSteps :: gmStats -> num
GH> statInitial :: GmStats
GH> statIncSteps :: GmStats -> GmStats
GH> statGetSteps :: GmStats -> Int
The implementation of @gmStats@ is now given.
M> gmStats == num
GH> type GmStats = Int
> statInitial = 0
> statIncSteps s = s+1
> statGetSteps s = s
To access this component we define @getStats@ and @putStats@:
M> getStats :: gmState -> gmStats
GH> getStats :: GmState -> GmStats
1-3> getStats (i, stack, heap, globals, stats) = stats
M> putStats :: gmStats -> gmState -> gmState
GH> putStats :: GmStats -> GmState -> GmState
1-3> putStats stats' (i, stack, heap, globals, stats)
1-3> = (i, stack, heap, globals, stats')
\end{itemize}
\subsection{The evaluator\index{G-machine!evaluator}}
\label{gm:ss:eval1}
The G-machine evaluator, @eval@, is defined to produce a list of
states. The first one is the one constructed by the compiler. If
there is a last state, then the result of the evaluation will be on
the top of the stack component of the last state.
M> eval :: gmState -> [gmState]
GH> eval :: GmState -> [GmState]
> eval state = state: restStates
> where
M> restStates = [], gmFinal state
M> = eval nextState, otherwise
GH> restStates | gmFinal state = []
GH> | otherwise = eval nextState
> nextState = doAdmin (step state)
The function @doAdmin@ uses @statIncSteps@ to modify the statistics
component of the state.
M> doAdmin :: gmState -> gmState
GH> doAdmin :: GmState -> GmState
> doAdmin s = putStats (statIncSteps (getStats s)) s
The important parts of the evaluator are the functions @gmFinal@ and
@step@ which we will now look at.
\subsubsection{Testing for a final state\index{termination
condition!in G-machine}}
The G-machine interpreter has finished when the code sequence that it
is executing is empty. We express this condition in the @gmFinal@
function.
M> gmFinal :: gmState -> bool
M> gmFinal s = getCode s = []
GH> gmFinal :: GmState -> Bool
GH> gmFinal s = case (getCode s) of
GH> [] -> True
GH> otherwise -> False
\subsubsection{Taking a step}
The @step@ function is defined so that it makes a state transition
based on the instruction it is executing.
M> step :: gmState -> gmState
GH> step :: GmState -> GmState
1-6> step state = dispatch i (putCode is state)
1-6> where (i:is) = getCode state
\par
We @dispatch@ on the current instruction @i@ and replace the current
code sequence with the code sequence @is@; this corresponds to
advancing the program counter in a real machine.
M1> dispatch :: instruction -> gmState -> gmState
GH1> dispatch :: Instruction -> GmState -> GmState
1> dispatch (Pushglobal f) = pushglobal f
1> dispatch (Pushint n) = pushint n
1> dispatch Mkap = mkap
1> dispatch (Push n) = push n
1> dispatch (Slide n) = slide n
1> dispatch Unwind = unwind
As we can see, the @dispatch@ function simply selects a state
transition to execute.
Let us begin by looking at the transition rules for the postfix
instructions. There will be one for each syntactic object in
@instruction@. We begin with the @Pushglobal@ instruction, which uses the
@globals@ component of the state to find the unique @NGlobal@ node in the
@heap@ that holds the global $f$. If it cannot find one, it prints a
suitable error message.
\gmrule%
{\gmstate{@Pushglobal@\ f:i}{s}{h}{m[f:a]}}%
{\gmstate{i}{a:s}{h}{m}}
We implement this rule using the @pushglobal@ function.
M> pushglobal :: name -> gmState -> gmState
GH> pushglobal :: Name -> GmState -> GmState
> pushglobal f state
> = putStack (a: getStack state) state
> where a = aLookup (getGlobals state) f (error ("Undeclared global " ++ f))
\par
The remaining transitions are for constructing the body of a
supercombinator. The transition for @Pushint@ places an integer node
into the heap.
\gmrule%
{\gmstate{@Pushint@\ n:i}{s}{h}{m}}%
{\gmstate{i}{a:s}{h[a:@NNum@\ n]}{m}}
The corresponding function is @pushint@.
The number is placed in the new heap @heap'@ with address @a@. We then
place the heap and stack back into the state.
M> pushint :: num -> gmState -> gmState
GH> pushint :: Int -> GmState -> GmState
> pushint n state
> = putHeap heap' (putStack (a: getStack state) state)
> where (heap', a) = hAlloc (getHeap state) (NNum n)
\par
The @Mkap@ instruction uses the two addresses on the top of the stack
to construct an application node in the heap. It has the following
transition rule.
\gmrule%
{\gmstate{@Mkap@:i}{a_1:a_2:s}{h}{m}}%
{\gmstate{i}{a:s}{h[a: @NAp@\ a_1\ a_2]}{m}}
This transition becomes @mkap@. Again @heap'@ and @a@ are respectively
the new heap and the address of the new node.
M> mkap :: gmState -> gmState
GH> mkap :: GmState -> GmState
> mkap state
> = putHeap heap' (putStack (a:as') state)
M> where (heap', a) = hAlloc (getHeap state) (NAp a1 a2)
M> (a1:a2:as') = getStack state
GH> where (heap', a) = hAlloc (getHeap state) (NAp a1 a2)
GH> (a1:a2:as') = getStack state
\par
The @Push@ instruction is used to take a copy of an argument which was
passed to a function. To do this it has to `look through' the
application node which is pointed to from the stack. We must also
remember to skip over the supercombinator node which is on the stack.
\gmrule%
{\gmstate{@Push@\ n:i}{a_0:\ldots:a_{n+1}:s}{h[a_{n+1}:@NAp@\ a_n \ a'_n]}{m}}%
{\gmstate{i}{a'_n:a_0:\ldots:a_{n+1}:s}{h}{m}}
M1-2> push :: num -> gmState -> gmState
GH1-2> push :: Int -> GmState -> GmState
1-2> push n state
1-2> = putStack (a:as) state
1-2> where as = getStack state
1-2> a = getArg (hLookup (getHeap state) (as !! (n+1)))
This uses the auxiliary function @getArg@ to select the required
expression from an application node.
M> getArg :: node -> addr
GH> getArg :: Node -> Addr
> getArg (NAp a1 a2) = a2
\begin{important}
Because of the stack structure we have changed the addressing mode of
the @Push@ instruction from that used in \cite{PJBook}.
\end{important}
Next, the tidying up of the stack, which occurs after a supercombinator
has been instantiated and before continuing unwinding, is performed by
the @Slide@ instruction.
\gmrule%
{\gmstate{@Slide@\ n:i}{a_0:\ldots:a_n:s}{h}{m}}%
{\gmstate{i}{a_0:s}{h}{m}}
M> slide :: num -> gmState -> gmState
GH> slide :: Int -> GmState -> GmState
> slide n state
> = putStack (a: drop n as) state
> where (a:as) = getStack state
\par
@Unwind@ is the most complex instruction because it replaces the outer
loop of our template instantiator. The @Unwind@ instruction is always
the last instruction of a sequence, as we shall see in the next
section. The @newState@ constructed depends on the item on top of the
stack: there is one transition rule, and hence one case of @newState@,
for each kind of node that can appear there.
M1> unwind :: gmState -> gmState
GH1> unwind :: GmState -> GmState
1> unwind state
1> = newState (hLookup heap a)
1> where
1> (a:as) = getStack state
1> heap = getHeap state
\par
We first consider the case where there is a number on top of the
stack. In this case, we are finished; the G-machine has terminated,
and we place $[]$ in the code component to signify this fact.
\gmrule%
{\gmstate{[@Unwind@]}{a:s}{h[a: @NNum@\ n]}{m}}%
{\gmstate{[]}{a:s}{h}{m}}
1> newState (NNum n) = state
\par
If there is an application node on top of the stack then we must
continue to unwind from the next node.
\gmrule%
{\gmstate{[@Unwind@]}{a:s}{h[a: @NAp@\ a_1\ a_2]}{m}}%
{\gmstate{[@Unwind@]}{a_1:a:s}{h}{m}}
1> newState (NAp a1 a2) = putCode [Unwind] (putStack (a1:a:as) state)
\par
The most complicated rule occurs when there is a global node on top of
the stack. There are two cases to consider, depending on whether there are
enough arguments to reduce the supercombinator application.
Firstly, if there are not enough arguments to reduce the
supercombinator application then the program was ill-typed. We will
ignore this case for the Mark~1 G-machine. Alternatively, when there
are enough arguments, it is possible to reduce the supercombinator, by
`jumping to' the code for the supercombinator. In the transition
rule this is expressed by moving the supercombinator code into the
code component of the machine.
\gmrule%
{\gmstate{[@Unwind@]}{a_0:\ldots:a_n:s}%
{h[a_0 : @NGlobal@\ n\ c]}{m}}%
{\gmstate{c}{a_0:\ldots:a_n:s}{h}{m}}
M1> newState (NGlobal n c) = error "Unwinding with too few arguments",
M1> #as < n
M1> = putCode c state, otherwise
GH1> newState (NGlobal n c)
GH1> | length as < n = error "Unwinding with too few arguments"
GH1> | otherwise = putCode c state
We have now seen how the instructions are defined, but we have not
seen how to generate the postfix sequences of operators, or
instruction sequences as we shall refer to them from now on. This is
the subject of the next subsection.
\subsection{Compiling a program\index{G-machine!compiler}}
\label{gm:ss:compile}
We describe the compiler using a set of {\em compilation
schemes}\index{compilation schemes!in G-machine}. Each
supercombinator definition is compiled using the compilation scheme
\tSC{}.
The compiled code generated for each supercombinator is defined in
Figure~\ref{gm:fg:schemes1}. Corresponding to the compilation schemes
\tSC{}, \tR{} and \tC{} are {\em compiler functions\/}\index{compiler
functions!in G-machine} @compileSc@, @compileR@ and @compileC@. We
consider each of these in turn.
\begin{figure*} %\centering
$\begin{array}{|rcll|}
\hline
&&&\\
\multicolumn{4}{|l|}{\parbox[t]{29pc}{%
$\SC{d}$ is the G-machine code for the supercombinator definition $d$.}}\\
&&&\\
\multicolumn{4}{|l|}{\SC{f\ x_1\ \ldots\ x_n\ =\ e}
\: = \: \R{e}\ [x_1\mapsto 0,\,\ldots,\,x_n\mapsto n-1]\ n}\\
&&&\\
\hline
&&&\\
\multicolumn{4}{|l|}{\parbox[t]{29pc}{%
$\R{e}~\rho~d$ generates code which instantiates the expression $e$ in
environment $\rho$, for a supercombinator of arity $d$, and then proceeds
to unwind the resulting stack.}}\\
&&&\\
\R{e}\ \rho\ d & = & \C{e}\ \rho \append [@Slide@\ d+1,\, @Unwind@] & \\
&&&\\
\hline
&&&\\
\multicolumn{4}{|l|}{\parbox[t]{29pc}{%
$\C{e}~\rho$ generates code which constructs the graph of $e$ in environment
$\rho$, leaving a pointer to it on top of the stack.}}\\
&&&\\
\C{f}\ \rho & = & [@Pushglobal@\ f] &
\tr{where $f$ is a supercombinator} \\
%
\C{x}\ \rho & = & [@Push@\ (\rho\ x)] &
\tr{where $x$ is a local variable} \\
%
\C{i}\ \rho & = & [@Pushint@\ i] &\\
\C{e_0~ e_1}\ \rho & = & \C{e_1}\ \rho \append
\C{e_0}\ \rho^{+1} \append [@Mkap@] &
\tr{where $\rho^{+n}\ x = (\rho\ x) + n$} \\
&&&\\
\hline
\end{array}$
\caption{The \tSC{}, \tR{} and \tC{} compilation schemes}
\label{gm:fg:schemes1}
\end{figure*}
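Before turning to the compiler functions, it may help to unfold the
schemes by hand on the introductory example @f g x = K (g x)@. Writing
$\rho$ for the environment $[g\mapsto 0,\,x\mapsto 1]$, the expansion
(purely an illustration of Figure~\ref{gm:fg:schemes1}) runs:
\[
\begin{array}{rcl}
\SC{f\ g\ x\ =\ @K@\ (g\ x)} & = & \R{@K@\ (g\ x)}\ \rho\ 2 \\
 & = & \C{@K@\ (g\ x)}\ \rho \append [@Slide@\ 3,\, @Unwind@] \\
 & = & \C{g\ x}\ \rho \append [@Pushglobal@\ @K@,\, @Mkap@]
       \append [@Slide@\ 3,\, @Unwind@] \\
 & = & [@Push@\ 1,\, @Push@\ 1,\, @Mkap@,\, @Pushglobal@\ @K@,\, @Mkap@,\,
       @Slide@\ 3,\, @Unwind@]
\end{array}
\]
This is exactly the code sequence shown for @f@ at the start of the
chapter.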
The @compile@ function turns a program into an initial state for the
G-machine. The initial code sequence finds the global @main@ and then
evaluates it. The heap is initialised so that it contains a node for
each global declared. @globals@ contains the map from global names to the
addresses of the @NGlobal@ nodes provided for them.
M1-3> compile :: coreProgram -> gmState
GH1-3> compile :: CoreProgram -> GmState
1-3> compile program
1-3> = (initialCode, [], heap, globals, statInitial)
1-3> where (heap, globals) = buildInitialHeap program
\par
To construct the initial heap and to provide the map of the global
nodes for each global defined we use @buildInitialHeap@. This is just
as it was in the template machine.
M1-6> buildInitialHeap :: coreProgram -> (gmHeap, gmGlobals)
GH1-6> buildInitialHeap :: CoreProgram -> (GmHeap, GmGlobals)
1-6> buildInitialHeap program
1-6> = mapAccuml allocateSc hInitial compiled
1-6> where compiled = map compileSc (preludeDefs ++ program) ++
1-6> compiledPrimitives
The @buildInitialHeap@ function uses @mapAccuml@ to allocate nodes for
each compiled global; the compilation occurring (where necessary) in
@compiled@, which has type @[gmCompiledSC]@.
M> gmCompiledSC == (name, num, gmCode)
GH> type GmCompiledSC = (Name, Int, GmCode)
The function @allocateSc@ allocates a new global for its compiled
supercombinator argument, returning the new heap and the address where
the global is stored.
M> allocateSc :: gmHeap -> gmCompiledSC -> (gmHeap, (name, addr))
GH> allocateSc :: GmHeap -> GmCompiledSC -> (GmHeap, (Name, Addr))
> allocateSc heap (name, nargs, instns)
> = (heap', (name, addr))
> where (heap', addr) = hAlloc heap (NGlobal nargs instns)
\par
In the initial state, we want the machine to evaluate the value of the
program. We recall that this is just the value of the global @main@.
M1-3> initialCode :: gmCode
GH1-3> initialCode :: GmCode
1-3> initialCode = [Pushglobal "main", Unwind]
\par
Each supercombinator is compiled using @compileSc@, which implements
the \tSC{} scheme of Figure~\ref{gm:fg:schemes1}. It returns a triple
containing the supercombinator name, the number of arguments the
supercombinator needs before it can be reduced, and the code sequence
associated with the supercombinator.
M> compileSc :: (name, [name], coreExpr) -> gmCompiledSC
GH> compileSc :: (Name, [Name], CoreExpr) -> GmCompiledSC
> compileSc (name, env, body)
> = (name, length env, compileR body (zip2 env [0..]))
This in turn uses @compileR@, which corresponds to the \tR{} scheme of
Figure~\ref{gm:fg:schemes1}.
M1-4> compileR :: gmCompiler
GH1-4> compileR :: GmCompiler
M1> compileR e env = compileC e env ++ [Slide (#env + 1), Unwind]
M6> compileR e env = compileC e env ++ [Slide (#env + 1), Unwind]
GH1> compileR e env = compileC e env ++ [Slide (length env + 1), Unwind]
GH6> compileR e env = compileE e env ++ [Slide (length env + 1), Unwind]
Each of the compiler schemes has the same type: @gmCompiler@.
M> gmCompiler == coreExpr -> gmEnvironment -> gmCode
GH> type GmCompiler = CoreExpr -> GmEnvironment -> GmCode
\par
We use the fact that we can represent the map $\rho$ from the
compilation scheme as an association list. Not only can we look up the
offsets for a variable from this list, but we may also calculate how
many arguments there are on the stack. This is used in @compileR@ to
find out how many stack elements to squeeze out with a @Slide@
instruction. The list has type @gmEnvironment@, which is defined as:
M> gmEnvironment == assoc name num
GH> type GmEnvironment = ASSOC Name Int
@compileR@ constructs the instantiation of the supercombinator body using
@compileC@, which corresponds to the \tC{} scheme
of Figure~\ref{gm:fg:schemes1}.
M1-2> compileC :: gmCompiler
M1-2> compileC (EVar v) env = [Push n], member (aDomain env) v
M1-2> = [Pushglobal v], otherwise
M1-2> where n = aLookup env v (error "Can't happen")
GH1-2> compileC :: GmCompiler
GH1-2> compileC (EVar v) env
GH1-2> | elem v (aDomain env) = [Push n]
GH1-2> | otherwise = [Pushglobal v]
GH1-2> where n = aLookup env v (error "Can't happen")
1-2> compileC (ENum n) env = [Pushint n]
1-2> compileC (EAp e1 e2) env = compileC e2 env ++
1-2> compileC e1 (argOffset 1 env) ++
1-2> [Mkap]
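As a small check of the compiler, hand-evaluating @compileSc@ on the
prelude definition @K x y = x@ (a sketch using the definitions above)
gives:
\begin{verbatim}
compileSc ("K", ["x","y"], EVar "x")
  ==  ("K", 2, [Push 0, Slide 3, Unwind])
\end{verbatim}
That is: push a pointer to the first argument, slide it down over the
three pointers underneath it (the supercombinator node and the two
application nodes of the redex), and continue unwinding.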