\documentclass[12pt]{report}
%\documentclass{article}
\usepackage{uwmthesis}
\usepackage{amsmath,amsthm}
\usepackage{amssymb}
\usepackage{xcolor}
\usepackage{listings}
\usepackage{algorithm}
\usepackage[noend]{algpseudocode}
\usepackage{flexisym}
\usepackage{graphicx}
\usepackage{courier}
\usepackage{tikz}
\usepackage{pgfplots}
\algorithmspagefalse
% "define" Scala
\lstdefinelanguage{scala}{
morekeywords={abstract,case,catch,class,def,%
do,else,extends,false,final,finally,%
for,if,implicit,import,match,mixin,%
new,null,object,override,package,%
private,protected,requires,return,sealed,%
super,this,throw,trait,true,try,%
type,val,var,while,with,yield},
otherkeywords={=>,<-,<\%,<:,>:,\#,@},
sensitive=true,
morecomment=[l]{//},
morecomment=[n]{/*}{*/},
morestring=[b]",
morestring=[b]',
morestring=[b]"""
}
\lstset{basicstyle=\footnotesize\ttfamily}
\makeatletter
\def\tagform@#1{\maketag@@@{[\ignorespaces#1\unskip\@@italiccorr]}}
\makeatother
\begin{document}
\title{\textit{h}-CFA\@: A Simplified Approach for\\ Pushdown Control Flow Analysis}
\author{Fei Peng}
\Program{Computer Science}
\majorprof{Tian Zhao}
\degree{Master of Science}
\submitdate{August 2016}
\Abstract{
In control flow analysis (CFA), call/return mismatch is a problem that reduces analysis precision.
So-called \textit{k}-CFA uses bounded call-strings to obtain limited call/return matching, but it has a serious performance problem
due to its coupling of call/return matching with context-sensitivity of values.
CFA2 and PDCFA are the first two algorithms that bring the pushdown (context-free reachability) approach to the CFA area, providing perfect call/return matching.
However, CFA2 and PDCFA both need significant engineering effort to implement.
The abstracting abstract machine (AAM), a configurable framework for constructing abstract interpreters,
introduces store-allocated continuations that make the soundness of abstract interpreters easily obtainable. \pagebreak
Recently, two related approaches (AAC and P4F) provide call/return matching using AAM by modeling the call-stack as a pushdown system.
However, AAC incurs high overhead and is hard to understand, while P4F cannot compute monovariant analysis.
To overcome the above shortcomings, we developed a new method, \textit{h}-CFA, to address the call/return mismatch problem.
\textit{h}-CFA records the program execution history during abstract interpretation and uses it to avoid control flow merging that causes call/return mismatch.
Our method uses AAM and is very easy to implement for ANF-style programs.
ANF is a popular intermediate representation that converts all complex intra-procedural control flows to linear let-bindings and assigns a syntactic variable to each sub-expression. In addition, our method reveals an essential property of any pushdown CFA,
which we exploited in the development of a static analyzer for JavaScript, named JsCFA\@.
This application of the essential property avoids recording the program execution history, so source programs are no longer required to be in ANF form.
Meanwhile, JsCFA adopts a technique to solve the environment problem or fake rebinding, which eliminates more defects of monovariant analysis.
This, in cooperation with exact call/return matching, yields a more precise analysis and better performance.
Moreover, JsCFA supports a configurable interface to add context-sensitivity to selected areas of programs.
JsCFA applies the interface to improve the analysis precision for runtime object extensions.
Finally, we quantitatively evaluated the performance of JsCFA\@.
}
\beforepreface
%\prefacesection{Preface}
%This thesis tells you all you need to know about...
\afterpreface
\chapter{Introduction}
\label{Introduction}
%\section{Control Flow Analysis}
%\label{sub:CFA}
Dynamic programming languages, such as JavaScript, Python, and Ruby, play a significant role in computing areas, such as system management, web development, and scientific computing.
Therefore, developers who use these languages increasingly demand tools for improving code quality, such as security auditing, error-checking, debugging, refactoring, and more.
However, certain features of dynamic languages (e.g.\ duck-typing, first-class functions, and highly dynamic object models) make achieving these requirements difficult.
For example, static programming languages are able to report certain semantic errors before executing programs, but in dynamic languages semantic errors can only be found at runtime, which is too risky for large-scale commercial software.
To this end, control flow analysis~\cite{midtgaard2012control}
(CFA) has been used to detect deep semantic information before the actual running of programs written in dynamic languages.
CFA is a class of algorithms that give a conservative approximation of the inter-procedural information of programs
before running them.
Statically detecting the precise target of a function call is difficult for programs written in
higher-order (functional) languages.
To illustrate this problem, consider the following example in Scheme.
\lstset{language=Lisp,
mathescape,
keywords={let,lambda}}
\begin{lstlisting}
(let* ((f (lambda (x) (x 1)))
(g (lambda (y) (+ y 2)))
(h (lambda (z) (+ z 3))))
(+ (f g) (f h)))
\end{lstlisting}
In the body of function \verb|f|, the call site \verb|(x 1)| will transfer control to function bodies that variable \verb|x| potentially refers to.
However, the next step in the control flow is not obvious because \verb|x| is the formal parameter of function \verb|f| and will be bound to
unknown values.
Shivers invented \textit{k}-CFA~\cite{shivers1991control}
as the first popular solution to the control flow problem.
\textit{k}-CFA applies an abstract interpretation~\cite{cousot1977abstract} approach to simulate program execution statically and
provides conservative approximations with a configurable hierarchy of precision.
Shivers chose finite call-strings~\cite{sharir1978two} to represent runtime contexts for the abstract interpretation.
Call-strings of length \textit{k} record the latest \textit{k} call sites, which makes the state space of \textit{k}-CFA finite; longer call-strings yield a more precise analysis at higher overhead.
0-CFA is a special case of \textit{k}-CFA that uses empty call-strings (\textit{k} is zero).
\begin{figure}
\lstset{language=Lisp,
mathescape,
keywords={let,lambda}}
\begin{lstlisting}
(let* ((id (lambda (x) x))
(a (id 1)$^1$)
(b (id #t)$^2$))
$\dots$)
\end{lstlisting}
\caption{An example program showing imprecision of 0-CFA}
\label{fig:eg1}
\end{figure}
%\section{Problems of CFA}
%\label{sub:}
\paragraph{Problems of CFA}
0-CFA, and \textit{k}-CFA without enough context information, are imprecise for realistic programs.
For example, call/return mismatch is a persistent problem in \textit{k}-CFA that dramatically reduces the precision of the analysis.
Consider the trivial example in Figure~\ref{fig:eg1}, where, in 0-CFA, the \verb|id| function is called twice and \verb|#t| eventually flows into variable \verb|a| because there is a spurious flow from call site
\verb|(id #t)| to \verb|(id 1)|.
In 1-CFA ($k = 1$), the values of the local variable \verb|x| are distinguished by different call site environments.
In this example, the two calls to the \verb|id| function are labeled with $1$ and $2$ respectively.
Different versions of the variable $x$ in different calls are separated by the call site labels.
For example, $(x, [2]) \mapsto \{\#t\}$ because the value of \verb|x| is \verb|#t| at call site 2.
Original \textit{k}-CFA also uses the variables' environment to filter inter-procedural control flows,
which means that the value of \verb|x| from call site 2 can only be returned to \verb|(id #t)|.
In this case (a non-recursive program), a call-string of length 1 is enough to provide precise call/return flow
(both data and control flow).
However, longer call sequences or recursive calls propagate spurious information to the whole program.
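To make the merging concrete, the following Python sketch (an illustration of monovariant binding with hypothetical names, not our implementation) models 0-CFA's single abstract slot per variable for the program in Figure~\ref{fig:eg1}:

```python
# Hypothetical sketch of monovariant (0-CFA-style) value flow for the
# id example: every call to id shares ONE abstract slot for the
# parameter x, so values from different call sites merge.
store = {}  # variable name -> set of abstract values

def bind(var, values):
    store.setdefault(var, set()).update(values)

bind("x", {1})            # call site 1: (id 1)
bind("a", store["x"])     # x flows back into a
bind("x", {"#t"})         # call site 2: (id #t) hits the same slot x
bind("b", store["x"])     # b sees BOTH values

print(store["b"])  # contains both 1 and '#t' -- spurious 1 reaches b
# A further worklist pass would likewise propagate '#t' into a.
```

A real analysis iterates these bindings to a fixed point, so the pollution eventually reaches \verb|a| as well.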
\begin{figure}
\lstset{language=Lisp,
mathescape,
keywords={let,lambda}}
\begin{lstlisting}
(let* ((id (lambda (x) x))
(f (lambda (y) (id y)$^1$))
(a (f 1)$^2$)
(b (f #t)$^3$))
$\dots$)
\end{lstlisting}
\caption{An example program showing imprecision of 1-CFA}
\label{fig:eg2}
\end{figure}
Consider the example in Figure~\ref{fig:eg2}: the \verb|id| function is called by \verb|f|, and there are two calls to function \verb|f|.
1-CFA is no longer precise for this program because $(x, [1])$ can be generated by both call sites 2 and 3.
Then the abstract value \verb|{1, #t}| eventually flows into both variables \verb|a| and \verb|b|.
In this example, 2-CFA can distinguish the two call sites, where $(x, [2, 1]) \mapsto \{1\}$ and $(x, [3, 1]) \mapsto \{\#t\}$ indicate the correct data and control flows.
Therefore, we can achieve precise call/return matching with a large enough \textit{k} on non-recursive programs.
However, recursive function invocations can make any call-string ``overflow'': the call-string fills up with duplicated recursive call sites and loses earlier context information.
In particular, recursion is ubiquitous in functional programs.
Meanwhile, the performance of \textit{k}-CFA is unacceptable even when $k = 1$~\cite{van2008deciding}.
In addition, \textit{k}-CFA is tightly bound to call-site sensitivity, so other context-sensitivity strategies~\cite{agesen1995cartesian, milanova2005parameterized,smaragdakis2011pick, lhotak2003scaling, wright1998polymorphic} are hard to apply.
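The overflow behavior can be sketched in a few lines of Python (a toy illustration, not \textit{k}-CFA itself; the call-site labels are made up):

```python
# Toy k-bounded call-strings: prepend the newest call site and truncate
# to length k. Recursive calls eventually evict all outer context.
def extend(call_string, site, k):
    return ((site,) + call_string)[:k]

ctx = ()
for site in ["f@2", "g@5", "g@5", "g@5"]:  # g calls itself repeatedly
    ctx = extend(ctx, site, 2)

print(ctx)  # ('g@5', 'g@5') -- the outer call site f@2 has been lost
```

Once the recursive site fills the string, all contexts inside the recursion collapse to one, regardless of how the recursion was reached.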
\paragraph{Existing techniques}
There is a family of algorithms that attempt to perfectly match return flows with their true call site entries in static analysis, which is referred to as pushdown or context-free approach~\cite{reps1995precise, sagiv1996precise}.
CFA2~\cite{vardoulakis2010cfa2} is the first attempt to bring precise call/return matching to monovariant analysis, with exponential time complexity.
Because monovariant analysis still merges too many data flows even if it has accurate inter-procedural control flows, CFA2 introduces \emph{stack filtering} to eliminate the imprecision of local variables.
Additionally, there are three other approaches (PDCFA~\cite{earl2010pushdown}, AAC~\cite{johnson2015abstracting}, and P4F~\cite{gilray2016pushdown}) that provide accurate call/return matching by modeling the call-stack as a pushdown system.
However, PDCFA and CFA2 need significant engineering effort to implement~\cite{gilray2016pushdown}.
AAC and P4F are easy to implement in the abstracting abstract machine (AAM)~\cite{van2010abstracting} framework, but AAC incurs high overhead (see Section~\ref{sec:Related} and Section~\ref{sub:swh}) and is difficult to understand, while P4F cannot compute monovariant analysis and has only limited call/return matching strength.
\paragraph{A simplified approach}
In this thesis, we introduce a new method to address the call/return mismatch problem.
In terms of implementation, this method is as simple as writing concrete interpreters in CESK machine style~\cite{felleisen1987calculus}.
It provides perfect call/return matching for monovariant and polyvariant control flow analysis.
Since this method records the \emph{program execution history} during abstract interpretation and uses it to encode continuation addresses, we name it \textit{h}-CFA\@.
The program execution history can be regarded as call-strings with automatically determined length.
For non-recursive calls, the execution history always provides enough context information, no matter how deep the call sequence is.
Moreover, the history automatically stops growing for recursive calls, while the worklist iteration (Section~\ref{subs:Abstracting Abstract Machine}) is responsible for finding the fixed point of recursive computation.
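The worklist iteration mentioned above can be sketched generically (a minimal driver under assumed state and transition types, not our actual implementation):

```python
# Generic worklist fixpoint: visit every abstract state reachable from
# the initial one; termination follows from the finite state space.
def explore(initial, step):
    seen, work = set(), [initial]
    while work:
        state = work.pop()
        if state in seen:
            continue
        seen.add(state)
        work.extend(step(state))  # step yields successor states
    return seen

# Toy transition over a finite abstraction: count down to zero.
print(sorted(explore(5, lambda n: {max(n - 1, 0)})))  # [0, 1, 2, 3, 4, 5]
```

The abstract interpreter plugs its transition rules in as `step`; recursion simply produces already-seen states, which the driver skips.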
\paragraph{Application}
To verify the practicability of our theory, we implemented a static analyzer for a subset of JavaScript (ECMAScript 3) in Scala, and we call it JsCFA\@.
JsCFA not only uses pushdown CFA but also adopts other techniques to improve analysis precision for real-world programs.
JsCFA usually computes monovariant control-flow facts, which incurs critical imprecision for realistic programs and libraries written in dynamic higher-order languages.
For example, even if the abstract interpreter can perfectly match call/return flows, a monovariant or polyvariant analysis without enough context information may still generate spurious data flows from false environments, which is referred to as the \emph{environment problem}~\cite{shivers1991control, might2007environment}.
To illustrate this problem, consider the analysis of Figure~\ref{fig:eg1} again, assuming the analyzer always matches return flows with the correct call sites, so that function \verb|id| called at call site 1 only returns to \verb|a| and call site 2 only returns to \verb|b|.
Variable \verb|a| will get abstract value \verb|{1}|, but \verb|{1, #t}| flows into variable \verb|b| because the local variable \verb|x| retains the value from call site 1 while call site 2 is abstractly interpreted.
This spurious data flow undermines the practicality of pushdown CFA and causes other control flow problems (see details in Section~\ref{sub:filtering}).
We solved this problem by introducing abstract garbage collection~\cite{might2006improving} on the value store into JsCFA\@.
JsCFA uses abstract GC to remove local bindings that would pollute subsequent data flows; note that abstract GC combined with \textit{k}-CFA is not safe~\cite{might2006improving}.
Meanwhile, we also applied abstract GC on the continuation store, which indirectly implements \textit{h}-CFA without recording program execution histories.
Finally, a benchmark test is provided to show the practicality of \textit{h}-CFA\@.
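Abstract GC on the value store can be sketched as plain reachability (a simplification under assumed store and value shapes; in a real analyzer the reachable addresses are harvested from the environments inside closures):

```python
# Sketch of abstract garbage collection: keep only store entries
# reachable from a root set. Here each stored value is simply a set of
# further addresses.
def reachable(roots, store):
    seen, work = set(), list(roots)
    while work:
        addr = work.pop()
        if addr in seen or addr not in store:
            continue
        seen.add(addr)
        work.extend(store[addr])  # follow addresses inside the value
    return seen

def gc(roots, store):
    live = reachable(roots, store)
    return {a: v for a, v in store.items() if a in live}

s = {"x": {"y"}, "y": set(), "stale": {"x"}}
print(gc({"x"}, s))  # {'x': {'y'}, 'y': set()} -- 'stale' is collected
```

Dropping the unreachable binding is exactly what prevents a stale local binding from polluting later data flows.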
\paragraph{Outline}
In the rest of the thesis,
Section~\ref{sec:Related} describes the state-of-the-art techniques for CFA, which aim to improve the precision and reduce the overhead of \textit{k}-CFA\@. It also discusses existing pushdown approaches.
Section~\ref{sec:PushdownAAM} presents the abstracting abstract machine (AAM) technique in detail, including abstract syntax, semantics of the abstract machine, and store widening.
This section provides necessary preliminary knowledge to help readers understand our techniques because \textit{h}-CFA is also developed in the AAM framework.
Moreover, it summarizes advantages and disadvantages of AAM and reveals an essential drawback that introduces spurious return flows.
Section~\ref{sec:hcfa} formalizes \textit{h}-CFA and explains how it works with a simple example.
Meanwhile, we compare our technique with other related works in several dimensions and give a performance evaluation via benchmark results.
Section~\ref{sec:JsCFA} details the design and implementation of JsCFA, which applies our techniques in a JavaScript static analyzer.
This implementation not only uses pushdown CFA, but also adopts techniques
such as abstract garbage collection.
Then we describe an approach for implementing \textit{h}-CFA without recording the program execution history, which simplifies the intermediate representation and semantics of JsCFA\@.
At the end of this section, a benchmark test of JsCFA is provided.
Finally, we list several potential approaches for improving \textit{h}-CFA and JsCFA in Section~\ref{sec:Future}, and Section~\ref{sec:Conclude} concludes.
\chapter{Related Work}
\label{sec:Related}
In order to address the precision problem of the original \textit{k}-CFA, many techniques have been introduced from different perspectives.
Some algorithms tend to find better contexts for context-sensitive (polyvariant) analysis.
For example, call-site sensitivity~\cite{shivers1991control}, argument sensitivity~\cite{agesen1995cartesian}, object sensitivity~\cite{milanova2005parameterized,smaragdakis2011pick}, and field sensitivity~\cite{lhotak2003scaling} contribute different benefits to precision or performance for different situations.
Other techniques attempt to improve both monovariant and polyvariant analyses in alternative ways.
One of the most popular methods in this group is pushdown-based CFA (a.k.a.\ context-free language reachability), which introduces a pushdown system into abstract interpretation.
The original \textit{k}-CFA algorithm abstracts each program as a finite-state machine
so that the abstract interpreter is guaranteed to terminate.
The abstraction of \textit{k}-CFA is only precise for programs with bounded call stacks.
However, many language constructs (e.g.\ function invocation, exception handling, and first-class continuations) can generate
\emph{recursive} control flows.
Since the abstraction of \textit{k}-CFA is not precise for recursive structures, pushdown-based CFA is a better choice.
The first contribution of this thesis is a new method for implementing pushdown-based CFA, referred to as \textit{h}-CFA,
that provides perfect call/return matching.
Before describing our technique, we discuss the existing algorithms for the pushdown CFA and
preliminary knowledge of \textit{h}-CFA\@.
\paragraph{Pushdown CFA Algorithms}
The core idea of pushdown CFA is to model function calls/returns with an unbounded call stack for ordinary calls and to summarize call stacks to a finite height for recursive calls, because an unbounded call stack is not computable in static analysis.
CFA2~\cite{vardoulakis2010cfa2} is the first algorithm that employs a pushdown system for CFA\@.
CFA2 models the call stack as an implicit pushdown system, and summarizes the call stack with a tabulation algorithm for recursive functions.
PDCFA (pushdown control flow analysis~\cite{earl2010pushdown})
is another strategy that approximates the unbounded stack model so that it becomes computable.
PDCFA analyzes programs using a \emph{Dyck state graph}~\cite{earl2010pushdown}, and tracks all of the reachable states in the graph.
Meanwhile, edges of the Dyck state graph that connect program states are annotated with stack actions (push, pop, and no action).
These stack actions explicitly represent a pushdown system and summarize recursive structures of the graph.
Both CFA2 and PDCFA introduce extra semantics for target languages, which makes the abstract interpreter hard to implement.
To address this drawback, Van Horn and Might
invented the Abstracting Abstract Machine (AAM)~\cite{reps1995precise, van2010abstracting}
as a configurable framework for constructing abstract interpreters in the CESK abstract machine~\cite{felleisen1987calculus} style.
AAM not only allocates values in the store (as the original \textit{k}-CFA does), but also represents control flow using store-allocated continuations.
In AAM, each CESK state does not directly carry any continuation, but a continuation address that refers to a set of concrete continuations.
Merging several continuations in one continuation address achieves the effect of approximating control flows.
Meanwhile, AAM brings two benefits to control flow analysis.
On the one hand, it makes the soundness of abstract interpreters easy to prove because values and continuations both live in the store and
the store size is finite.
Hence, the number of machine states that abstract interpreters generate is always finite.
On the other hand, store-allocated continuations separate the context-sensitivity (polyvariance) strategy from
the call/return matching technique.
Additionally, implementing a static analyzer in AAM style is as easy as writing concrete interpreters.
AAC (Abstracting Abstract Control~\cite{johnson2015abstracting}) and P4F (pushdown control flow analysis for free~\cite{gilray2016pushdown})
are both pushdown CFA techniques based on AAM\@, which convert the call/return matching problem to a continuation-address allocation problem.
In other words, AAC and P4F just modify the continuation-allocation function of AAM to acquire call/return matching.
However, AAC has a high asymptotic upper bound of $O(n^9)$ in the monovariant case (this complexity is claimed in~\cite{gilray2016pushdown}, citing an unpublished article) and converges slowly in practice (see Section~\ref{sub:swh}).
P4F has better performance in polyvariant analysis but it has limited call/return matching strength
and is not useful for monovariant analysis.
\chapter{Pushdown CFA in AAM}
\label{sec:PushdownAAM}
The AAM methodology considerably simplifies the implementation of abstract interpreters by introducing store-allocated values
and continuations. At the same time, the soundness of AAM is relatively easy to prove.
Therefore, we also use this theory as the foundation to develop \textit{h}-CFA and a JavaScript analyzer, JsCFA\@.
In this section we review abstract interpretation in the setting of AAM to help readers understand our techniques.
\section{Abstracting Abstract Machine}
\label{subs:Abstracting Abstract Machine}
In this section, we describe pushdown CFA algorithms using
lambda calculus in the style of Administrative Normal Form (ANF)~\cite{flanagan1993essence}.
\[
\tag{expressions}
\begin{aligned}
\label{eq:bnf}
e \in Exp ::= {}& (let\ ((x\ (f\ \ae)))\ e) {} \\
| & (let\ ((y\ \ae))\ e) {}\\
| & \ \ae
\end{aligned}
\]
\[
\tag{atomic expressions}
f, \ae \in AExp ::= x\ |\ lambda
\]
\[
\tag{lambda abstractions}
lambda \in Lambda ::= (\lambda\ (x)\ e)
\]
\[
\tag{variables}
x,y \in Var \mbox{ is a set of identifiers}
\]
The syntax definition above focuses on three kinds of expressions: calls, declarations, and returns.
Other syntactic components, such as tail calls and conditional branching, do not complicate our semantics, so we leave them out.
ANF sets a unique label for every intermediate expression, and these unique labels help us implement and express \textit{h}-CFA easily.
Moreover, all of the intra-procedural control flows (the order of operations) are already compiled into \verb|let| forms, which simplifies the semantics and accelerates our implementation.
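To illustrate the transformation, here is a toy ANF converter in Python (an assumed mini-language of variables and unary applications, not a real compiler): nested applications such as \verb|(f (g x))| become a chain of let-bindings naming every intermediate result.

```python
import itertools

fresh = (f"t{i}" for i in itertools.count())  # fresh temporary names

def anf(expr):
    """expr is a variable (str) or an application (op, arg) tuple.
    Returns (bindings, atom): the let-bindings plus the final atom."""
    bindings = []
    def atomize(e):
        if isinstance(e, str):
            return e
        op, arg = e
        op_a, arg_a = atomize(op), atomize(arg)
        tmp = next(fresh)
        bindings.append((tmp, (op_a, arg_a)))  # (let ((tmp (op arg))) ...)
        return tmp
    return bindings, atomize(expr)

# (f (g x))  ==>  (let ((t0 (g x))) (let ((t1 (f t0))) t1))
print(anf(("f", ("g", "x"))))
```

Every application now sits on the right-hand side of a let, so each intermediate value has a syntactic name, which is the property \textit{h}-CFA relies on.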
The abstracting abstract machine (AAM) describes abstract interpreters that run and approximate a language in the CESK abstract machine style.
The abstract interpreter operates over CESK machine states $\tilde{\varsigma}$.
\[
\tag{states}
\tilde{\varsigma}\in\tilde{\Sigma} \triangleq Exp \times \widetilde{Env} \times \widetilde{Store}
\times \widetilde{KStore} \times \widetilde{KAddr}
\]
\[
\tag{environments}
\tilde{\rho} \in \widetilde{Env} \triangleq Var \to \widetilde{Addr}
\]
\[
\tag{stores}
\tilde{\sigma} \in \widetilde{Store} \triangleq \widetilde{Addr} \to \widetilde{Value}
\]
\[
\tag{abstract values}
\tilde{v} \in \widetilde{Value} \triangleq \mathcal{P}(\widetilde{Closure})
\]
\[
\tag{closures}
\widetilde{clo} \in \widetilde{Closure} \triangleq Lambda \times \widetilde{Env}
\]
\[
\tag{continuation stores}
\tilde{\sigma}_k \in \widetilde{KStore} \triangleq \widetilde{KAddr} \to \widetilde{Kont}
\]
\[
\tag{abstract continuations}
\widetilde{k} \in \widetilde{Kont} \triangleq \mathcal{P}(\widetilde{Frame})
\]
\[
\tag{stack frames}
\widetilde{\phi} \in \widetilde{Frame} \triangleq Var \times Exp \times \widetilde{Env} \times \widetilde{KAddr}
\]
\[
\tag{value addresses}
\tilde{a} \in \widetilde{Addr} \mbox{ is a finite set}
\]
\[
\tag{continuation addresses}
\tilde{a}_k \in \widetilde{KAddr} \mbox{ is a finite set}
\]
Environments ($\tilde{\rho}$) map variables to their binding address ($\tilde{a}$) in the scope.
The original AAM paper uses a single store in each state to contain both values and continuations, but we prefer to separate it into a value store ($\tilde{\sigma}$) and a continuation store ($\tilde{\sigma}_k$) to clarify our algorithm.
The value store saves each value ($\tilde{v}$) in a slot identified by an address; environments cooperate with the value store to implement the semantics of variable access.
A closure ($\widetilde{clo}$) is the only value form of pure lambda calculus; it pairs a lambda abstraction with the environment from its definition point to implement static scoping.
In our semantics, continuations ($\widetilde{k}$) just represent call stack frames because intra-procedural continuations are already converted to \verb|let| sequences.
Each frame ($\widetilde{\phi}$) includes:
(1) a return point that is a variable to accept and bind the result of current application,
(2) an expression the control flow returns to,
(3) an environment to restore,
(4) a continuation address that points to ``next'' continuations and builds up the linked stack structure.
Therefore, each state carries a continuation address ($\tilde{a}_k$) in place of the continuation component of a concrete CESK machine state; the continuation address points to the actual continuations (frames) residing in the continuation store.
This technique is referred to as \emph{store-allocated continuation}.
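The state components above transcribe almost directly into code. The following Python sketch (field names are ours, not canonical) shows the frame and the linked-stack encoding via continuation addresses:

```python
from typing import NamedTuple

# A stack frame per the definition above: where the call returns, and
# an address pointing at the rest of the stack in the continuation
# store.
class Frame(NamedTuple):
    return_var: str   # (1) variable that receives the call's result
    return_exp: str   # (2) label of the expression control returns to
    env: tuple        # (3) environment to restore (immutable, hashable)
    next_kaddr: str   # (4) address of the "next" continuations

kstore = {"k1": {Frame("y", "e2", (), "k0")}}  # continuation store
top = next(iter(kstore["k1"]))
print(top.return_var, top.next_kaddr)  # y k0
```

Because `next_kaddr` is an address rather than a frame, several stacks can share tails, which is what makes the abstraction finite.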
Transition rules of the concrete CESK machine operate on an input state and generate a successor state.
However, an abstracting abstract machine has to output a set of states due to the non-deterministic semantics of abstract interpretation.
The function application transition rule is defined below.
\[
\overbrace{
\big((let\ ((y\ (f\ \ae)))\ e), \tilde{\rho}, \tilde{\sigma}, \tilde{\sigma}_k, \tilde{a}_k \big)
}^{\tilde{\varsigma}}
\leadsto \big(e\textprime,\tilde{\rho}\textprime, \tilde{\sigma}\textprime, \tilde{\sigma}_k\textprime, \tilde{a}_k\textprime \big), \mbox{ where}
\]
\[
\big((\lambda\ (x)\ e\textprime), \tilde{\rho}_{\lambda} \big) \in \widetilde{eval}(f, \tilde{\rho}, \tilde{\sigma})
\]
\[
\tilde{\rho}\textprime = \tilde{\rho}_{\lambda}[x \mapsto \tilde{a}]
\]
\[
\tilde{\sigma}\textprime = \tilde{\sigma} \sqcup [\tilde{a} \mapsto \widetilde{eval}(\ae, \tilde{\rho}, \tilde{\sigma})]
\]
\[
\tilde{a} = \widetilde{alloc}(x, \tilde{\varsigma})
\]
\[
\widetilde{\phi} = {(y, e, \tilde{\rho}, \tilde{a}_k)}
\]
\[
\tilde{\sigma}_k\textprime = \tilde{\sigma}_k \sqcup [\tilde{a}_k\textprime \mapsto \{\widetilde{\phi}\}]
\]
\[
\tilde{a}_k\textprime = \widetilde{kalloc}(\tilde{\varsigma}, e\textprime, \tilde{\rho}\textprime, \tilde{\sigma}\textprime)
\]
When we start to analyze a call site, $\widetilde{eval}$ first extracts closures from $f$, which is always an atomic expression in ANF programs.
The helper $\widetilde{eval}$ directly computes the values of atomic expressions, which are either variable accesses or lambda abstractions in pure lambda calculus.
\[
\widetilde{eval} : AExp \times \widetilde{Env} \times \widetilde{Store} \to \widetilde{Value}
\]
\[
\widetilde{eval}(x, \tilde{\rho}, \tilde{\sigma}) \triangleq \tilde{\sigma}(\tilde{\rho}(x))
\]
\[
\widetilde{eval}(lambda, \tilde{\rho}, \tilde{\sigma}) \triangleq \{(lambda, \tilde{\rho})\}
\]
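A direct transcription of $\widetilde{eval}$ is short (a sketch; the representation choices are ours):

```python
# eval over atomic expressions: a variable looks up its address in the
# environment and then its value set in the store; a lambda closes
# over the current environment, yielding a singleton closure set.
def aeval(aexp, env, store):
    if isinstance(aexp, str):  # variable access
        return store[env[aexp]]
    return {(aexp, tuple(sorted(env.items())))}  # closure set

env = {"x": "a_x"}
store = {"a_x": {(("lam", "z"), ())}}
print(aeval("x", env, store))           # the stored closure set
print(aeval(("lam", "w"), env, store))  # fresh closure over env
```

Note that evaluation of atomic expressions never steps the machine; it only reads the environment and store, which is what makes ANF convenient here.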
Then the argument is also evaluated and stored at a corresponding address.
The environment restored from the closure is extended with a binding of the formal parameter to the argument's address.
In monovariant analysis, the address is determined only by the variable's syntactic label, so value addresses are always context-insensitive. Furthermore, we can use context information from the program execution to separate values into addresses of different dimensions.
For example, the following definition of $\widetilde{alloc_1}$ encodes the closest call site into value addresses to implement a 1-call-site-sensitive analysis (1-CFA).
\[
\widetilde{alloc} : Var \times \tilde{\Sigma} \to \widetilde{Addr}
\]
\[
\widetilde{alloc_0} (x, \tilde{\varsigma}) = x
\]
\[
\widetilde{alloc_1} (x, \overbrace{(e, \tilde{\rho}, \tilde{\sigma}, \tilde{\sigma}_k, \tilde{a}_k)}^{\tilde{\varsigma}}) = (x, e)
\]
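The two allocators differ only in what they fold into the address (a sketch; the state layout is the five-tuple defined earlier, with made-up labels):

```python
# alloc0: monovariant -- the address is the variable itself.
# alloc1: 1-call-site-sensitive -- pair the variable with the current
# expression (the call site) taken from the state.
def alloc0(var, state):
    return var

def alloc1(var, state):
    exp, _env, _store, _kstore, _kaddr = state
    return (var, exp)

state = ("(id 1)^1", {}, {}, {}, "k0")
print(alloc0("x", state))  # x
print(alloc1("x", state))  # ('x', '(id 1)^1')
```

Under `alloc0` every call shares one slot per variable; under `alloc1` values flowing from distinct call sites land at distinct addresses.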
Following the semantics of call-by-value lambda calculus, after obtaining the values of the callee and the argument, a call stack frame ($\widetilde{\phi}$) is pushed onto the top ($\tilde{a}_k$) of the stack (the continuation store, $\tilde{\sigma}_k$).
Meanwhile, a new stack top ($\tilde{a}_k\textprime$) is allocated by $\widetilde{kalloc}$.
The standard method of allocating continuation addresses in AAM is shown below; it represents a function entry point by its own syntactic label. The entry-point representation is then propagated to the return states of the application.
\[
\widetilde{kalloc} : \tilde{\Sigma} \times Exp \times \widetilde{Env} \times \widetilde{Store} \to \widetilde{KAddr}
\]
\[
\widetilde{kalloc}((e, \tilde{\rho}, \tilde{\sigma}, \tilde{\sigma}_k, \tilde{a}_k), e\textprime, \tilde{\rho}\textprime, \tilde{\varsigma}\textprime) = e\textprime
\]
Additionally, AAM implements over-approximation of abstract interpretation by a join operation over value and continuation stores.
The join is defined as follows.
\[
\tilde{\sigma} \sqcup \tilde{\sigma}\textprime = \lambda \tilde{a}.\ \tilde{\sigma}(\tilde{a}) \cup \tilde{\sigma}\textprime(\tilde{a})
\]
\[
\tilde{\sigma}_k \sqcup \tilde{\sigma}_k\textprime = \lambda \tilde{a}_k.\ \tilde{\sigma}_k(\tilde{a}_k) \cup \tilde{\sigma}_k\textprime(\tilde{a}_k)
\]
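With stores modeled as Python dicts from addresses to sets, the join is a pointwise set union; the sketch below (illustrative names) works for both the value store and the continuation store.

```python
def join(store_a, store_b):
    """Pointwise join of two abstract stores: at every address, the
    result holds the union of the value sets from both stores.
    Addresses present in only one store keep their value set."""
    result = dict(store_a)
    for addr, values in store_b.items():
        result[addr] = result.get(addr, set()) | values
    return result
```

Joining never discards information, which is what makes the transition relation monotone over stores.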
The declaration transition rule is very simple: it just propagates the context of abstract interpretation along \verb|let| forms.
\[
\overbrace{
\big((let\ ((y\ \ae)\ e)), \tilde{\rho}, \tilde{\sigma}, \tilde{\sigma}_k, \tilde{a}_k \big)
}^{\tilde{\varsigma}}
\leadsto \big(e,\tilde{\rho}\textprime, \tilde{\sigma}\textprime, \tilde{\sigma}_k, \tilde{a}_k \big), \mbox{ where}
\]
\[
\tilde{\rho}\textprime = \tilde{\rho}[y \mapsto \tilde{a}]
\]
\[
\tilde{\sigma}\textprime = \tilde{\sigma} \sqcup [\tilde{a} \mapsto \widetilde{eval}(\ae, \tilde{\rho}, \tilde{\sigma})]
\]
\[
\tilde{a} = \widetilde{alloc}(y, \tilde{\varsigma})
\]
The transition at a return point is another crucial rule.
\[
\overbrace{
(\ae, \tilde{\rho}, \tilde{\sigma}, \tilde{\sigma}_k, \tilde{a}_k)
}^{\tilde{\varsigma}}
\leadsto (e, \tilde{\rho}\textprime, \tilde{\sigma}\textprime, \tilde{\sigma}_k, \tilde{a}_k\textprime), \mbox{ where}
\]
\[
(x, e, \tilde{\rho}_k, \tilde{a}_k\textprime) \in \tilde{\sigma}_k(\tilde{a}_k)
\]
\[
\tilde{\rho}\textprime = \tilde{\rho}_k[x \mapsto \tilde{a}]
\]
\[
\tilde{\sigma}\textprime = \tilde{\sigma} \sqcup [\tilde{a} \mapsto \widetilde{eval}(\ae, \tilde{\rho}, \tilde{\sigma})]
\]
\[
\tilde{a} = \widetilde{alloc}(x, \tilde{\varsigma})
\]
The top frame is retrieved from the continuation store with the current continuation address ($\tilde{a}_k$).
First, we obtain the return-point variable $x$, which refers to the return value of the current application, and extend the environment $\tilde{\rho}_k$ with the return point to obtain $\tilde{\rho}\textprime$.
Then, computation continues on expression $e$ with environment $\tilde{\rho}\textprime$, store $\tilde{\sigma}\textprime$, stack $\tilde{\sigma}_k$, and the ``next'' continuation address $\tilde{a}_k\textprime$.
When we launch AAM on a program, $inject$ takes the program to create an initial state.
\[
inject : Exp \to \tilde{\Sigma}
\]
\[
inject(e) = (e, \varnothing, \bot, \bot, \tilde{a_k}_{init})
\]
The abstract interpreter starts to analyze a program from the initial state with empty environment, bottom stores, and a special continuation address.
The address $\tilde{a_k}_{init}$ represents the bottom of call stack.
The transition relation defined above induces a monotonic function, which is used by a worklist algorithm.
\begin{algorithm}
\caption{Worklist Algorithm}
\begin{algorithmic}
%\Procedure{MyProcedure}{}
\State $\textit{initState} \gets inject(program)$
\State $\textit{todo} \gets \textit{initState} :: Nil$
\State $\textit{seen} \gets \textit{initState} :: Nil$
\While{$\textit{todo} \neq Nil$}
\State $\textit{state} \gets head(\textit{todo})$
\State $\textit{todo} \gets tail(\textit{todo})$
\State $\textit{nexts} \gets transition_{AAM}(\textit{state})$
\For{$\textit{n} \in \textit{nexts}$}
\If{$\textit{n} \notin \textit{seen}$}
\State $\textit{seen} \gets \textit{n} :: \textit{seen}$
\State $\textit{todo} \gets \textit{n} :: \textit{todo}$
\EndIf
\EndFor
\EndWhile
\end{algorithmic}
\end{algorithm}
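The worklist algorithm can be transcribed directly into Python. This sketch leaves `inject` and `transition` abstract (they are parameters), and uses a set for `seen` instead of the pseudocode's list so the membership test is cheap; the function name is illustrative.

```python
def explore(program, inject, transition):
    """Enumerate all abstract machine states reachable from
    inject(program). `transition` maps one state to the set of its
    successor states; states must be hashable."""
    init = inject(program)
    todo = [init]          # stack of states still to process
    seen = {init}          # every state ever discovered
    while todo:
        state = todo.pop()
        for nxt in transition(state):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen
```

Because the abstract state space is finite, `seen` stops growing and the loop terminates, regardless of whether the analyzed program terminates concretely.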
Because AAM saves everything (values and continuations) in stores, the set of states $\tilde{\Sigma}$ is finite as long as store sizes are bounded.
Therefore, the worklist algorithm always terminates, even when the input program does not terminate under the concrete semantics.
\section{Store-widening}
\label{sub:Store-widenning}
Theoretically, naive implementations of AAM take exponential time in the input program size.
The time complexity of worklist algorithm is determined by the number of reachable machine states.
\[
O(
\overbrace{n}^{|Exp|} \times \overbrace{n}^{|\widetilde{Env}|} \times \overbrace{n^n}^{|\widetilde{Store}|}
\times \overbrace{n^n}^{|\widetilde{KStore}|} \times \overbrace{n}^{|\widetilde{KAddr}|}
)
\]
In monovariant analysis, values are always stored in locations that are only determined by syntactic positions of expressions.
Meanwhile, environments map each variable to only one corresponding address.
Because monovariant analysis does not carry any execution context during abstract interpretation, each expression always takes only one environment.
Likewise, continuation addresses are also allocated at syntactic positions.
Thus, a tighter bound is:
\[
O(
(\overbrace{n}^{|Exp|} + \overbrace{n}^{|\widetilde{Env}|} + \overbrace{n}^{|\widetilde{KAddr}|}) \times \overbrace{n^n}^{|\widetilde{Store}|} \times \overbrace{n^n}^{|\widetilde{KStore}|}
)
\]
This complexity bound is still obviously exponential.
Consequently, AAM implementations usually adopt widening on stores.
Store widening uses global stores (single-threaded stores, Shivers~\cite{shivers1991control}) for values and continuations, rather than per-state stores.
Global-store widening reduces the number of possible bindings in a store to $O(n^2)$,
as proved in~\cite{van2010abstracting, gilray2016pushdown}.
\[
O(
(\overbrace{n}^{|Exp|} + \overbrace{n}^{|\widetilde{Env}|} + \overbrace{n}^{|\widetilde{KAddr}|}) \times (\overbrace{n^2}^{|\widetilde{Store}|} + \overbrace{n^2}^{|\widetilde{KStore}|})
)
\]
Eventually, the time complexity of AAM in monovariant analysis is $O(n^3)$.
\section{A Defect of AAM}
\label{sub:Defect}
Although AAM introduces store-allocated values and store-allocated continuations, which make call/return matching orthogonal to context-sensitivity, its $\widetilde{kalloc}$ (the continuation-address allocation strategy) does not exploit context information to implement even limited call/return matching (as \textit{k}-CFA does).
P4F attempts to narrow the gap between the original \textit{k}-CFA and AAM, so it defines the very simple $\widetilde{kalloc_{P4F}}$:
\[
\widetilde{kalloc_{P4F}} ((e, \tilde{\rho}, \tilde{\sigma}, \tilde{\sigma}_k, \tilde{a}_k), e\textprime, \tilde{\rho}\textprime, \tilde{\sigma}\textprime) = (e\textprime, \tilde{\rho}\textprime)
\]
Continuation addresses are represented by $(e\textprime, \tilde{\rho}\textprime)$; the most obvious change is that the callee function is packed together with the ``target environment'' ($\tilde{\rho}\textprime$) of the current application.
First, environments ($Var \to \widetilde{Addr}$) map variable names to value addresses in CESK abstract machines, and AAM encodes the polyvariant strategy (e.g.\ call-site sensitive, object-sensitive, argument-sensitive, etc.) into value addresses.
Thus, P4F can be regarded as an adaptive pushdown control flow analysis algorithm that automatically inherits finite call/return matching support from the values' polyvariant strategy. %(implementing of $\widetilde{alloc}$ function).
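To make the contrast concrete, here is a small Python sketch of the two continuation-address strategies (hypothetical encodings: strings stand in for expressions, frozensets for environments; the unused parameters mirror the formal signatures).

```python
def kalloc_aam(state, callee_expr, callee_env, store):
    """Original AAM strategy: the continuation address is just the
    callee's entry expression, so every call to the same function
    shares one continuation address."""
    return callee_expr

def kalloc_p4f(state, callee_expr, callee_env, store):
    """P4F strategy: pair the callee with its target environment.
    Under a polyvariant value-allocation strategy, distinct calling
    contexts produce distinct target environments, and hence
    distinct continuation addresses."""
    return (callee_expr, callee_env)
```

With monovariant value allocation, both strategies collapse to the same precision, because every call to a function then builds the same target environment; the gain appears only once `alloc` is polyvariant.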
Second, P4F also reveals a significant fact about why the original AAM misses call/return flow matching.
One of the most important contributions of AAM is that it separates the analysis's context requirements from the termination of abstract interpreters.
Allocating everything (values and continuations) in stores makes the termination of abstract interpreters easy to reach, because the fixed size of the stores leads to a finite number of abstract machine states, so any implementation of $\widetilde{alloc}$ and $\widetilde{kalloc}$ is sound.
However, the original $\widetilde{kalloc}$ function of AAM, which mimics how concrete interpreters generate call stack frames, gains no benefit from the values' polyvariance toward more precise call/return flows.
P4F fixes this problem by introducing polyvariance into the continuation store: the target environment carries context information that distinguishes continuations under different contexts.
Although P4F cannot match call/return flows of unbounded depth, it still uncovers the essence of pushdown control flow analysis in AAM\@: \emph{continuations also need to be polyvariant (context-sensitive) to achieve more precise static analysis results}.
\chapter{Pushdown CFA based on Program Execution History}
\label{sec:hcfa}
Inspired by P4F, we deem that pushdown analysis (a polyvariant continuation store) is orthogonal to a polyvariant value store. In other words, control flow analysis can achieve call/return matching without polyvariant values.
At the same time, we try to find the proper contexts for polyvariant continuations.
This section describes the CESK$^H$ machine, which records a ``program execution history'' in each abstract machine state.
The program execution history records and summarizes the execution path from the beginning of the program to the current state.
During the evaluation of a function call, the program execution history can be used to uniquely represent the current call site in the continuation store.
\section{Program Execution History}
\label{sub:Program Execution History}
First, we modify the CESK machine defined in Section~\ref{subs:Abstracting Abstract Machine} into the CESK$^H$ machine.
The data types and notations of CESK$^H$ are defined below. The parts changed from the CESK definitions are indicated with superscript $H$.
\[
\tag{states}
\widetilde{\varsigma^H}\in\widetilde{\Sigma^H} \triangleq Exp \times \widetilde{Env} \times \widetilde{Store}
\times \widetilde{KStore^H} \times \widetilde{KAddr^H} \times \widetilde{History}
\]
\[
\tag{environments}
\tilde{\rho} \in \widetilde{Env} \triangleq Var \to \widetilde{Addr}
\]
\[
\tag{stores}
\tilde{\sigma} \in \widetilde{Store} \triangleq \widetilde{Addr} \to \widetilde{Value}
\]
\[
\tag{abstract values}
\tilde{v} \in \widetilde{Value} \triangleq \mathcal{P}(\widetilde{Closure})
\]
\[
\tag{closures}
\widetilde{clo} \in \widetilde{Closure} \triangleq Lambda \times \widetilde{Env}
\]
\[
\tag{continuation stores}
\widetilde{\sigma_k^H} \in \widetilde{KStore^H} \triangleq \widetilde{KAddr^H} \to \widetilde{Kont^H}
\]
\[
\tag{abstract continuations}
\widetilde{k^H} \in \widetilde{Kont^H} \triangleq \mathcal{P}(\widetilde{Frame^H})
\]
\[
\tag{stack frames}
\widetilde{\phi^H} \in \widetilde{Frame^H} \triangleq Var \times Exp \times \widetilde{Env} \times \widetilde{History} \times \widetilde{KAddr^H}
\]
\[
\tag{histories}
\tilde{h} \in \widetilde{History} \triangleq Var \to \widetilde{Addr}
\]
\[
\tag{value addresses}
\tilde{a} \in \widetilde{Addr} \ is\ a\ finite\ set
\]
\[
\tag{continuation addresses}
\widetilde{a_k^H} \in \widetilde{KAddr^H} \ is\ a\ finite\ set
\]
In ANF programs, the environment naturally maintains the intra-procedural execution history, because ANF explicitly extracts intra-procedural control flow into let-bindings and saves each intermediate result in a local variable.
Consequently, program execution histories can be implemented by propagating environments through the $\widetilde{History}$ field of CESK$^H$ machine states.
We consider execution histories as call-strings with automatically determined length. For non-recursive calls, the execution history always provides sufficiently precise context information, no matter how deep the call sequence. On the other hand, program execution histories automatically stop growing for recursive calls, and the worklist algorithm is responsible for finding the fixed point of the recursive computation.
The following definitions describe the abstract semantics of CESK$^H$ machine.
\paragraph{Calls}
\[
\overbrace{
\big((let\ ((y\ (f\ \ae))\ e)), \tilde{\rho}, \tilde{\sigma}, \widetilde{\sigma^H_k}, \widetilde{a^H_k}, \tilde{h} \big)
}^{\widetilde{\varsigma^H}}
\leadsto \big(e\textprime,\tilde{\rho}\textprime, \tilde{\sigma}\textprime, \widetilde{\sigma_k^H}\textprime, \widetilde{a_k^H}\textprime, \tilde{h}\textprime \big), \mbox{ where}
\]
\[
\big((\lambda\ (x)\ e\textprime), \tilde{\rho}_{\lambda} \big) \in \widetilde{eval}(f, \tilde{\rho}, \tilde{\sigma})
\]
\[
\tilde{\rho}\textprime = \tilde{\rho}_{\lambda}[x \mapsto \tilde{a}]
\]
\[
\tilde{\sigma}\textprime = \tilde{\sigma} \sqcup [\tilde{a} \mapsto \widetilde{eval}(\ae, \tilde{\rho}, \tilde{\sigma})]
\]
\[
\tilde{a} = \widetilde{alloc}(x, \widetilde{\varsigma^H})
\]
\[
\widetilde{\phi^H} = {(y, e, \tilde{\rho}, \tilde{h}, \widetilde{a_k^H})}
\]
\[
\widetilde{\sigma_k^H}\textprime = \widetilde{\sigma_k^H} \sqcup [\widetilde{a_k^H}\textprime \mapsto \widetilde{\phi^H}]
\]
\[
\widetilde{a_k^H}\textprime = \widetilde{kalloc_h}(\widetilde{\varsigma^H}, e\textprime, \tilde{\rho}\textprime, \tilde{\sigma}\textprime)
\]
\[
\tilde{h}\textprime = \tilde{h}[x \mapsto \tilde{a}]
\]
The semantics of function calls propagates the execution history by adding the current ``intermediate variable'' to $\tilde{h}$. Note that history extension differs from environment extension: the environment is rebuilt from the base environment captured at the function's definition point, while the history keeps growing along the execution path.
\paragraph{Declarations}
\[
\overbrace{
\big((let\ ((y\ \ae)\ e)), \tilde{\rho}, \tilde{\sigma}, \widetilde{\sigma^H_k}, \widetilde{a^H_k}, \tilde{h} \big)
}^{\widetilde{\varsigma^H}}
\leadsto \big(e,\tilde{\rho}\textprime, \tilde{\sigma}\textprime, \widetilde{\sigma^H_k}, \widetilde{a^H_k}, \tilde{h}\textprime \big), \mbox{ where}
\]
\[
\tilde{\rho}\textprime = \tilde{\rho}[y \mapsto \tilde{a}]
\]
\[
\tilde{\sigma}\textprime = \tilde{\sigma} \sqcup [\tilde{a} \mapsto \widetilde{eval}(\ae, \tilde{\rho}, \tilde{\sigma})]
\]
\[
\tilde{a} = \widetilde{alloc}(y, \widetilde{\varsigma^H})
\]
\[
\tilde{h}\textprime = \tilde{h}[y \mapsto \tilde{a}]
\]
Declarations are \verb|let| forms that just bind atomic expressions to variables.
The semantics is straightforward: it propagates environments ($\tilde{\rho}$) and histories ($\tilde{h}$) through the linear control flow.
\paragraph{Returns}
\[
\overbrace{
(\ae, \tilde{\rho}, \tilde{\sigma}, \tilde{\sigma_k}, \widetilde{a_k^H}, \tilde{h})
}^{\widetilde{\varsigma^H}}
\leadsto (e, \tilde{\rho}\textprime, \tilde{\sigma}\textprime, \widetilde{\sigma_k^H}, \widetilde{a_k^H}\textprime, \tilde{h}\textprime)
\]
\[
(x, e, \tilde{\rho_k}, \tilde{h_k}, \widetilde{a_k^H}\textprime) \in \widetilde{\sigma_k^H}(\widetilde{a_k^H})
\]
\[
\tilde{\rho}\textprime = \tilde{\rho_k}[x \mapsto \tilde{a}]
\]
\[
\tilde{\sigma}\textprime = \tilde{\sigma} \sqcup [\tilde{a} \mapsto \widetilde{eval}(\ae, \tilde{\rho}, \tilde{\sigma})]
\]
\[
\tilde{a} = \widetilde{alloc}(x, \widetilde{\varsigma^H})
\]
\[
\tilde{h}\textprime = \tilde{h_k}[x \mapsto \tilde{a}]
\]
In the {\em return} rule, the abstract interpreter restores the caller's history from the frames referred to by the current continuation address, similar to restoring the environment.
$\widetilde{kalloc_h}$ takes the execution history to compute a unique continuation address for the corresponding call site.
\[
\widetilde{kalloc_h} ((e, \tilde{\rho}, \tilde{\sigma}, \widetilde{\sigma_k^H}, \widetilde{a_k^H}, \tilde{h}), e\textprime, \tilde{\rho}\textprime, \tilde{\sigma}\textprime) =
(e, e\textprime, \tilde{h})
\]
$\widetilde{KAddr^H}$ in CESK$^H$ machines is encoded by (1) the call site $e$, (2) the callee function $e\textprime$, and (3) the current execution history $\tilde{h}$. A 0-CFA-like analysis in AAM uses just $e\textprime$ to refer to abstract continuations, so all potential call sites that may invoke $e\textprime$ merge with each other. In contrast, this $\widetilde{KAddr^H}$ definition distinguishes as many call sites of $e\textprime$ as possible, via the most recent call site $e$ and the rest of the context encoded in $\tilde{h}$.
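A Python sketch of this allocation strategy (illustrative only: the state is a 6-tuple mirroring CESK$^H$, the history is a dict frozen into a hashable value):

```python
def kalloc_h(state, callee_expr, callee_env, new_store):
    """h-CFA continuation allocation: the address records the call
    site, the callee, and the execution history at the call.  Two
    calls to the same function get distinct continuation addresses
    whenever their histories differ."""
    call_site, _env, _store, _kstore, _kaddr, history = state
    return (call_site, callee_expr, frozenset(history.items()))
```

Because the history is itself a finite map over program variables, the set of such addresses stays finite, so termination is preserved.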
\section{Polyvariant Continuation}
\label{sub:Polyvariant Continuation}
In this section, we use a simple example in Figure~\ref{fig:anf-fib} to explain the analysis process of
\textit{h}-CFA.
\begin{figure}
\small
\lstset{language=Lisp,
keywords={letrec,lambda,let},
mathescape}
\begin{lstlisting}
(letrec ((fib (lambda (n)
(let ((res1 (< n 3)))
(if res1
1
(let* ((res2 ($-$ n 1))
(res3 (fib res2))
(res4 ($-$ n 2))
(res5 (fib res4))
(res6 (+ res3 res5)))
res6))))))
(let ((a (fib 10))
(b (fib 20)))
(fib 30)))
\end{lstlisting}
\caption[A recursive program example in ANF]{
An example written in ANF style defines a recursive function and calls it multiple times.
For convenience of demonstration, we use the full Scheme language with numbers and booleans instead of pure lambda calculus.
}
\label{fig:anf-fib}
\end{figure}
For simplicity, the execution histories are represented as variable sequences, and the called function (the second component of $\widetilde{KAddr^H}$) is written as its function name wearing a hat.
This simplification improves readability without modifying the abstract semantics of the CESK$^H$ machine.
Through the steps of abstract interpretation, the first call site $(a\ (fib\ 10))$ carries the history $\{fib\}$,
which means that, at this program point, we have only finished computing the declaration of the function $fib$.
Thus, the continuation (call stack frame) of the call site is allocated at $((fib\ 10), \widehat{fib}, \{fib\})$, and the stack frame looks like $(a, (let\ (b\ (fib\ 20))\ \dots), \widetilde{env_1}, \{fib\}, \widetilde{a^H_k{}_{init}})$, which is the only element in the continuation store so far.
The stack frame expresses that after completing this invocation,
(1) the return value will be stored in variable $a$,
(2) the computation will shift to $(let\ (b\ (fib\ 20))\ \dots)$ with environment $\widetilde{env_1}$,
(3) and the continuation address $\widetilde{a^H_k{}_{init}}$, a fake one for the top-level continuation, will be recovered.
After diving into the callee function, the second call appears at $(res3\ (fib\ res2))$. At this point, the execution history $\{fib, res1, res2\}$ is different from the history of the last call site, so the continuation store contains two abstract continuations with distinct addresses.
\[
\widetilde{a^H_k{}_1} = ((fib\ 10), \widehat{fib}, \{fib\})
\]
\[
\widetilde{a^H_k{}_2} = ((fib\ res2), \widehat{fib}, \{fib, res1, res2\})
\]
\[
\begin{aligned}
\label{eq:show-stack}
\widetilde{\sigma_k^H} = \{ {}& \widetilde{a^H_k{}_1} \mapsto \{(a, (let\ (b\ (fib\ 20))\ \dots), \widetilde{env_1}, \{fib\}, \widetilde{a^H_k{}_{init}})\} {} \\
& \widetilde{a^H_k{}_2} \mapsto \{(res3, (let\ (res4\ (-\ n\ 2)) \dots), \widetilde{env_2}, \{fib, res1, res2\}, \widetilde{a^H_k{}_1})\} \}
\end{aligned}
\]
As the illustration above shows, the continuation store is a stack with a linked-list structure. Each frame carries a $\widetilde{a^H_k}$ that points to the next frame in the stack. This stack-like structure perfectly mimics the call stacks of concrete interpreters. Certainly, the call site $(fib\ res4)$ will also get its own execution history after the computation of $(fib\ res2)$ completes (i.e.\ after reaching its fixed point).
\[
\widetilde{a^H_k{}_3} = ((fib\ res4), \widehat{fib}, \{fib, res1, res2, res3, res4\})
\]
However, $(res3\ (fib\ res2))$ is a recursive call site, so the execution history at this point does not add a new element to distinguish $(res3\ (fib\ res2))$ from its variants at different recursion depths. Thus, control flows from multiple recursion levels of a call site are merged into one continuation address. Eventually, there are three frames merged into $\widetilde{a^H_k{}_2}$, but this merging does not lead to a ``static'' call/return mismatch: all of the frames merged into $\widetilde{a^H_k{}_2}$ bring control flow back to set \verb|res3|.
\[
\begin{aligned}
\widetilde{a^H_k{}_2} \mapsto \{ {}& (res3, (let\ (res4\ (-\ n\ 2)) \dots), \widetilde{env_2}, \{fib, res1, res2\}, \widetilde{a^H_k{}_1}), {} \\
{}& (res3, (let\ (res4\ (-\ n\ 2)) \dots), \widetilde{env_3}, \{fib, res1, res2\}, \widetilde{a^H_k{}_2}) \}
\end{aligned}
\]
The merging expresses the fact that the invocation of $fib$ at point $(fib\ res2)$ may have been made by $(a\ (fib\ 10))$ or $(res3\ (fib\ res2))$. Moreover, the second frame in the above illustration has a ``next'' pointer $\widetilde{a^H_k{}_2}$ that refers to itself. This cycle means the continuation store is no longer stack-like, but a graph.
After computing $(a\ (fib\ 10))$, the function $fib$ is called again by $(b\ (fib\ 20))$. At this point, $\widetilde{kalloc_h}$ generates a new continuation address.
\[
\widetilde{a^H_k{}_4} = ((fib\ 20), \widehat{fib}, \{fib, a\})
\]
The execution history at this point becomes $\{fib, a\}$, which summarizes the execution path of computing $(a\ (fib\ 10))$ into the single variable $a$. In other words, the program execution history only records which portions of the program have been completed; it ignores how they were computed. This summarization bounds the length of execution histories by $O(\textit{n})$ (the size of the input program) in the worst case.
Then the abstract interpreter re-executes the function and encounters the call site $(res3\ (fib\ res2))$ again. This time, the continuation address ($\widetilde{a^H_k{}_5}$) allocated for $(res3\ (fib\ res2))$ differs from the previous one, which ensures that there are two distinct ``call stacks''. Consequently, the function $fib$ called from $(b\ (fib\ 20))$ will never return to $(a\ (fib\ 10))$, and vice versa.
\[
\widetilde{a^H_k{}_5} = ((fib\ res2), \widehat{fib}, \{fib, a, res1, res2\})
\]
\section{Complexity and Precision of \textit{h}-CFA}
\label{sub:swh}
We have applied store-widening to \textit{h}-CFA for both the value store and the continuation store.
However, we have not obtained a polynomial time complexity for \textit{h}-CFA\@.
\begin{table}[t]
\centering
\label{compare}
\begin{tabular}{|c|c|c|c|c|c|}
\hline
Algorithm &
\begin{tabular}[c]{@{}c@{}}Match \\ Strength\end{tabular} &
\begin{tabular}[c]{@{}c@{}}Mono \\ -variance\end{tabular} &
\begin{tabular}[c]{@{}c@{}}Poly \\ -variance\end{tabular} &
Implementation &
\begin{tabular}[c]{@{}c@{}}Complexity on \\ Monovariance\end{tabular} \\ \hline
CFA2 & Infinite & \checkmark & & Difficult & Exponential \\ \hline
P4F & Limited & & \checkmark & Easy & $O(n^3)$ \\ \hline
PDCFA & Infinite & \checkmark & \checkmark & Difficult & $O(n^6)$ \\ \hline
AAC & Infinite & \checkmark & \checkmark & Easy & $O(n^9)$ \\ \hline
\textit{h}-CFA & Infinite & \checkmark & \checkmark & Easy & Exponential \\ \hline
\end{tabular}
\caption[Comparison of pushdown CFA algorithms]
{Comparison of pushdown CFA algorithms in terms of analysis precision, time complexity, and ease of implementation.}
\label{tab:comparison}
\end{table}
A comparison of related pushdown CFA algorithms with \textit{h}-CFA is shown in Table~\ref{tab:comparison}.
According to this table, our technique seems to be worse than AAC in its asymptotic upper bound. However, in practice the performance of \textit{h}-CFA is better than AAC in most cases.
We have run both the \textit{h}-CFA implementation and AAC on test cases from {\em Larceny R6RS} benchmark suite
and some other examples.
Figure~\ref{0-benchmark} compares the number of states that \textit{h}-CFA and AAC explored in monovariant analysis and Figure~\ref{1-benchmark} shows the test with 1-call-site sensitivity.
Moreover, AAC has a major drawback, which is its space complexity in real world applications.
%AAC is another method for obtaining perfect stack precision in AAM\@.
The essential strategy of AAC is defined below~\cite{gilray2016pushdown}.
\[
\widetilde{kalloc}_{AAC}((e, \tilde{\rho}, \tilde{\sigma}, \tilde{a}_k), e\textprime, \tilde{\rho}\textprime, \tilde{\sigma}\textprime) = (e\textprime, \tilde{\rho}\textprime, e, \tilde{\rho}, \tilde{\sigma})
\]
The function $\widetilde{kalloc}_{AAC}$ encodes a unique continuation address for the call site from a target closure ($e\textprime, \tilde{\rho}\textprime$), a source closure ($e, \tilde{\rho}$), and the store $\tilde{\sigma}$.
This strategy works well if we use purely functional data structures to implement stores.
However, in realistic analyzers, functional data structures usually incur considerable performance cost, and imperative stores significantly increase the space complexity of AAC\@.
\begin{figure}
\centering
\pgfplotsset{width=6cm,compat=newest}
\minipage{0.48\textwidth}
\begin{tikzpicture}
\small
\begin{axis}[
symbolic x coords={mj09,eta,kcfa2,kcfa3,blur,loop2,sat,ack,cpstak,tak,fib,empty},
xtick=data,
ylabel=States,
enlargelimits=0.05,
legend style={at={(0.5,-0.33)},
anchor=north,legend columns=-1},
ybar interval=0.6,
x tick label style={rotate=90},
]
\addplot
coordinates {(mj09,54) (eta,55)
(kcfa2,100) (kcfa3,155) (blur,469) (loop2,72) (sat, 311)
(ack, 139) (cpstak, 109) (tak,379) (fib,206) (empty,0)};
\addplot
coordinates {(mj09,37) (eta,22)
(kcfa2,81) (kcfa3,168) (blur,197) (loop2,71) (sat, 548)
(ack, 86) (cpstak,98) (tak,239) (fib,108) (empty,0)};
\legend{AAC,\textit{h}-CFA}
\end{axis}
\end{tikzpicture}
\caption{Monovariant analysis performance comparison}
\label{0-benchmark}
\endminipage\hfill
\minipage{0.48\textwidth}
\begin{tikzpicture}
\small
\begin{axis}[
symbolic x coords={mj09,eta,kcfa2,kcfa3,blur,loop2,sat,ack,cpstak,tak,fib,empty},
xtick=data,
ylabel=States,
enlargelimits=0.05,
legend style={at={(0.5,-0.33)},
anchor=north,legend columns=-1},
ybar interval=0.7,
x tick label style={rotate=90},
]
\addplot
coordinates {(mj09,94) (eta,39)
(kcfa2,432) (kcfa3,1145) (blur,183) (loop2,354) (sat, 1999)
(ack, 645) (cpstak, 73) (tak, 1420) (fib,2273) (empty,0)};
\addplot
coordinates {(mj09,28) (eta,16)
(kcfa2,110) (kcfa3,348) (blur,63) (loop2,139) (sat, 734)
(ack, 363) (cpstak, 18) (tak, 945) (fib,1928) (empty,0)};
\legend{AAC,\textit{h}-CFA}
\end{axis}
\end{tikzpicture}
\caption{1-call-site sensitive analysis performance comparison}
\label{1-benchmark}
\endminipage\hfill
\end{figure}
Furthermore, we compared the call/return matching precision of several CFA algorithms in AAM, including \textit{k}-CFA-like analysis, P4F, AAC, and \textit{h}-CFA\@.
Figure~\ref{0-precision} shows the percentages of mismatching returns in monovariant analysis, where a mismatching return state retrieves several different return points from its continuation address.
These different return points immediately produce spurious return flows, so these statistics represent the precision of call/return matching.
The figure indicates that \textit{h}-CFA has no mismatched return flows on any of the programs (its mismatching return percentages are 0\%), and that P4F brings no benefit to monovariant analysis because its precision is always the same as a 0-CFA-like analysis.
Figure~\ref{1-precision} provides the comparison for 1-call-site sensitive analysis, where \textit{h}-CFA is also the most accurate solution.
\begin{figure}
\center
\begin{tikzpicture}
\small
\begin{axis}[
symbolic x coords={mj09,eta,kcfa2,kcfa3,blur,loop2,sat,ack,cpstak,tak,fib,empty},
xtick=data,
ylabel=Percentage of Mismatching Returns,
yticklabel=\pgfmathparse{\tick}\pgfmathprintnumber{\pgfmathresult}\,\%,
enlargelimits=0.05,
legend style={at={(0.5,-0.25)},
anchor=north,legend columns=-1},
ybar interval=0.6,
x tick label style={rotate=90},
]
\addplot
coordinates {(mj09,37.5) (eta,13.3)
(kcfa2,85.7) (kcfa3,85.7) (blur,86.7) (loop2,0) (sat, 0)
(ack, 0) (cpstak, 0) (tak,75) (fib,90) (empty,0)};
\addplot
coordinates {(mj09,37.5) (eta,13.3)
(kcfa2,85.7) (kcfa3,85.7) (blur,86.7) (loop2,0) (sat, 0)
(ack, 0) (cpstak, 0) (tak,75) (fib,90) (empty,0)};
\addplot
coordinates {(mj09,5.6) (eta,0)
(kcfa2,23) (kcfa3,25) (blur,15.5) (loop2,0) (sat, 0)
(ack, 0) (cpstak, 0) (tak,30.7) (fib,0) (empty,0)};
\addplot
coordinates {(mj09,0) (eta,0)
(kcfa2,0) (kcfa3,0) (blur,0) (loop2,0) (sat, 0)
(ack, 0) (cpstak,0) (tak,0) (fib,0) (empty,0)};
\legend{\textit{k}-CFA-like, P4F, AAC,\textit{h}-CFA}
\end{axis}
\end{tikzpicture}
\caption{Monovariant analysis precision comparison}
\label{0-precision}
\end{figure}
\begin{figure}
\center
\begin{tikzpicture}
\small
\begin{axis}[
symbolic x coords={mj09,eta,kcfa2,kcfa3,blur,loop2,sat,ack,cpstak,tak,fib,empty},
xtick=data,
ylabel=Percentage of Mismatching Return,
yticklabel=\pgfmathparse{\tick}\pgfmathprintnumber{\pgfmathresult}\,\%,
enlargelimits=0.05,
legend style={at={(0.5,-0.25)},
anchor=north,legend columns=-1},
ybar interval=0.6,
x tick label style={rotate=90},
]
\addplot
coordinates {(mj09,26.1) (eta,20)
(kcfa2,98.9) (kcfa3,99.3) (blur,96.1) (loop2,0) (sat, 0)
(ack, 0) (cpstak, 0) (tak,98.2) (fib,97.6) (empty,0)};
\addplot
coordinates {(mj09,16.7) (eta,0)
(kcfa2,83.9) (kcfa3,91.7) (blur,0) (loop2,0) (sat, 0)
(ack, 0) (cpstak, 0) (tak,29.2) (fib,0) (empty,0)};
\addplot
coordinates {(mj09,0) (eta,0)
(kcfa2,0) (kcfa3,15.7) (blur,0) (loop2,0) (sat, 0)
(ack, 0) (cpstak, 0) (tak,23.2) (fib,0) (empty,0)};
\addplot
coordinates {(mj09,0) (eta,0)
(kcfa2,0) (kcfa3,0) (blur,0) (loop2,0) (sat, 0)
(ack, 0) (cpstak,0) (tak,0) (fib,0) (empty,0)};
\legend{\textit{k}-CFA-like, P4F, AAC,\textit{h}-CFA}
\end{axis}
\end{tikzpicture}
\caption{1-call-site sensitive analysis precision comparison}
\label{1-precision}
\end{figure}
To visually illustrate the call/return matching strength of P4F, PDCFA, PDCFA with abstract garbage collection (GC), and \textit{h}-CFA, we have
implemented them for the Scheme language. We ran the four algorithms on a small program similar to the one shown in Figure~\ref{fig:anf-fib}.
The resulting state-transition graphs are shown in Figure~\ref{fig:state-graphs}.
\begin{figure}
\begin{center}
\begin{tabular}{ccc}
%\raisebox{1ex-\height}{
\includegraphics[height=2.5in]{1p4f.pdf}
%}
&
%\raisebox{1ex-\height}{
\includegraphics[height=2.5in]{pdcfaWOgc.pdf}
%}
\\
(a) P4F with 1-CFA
&
(b) PDCFA
\\
%\raisebox{1ex-\height}{
\includegraphics[height=2.5in]{pdcfa.pdf}
%}
&
%\raisebox{1ex-\height}{
\includegraphics[height=2.5in]{hcfa.pdf}
%}
\\
(c) PDCFA with \\Abstract Garbage Collection