Global Deadcode Elimination #1503

micahcantor · 2023-08-03T09:55:52Z

This PR adds a new optimization pass to perform a global and block-aware liveness analysis. The primary purpose of this is to be able to remove unused code from functors that are instantiated more than once, a limitation of the current deadcode elimination algorithm. This issue has been known for some time, see #595.

Since this change is somewhat involved, I will split this comment into a few sections to make it easier to review. The commit history is messy, but the PR should be ready to review file-by-file. I've worked closely with @vouillon and @OlivierNicole on these changes, but there are still some remaining questions to address.

New changes

The primary new contribution is found in deadcode_dgraph.ml. There are four main steps to the analysis it performs:

1. Collect all variable definitions and build the variable dependency graph (the usages function). A variable x is considered used in a variable y if either x appears in the definition of y, or x is passed as a block or closure argument to parameter y.
2. Perform an initial liveness analysis by traversing the program AST (the liveness function). Here we mark a variable x as Top if it's used in an impure expression or instruction (more details are given in the doc comment). Otherwise it is marked as Dead. This pass uses information from Global_flow to determine whether a return value is used at its call sites.
3. Propagate the initial liveness state through the dependency graph using the flow graph solver (the solver and propagate functions). For each variable x in the graph, its liveness is defined as the join of its current liveness and the contribution of each of its usages y. More detail is given in the comment for contribution, but here we can determine whether x is used only through a single field i of a block y, in which case x depends only on that field and is marked as Live {i}.
4. Using the completed liveness table, "zero out" the dead variables and unused block fields. This is done by adding a sentinel variable to the beginning of the program, defined as the JS value undefined. Any dead variable is replaced by the sentinel, so the existing deadcode elimination can then remove these usages from the code, reducing its size. At this stage, we also truncate blocks that end with one or more sentinel values to the last non-sentinel variable.
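The lattice and the propagation step can be sketched in a few lines. This is a simplified model and not the code in deadcode_dgraph.ml: variables are plain integers, the def type and the solve loop are hypothetical stand-ins, and a naive fixpoint iteration replaces the flow graph solver.

```ocaml
module IntSet = Set.Make (Int)

(* Liveness lattice: Dead < Live {fields} < Top. *)
type live = Dead | Live of IntSet.t | Top

(* Least upper bound of two liveness values. *)
let join a b =
  match a, b with
  | Top, _ | _, Top -> Top
  | Dead, l | l, Dead -> l
  | Live f1, Live f2 -> Live (IntSet.union f1 f2)

let equal a b =
  match a, b with
  | Top, Top | Dead, Dead -> true
  | Live f1, Live f2 -> IntSet.equal f1 f2
  | _ -> false

(* Toy definitions (an assumption of this sketch); variables are ints. *)
type def =
  | Block of int array (* y = [| x0; x1; ... |] *)
  | Field of int * int (* y = x.(i) *)
  | Other of int list (* y is computed from these variables *)

(* Liveness that a user y, with definition def_y and liveness ly,
   passes back to the variable x it mentions. *)
let contribution x def_y ly =
  match def_y, ly with
  | _, Dead -> Dead
  | Field (_, i), _ -> Live (IntSet.singleton i)
  | Block vars, Live fields ->
      (* x matters only if it sits in one of the live fields of y. *)
      let used = ref false in
      Array.iteri (fun i v -> if v = x && IntSet.mem i fields then used := true) vars;
      if !used then Top else Dead
  | (Block _ | Other _), _ -> Top

(* defs.(y) defines variable y; init.(x) is the initial liveness of x.
   Iterates until no liveness value changes. *)
let solve defs init =
  let n = Array.length defs in
  let live = Array.copy init in
  let users x =
    List.filter
      (fun y ->
        match defs.(y) with
        | Block vars -> Array.mem x vars
        | Field (b, _) -> b = x
        | Other vars -> List.mem x vars)
      (List.init n Fun.id)
  in
  let changed = ref true in
  while !changed do
    changed := false;
    for x = 0 to n - 1 do
      let l =
        List.fold_left
          (fun acc y -> join acc (contribution x defs.(y) live.(y)))
          live.(x) (users x)
      in
      if not (equal l live.(x)) then begin
        live.(x) <- l;
        changed := true
      end
    done
  done;
  live

(* v0 and v1 are stored in block v2; only field 0 of v2 is read (v3),
   and v3 is used by the impure v4, which starts as Top. *)
let example () =
  solve
    [| Other []; Other []; Block [| 0; 1 |]; Field (2, 0); Other [ 3 ] |]
    [| Dead; Dead; Dead; Dead; Top |]
```

In this example v2 ends up as Live {0}, v0 becomes Top and v1 stays Dead, so v1 is the variable that the zeroing step would replace by the sentinel.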

Outside of deadcode_dgraph.ml, there are a few changes in the driver and a few other functions to integrate this pass into the optimization pipeline. The interface to the pass, Deadcode_dgraph.f, is called in the function exact_calls in driver.ml. This function runs after most other optimizations, which is important since global flow will break if it is run beforehand with a different number of variables than it expects. It also ensures that this pass, which doesn't expose much opportunity for further optimization, is run only once before a final elimination pass.

We also added an optimization that deletes sentinel fields in arrays. Since the sentinel variable has the value undefined, the following JS transformation is valid:

[a, sentinel, b] -> [a, , b]

This helps save a few extra bytes in the generated code. There were a few changes I made to the interface in generate.ml and mlvalue.ml to facilitate this optimization. There are a few other minor changes that expose information from global flow and expose an undefined primitive that it needs.
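The zeroing, truncation and elision steps can be illustrated on a toy representation of a block. This is only a sketch: the field type and the three functions below are hypothetical names, fields are plain strings, and the block tag that the real JS representation carries is ignored.

```ocaml
(* A block field is either a live variable name or the sentinel. *)
type field = Var of string | Sentinel

(* Replace every field whose index is not in [live_fields] by the sentinel. *)
let zero_fields live_fields block =
  Array.mapi (fun i v -> if List.mem i live_fields then Var v else Sentinel) block

(* Drop the trailing run of sentinels, keeping everything up to the
   last non-sentinel field. *)
let truncate fields =
  let n = ref (Array.length fields) in
  while !n > 0 && fields.(!n - 1) = Sentinel do
    decr n
  done;
  Array.sub fields 0 !n

(* Print as a JS array literal, writing remaining sentinels as elisions. *)
let to_js fields =
  let elt = function Var v -> v | Sentinel -> "" in
  "[" ^ String.concat "," (Array.to_list (Array.map elt fields)) ^ "]"

(* Fields 0 and 2 are live in a four-field block. *)
let example = to_js (truncate (zero_fields [ 0; 2 ] [| "a"; "b"; "c"; "d" |]))
```

Here example evaluates to "[a,,c]". Truncating before printing matters in this sketch because a trailing comma in a JS array literal does not create an extra element, so a trailing sentinel could not be written as an elision.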

Results and benchmarking

The overall effect of this process is that deadcode elimination is now sensitive to the liveness of individual fields within a block (rather than a block being entirely live or dead), and it can mark this liveness in the inputs and outputs of block functions. In the IR, functors are represented as functions from one block to another, where the member functions constitute the elements of these blocks. With this new pass, we can mark which member functions are live and remove the rest.

In practice, this means that if you instantiate a functor (like Set.Make in the stdlib) and use just a few functions from its interface (like add, find, etc) then the other 30 functions provided by the functor will be eliminated from the JS. This already occurs if the functor is instantiated only once, since in that case the compiler can specialize the block function to just a block and eliminate unused code inside. However, this could not occur if the functor is instantiated more than once.
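To make the "function from one block to another" view concrete, here is a hand-written analogue in plain OCaml. It is an illustration of the shape of the problem and not the compiler's IR: the record types and make_set are hypothetical, with records standing in for blocks.

```ocaml
(* The argument "module": a block with a single member. *)
type 'a ord = { compare : 'a -> 'a -> int }

(* The result "module": a block whose fields are the member functions. *)
type 'a set_ops = {
  empty : 'a list;
  add : 'a -> 'a list -> 'a list;
  mem : 'a -> 'a list -> bool;
  cardinal : 'a list -> int;
}

(* The "functor": a function from one block to another. *)
let make_set (o : 'a ord) : 'a set_ops =
  let mem x s = List.exists (fun y -> o.compare x y = 0) s in
  let add x s = if mem x s then s else x :: s in
  { empty = []; add; mem; cardinal = List.length }

(* Two instantiations, so the function cannot be specialized to one block. *)
let int_set = make_set { compare = Int.compare }
let string_set = make_set { compare = String.compare }

(* Only empty, add and mem are ever projected from the results. *)
let used =
  int_set.mem 1 (int_set.add 1 int_set.empty)
  && string_set.mem "hello" (string_set.add "hello" string_set.empty)
```

No call site reads the cardinal field of either result, which is the situation where a field-sensitive analysis can mark that field as dead in the block returned by make_set even though make_set is called twice.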

Here's a minimal example demonstrating the effect. This program instantiates integer and string sets, and uses a few functions from the Set interface.

module Int_set = Set.Make (Int)
module String_set = Set.Make (String)

(* Wrapped in let () so the expression is valid after the module definitions. *)
let () =
  let int_set = Int_set.singleton 1 in
  let string_set = String_set.empty in
  let string_set = String_set.add "hello" string_set in
  print_endline (string_of_int (Int_set.find 1 int_set) ^ String_set.find "hello" string_set)

If we compile this to JS with and without the new pass, we get the following results:

Name	Size	Size Difference	Size % Change
sets.js	32.6 KiB	0.0 B	0.0%
sets-gdc.js	23.6 KiB	-9.0 KiB	-27.66%

We see that a large portion of this small program was taken up by the definitions of all Set interface functions, which can now be removed. Indeed, we expect to see the most significant changes from this optimization when the input program is small and uses large functor interfaces.

It should be noted that the size of the code removed grows in relation to the number of functors used and (inversely) to the number of functions used from those interfaces. In a small program like this one, the 5-10kb removed by this optimization for each functor can be significant, but for larger programs the percent change will be much smaller.

For instance, another benchmark we used is the catala_web_interpreter:

Name	Size	Size Difference	Size % Change
catala_web_interpreter.js	3.9 MiB	0.0 B	0.0%
catala_web_interpreter-gdc.js	3.9 MiB	-8.8 KiB	-0.22%

In this case we can remove 8.8 KiB, or just 0.22% of the program.

We also observe a modest decrease in size in another benchmark, the toplevel example that uses lwt. The source can be found in /toplevel/examples/lwt_toplevel/toplevel.ml:

Name	Size	Size Difference	Size % Change
toplevel.js	3.8 MiB	0.0 B	0.0%
toplevel-gdc.js	3.7 MiB	-28.2 KiB	-0.73%

Here we remove 28kb of code or a little less than 1% of the total size.

One real-world example that we saw encouraging results for was using the library ocamlgraph, which exposes a large functor interface to graph algorithms abstracted over the graph data structure. We compiled the demo found in the library source, and obtained the following:

Name	Size	Size Difference	Size % Change
ocamlgraph-demo.js	151.1 KiB	0.0 B	0.0%
ocamlgraph-demo-gdc.js	137.2 KiB	-13.9 KiB	-9.2%

Here we're able to remove about 14 KiB, or 9.2% of the program, which instantiates several of the functors provided by the library.

Conclusion

Overall, we expect this optimization to be useful for small web programs that want to use a functor interface like Set, Map or ocamlgraph without unnecessarily increasing the code size by 5-20kb. Larger programs may see a significant change if they internally expose many functor interfaces where they don't use many of the provided functions.

This change may also cause a small increase in compile times for some programs. I tested this by compiling ocamlc and the toplevel example using the hyperfine benchmarking program, and these were the results:

Benchmark 1: js_of_ocaml `which ocamlc.byte` -o ocamlc.js
  Time (mean ± σ):      9.020 s ±  1.069 s    [User: 8.845 s, System: 0.149 s]
  Range (min … max):    8.319 s … 11.888 s    10 runs

Benchmark 1: js_of_ocaml --enable globaldeadcode `which ocamlc.byte` -o ocamlc.js
  Time (mean ± σ):      9.542 s ±  0.883 s    [User: 9.319 s, System: 0.169 s]
  Range (min … max):    8.872 s … 11.885 s    10 runs

Benchmark 1: js_of_ocaml ./bc/toplevel.bc -o toplevel.js
  Time (mean ± σ):     18.680 s ±  1.354 s    [User: 18.312 s, System: 0.324 s]
  Range (min … max):   17.916 s … 22.493 s    10 runs

Benchmark 1: js_of_ocaml --enable globaldeadcode ./bc/toplevel.bc -o toplevel.js
  Time (mean ± σ):     18.875 s ±  1.124 s    [User: 18.530 s, System: 0.313 s]
  Range (min … max):   18.327 s … 21.996 s    10 runs

So on average the pass adds about 0.5 s on ocamlc and about 0.2 s on toplevel.
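For reference, the overhead can be recomputed from the hyperfine means above. The helper below is a small sketch written for this comment (overhead is a hypothetical name); it only uses the four mean values reported in the benchmark output.

```ocaml
(* Absolute (seconds) and relative (percent) difference between two means. *)
let overhead ~base ~with_pass =
  let delta = with_pass -. base in
  (delta, 100. *. delta /. base)

let () =
  List.iter
    (fun (name, base, with_pass) ->
      let delta, percent = overhead ~base ~with_pass in
      Printf.printf "%s: +%.3f s (%.1f%%)\n" name delta percent)
    [ ("ocamlc", 9.020, 9.542); ("toplevel", 18.680, 18.875) ]
```

This gives roughly +0.52 s (about 5.8%) for ocamlc and +0.20 s (about 1.0%) for toplevel. Both differences are smaller than the standard deviation reported for the corresponding runs, so they should be read as rough estimates.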

Future Work

We found during testing that the optimization can fail to remove code from nested functors (i.e. functors that take other functors as arguments), such as in the interfaces exposed by tyxml. We made progress on implementing a fix for this, but we didn't finish it, so we decided not to include it here.

hhugo · 2023-10-26T10:52:16Z

When turning the optimization on by default, I see test failures in the following places:

compiler/tests-ocaml/lib-printf/
compiler/tests-ocaml/lib-format/
toplevel/test

They all seem to involve printf/format.

diff --git a/toplevel/test/test_toplevel.reference b/toplevel/test/test_toplevel.reference
index 2ab06fc0ce..0298111ab7 100644
--- a/toplevel/test/test_toplevel.reference
+++ b/toplevel/test/test_toplevel.reference
@@ -3,7 +3,6 @@ external parseInt : float -> int = "parseInt"
 let f = 3.14
 let () = Printf.printf "parseInt(%f) = %d\n" f (parseInt f);;
 Dynlink: looking for symbol parseInt
-parseInt(3.140000) = 3
 external parseInt : float -> int = "parseInt"
 val f : float = 3.14

micahcantor · 2023-10-30T15:37:16Z

Hm, I haven't seen this before in earlier versions with it on by default. I took a quick look and nothing stood out to me. I'll try to take a closer look again soon.

Edit: In some of the failed tests, it looks like it could be an ordering problem? (If I'm reading the test output correctly)

- 190 191 192 193 194 195 196 197 198
+ 190 191 192 193 194 195
+********* Test number 195 failed ***********
+ 196 197 198

Like here, are we seeing the output 196 197 198 come after the test failed output? Maybe that's just an artifact of how the test is run though.

hhugo · 2023-11-11T08:44:27Z

Thanks a lot for such a big contribution.

micahcantor · 2023-11-11T16:03:52Z

Thank you for all the help getting this merged!!

OlivierNicole · 2023-11-13T10:26:12Z

Congratulations @micahcantor for the merge! This is a great feature. Responding to the reviewers took substantial work, so thank you for spending time on this.

hhugo · 2023-11-29T07:53:22Z

The global DCE does not preserve tail calls. We don't care normally since JavaScript does not support tail calls, but this makes a difference for the CPS transformation.

Maybe we can disable this transformation when effects are enabled?

js_of_ocaml/compiler/lib/global_deadcode.ml

Line 355 in 8d1841f

| Return x, loc -> Return (zero_var x), loc

It seems that doing so can be incorrect because it could keep live some dead code that uses other zero-ed values. I'll this change,

hhugo · 2023-11-29T07:55:32Z

cc @vouillon @micahcantor, see my comment above

CHANGES:

## Features/Changes
* Compiler: global dead code elimination (Micah Cantor, ocsigen/js_of_ocaml#1503)
* Compiler: change control-flow compilation strategy (ocsigen/js_of_ocaml#1496)
* Compiler: loop no longer absorb the whole continuation
* Compiler: Dead code elimination of unused references (ocsigen/js_of_ocaml#2076)
* Compiler: reduce memory consumption (ocsigen/js_of_ocaml#1516)
* Compiler: support for import and export construct in the js parser/printer
* Lib: add download attribute to anchor element
* Misc: switch CI to OCaml 5.1
* Misc: preliminary support for OCaml 5.2
* Misc: support for OCaml 5.1.1

## Bug fixes
* Runtime: fix Dom_html.onIE (ocsigen/js_of_ocaml#1493)
* Runtime: add conversion functions + strict equality for compatibility with Wasm_of_ocaml (ocsigen/js_of_ocaml#1492)
* Runtime: Dynlink should be able to find symbols in jsoo_runtime ocsigen/js_of_ocaml#1517
* Runtime: fix Unix.lstat, Unix.LargeFile.lstat (ocsigen/js_of_ocaml#1519)
* Compiler: fix global flow analysis (ocsigen/js_of_ocaml#1494)
* Compiler: fix js parser/printer wrt async functions (ocsigen/js_of_ocaml#1515)
* Compiler: fix free variables pass wrt parameters' default value (ocsigen/js_of_ocaml#1521)
* Compiler: fix free variables for classes
* Compiler: fix internal invariant (continuation)
* Compiler: fix variable renaming for let, const and classes
* Lib: Url.Current.set_fragment need not any urlencode (ocsigen/js_of_ocaml#1497)

micahcantor and others added 30 commits June 21, 2023 14:59

init

ca8948a

Merge branch 'ocsigen:master' into new_deadcode

a39b91a

more progress

31fe9f0

Merge branch 'new_deadcode' of https://github.com/micahcantor/js_of_o…

ee93b38

…caml into new_deadcode

prep for debugging

55e413d

more updates, debugging info

c63bd52

updates

dbdf3bb

more updates

4a389e3

initial support for annotating blocks

f8e73ef

more updates

7d6e5bf

join propagation

e42a15b

remove old comments

a37fa06

add block param defs

647d695

fix equality, dep contribution

ccc5dd5

move some logic from defs to deps

cacad1d

clean up small bugs

3b7e811

don't count closure params as used

85ed6ae

add initial elimination alg

eb7c895

rearranging

20ca40d

update deadcode sig

49d5f9a

expose expr print

bfd58d5

fix filter args and closure cont

b3db3dd

fix bug with cond

3a97925

add sentinal var

4e8097a

add basic compaction block pass

c940f38

fix pushtrap variable

a61e0b2

expose print constant

12fdc47

initial global support

56902fe

more global support

49464e0

update print uses

936c3e4

Compiler: add purity for jsoo-special primitives

99caa92

make globaldeadcode default on

ddc3957

vouillon reviewed Oct 30, 2023

compiler/lib/global_deadcode.ml (outdated, resolved)

micahcantor and others added 4 commits October 30, 2023 14:16

Fix offset_ref liveness

1c492f0

promote output changes from default on

70664e9

Merge branch 'master' into new_deadcode

acd3908

tune

8d1841f

vouillon approved these changes Nov 9, 2023

micahcantor and others added 2 commits November 9, 2023 19:53

change log

47a69f9

Merge branch 'master' into new_deadcode

de564a6

hhugo reviewed Nov 10, 2023

compiler/tests-compiler/gh1007.ml (outdated, resolved)

hhugo added 4 commits November 10, 2023 08:46

no global deadcode if no deadcode

d4c57b9

rm trailing space

dffa09a

accept

1a135ea

doc

a18be46

hhugo merged commit e2bd24e into ocsigen:master Nov 11, 2023
15 checks passed

This was referenced Dec 1, 2023

[new release] js_of_ocaml (7 packages) (5.5.0) ocaml/opam-repository#24883

Closed

[new release] js_of_ocaml (7 packages) (5.5.1) ocaml/opam-repository#24888

Closed

This was referenced Dec 5, 2023

Compiler: restore TCO of mutually recursive functions #1539

Merged

[new release] js_of_ocaml (7 packages) (5.5.2) ocaml/opam-repository#24896

Merged

hhugo mentioned this pull request Jan 21, 2024

Compiler: fix toplevel with globaldeacode #1556

Merged
