Global Deadcode Elimination #1503

Merged
merged 134 commits into from Nov 11, 2023
micahcantor
Contributor

@micahcantor micahcantor commented Aug 3, 2023

This PR adds a new optimization pass to perform a global and block-aware liveness analysis. The primary purpose of this is to be able to remove unused code from functors that are instantiated more than once, a limitation of the current deadcode elimination algorithm. This issue has been known for some time, see #595.

Since this change is somewhat involved, I will split this comment into a few sections to make it easier to review. The commit history is messy, but the PR should be ready to review file-by-file. I've worked closely with @vouillon and @OlivierNicole on these changes, but there are still some remaining questions to address.

New changes

The primary new contribution is found in deadcode_dgraph.ml. There are four main steps to the analysis it performs:

  1. Collect all variable definitions and build the variable dependency graph (the usages function). A variable x is considered used in a variable y if either x appears in the definition of y, or x is applied as a block or closure argument to parameter y.
  2. An initial liveness analysis is performed by traversing the program AST (the liveness function). Here we mark a variable x as Top if it's used in an impure expression or instruction (more details are given in the doc comment). Otherwise it is marked as dead. This pass uses information from Global_flow to determine whether a return value is used at its callsites.
  3. The initial liveness state is propagated through the dependency graph using the flow graph solver (the solver and propagate functions). For each variable x in the graph, its liveness is defined as the join of its current liveness and the contribution of each of its usages y. More detail is given in the comment for contribution, but here we can determine if x is used only in a single field i of a block y, in which case x depends only on that field. Then it's marked as Live {i}.
  4. Using the completed liveness table, the dead variables and unused block fields are "zeroed" out. This is done by adding a sentinel variable to the beginning of the program, defined as the JS value undefined. Any dead variables are replaced by the sentinel. This means the existing deadcode elimination should be able to remove these usages from the code, reducing the size. At this stage, we also truncate blocks that end with one or more sentinel values to the last non-sentinel variable.
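Steps 2 and 3 can be pictured with a small, self-contained model. This is only a sketch of the idea, not the code in deadcode_dgraph.ml: the type names, the usage kinds, the naive fixpoint loop, and the integer variable numbering are all assumptions made for illustration.

```ocaml
(* Toy liveness domain, ordered Dead < Live {fields} < Top.
   All names here are hypothetical, not the PR's actual definitions. *)
module IntSet = Set.Make (Int)

type live =
  | Dead
  | Live of IntSet.t (* only these block fields are used *)
  | Top (* the whole value is used *)

let join a b =
  match a, b with
  | Top, _ | _, Top -> Top
  | Dead, l | l, Dead -> l
  | Live s, Live t -> Live (IntSet.union s t)

let equal a b =
  match a, b with
  | Dead, Dead | Top, Top -> true
  | Live s, Live t -> IntSet.equal s t
  | _ -> false

(* How a variable x is used by a variable y (assumed, simplified kinds). *)
type usage =
  | Stored_in of int (* y is a block and x is its field i *)
  | Projected of int (* y is defined as field i of x *)
  | Other (* any other appearance of x in the definition of y *)

(* What the liveness of the user y contributes to x. *)
let contribution ~user usage =
  match user, usage with
  | Dead, _ -> Dead
  | _, Projected i -> Live (IntSet.singleton i)
  | Live fields, Stored_in i -> if IntSet.mem i fields then Top else Dead
  | Top, Stored_in _ | _, Other -> Top

(* Naive fixpoint: users.(x) lists the (y, usage) pairs that use x. *)
let solve (initial : live array) (users : (int * usage) list array) =
  let state = Array.copy initial in
  let changed = ref true in
  while !changed do
    changed := false;
    Array.iteri
      (fun x uses ->
        let v =
          List.fold_left
            (fun acc (y, u) -> join acc (contribution ~user:state.(y) u))
            state.(x) uses
        in
        if not (equal v state.(x)) then begin
          state.(x) <- v;
          changed := true
        end)
      users
  done;
  state

(* 0 = closure "add", 1 = closure "remove", 2 = block [add; remove],
   3 = field 0 of block 2, used in an impure expression (initially Top). *)
let result =
  solve
    [| Dead; Dead; Dead; Top |]
    [| [ 2, Stored_in 0 ]; [ 2, Stored_in 1 ]; [ 3, Projected 0 ]; [] |]
```

In this model the block ends up as Live {0}, the closure stored in field 0 becomes Top, and the closure stored in field 1 stays Dead, so step 4 could replace it with the sentinel.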

Outside of deadcode_dgraph.ml, there are a few changes in the driver and a few other functions to integrate this pass into the optimization pipeline. The interface to the pass, Deadcode_dgraph.f, is called in the function exact_calls in driver.ml. This function runs after most other optimizations, which is important since global flow will break if it is run beforehand with a different number of variables than it expects. Also, it ensures that this pass, which doesn't expose much opportunity for further optimization, is run only once before a final elimination pass.

We also added an optimization that deletes sentinel fields in arrays. Since the sentinel variable has the value undefined, the following JS transformation is valid:

[a, sentinel, b] -> [a, ,b]

This helps save a few extra bytes in the generated code. I made a few changes to the interfaces in generate.ml and mlvalue.ml to facilitate this optimization. There are a few other minor changes that expose information from global flow and expose an undefined primitive that it needs.
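As a rough illustration of the truncation from step 4 combined with the array elision described here, the sketch below models a block as a list of fields and prints it as a JS array literal. The field type and both function names are hypothetical and do not correspond to the actual code in generate.ml.

```ocaml
(* Toy model: a block field is either a named variable or the sentinel. *)
type field = Var of string | Sentinel

(* Drop trailing sentinels so the block ends at its last non-sentinel field. *)
let truncate (fields : field list) : field list =
  let rec drop = function
    | Sentinel :: rest -> drop rest
    | l -> l
  in
  List.rev (drop (List.rev fields))

(* Print a JS array literal, leaving a hole where a sentinel would appear. *)
let to_js (fields : field list) : string =
  let item = function Var v -> v | Sentinel -> "" in
  "[" ^ String.concat "," (List.map item (truncate fields)) ^ "]"
```

For example, to_js [Var "a"; Sentinel; Var "b"] yields "[a,,b]", while a block whose trailing fields are all sentinels is shortened before printing.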

Results and benchmarking

The overall effect of this process is that deadcode elimination is now sensitive to the liveness of individual fields within a block (rather than a block being entirely live or dead), and it can mark this liveness in the inputs and outputs of block functions. In the IR, functors are represented as functions from one block to another, where the member functions constitute the elements of these blocks. With this new pass, we can mark which member functions are live and remove the rest.

In practice, this means that if you instantiate a functor (like Set.Make in the stdlib) and use just a few functions from its interface (like add, find, etc) then the other 30 functions provided by the functor will be eliminated from the JS. This already occurs if the functor is instantiated only once, since in that case the compiler can specialize the block function to just a block and eliminate unused code inside. However, this could not occur if the functor is instantiated more than once.

Here's a minimal example demonstrating the effect. This program instantiates integer and string sets, and uses a few functions from the Set interface.

module Int_set = Set.Make (Int)
module String_set = Set.Make (String)

let () =
  let int_set = Int_set.singleton 1 in
  let string_set = String_set.empty in
  let string_set = String_set.add "hello" string_set in
  print_endline (string_of_int (Int_set.find 1 int_set) ^ String_set.find "hello" string_set)

If we compile this to JS with and without the new pass, we get the following results:

| Name | Size | Size Difference | Size % Change |
|---|---|---|---|
| sets.js | 32.6 KiB | 0.0 B | 0.0% |
| sets-gdc.js | 23.6 KiB | -9.0 KiB | -27.66% |

We see that a large portion of this small program was taken up by the definitions of all Set interface functions, which can now be removed. Indeed, we expect to see the most significant changes from this optimization when the input program is small and uses large functor interfaces.

It should be noted that the size of the code removed grows in relation to the number of functors used and (inversely) to the number of functions used from those interfaces. In a small program like this one, the 5-10kb removed by this optimization for each functor can be significant, but for larger programs the percent change will be much smaller.

For instance, another benchmark we used is the catala_web_interpreter:

| Name | Size | Size Difference | Size % Change |
|---|---|---|---|
| catala_web_interpreter.js | 3.9 MiB | 0.0 B | 0.0% |
| catala_web_interpreter-gdc.js | 3.9 MiB | -8.8 KiB | -0.22% |

In this case we can remove 8.8 KiB, or just 0.22% of the program.

We also observe a modest decrease in size in another benchmark on toplevel code using lwt. The source can be found in /toplevel/examples/lwt_toplevel/toplevel.ml:

| Name | Size | Size Difference | Size % Change |
|---|---|---|---|
| toplevel.js | 3.8 MiB | 0.0 B | 0.0% |
| toplevel-gdc.js | 3.7 MiB | -28.2 KiB | -0.73% |

Here we remove 28kb of code or a little less than 1% of the total size.

One real-world example that we saw encouraging results for was using the library ocamlgraph, which exposes a large functor interface to graph algorithms abstracted over the graph data structure. We compiled the demo found in the library source, and obtained the following:

| Name | Size | Size Difference | Size % Change |
|---|---|---|---|
| ocamlgraph-demo.js | 151.1 KiB | 0.0 B | 0.0% |
| ocamlgraph-demo-gdc.js | 137.2 KiB | -13.9 KiB | -9.2% |

Here we're able to remove about 14 KiB, or 9.2% of the program, which instantiates several of the functors provided by the library.

Conclusion

Overall, we expect this optimization to be useful for small web programs that want to use a functor interface like Set, Map or ocamlgraph without unnecessarily increasing the code size by 5-20kb. Larger programs may see a significant change if they internally expose many functor interfaces where they don't use many of the provided functions.

This change may also cause a small increase in compile times for some programs. I tested this by compiling ocamlc and the toplevel example using the hyperfine benchmarking program, and these were the results:

Benchmark 1: js_of_ocaml `which ocamlc.byte` -o ocamlc.js
  Time (mean ± σ):      9.020 s ±  1.069 s    [User: 8.845 s, System: 0.149 s]
  Range (min … max):    8.319 s … 11.888 s    10 runs

Benchmark 1: js_of_ocaml --enable globaldeadcode `which ocamlc.byte` -o ocamlc.js
  Time (mean ± σ):      9.542 s ±  0.883 s    [User: 9.319 s, System: 0.169 s]
  Range (min … max):    8.872 s … 11.885 s    10 runs

Benchmark 1: js_of_ocaml ./bc/toplevel.bc -o toplevel.js
  Time (mean ± σ):     18.680 s ±  1.354 s    [User: 18.312 s, System: 0.324 s]
  Range (min … max):   17.916 s … 22.493 s    10 runs

Benchmark 1: js_of_ocaml --enable globaldeadcode ./bc/toplevel.bc -o toplevel.js
  Time (mean ± σ):     18.875 s ±  1.124 s    [User: 18.530 s, System: 0.313 s]
  Range (min … max):   18.327 s … 21.996 s    10 runs

So on ocamlc the pass adds about 0.5 s and on toplevel it adds about 0.2 s on average.
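The overhead can be recomputed from the mean times in the hyperfine output. The helper below is only a convenience sketch (the function name is made up); it uses the reported means and ignores the run-to-run variation, which in both benchmarks is larger than the difference between the means.

```ocaml
(* Absolute (seconds) and relative (percent) overhead of enabling the pass,
   computed from the mean wall-clock times reported by hyperfine. *)
let overhead ~baseline ~with_pass =
  let delta = with_pass -. baseline in
  delta, 100. *. delta /. baseline

let () =
  List.iter
    (fun (name, baseline, with_pass) ->
      let delta, pct = overhead ~baseline ~with_pass in
      Printf.printf "%s: +%.3f s (+%.1f%%)\n" name delta pct)
    [ "ocamlc", 9.020, 9.542; "toplevel", 18.680, 18.875 ]
```

With these inputs the relative overhead comes out at roughly 5.8% for ocamlc and 1.0% for toplevel.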

Future Work

We found during testing that the optimization can fail to remove code from nested functors (i.e. functors that take other functors as arguments), such as in the interfaces exposed by tyxml. We made progress on implementing a fix for this, but we didn't finish, so we decided not to include it here.

@hhugo
Member

hhugo commented Oct 26, 2023

When turning the optimization on by default, I see test failures in the following places:

  • compiler/tests-ocaml/lib-printf/
  • compiler/tests-ocaml/lib-format/
  • toplevel/test

They all seem to involve printf/format.

diff --git a/toplevel/test/test_toplevel.reference b/toplevel/test/test_toplevel.reference
index 2ab06fc0ce..0298111ab7 100644
--- a/toplevel/test/test_toplevel.reference
+++ b/toplevel/test/test_toplevel.reference
@@ -3,7 +3,6 @@ external parseInt : float -> int = "parseInt"
 let f = 3.14
 let () = Printf.printf "parseInt(%f) = %d\n" f (parseInt f);;
 Dynlink: looking for symbol parseInt
-parseInt(3.140000) = 3
 external parseInt : float -> int = "parseInt"
 val f : float = 3.14

@micahcantor
Contributor Author

micahcantor commented Oct 30, 2023

Hm, I haven't seen this before in earlier versions with it on by default. I took a quick look and nothing stood out to me, I'll try to take a closer look again soon.

Edit: In some of the failed tests, it looks like it could be an ordering problem? (If I'm reading the test output correctly)

- 190 191 192 193 194 195 196 197 198
+ 190 191 192 193 194 195
+********* Test number 195 failed ***********
+ 196 197 198

Like here, are we seeing the output 196 197 198 come after the test failed output? Maybe that's just an artifact of how the test is run though.

@hhugo hhugo merged commit e2bd24e into ocsigen:master Nov 11, 2023
15 checks passed
@hhugo
Member

hhugo commented Nov 11, 2023

Thanks a lot for such a big contribution.

@micahcantor
Contributor Author

Thank you for all the help getting this merged!!

@OlivierNicole
Contributor

Congratulations @micahcantor for the merge! This is a great feature. Responding to the reviewers took substantial work, so thank you for spending time on this.

@hhugo
Member

hhugo commented Nov 29, 2023

The global DCE does not preserve tail calls. We don't care normally since JavaScript does not support tail calls, but this makes a difference for the CPS transformation.

Maybe we can disable this transformation when effects are enabled?

| Return x, loc -> Return (zero_var x), loc

It seems that doing so can be incorrect because it could keep live some dead code that uses other zero-ed values. I'll this change,
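One way to read the tail-call concern is sketched below on a toy IR. This is an interpretation, not the compiler's real data structures: the instr, last and block types, the sentinel name, and both functions are invented for the example. The point is that a block ending in "x = f(...); return x" is a tail call, and rewriting the return to the sentinel leaves a call that is no longer in tail position.

```ocaml
(* Hypothetical miniature IR: a block is a list of calls followed by a return. *)
type var = string
type instr = Let_call of var * string (* x = f(...) *)
type last = Return of var
type block = { body : instr list; branch : last }

(* The block ends in a tail call when it returns exactly the result of
   its final call. *)
let ends_in_tail_call { body; branch = Return r } =
  match List.rev body with
  | Let_call (x, _) :: _ -> String.equal x r
  | [] -> false

(* Assumed name for the variable bound to the JS value undefined. *)
let sentinel = "undefined_sentinel"

(* Mimics the quoted case: the returned variable is replaced by the
   sentinel when the liveness result says it is dead. *)
let zero_return ~is_dead ({ branch = Return x; _ } as b) =
  if is_dead x then { b with branch = Return sentinel } else b

let before = { body = [ Let_call ("x", "f") ]; branch = Return "x" }
let after = zero_return ~is_dead:(fun _ -> true) before
```

Here before is recognized as a tail call and after is not, which is the kind of change that would matter to a later CPS transformation.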

@hhugo
Copy link
Member

hhugo commented Nov 29, 2023

cc @vouillon @micahcantor, see my comment above

hhugo pushed a commit to hhugo/opam-repository that referenced this pull request Dec 4, 2023
CHANGES:

## Features/Changes
* Compiler: global dead code elimination (Micah Cantor, ocsigen/js_of_ocaml#1503)
* Compiler: change control-flow compilation strategy (ocsigen/js_of_ocaml#1496)
* Compiler: loop no longer absorb the whole continuation
* Compiler: Dead code elimination of unused references (ocsigen/js_of_ocaml#2076)
* Compiler: reduce memory consumption (ocsigen/js_of_ocaml#1516)
* Compiler: support for import and export construct in the js parser/printer
* Lib: add download attribute to anchor element
* Misc: switch CI to OCaml 5.1
* Misc: preliminary support for OCaml 5.2
* Misc: support for OCaml 5.1.1

## Bug fixes
* Runtime: fix Dom_html.onIE (ocsigen/js_of_ocaml#1493)
* Runtime: add conversion functions + strict equality for compatibility with Wasm_of_ocaml (ocsigen/js_of_ocaml#1492)
* Runtime: Dynlink should be able to find symbols in jsoo_runtime ocsigen/js_of_ocaml#1517
* Runtime: fix Unix.lstat, Unix.LargeFile.lstat (ocsigen/js_of_ocaml#1519)
* Compiler: fix global flow analysis (ocsigen/js_of_ocaml#1494)
* Compiler: fix js parser/printer wrt async functions (ocsigen/js_of_ocaml#1515)
* Compiler: fix free variables pass wrt parameters' default value (ocsigen/js_of_ocaml#1521)
* Compiler: fix free variables for classes
* Compiler: fix internal invariant (continuation)
* Compiler: fix variable renaming for let, const and classes
* Lib: Url.Current.set_fragment need not any urlencode (ocsigen/js_of_ocaml#1497)
mseri pushed a commit to ocaml/opam-repository that referenced this pull request Dec 6, 2023