IRC, iterated register coalescing (draft implementation) #678

xclerc · 2022-06-02T15:57:52Z

Overview

The code is mainly a direct (naive) translation of the
paper (https://dl.acm.org/doi/abs/10.1145/229542.229546), except for
the freeze_moves function, which is taken from the book
(https://www.cs.princeton.edu/~appel/modern/ml/).
Small/trivial additions to the paper include:

- handling of register classes, by using per-class values for k
  (an early version added interference between registers of
  different classes, which results in a denser graph);
- use of Proc.destroyed_at_xyz;
- handling of written-but-never-read temporaries;
- a "naive" splitting, whose objective is to introduce a number of
  spill/reload moves linear in the number of destruction points, rather
  than linear in the number of reads/writes when a temporary is spilled;
- if there are destruction points in the function, rewrite is called
  before main (otherwise, the first round of the loop would basically
  discover that the values live at the destruction points need to be
  spilled, and then call rewrite before launching a second round; calling
  rewrite beforehand, when we know temporaries will be spilled, hence
  saves one round);
- lots of log statements.
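Below is a minimal sketch of the per-class k idea. The two register classes and the color counts are hypothetical; the real values come from Proc and differ per architecture. A node counts as low-degree when its degree is below the k of its own class, and nodes of different classes never interfere.

```ocaml
(* Hypothetical register classes; the allocator derives the real ones
   from Proc (e.g. integer vs. floating-point registers). *)
type reg_class = Int | Float

(* Made-up number of allocatable colors per class (an assumption). *)
let k_of_class = function Int -> 13 | Float -> 16

type node = { cls : reg_class; degree : int }

(* Low-degree (trivially colorable) nodes are compared against the k
   of their own class rather than against a single global k. *)
let is_low_degree (n : node) : bool = n.degree < k_of_class n.cls

(* Interference is only recorded between nodes of the same class,
   which keeps the graph sparser than adding cross-class edges. *)
let may_interfere (a : node) (b : node) : bool = a.cls = b.cls
```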

Code

New modules:

- Cfg_regalloc_utils contains generic utility functions;
- Cfg_irc_utils contains more utility functions, used specifically by
  the IRC allocator;
- Cfg_irc_state defines the state of the IRC allocator; it is abstract
  so that it is (relatively) easy to change the representation (e.g.
  of the working sets/lists) with no impact on the core of the allocator;
- Cfg_irc_split implements a pre-processing phase that performs
  a kind of "naive" or "pseudo" splitting at destruction points;
  essentially, if a register t is live at a destruction point (currently
  only calls), we rewrite:
  ```
  destruction_point
  ```
  to:
  ```
  t' <- t
  destruction_point[t/t']
  t <- t'
  ```
  where t' is a fresh temporary;
- Cfg_irc contains the core of the algorithm.
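The rewrite performed by Cfg_irc_split can be illustrated on a toy instruction type. This is only a sketch. The constructors and the `live_across` and `fresh` parameters are hypothetical stand-ins for the CFG data structures and liveness information that the real pass uses.

```ocaml
(* Toy instructions; the real pass rewrites Cfg basic blocks. *)
type instr =
  | Move of string * string   (* Move (dst, src) *)
  | Call of string list       (* a destruction point and its arguments *)

(* For every temporary [t] live across the call, emit
     t' <- t ; call[t/t'] ; t <- t'
   so that only the fresh copy [t'] is live across the call. *)
let split_at_call ~(live_across : string list) ~(fresh : string -> string)
    (args : string list) : instr list =
  let pairs = List.map (fun t -> (t, fresh t)) live_across in
  let subst r =
    match List.assoc_opt r pairs with Some r' -> r' | None -> r
  in
  let before = List.map (fun (t, t') -> Move (t', t)) pairs in
  let after = List.map (fun (t, t') -> Move (t, t')) pairs in
  before @ [ Call (List.map subst args) ] @ after
```

Suppose `x` is live across a call that takes `x` and `y`. The sketch then yields `[Move ("x'", "x"); Call ["x'"; "y"]; Move ("x", "x'")]`. If `x'` is later spilled, the two moves become the single spill/reload pair for that destruction point.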

Modified modules:

- Asmgen is tweaked to allow the selection of the register allocator,
  and to collect metrics;
- Cfg is augmented with utility functions;
- Cfg_intf is tweaked so that two fields of instruction, namely
  arg and res, are now mutable, and a new field is added to store
  the work list the instruction belongs to;
- Cfg_dataflow and Cfg_liveness are tweaked so that the transfer
  function takes an additional parameter indicating whether the instruction
  can raise;
- Cfg_equivalence is tweaked to disable the comparison of live sets on
  certain instructions;
- Emit is augmented to collect metrics (number of instructions, moves,
  spills and reloads);
- Reg is tweaked so that a register now holds its class, its work list,
  its color, and its alias.
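The alias field supports the paper's GetAlias operation. Once a register has been coalesced into another one, every lookup must follow the alias chain. The sketch below uses a hypothetical, simplified register record (the real Reg.t has many more fields). It also assumes that `alias = Some _` can serve as the membership test for coalesced nodes.

```ocaml
(* Hypothetical, simplified register record. *)
type reg = {
  id : int;
  mutable alias : reg option;  (* [Some r] once coalesced into [r] *)
}

(* GetAlias: follow alias links until reaching a register that has
   not been coalesced into another one. *)
let rec get_alias (r : reg) : reg =
  match r.alias with None -> r | Some r' -> get_alias r'

(* Coalescing [v] into [u] only records the alias here; the paper's
   Combine also merges move lists and interference edges. *)
let combine ~(u : reg) ~(v : reg) : unit = v.alias <- Some u
```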

Use

At this point, the register allocators are enabled and controlled by
environment variables. REGISTER_ALLOCATOR selects the register
allocator:

- if the variable is unset, the existing register allocator is used;
- if the variable is set to irc (case is ignored), the IRC allocator
  is used.
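Below is a minimal sketch of the selection logic described above, built on a hypothetical `allocator` type. The text does not say how unrecognized values are handled, so rejecting them here is an assumption.

```ocaml
type allocator = Existing | Irc

(* Unset -> existing allocator; "irc" in any case -> IRC.
   Rejecting other values is an assumption of this sketch. *)
let allocator_of_env (value : string option) : allocator =
  match Option.map String.lowercase_ascii value with
  | None -> Existing
  | Some "irc" -> Irc
  | Some other -> invalid_arg ("unknown register allocator: " ^ other)

let selected_allocator () : allocator =
  allocator_of_env (Sys.getenv_opt "REGISTER_ALLOCATOR")
```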

All allocators can use the REGALLOC_STATS variable to enable the production
of metrics. When the variable is set, its value is interpreted as the path to
a directory where quasi-CSV files will be produced. One file is produced
per compiler invocation, and its base name is the MD5 digest of the
command-line parameters. The first line of the file contains the command-line
arguments, while the other lines are CSV contents, each line containing the
various metrics for a given function.
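The file naming can be sketched with the standard `Digest` module, which implements MD5. The text specifies neither how the arguments are joined nor the CSV column order. Joining with spaces and the column order below are both assumptions of this sketch.

```ocaml
(* Base name: hex MD5 digest of the command line (arguments joined by
   spaces, which is an assumption). *)
let stats_file_name ~(dir : string) (argv : string array) : string =
  let command_line = String.concat " " (Array.to_list argv) in
  Filename.concat dir (Digest.to_hex (Digest.string command_line))

(* One CSV line per function; the column order is hypothetical. *)
let csv_row ~fun_name ~instructions ~moves ~spills ~reloads : string =
  Printf.sprintf "%s,%d,%d,%d,%d" fun_name instructions moves spills reloads
```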

The IRC allocator can be further controlled through the following variables:

- IRC_VERBOSE to output a lot of debug statements;
- IRC_INVARIANTS to check invariants at each step of the process;
- IRC_SPLIT to select the kind of splitting to apply before register
  allocation proper (off for no splitting, naive for the pseudo
  (or naive) splitting described above);
- IRC_SPILLING_HEURISTICS to select the heuristics to use when
  choosing a register to spill (set-choose to randomly choose a
  register, flat-uses to choose the least-used register).
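The flat-uses heuristics can be sketched as picking the candidate with the fewest uses. The input format, an association list of (name, use count) pairs, is hypothetical. Breaking ties in favor of the earlier candidate is an arbitrary choice.

```ocaml
(* Return the least-used spill candidate, or [None] if there is none.
   On ties, the candidate appearing first is kept. *)
let choose_flat_uses (candidates : (string * int) list) : string option =
  match candidates with
  | [] -> None
  | first :: rest ->
    let best_name, _ =
      List.fold_left
        (fun (bn, bu) (n, u) -> if u < bu then (n, u) else (bn, bu))
        first rest
    in
    Some best_name
```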

Generated code

The current version of IRC is worse than the current register allocator
in at least two areas: spilling code and memory operands.

The algorithm from the original article produces a spill (resp. reload) after each
write (resp. before each read) of a register that is being spilled (this is the
behaviour with IRC_SPLIT set to off). The naive splitting produces a spill/reload
for each destruction point at which a spilled register is live (this is the
behaviour with IRC_SPLIT set to naive). Both are worse than the split pass
implemented by the existing register allocator.
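The difference can be made concrete with a deliberately simplified cost model. The model rests on two assumptions: a spilled temporary's occurrences are summarized as a hypothetical list of events, and under naive splitting only the copies live across destruction points end up spilled.

```ocaml
type event = Read | Write | Destruction_point

(* IRC_SPLIT=off: one reload before each read and one spill after
   each write of the spilled temporary. *)
let moves_spill_everywhere (events : event list) : int =
  List.length (List.filter (fun e -> e <> Destruction_point) events)

(* IRC_SPLIT=naive, under the assumption above: one spill and one
   reload per destruction point where the temporary is live. *)
let moves_naive_split (events : event list) : int =
  2 * List.length (List.filter (fun e -> e = Destruction_point) events)
```

For `[Write; Read; Read; Destruction_point; Read; Read; Read]`, the first function returns 6 and the second returns 2. For `[Write; Destruction_point; Destruction_point; Destruction_point; Read]`, the numbers are 2 and 6. Neither strategy dominates the other: which one emits fewer moves depends on the ratio of uses to destruction points.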

The logic from the Reload module is neither reused nor reimplemented, which means
that all operands are in hardware registers and that we do not take advantage of
the fact that some operations can accept one operand on the stack. Put
differently, the current implementation always reloads the spilled value into a
register before the operation, while the existing register allocator can use the
value directly from the stack.

Tests and benchmarks

The IRC allocator has been successfully tested (building the distribution,
and running the test suite from upstream) with closure, flambda and flambda2.

More numbers will be collected in the upcoming weeks, but here are some
rough numbers collected while building the standard library:

| splitting | one round | all |
|-----------|-----------|-----------|
| off | 0.81-0.86 | 1.10-1.11 |
| naive | 1.18-1.20 | 3.65-3.70 |

(The values above are irc_time / upstream_time ratios; "one round"
covers the functions for which the register allocator needs only one
round, while "all" covers all functions.)

Still on the standard library, if we break down per function and look
at the statistics, we get:

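| splitting | stat | one round | all |
|-----------|------|-------------|-------------|
| off | min | 0.28-0.32 | 0.20-0.38 |
| off | mean | 0.86-0.89 | 0.97-0.99 |
| off | max | 11.60-15.49 | 11.60-15.49 |
| naive | min | 0.20-0.26 | 0.18-0.22 |
| naive | mean | 0.99-1.00 | 1.09-1.10 |
| naive | max | 19.11-29.7 | 31.53-31.99 |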
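The sketch below shows one way per-function figures like these could be computed. It assumes one hypothetical sample per function, holding the number of rounds and the two timings. Whether the table was produced exactly this way (for example, as an arithmetic mean of ratios) is also an assumption.

```ocaml
(* Hypothetical per-function measurement. *)
type sample = { rounds : int; irc_time : float; upstream_time : float }

let ratio (s : sample) : float = s.irc_time /. s.upstream_time

(* min / mean / max of the per-function ratios, optionally restricted
   to functions allocated in a single IRC round. *)
let stats ?(one_round_only = false) (samples : sample list) =
  let selected =
    if one_round_only then List.filter (fun s -> s.rounds = 1) samples
    else samples
  in
  match List.map ratio selected with
  | [] -> None
  | r :: rs ->
    let mn = List.fold_left Float.min r rs in
    let mx = List.fold_left Float.max r rs in
    let mean = List.fold_left ( +. ) r rs /. float_of_int (1 + List.length rs) in
    Some (mn, mean, mx)
```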

The total build times for the compiler distribution (thus not only
measuring register allocation), relative to the existing allocator, are:

- with existing allocator: 1
- with IRC and no splitting: 1.04
- with IRC and naive splitting: 1.05

TO DO

- check whether the Reg.clas field is really useful
- check whether storing the liveness in the instructions would
  make a material difference
- work out what could be gained from other representations (both
  for the IRC state and CFG values)
- implement the "hierarchical" heuristics for spilling
- implement proper spilling
- implement the equivalent of (or simply reuse) Reload so that
  we do not always reload a spilled value into a register before an
  operation that can accept an operand from the stack

xclerc · 2022-06-02T15:59:22Z

(Caveat: the elements in the "Tests and benchmarks"
section were collected before rebasing from a1d36c7
to 9072c5d.)

gretay-js

I reviewed the code of this PR alongside the Tiger book, and it looks good to me. I think this PR should be merged as is, after it is rebased and the CI tests for cfg are re-enabled (they should work now, since PR#656 turned off equivalence checking by default). Further comments should be addressed in separate PRs.

I have a few minor questions in the diff of the PR, but they shouldn't block this from being merged. Additionally, suggestions / questions for later:

- Create a separate directory for regalloc (either under cfg or next to it, not sure).
- For testing, can we run a version of the reload pass once to guarantee that the constraints that Emit relies on are satisfied after IRC? This requires reimplementing the logic of reload on cfg.
- Separate the stats from Emit. Add a separate pass.
- Stats: spills in upstream are either Ispill or Imove with res on
  stack. Similarly for reloads. It may be useful to split Imove into
  spill and reload.
- What will ocaml-ir do if a pass is missing, e.g., Compiler_hooks.Mach_spill is never called in the IRC pipeline?
- Is it possible to refactor Asmgen.compile_fundecl to move stats out of it into a helper function, or somehow tidy up the code? It's a bit hard to follow (maybe it'll go away).
- In cfg_liveness: has_an_exceptional_successor, why is it needed?
- Document magic number "16" in cfg_dataflow.
- cfg_equivalence, check_live: can you please add a comment explaining why live is
  not checked for poptrap/pushtrap/prologue? These instructions don't exist in Mach. Do they get wrong made-up values somehow in Linear_to_cfg or Cfgize?
- If we have mutable fields in Cfg.instruction, we should make live mutable too again and see if it helps.
- remove_prologue_if_not_required: why does it need to be added on always, instead of adding it on after regalloc and only if required? (There was something about it being after debug-related instructions.)
- cfg_regalloc_utils.ml: maybe rename Fetch -> Load.
- Stats separate module.
- Cfg_regalloc_utils Instruction: something is wrong in our Cfg interface if we need this.
- Cfg_state: common interface to the various worklists?
- Is there a check that there is no edge between two Reg.t that are precolored in different colors?
- Assert the item is in the expected worklist before removing it (e.g., remove_spill_work_list).
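The last suggestion could take roughly the following shape: the work-list tag stored on each node (as described for Reg and Cfg_intf above) is checked before removal. The node type, the Hashtbl-based spill set and the constructor names are all hypothetical.

```ocaml
(* Hypothetical subset of the IRC work lists. *)
type work_list = Spill | Freeze | Simplify

type node = { id : int; mutable work_list : work_list option }

(* Remove [n] from the spill work list, asserting first that it is
   actually recorded there (both in its tag and in the set). *)
let remove_spill_work_list (spill_set : (int, node) Hashtbl.t) (n : node) :
    unit =
  assert (n.work_list = Some Spill);
  assert (Hashtbl.mem spill_set n.id);
  Hashtbl.remove spill_set n.id;
  n.work_list <- None
```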

backend/cfg/cfg_liveness.ml

backend/cfg/merge_straightline_blocks.ml

backend/reg.ml

backend/cfg/cfg_regalloc_utils.mli

backend/cfg/cfg_irc.ml

backend/cfg/cfg_irc_state.ml

backend/cfg/cfg_irc.ml

poechsel · 2022-07-07T10:34:53Z

> What will ocaml-ir do if a pass is missing. e.g, Compiler_hooks.Mach_spill is never called in the IRC pipeline?

Nothing apart from outputting an error message. I don't think many people rely on OCaml-IR supporting Mach_spill so I'm fine with IRC skipping Mach_spill.

gretay-js

I see that the CI tests with -ocamlcfg still fail after the rebase.
Is it possible that this line went missing in the rebase:
https://github.com/ocaml-flambda/flambda-backend/blob/d1ec75ab7892b2c814ee340303d94fc6f6e0eb40/backend/asmgen.ml#L349

mshinwell

Approving the changes to dune only.

gretay-js

The CI tests for ocamlcfg pass.

backend/.ocamlformat-enable

backend/asmgen.ml

backend/cfg/cfg_irc.ml

xclerc added cfg backend labels Jun 2, 2022

xclerc mentioned this pull request Jun 3, 2022

Improve the speed of register allocation (replace Set with Hashtbl) #553

Merged

mshinwell changed the title ~~IRC (draft implementation)~~ IRC, iterated register coalescing (draft implementation) Jun 3, 2022

xclerc marked this pull request as ready for review June 28, 2022 13:39

xclerc requested review from gretay-js and mshinwell as code owners June 28, 2022 13:39

gretay-js approved these changes Jul 6, 2022

xclerc added 3 commits July 7, 2022 14:17

Rebase

e579d7b

ocamlformat

7773a4b

Disable CFG tests.

55a9ab0

xclerc force-pushed the regalloc-irc branch from df468aa to 55a9ab0 July 7, 2022 13:20

xclerc added 2 commits July 7, 2022 14:37

Restore ocamlcfg tests.

b28daba

Prologue

856cd77

gretay-js requested changes Jul 11, 2022

mshinwell approved these changes Jul 11, 2022

xclerc added 2 commits July 20, 2022 16:55

Revert dataflow changes.

993eac6

Revert dataflow changes.

94fe091

gretay-js approved these changes Jul 20, 2022

gretay-js and others added 2 commits July 22, 2022 08:00

Format

146e4f0

Use the dedicated flag.

5f94e6e

gretay-js approved these changes Jul 24, 2022

xclerc added 6 commits July 25, 2022 10:19

Review

9480616

Review

97052cd

CRs

a60de35

Review

7ec5db2

ocamlformat

61585a2

Arm64

0b7efde

xclerc added 8 commits July 25, 2022 11:46

Review

6f3873f

Review

b202295

Review

717511b

Review

ccd57dd

ocamlformat

c5100fd

Fix review change

4c4bed0

Debugging tweaks.

f1a2c62

ocamlformat

e163875

gretay-js reviewed Aug 4, 2022

backend/.ocamlformat-enable Outdated

backend/asmgen.ml

backend/cfg/cfg_irc.ml

backend/cfg/cfg_irc.ml

gretay-js approved these changes Aug 4, 2022

gretay-js mentioned this pull request Aug 4, 2022

IRC & LS followup #758

Open

xclerc added 2 commits August 4, 2022 14:22

Review.

249b242

ocamlformat

68e980e

xclerc merged commit 6d40006 into main Aug 4, 2022

xclerc mentioned this pull request Aug 4, 2022

IRC: Better worklists #760

Merged
