Utilize dominators for constructing extended basic blocks #142

jserv · 2023-06-05T14:31:08Z

Quoted from "Basic Blocks and CFG", the definition of an extended basic block (EBB):

Extended basic block: a maximal sequence of instructions beginning with a leader that contains no join nodes other than its first node.
It has a single entry but possibly multiple exit points.
Some optimizations are more effective on extended basic blocks.

We can identify loops by using dominators:

A node A in the flow graph dominates a node B if every path from the entry node to B includes A.
This relation is antisymmetric, reflexive, and transitive.

Back edge: an edge in the flow graph whose head dominates its tail (e.g., the edge from B6 to B4).

A loop consists of all nodes dominated by its entry node (the head of the back edge) and contains exactly one back edge.

Intercept contains an effective dominator implementation. See

Usage:

```c
void codegen_optimise(CodegenContext *ctx) {
  opt_inline_global_vars(ctx);
  opt_analyse_functions(ctx);

  /// Optimise each function individually.
  do {
    foreach_ptr (IRFunction*, f, ctx->functions) {
      if (f->is_extern) continue;

      /// Rebuild the dominator tree and reorder blocks, then rerun the
      /// local passes until none of them reports a change.
      DominatorInfo dom = {0};
      do {
        build_dominator_tree(f, &dom, true);
        opt_reorder_blocks(f, &dom);
      } while (
          opt_const_folding_and_strengh_reduction(f) ||
          opt_dce(f) ||
          opt_mem2reg(f) ||
          opt_jump_threading(f, &dom) ||
          opt_tail_call_elim(f)
      );
      free_dominator_info(&dom);
    }

    /// Cross-function optimisations.
  } while (opt_inline_global_vars(ctx) || opt_analyse_functions(ctx));
}
```

Similarly, blink comes with an approach to detect loops during code generation.


qwe661234 · 2023-08-30T03:38:44Z

Previously, we used recursive jump translation to implement extended basic blocks. However, this approach makes it hard to detect loop paths: because the loop's entry block is placed inside the enclosing block, its use frequency is not updated.

Based on this observation, we eliminate the recursive jump translation so that the use frequency of the loop's entry block is updated accurately.

As the results below show, performance improves by 4% on CoreMark and by 37% on SciMark2, but drops by 1% on Dhrystone.

Intel Core i7-11700

| Metric    | Original | Proposed | Speedup |
|-----------|----------|----------|---------|
| CoreMark  | 2193.28  | 2289.26  | +4%     |
| SciMark2  | 13.45    | 18.48    | +37%    |
| Dhrystone | 1413.11  | 1447.11  | -1%     |

Previously, we used recursive jump translation to implement extended basic blocks. However, this approach makes it hard to detect loop paths: because the loop's entry block is placed inside the enclosing block, its use frequency is not updated. Based on this observation, we eliminate the recursive jump translation so that the use frequency of the loop's entry block is updated accurately. As the results below show, performance improves by 4% on CoreMark and by 37% on SciMark2, but drops by 1% on Dhrystone.

* Intel Core i7-11700

| Metric    | Original | Proposed | Speedup |
|-----------|----------|----------|---------|
| CoreMark  | 2193.28  | 2289.26  | +4%     |
| SciMark2  | 13.45    | 18.48    | +37%    |
| Dhrystone | 1413.11  | 1447.11  | -1%     |

See: sysprog21#142

jserv · 2023-09-17T10:40:49Z

The effectiveness has been confirmed. We will continue looking for faster approaches to loop detection.

jserv assigned qwe661234 Jun 5, 2023

qwe661234 mentioned this issue Jul 4, 2023: Use portable JIT compilation for accelerating RISC-V emulation #81 (Open)

qwe661234 mentioned this issue Aug 30, 2023: Avoid recursive jump for better loop detection #201 (Merged)

jserv closed this as completed Sep 17, 2023
