compiler: introduce an IR for the code generators #551

zerbina · 2023-02-20T20:45:41Z

Summary

Add the CgNode intermediate representation, change astgen (which is
renamed to cgirgen) to output it instead of PNode, and update all
code generators to use CgNode. In order to keep the changes required
for the transition low, the differences of the new IR compared to
PNode are kept minimal.

The intent is to have an IR that can be evolved independently from sem
and the macro API. Only PNode is replaced so far, but both PType
and PSym are planned to also get a dedicated version for use by the
code generators. In addition, PNode can now be evolved without the
code generators having to be considered.

Details

The core of the changes is the introduction of CgNode. It, for now,
also uses a ref-tree-based data-representation and is very similar,
in both structure and naming, to the PNode-subset previously used by
the code generators, but with some small simplifications/renamings
already applied where it makes sense.

Naming differences:

nkNimNodeLit -> cnkAstLit
nkCurly -> cnkSetConstr
nkBracket -> cnkArrayConstr
nkClosure -> cnkClosureConstr
nkHiddenDeref -> cnkDerefView
nkHiddenStdConv -> cnkHiddenConv
nkDiscardStmt -> cnkVoidStmt
nkExprColonExpr -> cnkBinding

Structural differences:

- the while statement is replaced with the repeat statement, since
  conditional loops (i.e. while statements) don't exist after the MIR
  phase. A cnkRepeatStmt node has a single sub-node, which is the
  loop's body
- there are no multi-branch if statements, since those don't exist
  during and after the MIR phase. A cnkIfStmt has the same structure
  as an nkElifBranch
- a cnkReturnStmt node has no sub-nodes
- instead of emit being represented via a pragma statement, a
  dedicated statement (cnkEmitStmt) is used
- cnkPragmaStmts don't have sub-nodes, but instead directly store the
  pragma's name
- cnkObjConstr doesn't have an extra type slot in the first position
- there is only a single literal-node kind each for the int, uint,
  float, and string types
- there are no dedicated nodes for of and else in case statements,
  both use cnkBranch
- there are no nkVarSection|nkLetSection + nkIdentDefs counterparts.
  Instead, definitions use the cnkDef node

Many more simplifications are possible, but they are left for future
PRs.
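
To make a few of the structural differences above concrete, here is a minimal, self-contained sketch. It is not the actual CgNode definition: the SketchKind/SketchNode types, their field names, and the helper procedures are made up for illustration, and only mirror the shapes described in the list (single-slot repeat, two-slot if, childless return, pragma name stored directly).

```nim
type
  SketchKind = enum
    skIntLit      # stands in for cnkIntLit
    skRepeatStmt  # stands in for cnkRepeatStmt
    skIfStmt      # stands in for cnkIfStmt
    skReturnStmt  # stands in for cnkReturnStmt
    skPragmaStmt  # stands in for cnkPragmaStmt

  SketchNode = ref object
    ## Hypothetical ref-tree node; not the real CgNode layout.
    case kind: SketchKind
    of skIntLit:
      intVal: BiggestInt
    of skPragmaStmt:
      name: string              # the pragma's name, stored directly
    of skReturnStmt:
      discard                   # no sub-nodes at all
    of skRepeatStmt, skIfStmt:
      kids: seq[SketchNode]

proc newRepeat(body: SketchNode): SketchNode =
  ## An unconditional loop: the body is the only sub-node.
  SketchNode(kind: skRepeatStmt, kids: @[body])

proc newIf(cond, body: SketchNode): SketchNode =
  ## A single-branch if: condition first, body second.
  SketchNode(kind: skIfStmt, kids: @[cond, body])

proc isWellFormed(n: SketchNode): bool =
  ## Checks the sub-node counts assumed by this sketch.
  case n.kind
  of skIntLit, skPragmaStmt, skReturnStmt:
    true
  of skRepeatStmt:
    n.kids.len == 1 and isWellFormed(n.kids[0])
  of skIfStmt:
    n.kids.len == 2 and isWellFormed(n.kids[0]) and isWellFormed(n.kids[1])
```

A consumer of such a tree would read the loop body from slot 0 of the repeat node, and the condition and body from slots 0 and 1 of the if node.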

Producing the IR

astgen is changed to produce CgNode instead of PNode IR. The
canUseView and flattenExpr procedures for PNode are copied and
adjusted for CgNode.

For literals, which are transported through the MIR phase as PNodes,
translation-to-CgNode logic is added (translateLit). For integer-like
literals, whether the output node is a cnkIntLit or cnkUIntLit node
is decided based on the input node's type, not on its node kind --
this helps catch issues with incorrectly typed integer literals. For
float32 literals, translateLit already narrows the value.
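
The two decisions described above (picking the literal kind from the type, and narrowing float32 values) can be sketched roughly as follows. This is an illustration under assumptions, not the actual translateLit: TypeKind, LitKind, Lit, translateIntLike, and translateFloat are hypothetical names that stand in for the compiler's real types and procedures.

```nim
type
  TypeKind = enum tkInt, tkUInt, tkFloat32, tkFloat64  # hypothetical
  LitKind = enum lkIntLit, lkUIntLit, lkFloatLit       # hypothetical

  Lit = object
    ## Hypothetical stand-in for a literal node of the new IR.
    case kind: LitKind
    of lkIntLit, lkUIntLit:
      intVal: BiggestInt
    of lkFloatLit:
      floatVal: BiggestFloat

proc translateIntLike(val: BiggestInt, typ: TypeKind): Lit =
  ## The output kind follows the *type*, not the kind of the input node.
  case typ
  of tkInt: Lit(kind: lkIntLit, intVal: val)
  of tkUInt: Lit(kind: lkUIntLit, intVal: val)
  else: raise newException(ValueError, "not an integer-like type")

proc translateFloat(val: BiggestFloat, typ: TypeKind): Lit =
  ## float32 literals are narrowed once, during translation.
  case typ
  of tkFloat32:
    Lit(kind: lkFloatLit, floatVal: BiggestFloat(float32(val)))
  of tkFloat64:
    Lit(kind: lkFloatLit, floatVal: val)
  else:
    raise newException(ValueError, "not a float type")
```

Narrowing at translation time means later consumers of the literal don't have to remember to do it themselves; whether the real implementation is structured this way is an assumption of the sketch.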

Since the module's name now doesn't apply anymore, it is changed to
cgirgen. In addition, the module is moved to the backend directory
and generateAST is renamed to generateIR -- the module's
documentation is also updated.

Code generation

All three code generators and their surroundings are adjusted to operate
on CgNode instead of PNode, which for the most part means replacing
occurrences of PNode with CgNode and adjusting for the structural
differences listed above. Unrelated refactorings and clean-ups are kept
to a minimum.

There are two sources of PNodes still reaching into the code
generators: constants and type AST. When generating the code for
constants, their AST is first translated to CgNode IR before being
passed on to normal code generation, while for the type AST case,
special-purpose routines that still use PNode are added.

Routines still used by the code generators that are only available for
PNode are copied into the new compat module and adjusted to use
CgNode. This is meant to be an interim solution, and they are planned
to be phased out and removed again in the future.

Both the C and JavaScript code generation orchestrators passed the
PNode body of the main procedure directly to the code generator. This
doesn't work anymore, so the body is now canonicalized first, which
produces a CgNode tree.

The operation now expects its operands to be `NimNode`s already. A new instruction (`opcDataToAst`) is introduced for creating the AST representation of VM data. While also simplifying the VM a bit, the main reason behind the change is to avoid having to provide the full AST of the template call expression, as doing so is not possible when the code generator no longer operates on `PNode` AST.

zerbina · 2023-07-12T17:31:49Z

Attempting to derive the first revision from the MIR was a mistake, as too many changes to the code generators are required for that to work. I'm going to revert all changes and start over, but now with a focus on keeping both the overall changeset and development of the new IR to a minimum.

However, there is still some decoupling left to do before the work here can resume. This includes things like making dynlib handling part of the unified backend processing pipeline, or replacing/removing dependencies on routines that are not part of the code generators and use PNode (e.g., dfa.aliases, TNodeTable, etc.).

The C and JS code-gen orchestrators were passing the AST produced for the main procedure directly to the code generators. This is no longer going to work once the code generators don't work with `PNode` anymore, so `canonicalize` is now used on the AST.

`astgen` is adjusted to produce `CgNode` instead of `PNode`. For this, multiple `PNode` -> `CgNode` translation procedures had to be introduced and `canUseView` + `flattenExpr` duplicated and adjusted for `CgNode`. The general processing logic stays the same. The module's document comment is also adjusted and an outdated mention of "sections" (they are called "regions" in the MIR) fixed. `astgen` as the name doesn't make much sense anymore and is going to be changed to something more fitting.

saem · 2023-08-03T18:27:35Z

i really like all the node simplification and rework, one question about it, did you already consider naming cnkVoidStmt as cnkVoidExpr? Asking because I read the node as voiding an expression.

zerbina · 2023-08-03T19:20:58Z

> i really like all the node simplification and rework, one question about it, did you already consider naming cnkVoidStmt as cnkVoidExpr? Asking because I read the node as voiding an expression.

Hm, no, I didn't consider it. My thinking was that while it voids (discards) a value, it itself acts as a statement (returns no value), so, for consistency with the other statements, used the Stmt suffix.

saem

this really is an incredible milestone in decoupling sem and the backends, along with backend unification! 🎉

saem · 2023-08-03T20:19:10Z

compiler/backend/cgir.nim

+
+    cnkAsgn       ## a = b
+    cnkFastAsgn   ## fast assign b to a
+    # future direction: have ``cnkAsgn`` mean "assign without implying any


I wonder if a better distinction is assign vs initialize?

For context I'm simply broadly remarking upon nomenclature.

That's a good question. Everything related to assignment is very fuzzy at the moment, mainly because of "assignment" having different meaning depending on the used code generator (they don't operate on the same language level with regards to assignments).

Broadly speaking, cnkAsgn combines both "assign" (copy to non-empty destination) and "initialize" (copy to empty destination), while cnkFastAsgn means "create a shallow copy". Whether combining "assign" and "initialize" is a good idea, I'm not sure (it probably isn't), but PNode did it, so for initial compatibility, I carried it over.

Sounds like we're on the same page, needs some more thinking.

saem · 2023-08-03T20:22:09Z

compiler/backend/cgir.nim

+    of cnkWithItems:
+      childs*: seq[CgNode]
+
+  # future direction: move to a single-sequence-based, data-oriented design


compiler/backend/jsbackend.nim

saem · 2023-08-03T20:53:32Z

compiler/backend/jsgen.nim

@@ -704,12 +711,12 @@ proc genWhileStmt(p: PProc, n: PNode) =
  p.blocks[^1].isLoop = true
  let labl = p.unique.rope
  lineF(p, "Label$1: while (true) {$n", [labl])
-  p.nested: genStmt(p, n[1])
+  p.nested: genStmt(p, n[0])


The index change is because of the new structure of repeat vs while?

Yep, the loop's body was previously in the second slot, but now it's in the first (and only) one.

zerbina · 2023-08-04T20:22:10Z

Thank you for the review, @saem!

zerbina · 2023-08-06T15:23:59Z

/merge

github-actions · 2023-08-06T15:24:25Z

Merge requested by: @zerbina

Contents after the first section break of the PR description has been removed and preserved below:

Notes for reviewers

a significant milestone in the effort of unifying the backends

zerbina added 3 commits February 20, 2023 18:02

WIP: introduce an IR for the code generators

a8279ad

progress; the first bootstrap iteration works again

3059da1

zerbina added the refactor and compiler/backend labels Feb 20, 2023

haxscramper modified the milestones: C backend refactoring, MIR phase Feb 25, 2023

zerbina mentioned this pull request Mar 7, 2023

vm: make opcExpandToAst more flexible #570

Merged

zerbina mentioned this pull request Jul 8, 2023

internal: separate TLoc from TSym and TType #790

Merged

This was referenced Jul 13, 2023

backend: make dynlib handling target-agnostic #796

Merged

cgen: replace TNodeTable usage with Table #805

Merged

cgen: fix performance regression with x in {...} #807

Merged

internal: remove obsolete nkExprColonExpr detection #808

Merged

This was referenced Jul 22, 2023

vmjit: use MIR-based dependency discovery #810

Merged

internal: use custom rendering for --expandArc #813

Merged

prevent RVO and in-place construction via an MIR pass #815

Merged

mirpasses: make call-argument fixup a MIR pass #818

Merged

zerbina added 5 commits August 3, 2023 15:27

start over; revert all changes so far

90a7a08

Merge branch 'devel' into introduce-codegen-ir

792f2d3

ccgexprs: remove node flag usage

007098d

The `nfAllFieldsSet` flag stopped reaching the code generator with the introduction of the MIR, meaning that the condition always evaluates to 'true'.

vmaux: decouple findMatchingBranch from PNode

603b51e

Instead of an integer-literal node, the procedure now accepts the value directly.

zerbina added 9 commits August 3, 2023 15:31

mirbridge/backend: use CgNode

eb3099a

`canonicalize` and `generateAST` now return `CgNode` trees. For debug rendering, a `treeRepr` procedure for `CgNode` is added to the `cgirutils` module.

vmgen: decouple the emit procedures from PNode

8e301b2

The instruction-emission procedures now accept a `TLineInfo` as input directly, instead of, unnecessarily, requiring a `PNode`. Wrappers that still use `PNode` are added for convenience.

cgirutils: use CgNode

1d36474

update the code generators

7f36d1a

All three code generators now use the `CgNode` IR. The changes to the modules are kept minimal in order to make review easier. As an additional way to keep the amount of changes smaller, the `compat` module is introduced.

ast_types: remove the codegenExprNodeKinds set

756d2ab

It's obsolete now.

debugutils: add a frameMsg overload for CgNode

8499b48

astgen: rename generateAST to generateIR

941b3f7

rename astgen to cgirgen

4565463

In addition, the module is moved to the `backend` directory.

zerbina marked this pull request as ready for review August 3, 2023 17:10

saem approved these changes Aug 3, 2023

zerbina added 4 commits August 4, 2023 18:44

jsbackend: use a selective import for mirbridge

1e950cd

cgir: rename childs to kids

3b7139e

cgirgen: work around strict-side-effect analysis bug

1092b3f

docs: update mentions of astgen

6775403

The module is named `cgirgen` now. `debug.rst` also contained outdated mentions of `PNode` being the IR of the code generators -- this is fixed too.

chore-runner bot added this pull request to the merge queue Aug 6, 2023

Merged via the queue into nim-works:devel with commit 9c790db Aug 6, 2023
18 checks passed

zerbina deleted the introduce-codegen-ir branch August 6, 2023 21:27
