compiler: introduce an IR for the code generators #551

Merged (21 commits, Aug 6, 2023)


@zerbina (Collaborator) commented Feb 20, 2023

Summary

Add the CgNode intermediate representation, change astgen (which is
renamed to cgirgen) to output it instead of PNode, and update all
code generators to use CgNode. In order to keep the changes required
for the transition low, the differences of the new IR compared to
PNode are kept minimal.

The intent is to have an IR that can be evolved independently from sem
and the macro API. Only PNode is replaced so far, but both PType
and PSym are planned to also get a dedicated version for use by the
code generators. In addition, PNode can now be evolved without the
code generators having to be considered.

Details

The core of the changes is the introduction of CgNode. For now, it
also uses a ref-tree-based data representation and is very similar,
in both structure and naming, to the PNode subset previously used by
the code generators, but with some small simplifications and renamings
already applied where they make sense.

Naming differences:

  • nkNimNodeLit -> cnkAstLit
  • nkCurly -> cnkSetConstr
  • nkBracket -> cnkArrayConstr
  • nkClosure -> cnkClosureConstr
  • nkHiddenDeref -> cnkDerefView
  • nkHiddenStdConv -> cnkHiddenConv
  • nkDiscardStmt -> cnkVoidStmt
  • nkExprColonExpr -> cnkBinding

Structural differences:

  • the while statement is replaced with the repeat statement, since
    conditional loops (i.e. while statements) don't exist after the MIR
    phase. A cnkRepeatStmt node has a single sub-node, which is the
    loop's body
  • there are no multi-branch if statements, since those don't exist
    during and after the MIR phase. A cnkIfStmt has the same structure
    as an nkElifBranch
  • a cnkReturnStmt node has no sub-nodes
  • instead of emit being represented via a pragma statement, a
    dedicated statement (cnkEmitStmt) is used
  • cnkPragmaStmt nodes don't have sub-nodes, but instead directly store
    the pragma's name
  • cnkObjConstr doesn't have an extra type slot in the first position
  • there is only a single literal-node kind each for the int, uint,
    float, and string types (instead of, e.g., nkInt8Lit through
    nkInt64Lit)
  • there are no dedicated nodes for of and else in case statements,
    both use cnkBranch
  • there are no nkVarSection|nkLetSection + nkIdentDefs counterparts.
    Instead, definitions use the cnkDef node
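
The structural differences above can be illustrated with a heavily simplified Nim sketch. It is not the actual CgNode definition from compiler/backend/cgir.nim: the reduced kind set and the helpers newTree, ident, lowerWhile, and render are assumptions made for illustration; only the childs field name is borrowed from the review snippet further down. It shows a repeat statement whose only sub-node is the loop body, an if statement shaped like an nkElifBranch (condition, then body), and leaf statements (return, pragma) without sub-nodes.

```nim
import std/strutils

# Hypothetical, heavily simplified CgNode-like IR; not the real definition.
type
  CgNodeKind = enum
    cnkEmpty       # placeholder
    cnkIntLit      # a single literal kind for all signed integer types
    cnkStrLit      # a single literal kind for all string literals
    cnkPragmaStmt  # stores the pragma's name directly, no sub-nodes
    cnkReturnStmt  # has no sub-nodes
    cnkBreakStmt   # simplified: unlabeled break
    cnkCall        # callee followed by the arguments
    cnkIfStmt      # [condition, body], like an nkElifBranch
    cnkRepeatStmt  # [body]; only a break or return leaves the loop
    cnkStmtList    # statement sequence

  CgNode = ref object
    case kind: CgNodeKind
    of cnkIntLit:
      intVal: BiggestInt
    of cnkStrLit, cnkPragmaStmt:
      strVal: string
    of cnkEmpty, cnkReturnStmt, cnkBreakStmt:
      discard
    of cnkCall, cnkIfStmt, cnkRepeatStmt, cnkStmtList:
      childs: seq[CgNode]

proc newTree(kind: CgNodeKind, kids: varargs[CgNode]): CgNode =
  ## only valid for kinds that have sub-nodes (checked at run time)
  result = CgNode(kind: kind)
  result.childs = @kids

proc ident(name: string): CgNode =
  ## hypothetical: names are modeled as string literals here
  CgNode(kind: cnkStrLit, strVal: name)

proc lowerWhile(cond, body: CgNode): CgNode =
  ## the shape `while cond: body` takes without conditional loops:
  ## repeat: (if not cond: break); body
  let exitCheck = newTree(cnkIfStmt, newTree(cnkCall, ident("not"), cond),
                          CgNode(kind: cnkBreakStmt))
  result = newTree(cnkRepeatStmt, newTree(cnkStmtList, exitCheck, body))

proc render(n: CgNode): string =
  ## renders JavaScript-like text
  case n.kind
  of cnkEmpty: result = ""
  of cnkIntLit: result = $n.intVal
  of cnkStrLit: result = n.strVal
  of cnkPragmaStmt: result = "/* pragma: " & n.strVal & " */"
  of cnkReturnStmt: result = "return;"
  of cnkBreakStmt: result = "break;"
  of cnkCall:
    var args: seq[string]
    for i in 1 ..< n.childs.len:
      args.add render(n.childs[i])
    result = render(n.childs[0]) & "(" & args.join(", ") & ")"
  of cnkIfStmt:
    result = "if (" & render(n.childs[0]) & ") { " & render(n.childs[1]) & " }"
  of cnkRepeatStmt:
    # the body is the first and only sub-node
    result = "while (true) { " & render(n.childs[0]) & " }"
  of cnkStmtList:
    var parts: seq[string]
    for it in n.childs:
      parts.add render(it)
    result = parts.join(" ")
```

Because the loop body always sits in childs[0], a code generator only needs to wrap it in an endless loop, which matches the n[1] to n[0] index change discussed in the review below.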

Many more simplifications are possible, but they are left for future
PRs.

Producing the IR

astgen is changed to produce CgNode instead of PNode IR. The
canUseView and flattenExpr procedures for PNode are copied and
adjusted for CgNode.

For literals, which are transported through the MIR phase as PNodes,
PNode-to-CgNode translation logic is added (translateLit). For integer-like
literals, whether the output node is a cnkIntLit or cnkUIntLit node
is decided based on the input node's type, not on its node kind;
this helps catch issues with incorrectly typed integer literals. For
float32 literals, translateLit already narrows the value.
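
A minimal sketch of type-directed literal translation, assuming simplified stand-in types: TypeKind, SrcLit, Lit, and translateLitSketch are hypothetical and mirror neither the compiler's PType/PNode nor the real translateLit signature. It shows the two points made above: the output kind is chosen from the type rather than from how the value was stored, and float32 values are narrowed eagerly.

```nim
# Hypothetical sketch of type-directed literal translation; the types are
# simplified stand-ins, not the compiler's PNode/PType.
type
  TypeKind = enum
    tyInt, tyInt8, tyUInt, tyUInt8, tyFloat32, tyFloat64, tyString

  LitKind = enum
    cnkIntLit, cnkUIntLit, cnkFloatLit, cnkStrLit

  Lit = object
    case kind: LitKind
    of cnkIntLit:
      intVal: BiggestInt
    of cnkUIntLit:
      uintVal: BiggestUInt
    of cnkFloatLit:
      floatVal: BiggestFloat
    of cnkStrLit:
      strVal: string

  # The incoming literal: integer values share one storage slot regardless
  # of signedness, so only `typ` says how to interpret the bits.
  SrcLit = object
    typ: TypeKind
    intVal: BiggestInt
    floatVal: BiggestFloat
    strVal: string

proc translateLitSketch(src: SrcLit): Lit =
  case src.typ
  of tyInt, tyInt8:
    result = Lit(kind: cnkIntLit, intVal: src.intVal)
  of tyUInt, tyUInt8:
    # the node kind follows the type, not how the value was stored
    result = Lit(kind: cnkUIntLit, uintVal: cast[BiggestUInt](src.intVal))
  of tyFloat32:
    # narrow right away so later phases only ever see the float32 value
    result = Lit(kind: cnkFloatLit, floatVal: BiggestFloat(float32(src.floatVal)))
  of tyFloat64:
    result = Lit(kind: cnkFloatLit, floatVal: src.floatVal)
  of tyString:
    result = Lit(kind: cnkStrLit, strVal: src.strVal)
```

For example, a value stored as -1 but typed as an unsigned integer comes out as a cnkUIntLit holding the maximum uint64 value, and 0.1 typed as float32 comes out as the nearest float32 value rather than the float64 0.1.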

Since the module's old name no longer fits, it is renamed to
cgirgen. In addition, the module is moved to the backend directory
and generateAST is renamed to generateIR; the module's
documentation is also updated.

Code generation

All three code generators and their surroundings are adjusted to operate
on CgNode instead of PNode, which for the most part means replacing
occurrences of PNode with CgNode and adjusting for the structural
differences listed above. Unrelated refactorings and clean-ups are kept
to a minimum.

There are two sources of PNodes still reaching into the code
generators: constants and type AST. When generating the code for
constants, their AST is first translated to CgNode IR before being
passed on to normal code generation, while for the type AST case,
special-purpose routines that still use PNode are added.

Routines still used by the code generators that are only available for
PNode are copied into the new compat module and adjusted to use
CgNode. This is meant to be an interim solution, and they are planned
to be phased out and removed again in the future.

Both the C and JavaScript code generation orchestrators passed the
PNode body of the main procedure directly to the code generator. This
no longer works, so the body is now canonicalized first (which
produces a CgNode tree).

The operation now expects its operands to be `NimNode`s already. A new
instruction (`opcDataToAst`) is introduced for creating the AST
representation of VM data.

Besides simplifying the VM a bit, the main reason for the change is
to avoid having to provide the full AST of the template call expression,
as doing so is not possible once the code generator no longer operates
on `PNode` AST.
@zerbina added the refactor and compiler/backend labels Feb 20, 2023
@zerbina (Collaborator, Author) commented Jul 12, 2023

Attempting to derive the first revision from the MIR was a mistake, as too many changes to the code generators are required for that to work. I'm going to revert all changes and start over, but now with a focus on keeping both the overall changeset and development of the new IR to a minimum.

However, there is still some decoupling left to do before the work here can resume. This includes things like making dynlib handling part of the unified backend processing pipeline, or replacing/removing dependencies on routines that are not part of the code generators and use PNode (e.g., dfa.aliases, TNodeTable, etc.).

The `nfAllFieldsSet` flag stopped reaching the code generator with the
introduction of the MIR, meaning that the condition always evaluates to
'true'.

The C and JS code-gen orchestrators were passing the AST produced for
the main procedure directly to the code generators. This stops working
once the code generators no longer operate on `PNode`, so
`canonicalize` is now used on the AST.

Instead of an integer-literal node, the procedure now accepts the value
directly.

`astgen` is adjusted to produce `CgNode` instead of `PNode`. For this,
multiple `PNode` -> `CgNode` translation procedures had to be
introduced and `canUseView` + `flattenExpr` duplicated and adjusted for
`CgNode`. The general processing logic stays the same.

The module's document comment is also adjusted, and an outdated mention
of "sections" (they are called "regions" in the MIR) is fixed.

`astgen` as the name doesn't make much sense anymore and is going to be
changed to something more fitting.

`canonicalize` and `generateAST` now return `CgNode` trees. For debug
rendering, a `treeRepr` procedure for `CgNode` is added to the
`cgirutils` module.

The instruction-emission procedures now accept a `TLineInfo` as input
directly, instead of unnecessarily requiring a `PNode`. Wrappers that
still take a `PNode` are added for convenience.
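
The overload pattern described above, sketched in Nim under assumptions: LineInfo, Node, Instr, CodeBuffer, and emitInstr are hypothetical stand-ins (LineInfo plays the role of TLineInfo, Node that of PNode) and are not the VM's actual instruction-emission procedures. The primary procedure only needs the source position; the node-taking overload merely forwards the node's info.

```nim
# Hypothetical sketch; LineInfo stands in for TLineInfo and Node for PNode.
type
  LineInfo = object
    line, col: int
  Node = ref object
    info: LineInfo
  Instr = object
    opcode: string
    info: LineInfo
  CodeBuffer = object
    code: seq[Instr]

proc emitInstr(c: var CodeBuffer, info: LineInfo, opcode: string) =
  ## primary procedure: only the source position is required
  c.code.add Instr(opcode: opcode, info: info)

proc emitInstr(c: var CodeBuffer, n: Node, opcode: string) {.inline.} =
  ## convenience wrapper for callers that still hold a node
  c.emitInstr(n.info, opcode)
```

Keeping the node-based overload as a thin forwarder means existing call sites keep compiling, while new code that has no node at hand can pass line information directly.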

All three code generators now use the `CgNode` IR. The changes to the
modules are kept minimal in order to make review easier.

As an additional way to keep the amount of changes smaller, the `compat`
module is introduced.

In addition, the module is moved to the `backend` directory.
@zerbina marked this pull request as ready for review August 3, 2023 17:10
@saem (Collaborator) commented Aug 3, 2023

I really like all the node simplification and rework. One question about it: did you already consider naming cnkVoidStmt as cnkVoidExpr? Asking because I read the node as voiding an expression.

@zerbina (Collaborator, Author) commented Aug 3, 2023

> I really like all the node simplification and rework. One question about it: did you already consider naming cnkVoidStmt as cnkVoidExpr? Asking because I read the node as voiding an expression.

Hm, no, I didn't consider it. My thinking was that while it voids (discards) a value, it itself acts as a statement (returns no value), so, for consistency with the other statements, I used the Stmt suffix.

@saem (Collaborator) left a comment

This really is an incredible milestone in decoupling sem and the backends, along with backend unification! 🎉


cnkAsgn ## a = b
cnkFastAsgn ## fast assign b to a
# future direction: have ``cnkAsgn`` mean "assign without implying any
Collaborator:

I wonder if a better distinction is assign vs initialize?

For context, I'm simply remarking broadly on nomenclature.

Collaborator (Author):

That's a good question. Everything related to assignment is very fuzzy at the moment, mainly because "assignment" has a different meaning depending on the code generator used (they don't operate on the same language level with regard to assignments).

Broadly speaking, cnkAsgn combines both "assign" (copy to non-empty destination) and "initialize" (copy to empty destination), while cnkFastAsgn means "create a shallow copy". Whether combining "assign" and "initialize" is a good idea, I'm not sure (it probably isn't), but PNode did it, so for initial compatibility, I carried it over.

Collaborator:

Sounds like we're on the same page, needs some more thinking.

compiler/backend/cgir.nim (outdated, resolved)
of cnkWithItems:
childs*: seq[CgNode]

# future direction: move to a single-sequence-based, data-oriented design
Collaborator:

♥️

compiler/backend/jsbackend.nim (outdated, resolved)
@@ -704,12 +711,12 @@ proc genWhileStmt(p: PProc, n: PNode) =
p.blocks[^1].isLoop = true
let labl = p.unique.rope
lineF(p, "Label$1: while (true) {$n", [labl])
-p.nested: genStmt(p, n[1])
+p.nested: genStmt(p, n[0])
Collaborator:

The index change is because of the new structure of repeat vs while?

Collaborator (Author):

Yep, the loop's body was previously in the second slot, but now it's in the first (and only) one.

The module is named `cgirgen` now. `debug.rst` also contained outdated
mentions of `PNode` being the IR used by the code generators; this is
fixed too.
@zerbina (Collaborator, Author) commented Aug 4, 2023

Thank you for the review, @saem!

@zerbina (Collaborator, Author) commented Aug 6, 2023

/merge

@github-actions (bot) commented Aug 6, 2023

Merge requested by: @zerbina

Contents after the first section break of the PR description have been removed and preserved below:


Notes for reviewers

  • a significant milestone in the effort of unifying the backends

@chore-runner (bot) added this pull request to the merge queue Aug 6, 2023
Merged via the queue into nim-works:devel with commit 9c790db Aug 6, 2023
18 checks passed
@zerbina deleted the introduce-codegen-ir branch August 6, 2023 21:27
Labels: compiler/backend (Related to backend system of the compiler), refactor (Implementation refactor)

3 participants