Skip to content

Commit

Permalink
compiler: make injectdestructors a MIR pass
Browse files Browse the repository at this point in the history
## Summary
The previous `injectdestructors` pass operated on `PNode` AST, and
while it should have been able to work with the AST as generated by
`astgen`, it didn't, which required the canonicalization step to be
disabled when the `injecdestructors` pass was needed.

Since the goal is to have all code-generators operate on either the MIR
or another IR derived from the former, it is necessary that the
MIR -> AST step happens immediately before code-generation, with no
further transformation in-between.

The pass is rewritten from the ground up, fixing many issues / bugs the
previous implementation had and also making the the pass significantly
more time and memory efficient than it was previously. In addition,
the pass generates more efficient code due to the smarter omission of
`wasMoved` and `=sink` calls.

An additional change is that calling a procedure that only raises
`Defect`s is now also considered as having control-flow related side-
effects. They previously weren't, which meant that locations sometimes
weren't cleaned up (via invoking the destructor) when raising a
`Defect`.

## Fixes
Assignments to location with destructors where the right-hand side is a
complex expression now respect the strict left-to-right evaluation
order:
```nim
var
  i = 0
  s = "str"

# works now:
(i = 1; s) =
  if i == 1: "str2"
  else: doAssert false; ""

# also works now:
proc p(s: var string): var string =
  i = 2
  s

p(s) = block:
  doAssert i == 2
  "str2"

```

A mutation of the argument is now considered to happen when
control-flow reaches the call, not when the argument is passed to a
`var` parameter. The difference is subtle but important:
```nim
proc f_sink(x: var seq[int], y: sink seq[int]) =
  y[0] = 2
  doAssert x == [1, 2] # passes

proc main() =
  var s = @[1, 2]
  f_sink(s, s) # a full copy is now introduced for the second argument

main()
```

Globals are now only moved if it can proven that doing so does not
affect program semantics. Previously, assigning a location derived from
a global would perform a destructive move in certain circumstances,
such as:

```nim
# at module scope
var a = "str"
var b = a # moved

proc p() =
  discard

echo a # echoes an empty string
```

Due to the way the new implementation works, it is now possible for
locations derived from globals to be consumed (if safe) when passing
them to `sink` parameters.

As a side-effect of operating on the MIR, the alias analysis now
handles tuple access correctly:
```nim
proc prc(x: var string) =
  var tup = (x: 0, y: "str")
  # `tup[1]` and `tup.y` were treated as two disjoint locations,
  # causing the assignment below to be erroneously optimized into a
  # move
  x = tup[1]
  doAssert tup.y == "str"

var o: string
prc(o)
```

## Details
In contrast to the previous implementation, the new one splits the pass
up into multiple sub-passes. This makes the code easier to reason
about, more modular, and reduces the amount of state each sub-pass has
access to.

The core part, the lifetime-tracking hook injection, is split into a
*setup*, *analysis*, and *application* step (executed in that order).

During the *setup* step, a control-flow graph of the code is computed
and a database about the entities (i.e. locations) relevant to the pass
built. Using both of them, the *analysis* step (which is itself split up
into multiple sub-steps) computes for all values appearing in a
*consume* context (e.g. argument to a `sink` parameter, right-hand-side
of an assignment) whether they're owned and what locations still store
a value at the end of their scope (and thus needing destruction).

Finally, the *application* step uses all previously gather information
to rewrite assignments into copy or sink hook calls, inject copies for
unowned values passed to `sink` parameters, and to, where necessary,
inject calls to the destructors.

For both simplicity and efficiency, the assignment rewriting and copy
injection both perform additional data-flow analysis in order to
omit unnecessary calls to `wasMoved` and to turn sink assignments
into shallow assignments (i.e. `FastAsgn`).

### Data-flow analysis
Compared to the data-flow analysis implementation (`dfa.nim`) used
previously, definitions (`def`) and uses (`use`) are not encoded in the
control-flow graph but rather stored separately. The new forward
analysis also doesn't have quadratic time and memory complexity in the
case of loops.

How data-flow analysis is employed is also different: instead of
computing the last-reads and first-writes of *all* locations during a
single traversal of the full control-flow graph, they're now computed
*on-demand*. For example, to figure whether the value coming from an
lvalue expression is owned (in a consume context) forward data-flow
analysis is perfomed to check whether a further read is reachable.

### Additional changes
- fix `mirchangesets` not compiling
- fix multiple of the sets defined in `mirtrees` having the wrong
  elements
- fix the `indexLike` tag routine for `NodeInstance` not being exported
  from `mirtrees`
- fix `parentEnd` yielding the wrong node in some cases
- add a procedure for querying the enum of a types equality operator
  magic without the symbol
- add the `OrdinalSeq` type to the `utils/containers.nim` module
- add the `Cursor` type; meant as a temporary placeholder for immutable
  views
- add multiple new destructor-related test cases
- remove the `nfLastRead` and `nfFirstWrite` node flags
- remove the `sfSingleUsedTemp` symbol flag
- remove the `rsemUncollectableRefCycle` warning. It had questionable
  utility and removing it was simpler than to make it work again.
  Should there exist the desire to bring it back, the problematic
  assignment should be detected in `sempass2`
- remove the `optimizer` module
- flag the `arc/tstrformat.nim` test as a known issue

### Timings
Timings and peakmem are taken from compiling the compiler itself with
the options `-d:release --compileOnly` and no cached compilation
artifacts. The original compiler is build with
`-d:release --exceptions:goto`.

|compiler from|bootstrapped with|using|time taken|peakmem|
|:------------|:----------------|:----|:---------|:------|
|#d7a70238d0|`--gc:refc`|`--gc:refc`|24.426s|1015.844MiB|
|this commit|`--gc:refc`|`--gc:refc`|24.687s|1014.254MiB|
|#d7a70238d0  |`--gc:refc`|`--gc:orc`|45.206s|1.99GiB|
|this commit |`--gc:refc`|`--gc:orc`|29.531s|1.222GiB|
|#d7a70238d0  |`--gc:orc`|`--gc:refc`|15.492s|978.938MiB|
|this commit|`--gc:orc`|`--gc:refc`|15.714s|979.023MiB|
|#d7a70238d0  |`--gc:orc`|`--gc:orc`|22.218s|1.933GiB|
|this commit|`--gc:orc`|`--gc:orc`|18.266s|1.188GiB|
  • Loading branch information
zerbina committed Feb 17, 2023
1 parent d7a7023 commit da47c7b
Show file tree
Hide file tree
Showing 32 changed files with 4,242 additions and 1,638 deletions.
3 changes: 1 addition & 2 deletions compiler/ast/ast_query.nim
Original file line number Diff line number Diff line change
Expand Up @@ -67,8 +67,7 @@ const
PersistentNodeFlags*: TNodeFlags = {nfBase2, nfBase8, nfBase16,
nfDotSetter, nfDotField,
nfIsRef, nfIsPtr, nfPreventCg, nfLL,
nfFromTemplate, nfDefaultRefsParam,
nfLastRead, nfFirstWrite}
nfFromTemplate, nfDefaultRefsParam}

namePos* = 0 ## Name of the type/proc-like node
patternPos* = 1 ## empty except for term rewriting macros
Expand Down
3 changes: 0 additions & 3 deletions compiler/ast/ast_types.nim
Original file line number Diff line number Diff line change
Expand Up @@ -306,7 +306,6 @@ type
sfNeverRaises ## proc can never raise an exception, not even
## OverflowDefect or out-of-memory
sfUsedInFinallyOrExcept ## symbol is used inside an 'except' or 'finally'
sfSingleUsedTemp ## For temporaries that we know will only be used once
sfNoalias ## 'noalias' annotation, means C's 'restrict'
sfEffectsDelayed ## an 'effectsDelayed' parameter

Expand Down Expand Up @@ -500,8 +499,6 @@ type
nfDefaultParam ## an automatically inserter default parameter
nfDefaultRefsParam ## a default param value references another parameter
## the flag is applied to proc default values and to calls
nfLastRead ## this node is a last read
nfFirstWrite## this node is a first write
nfHasComment ## node has a comment
nfImplicitPragma ## node is a "singlePragma" this is a transition flag
## created as part of nkError refactoring for the pragmas
Expand Down
1 change: 0 additions & 1 deletion compiler/ast/report_enums.nim
Original file line number Diff line number Diff line change
Expand Up @@ -826,7 +826,6 @@ type
rsemUnsafeSetLen = "UnsafeSetLen"
rsemUnsafeDefault = "UnsafeDefault"
rsemBindDeprecated
rsemUncollectableRefCycle = "CycleCreated"
rsemObservableStores = "ObservableStores"
rsemCaseTransition = "CaseTransition"
rsemUseOfGc = "GcMem" # last !
Expand Down
3 changes: 0 additions & 3 deletions compiler/ast/reports_sem.nim
Original file line number Diff line number Diff line change
Expand Up @@ -67,9 +67,6 @@ type
of rsemXCannotRaiseY:
raisesList*: PNode

of rsemUncollectableRefCycle:
cycleField*: PNode

of rsemStrictNotNilExpr, rsemStrictNotNilResult:
nilIssue*: Nilability
nilHistory*: seq[SemNilHistory]
Expand Down
7 changes: 0 additions & 7 deletions compiler/front/cli_reporter.nim
Original file line number Diff line number Diff line change
Expand Up @@ -1909,13 +1909,6 @@ proc reportBody*(conf: ConfigRef, r: SemReport): string =
of rsemObservableStores:
result = "observable stores to '$1'" % r.ast.render

of rsemUncollectableRefCycle:
if r.cycleField == nil:
result = "'$#' creates an uncollectable ref cycle" % [r.ast.render]
else:
result = "'$#' creates an uncollectable ref cycle; annotate '$#' with .cursor" % [
r.ast.render, r.cycleField.render]

of rsemResultUsed:
result = "used 'result' variable"

Expand Down
Loading

0 comments on commit da47c7b

Please sign in to comment.