Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compiler: make injectdestructors a MIR pass #450

Merged
merged 1 commit into from
Feb 17, 2023

Commits on Feb 17, 2023

  1. compiler: make injectdestructors a MIR pass

    ## Summary
    The previous `injectdestructors` pass operated on `PNode` AST, and
    while it should have been able to work with the AST as generated by
    `astgen`, it didn't, which required the canonicalization step to be
    disabled when the `injecdestructors` pass was needed.
    
    Since the goal is to have all code-generators operate on either the MIR
    or another IR derived from the former, it is necessary that the
    MIR -> AST step happens immediately before code-generation, with no
    further transformation in-between.
    
    The pass is rewritten from the ground up, fixing many issues / bugs the
    previous implementation had and also making the the pass significantly
    more time and memory efficient than it was previously. In addition,
    the pass generates more efficient code due to the smarter omission of
    `wasMoved` and `=sink` calls.
    
    An additional change is that calling a procedure that only raises
    `Defect`s is now also considered as having control-flow related side-
    effects. They previously weren't, which meant that locations sometimes
    weren't cleaned up (via invoking the destructor) when raising a
    `Defect`.
    
    ## Fixes
    Assignments to location with destructors where the right-hand side is a
    complex expression now respect the strict left-to-right evaluation
    order:
    ```nim
    var
      i = 0
      s = "str"
    
    # works now:
    (i = 1; s) =
      if i == 1: "str2"
      else: doAssert false; ""
    
    # also works now:
    proc p(s: var string): var string =
      i = 2
      s
    
    p(s) = block:
      doAssert i == 2
      "str2"
    
    ```
    
    A mutation of the argument is now considered to happen when
    control-flow reaches the call, not when the argument is passed to a
    `var` parameter. The difference is subtle but important:
    ```nim
    proc f_sink(x: var seq[int], y: sink seq[int]) =
      y[0] = 2
      doAssert x == [1, 2] # passes
    
    proc main() =
      var s = @[1, 2]
      f_sink(s, s) # a full copy is now introduced for the second argument
    
    main()
    ```
    
    Globals are now only moved if it can proven that doing so does not
    affect program semantics. Previously, assigning a location derived from
    a global would perform a destructive move in certain circumstances,
    such as:
    
    ```nim
    # at module scope
    var a = "str"
    var b = a # moved
    
    proc p() =
      discard
    
    echo a # echoes an empty string
    ```
    
    Due to the way the new implementation works, it is now possible for
    locations derived from globals to be consumed (if safe) when passing
    them to `sink` parameters.
    
    As a side-effect of operating on the MIR, the alias analysis now
    handles tuple access correctly:
    ```nim
    proc prc(x: var string) =
      var tup = (x: 0, y: "str")
      # `tup[1]` and `tup.y` were treated as two disjoint locations,
      # causing the assignment below to be erroneously optimized into a
      # move
      x = tup[1]
      doAssert tup.y == "str"
    
    var o: string
    prc(o)
    ```
    
    ## Details
    In contrast to the previous implementation, the new one splits the pass
    up into multiple sub-passes. This makes the code easier to reason
    about, more modular, and reduces the amount of state each sub-pass has
    access to.
    
    The core part, the lifetime-tracking hook injection, is split into a
    *setup*, *analysis*, and *application* step (executed in that order).
    
    During the *setup* step, a control-flow graph of the code is computed
    and a database about the entities (i.e. locations) relevant to the pass
    built. Using both of them, the *analysis* step (which is itself split up
    into multiple sub-steps) computes for all values appearing in a
    *consume* context (e.g. argument to a `sink` parameter, right-hand-side
    of an assignment) whether they're owned and what locations still store
    a value at the end of their scope (and thus needing destruction).
    
    Finally, the *application* step uses all previously gather information
    to rewrite assignments into copy or sink hook calls, inject copies for
    unowned values passed to `sink` parameters, and to, where necessary,
    inject calls to the destructors.
    
    For both simplicity and efficiency, the assignment rewriting and copy
    injection both perform additional data-flow analysis in order to
    omit unnecessary calls to `wasMoved` and to turn sink assignments
    into shallow assignments (i.e. `FastAsgn`).
    
    ### Data-flow analysis
    Compared to the data-flow analysis implementation (`dfa.nim`) used
    previously, definitions (`def`) and uses (`use`) are not encoded in the
    control-flow graph but rather stored separately. The new forward
    analysis also doesn't have quadratic time and memory complexity in the
    case of loops.
    
    How data-flow analysis is employed is also different: instead of
    computing the last-reads and first-writes of *all* locations during a
    single traversal of the full control-flow graph, they're now computed
    *on-demand*. For example, to figure whether the value coming from an
    lvalue expression is owned (in a consume context) forward data-flow
    analysis is perfomed to check whether a further read is reachable.
    
    ### Additional changes
    - fix `mirchangesets` not compiling
    - fix multiple of the sets defined in `mirtrees` having the wrong
      elements
    - fix the `indexLike` tag routine for `NodeInstance` not being exported
      from `mirtrees`
    - fix `parentEnd` yielding the wrong node in some cases
    - add a procedure for querying the enum of a types equality operator
      magic without the symbol
    - add the `OrdinalSeq` type to the `utils/containers.nim` module
    - add the `Cursor` type; meant as a temporary placeholder for immutable
      views
    - add multiple new destructor-related test cases
    - remove the `nfLastRead` and `nfFirstWrite` node flags
    - remove the `sfSingleUsedTemp` symbol flag
    - remove the `rsemUncollectableRefCycle` warning. It had questionable
      utility and removing it was simpler than to make it work again.
      Should there exist the desire to bring it back, the problematic
      assignment should be detected in `sempass2`
    - remove the `optimizer` module
    - flag the `arc/tstrformat.nim` test as a known issue
    
    ### Timings
    Timings and peakmem are taken from compiling the compiler itself with
    the options `-d:release --compileOnly` and no cached compilation
    artifacts. The original compiler is build with
    `-d:release --exceptions:goto`.
    
    |compiler from|bootstrapped with|using|time taken|peakmem|
    |:------------|:----------------|:----|:---------|:------|
    |#d7a70238d0|`--gc:refc`|`--gc:refc`|24.426s|1015.844MiB|
    |this commit|`--gc:refc`|`--gc:refc`|24.687s|1014.254MiB|
    |#d7a70238d0  |`--gc:refc`|`--gc:orc`|45.206s|1.99GiB|
    |this commit |`--gc:refc`|`--gc:orc`|29.531s|1.222GiB|
    |#d7a70238d0  |`--gc:orc`|`--gc:refc`|15.492s|978.938MiB|
    |this commit|`--gc:orc`|`--gc:refc`|15.714s|979.023MiB|
    |#d7a70238d0  |`--gc:orc`|`--gc:orc`|22.218s|1.933GiB|
    |this commit|`--gc:orc`|`--gc:orc`|18.266s|1.188GiB|
    zerbina committed Feb 17, 2023
    Configuration menu
    Copy the full SHA
    da47c7b View commit details
    Browse the repository at this point in the history