escape analysis: enables memory safe views (cf 1st class openarray) and optimizations #14976

timotheecour · 2020-07-14T08:52:51Z

the compiler can now issue a warning when it finds that a local stack address can escape its stack frame, making things like ptr T a lot safer
this makes things like std/views (1st class openArrray, but not escape-safe); std/pointers; fixes sequtils.applyIt #14869 escape-safe, allowing safe views into data structures
see extensive tests in this PR in tests/misc/tviewfroms.nim
correctly handles complex cases involving var/non-var params/result, globals, nested procs, arrays, local vars in arbitrarily nested scopes (including parent scopes), generics (each instantiated proc will have its own dependency graph) and correctly propagates dependencies through proc parameters

example (see a lot more in tviewfroms.nim)

for example, the compiler figures out that fn.result depends on a2 but not a1, and uses that to infer that result = fn(b1.addr, b2.addr) is illegal but result = fn(b2.addr, b1.addr) is legal:

block:
  proc fn(a1: ptr int, a2: ptr int): ptr int =
    # result depends on a2, not a1
    result = a2

  proc bad13(b1: var int): ptr int =
    var b2 = 0
    result = fn(b1.addr, b2.addr)
    checkEscape "'bad13.b2' escapes via 'result'"

    # fn.result depends only on `fn.a2`, so only on `b1`, which is legal here
    result = fn(b2.addr, b1.addr)
    checkNoEscape()

notes

all the logic for escape analysis is handled in compiler/lifetimes.nim; it figures for each symbol the set of symbols it depends on, and at which dereference level (<=1, can be negative), eg:

var b1 = a.addr # b1 => (a: 1)
var b2 = b1 # b2 => (a: 1)
var b3 = b1[] # b3 => (a: 0)

when it finds a dependency (lhs => rhs) such that lhs outlives rhs, it issues a warnings (which can be turned into an error)

future work

extend this analysis to other directions, eg:
track memory originating from static data segment (read only memory, eg used for cstrings var a = "foo".cstring) to make sure at CT that such memory can't be written to
more flexible alternative to Add write-tracking to Nim's func · Issue #234 · nim-lang/RFCs by doing similar interprocedural analysis as done in this PR

lib/system/excpt.nim

Araq · 2020-07-14T10:05:40Z

Please be aware that return/result is not the only way to pass information up the call chain, contrived example:

proc p(x: var ptr int) =
  var y = 4
  x = addr(y)

Varriount · 2020-07-14T14:03:20Z

Does this work across scopes? For example:

var x: ptr int
block:
  var y = @[1, 2, 3]
  x = addr y

timotheecour · 2020-07-14T18:04:32Z

Does this work across scopes? For example:

yes, there are several examples dealing with variables across nested function scopes in tviewfroms.nim (eg, search for example1); it works across any number of scopes

your example however is bogus for 3 reasons:

it should be x = addr y[0], not x = addr y since x is ptr int
addr y[0] lives on the heap (seq elements are allocated on heap), so there is no stack escaping possible
block creates a symbol resolution scope, not a stack frame scope, so whatever's stack allocated inside a block lives past end of the block until the function it's in returns

here's a correct example:
nim r --staticescapechecks main

var x: ptr int
block:
  var y = [10, 11, 12]
  x = addr y[0] # no warning, block in same stack frame as x

proc fn()=
  var y1 = @[10,11,12]
  var y2 = [10,11,12]
  x = addr y1[0] # no warning, that address is in heap
  x = addr y2[0] # Warning: local 'fn.y2' escapes via 'x' in 'x = addr y2[0]'
  # but the address of the seq itself is on the stack:
  var g {.global.}: ptr seq[int]
  g = y1.addr # Warning: local 'fn.y1' escapes via 'g' in 'g = addr(y1)'

Varriount · 2020-07-14T18:58:45Z

block creates a symbol resolution scope, not a stack frame scope, so whatever's stack allocated inside a block lives past end of the block until the function it's in returns

But there's nothing to prevent the Nim compiler from inserting destructors at the end of a block for variables present only in the block's scope, right?

I guess what I'm arguing for is that this analysis should also look at symbol scope, since it's unlikely for someone to intentionally want to access a variable that's been destroyed/is no longer accessible by symbolic means.

timotheecour · 2020-07-14T19:50:44Z

But there's nothing to prevent the Nim compiler from inserting destructors at the end of a block for variables present only in the block's scope, right?

a destructor can't deallocate stack allocated memory, so stack allocated memory can't be invalidated even if a destructor is called. If you disagree please provide an example.

timotheecour · 2020-07-14T20:13:50Z

Please be aware that return/result is not the only way to pass information up the call chain, contrived example:

I know; note that your example is correctly handled (gives Warning: local 'p.y' escapes via 'x' in 'x = addr(y)' ) and is already in the test suite;

however what currently is not supported is the following:

when true:
  proc fun(a: var ptr int, x: ptr int) = a = x
  proc main: ptr int =
    var l0=0
    fun(result, l0.addr)

and it's already catalogued in the test suite (see D20200711T130559) under failing tests as a known current limitation: right now i'm only tracking assignments lhs = rhs (with a lhs that contains some view, ie containsView returns true), but for those assignments, I'm correctly tracking escapes through all parameters/return/locals;

what I'm not yet tracking is function calls like fun(args) when there's no lhs at all (or an assignment with lhs that doesn't contain some view), but the mechanism would be very similar for this case.

Clyybber · 2020-07-14T21:14:56Z

@timotheecour

a destructor can't deallocate stack allocated memory, so stack allocated memory can't be invalidated even if a destructor is called. If you disagree please provide an example.

The pointer of a ref is stackallocated. And if the ref is destroyed the pointer will point into oblivion.

timotheecour · 2020-07-14T22:24:37Z

The pointer of a ref is stackallocated. And if the ref is destroyed the pointer will point into oblivion.

as I said above, please provide a complete example for what you're describing. destructors are for tyObject, not tyRef.

Clyybber · 2020-07-14T23:13:21Z

tyRef also uses destructors in gc:arc, and destructors can be used to implement custom ref types. Those would be tyObject.

timotheecour · 2020-07-14T23:26:07Z

using an object after it was destroyed but that lives in the same stack frame is a different kind of problem than using memory that was allocated in a different stack frame.

Handling the 1st case is useful but can be done in future work (and is "maybe" a very simple modification to outlives introduced in this PR, that considers the block owner instead of the proc owner), and perhaps using a different warning.

when true:
  type Foo = object
    x: seq[int]
  proc `=destroy`(a: var Foo) =
    echo ("dtor", a.x)
    a.x.reset
  proc main() =
    var f: ptr Foo
    block:
      var f1 = Foo(x: @[10])
      f = f1.addr
      echo f[]
    echo f[]
  main()

prints:

(x: @[10])
("dtor", @[10])
(x: @[])

=> not the same kind of corruption as that arising from dereferencing a pointer that was allocated in a different stackframe, eg you can't access invalid memory like that, the worse you get is an allocated object but in a destroyed state.

Clyybber · 2020-07-14T23:38:00Z

Your example relies on the fact that seqs can't be nil. Replace the seq with a ref or a ptr. It will crash.
Escape analysis must take the destructor scope into account to be useful.

timotheecour · 2020-07-14T23:41:01Z

Your example relies on the fact that seqs can't be nil. Replace the seq with a ref or a ptr. It will crash.

it doesn't change the example fundamentally, you'd get SIGSEGV if trying to access Foo.x[] if the destructor calls reset(a.x)

type Foo = object
    x: ref int

again, it's a different kind of memory problem: SIGSEGV is the not same as dereference garbage on the stack. block scope can be handled in future work IMO.

Clyybber · 2020-07-14T23:43:33Z

It's not memory safe. Keep in mind you don't know what the destructor will do with the pointer/ref; destructors do not have to reset their target anymore.

timotheecour · 2020-07-15T01:42:53Z

Keep in mind you don't know what the destructor will do with the pointer/ref; destructors do not have to reset their target anymore.

I know, but like i said, it's a different kind of memory issue: 1 is stack corruption, the other is accessing memory that was invalidated for some other reason (eg use after free).

In any case, I've now pushed a commit that fixes this, see fn16 in tviewfroms.nim which tests all possible scenarios wrt block escaping ; in particular it now gives local 'main.f1' escapes via 'f' for #14976 (comment) but see special semantics for global variables (and see caveat related to #14986)

Araq · 2020-07-19T06:38:14Z

If you want reviews, un-draft your PRs. :-)

timotheecour · 2020-07-19T07:16:45Z

If you want reviews, un-draft your PRs. :-)

done, some files are needed for debugging and will be removed for final version so ignore those:

compiler/debugutils.nim
compiler/debugutils_basic.nim + ndebugSetConfigExt
compiler/nim.nim
debugTmp
compiler/pragmas.nim + compiler/wordrecg.nim (the idea is a {.viewfrom: .... .} pragma for fwd procs to specify escape constraints, but this should probably be instead in another followup PR so ignore it from there)

also, it would really help if #15016 could be merged and would cut down on the diff (needed for getCapturedMsgs; the tests in this PR would really be worse to read/understand/maintain without this feature and using testament spec)

and instead focus on those:

compiler/ast.nim
compiler/lifetimes.nim
compiler/interfaces.nim
compiler/sem.nim
compiler/semexprs.nim
compiler/semstmts.nim
compiler/vmops.nim (just for viewConstraintsStr which is used only for testing, for exposing the escape constraints on symbols for a proc)
tests/misc/mviewfroms.nim
tests/misc/tviewfroms.nim

the most important ones are these:

tests/misc/tviewfroms.nim (all the tests <= start here)
compiler/lifetimes.nim (all the escape logic handled here)
compiler/ast.nim (small diff, shows what's added)

Araq · 2020-07-20T06:39:29Z

tests/misc/tviewfroms.nim

+        # this would hold currently and seems correct but is inconsistent
+        # until bug #14986 is resolved
+
+      # nim bug: there's no way to distinguish between 


There is, the .global is {sfGlobal, sfPure} iirc.

Araq · 2020-07-20T07:00:10Z

Well it's a good start.

Notes on the implementation

The escaping checks should be moved to sempass2.nim, if possible.
There is no RFC for the new pragma you introduced and apparently it's not used in the tests either (?).
Do not use importc, exportc to work around the cyclic deps problem, instead the Nim compiler uses callbacks inside semdata.nim.
Do not add the escaping information to the PSym in ast.nim, it's information you can keep in a side table, with a mapping from PSym -> CustomInfo.
We don't "visit" nodes in the compiler, we "traverse" a tree structure. (Nitpicking, I know.)
Too many break and return statements make the logic harder to follow. I know you don't agree, it's objective though. When you traverse a tree, it's crucial you don't forget to traverse nodes, we care about "forall" as often as we care about "exists", so every time you write "break" or "return" the "forall" property is harder to see.
Do not use of {a, b, c}, it's of a, b, c instead.

Notes on the design

There is no RFC to back this up, lent/var/openArray inside objects/containers RFCs#178 is about allowing for lent/var/openArray inside objects. This is an important difference! We don't want a mere linter for ptr T, ptr T is unsafe and will never be 100% safe, we seek to expand what is expressible in the memory safe subset of Nim.
The focus on scopes and "escaping the stack frame" is slightly misdirected, consider this problem:

var s: seq[int] 

var x = keepVarness(s[i])
x = 4 # ok, mutate s[i]
setLen s, 0
x = 4 # invalid write!

RFC #178 is fundamentally about detecting this case, whether var x = addr(someLocalInADifferentBlock) is safe or not is a minor problem compared to this.

Araq · 2020-07-22T08:04:10Z

Please have a look at #15030 which adds a mechanism of comparable complexity to the Nim compiler, without touching ast.nim, without introducing more recursive dependencies, in 250 lines of code. Of course #15030 is far from perfection too, but I hope it inspires you. (It's also covered by an RFC. ;-) )

Araq · 2020-10-28T18:44:35Z

We got --experimental:views instead.

timotheecour · 2021-06-18T07:36:12Z

var s: seq[int] 

var x = keepVarness(s[i])
x = 4 # ok, mutate s[i]
setLen s, 0
x = 4 # invalid write!

@Araq I don't understand this example, var x = keepVarness(s[i]) would create a copy, x is int; can you write another example that runs and produces bad results?

Araq · 2021-06-18T13:33:07Z

{.experimental: "views".}

type
  Foo = object
    field: string

proc valid(s: var seq[Foo]) =
  let v: lent Foo = s[0]  # begin of borrow
  echo v.field            # end of borrow
  s.setLen 0  # valid because 'v' isn't used afterwards

proc dangerous(s: var seq[Foo]) =
  let meh: lent Foo = s[0]
  s.setLen 0
  echo meh.field

timotheecour · 2021-07-08T01:13:07Z

@Araq {.views.} is easily fooled, eg this will compile and bypass the mutation check:

  proc dangerous(s: var seq[Foo]) =
    let meh: lent Foo = s[0]
    proc bar() = echo meh.field
    s.setLen 0
    bar()

but let's discuss views separately and focus on this PR.

This PR has the mechanism and modeling in place via symbol based constraint propagation to accomodate such kind of analysis:
recall that this PR introduces viewConstraints*: seq[ViewConstraint] attached to routineKinds symbols:

type
  ViewConstraint* = object
    # we could model other constraints, eg whether a parameter is being written to
    lhs*: PSym
    rhs*: PSym
    addrLevel*: int  # see also `ViewDep`

eg in the following example it gives (there are more examples in PR):

  block:
    type Foo = ref object
      f0: Foo
    proc fn24(a: Foo): auto =
      result = (a.f0.f0, a.f0)
    doAssert viewConstraints(fn24) == "fn24.result => fn24.a:-1; "

this can be extended to model the fact that a routine (eg setLen) invalidates a view (ie, a PSym + addrLevel), eg via:

type
  ViewConstraint* = object
    lhs*: PSym # lhs = nil could be used to encode an invalidation constraint
    rhs*: PSym
    addrLevel*: int  # see also `ViewDep`

example

proc fn(a, b: var seq[int]) =
  a.setLen 0
  b.setLen 0
proc dangerous(s, s2: var seq[Foo]) =
  let meh: lent Foo = s[0] # by constraint propagation, we know `meh => s`
  echo meh.field # ok
  fn(s, s2) # by constraint propagation we know this invalidates s, s2
  echo meh.field # this would issue a warning because meh => s was invalidated

the analysis would then proceed in a similar way as done in this PR, adding a check that a view doesn't get invalidated via (possibly indirect) function calls.

Araq · 2021-07-08T09:47:06Z

{.views.} is easily fooled, eg this will compile and bypass the mutation check

That's just a bug, it doesn't imply to switch strategies altogether over to your way which has no spec and not even a draft of a spec (!).

timotheecour · 2021-07-09T01:24:19Z

no spec and not even a draft of a spec (!).

I will add a spec.

Araq · 2021-09-19T22:34:38Z

@Araq {.views.} is easily fooled, eg this will compile and bypass the mutation check:

  proc dangerous(s: var seq[Foo]) =
    let meh: lent Foo = s[0]
    proc bar() = echo meh.field
    s.setLen 0
    bar()

Not true but it's done in lambda lifting and so unfortunately only triggers when you try to use the code:

proc dangerous(s: var seq[Foo]) =
  let meh: lent Foo = s[0]
  proc bar() = echo meh.field
  s.setLen 0
  bar()

var s = @[Foo(field: "")]
dangerous(s)

stale · 2022-09-20T18:57:03Z

This pull request has been automatically marked as stale because it has not had recent activity. If you think it is still a valid PR, please rebase it on the latest devel; otherwise it will be closed. Thank you for your contributions.

timotheecour mentioned this pull request Jul 14, 2020

allocate closure env on stack if viable (wip) #14881

Closed

Araq reviewed Jul 14, 2020

View reviewed changes

lib/system/excpt.nim Outdated Show resolved Hide resolved

timotheecour added a commit to timotheecour/Nim that referenced this pull request Jul 14, 2020

misc changes pulled out from nim-lang#14976

290d16c

timotheecour force-pushed the pr_addrFrom branch 2 times, most recently from f99e3d1 to c615dfc Compare July 14, 2020 19:22

timotheecour mentioned this pull request Jul 14, 2020

misc changes pulled out from #14976 #14982

Closed

timotheecour mentioned this pull request Jul 15, 2020

{.global.} var calls destroy when goes out of scope (unlike {.threadvar.}) #14986

Open

timotheecour added 9 commits July 18, 2020 17:45

escape analysis

ef5bf60

fixup

35f5977

_

c65e7ae

_

c6cd153

_

5bf81f4

escape analysis now disallows block scope escape

2b8b651

factor out simulateCall

e5efaa1

_

a469810

_

776c9b3

timotheecour marked this pull request as ready for review July 19, 2020 06:52

Araq reviewed Jul 20, 2020

View reviewed changes

timotheecour added a commit to timotheecour/Nim that referenced this pull request Oct 10, 2020

misc changes pulled out from nim-lang#14976

ed174c3

Araq closed this Oct 28, 2020

timotheecour mentioned this pull request Nov 7, 2020

Setting subscript index of cstring is allowed and generates bad code #14157

Open

timotheecour mentioned this pull request Nov 23, 2020

views: 1st class openArray timotheecour/Nim#194

Open

timotheecour mentioned this pull request Dec 9, 2020

strictFuncs/views design issue #16305

Closed

This was referenced Dec 17, 2020

std/lists: support destructive add/prepend for SinglyLinkedList, DoublyLinkedList + other lists improvements nim-lang/RFCs#303

Open

add strbasics.strip #16280

Merged

timotheecour mentioned this pull request Mar 21, 2021

compiles(expr) with context: let context = compilesWithContext(expr)` with context.{ok, warnMsg, errorMsg} to simplify tests #8803

Closed

timotheecour mentioned this pull request Jun 18, 2021

Deprecate 'unsafeAddr' and make 'addr' available for all addressable locations nim-lang/RFCs#369

Closed

timotheecour added the Postponed label Jun 18, 2021

timotheecour mentioned this pull request Jul 8, 2021

random crash caused by createThread #6875

Closed

timotheecour reopened this Jul 8, 2021

timotheecour mentioned this pull request Jul 30, 2021

raises tracking should behave like nosideeffect tracking: effects through params should be allowed nim-lang/RFCs#403

Closed

timotheecour mentioned this pull request Aug 11, 2021

parametric effects: proc fn(a: proc()) {.raises: [a].} nim-lang/RFCs#408

Closed

stale bot added the stale Staled PR/issues; remove the label after fixing them label Sep 20, 2022

stale bot closed this Oct 22, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

escape analysis: enables memory safe views (cf 1st class openarray) and optimizations #14976

escape analysis: enables memory safe views (cf 1st class openarray) and optimizations #14976

timotheecour commented Jul 14, 2020 •

edited

Loading

Araq commented Jul 14, 2020 •

edited by timotheecour

Loading

Varriount commented Jul 14, 2020 •

edited by timotheecour

Loading

timotheecour commented Jul 14, 2020 •

edited

Loading

Varriount commented Jul 14, 2020

timotheecour commented Jul 14, 2020

timotheecour commented Jul 14, 2020 •

edited

Loading

Clyybber commented Jul 14, 2020 •

edited

Loading

timotheecour commented Jul 14, 2020

Clyybber commented Jul 14, 2020 •

edited

Loading

timotheecour commented Jul 14, 2020 •

edited

Loading

Clyybber commented Jul 14, 2020 •

edited

Loading

timotheecour commented Jul 14, 2020 •

edited

Loading

Clyybber commented Jul 14, 2020 •

edited

Loading

timotheecour commented Jul 15, 2020 •

edited

Loading

Araq commented Jul 19, 2020

timotheecour commented Jul 19, 2020 •

edited

Loading

Araq Jul 20, 2020

Araq commented Jul 20, 2020

Araq commented Jul 22, 2020

Araq commented Oct 28, 2020

timotheecour commented Jun 18, 2021

Araq commented Jun 18, 2021

timotheecour commented Jul 8, 2021 •

edited

Loading

Araq commented Jul 8, 2021

timotheecour commented Jul 9, 2021

Araq commented Sep 19, 2021

stale bot commented Sep 20, 2022

escape analysis: enables memory safe views (cf 1st class openarray) and optimizations #14976

escape analysis: enables memory safe views (cf 1st class openarray) and optimizations #14976

Conversation

timotheecour commented Jul 14, 2020 • edited Loading

example (see a lot more in tviewfroms.nim)

notes

future work

Araq commented Jul 14, 2020 • edited by timotheecour Loading

Varriount commented Jul 14, 2020 • edited by timotheecour Loading

timotheecour commented Jul 14, 2020 • edited Loading

Varriount commented Jul 14, 2020

timotheecour commented Jul 14, 2020

timotheecour commented Jul 14, 2020 • edited Loading

Clyybber commented Jul 14, 2020 • edited Loading

timotheecour commented Jul 14, 2020

Clyybber commented Jul 14, 2020 • edited Loading

timotheecour commented Jul 14, 2020 • edited Loading

Clyybber commented Jul 14, 2020 • edited Loading

timotheecour commented Jul 14, 2020 • edited Loading

Clyybber commented Jul 14, 2020 • edited Loading

timotheecour commented Jul 15, 2020 • edited Loading

Araq commented Jul 19, 2020

timotheecour commented Jul 19, 2020 • edited Loading

Araq Jul 20, 2020

Choose a reason for hiding this comment

Araq commented Jul 20, 2020

Notes on the implementation

Notes on the design

Araq commented Jul 22, 2020

Araq commented Oct 28, 2020

timotheecour commented Jun 18, 2021

Araq commented Jun 18, 2021

timotheecour commented Jul 8, 2021 • edited Loading

example

Araq commented Jul 8, 2021

timotheecour commented Jul 9, 2021

Araq commented Sep 19, 2021

stale bot commented Sep 20, 2022

timotheecour commented Jul 14, 2020 •

edited

Loading

Araq commented Jul 14, 2020 •

edited by timotheecour

Loading

Varriount commented Jul 14, 2020 •

edited by timotheecour

Loading

timotheecour commented Jul 14, 2020 •

edited

Loading

timotheecour commented Jul 14, 2020 •

edited

Loading

Clyybber commented Jul 14, 2020 •

edited

Loading

Clyybber commented Jul 14, 2020 •

edited

Loading

timotheecour commented Jul 14, 2020 •

edited

Loading

Clyybber commented Jul 14, 2020 •

edited

Loading

timotheecour commented Jul 14, 2020 •

edited

Loading

Clyybber commented Jul 14, 2020 •

edited

Loading

timotheecour commented Jul 15, 2020 •

edited

Loading

timotheecour commented Jul 19, 2020 •

edited

Loading

timotheecour commented Jul 8, 2021 •

edited

Loading