Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGSEGV with object variants and RTTI #23690

Closed
alex65536 opened this issue Jun 7, 2024 · 10 comments · Fixed by #23703
Closed

SIGSEGV with object variants and RTTI #23690

alex65536 opened this issue Jun 7, 2024 · 10 comments · Fixed by #23703

Comments

@alex65536
Copy link
Contributor

alex65536 commented Jun 7, 2024

Description

The simplest reproducer I have found by now is:

import std/sequtils

type
  SomeObj* = object of RootObj

  Item* = object
    case kind*: 0..1
    of 0:
      a*: int
      b*: SomeObj
    of 1:
      c*: string

  ItemExt* = object
    a*: Item
    b*: string

proc do1(x: int): seq[(string, Item)] =
  result = @[("zero", Item(kind: 1, c: "first"))]

proc do2(x: int, e: ItemExt): seq[(string, ItemExt)] =
  do1(x).map(proc(v: (string, Item)): auto = (v[0], ItemExt(a: v[1], b: e.b)))

echo do2(0, ItemExt(a: Item(kind: 1, c: "second"), b: "third"))

This code stably leads to segfault under my amd64 Linux machine (tested both on 2.0.4 and devel)

Nim Version

$ nim -v
Nim Compiler Version 2.1.1 [Linux: amd64]
Compiled at 2024-06-06
Copyright (c) 2006-2024 by Andreas Rumpf

git hash: 69d0b73d667c4be9383f29cda3f70e411995d9af
active boot switches: -d:release

Also reproducible on stable 2.0.4.

Current Output

Traceback (most recent call last)
/home/user/repro.nim(24) repro
/home/user/repro.nim(22) do2
/home/user/.choosenim/toolchains/nim-#devel/lib/system/alloc.nim(1068) dealloc
/home/user/.choosenim/toolchains/nim-#devel/lib/system/alloc.nim(967) rawDealloc
/home/user/.choosenim/toolchains/nim-#devel/lib/system/alloc.nim(770) addToSharedFreeListBigChunks
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
Segmentation fault

Expected Output

@[("zero", (a: (kind: 1, c: "first"), b: "third"))]

Possible Solution

No response

Additional Information

Some extra debugging shows more details:

  • replacing sequtils.map with a loop makes the issue go away
  • compiling with --mm:refc doesn't reproduce the bug, but --mm:arc and --mm:orc do

Debugging points out that eqcopy implementation is likely at fault. The compiled C code is as follows:

N_LIB_PRIVATE N_NIMCALL(void, eqcopy___repro_u284)(tyObject_ItemExt__D8Kw4o1swHAWDzvwwWvsFw* dest_p0, tyObject_ItemExt__D8Kw4o1swHAWDzvwwWvsFw* src_p1) {
        tyObject_Item__jnMhVK55Yw3PPq6Gy9aXzMQ colontmp_;
        colontmp_ = (*dest_p0).a;
        nimZeroMem((void*)(&(*dest_p0).a), sizeof(tyObject_Item__jnMhVK55Yw3PPq6Gy9aXzMQ));
        (*dest_p0).a = TM__ckn2SibXCdoGNfJOD0JenA_5;
        (*dest_p0).a.kind = (*src_p1).a.kind;
        switch ((*dest_p0).a.kind) {
        case ((NI)0):
        {
                (*dest_p0).a._kind_1.a = (*src_p1).a._kind_1.a;
        }
        break;
        case ((NI)1):
        {
                eqcopy___system_u2621((&(*dest_p0).a._kind_2.c), (*src_p1).a._kind_2.c);
        }
        break;
        default: __builtin_unreachable();
        }
        switch (colontmp_.kind) {
        case ((NI)0):
        {
        }
        break;
        case ((NI)1):
        {
                if (colontmp_._kind_2.c.p && !(colontmp_._kind_2.c.p->cap & NIM_STRLIT_FLAG)) {
 deallocShared(colontmp_._kind_2.c.p);
}
        }
        break;
        default: __builtin_unreachable();
        }
        eqcopy___system_u2621((&(*dest_p0).b), (*src_p1).b);
}

It does the following:

  • zeroes memory under dest_p0->a
  • copies some pre-defined constant TM__ckn2SibXCdoGNfJOD0JenA_5 into it. As far as I understand, it is the default value of Item, and as a default value, it has kind: 0 and, because kind: 0 contains SomeObj, the contents of the union are filled with some RTTI data
  • then we assign kind, switching the active union element ((*dest_p0).a.kind = (*src_p1).a.kind). Note that nothing is zeroed at the moment, and all the RTTI data are still in the union
  • then we try to copy to the string, which has the same offset in the union as RTTI data, and copying damages these RTTI data
  • program crashes :(
@alex65536
Copy link
Contributor Author

!nim c

Copy link
Contributor

github-actions bot commented Jun 7, 2024

🐧 Linux bisect by @alex65536 (contributor)
devel 👎 FAIL

Output


IR

Compiled filesize 0 bytes (0 bytes)

Stats

  • Started 2024-06-07T03:45:24
  • Finished 2024-06-07T03:45:24
  • Duration

AST

stable 👎 FAIL

Output


IR

Compiled filesize 0 bytes (0 bytes)

Stats

  • Started 2024-06-07T03:45:25
  • Finished 2024-06-07T03:45:25
  • Duration now

AST

2.0.4 👎 FAIL

Output


IR

Compiled filesize 0 bytes (0 bytes)

Stats

  • Started 2024-06-07T03:45:25
  • Finished 2024-06-07T03:45:25
  • Duration now

AST

2.0.0 👎 FAIL

Output


IR

Compiled filesize 0 bytes (0 bytes)

Stats

  • Started 2024-06-07T03:45:28
  • Finished 2024-06-07T03:45:28
  • Duration now

AST

1.6.20 👎 FAIL

Output


IR

Compiled filesize 0 bytes (0 bytes)

Stats

  • Started 2024-06-07T03:45:31
  • Finished 2024-06-07T03:45:31
  • Duration now

AST

1.4.8 👎 FAIL

Output


IR

Compiled filesize 0 bytes (0 bytes)

Stats

  • Started 2024-06-07T03:45:33
  • Finished 2024-06-07T03:45:33
  • Duration now

AST

1.2.18 👎 FAIL

Output


IR

Compiled filesize 0 bytes (0 bytes)

Stats

  • Started 2024-06-07T03:45:35
  • Finished 2024-06-07T03:45:35
  • Duration

AST

1.0.10 👎 FAIL

Output


IR

Compiled filesize 0 bytes (0 bytes)

Stats

  • Started 2024-06-07T03:45:37
  • Finished 2024-06-07T03:45:37
  • Duration now

AST

Stats
  • GCC 11.4.0
  • Clang 14.0.0
  • NodeJS 20.3
  • Created 2024-06-07T03:44:52Z
  • Comments 1
  • Commands nim c --run -d:nimDebug -d:nimDebugDlOpen -d:ssl -d:nimDisableCertificateValidation --forceBuild:on --colors:off --verbosity:0 --hints:off --lineTrace:off --nimcache:/home/runner/work/Nim/Nim --out:/home/runner/work/Nim/Nim/temp /home/runner/work/Nim/Nim/temp.nim

🤖 Bug found in 16 minutes bisecting 8 commits at 0 commits per second

@alex65536
Copy link
Contributor Author

!nim c

import std/sequtils

type
  SomeObj* = object of RootObj

  Item* = object
    case kind*: 0..1
    of 0:
      a*: int
      b*: SomeObj
    of 1:
      c*: string

  ItemExt* = object
    a*: Item
    b*: string

proc do1(x: int): seq[(string, Item)] =
  result = @[("zero", Item(kind: 1, c: "first"))]

proc do2(x: int, e: ItemExt): seq[(string, ItemExt)] =
  do1(x).map(proc(v: (string, Item)): auto = (v[0], ItemExt(a: v[1], b: e.b)))

echo do2(0, ItemExt(a: Item(kind: 1, c: "second"), b: "third"))

@ringabout
Copy link
Member

@alex65536 wow, your insights are really helpful. I think it's mostly likely that copy needs to zero memery the union before assignement. I'm working on it. Thank you very much!

@ringabout
Copy link
Member

It crashed with 1.6.16 with --mm:orc -d:useMalloc

@alex65536
Copy link
Contributor Author

alex65536 commented Jun 7, 2024

Digging further into the compiler code, I noticed the following:

  1. Here is the algorithm for copy-assignment of object variants. The main take here is that we move the old object into the new location and call wasMoved for the old object.
  2. wasMoved is basically implemented here. Among other things, it calls resetLoc(), which in our case (code) generates nimZeroMem + default-initialization.
  3. Zeroing memory is OK, but re-intialization with the default value for this moved-out object is actually the source of this bug.

I am not proficient in the compiler internals at all, so everything below this text in this comment is just a wild guess from a person who has seen the compiler code today for the first time:

  • I am unsure whether zeroing memory is OK, because the next part of the code calls copy assign for each sub-variant, and I don't know how destructors of old values behave when they encounter zeroed memory (maybe everything is OK, but again, I don't know the compiler internals well).
  • Another solution would probably be resetting the location before doing recursive assignment inside the case statement (not outside). It could be slower, but seems more reliable for me.

@beef331
Copy link
Collaborator

beef331 commented Jun 7, 2024

how destructors of old values behave when they encounter zeroed memory

Destructors are supposed to be written in such a way that 0'd memory indicates "this was moved"

@alex65536
Copy link
Contributor Author

alex65536 commented Jun 7, 2024

Destructors are supposed to be written in such a way that 0'd memory indicates "this was moved"

OK, thank you, then zeroing is a nice idea indeed :)

And what happens if a =copy() implementation hooked by user receives zeroed memory as a destination? Or is there anything that prevents such thing from happening?

@beef331
Copy link
Collaborator

beef331 commented Jun 7, 2024

You call =destroy then copy the data over, there is no special logic for data that is 0'd =wasMoved should disarm =destroy.

@Araq
Copy link
Member

Araq commented Jun 9, 2024

Destructors are supposed to be written in such a way that 0'd memory indicates "this was moved"

Note that we're moving away from 0'd memory to the more abstract "=wasMoved/default state".

Araq pushed a commit that referenced this issue Jun 11, 2024
fixes #23690

```nim
dest.`:state` = src.`:state`
var :tmp_553651276 = dest.e1.a
`=wasMoved`(dest.e1.a)
dest.e1.a.kind = src.e1.a.kind
case dest.e1.a.kind
of 0:
  dest.e1.a.a = src.e1.a.a
of 1:
  `=copy`(dest.e1.a.c, src.e1.a.c)
case :tmp_553651276.kind
of 0:
of 1:
  `=destroy`(:tmp_553651276.c)
```
`dest.e1.a.kind = src.e1.a.kind` changes the discrimant but it fails to
clear the memory of `dest.e1.a`. Before using hooks for copying, we need
to clear the dest, e.g. `=wasMoved(dest.e1.a.c)`.

```nim
dest.`:state` = src.`:state`
var :tmp_553651276 = dest.e1.a
`=wasMoved`(dest.e1.a)
dest.e1.a.kind = src.e1.a.kind
case dest.e1.a.kind
of 0:
  `=wasMoved`(dest.e1.a.a)
  dest.e1.a.a = src.e1.a.a
  `=wasMoved`(dest.e1.a.b)
of 1:
  `=wasMoved`(dest.e1.a.c)
  `=copy`(dest.e1.a.c, src.e1.a.c)
case :tmp_553651276.kind
of 0:
of 1:
  `=destroy`(:tmp_553651276.c)
```
narimiran pushed a commit that referenced this issue Jun 11, 2024
fixes #23690

```nim
dest.`:state` = src.`:state`
var :tmp_553651276 = dest.e1.a
`=wasMoved`(dest.e1.a)
dest.e1.a.kind = src.e1.a.kind
case dest.e1.a.kind
of 0:
  dest.e1.a.a = src.e1.a.a
of 1:
  `=copy`(dest.e1.a.c, src.e1.a.c)
case :tmp_553651276.kind
of 0:
of 1:
  `=destroy`(:tmp_553651276.c)
```
`dest.e1.a.kind = src.e1.a.kind` changes the discrimant but it fails to
clear the memory of `dest.e1.a`. Before using hooks for copying, we need
to clear the dest, e.g. `=wasMoved(dest.e1.a.c)`.

```nim
dest.`:state` = src.`:state`
var :tmp_553651276 = dest.e1.a
`=wasMoved`(dest.e1.a)
dest.e1.a.kind = src.e1.a.kind
case dest.e1.a.kind
of 0:
  `=wasMoved`(dest.e1.a.a)
  dest.e1.a.a = src.e1.a.a
  `=wasMoved`(dest.e1.a.b)
of 1:
  `=wasMoved`(dest.e1.a.c)
  `=copy`(dest.e1.a.c, src.e1.a.c)
case :tmp_553651276.kind
of 0:
of 1:
  `=destroy`(:tmp_553651276.c)
```

(cherry picked from commit 262ff64)
narimiran pushed a commit that referenced this issue Jun 11, 2024
fixes #23690

```nim
dest.`:state` = src.`:state`
var :tmp_553651276 = dest.e1.a
`=wasMoved`(dest.e1.a)
dest.e1.a.kind = src.e1.a.kind
case dest.e1.a.kind
of 0:
  dest.e1.a.a = src.e1.a.a
of 1:
  `=copy`(dest.e1.a.c, src.e1.a.c)
case :tmp_553651276.kind
of 0:
of 1:
  `=destroy`(:tmp_553651276.c)
```
`dest.e1.a.kind = src.e1.a.kind` changes the discrimant but it fails to
clear the memory of `dest.e1.a`. Before using hooks for copying, we need
to clear the dest, e.g. `=wasMoved(dest.e1.a.c)`.

```nim
dest.`:state` = src.`:state`
var :tmp_553651276 = dest.e1.a
`=wasMoved`(dest.e1.a)
dest.e1.a.kind = src.e1.a.kind
case dest.e1.a.kind
of 0:
  `=wasMoved`(dest.e1.a.a)
  dest.e1.a.a = src.e1.a.a
  `=wasMoved`(dest.e1.a.b)
of 1:
  `=wasMoved`(dest.e1.a.c)
  `=copy`(dest.e1.a.c, src.e1.a.c)
case :tmp_553651276.kind
of 0:
of 1:
  `=destroy`(:tmp_553651276.c)
```

(cherry picked from commit 262ff64)
alex65536 added a commit to alex65536/dirutils that referenced this issue Jun 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants