SIGSEGV encountered when creating threads in a loop w/ --gc:arc #13935

Closed
zacharycarter opened this issue Apr 9, 2020 · 19 comments

@zacharycarter
Contributor

zacharycarter commented Apr 9, 2020

SIGSEGV encountered when creating threads in a loop w/ --gc:arc.

Example

import osproc

var threads: seq[Thread[int]]

proc threadFn(a: int) {.thread.} =
  echo a

let numThreads = countProcessors() - 1

threads.setLen(numThreads)

for i in 0 ..< numThreads:
  createThread(threads[i], threadFn, i)
joinThreads(threads)

Current output

➜  junkers git:(master) ✗ nim c --gc:arc --threads:on --debugger:native -r thread_error.nim
Hint: used config file '/Users/zacharycarter/.choosenim/toolchains/nim-#devel/config/nim.cfg' [Conf]
Hint: system [Processing]
Hint: repr_v2 [Processing]
Hint: widestrs [Processing]
Hint: io [Processing]
Hint: thread_error [Processing]
Hint: osproc [Processing]
Hint: strutils [Processing]
Hint: parseutils [Processing]
Hint: math [Processing]
Hint: bitops [Processing]
Hint: macros [Processing]
Hint: algorithm [Processing]
Hint: unicode [Processing]
Hint: os [Processing]
Hint: pathnorm [Processing]
Hint: osseps [Processing]
Hint: posix [Processing]
Hint: times [Processing]
Hint: options [Processing]
Hint: typetraits [Processing]
Hint: strtabs [Processing]
Hint: hashes [Processing]
Hint: streams [Processing]
Hint: cpuinfo [Processing]
Hint: kqueue [Processing]
CC: stdlib_system.nim
CC: thread_error.nim
Hint:  [Link]
Hint: dsymutil /Users/zacharycarter/dev/junkers/thread_error [Exec]
Hint: 39581 LOC; 0.993 sec; 52.48MiB peakmem; Debug build; proj: /Users/zacharycarter/dev/junkers/thread_error.nim; out: /Users/zacharycarter/dev/junkers/thread_error [SuccessX]
Hint: /Users/zacharycarter/dev/junkers/thread_error  [Exec]
0
3
2
No stack traceback available
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
1
5
No stack traceback available
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
4

Additional Information

https://gist.github.com/zacharycarter/dbee66c50a6c26e7f187512235c4ed1f

➜  junkers git:(master) ✗ nim -v
Nim Compiler Version 1.3.1 [MacOSX: amd64]
Compiled at 2020-04-09
Copyright (c) 2006-2020 by Andreas Rumpf

active boot switches: -d:release
@zacharycarter zacharycarter changed the title SIGSEV encountered when creating threads in a loop SIGSEV encountered when creating threads in a loop w/ --gc:arc Apr 9, 2020
@cooldome
Member

Hi,
Could you please provide more details, such as which C/C++ compiler you used and your OS version? I want this one fixed, but I can't replicate it on my PC.
It is likely the reason for the failed tests of #13897, but I can't make progress because everything just works for me.

@zacharycarter
Contributor Author

@cooldome - sure

I'm using clang / clang++ and OS is macOS Mojave 10.14.6 (18G103)

@cooldome
Member

cooldome commented Apr 15, 2020

Hmm, I am using the same and it works for me. Looks like I can't help.

@Yardanico
Collaborator

Yardanico commented Apr 15, 2020

@cooldome I can reproduce this behaviour with -d:release (or -d:danger); in a debug build it just hangs forever. I'm on Linux amd64 (Bedrock Linux) on the latest devel. Maybe I can help you debug this?

I compile and run like nim c --threads:on -d:danger --gc:arc -r main.nim.

It seems to hang if I compile with --stacktrace:on, and to crash if I compile with --stacktrace:off (or with -d:release or -d:danger).

@Yardanico
Collaborator

Yardanico commented Apr 15, 2020

gdb backtrace after nim c --stacktrace:off --debugger:native --threads:on --gc:arc b.nim

Reading symbols from ./b...
(gdb) run
Starting program: /home/dian/Projects/data/b 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7ffff7be9700 (LWP 14579)]
0
[New Thread 0x7ffff79e8700 (LWP 14580)]
[Thread 0x7ffff7be9700 (LWP 14579) exited]
1
[New Thread 0x7ffff77e7700 (LWP 14581)]

Thread 3 "b" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff79e8700 (LWP 14580)]
0x0000000000401ace in del__Io5JDKCS5u26IEWw0J53hQ (a=a@entry=0x409180 <sharedHeap__R3bhvQCN0d6AYpkvxfT9aGw>, t=t@entry=0x40c228 <sharedHeap__R3bhvQCN0d6AYpkvxfT9aGw+12456>, 
    x=x@entry=140737349857312) at /home/dian/.nim/lib/system/avltree.nim:74
74        if isBottom(t): return
(gdb) bt
#0  0x0000000000401ace in del__Io5JDKCS5u26IEWw0J53hQ (a=a@entry=0x409180 <sharedHeap__R3bhvQCN0d6AYpkvxfT9aGw>, t=t@entry=0x40c228 <sharedHeap__R3bhvQCN0d6AYpkvxfT9aGw+12456>, 
    x=x@entry=140737349857312) at /home/dian/.nim/lib/system/avltree.nim:74
#1  0x0000000000401d58 in rawDealloc__K7uQ6aTKvW6OnOV8EMoNNQ (a=a@entry=0x409180 <sharedHeap__R3bhvQCN0d6AYpkvxfT9aGw>, p=p@entry=0x7ffff7beb038) at /home/dian/.nim/lib/system/alloc.nim:867
#2  0x0000000000401dd5 in dealloc__Jg1OaY9ahkT3MBopLAXRSGw (allocator=allocator@entry=0x409180 <sharedHeap__R3bhvQCN0d6AYpkvxfT9aGw>, p=p@entry=0x7ffff7beb038)
    at /home/dian/.nim/lib/system/alloc.nim:968
#3  0x0000000000401df9 in deallocSharedImpl__lmwgHsdhTsrQaepFju8wew (p=p@entry=0x7ffff7beb038) at /home/dian/.nim/lib/system/alloc.nim:1083
#4  0x0000000000401e10 in deallocShared (p=p@entry=0x7ffff7beb038) at /home/dian/.nim/lib/system/memalloc.nim:295
#5  0x0000000000404b41 in threadProcWrapper__oTnP9cUoE9cVTUL7iHAoIIAA (closure=0x7ffff7bea060) at /home/dian/.nim/lib/system/threads.nim:187
#6  0x00007ffff7e3df27 in start_thread (arg=<optimized out>) at pthread_create.c:479
#7  0x00007ffff7d6ee0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@Yardanico
Collaborator

Yardanico commented Apr 15, 2020

@cooldome I just checked, and I get the same behaviour with your #13897:

$ bin/nim -v
Nim Compiler Version 1.3.1 [Linux: amd64]
Compiled at 2020-04-15
Copyright (c) 2006-2020 by Andreas Rumpf

git hash: aeaaad38674a7c8302461b208ad8824f7179a3c8
active boot switches: -d:release

$  bin/nim c --stacktrace:off --debugger:native --threads:on --gc:arc b.nim
...

And GDB:

Reading symbols from ./b...
(gdb) run
Starting program: /home/dian/Stuff/nim/b 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7ffff7be9700 (LWP 23636)]
0
[New Thread 0x7ffff79e8700 (LWP 23637)]
[Thread 0x7ffff7be9700 (LWP 23636) exited]
1

Thread 3 "b" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff79e8700 (LWP 23637)]
0x0000000000401ace in del__Io5JDKCS5u26IEWw0J53hQ (a=a@entry=0x409180 <sharedHeap__R3bhvQCN0d6AYpkvxfT9aGw>, 
    t=t@entry=0x40c228 <sharedHeap__R3bhvQCN0d6AYpkvxfT9aGw+12456>, x=x@entry=140737349857312)
    at /home/dian/Stuff/nim/lib/system/avltree.nim:74
74        if isBottom(t): return
(gdb) bt
#0  0x0000000000401ace in del__Io5JDKCS5u26IEWw0J53hQ (a=a@entry=0x409180 <sharedHeap__R3bhvQCN0d6AYpkvxfT9aGw>, 
    t=t@entry=0x40c228 <sharedHeap__R3bhvQCN0d6AYpkvxfT9aGw+12456>, x=x@entry=140737349857312)
    at /home/dian/Stuff/nim/lib/system/avltree.nim:74
#1  0x0000000000401d58 in rawDealloc__K7uQ6aTKvW6OnOV8EMoNNQ (
    a=a@entry=0x409180 <sharedHeap__R3bhvQCN0d6AYpkvxfT9aGw>, p=p@entry=0x7ffff7beb038)
    at /home/dian/Stuff/nim/lib/system/alloc.nim:867
#2  0x0000000000401dd5 in dealloc__Jg1OaY9ahkT3MBopLAXRSGw (
    allocator=allocator@entry=0x409180 <sharedHeap__R3bhvQCN0d6AYpkvxfT9aGw>, p=p@entry=0x7ffff7beb038)
    at /home/dian/Stuff/nim/lib/system/alloc.nim:968
#3  0x0000000000401df9 in deallocSharedImpl__lmwgHsdhTsrQaepFju8wew (p=p@entry=0x7ffff7beb038)
    at /home/dian/Stuff/nim/lib/system/alloc.nim:1083
#4  0x0000000000401e10 in deallocShared (p=p@entry=0x7ffff7beb038)
    at /home/dian/Stuff/nim/lib/system/memalloc.nim:295
#5  0x0000000000404b91 in threadProcWrapper__oTnP9cUoE9cVTUL7iHAoIIAA (closure=0x7ffff7bea060)
    at /home/dian/Stuff/nim/lib/system/threads.nim:187
#6  0x00007ffff7e3df27 in start_thread (arg=<optimized out>) at pthread_create.c:479
#7  0x00007ffff7d6ee0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@cooldome
Member

cooldome commented Apr 15, 2020

Yes, I can replicate it now. I found that it works if you compile with the C++ backend: nim cpp --threads:on -d:danger --gc:arc -r. @Yardanico, @zacharycarter could you please confirm you are getting the same behavior?

I also found that it works with the old Nim 1.1.1 built on 2019-12-09, so it is some kind of regression, likely related to exception handling.

@Yardanico
Collaborator

Yardanico commented Apr 15, 2020

@cooldome weirdly enough I still get SIGSEGV.

I tried using git bisect for the first time and with it found that be795bb (#13280) is the commit that broke this code (the commit before - 84e8477 - works just fine)

@Yardanico
Collaborator

Yardanico commented Apr 15, 2020

I also tried running with ./koch temp c -d:useSysAssert --threads:on --gc:arc -r b.nim
(that build was broken at first, but it worked after I moved rawWrite in system.nim after the import of "system/ansi_c" and commented out its only usage in seqs_v2):

[SYSASSERT] sizeof FreeCell

That sysAssert comes from mmdisp.nim, line 87:

sysAssert(sizeof(Cell) == sizeof(FreeCell), "sizeof FreeCell")

sizeof(Cell) seems to be 8, while sizeof(FreeCell) is 16.
cc @Araq I guess?

P.S.: It just works (:tm:) if I comment out line 867 in alloc.nim:
del(a, a.root, cast[int](addr(c.data)))
but that's obviously not correct (although valgrind shows no leaks).

@Araq
Member

Araq commented Apr 16, 2020

@Yardanico that assertion is simply wrong for ARC.
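
To illustrate why a size-equality check like the quoted sysAssert can fail, here is a minimal sketch. The types CellSketch and FreeCellSketch are hypothetical stand-ins, not the runtime's real definitions; the sketch assumes an ARC-style header that stores only a reference count and a free-list entry that stores a link plus one extra field.

```nim
# Hypothetical stand-ins for illustration only; these are NOT the
# definitions used by Nim's runtime.
type
  CellSketch = object       # assumed: header holding only a reference count
    rc: int
  FreeCellSketch = object   # assumed: free-list entry, link plus marker field
    next: pointer
    zeroField: int

echo "sizeof(CellSketch) = ", sizeof(CellSketch)
echo "sizeof(FreeCellSketch) = ", sizeof(FreeCellSketch)
# A check of the form sizeof(Cell) == sizeof(FreeCell) cannot hold
# for a pair of types shaped like these.
echo "sizes equal: ", sizeof(CellSketch) == sizeof(FreeCellSketch)
```

On a 64-bit target this pair would give sizes matching the 8 and 16 reported above, which is consistent with the assertion not applying to ARC.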

@zacharycarter
Contributor Author

@cooldome confirmed that if I compile with nim cpp --threads:on -d:danger --gc:arc -r it works.

@zacharycarter
Contributor Author

zacharycarter commented Apr 17, 2020

I believe it was actually this commit / PR which introduced the issue:

c334486
#12977

Note: --gc:arc --exceptions:setjmp works.
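
As a stopgap, the flag combination reported to work in the note above could be pinned per project. This is a minimal sketch, assuming a config.nims file placed next to the main module; the file and its placement are assumptions for illustration and were not part of the report.

```nim
# config.nims (NimScript); assumed to sit next to the project's main module.
# Mirrors the combination reported to work: --gc:arc --exceptions:setjmp
switch("threads", "on")
switch("gc", "arc")
switch("exceptions", "setjmp")
```

This only sidesteps the crash by selecting a different exception implementation; it does not address the underlying bug.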

@Yardanico
Collaborator

@Araq it still seems to be broken on devel: it hangs without --stackTrace:off and crashes with it.

@Yardanico
Collaborator

Some logs:

2
0
/home/dian/.nim/lib/system/avltree.nim:74:9: runtime error: member access within null pointer of type 'tyObject_AvlNode__IaqjtwKhxLEpvDS9bct9blEw' (aka 'struct tyObject_AvlNode__IaqjtwKhxLEpvDS9bct9blEw')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /home/dian/.nim/lib/system/avltree.nim:74:9 in 
/home/dian/.nim/lib/system/avltree.nim:74:9: runtime error: applying zero offset to null pointer
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /home/dian/.nim/lib/system/avltree.nim:74:9 in 
/home/dian/.nim/lib/system/avltree.nim:74:9: runtime error: load of null pointer of type 'tyObject_AvlNode__IaqjtwKhxLEpvDS9bct9blEw *' (aka 'struct tyObject_AvlNode__IaqjtwKhxLEpvDS9bct9blEw *')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /home/dian/.nim/lib/system/avltree.nim:74:9 in 
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
==22360== Thread 16:
==22360== Invalid read of size 8
==22360==    at 0x402198: del__Io5JDKCS5u26IEWw0J53hQ (.nim/lib/system/avltree.nim:74)
==22360==    by 0x4024F6: rawDealloc__K7uQ6aTKvW6OnOV8EMoNNQ (.nim/lib/system/alloc.nim:870)
==22360==    by 0x4057B4: dealloc__Jg1OaY9ahkT3MBopLAXRSGw (.nim/lib/system/alloc.nim:971)
==22360==    by 0x4057B4: deallocSharedImpl__lmwgHsdhTsrQaepFju8wew (.nim/lib/system/alloc.nim:1086)
==22360==    by 0x4057B4: deallocShared (.nim/lib/system/memalloc.nim:295)
==22360==    by 0x4057B4: threadProcWrapper__oTnP9cUoE9cVTUL7iHAoIIAA (.nim/lib/system/threads.nim:193)
==22360==    by 0x49C8F26: start_thread (pthread_create.c:479)
==22360==    by 0x4ADDE0E: clone (clone.S:95)
==22360==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==22360== 
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
Thread 3 "test2" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff79e4700 (LWP 22569)]
0x0000000000402198 in del__Io5JDKCS5u26IEWw0J53hQ (a=0x40a118 <sharedHeap__R3bhvQCN0d6AYpkvxfT9aGw>, 
    t=0x40c9c0 <sharedHeap__R3bhvQCN0d6AYpkvxfT9aGw+10408>, x=140737349840928)
    at /home/dian/.nim/lib/system/avltree.nim:74
74        if isBottom(t): return
(gdb) bt
#0  0x0000000000402198 in del__Io5JDKCS5u26IEWw0J53hQ (a=0x40a118 <sharedHeap__R3bhvQCN0d6AYpkvxfT9aGw>, 
    t=0x40c9c0 <sharedHeap__R3bhvQCN0d6AYpkvxfT9aGw+10408>, x=140737349840928)
    at /home/dian/.nim/lib/system/avltree.nim:74
#1  0x00000000004024f7 in rawDealloc__K7uQ6aTKvW6OnOV8EMoNNQ (a=0x40a118 <sharedHeap__R3bhvQCN0d6AYpkvxfT9aGw>, 
    p=0x7ffff7be7040) at /home/dian/.nim/lib/system/alloc.nim:870
#2  0x00000000004057b5 in dealloc__Jg1OaY9ahkT3MBopLAXRSGw (
    allocator=0x40a118 <sharedHeap__R3bhvQCN0d6AYpkvxfT9aGw>, p=0x7ffff7be7040)
    at /home/dian/.nim/lib/system/alloc.nim:971
#3  deallocSharedImpl__lmwgHsdhTsrQaepFju8wew (p=0x7ffff7be7040) at /home/dian/.nim/lib/system/alloc.nim:1086
#4  deallocShared (p=0x7ffff7be7040) at /home/dian/.nim/lib/system/memalloc.nim:295
#5  threadProcWrapper__oTnP9cUoE9cVTUL7iHAoIIAA (closure=0x7ffff7be6068)
    at /home/dian/.nim/lib/system/threads.nim:193
#6  0x00007ffff7e38f27 in start_thread (arg=<optimized out>) at pthread_create.c:479
#7  0x00007ffff7d69e0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@Yardanico
Collaborator

Nim Compiler Version 1.3.5 [Linux: amd64]
Compiled at 2020-05-16
Copyright (c) 2006-2020 by Andreas Rumpf

git hash: c777f2fb608e1f42daf31d314dda0050f6354acd
active boot switches: -d:release

@Yardanico
Collaborator

Yardanico commented Jun 12, 2020

Simplified example:

var threads: array[5, Thread[void]]

proc threadFn() {.thread.} =
  discard

proc main = 
  for i in 0 ..< 5:
    createThread(threads[i], threadFn)
  joinThreads(threads)

main()

@Yardanico
Collaborator

With --stacktrace:on it hangs with this:

==26741== Thread 2:
==26741== Invalid read of size 8
==26741==    at 0x10BAD8: del__Io5JDKCS5u26IEWw0J53hQ (Things/Nim/lib/system/avltree.nim:74)
==26741==    by 0x10C6E4: rawDealloc__K7uQ6aTKvW6OnOV8EMoNNQ (Things/Nim/lib/system/alloc.nim:870)
==26741==    by 0x10CB1A: dealloc__Jg1OaY9ahkT3MBopLAXRSGw (Things/Nim/lib/system/alloc.nim:971)
==26741==    by 0x10CB7A: deallocSharedImpl__SAtZpVrJ3o5FJvSdLQe9auA (Things/Nim/lib/system/alloc.nim:1086)
==26741==    by 0x10CBA4: deallocShared (Things/Nim/lib/system/memalloc.nim:295)
==26741==    by 0x1143E2: threadProcWrapper__KwtUyNVh00QDWGRZcngjGA (Things/Nim/lib/system/threads.nim:193)
==26741==    by 0x49CE421: start_thread (in /usr/lib/libpthread-2.31.so)
==26741==    by 0x4AE6BF2: clone (in /usr/lib/libc-2.31.so)
==26741==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

Without it, it crashes with:

==27864== Thread 5:
==27864== Invalid read of size 8
==27864==    at 0x10BA7B: del__Io5JDKCS5u26IEWw0J53hQ (Things/Nim/lib/system/avltree.nim:74)
==27864==    by 0x10C0AB: rawDealloc__K7uQ6aTKvW6OnOV8EMoNNQ (Things/Nim/lib/system/alloc.nim:870)
==27864==    by 0x10C13C: dealloc__Jg1OaY9ahkT3MBopLAXRSGw (Things/Nim/lib/system/alloc.nim:971)
==27864==    by 0x10C17A: deallocSharedImpl__SAtZpVrJ3o5FJvSdLQe9auA (Things/Nim/lib/system/alloc.nim:1086)
==27864==    by 0x10C1A4: deallocShared (Things/Nim/lib/system/memalloc.nim:295)
==27864==    by 0x1101F2: threadProcWrapper__KwtUyNVh00QDWGRZcngjGA (Things/Nim/lib/system/threads.nim:193)
==27864==    by 0x49CE421: start_thread (in /usr/lib/libpthread-2.31.so)
==27864==    by 0x4AE6BF2: clone (in /usr/lib/libc-2.31.so)
==27864==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==27864== 
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
==27864== 
==27864== HEAP SUMMARY:
==27864==     in use at exit: 1,440 bytes in 5 blocks
==27864==   total heap usage: 5 allocs, 0 frees, 1,440 bytes allocated

EchoPouet pushed a commit to EchoPouet/Nim that referenced this issue Jun 13, 2020
@Araq
Member

Araq commented Jun 16, 2020

For me this works

var threads: array[5, Thread[void]]

proc threadFn() {.thread.} =
  discard

proc main =
  for i in 0 ..< 5:
    createThread(threads[i], threadFn)
  joinThreads(threads)

main()

But I'm on Windows.

@Yardanico
Collaborator

@Araq maybe we can add it as a test-case for ARC to see where it fails and where it works?
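
A regression test along those lines might look like the sketch below. It reuses the simplified example from this thread; the spec block follows the style used by the Nim test suite, but the exact cmd line, the expected output string, and the numThreads constant are assumptions made for illustration.

```nim
discard """
  cmd: "nim c --gc:arc --threads:on $file"
  output: "joined 5 threads"
"""

# Regression sketch based on the simplified example in this thread.
const numThreads = 5   # hypothetical name; the original hard-codes 5
var threads: array[numThreads, Thread[void]]

proc threadFn() {.thread.} =
  discard

proc main =
  # Create all threads in a loop, then wait for every one to finish.
  for i in 0 ..< numThreads:
    createThread(threads[i], threadFn)
  joinThreads(threads)
  echo "joined ", numThreads, " threads"

main()
```

Running it on each CI platform would show where the crash or hang occurs and where the code works, since the reports above differ between macOS, Linux, and Windows.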
