Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sending ref-object to channel works with Arc and RefC but segfaults with Orc in unregisterCycle. #19105

Closed
treeform opened this issue Nov 7, 2021 · 18 comments

Comments

@treeform
Copy link
Contributor

treeform commented Nov 7, 2021

Sending ref-object with --gc:ref works because it just deep copies the object. Not that interesting.

Passing with --gc:arc works great as it passing the ref object without copying which is great because my objects are big and complex. I need this behavior. I don't want to use non-ref object.

But sending ref object fails with --gc:orc for some strange reason. I can make it work if I send it "inline" but its very strange why that works and a let ref object does not.

I am misunderstanding some thing? I though the power of arc/orc is that I can "move" objects between threads.

Minimal example (uncomment "inline" to see it work):

import os

when not defined(gcOrc):
  echo "use --gc:orc to see the bug"

type
  WorkRequest = ref object
    id: int

var
  chanIn: Channel[WorkRequest]
  thread: Thread[ptr Channel[WorkRequest]]

proc workThread(chanIn: ptr Channel[WorkRequest]) {.thread.} =
  echo "Started work thread"
  var req = chanIn[].recv()
  echo "Got work"

chanIn.open()
createThread(thread, workThread, chanIn.addr)

## var segfaults:
# var req = WorkRequest(
#   id: 1
# )
# chanIn.send(req)

## let segfaults:
let req = WorkRequest(
  id: 1
)
chanIn.send(req)

## "inline" works:
# chanIn.send(WorkRequest(
#   id: 1
# ))

sleep(1000) # Give thread time to run
echo "exit"

Output:

nim c -r --gc:orc .\orcbug2.nim
Started work thread
Got work
exit
Traceback (most recent call last)
C:\p\try\threadq\orcbug2.nim(16) orcbug2
C:\Users\me\.choosenim\toolchains\nim-1.6.0\lib\system\orc.nim(494) nimDecRefIsLastCyclicStatic
C:\Users\me\.choosenim\toolchains\nim-1.6.0\lib\system\orc.nim(466) rememberCycle
C:\Users\me\.choosenim\toolchains\nim-1.6.0\lib\system\orc.nim(145) unregisterCycle
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

Compiler:

Nim Compiler Version 1.6.0 [Windows: amd64]
Compiled at 2021-10-19
Copyright (c) 2006-2021 by Andreas Rumpf

active boot switches: -d:release

Maybe related: #18950

@treeform
Copy link
Contributor Author

treeform commented Nov 7, 2021

Maybe related: #14901

@Araq
Copy link
Member

Araq commented Nov 9, 2021

Please use the channels of nim-lang/threading

@Araq Araq removed the Showstopper label Nov 9, 2021
@treeform
Copy link
Contributor Author

Thank you for the suggestion. I tried it and it does not work:

import os, threading/channels

when not defined(gcOrc):
  echo "use --gc:orc to see the bug"

type
  WorkRequest = ref object
    id: int
  Channel[T] = channels.Channel[T]

var
  chanIn: Channel[WorkRequest]
  thread: Thread[ptr Channel[WorkRequest]]

proc workThread(chanIn: ptr Channel[WorkRequest]) {.thread.} =
  echo "Started work thread"
  var req: WorkRequest
  chanIn[].recv(req)
  echo "Got work"

chanIn.open()
createThread(thread, workThread, chanIn.addr)

## breaks:
# var req = WorkRequest(
#   id: 1
# )
# chanIn.send(req)

## breaks:
block:
  var f = WorkRequest(
    id: 1
  )
  chanIn.send(f)

# chanIn.send(WorkRequest(
#   id: 1
# ))

sleep(1000) # Give thread time to run
echo "exit"
quit()
Error: expression cannot be isolated: f

I can change it to just:

chanIn.send(WorkRequest(
  id: 1
))

But then its a just:

C:\p\threading\src\threading\channels.nim(385) open
C:\Users\me\.choosenim\toolchains\nim-1.6.0\lib\pure\concurrency\atomics.nim(341) store
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

Am I missing anything?

@Araq
Copy link
Member

Araq commented Nov 10, 2021

Well it's easier to fix the bug in nim-lang/threading, the send call misses a GC_runOrc call or whatever it's called. Ping @narimiran

@shayanhabibi
Copy link

I have had similar issues with inter-thread communication and ORC. I have issues with the destruction of the objects passed between threads only.

@treeform I suggest you have a look at loony which is a unbounded 'channel' that doesn't use locks or conditions; it is faster and safer.

I have not recently tested it against ORC, however I did have similar issues as you have described when it came to object destruction.

@ringabout
Copy link
Member

ringabout commented Nov 11, 2021

Well it's easier to fix the bug in nim-lang/threading, the send call misses a GC_runOrc call or whatever it's called. Ping @narimiran

Not really.

Am I missing anything?

Hi, @treeform your example is wrong, you need use newChan to initialize the Channel

with latest nim-lang/threading

import os
import threading/channels

type
  WorkRequest = ref object
    id: int
  Channel[T] = Chan[T]

var
  chanIn: Channel[WorkRequest]
  thread: Thread[ptr Channel[WorkRequest]]

proc workThread(chanIn: ptr Channel[WorkRequest]) {.thread.} =
  echo "Started work thread"
  var req: WorkRequest
  chanIn[].recv(req)
  echo "Got work"

chanIn = newChan[WorkRequest]()  # ------------>
createThread(thread, workThread, chanIn.addr)

block:
  chanIn.send(WorkRequest(
    id: 1
  ))

sleep(1000) # Give thread time to run
echo "exit"
quit()

@Araq
Copy link
Member

Araq commented Nov 11, 2021

You're supposed to use it like this:

type
  WorkRequest = ref object
    id: int

var
  chanIn: Chan[WorkRequest]
  thread: Thread[Chan[WorkRequest]]

proc workThread(chanIn: Chan[WorkRequest]) {.thread.} =
  echo "Started work thread"
  var req: WorkRequest
  chanIn.recv(req)
  echo "Got work"

chanIn = newChan[WorkRequest]()  # ------------>
createThread(thread, workThread, chanIn)

block:
  chanIn.send(WorkRequest(
    id: 1
  ))

sleep(1000) # Give thread time to run
echo "exit"
quit()

@ringabout
Copy link
Member

Don't do this because it is not thread safe

An observation: if you run GC_runOrc per thread, it won't segfault.

import os

when not defined(gcOrc):
  echo "use --gc:orc to see the bug"

type
  WorkRequest = ref object
    id: int

var
  chanIn: Channel[WorkRequest]
  thread: Thread[ptr Channel[WorkRequest]]

proc workThread(chanIn: ptr Channel[WorkRequest]) {.thread.} =
  defer: GC_runOrc()   # <--------------------------------
  echo "Started work thread"
  var req = chanIn[].recv()
  echo "Got work"

chanIn.open()
createThread(thread, workThread, chanIn.addr)


## let segfaults:
let req = WorkRequest(
  id: 1
)
chanIn.send(req)

sleep(1000) # Give thread time to run
echo "exit"

Related: #19146

@treeform
Copy link
Contributor Author

Thank you @Araq and @xflywind, it looks like your suggestions worked and i'll try this in my bigger program.

@xflywind, could i just put a lock around GC_runOrc to make it threadsafe? Your solution appears to work with refc, arc and orc.

@treeform
Copy link
Contributor Author

Because channels did not work for us, we ended up using regular dequeu and locks: https://github.com/treeform/puppy/blob/master/src/puppy/requestpools.nim

@treeform
Copy link
Contributor Author

@xflywind Do you think some thing like this makes it thread safe:

import os, locks

type
  WorkRequest = ref object
    id: int

var
  chanIn: Channel[WorkRequest]
  thread: Thread[ptr Channel[WorkRequest]]
  orcLock: Lock

proc workThread(chanIn: ptr Channel[WorkRequest]) {.thread.} =
  when defined(gcOrc):
    defer:     
      orcLock.acquire()
      GC_runOrc()
      orcLock.release()
  echo "Started work thread"
  var req = chanIn[].recv()
  echo "Got work"

orcLock.initLock()
chanIn.open()
createThread(thread, workThread, chanIn.addr)

let req = WorkRequest(
  id: 1
)
chanIn.send(req)

sleep(1000) # Give thread time to run
echo "exit"

@ringabout
Copy link
Member

ringabout commented Nov 16, 2021

Your solution appears to work with refc, arc and orc.

Passing acyclic refs crossing threads should work

Hi @treeform Lock is not needed. I think it may work if you can make sure the ref is isolated (ref counting is 0). You can tag your object with {.acyclic.}, then the original example works without modification.
Ofc, runtime "isolated" is terrible, but if you are interested, you can ref
https://github.com/nim-works/arc/blob/e06d1ad24f70c0ea12089da08928471d780720cd/arc.nim#L56
https://github.com/nim-works/loony/compare/arc

ORC will evolve with isolated in the future and handle refs better.

Passing cyclic refs crossing threads needs more efforts

There are two extra solutions

  • Use Atomic RC + backup shared heap tracing GC (I'm testing this)
  • Use novel GC implementation which someone in our community has already been working on

@Araq
Copy link
Member

Araq commented Nov 17, 2021

Because channels did not work for us, we ended up using regular dequeu and locks:

You really are supposed to use nim-lang/threading's Chan[T]. If it doesn't work, file a bug report.

@guzba
Copy link
Contributor

guzba commented Nov 19, 2021

You really are supposed to use nim-lang/threading's Chan[T]. If it doesn't work, file a bug report.

I think we could get that working, certainly. One of us would just need to take another look. A concern I have is that the nim/threading repo is both not in Nimble and not in the standard library. Given people using Nim will not have it installed, the only way I know of to have Nimble install it as a dependency for us is as an GitHub url to a commit hash. Is this the intended way to depend on this repo? Is this repo going to be around long-term or is it just a home for stuff temporarily?

My current feeling is to just wait until std/channels becomes hopefully real in an upcoming Nim release. I think it was removed from 1.6.0? Are these implementations related?

Neither I nor treeform have been keeping up on that stuff and only recently wanted to explore concurrent Puppy fetch requests which lead to this GitHub issue. Channels and concurrency stuff seems like an area of active development so it is a bit unclear how things relate and where things are going. This is not a problem, active dev in an area is great to see, but it makes me want to just stick to my simple implementation for the time being.

@Araq
Copy link
Member

Araq commented Nov 19, 2021

My current feeling is to just wait until std/channels becomes hopefully real in an upcoming Nim release. I think it was removed from 1.6.0? Are these implementations related?

Yes, they are related. And nim-lang/threading is here to stay, there are currently no plans to move it to the stdlib. We will publish it on Nimble.

@narimiran
Copy link
Member

We will publish it on Nimble.

It is now published and available: nimble install threading.

@guzba
Copy link
Contributor

guzba commented Nov 22, 2021

It is now published and available: nimble install threading.

Excellent, thanks you for the update.

@ringabout
Copy link
Member

With #21815, the example no longer crashes because WorkRequest is not cyclic anymore.

@Araq Araq closed this as completed May 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants