Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

race condition with metro callbacks on script cleanup #1033

Open
catfact opened this issue Mar 19, 2020 · 6 comments
Open

race condition with metro callbacks on script cleanup #1033

catfact opened this issue Mar 19, 2020 · 6 comments
Labels

Comments

@catfact
Copy link
Collaborator

@catfact catfact commented Mar 19, 2020

see this thread: changing scripts results in some errors on metro callback defined in the old script:
https://llllllll.co/t/cheat-codes-v1-2/30329/16?u=zebra

one guess: maybe the metro callback events are already in the main event queue when the script change happens.

i'm looking at cleanup in script.lua, and it looks like calls metro.free_all to stop all threads, but the metro tables in lua are untouched. when a metro callback event is pulled off the stack, it
just uses a numerical index to look up the callback fn in the corresponding metro table.

so, possible solution: script cleanup should also set all handlers to no-op. (if it does this already somewhere i've missed, then nvm sorry.)

could also be related to #769

@tehn
Copy link
Member

@tehn tehn commented Mar 19, 2020

looks like the global state gets cleared/reset before the metros are freed:

https://github.com/monome/norns/blob/master/lua/core/script.lua#L45

i can PR this (but it may be hard to reproduce??) and it shouldn't cause any ill effects

@tehn
Copy link
Member

@tehn tehn commented Mar 19, 2020

nope that didn't work.

also clearing the metro events didn't work, so now i'm totally confused but still working on it.

@catfact
Copy link
Collaborator Author

@catfact catfact commented Mar 19, 2020

what i'm saying is that metro.free_all is not sufficient:

function Metro.free(id)
  Metro.metros[id]:stop()
  Metro.available[id] = true
  Metro.assigned[id] = false
end

--- free all
function Metro.free_all()
  for i=1,Metro.num_script_metros do
    Metro.free(i)
  end
end

it calls down into C to stop the threads, and it makes the metros assignable again, but the tables are still there and still populated with handler functions.

clearing the metro events didn't work

you mean you tried setting each .event field? or actually clearing the event queue of metro events?

@tehn
Copy link
Member

@tehn tehn commented Mar 19, 2020

you mean you tried setting each .event field? or actually clearing the event queue of metro events?

sorry, yes, that's what i did. added

function Metro.free(id)
  Metro.metros[id]:stop()
  Metro.available[id] = true
  Metro.assigned[id] = false
  Metro.metros[id].event = nil
end

which would make me think any pending metro calls in the event loop would then abort when trying to call a nil event handler.

will be able to pick this back up later tonight

@catfact
Copy link
Collaborator Author

@catfact catfact commented Mar 20, 2020

hm.. i guess another possibility is that the C layer is somehow still posting events after the metros are stopped.

it's not supposed to do that: metro.c:metro_stop() calls pthread_cancel, and the metro threads call pthread_testcancel() on wakeup before banging out an event. which AFAIK is the best we can do.

(TBH, i'm not sure when i will next have the brainspace for a thorough deep-dive and review of the metro.c synchronization sequences. the natural consequence of which would be "lets just rewrite this in c++ or Rust.")

can we repro with a script that just makes as many very fast metros as possible?

@tehn
Copy link
Member

@tehn tehn commented Apr 20, 2020

might be related to #769

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

2 participants