Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

engines using polls fail on boot, when loaded from saved script #1158

Open
catfact opened this issue Jul 6, 2020 · 6 comments
Open

engines using polls fail on boot, when loaded from saved script #1158

catfact opened this issue Jul 6, 2020 · 6 comments

Comments

@catfact
Copy link
Collaborator

@catfact catfact commented Jul 6, 2020

https://llllllll.co/t/mangl/21066/284?u=zebra

[quote]
just had another freeze up while booting into mangl. maiden was flooded with endless

warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll

and all buttons and encoders were non-functional. i was able to ssh in and reboot, at which point everything seemed fine.
[/quote]

the warning makes it clear that there is a race condition somewhere and the poll data structure is not populated as it should be after engine load and before script init. (when mangl calls :start() on phase/level polls.)

i'd look at sequence between SC and oracle. Crone.setEngine calls reportCommands, then reportPolls.

in oracle.c, the handler sequence and test_engine_load_done should be ensuring all reports are complete before raising EVENT_ENGINE_LOADED and kicking off script init.


here's the entry point for lua, in startup:
https://github.com/monome/norns/blob/main/lua/core/startup.lua#L65-L72

this looks a little suspicious in that the engine report should really happen before trying to launch scripts. but i don't immediately see a reason this would break polls.

@dndrks
Copy link
Contributor

@dndrks dndrks commented Jul 6, 2020

if it helps for testing, executing ;restart in the matron REPL with mangl loaded can reliably cause continuous callback poll messages as norns reboots to a cleared state. reloading mangl stops the messages. restarting supercollider also stops the messages.

@catfact
Copy link
Collaborator Author

@catfact catfact commented Jul 6, 2020

i don't know what restart; in matron REPL is supposed to accomplish from a user-story perspective, or why we would recommend using it.

it's a little hard to follow (for me!), but seems to me like maiden just restarts the associated systemd service. if that service is matron alone, then i would expect polls to break along with several other things probably.

@dndrks
Copy link
Contributor

@dndrks dndrks commented Jul 6, 2020

why we would recommend using it

ah, i think it's just been generally accepted as a quick alternative to full-blown SLEEP when experiencing system lockups. didn't mean to muddy the waters, this was just the only way to reliably repro the poll error streaming on my unit.

@tehn
Copy link
Member

@tehn tehn commented Jul 6, 2020

this looks a little suspicious in that the engine report should really happen before trying to launch scripts. but i don't immediately see a reason this would break polls.

that is confusing for sure--- it seems the script load shouldn't work at all if the engine table doesn't know about its engines.


more generally i think the poll failure might be a symptom of a different problem. ;restart in maiden likely causes the symptom, but that's not the same as the core problem.

generally ;restart (in maiden) is helpful to me while writing lua core code that doesn't mess with SC (exactly because it only restarts matron, reloading the core lua libs). if anything with engines is happening it certainly makes more sense to do full SYSTEM > RESET, which is basically equivalent to a full SLEEP

the problem here, of course, is that if matron is jammed, you can't use the menu to restart--- hence ;restart via maiden.

perhaps we simply need another maiden command ;reset which does the exact same thing as the menu item (ie, full stack)

we added a bunch of stuff to clear the default state, which some people have reported as annoying, but is obviously very helpful for getting out of a failed state

@catfact
Copy link
Collaborator Author

@catfact catfact commented Jul 6, 2020

yes of course, the polls issue is independent. i'm simply saying that restarting matron is not exacty useful as a repro case - though it is still a useful clue so thanks for that.

the proximate issue is that the .register method in poll.lua has not yet been called when the poll callback is fired, even though this is supposed to happen as part of the engine-load sequence:

  • matron requests engine load
  • Crone.setEngine commences
    • engine class is instantiated, commands and polls are registered on SC side
    • SC sends command-report sequence:
      • /report/commands/start
      • repeat /report/commands/entry
      • /report/commands/end
    • poll-report sequence:
      • /report/polls/start
      • repeat /report/polls/entry
      • /report/polls/end

/report/polls/entry handler in oracle.c is what ultimately causes Poll.register to be called in lua, which has evidently not happened yet when the script starts running.

the whole set of handlers in oracle.c are supposed to wait until both sequences are complete before raising EVENT_ENGINE_LOADED.

and of course the lua side should be waiting for that event before firing script init.

so it seems to me that one of these things is happening:

  1. somehow, EVENT_ENGINE_LOADED is being raised early
  2. somehow, script init is being called prematurely, before the engine-loaded event is handled by lua.

both possibilities seem straightforward to debug for, but i think case 2 is more likely.


as we've discussed, the dynamic reporting system is ultimately not really necessary, and could be replaced with metadata files read directly from lua. this works best if the metadata is autogenerated, so that code generation facility should be baked into the engine classes on SC side.


i think we already have other GH issues we can reopen for improving the reset/restart options. but as it stands, i'd say that restart; in matron REPL for recovery reasons, should always be followed by SLEEP or some other full reboot.

@catfact
Copy link
Collaborator Author

@catfact catfact commented Jul 6, 2020

it seems the script load shouldn't work at all if the engine table doesn't know about its engines.

there is actually no reason this will break. Engine.load(name, cb) simply sends the requested name and fires the callback on completion. if the named engine doesn't exist, nothing happens, but the engine registry in lua isn't actually checked at this point.

what is definitely broken right now is the ability for a script to get a list of available engines, if that script is being run at launch. (Engine.names will be empty.) but not many scripts require this facility. (maybe none.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants