Crash in code_server.erl (badmatch: error) #8510
Comments
I believe I can see why this bug happens.
otp/lib/kernel/src/code_server.erl Line 353 in 9f778f5
otp/lib/kernel/src/code_server.erl Lines 1102 to 1110 in 9f778f5
otp/lib/kernel/src/code_server.erl Line 186 in 9f778f5
Then, when we process the pending on load queue, the loader down message has already been processed. I believe the fix is to put both lines below around a handle_pending_on_load: otp/lib/kernel/src/code_server.erl Line 186 in 9f778f5
otp/lib/kernel/src/code_server.erl Line 317 in 9f778f5
Otherwise we may process those events out of order. cc @michalmuskala |
Actually, we pretty much have two different locking mechanisms for loading a module, one for loading on the client, the other one is for handling |
I am looking into this still but I am busy with the upcoming Elixir v1.17 release, sorry for the delay. @robashton, a temporary solution would most likely be to preload any modules with |
We are just running a forked OTP for now that pretends the error doesn't exist, we aren't blocked so no worries! |
Prior to this patch, the code server had two internal queues, one to track module loading and another to track on_load callbacks. This pull request refactors the code to have a single queue, in order to fix bugs and improve maintainability. Closes erlang#7466. Closes erlang#8510.
This happens on modules with on_load callbacks, so you can try preloading them beforehand (for example, in your helper). We also haven't had reports of this happening earlier than OTP 27. More concurrency was added in OTP 27, which would explain why we suddenly started seeing it.
Hmm, we're currently using OTP 26 so I guess you have a report of it now 😄 I did find some modules in our dependencies that have
I've also removed some usages of Patch which loads modules at runtime and now the frequency appears to have gone way down (although it is still happening occasionally). |
Good to know! If Patch is relying |
Describe the bug
In certain scenarios (our large PureScript codebase compiled to Erlang), a badmatch is raised inside code_server. I suspect it involves short-lived processes combined with modules that take longer than usual to load, but I'm not sure. It reproduces most reliably in our Cowboy usage: if we make a few small requests just after start-up, we get the error for the module cowboy_static@ps (which nearly everything references). The file itself is unremarkable and I don't think it is related to the issue. If I preload the module with a manual load, this does not happen.
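For anyone wanting to try the same workaround, here is a minimal sketch of preloading modules up front with code:ensure_loaded/1. The module name preload_sketch and the function preload/1 are made up for illustration; this is not code from our project or from OTP.

```erlang
-module(preload_sketch).
-export([preload/1]).

%% Hypothetical helper: ask the code server to load each module before
%% any concurrent request can trigger the load. Returns the modules that
%% could not be loaded, paired with the reason from code:ensure_loaded/1.
preload(Modules) ->
    [{M, Reason} || M <- Modules,
                    {error, Reason} <- [code:ensure_loaded(M)]].
```

Calling preload_sketch:preload(['cowboy_static@ps']) during start-up, before the listener accepts requests, would be one way to use it. An empty list back means every module loaded.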
I think this relates to the changes made in #7503 and #6736
To Reproduce
I can't easily build an isolated example
Expected behavior
The code should load and not crash code_server
Affected versions
26+
Additional context
This bit I can help with. I've pulled down the latest and built it; the problem is on line 1184 of code_server.erl, where we do a maps:take on the loading collection and the module doesn't exist in there. A trace of the messages to and from the code server pid reveals the following order of events
Essentially we get some LOADER_DOWN message, remove the module from the map, and then try to read the module from the map.
Sticking a case statement around the maps:take "fixes" this, but I don't know whether that is what is needed or just a band-aid, since I don't fully understand the end-to-end flow of this code and haven't sat down with pen and paper to work it out yet. If this is obvious to somebody else then I'm happy for them to make the fix, or tell me what the fix is and I'll do it.
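To make the failure shape concrete, here is a small standalone sketch. It is not the code from code_server.erl; the module and function names (take_sketch, strict_take/2, tolerant_take/2) are invented for illustration. maps:take/2 returns error when the key is absent, so matching its result against a tuple raises badmatch: error, and a case expression avoids that.

```erlang
-module(take_sketch).
-export([strict_take/2, tolerant_take/2]).

%% Same shape as the crash: if Key was already removed from the map,
%% maps:take/2 returns the atom error and this match fails with
%% {badmatch, error}.
strict_take(Key, Loading) ->
    {Value, Rest} = maps:take(Key, Loading),
    {Value, Rest}.

%% The "band-aid" shape: handle the missing key explicitly and leave
%% the map untouched in that case.
tolerant_take(Key, Loading) ->
    case maps:take(Key, Loading) of
        {Value, Rest} -> {ok, Value, Rest};
        error -> {missing, Loading}
    end.
```

This only shows why the match crashes. Whether ignoring the missing entry is the right behaviour for the code server is exactly the open question above.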
Cheers,
Rob