Loading a module that contains on_load may hang #7466
Comments
Thanks for your report! Just as a heads-up, we're a bit short-handed over the summer vacation period, so we likely won't get around to fixing this until most of us are back from vacation.
I don't think
When using OTP 26, our application crashes when `io:format` is used in the `on_load` function. There is an open issue in the otp repo with a similar problem, where it's mentioned that `on_load` is somewhat brittle: erlang/otp#7466 (comment)

```
=INFO REPORT==== 10-Aug-2023::11:01:37.449084 ===
init got unexpected: {io_request,<0.1805.0>,
                      #Ref<0.3063375415.1090256898.200776>,
                      {put_chars,unicode,io_lib,format,
                       ["Loading library: ~p ~n",
                        [<<"/opt/smd/_build/local/rel/smd/lib/elibphonenumber-8.13.17/priv/phonenumber_util_nif">>]]}}
```
Here is a patch that fixes this issue:
The root cause was that we were not clearing the `OnLoad` state for that module before triggering the actions, which caused new entries to be added to `Waiting` (and those would never be picked up). I was not able to write a test, though, due to limited Common Test (CT) knowledge, so I only have the fix. :) Anyone is welcome to use the patch above.
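The ordering this comment describes can be illustrated with a small, self-contained model. This is a hypothetical sketch, not the actual `code_server` code: the module name `onload_fix_model`, the map-based `Pending` state (module name to list of waiting callers) and the functions `request/3` and `finish/2` are all invented for illustration. What it shows is the fix: the entry for the module is removed *before* the queued actions are re-run, so a re-run action no longer sees an `on_load` in progress and can reply.

```erlang
-module(onload_fix_model).
-export([request/3, finish/2]).

%% Hypothetical model, NOT the real code_server state:
%% Pending :: #{module() => [Caller :: term()]} maps a module whose
%% on_load function is still running to the callers waiting for it.

%% A load request for Mod. If an on_load for Mod is in progress, the
%% caller is queued and gets no reply yet; otherwise it is answered.
request(Mod, Caller, Pending) ->
    case maps:find(Mod, Pending) of
        {ok, Waiting} ->
            {noreply, Pending#{Mod := Waiting ++ [Caller]}};
        error ->
            {reply, Caller, Pending}
    end.

%% Called when the on_load function for Mod finishes. The entry for
%% Mod is cleared FIRST, then each queued action is re-run, so none
%% of them can be re-queued onto an entry that is about to vanish.
%% Returns {CallersThatGotAReply, NewPending}.
finish(Mod, Pending0) ->
    Waiting = maps:get(Mod, Pending0),
    Pending1 = maps:remove(Mod, Pending0),
    {RevReplied, Pending} =
        lists:foldl(
          fun(Caller, {Acc, P0}) ->
                  case request(Mod, Caller, P0) of
                      {reply, C, P1} -> {[C | Acc], P1};
                      {noreply, P1} -> {Acc, P1}
                  end
          end, {[], Pending1}, Waiting),
    {lists:reverse(RevReplied), Pending}.
```

For example, `onload_fix_model:finish(foo, #{foo => [a, b]})` returns `{[a, b], #{}}`: both waiting callers get a reply and entries for other modules are left untouched.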
I will try to submit a patch for this one in the upcoming week. EDIT: I will do it once OTP 27 is out.
Describe the bug
If module `foo` contains an `on_load` attribute, then a call to `code:load_binary/3` (and probably `code:load_abs/1,2`) may hang if there are concurrent calls to these functions (or even to `code:ensure_loaded/1`) on the same module.

To Reproduce
Given these two modules:
Run:
Expected behavior
Both concurrent calls to `code:load_binary/3` in `repro:go/0` should eventually return, so we expect `go()` to return `{ok, ok}`. Instead, we see:

Without the `after` clause, the second `receive` will wait forever.

Affected versions
Both OTP 25 and OTP 26 show this problem; I didn't try older versions.
Additional context
As far as I can see, the problem is in `code_server:try_finish_module`: the action that is passed to `handle_pending_on_load` will always return `{noreply, ...}` when the module to load contains an `on_load`. However, if the action ends up waiting because another `on_load` is already in progress, no reply will ever be sent to the caller.

We found the race condition while investigating tests that would sometimes block. In our case, the race was between `code:ensure_loaded/1` and `code:load_binary/3`, and the issue happens, again, when the action for `load_binary` waits for the `on_load` from `ensure_loaded` to finish, but this combination is harder to reproduce consistently.
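The failure mode described above can be reproduced in a pure, hypothetical model. It is not the real `code_server` code; `onload_race_model`, `request/3` and `finish_buggy/2` are invented names, and `Pending` is an assumed map from module name to waiting callers. Here the queued actions are re-run while the module's entry is still present, so each one finds the `on_load` still in progress, queues itself again and returns `noreply`; the entry is then dropped, and those callers never receive a reply.

```erlang
-module(onload_race_model).
-export([request/3, finish_buggy/2]).

%% Hypothetical model, NOT the real code_server state:
%% Pending :: #{module() => [Caller :: term()]}.

%% Load request for a module with on_load: wait (noreply) while an
%% on_load for Mod is in progress, otherwise reply to the caller.
request(Mod, Caller, Pending) ->
    case maps:find(Mod, Pending) of
        {ok, Waiting} ->
            {noreply, Pending#{Mod := Waiting ++ [Caller]}};
        error ->
            {reply, Caller, Pending}
    end.

%% Buggy ordering: re-run the queued actions while Mod is still in
%% Pending, then drop Mod's entry from the ORIGINAL state.
%% Returns {CallersThatGotAReply, NewPending}.
finish_buggy(Mod, Pending0) ->
    Waiting = maps:get(Mod, Pending0),
    {RevReplied, _RequeuedState} =
        lists:foldl(
          fun(Caller, {Acc, P0}) ->
                  case request(Mod, Caller, P0) of
                      {reply, C, P1} -> {[C | Acc], P1};
                      {noreply, P1} -> {Acc, P1}   % re-queued under Mod
                  end
          end, {[], Pending0}, Waiting),
    %% The re-queued callers exist only in _RequeuedState, which is
    %% discarded here, so nobody will ever reply to them.
    {lists:reverse(RevReplied), maps:remove(Mod, Pending0)}.
```

With this ordering, `onload_race_model:finish_buggy(foo, #{foo => [a, b]})` returns `{[], #{}}`: neither waiting caller is answered and nothing is left queued for them, which mirrors the hang. Removing the entry before re-running the actions makes both callers reply.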