Reduce contention on the code_server #6736
Conversation
CT Test Results: 2 files, 65 suites, 1h 6m 4s ⏱️. Results for commit c9e6925. (Erlang/OTP GitHub Action Bot)
Force-pushed from 65546a5 to 5e8f7c4
load_file/1, load_abs/2, and load_binary/3 now perform most of the work on the client. is_sticky/1 and is_loaded/1 now read directly from the module database.
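The decentralized pattern this describes can be sketched with the public `code` APIs. This is a simplified illustration of the idea, not the actual `code_server` internals: the requesting process fetches the object code and loads it itself, rather than asking the code server to do both.

```erlang
%% Hypothetical sketch (module name and function are illustrative):
-module(client_load_sketch).
-export([load_on_client/1]).

load_on_client(Mod) ->
    case code:get_object_code(Mod) of
        {Mod, Binary, File} ->
            %% The expensive part (reading and preparing the code)
            %% happens in the calling process, not the code server.
            code:load_binary(Mod, File, Binary);
        error ->
            {error, nofile}
    end.
```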
Force-pushed from 5e8f7c4 to c1b4f25
A thought about …
Have you observed actual contention caused by the functions you have optimized? (That is, will this optimization make a meaningful difference?) When running in interactive mode, I would expect most code loading to be caused by a call to an unloaded module, and that the …
@bjorng no, I didn't measure anything. My thoughts are that the approach is generally preferable in terms of design. In particular:
Do you think those are fair? I will share what I did for …
@bjorng here is the commit for …, and here is the test failure I get: …
I am only worried about potential unintended consequences because of the lack of serialization of the requests.

Moving relatively cheap calls such as …

Loading potentially very expensive calls such as the call to …

I will try to take a look at why …
Can you think of a scenario where it would introduce bugs? Given the writes are serialized, we will always read our own writes. The only race would be if two different processes are writing and reading at the same time, but then the result is not guaranteed in any way to begin with. In any case, you are right that they are cheap and the benefit of moving them is minimal. I will be glad to amend it in any way desired.
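The "read your own writes" argument above can be illustrated with a plain ETS table. This is only a sketch of the access pattern, not the actual module database: writes funnel through a single owner (so they are serialized), while readers hit the table directly without a server round-trip.

```erlang
%% Illustrative sketch (names are hypothetical):
-module(db_sketch).
-export([demo/0]).

demo() ->
    Tab = ets:new(modules, [set, public, {read_concurrency, true}]),
    %% In the real system this insert is serialized via the owning
    %% process; here we do it inline for brevity.
    true = ets:insert(Tab, {mymod, loaded}),
    %% Direct lookup: the writer always observes its own completed
    %% write. Two unrelated processes racing get no ordering
    %% guarantee, with or without a server in the middle.
    [{mymod, loaded}] = ets:lookup(Tab, mymod),
    ok.
```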
No, I am just paranoid. 😉 I have looked at the failed test case and I now roughly understand what happens. The following test module is used: …
One process attempts to load it using … I will look more at this next week.
I have now taken a look at this new ensure_loaded function and fixed the issue.
Force-pushed from 0df4779 to c9e6925
Merged, thanks @josevalim for your contribution!
With the changes in erlang#6736, significant code-loading work was moved away from the code server and into the requesting process. However, there could be a lot of repeated work, especially on system startup, if many similar processes are started, all trying to load the same module at the same time, overwhelming the code server with `get_object_code` requests. In this change, we add additional synchronisation to `ensure_loaded` that makes sure only one process at a time tries to load the module. In the added test showcasing the worst-case scenario of a long load path and many concurrent requests, this changes the runtime (on my local machine) from 8s+ to around 200ms. Furthermore, the special sync operation can be merged with this and save one round-trip to the code server.

Test Plan: Upstream CI
Reviewers: vlanvin, skatepalli, #whatsapp_devinfra_ctt, #whatsapp_clr
Reviewed By: vlanvin
Differential Revision: https://phabricator.intern.facebook.com/D48425445
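The de-duplication described above can be sketched as a serializer process that keeps a map of in-flight loads: the first request for a module kicks off the work, and later requests for the same module simply wait for the same reply. This is a hypothetical illustration of the technique, not the actual `code_server` implementation; `load_dedup` and `LoadFun` are invented names.

```erlang
-module(load_dedup).
-export([start/1, ensure_loaded/2]).

%% LoadFun is a fun(Mod) -> Result that does the expensive loading.
start(LoadFun) ->
    spawn(fun() -> loop(LoadFun, #{}) end).

ensure_loaded(Server, Mod) ->
    Ref = make_ref(),
    Server ! {ensure, self(), Ref, Mod},
    receive {Ref, Result} -> Result end.

loop(LoadFun, Pending) ->
    receive
        {ensure, From, Ref, Mod} ->
            case Pending of
                #{Mod := Waiters} ->
                    %% A load for Mod is already in flight:
                    %% just enqueue the caller, do no extra work.
                    loop(LoadFun, Pending#{Mod := [{From, Ref} | Waiters]});
                _ ->
                    %% First request: do the load in a helper process
                    %% so the serializer itself stays responsive.
                    Self = self(),
                    spawn(fun() -> Self ! {done, Mod, LoadFun(Mod)} end),
                    loop(LoadFun, Pending#{Mod => [{From, Ref}]})
            end;
        {done, Mod, Result} ->
            %% Reply to every waiter with the single shared result.
            #{Mod := Waiters} = Pending,
            [W ! {R, Result} || {W, R} <- Waiters],
            loop(LoadFun, maps:remove(Mod, Pending))
    end.
```

With this shape, N concurrent callers for the same module cost one load plus N cheap message exchanges, which matches the 8s+ to ~200ms improvement the commit message reports for the worst case.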
The first fix is to not attempt to load code when the mode is embedded. Erlang/OTP 25 only attempted to perform code loading if the mode was interactive: https://github.com/erlang/otp/blob/maint-25/lib/kernel/src/code_server.erl#L301 This check was removed in erlang#6736 as part of the decentralization. However, we received reports of increased cpu/memory usage in Erlang/OTP 26.1 in code that was calling code:ensure_loaded/1 on a hot path. The underlying code was fixed but, given erlang#7503 added the server back into the equation for ensure_loaded, we can add the mode check back to preserve the Erlang/OTP 25 behaviour.

The second regression would cause the caller process to deadlock when attempting to load a file with an invalid .beam more than once.
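The restored mode check could look roughly like this. A simplified sketch using the documented `code:get_mode/0` API; `maybe_load` is an invented name and the real check lives inside code_server:

```erlang
%% Sketch: only attempt on-demand loading in interactive mode,
%% matching the Erlang/OTP 25 behaviour described above.
maybe_load(Mod) ->
    case code:get_mode() of
        interactive -> code:ensure_loaded(Mod);
        embedded    -> {error, embedded}
    end.
```

In embedded mode all modules are expected to be loaded at boot, so failing fast here avoids repeated, futile path searches on a hot `ensure_loaded/1` call.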
PS: I could not move ensure_loaded/1 to the client for reasons I have not yet understood.