Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce outstanding requests limit for system processes #5473

Conversation

rickard-green
Copy link
Contributor

This limit effects system processes that orchestrates operations that
involves all processes in the system. Currently there are two such
processes, the code purger process and the literal area collector
process. They previously sent out requests to all processes on the
system at once, and then waited for responses from these processes in
order to determine when the operation had completed.

When a process had handled such a request it could (and still can)
continue executing Erlang code once it had served this request. If
executing on priority normal or low, it would however not get another
opportunity to execute until all other requests had been served. This
negatively impacted responsiveness of processes during operations like
these on systems with a huge amount of processes.

This change limits the amount of outstanding request which will cause
the system to interleave handling of requests like these with other
execution and by this improve responsiveness on systems with a huge
amount of processes. By default the limit is set to two times the
amount of schedulers. This will make sure that there always will be
enough outstanding requests to utilize all schedulers fully during
operations like these while also letting other work execute without
having to wait very long.

@rickard-green rickard-green added team:VM Assigned to OTP team VM fix testing currently being tested, tag is used by OTP internal CI labels Dec 2, 2021
@rickard-green rickard-green self-assigned this Dec 2, 2021
@rickard-green rickard-green force-pushed the rickard/outstanding-cpc-cla-limit/24.1.7/OTP-17796 branch from 3a23722 to de6855a Compare December 3, 2021 12:15
Copy link
Contributor

@garazdawi garazdawi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The literal collector has had similar functionality for limiting the number of collections done for a long time without a need to configure it. One advantage of not allowing this setting to be configured either is that we don't have to come up with a name for it thus avoiding one of the hard problems of programming :)

erts/emulator/beam/erl_proc_sig_queue.h Show resolved Hide resolved
erts/preloaded/src/erts_internal.erl Outdated Show resolved Hide resolved
@@ -484,6 +485,49 @@ perf_counter_unit() ->
is_system_process(_Pid) ->
erlang:nif_error(undefined).

-spec sys_proc_outstanding_requests_limit() -> pos_integer().

sys_proc_outstanding_requests_limit() ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have this function and not use erlang:system_info?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is frequently used by the code purger and the literal collector, so we want it efficient for that use case. In order to implement it efficiently in system_info() we need to move it up in the if branching mess in system_info() which would make other things less efficient that the user actually might want to look at frequently. It could be interesting for the user to look at this setting, but hardly more than once when checking their configuration. I haven't looked that closely to what is at the top of system_info() though, it might perhaps be moved up near the top.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If performance is what matters, then wouldn't it be better to make erlang:system_flag send messages to these processes when the flag is changed?

I'm having a hard time seeing this being the hot path in a system and thus warranting the extra complexity, but if you think it is worth it then keep it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the argument up the if branching in system_info and removed the erts_internal function.

[{code_purger, receive {Ref1, OP1} -> OP1 end},
{literal_area_collector, receive {Ref2, OP2} -> OP2 end}].

find_lac() ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure there is a reason, but why isn't the lac registered?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not be directly addressed by anyone except for the VM itself which have a direct pointer to the process structure. This function is only there as a fallback if we should need to reduce the priority at a customer site if something unexpected should occur now when this functionality is newly introduced. I expect to remove this functionality in OTP 25.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the premature optimization in find_lac() all together.

@rickard-green
Copy link
Contributor Author

The literal collector has had similar functionality for limiting the number of collections done for a long time without a need to configure it. One advantage of not allowing this setting to be configured either is that we don't have to come up with a name for it thus avoiding one of the hard problems of programming :)

Yes, it previously hasn't been configurable, but we've also had a number of complaints about it over the years. I think it is very likely that some of our customers will want to reduce this limit even further than what we want as a default, and I really don't think such a low limit would be what we want in the general case. I also really don't want anymore support cases about this.

It is nice for testing as well. However, if it would only be there for testing such a limit would of course not have to have a nice name. "4711" would do :)

I don't find this name especially good, but I have not been able to come up with anything better. Will change it if someone comes up with something better. I'm not too bothered about the name, though. I don't think this is something users will end up writing in there code on a daily basis.

@max-au
Copy link
Contributor

max-au commented Dec 4, 2021

That's a nice one! I wonder if it could help in a situation when there is 1M+ hibernated processes, and some module is attempted to get purged. In the past it was sending memory usage through the roof, as all these processes were woken up just to check if the module being attempted to purge is on the stack.

@rickard-green
Copy link
Contributor Author

rickard-green commented Dec 4, 2021

That's a nice one! I wonder if it could help in a situation when there is 1M+ hibernated processes, and some module is attempted to get purged. In the past it was sending memory usage through the roof, as all these processes were woken up just to check if the module being attempted to purge is on the stack.

We had a bug that could cause hibernated processes to expand and being left in their heap state for execution (with free memory on the heap) when doing literal collection (which is part of a purge operation if the module has a literal area associated with the module). That bug was fixed in OTP 19.3.6 though. The fix has a test case, so I don't think it should have reappeared without us noticing. If it should reappear please report it. My guess is that you have references to the literal area associated with the module being purged from the hibernated processes which will cause an increased memory consumption. The wakeup of hibernated processes due to purge or literal area collection should not cause them to increase in memory size unless they have references to a literal area being removed.

Memory consumption may increase drastically after literal area collection without their being a bug in the VM. Before a literal area can be removed all processes are checked for references into it. If a reference into the area is found, the referred data is copied onto the heap of the process, so that the reference into the area can be removed. By this all reference to the literal area will be removed which in turn will make it possible to remove the literal area. This however also mean that if you have lots of references into the area you will get a large increase in memory consumption due to the removal of the literal area.

Beside modules also persistent terms are stored in literal areas. That is, if you remove a persistent term, memory consumption can increase drastically if you have any live references to it from any processes.

It is only when you have no live references into the literal area that you will see a reduction in memory consumption after the literal area collection is done. However, during the literal area collection memory consumption may increase temporarily even in this scenario. This since we might need to garbage collect a process to get rid of references to the literal area from garbage terms on the heap (we don't know if it is garbage terms or not unless we garbage collect). A process will not be garbage collected unless we find a reference into the literal area on its heap. There will however only be at maximum 2 processes garbage collection at a time due to a literal collection, so this increase should not be huge unless the processes are huge.

If a hibernated process is garbage collected due to a literal area collection, this implies that it actually had references on its heap into the literal area from live terms which also imply that it will grow its heap size. This since hibernated processes has no garbage terms. Also a hibernated processes can be inspected for references to a literal area without having to garbage collect it, so it is only garbage collected if a reference to the literal area is found. Temporarily during such a garbage collection its heap size will grow up to its execution state, but once it is done, the heap will be shrunk down to the minimum size needed for all live data.

@rickard-green
Copy link
Contributor Author

That's a nice one! I wonder if it could help in a situation when there is 1M+ hibernated processes, and some module is attempted to get purged. In the past it was sending memory usage through the roof, as all these processes were woken up just to check if the module being attempted to purge is on the stack.

I noticed that I gave you a long answer above without actually giving you a direct answer to your question. To answer your question, no this PR will not change what is performed during a purge or a literal collection. It is the same amount of work and the same work that will be performed. The only difference is that this work will be interleaved with other work to a much larger degree than before and you will not get any run-queue spikes like before. Instead of putting all processes into the run-queues at once, they will be put into the run-queues at a slower pace.

@@ -484,6 +485,49 @@ perf_counter_unit() ->
is_system_process(_Pid) ->
erlang:nif_error(undefined).

-spec sys_proc_outstanding_requests_limit() -> pos_integer().

sys_proc_outstanding_requests_limit() ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If performance is what matters, then wouldn't it be better to make erlang:system_flag send messages to these processes when the flag is changed?

I'm having a hard time seeing this being the hot path in a system and thus warranting the extra complexity, but if you think it is worth it then keep it.

This limit effects system processes that orchestrates operations that
involves all processes in the system. Currently there are two such
processes, the code purger process and the literal area collector
process. They previously sent out requests to all processes on the
system at once, and then waited for responses from these processes in
order to determine when the operation had completed.

When a process had handled such a request it could (and still can)
continue executing Erlang code once it had served this request. If
executing on priority normal or low, it would however not get another
opportunity to execute until all other requests had been served. This
negatively impacted responsiveness of processes during operations like
these on systems with a huge amount of processes.

This change limits the amount of outstanding request which will cause
the system to interleave handling of requests like these with other
execution and by this improve responsiveness on systems with a huge
amount of processes. By default the limit is set to two times the
amount of schedulers. This will make sure that there always will be
enough outstanding requests to utilize all schedulers fully during
operations like these while also letting other work execute without
having to wait very long.
…o rickard/outstanding-cpc-cla-limit/23.3.4/OTP-17796

* rickard/outstanding-cpc-cla-limit/22.3.4/OTP-17796:
  Introduce outstanding requests limit for system processes
…o rickard/outstanding-cpc-cla-limit/24.1.7/OTP-17796

* rickard/outstanding-cpc-cla-limit/23.3.4/OTP-17796:
  Introduce outstanding requests limit for system processes
@rickard-green rickard-green force-pushed the rickard/outstanding-cpc-cla-limit/24.1.7/OTP-17796 branch from de6855a to 0458f3a Compare December 6, 2021 17:12
@rickard-green
Copy link
Contributor Author

Changed the system_flag() argument from sys_proc_outstanding_requests_limit to outstanding_system_requests_limit as suggested by @jhogberg

@KennethL
Copy link
Contributor

KennethL commented Dec 7, 2021

Why do we need a command line switch for the erlcommand to set this? The erlcommand already has enough of cryptical switches and this one is not really an every day thing to set. Would it not suffice with an application environment variable and the system_flag?

In the documentation around this I think it should be clearly stated that the user has no reason to change the default setting. We expect the default setting to be good. Only if the user is experiencing problems of a specific nature that point in this direction there could be of value to experiment with different values here.

@rickard-green
Copy link
Contributor Author

Why do we need a command line switch for the erlcommand to set this? The erlcommand already has enough of cryptical switches and this one is not really an every day thing to set. Would it not suffice with an application environment variable and the system_flag?

Nothing in erts currently use an application environment variable. This would be the first. It would also quite considerably complicate the implementation.

In the documentation around this I think it should be clearly stated that the user has no reason to change the default setting. We expect the default setting to be good. Only if the user is experiencing problems of a specific nature that point in this direction there could be of value to experiment with different values here.

I think this is true regarding more or less every configuration parameter. I don't see why we should single out this one.

@rickard-green rickard-green merged commit 075d671 into erlang:maint Dec 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix team:VM Assigned to OTP team VM testing currently being tested, tag is used by OTP internal CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants