ERL-1403: suspended processes with trap_exit == true are unable to exit #4390

Closed
OTP-Maintainer opened this issue Nov 8, 2020 · 7 comments

@OTP-Maintainer

Original reporter: okeuday
Affected version: Not Specified
Component: erts
Migrated from: https://bugs.erlang.org/browse/ERL-1403


Normally, if trap_exit == false, a suspended process is allowed to exit due to any exception (e.g., with {{erlang:exit/2}}).

However, if trap_exit == true, a suspended process is not allowed to exit because it is unable to process the 'EXIT' message.

This matters because {{dist_buf_busy_limit}} can cause processes that send too many distributed Erlang messages to be suspended.  If the processes are sending because they are misbehaving and they have trap_exit == true, they also become difficult to kill.  That prevents fail-fast behavior, so suspended processes that have trap_exit == true need to be resumed to process the 'EXIT' message or possibly be terminated in a different way.

Part of the problem is that monitors on a suspended process with trap_exit == true do not deliver their message when the process gets an exit exception, because the process is unable to process the 'EXIT' message.  The process could be terminated by using the unignorable {{kill}} exit reason, but that doesn't solve the problem.

The current situation requires a separate process to decide that a potentially misbehaving process P needs to terminate, use {{erlang:process_info(P, status)}} to determine that P is suspended, and then terminate it with the unignorable {{kill}} exit reason.  Meanwhile, if P has trap_exit == true, monitors are unable to receive the exit exceptions that separate processes may cause while P is suspended.
@OTP-Maintainer

rickard said:

I've changed this issue from bug to new feature request, since this is not a bug.

The problem is much more general than exit signals and dist_buf_busy_limit:
1. It concerns all signals that one wants a process to react to in a timely manner. Exit signals are just a special case.
2. It concerns all blocking operations. If you are blocked in a gen_server call, you'll end up in the same situation. That is, it does not have anything to do with the suspended state. The problem is that the process is blocked and cannot take any actions when other signals arrive.

As I see it, the solution is to use only truly, fully asynchronous operations in these scenarios. If a process only communicates using fully asynchronous primitives, it can always be made to handle every scenario like this that can occur.

The problem with dist_buf_busy_limit is that it makes distributed operations not fully asynchronous. This in turn can cause various problems. If dist_buf_busy_limit did not exist today and someone wanted to introduce it, I would object to its introduction for that reason.

If you increase dist_buf_busy_limit *much*, preferably to its maximum value, the operation becomes fully asynchronous and this is no longer a problem. Your process will not be blocked, and it can handle the exit signal. The only thing you have to think about is managing flow control yourself. This flow control mechanism (today using dist_buf_busy_limit) has been present since at least the 90s, so its default value cannot be changed in such a manner.
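
For reference, the limit is set when the emulator starts, using the `+zdbbl` flag with a size in kilobytes (for example `erl +zdbbl 2097151`, the documented maximum), and a running node can report the value in effect. A minimal sketch of such a check follows; the module and function names are hypothetical and only used for illustration.

```erlang
-module(dbbl_info).
-export([limit_kb/0]).

%% Hypothetical helper: report the distribution buffer busy limit that
%% this node is running with. system_info/1 returns the value in bytes,
%% while the +zdbbl flag takes kilobytes, so convert for comparison.
limit_kb() ->
    erlang:system_info(dist_buf_busy_limit) div 1024.
```

Calling `dbbl_info:limit_kb()` on a node started with an explicit `+zdbbl` value is a quick way to confirm that the flag was picked up.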

@OTP-Maintainer

okeuday said:

It might seem better for distributed Erlang to use asynchronous sends, but that approach would run the risk of exhausting buffer/queue memory unless it switches to synchronous sends when memory usage gets too high (asynchronous logging has the same problem).  I understand I can increase the dist_buf_busy_limit, but my expectation was that there can always be a limit that causes the suspended process state, and I wanted to make sure it was handled in a fail-fast way.

I am currently using {{erlang:process_info/2}} to check if the process is suspended and if it is suspended, then using {{erlang:exit(Pid, kill)}} instead of waiting for the process to terminate the way it should.  That keeps the forced termination fast when the process is suspended due to sends hitting the dist_buf_busy_limit.  However, it would be better if the suspended/waiting process is able to handle the 'EXIT' message due to an exit exception when trap_exit == true.
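
The workaround described above could look roughly like the following. This is only a sketch of the idea, not the reporter's actual code: the module name, the function name and the returned atoms are hypothetical, and it relies on the `suspended` status, which is an ERTS implementation detail.

```erlang
-module(force_stop).
-export([stop/2]).

%% Hypothetical helper: ask a local process to exit with Reason, but fall
%% back to the untrappable 'kill' reason when it reports 'suspended'.
stop(Pid, Reason) when is_pid(Pid), node(Pid) =:= node() ->
    case erlang:process_info(Pid, status) of
        undefined ->
            %% The process has already terminated.
            already_dead;
        {status, suspended} ->
            %% A trapped exit would only be queued as an 'EXIT' message,
            %% so use the exit reason that cannot be trapped.
            exit(Pid, kill),
            killed;
        {status, _Other} ->
            exit(Pid, Reason),
            signalled
    end.
```

The status check and the exit signal are two separate steps, so the process can change state between them; the sketch does not try to close that window.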

I thought this would only relate to exit exceptions because the erlang module requires non-self exceptions be exit exceptions, but I understand the ERTS source code likely handles all exceptions in the same way (I have not checked).  That was why I had thought this problem would be specific to how trap_exit == true works.

Thanks for the information.

@OTP-Maintainer

zuiderkwast said:

exit/2 sends an _exit signal._ It is not an _exception_. Trap_exit turns a received exit signal into a _message_. These are three different things.
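
A small sketch that shows the three side by side. The module and function names are hypothetical; the calls themselves (`exit/1`, `exit/2`, `process_flag/2`) are standard BIFs.

```erlang
-module(exit_kinds).
-export([demo/0]).

demo() ->
    %% 1. Exception: exit/1 raises it inside the calling process, where
    %%    it can be caught with try ... catch.
    Exception = try exit(boom) catch exit:R -> {caught_exception, R} end,

    %% 2. Exit signal: exit/2 sends it to another process.
    %% 3. Message: because the receiver traps exits, the signal is
    %%    converted into an ordinary {'EXIT', From, Reason} message.
    Parent = self(),
    Pid = spawn(fun() ->
        process_flag(trap_exit, true),
        Parent ! ready,
        receive
            {'EXIT', From, Reason} -> Parent ! {got_message, From, Reason}
        end
    end),
    %% Wait until the receiver has enabled trap_exit before signalling.
    receive ready -> ok end,
    exit(Pid, boom),
    Message = receive
                  {got_message, Parent, Rsn} -> {exit_message, Rsn}
              after 1000 ->
                  timeout
              end,
    {Exception, Message}.
```

The receiving process only sees the message when it reaches a `receive`, which is why a blocked process with trap_exit == true does not react to the exit signal.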

@OTP-Maintainer

okeuday said:

Yes, sorry for the confusion.  The signals are exit, link, unlink, monitor, and demonitor.

@OTP-Maintainer

rickard said:

bq. It might seem better with distributed Erlang using asynchronous sends but that approach should run the risk of exhausting buffer/queue memory

Yes, that is why you need to handle flow control yourself.
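
One possible building block for doing that is `erlang:send/3` with the `nosuspend` option, which returns instead of suspending the caller when the distribution channel is busy. The sketch below is an assumption about how it might be used, not something proposed in this thread; the wrapper name and the `{error, busy}` return value are hypothetical, and the OTP documentation advises using `nosuspend` with care.

```erlang
-module(nosuspend_send).
-export([try_send/2]).

%% Hypothetical wrapper: never let the caller be suspended by a send.
%% With [nosuspend], erlang:send/3 returns 'nosuspend' instead of
%% blocking, and in that case the message has NOT been sent.
try_send(Dest, Msg) ->
    case erlang:send(Dest, Msg, [nosuspend]) of
        ok -> ok;
        nosuspend -> {error, busy}   % caller decides: retry, drop or queue
    end.
```

What to do on `{error, busy}` (retry later, drop, or queue locally with a bound) is the flow-control policy the caller has to supply.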

bq. I am currently using erlang:process_info/2 to check if the process is suspended and if it is suspended

You cannot rely on the process ending up in the suspended state. This is an implementation detail that might change. It might very well end up in a waiting state instead of suspended state in the future.

bq. The signals are exit, link, unlink, monitor, and demonitor.

There are more signals than this; most importantly, messages are signals as well. They are all signals, but of different types.

@OTP-Maintainer

rickard said:

Exceptions, however, are completely different things from signals.

@rickard-green
Contributor

Closing this. Nowadays fully asynchronous distribution is also available on a per-process level to help handle things like this.
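
The closing comment does not name the mechanism. Assuming it refers to the `async_dist` process flag (available from OTP 25.3), a process could opt in roughly as sketched below; the module name and the `{forward, Msg}` protocol are hypothetical.

```erlang
-module(async_sender).
-export([start/1]).

%% Hypothetical forwarding process. Assumes OTP 25.3 or later, where
%% process_flag(async_dist, true) makes distributed sends from this
%% process fully asynchronous, so it is not blocked by a busy channel.
start(Dest) ->
    spawn(fun() ->
        process_flag(trap_exit, true),
        process_flag(async_dist, true),
        loop(Dest)
    end).

loop(Dest) ->
    receive
        {'EXIT', _From, Reason} ->
            %% The process keeps reaching this receive, so a trapped
            %% exit signal is handled promptly.
            exit(Reason);
        {forward, Msg} ->
            Dest ! Msg,
            loop(Dest)
    end.
```

With sends that no longer block, bounding memory use becomes the sender's responsibility, as discussed earlier in the thread.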
