Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix process_info that can hang waiting for reply #5078

Merged
merged 1 commit into from
Aug 10, 2021

Conversation

max-au
Copy link
Contributor

@max-au max-au commented Jul 29, 2021

With multiple reusable receive markers, marker that is marked
to remove may be followed by process_info signal, which will
also be handled within the loop in erts_proc_sig_handle_incoming.

It is possible that process_info moves messages up to the next
non-message signal from the middle to inner queue. If there was a
receive marker at the very beginning of the middle queue, it
would set save pointer to sig_qs.cont, which will be overwritten
when linking middle queue head into inner queue. This would make
save pointer to skip all messages that were in between of removed
receive marker and process_info request.

To observe this behaviour using a debugger, run the test case
provided with a breakpoint set to handle_process_info part where
it reassigns c_p->sig_qs.cont to the next non-message signal
(**next_nm_sig)

Test case runs a process that triggers receive marker to enter
its middle queue. Currently it's done with process_info BIF, but
can also be triggered with erlang:cancel_timer (which also
places receive marker). Additional processes are sending
process_info requesting to move messages into inner queue, which
eventually makes middle queue to contain:

  • receive marker that is set to be removed but set_save
  • reply tuple ({Ref, Dictionary})
  • process_info request moving previous message into inner queue

Receive marker gets removed from the middle queue after setting
save pointer to sig_qs.cont, then erts_proc_sig_handle_incoming
proceeds to process_info. Which moves reply tuple into the
inner queue, but save pointer is left untouched pointing to the
head of the middle queue, which is beyond the reply. This
way reply never gets matched (as save pointer has already
moved past it), and process_info hangs forever.

@uabboli uabboli added the team:VM Assigned to OTP team VM label Jul 30, 2021
@kjellwinblad kjellwinblad added the testing currently being tested, tag is used by OTP internal CI label Aug 3, 2021
@kjellwinblad
Copy link
Contributor

kjellwinblad commented Aug 4, 2021

Thanks for the PR and for finding the bug! I think the commit looks good but I think we can wait a few days until @jhogberg or @rickard-green (they are much more familiar with the receive marker optimization than I am) have had time to look at it before we merge and release a patch.

@rickard-green rickard-green self-assigned this Aug 9, 2021
@rickard-green
Copy link
Contributor

Great catch! Thanks for the PR!

Please rebase your branch onto the tag OTP-24.0.5, so that it is possible to release this in the next emergency patch package that we release.

With multiple reusable receive markers, marker that is marked
to remove may be followed by process_info signal, which will
also be handled within the loop in erts_proc_sig_handle_incoming.

It is possible that process_info moves messages up to the next
non-message signal from the middle to inner queue. If there was a
receive marker at the very beginning of the middle queue, it
would set save pointer to sig_qs.cont, which will be overwritten
when linking middle queue head into inner queue. This would make
save pointer to skip all messages that were in between of removed
receive marker and process_info request.

To observe this behaviour using a debugger, run the test case
provided with a breakpoint set to handle_process_info part where
it reassigns c_p->sig_qs.cont to the next non-message signal
(**next_nm_sig)

Test case runs a process that triggers receive marker to enter
its middle queue. Currently it's done with process_info BIF, but
can also be triggered with erlang:cancel_timer (which also
places receive marker). Additional processes are sending
process_info requesting to move messages into inner queue, which
eventually makes middle queue to contain:
 - receive marker that is set to be removed but set_save
 - reply tuple ({Ref, Dictionary})
 - process_info request moving previous message into inner queue

Receive marker gets removed from the middle queue after setting
save pointer to sig_qs.cont, then erts_proc_sig_handle_incoming
proceeds to process_info. Which moves reply tuple into the
inner queue, but save pointer is left untouched pointing to the
head of the middle queue, which is beyond the reply. This
way reply never gets matched (as save pointer has already
moved past it), and process_info hangs forever.
@max-au
Copy link
Contributor Author

max-au commented Aug 10, 2021

Rebased to OTP-24.0.5 (base branch has to stay "maint").

@rickard-green rickard-green merged commit b1a0cde into erlang:maint Aug 10, 2021
@max-au max-au deleted the max-au/fix-pi-lockup branch January 25, 2022 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:VM Assigned to OTP team VM testing currently being tested, tag is used by OTP internal CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants