Stray {ref, ok} message can kill the writer process #9991

Closed
lukebakken opened this issue Nov 27, 2023 · 9 comments · Fixed by #9994

@lukebakken
Collaborator

lukebakken commented Nov 27, 2023

Describe the bug

Complete details can be found in this discussion:

#9803

Reproduction steps

No reproduction steps exist at this time.

Expected behavior

The source of the stray message is found, or the message is logged and then ignored.

Additional context

@gomoripeti says that there is code in the ra library that matches the format of the stray message (link). I'm waiting on him to point out which code it is. This may, of course, be a message from within OTP itself.

@lukebakken lukebakken added the bug label Nov 27, 2023
@lukebakken lukebakken self-assigned this Nov 27, 2023
@michaelklishin
Member

@lukebakken should we simply ignore this message in the writer and try to log some context at debug level? I mean, the writer cannot do much about this message, and this may allow us to log relevant details about the sender.

@lukebakken
Collaborator Author

This feels like a warning level sort of message, maybe?

michaelklishin added a commit that referenced this issue Nov 27, 2023
of a certain structure reported in #9803.

Closes #9991.
@michaelklishin
Member

@lukebakken logging a warning works for me #9994.

Those who have a way to reproduce and are interested in digging in now have a place to add more logging and tracing.
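The "log a warning, then ignore" behaviour discussed above can be modelled with a short sketch. This is a Python model for illustration only, not the actual Erlang code in `rabbit_writer` or in #9994. The `Ref` class, the `handle_message` function, and the `"emit_stats"` message are all hypothetical names chosen for the example.

```python
import logging
import uuid

logger = logging.getLogger("writer_sketch")


class Ref:
    """Hypothetical stand-in for an Erlang reference(): an opaque, unique token."""

    def __init__(self):
        self._id = uuid.uuid4()

    def __repr__(self):
        return f"#Ref<{self._id}>"


def handle_message(state, msg):
    """Handle one mailbox message and return the resulting state.

    A known message updates the state. A stray (Ref, "ok") tuple is logged
    at warning level and otherwise ignored. Anything else still raises.
    """
    # "emit_stats" is an assumed example of a message the process expects.
    if msg == "emit_stats":
        return {**state, "stats_emitted": state["stats_emitted"] + 1}

    # The stray message: a real reference paired with "ok".
    if (
        isinstance(msg, tuple)
        and len(msg) == 2
        and isinstance(msg[0], Ref)
        and msg[1] == "ok"
    ):
        logger.warning("stray message %r ignored; state: %r", msg, state)
        return state

    raise ValueError(f"unexpected message: {msg!r}")
```

Logging the message together with the current state is one way to capture context for anyone trying to trace the sender. Note that the match requires an actual `Ref` instance, so a tuple whose first element is merely the string `"ref"` is not treated as the stray message.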

@michaelklishin michaelklishin added this to the 3.12.11 milestone Nov 27, 2023
@michaelklishin
Member

rabbit_writer interacts with

  • A socket, which in Erlang is a source of messages
  • A statistics timer, which uses a reference for identity
  • It implements a few bits required by the sys module

I'd start with those three as the most likely senders of these stray {ref(), ok} messages.

mergify bot pushed a commit that referenced this issue Nov 28, 2023
of a certain structure reported in #9803.

Closes #9991.

(cherry picked from commit f14bd15)
mergify bot pushed a commit that referenced this issue Nov 29, 2023
of a certain structure reported in #9803.

Closes #9991.

(cherry picked from commit f14bd15)
(cherry picked from commit 0c742bb)
@lukebakken lukebakken reopened this Nov 29, 2023
@lukebakken
Collaborator Author

I'm going to re-open this issue so that I can devote some time (at some point) to figuring out the root cause.

@michaelklishin will having this issue open affect release notes? I can open a new issue if so.

@michaelklishin
Member

Don't worry about the release notes, we can leave it open

@kjnilsson
Contributor

I can't see any usages of {ref, ok} in Ra. The closest I have is https://github.com/rabbitmq/ra/blob/ba3293a3ea29e6aa738726ba22e951f1ce8e7af4/src/ra_server_sup_sup.erl#L243
but that should be a proper reference(), not the atom ref.

@lukebakken
Collaborator Author

@kjnilsson yes, the error in the logs shows an actual reference, not the atom ref

@michaelklishin
Member

I've closed this because, as of #9994, we hope that the writer process does not die. If someone wants to investigate where these messages originate, you are welcome to do so.

It's just that this specific issue, as it is worded right now, should be considered addressed.

michaelklishin added a commit that referenced this issue Feb 29, 2024
of a certain structure reported in #9803.

Closes #9991.