fix(buffer_worker): avoid sending late reply messages to callers #10455

Merged


thalesmg (Contributor) commented Apr 19, 2023

Fixes https://emqx.atlassian.net/browse/EMQX-9635

During a sync call from process A to a buffer worker B, B's call to the underlying resource C can be very slow. In those cases, A will receive a timeout response and expect no more messages from B or C. However, prior to this fix, if B was stuck in a long sync call to C and got its response only after A had timed out, B would still send the late response to A, polluting A's mailbox.
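
The scenario above can be reproduced in isolation. The following is a minimal sketch, not the EMQX implementation: the module `late_reply_demo` and its functions `call/3`, `slow_server/1` and `demo/0` are hypothetical names made up for illustration. It assumes OTP 24 or later, where `erlang:monitor/3` accepts the `{alias, reply_demonitor}` option, and it follows the alias pattern from the Erlang reference manual: the caller deactivates the alias on timeout, so a reply sent to the alias afterwards is dropped instead of landing in the caller's mailbox.

```erlang
-module(late_reply_demo).
-export([call/3, slow_server/1, demo/0]).

%% Hypothetical caller-side helper (process A in the description).
call(Pid, Request, Timeout) ->
    %% The monitor reference doubles as a process alias.
    MRef = erlang:monitor(process, Pid, [{alias, reply_demonitor}]),
    Pid ! {request, MRef, Request},
    receive
        {MRef, Reply} ->
            %% reply_demonitor already removed the alias and the monitor.
            {ok, Reply};
        {'DOWN', MRef, process, Pid, Reason} ->
            {error, {down, Reason}}
    after Timeout ->
        %% Deactivate the alias: later sends to MRef are silently dropped.
        erlang:demonitor(MRef, [flush]),
        %% A reply may have arrived just before the demonitor call.
        receive
            {MRef, Reply} -> {ok, Reply}
        after 0 ->
            {error, timeout}
        end
    end.

%% Hypothetical stand-in for a worker stuck in a slow call (B waiting on C).
%% It replies to the alias, never to the caller's pid.
slow_server(Delay) ->
    receive
        {request, Alias, Request} ->
            timer:sleep(Delay),
            Alias ! {Alias, {echo, Request}},
            slow_server(Delay)
    end.

%% Times out after 50 ms, waits for the late reply to be sent, then
%% reports the call result and the caller's mailbox length.
demo() ->
    Pid = spawn(?MODULE, slow_server, [200]),
    Result = call(Pid, ping, 50),
    timer:sleep(300),
    {message_queue_len, Len} = process_info(self(), message_queue_len),
    exit(Pid, kill),
    {Result, Len}.
```

Called from a process whose mailbox is otherwise empty, `late_reply_demo:demo()` should return `{{error, timeout}, 0}`. If the server replied to the caller's pid rather than to the alias, the late `{MRef, ...}` message would remain in the mailbox.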

Summary

🤖 Generated by Copilot at 8b5521b

Added a new function reply_call/2 to emqx_resource_buffer_worker module to improve query request handling and tracing. Updated the version number of the emqx_resource application. Added test cases for timeout and late reply scenarios using the emqx_connector_demo module.

PR Checklist

Please convert it to a draft if any of the following conditions are not met. Reviewers may skip over until all the items are checked:

  • Added tests for the changes
  • Changed lines covered in coverage report
  • Change log has been added to changes/{ce,ee}/(feat|perf|fix)-<PR-id>.en.md files
  • For internal contributor: there is a jira ticket to track this change
  • [na] If there should be document changes, a PR to emqx-docs.git is sent, or a jira ticket is created to follow up
  • [na] Schema changes are backward compatible

Checklist for CI (.github/workflows) changes

  • [na] If changed package build workflow, pass this action (manual trigger)
  • [na] Change log has been added to changes/ dir for user-facing artifacts update

thalesmg force-pushed the fix-late-gen-server-replies-buf-worker-v50 branch 3 times, most recently from 0efc511 to 818d8fa (April 19, 2023 21:09)
thalesmg marked this pull request as ready for review (April 19, 2023 21:22)
thalesmg requested a review from a team as a code owner (April 19, 2023 21:22)
@@ -295,19 +295,19 @@ pick_call(Id, Key, Query, Timeout) ->
     ?PICK(Id, Key, Pid, begin
         Caller = self(),
         MRef = erlang:monitor(process, Pid, [{alias, reply_demonitor}]),
-        From = {Caller, MRef},
+        From = {Caller, [alias | MRef]},
Member

This function is pretty complicated; it would be great to explain what's going on in a comment.
Q: Why does From contain an improper list?

Contributor Author

Following Stone's comment above, I'll drop the use of gen_statem:reply and go back to my first solution of simply sending the reply to MRef.

Contributor Author

✔️

thalesmg force-pushed the fix-late-gen-server-replies-buf-worker-v50 branch from 818d8fa to cb995e2 (April 19, 2023 21:27)
coveralls (Collaborator)

Pull Request Test Coverage Report for Build 4748080857

  • 3 of 3 (100.0%) changed or added relevant lines in 1 file are covered.
  • 50 unchanged lines in 15 files lost coverage.
  • Overall coverage decreased (-0.07%) to 81.204%

Files with Coverage Reduction | New Missed Lines | %
apps/emqx_authn/src/enhanced_authn/emqx_enhanced_authn_scram_mnesia.erl | 1 | 85.29%
apps/emqx_resource/src/emqx_resource_manager.erl | 1 | 90.43%
apps/emqx/src/emqx_broker.erl | 1 | 87.64%
apps/emqx/src/emqx_connection.erl | 1 | 85.43%
apps/emqx/src/emqx_crl_cache.erl | 1 | 90.11%
apps/emqx/src/emqx_schema.erl | 1 | 88.89%
apps/emqx_connector/src/emqx_connector_jwt_worker.erl | 2 | 92.54%
apps/emqx/src/emqx_alarm.erl | 2 | 91.89%
apps/emqx/src/emqx_stats.erl | 2 | 91.67%
apps/emqx_gateway_exproto/src/emqx_exproto_gcli.erl | 3 | 48.78%

Totals
Change from base Build 4745034971: -0.07%
Covered Lines: 25585
Relevant Lines: 31507

💛 - Coveralls

thalesmg merged commit 3f18c5e into emqx:master (Apr 20, 2023)
125 of 126 checks passed
thalesmg deleted the fix-late-gen-server-replies-buf-worker-v50 branch (April 20, 2023 13:19)
4 participants