perf(webhook): add async retries and evaluate reply callback in fresh process #10690
Conversation
Force-pushed from 2af2628 to f61a5a4
Pull Request Test Coverage Report for Build 5003109574
apps/emqx/rebar.config (Outdated)

```diff
@@ -24,7 +24,7 @@
 {deps, [
     {emqx_utils, {path, "../emqx_utils"}},
     {lc, {git, "https://github.com/emqx/lc.git", {tag, "0.3.2"}}},
-    {gproc, {git, "https://github.com/emqx/gproc", {tag, "0.9.0.1"}}},
+    {gproc, {git, "https://github.com/uwiger/gproc", {tag, "0.9.1"}}},
```
better to sync and use our fork.
Our `0.9.0.1` was tagged just because `0.9.1` was not ready upstream yet. Both are basically in the same state, except that `0.9.0.1` has an appup for v4.4.

I talked with @zmstone, and I understood that we wanted to use upstream here if possible. 🤔
Sorry.
For gproc, since we already forked (with appup etc in place), it makes more sense to use our fork instead.
Reverted back to using our fork.
✔️
Force-pushed from f61a5a4 to 0a55c1e
This is a performance improvement for the webhook bridge. Since this bridge is called using `async` callback mode, and `ehttpc` frequently returns errors of the form `normal` and `{shutdown, normal}` that are retried "for free" by `ehttpc`, we add this behavior to async requests as well. Other errors are retried too, but they are not "free": at most 3 attempts are made. This is important because, when using buffer workers, we should avoid making them enter the `blocked` state, since that halts all progress and makes throughput plummet.
This surprisingly simple change yields a big performance improvement in throughput. While the previous commit achieves ~ 55 k messages / s in throughput under some test conditions (100 k concurrent publishers publishing 1 QoS 1 message per second), the simple change in this commit improves it further to ~ 63 k messages / s. Benchmarks indicated that evaluating one reply function is consistently quite fast (~ 20 µs), which makes this performance gain counterintuitive. Perhaps, although each call is cheap, `ehttpc` calls several of these in a row when there are several sent requests, and those costs might add up in latency.
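As a rough illustration of the retry policy described above, a minimal sketch in Erlang (module and function names are hypothetical, not the actual EMQX code):

```erlang
%% Hypothetical sketch of the bounded-retry policy; illustrative only.
-module(async_retry_sketch).
-export([on_reply/2]).

-define(MAX_ATTEMPTS, 3).

%% `normal' and `{shutdown, normal}' signal connection churn rather than a
%% real failure, so they are retried without consuming an attempt ("free").
on_reply({error, normal}, Attempt) ->
    {retry, Attempt};
on_reply({error, {shutdown, normal}}, Attempt) ->
    {retry, Attempt};
%% Any other error consumes one of the (at most ?MAX_ATTEMPTS) attempts.
on_reply({error, _Reason}, Attempt) when Attempt < ?MAX_ATTEMPTS ->
    {retry, Attempt + 1};
on_reply({error, Reason}, _Attempt) ->
    {drop, Reason};
%% Successful replies are acknowledged to the buffer worker.
on_reply(Result, _Attempt) ->
    {ack, Result}.
```

Retrying inside the caller this way keeps the buffer worker out of the `blocked` state, which is the whole point of the change.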
Force-pushed from 0a55c1e to bc7d0d5
Enhancements
(Partially) fixes https://emqx.atlassian.net/browse/EMQX-9637
There are two optimizations made to improve the throughput of the webhook bridge.
1. `normal` and `{shutdown, normal}` requests are retried indefinitely. Since this bridge is called using `async` callback mode, and `ehttpc` frequently returns errors of the form `normal` and `{shutdown, normal}` that are retried "for free" by `ehttpc`, we add this behavior to async requests as well. Other errors are retried too, but they are not "free": at most 3 attempts are made. This is important because, when using buffer workers, we should avoid making them enter the `blocked` state, since that halts all progress and makes throughput plummet.

2. The reply callback
`emqx_connector_http:reply_delegator` is evaluated in a fresh process. This surprisingly simple change yields a big performance improvement in throughput.
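The idea can be sketched as follows (a hypothetical helper, assuming the reply callback is packaged as `{Fun, Args}`; not the actual EMQX code):

```erlang
%% Hypothetical sketch: hand the reply callback to a short-lived process so
%% the `ehttpc' worker can move on to the next queued reply immediately.
deliver_reply({Fun, Args}, Result) ->
    _Pid = erlang:spawn(fun() -> erlang:apply(Fun, Args ++ [Result]) end),
    ok.
```

The spawn itself is cheap; the win comes from the worker no longer evaluating a run of callbacks inline when many replies are queued.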
While the previous commit achieves ~ 55 k messages / s in throughput under some test conditions (100 k concurrent publishers publishing 1 QoS 1 message per second), the simple change in this commit improves it further to ~ 63 k messages / s.
Benchmarks indicated that evaluating one reply function is consistently quite fast (~ 20 µs), which makes this performance gain counterintuitive. Perhaps, although each call is cheap, `ehttpc` calls several of these in a row when there are several sent requests, and those costs might add up in latency.

This also makes a slight change in the load balancing of requests to `ehttpc` from the webhook bridge: now the routing key is the message's clientid rather than the calling PID. This should have the same behavior as before, except in the event of a client takeover. In that case, requests for the same client will still be forwarded to the same HTTP connection (assuming that connection is still alive when the takeover is complete) rather than to a different one.

Summary
🤖 Generated by Copilot at 4ed859c
This pull request enhances the HTTP connector with a retry mechanism and fixes a bug related to gproc counters. It also updates some dependencies in `mix.exs` and `rebar.config` to use more reliable sources and versions.

PR Checklist
Please convert it to a draft if any of the following conditions are not met. Reviewers may skip over until all the items are checked:
- Added `changes/{ce,ee}/(feat|perf|fix)-<PR-id>.en.md` files

Checklist for CI (.github/workflows) changes

- `changes/` dir for user-facing artifacts update