fix: avoid logging unnecessary errors in async cleanup functions #11065
-spec lookup(ets:tab(), term()) -> [tuple()].
lookup(Tab, Key) ->
    ?safe_ets(ets:lookup(Tab, Key), []).
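For readers following the discussion, here is a minimal sketch of what a wrapper macro like `?safe_ets` could look like. The macro body and the module name `safe_ets_sketch` are assumptions made for illustration; the actual definition is not shown in this diff excerpt.

```erlang
-module(safe_ets_sketch).
-export([lookup/2]).

%% Assumed shape of the macro: evaluate EXPR and fall back to DEFAULT
%% only when the ETS call fails with error:badarg (for example because
%% the table no longer exists).
-define(safe_ets(EXPR, DEFAULT),
    try
        EXPR
    catch
        error:badarg -> DEFAULT
    end
).

-spec lookup(ets:tab(), term()) -> [tuple()].
lookup(Tab, Key) ->
    ?safe_ets(ets:lookup(Tab, Key), []).
```

With this shape, a lookup against a missing table returns `[]` instead of raising, which is exactly the behaviour the reviewers question for the general call path.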
Is it really necessary to add a try-catch block around every ETS operation?
The cost may be too high. Is it possible to handle this only during shutdown?
I also thought about adding a generic `try ... catch` block directly in the cleanup functions, but ultimately opted for this solution instead.
As the goal is to avoid extra logging, it seems slightly risky to catch and hide every `badarg` error there, as it's quite generic (it may be triggered by code other than ETS).
As to the performance cost, I think `try ... catch` is cheap enough as long as the stack trace is not inspected.
IMO this fix introduces a side effect for the general call path; keeping the affected scope within the cleanup functions would be better.
At the same time, I think try-catch is always expensive because it needs to save the calling environment so that it can be restored when an exception happens.
`try ... catch` without retrieving the stack trace should not impact performance in an observable way.
We can run a simple benchmark to verify it.
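A rough sketch of such a benchmark, assuming a single-key table and a plain `timer:tc/1` comparison. The module name `try_bench_sketch` and its helper functions are hypothetical, and the timings depend heavily on the machine and OTP release, so they are indicative only.

```erlang
-module(try_bench_sketch).
-export([run/1]).

%% Time N lookups with and without a surrounding try ... catch.
%% Returns the elapsed microseconds for each variant.
run(N) ->
    Tab = ets:new(try_bench_sketch_tab, [set, public]),
    true = ets:insert(Tab, {key, value}),
    {PlainUs, ok} = timer:tc(fun() -> loop_plain(Tab, N) end),
    {TryUs, ok} = timer:tc(fun() -> loop_try(Tab, N) end),
    true = ets:delete(Tab),
    #{plain_us => PlainUs, try_catch_us => TryUs}.

loop_plain(_Tab, 0) ->
    ok;
loop_plain(Tab, N) ->
    _ = ets:lookup(Tab, key),
    loop_plain(Tab, N - 1).

loop_try(_Tab, 0) ->
    ok;
loop_try(Tab, N) ->
    %% The stack trace is never bound, matching the case discussed above.
    _ = try ets:lookup(Tab, key) catch error:badarg -> [] end,
    loop_try(Tab, N - 1).
```

Calling `try_bench_sketch:run(1000000)` a few times and comparing the two values gives a first impression; a dedicated benchmarking tool would be needed for a trustworthy result.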
I agree with zhongwen and firest that hiding `badarg` errors in a generic call path may hinder the exposure of potential issues, especially during tests.
Thanks for the feedback!
After additional discussion with @zmstone, I've removed the safe ETS wrappers and added `try ... catch` only in the relevant shutdown functions that use the async worker pool.
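The narrower approach can be sketched roughly as follows. The module `cleanup_sketch` and the function `cleanup_routes/2` are hypothetical names used for illustration, not the functions changed in this PR.

```erlang
-module(cleanup_sketch).
-export([cleanup_routes/2, demo/0]).

%% Hypothetical cleanup function meant to be run asynchronously by a
%% pool worker. Only the ETS call is wrapped, and only error:badarg is
%% swallowed, so unrelated failures still surface.
cleanup_routes(Tab, Key) ->
    try
        true = ets:delete(Tab, Key),
        ok
    catch
        error:badarg ->
            %% The table is already gone because its owner was shut down.
            ok
    end.

%% Exercises the cleanup both while the table exists and after it is gone.
demo() ->
    Tab = ets:new(cleanup_sketch_tab, [set, public]),
    true = ets:insert(Tab, {key, value}),
    ok = cleanup_routes(Tab, key),
    [] = ets:lookup(Tab, key),
    true = ets:delete(Tab),
    ok = cleanup_routes(Tab, key),
    ok.
```

Ordinary lookups elsewhere keep raising `badarg` as before, so problems in the general call path stay visible during tests.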
My 5 cents: this sounds like a startup order / dependency problem to me, which is usually solvable at the supervision tree level. For example, with
@keynslug,
Cleanup functions that access ETS tables may fail with `badarg` error during EMQX shutdown. They are called asynchronously by `emqx_pool` workers and accessed ETS tables may be already destroyed as their owners are shut down. This fix catches ETS `badarg` errors before they can be caught and logged by `emqx_pool`. Fixes: EMQX-9992
LGTM!
Cleanup functions that access ETS tables may fail with `badarg` error during EMQX shutdown. They are called asynchronously by `emqx_pool` workers and accessed ETS tables may be already destroyed as their owners are shut down. This fix catches ETS `badarg` errors silently and individually, before they can be caught and logged by `emqx_pool`.
Fixes EMQX-9992
Summary
🤖 Generated by Copilot at 86e3cdb
This pull request refactors the code that uses ets tables in various modules to improve error handling and readability. It introduces a new module `emqx_utils_ets` that provides wrapper functions and a macro for ets operations. It also moves some broker-related functions to the `emqx_broker` module.
PR Checklist
Please convert it to a draft if any of the following conditions are not met. Reviewers may skip over until all the items are checked:
changes/{ce,ee}/(feat|perf|fix)-<PR-id>.en.md files
Checklist for CI (.github/workflows) changes
changes/ dir for user-facing artifacts update