-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-implement RotatingScope with thread local storage #5573
base: master
Are you sure you want to change the base?
Conversation
😊 Welcome @wangjian-pg! This is either your first contribution to the Istio proxy repo, or it's been You can learn more about the Istio working groups, Code of Conduct, and contribution guidelines Thanks for contributing! Courtesy of your friendly welcome wagon. |
Hi @wangjian-pg. Thanks for your PR. I'm waiting for a istio member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
2babfbd
to
4a1e3e0
Compare
4a1e3e0
to
e35dd28
Compare
/ok-to-test |
@lei-tang I have formatted the changes with clang-lint, and all tests have passed. Could you please review the code? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wangjian-pg Thank you for your PR!
Can you provide a concrete example to describe the problems of current implementation?
@lei-tang The current implementation generally works well for most regular cases and under normal circumstances. However, consider a scenario where thousands of processes are running on a host with limited resources, such as a system with only two CPU cores. In this case, the operating system may schedule the worker thread off the CPU core when it attempts to allocate a new gauge metric with the currently active scope pointer through the function call The real problem is that it relies on timing to ensure that the scope pointer used by the worker threads is valid and lacks any other synchronization mechanism. This behavior is non-deterministic in a time-sharing operating system. By the way, as far as I know, using std::atomic may introduce a memory barrier and could potentially lead to performance degradation. However, I haven't yet benchmarked this and it shouldn't be a big deal. |
If the http.stats plugin is used as the statistics plugin, when the onDelete callback is triggered, Is it possible that all the historical statistical data will be cleared? if so, this could lead to a sudden change in the statistical data of the monitoring system, potentially triggering alerts. To avoid this issue, it would be better to only clear the historical statistical data for targets (e.g., Pods) that have not received requests for a long time, while leaving the statistical data for targets that are still actively receiving requests unaffected. This would help maintain the continuity of the statistical data and prevent unnecessary alerts. |
@jezhang2014 AFAIK, the widely adopted monitoring system |
@kyessenov Could you please take a look at this PR? |
@kyessenov Could you please take a look at this PR? |
not stale |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What this PR does / why we need it:
The current implementation of metric rotation relies on timing to ensure that the scope pointer used by the worker thread is valid, which is probably not a good practice or design as far as I know.
After digging into the envoy's implementation of
ThreadLocalStoreImpl
(which faces similar issues of how to propagate scoped metrics to worker threads and expire them when the scope is released), I found that storing the active scope reference in thread local storage and replacing it with the new one when rotation occurs might be a better design. With this approach, we eliminate the use of raw pointers, making rotation behavior more predictable and still lock-free.I re-implemented the rotation as described and tested the code in my environment, it works as expected. Please take a look. Looking forward to any feedback.