RLQS: Improve TlsStore persistence after filter factory deletion#44690
RLQS: Improve TlsStore persistence after filter factory deletion#44690bsurber wants to merge 7 commits into
Conversation
Signed-off-by: Yan Avlasov <yavlasov@google.com>
Signed-off-by: Yan Avlasov <yavlasov@google.com>
…inish-rlqs-fix
…TlsStore deletion outside of the main thread Signed-off-by: Brian Surber <bsurber@google.com>
Signed-off-by: Brian Surber <bsurber@google.com>
|
The only way to get coverage on the non-main-thread deletion check is to block filter execution until a config change pushes through to delete the filter's factory. From experimentation, I've confirmed that the LDS config updates in filter_persistence_test.cc actually get blocked while the filter is still in-process on the worker thread (done by injecting a blocking callback in my experiments). That behavior may be specific to the integration test framework or may actually match production, but it doesn't feel great to rely on that assumption. Either way, this does block the full coverage improvement. |
|
/coverage |
|
Coverage for this Pull Request will be rendered here: https://storage.googleapis.com/envoy-cncf-pr/44690/coverage/index.html For comparison, current coverage on https://storage.googleapis.com/envoy-cncf-postsubmit/main/coverage/index.html The coverage results are (re-)rendered each time the CI |
Signed-off-by: Brian Surber <bsurber@google.com>
|
CC @envoyproxy/coverage-shephards: FYI only for changes made to |
With an issue like this which involves multiple threads and object lifetime, it seems vital to have complete test coverage. Otherwise we run the risk of discovering crash bugs in prod. Cooking up an integration test to do this seems very wise. if you've been able to get an LDS config update to do this, that sounds great. But are you saying that the new changes in test/extensions/filters/http/rate_limit_quota/filter_persistence_test.cc don't cover it? |
Whoops, I just noticed that I accidentally forgot to remove this unhelpful So the only way to get coverage is to force the filter to block until it has definitely outlived the filter factory, but the LDS push to delete the filter factory was blocked by the running filter (no stat updates, no change in shared_ptr owner count, etc). |
|
I'm apprehensive about the lack of coverage so I'd like to get some additional opinions: |
Commit Message: Branch off of #44236 with some cleanup and improved coverage in filter_persistence_test.cc
Overall, this lets filter instances in worker threads hold shared_ptr ownership of TlsStores and the Global RLQS client (GlobalRateLimitClientImpl), guaranteeing their existence even if the filter factory (previous unique owner of both) gets deleted via an xDS push.
Operations posted to the main thread by the GlobalRateLimitClientImpl also now check to make sure they haven't outlived the original global client.
Additional Description:
Risk Level: minor (haven't been reproducible outside of direct coverage testing)
Testing: unit testing
Docs Changes:
Release Notes:
Platform Specific Features: