Add e2e test to validate hitless reload of Envoy Gateway #1503
Comments
Some background on this: I hit an issue where, after restarting the controller or after a resync period, Envoy begins draining the existing listeners. I believe this is caused by a reordering of the routes sent over xDS. In my case I had configured 2 TLSRoute resources and, without any changes, Envoy began draining and eventually sent TCP resets to the long-lived connections. I think a key part of this test is having several routes configured in order to reproduce the undesired behavior. |
@dboslee long-lived connections might be a separate sub-issue related to keep-alive timeouts (currently disabled by default) |
To clarify in my case the connections were hitting the default |
@dboslee What would be the short name and description for this new test? How about these:
Do we expect the client to fail to send requests after 10 minutes following the restart, as the connection is reset after 10 minutes? Thanks! |
@arkodg |
Hi @Ronnie-personal, thanks for looking into this issue, assigning this to you for now
|
@arkodg Thanks! |
@arkodg Here are the steps in the reload.go test code,
Thanks! |
hey @Ronnie-personal that looks right ! updating some steps Obtain Kubernetes client from cSuite
|
@arkodg Thanks for the prompt reply! Thanks! |
This means restarting the Envoy Gateway controller, which can be done by deleting the controller pods or by updating the Envoy Gateway controller deployment in a way that triggers a rollout of new pods. |
Also, an update on the issue that sparked this conversation: while I initially thought the issue was caused by the route order changing, that was just a trigger for the actual root cause. The TCP listener name is set to the name of the first route processed by the controller, so when routes are processed in a different order, or the route the TCP listener is named after is deleted, the TCP listener name changes and the listener is drained. This happens here: gateway/internal/xds/translator/translator.go Line 230 in 708287e
ir.TCPListener.Name is actually the name of a specific TCPRoute or TLSRoute.
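To make the failure mode concrete, here is a small self-contained sketch. It is not the translator code: the `route` type and both naming functions are hypothetical, and the port-based name is just one possible way to get a stable name, not a fix the project has adopted.

```go
package main

import "fmt"

// route is a hypothetical stand-in for a TCPRoute or TLSRoute.
type route struct {
	Name string
}

// listenerNameFromFirstRoute mimics the behavior described above:
// the listener takes the name of whichever route is processed first.
func listenerNameFromFirstRoute(routes []route) string {
	if len(routes) == 0 {
		return ""
	}
	return routes[0].Name
}

// listenerNameFromPort is a hypothetical alternative that derives the
// name from the Gateway name and port, independent of route order.
func listenerNameFromPort(gateway string, port int) string {
	return fmt.Sprintf("%s-tcp-%d", gateway, port)
}

func main() {
	before := []route{{Name: "tls-route-a"}, {Name: "tls-route-b"}}
	after := []route{{Name: "tls-route-b"}, {Name: "tls-route-a"}} // same routes, different order

	// The route-derived name changes with processing order, which Envoy
	// sees as one listener removed (and drained) and another added.
	fmt.Println(listenerNameFromFirstRoute(before), "->", listenerNameFromFirstRoute(after))

	// The port-derived name is the same on every reconcile.
	fmt.Println(listenerNameFromPort("eg", 8443), "->", listenerNameFromPort("eg", 8443))
}
```

With the first function the name flips from `tls-route-a` to `tls-route-b` even though nothing in the configuration changed, which matches the draining seen after a controller restart or resync.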
|
Thanks for the clarification. Thanks! |
Do we consider it as a bug? What would be the options to fix this root issue? |
thanks for highlighting this @dboslee !
will raise GH issues for each of these so this GH issue around E2E is not forgotten :) |
|
@arkodg Here is the 'reload' test code main...Ronnie-personal:gateway:e2ereload1503 Thanks, |
hey @Ronnie-personal thanks for continuing to work on this, can you raise this as a PR so others in the community can also review, tia ! |
This issue has been automatically marked as stale because it has not had activity in the last 30 days. |
Description:
Add a test case in our e2e framework https://github.com/envoyproxy/gateway/tree/main/test/e2e where