add option to disable retries & disable boto retries for CI tests #9074
This should speed up exception testing considerably because most [...] It also shows good progress at mitigating flaky tests: 5 of the 9 failing tests are marked by CircleCI as flaky: https://app.circleci.com/pipelines/github/localstack/localstack/17875/workflows/93965461-12f5-4dff-9821-1fabbe7ea8ba/jobs/136694/tests
425c835 to a4fc54b
LGTM!
localstack/aws/connect.py
Outdated
@@ -333,7 +333,7 @@ def _get_client(
     aws_access_key_id=aws_access_key_id,
     aws_secret_access_key=aws_secret_access_key,
     aws_session_token=aws_session_token,
-    config=config,
+    config=config.merge(Config(retries={"max_attempts": 0})),
Would this be reverted? Are we trying to address an issue with the test suite? I would rather set this in the fixtures.
No, this is also intended to be set for internal communication, though we should observe this for a bit. Generally, I think it makes more sense to fail fast if there's an issue instead of retrying countless times.
We should not hide potential issues through retries for internal calls as long as we assume that LocalStack runs on the same machine (i.e., without a potentially problematic network connection).
I've now also added a config option to opt out of this behavior.
To keep this a bit less controversial for now, I've switched the default back to normal retries and explicitly set DISABLE_BOTO_RETRIES in our CI tests. This means you'll need to set it in your local environment if you want to run your tests without retries.
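For illustration, the opt-in described above could be sketched roughly as follows. This is not the code from this PR: the helper name boto_retry_settings and the accepted truthy values ("1", "true") are assumptions. The only fact taken from botocore is that a retries setting of {"max_attempts": 0} means no retries after the initial request.

```python
import os


def boto_retry_settings(environ=None):
    """Return a botocore-style ``retries`` dict, or None to keep the SDK defaults.

    Hypothetical helper: the name and the accepted truthy values are
    assumptions made for this sketch, not LocalStack's actual parsing.
    """
    environ = os.environ if environ is None else environ
    flag = environ.get("DISABLE_BOTO_RETRIES", "").strip().lower()
    if flag in ("1", "true"):
        # With botocore, max_attempts=0 means the initial request is never retried.
        return {"max_attempts": 0}
    # Flag unset or falsy: leave the client's normal retry behavior untouched.
    return None
```

When the helper returns a dict, it would be passed as Config(retries=...) and merged into the client config, as in the diff under review; when it returns None, the config is left as-is.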
LGTM 👍
Did the rebase fix the other flaky tests or did you patch b1a17823afc5cd2a147b5c5d83ab6a02aa660e9b?
    FunctionName=function_name, Qualifier=function_version
)["Status"]
if status == "FAILED":
    raise ShortCircuitWaitException("terminal fail state")
nice pattern we should remember and use more often in tests 👍
FYI: also a better solution than the regular retry from the Lambda invocation loop 👍 (favored this one in rebase)
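A minimal sketch of the short-circuit pattern praised in this thread, assuming a simple polling helper. Both poll_until and the local ShortCircuitWaitException class are stand-ins written for this example; they are not LocalStack's own implementations. The idea is that ordinary errors mean "not ready yet", while the dedicated exception ends the wait immediately once a terminal state is seen.

```python
import time


class ShortCircuitWaitException(Exception):
    """Stand-in exception: raised by a condition when waiting longer is pointless."""


def poll_until(condition, attempts=5, delay=0.01):
    """Poll ``condition`` until it returns True or the attempts run out.

    Hypothetical helper for this sketch. Ordinary exceptions are treated as
    "not ready yet"; ShortCircuitWaitException aborts the loop right away.
    """
    for _ in range(attempts):
        try:
            if condition():
                return True
        except ShortCircuitWaitException:
            # Terminal state reached: stop polling instead of burning attempts.
            return False
        except Exception:
            # Transient problem: fall through and try again after the delay.
            pass
        time.sleep(delay)
    return False


def make_condition(statuses):
    """Build a condition that reads successive statuses from an iterable."""
    it = iter(statuses)

    def condition():
        status = next(it)
        if status == "FAILED":
            raise ShortCircuitWaitException("terminal fail state")
        return status == "SUCCESSFUL"

    return condition
```

With this shape, a test waiting on a resource that has already failed returns after a single poll instead of exhausting every attempt.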
just a normal re-run 🤷‍♂️ but they're gone now on master anyway.
a4fc54b to 550ea7a
Motivation
These uncontrolled boto-client level retries are an inherent source of unpredictable behavior, so let's see if disabling them causes any unexpected issues.
Retries also add unnecessary waiting time when we actually want to test a failure case.
Changes
Adds a config option DISABLE_BOTO_RETRIES. Set DISABLE_BOTO_RETRIES=1 to opt in to the new behavior.
Discussion
Might also want to go through cases where we use SDKs in lambda functions etc. to disable any retries.