Bug - Error: Failed to persist state to backend on modernisation-platform #5859
Comments
The error to be fixed: "Error: Failed to persist state to backend" (as in the issue title).
The fix has been temporarily deployed (see the Slack thread) to the scheduled baseline pipeline only, and it has already worked for the happy path and for a failure on apply. It still needs evidence of an errored-state failure on apply followed by a successful state push. This will take some time, but if no pipeline fails due to an errored state for a week or two, that is probably a good enough test. Leaving this issue open until there is more evidence, after which the fix can be rolled out to all other pipelines. Putting it into the blocked column (or feel free to put it back into the backlog, if easier).
A short Slack conversation with our AWS TAM, where we were given some guidance and linked to the S3 performance design considerations whitepaper, is here: https://mojdt.slack.com/archives/C015UBQ78MR/p1702984257007459
The linked AWS documentation: https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html
Thanks David, but according to the doc we are not hitting the limit in a single workflow run. FYI, the implemented solution does not address the limits; it pushes the errored state instead.
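For readers unfamiliar with the "push the errored state" approach, here is a minimal sketch of the idea, not the workflow code actually deployed. It relies on documented Terraform behaviour: when Terraform cannot persist state to the backend, it writes `errored.tfstate` to the working directory, and `terraform state push errored.tfstate` uploads that file. The function name, the returned outcome labels and the injectable `run` parameter are hypothetical, introduced only for illustration.

```python
import subprocess
from pathlib import Path

# File Terraform writes to the working directory when it fails to persist state.
ERRORED_STATE = "errored.tfstate"


def apply_with_state_recovery(workdir, run=subprocess.run):
    """Run `terraform apply`; if it fails and left an errored state file, push it.

    Returns one of four hypothetical outcome labels:
    "applied", "apply_failed", "state_pushed", "state_push_failed".
    `run` is injectable so the logic can be exercised without Terraform installed.
    """
    workdir = Path(workdir)
    apply = run(
        ["terraform", "apply", "-auto-approve", "-input=false"],
        cwd=workdir,
    )
    if apply.returncode == 0:
        return "applied"  # happy path

    errored = workdir / ERRORED_STATE
    if not errored.exists():
        # Apply failed for a reason unrelated to state persistence.
        return "apply_failed"

    # Apply failed and Terraform saved the state locally: retry the upload.
    # Assumption: the file comes from this run, not a stale earlier one.
    push = run(["terraform", "state", "push", ERRORED_STATE], cwd=workdir)
    return "state_pushed" if push.returncode == 0 else "state_push_failed"
```

The four outcomes correspond to the cases the fix needs evidence for: the happy path, a plain failure on apply, and an errored-state failure followed by a successful (or unsuccessful) state push. A caller could, for example, alert only on "apply_failed" and "state_push_failed".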
It appears that the error may be a Terraform bug: we see the same issue in v1.6.6. Terraform trace for more insights:
Note: the S3 bucket and Terraform backend values were further redacted with ***. Additionally, CloudTrail does not show any errors for the above HTTP request:
which is a good indication that the problem lies with Terraform (there is no issue in CloudTrail and the state was actually saved in this instance, yet Terraform still fails).
The state push fix for the state persistence failure is now rolled out to the scheduled baseline workflow, with Slack alerts temporarily suppressed when the state push is successful.
To roll out the fix to other repos/workflows, see this issue: #6038
Expected Behavior
The state should save without issues.
Actual Behavior
We get the above error; see https://github.com/ministryofjustice/modernisation-platform/actions/runs/7285740796/job/19855252914#step:7:113 for an example.
Steps to Reproduce the Problem
Run a full release that amends everything, e.g. adding a role access change. The number of failures is not consistent from run to run, but the error has been happening more often recently.
Version
Example is the run for PR #5840
Modules
modernisation-platform
Account
No response