It is better to roll forward with a fix than to roll back. If a rollback is still required, follow the steps in this guide.
This repo contains an interactive script which can be used to roll back a corrupt config or container version for the Network Access Control service.
If Grafana has alerted on a disaster scenario, find the relevant section and follow the steps provided.
Identify the corrupt configuration file. This can be either clients.conf or authorised_macs.
aws-vault exec ENV_PROFILE -- make restore-config
- At the prompt, enter the environment name (development/pre-production/production)
- At the second prompt, enter the name of the S3 key to restore
- The script lists the last five published configs with their VersionId and LastModified
- Copy the VersionId of the config you wish to restore
- At the final prompt, paste the VersionId
- The terminal will exit with the following message:
Successfully rolled back 'authorised_macs'/'clients.conf' to version: VersionId
- Kick off a release in the server pipeline to force a rolling deploy, so that the servers pull in the reverted file.
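
For reference, the listing and restore steps above can be approximated directly with the AWS CLI. This is a minimal sketch under assumptions, not a description of what `make restore-config` does internally: the function names are hypothetical, the bucket name is a placeholder that does not come from this repo, and it assumes versioning is enabled on the bucket.

```shell
#!/usr/bin/env bash
# Sketch only: bucket and key are placeholders supplied by the caller.

# Print the five most recent versions of one key with VersionId and LastModified.
# S3 returns versions newest first, so the slice keeps the latest five.
list_config_versions() {
  local bucket="$1" key="$2"
  aws s3api list-object-versions \
    --bucket "$bucket" \
    --prefix "$key" \
    --query "Versions[?Key=='${key}'] | [:5].{VersionId: VersionId, LastModified: LastModified}" \
    --output json
}

# Make an older version current again by copying it over the same key.
# The copy becomes the newest version; the older versions are left in place.
restore_config_version() {
  local bucket="$1" key="$2" version_id="$3"
  aws s3api copy-object \
    --bucket "$bucket" \
    --key "$key" \
    --copy-source "${bucket}/${key}?versionId=${version_id}"
}
```

With credentials available (for example in a shell started through aws-vault), a call might look like `restore_config_version BUCKET clients.conf VERSION_ID`, where BUCKET and VERSION_ID are filled in by the operator.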
The latest ECR image is tagged with 'latest'. Untagged images are kept for 14 days and then deleted by a lifecycle policy, which keeps storage costs down. It is assumed that an image that has been live for 14 days can be considered stable.
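
To confirm the retention rule on a given repository, the attached lifecycle policy can be printed with the AWS CLI. The sketch below is illustrative only: the function names are hypothetical, the repository name is a placeholder, and the example rule shows the general ECR lifecycle policy syntax for a 14-day untagged expiry rather than the exact policy deployed for this service.

```shell
#!/usr/bin/env bash
# Sketch only: the repository name is a placeholder supplied by the caller.

# Print the lifecycle policy currently attached to an ECR repository.
show_lifecycle_policy() {
  local repo="$1"
  aws ecr get-lifecycle-policy \
    --repository-name "$repo" \
    --query 'lifecyclePolicyText' \
    --output text
}

# Print an example 14-day expiry rule for untagged images in ECR lifecycle
# policy syntax, for comparison with the output of show_lifecycle_policy.
example_untagged_rule() {
  cat <<'JSON'
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Expire untagged images 14 days after push",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 14
      },
      "action": { "type": "expire" }
    }
  ]
}
JSON
}
```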
If a previous image needs to be restored within this 14-day period, follow the steps below:
aws-vault exec mojo-{ENVIRONMENT}-cli -- make restore-service-container
- At the prompt, enter the environment name (development/pre-production/production)
- At the second prompt, enter the corrupt service name (dns/dhcp)
- The script lists the last five pushed containers with their imageDigest and imagePushedAt
- Copy the imageDigest of the container you wish to re-tag as latest
- At the final prompt, paste the imageDigest
- The terminal will exit with the following message:
Successfully re-tagged image: imageDigest as latest
- A rolling deploy must then be done manually: stop each container in turn and wait for its replacement to be accepted into service before stopping the next.
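
The listing and re-tagging steps above can also be sketched with the AWS CLI, using the documented approach of fetching an image's manifest and putting it back under a new tag. The function names here are hypothetical and the repository name is a placeholder; whether `make restore-service-container` is implemented this way is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: the repository name and digest are supplied by the caller.

# Show the five most recently pushed images with imageDigest and imagePushedAt.
list_recent_images() {
  local repo="$1"
  aws ecr describe-images \
    --repository-name "$repo" \
    --query 'sort_by(imageDetails, &imagePushedAt)[-5:].{imageDigest: imageDigest, imagePushedAt: imagePushedAt}' \
    --output json
}

# Point the 'latest' tag at an existing image by re-putting its manifest.
retag_as_latest() {
  local repo="$1" digest="$2" manifest
  # Fetch the manifest of the image identified by its digest.
  manifest="$(aws ecr batch-get-image \
    --repository-name "$repo" \
    --image-ids "imageDigest=${digest}" \
    --query 'images[0].imageManifest' \
    --output text)" || return 1
  # Putting the same manifest with a tag moves that tag to this image.
  aws ecr put-image \
    --repository-name "$repo" \
    --image-tag latest \
    --image-manifest "$manifest"
}
```

A call might look like `retag_as_latest REPOSITORY sha256:...`, with the digest copied from the output of `list_recent_images`. Re-tagging alone does not restart anything, so the manual rolling deploy described above is still needed.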