-
Notifications
You must be signed in to change notification settings - Fork 30
Description
For a couple of times now, the AWS CLI has been killed during the production workflow in the "Update CloudFront KeyValueStore redirects" (docs-assembler deploy update-redirects
) step. The cause appears to be the underlying system running out of memory, which is reflected in the error code provided by the CLI (137).
The main suspect so far is getting unlucky with a noisy neighbor - but it's still weird that it never happened and then happens twice in a week, so we can't rule out just reaching memory limits now. AFAIU we consistently exit cleanly from the tooling in earlier steps, but maybe somehow they aren't freed from the runner's perspective...? Our input in this step doesn't seem to be a factor - we had an OOM when running aws cloudfront describe-key-value-store
to acquire an ARN, which is much cheaper than the paginated update that happens later in the process.
How should we tackle this?
- Can we increase the runner's available memory?
- Add a swapfile/partition?
- Add memory consumption logs/metrics?