This experiment reproduces an issue with writing logs to AWS CloudWatch from an AWS Lambda function written in Go.
In a small number of cases, the last log written by the lambda is not saved to CloudWatch. In all observed occurrences of the issue, the request that exhibits the faulty behavior is the last one written to the CloudWatch stream.
The lambda is simple: it writes a log entry in JSON format that contains the AWS request ID and the message begin.
It then sleeps for 10 to 600 milliseconds.
After that, it writes a second log entry in JSON format that contains the AWS request ID and the message end.
Sometimes the second log entry (the one with the message end) is not written to CloudWatch.
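The behavior described above can be sketched with the standard library alone. This is a minimal illustration, not the contents of lambda.go: the JSON field names (requestId, message), the helper names (logLine, randomSleep, handle), and the placeholder request ID are all assumptions, and the wiring to the Lambda runtime is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"math/rand"
	"os"
	"time"
)

// logEntry is an assumed shape for one JSON log line; the real field
// names in lambda.go may differ.
type logEntry struct {
	RequestID string `json:"requestId"`
	Message   string `json:"message"`
}

// logLine renders one log entry as a single JSON line (without newline).
func logLine(requestID, message string) (string, error) {
	b, err := json.Marshal(logEntry{RequestID: requestID, Message: message})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// randomSleep returns a duration between 10 and 600 milliseconds inclusive.
func randomSleep() time.Duration {
	return time.Duration(10+rand.Intn(591)) * time.Millisecond
}

// handle writes the "begin" line, sleeps, then writes the "end" line.
// The "end" line is the one that occasionally goes missing in CloudWatch.
func handle(w io.Writer, requestID string) error {
	begin, err := logLine(requestID, "begin")
	if err != nil {
		return err
	}
	if _, err := fmt.Fprintln(w, begin); err != nil {
		return err
	}
	time.Sleep(randomSleep())
	end, err := logLine(requestID, "end")
	if err != nil {
		return err
	}
	_, err = fmt.Fprintln(w, end)
	return err
}

func main() {
	// In a real Lambda the request ID would come from the invocation
	// context; a fixed placeholder is used here.
	if err := handle(os.Stdout, "example-request-id"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Running the sketch locally prints one begin line and one end line for the placeholder request ID, which is the pair the analysis step expects to find for every request.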
We offer the following data as proof:
- 2019-12-09:
- Analysis records, which list the number of request logs analyzed and the requests that have a missing pair.
- Complete CloudWatch streams for bad occurrences:
- Request c76d6233-7b64-489a-bc28-7f26a09446e6 stream c7116d03d8cd480e9139328dfe28a7ee
- Request 83445a2c-f381-4f38-9091-c16cbf1eda33 stream a68cf9e178884125b5b38b5ee81b0f14
- Request 728cae3b-6b12-41f9-a6eb-ed99b92d50b1 stream 01d9127a46934a5f806f4403fda64ffc
- Request e075102a-f773-484f-9ed3-f05026080b59 stream 9f192e37e3a14aed966ed4a5e1cc7288
- Request 06058d99-3be3-4a1e-afda-e2c466050a75 stream 0385626bb44c4e3a8558a5ad5f4dff64
- Complete CloudWatch log group
- 2019-12-20:
- Analysis records, which list the number of request logs analyzed and the requests that have a missing pair.
- Complete CloudWatch streams for bad occurrences:
- Complete CloudWatch log group
Please install Go and Terraform before proceeding.
Run make install to compile the lambda.go binary and provision the infrastructure.
Run make run to run the experiment. The experiment invokes the lambda with invocation type Event 600 times in 600 concurrent executions; each of the 600 concurrent executions waits 5 to 6 seconds between requests.
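The load pattern of the run step can be sketched as a set of concurrent workers that each call the lambda repeatedly with a random pause in between. This is a hedged illustration, not the experiment's actual runner: runExperiment and invokeFunc are hypothetical names, the invocation is a stub rather than a real AWS call, and because the text does not spell out whether 600 invocations are made per worker or in total, both the worker count and the per-worker count are left as parameters.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"sync/atomic"
	"time"
)

// invokeFunc stands in for one asynchronous (invocation type Event)
// Lambda call. It is a hypothetical hook, not an AWS SDK type.
type invokeFunc func(worker, request int) error

// runExperiment starts `workers` goroutines. Each one calls invoke
// `perWorker` times and sleeps a random duration in [minWait, maxWait]
// between consecutive calls. It returns how many calls were attempted
// and how many returned an error.
func runExperiment(workers, perWorker int, minWait, maxWait time.Duration, invoke invokeFunc) (sent, failed int64) {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(worker int) {
			defer wg.Done()
			for r := 0; r < perWorker; r++ {
				if r > 0 {
					jitter := time.Duration(rand.Int63n(int64(maxWait-minWait) + 1))
					time.Sleep(minWait + jitter)
				}
				atomic.AddInt64(&sent, 1)
				if err := invoke(worker, r); err != nil {
					atomic.AddInt64(&failed, 1)
				}
			}
		}(w)
	}
	wg.Wait()
	return sent, failed
}

func main() {
	// Stub that only prints; a real run would call the Lambda Invoke API here.
	stub := func(worker, request int) error {
		fmt.Printf("worker %d request %d\n", worker, request)
		return nil
	}
	// The experiment uses 600 workers and a 5s to 6s wait; small values
	// keep this demo short.
	sent, failed := runExperiment(3, 2, 10*time.Millisecond, 20*time.Millisecond, stub)
	fmt.Printf("sent=%d failed=%d\n", sent, failed)
}
```

Keeping the invocation behind a function value makes it possible to exercise the scheduling logic locally before pointing it at a deployed function.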
Wait about an hour for any delayed CloudWatch logs to be written (a known problem).
Run make analyze to analyze the logs. The output contains the number of request logs analyzed, how many are missing, and the AWS request IDs of the missing ones.
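The pairing check behind the analysis can be sketched as follows: collect every request ID that logged begin, then report the ones that never logged end. This is an assumption-laden illustration rather than the actual make analyze implementation; the field names (requestId, message), the function name findMissingEnd, and the sample request IDs are hypothetical, and the input is a plain slice of log lines instead of data fetched from CloudWatch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// logEntry is an assumed shape for one JSON log line.
type logEntry struct {
	RequestID string `json:"requestId"`
	Message   string `json:"message"`
}

// findMissingEnd returns the number of requests that logged "begin" and
// the sorted IDs of those that never logged "end". Lines that are not
// valid JSON (for example the START, END and REPORT lines that the Lambda
// platform adds to the stream) are skipped.
func findMissingEnd(lines []string) (analyzed int, missing []string) {
	began := map[string]bool{}
	ended := map[string]bool{}
	for _, line := range lines {
		var e logEntry
		if err := json.Unmarshal([]byte(line), &e); err != nil || e.RequestID == "" {
			continue
		}
		switch e.Message {
		case "begin":
			began[e.RequestID] = true
		case "end":
			ended[e.RequestID] = true
		}
	}
	for id := range began {
		if !ended[id] {
			missing = append(missing, id)
		}
	}
	sort.Strings(missing)
	return len(began), missing
}

func main() {
	// Sample lines with made-up request IDs.
	lines := []string{
		`{"requestId":"req-1","message":"begin"}`,
		`{"requestId":"req-1","message":"end"}`,
		`{"requestId":"req-2","message":"begin"}`,
	}
	analyzed, missing := findMissingEnd(lines)
	fmt.Printf("analyzed %d requests, %d missing an end log\n", analyzed, len(missing))
	for _, id := range missing {
		fmt.Println(id)
	}
}
```

Sorting the missing IDs keeps the report stable between runs, since Go map iteration order is not deterministic.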
Use the CloudWatch console or the AWS CLI to analyze the issue.
To clean up, run make uninstall to tear down the provisioned infrastructure. Note that this also deletes the CloudWatch log group with the data.