Add awslogs multiline support #30891
Signed-off-by: Justin Menga email@example.com
- What I did
This PR adds multiline processing to the AWS CloudWatch logs driver, providing functionality similar to the native AWS CloudWatch logs agent.
The feature allows users to optionally specify multiline pattern matching logic using either a regular expression or a Python strftime expression, meaning application events that span multiple lines (e.g. application stack traces) can be published to CloudWatch logs as a single event.
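To illustrate how a strftime-style expression can be reduced to a regular expression for start-of-event matching, here is a minimal Go sketch. It is not the code in this PR: the `directives` table covers only a handful of directives chosen for the example, and the names `directives` and `compileDatetimeFormat` are hypothetical.

```go
package main

import (
	"regexp"
	"strings"
)

// directives maps a small, illustrative subset of strftime directives to
// regular expression fragments. This table is an assumption made for the
// sketch, not the mapping used by the PR.
var directives = map[rune]string{
	'Y': `\d{4}`,       // four-digit year
	'm': `\d{2}`,       // month number
	'd': `\d{2}`,       // day of month
	'H': `\d{2}`,       // hour
	'M': `\d{2}`,       // minute
	'S': `\d{2}`,       // second
	'b': `[A-Za-z]{3}`, // abbreviated month name
}

// compileDatetimeFormat converts a strftime-style format into a regexp that
// matches any line containing a timestamp in that format.
func compileDatetimeFormat(format string) (*regexp.Regexp, error) {
	var sb strings.Builder
	runes := []rune(format)
	for i := 0; i < len(runes); i++ {
		if runes[i] == '%' && i+1 < len(runes) {
			if frag, ok := directives[runes[i+1]]; ok {
				sb.WriteString(frag)
				i++ // skip the directive letter
				continue
			}
		}
		// Anything else, including unknown directives, is matched literally.
		sb.WriteString(regexp.QuoteMeta(string(runes[i])))
	}
	return regexp.Compile(sb.String())
}
```

With this sketch, `compileDatetimeFormat("%Y-%m-%d %H:%M:%S")` yields a pattern that matches a line beginning a new timestamped event but not an indented stack trace line, which is the distinction the multiline logic needs.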
- How I did it
Implemented an event buffer that buffers log messages until a new multiline start pattern is matched. As soon as a new event is detected, the event buffer is appended/flushed to the pre-existing events slice for normal processing using the pre-existing batch publishing mechanism.
The implementation will immediately flush the event buffer if the maximum CloudWatch logs event size is reached, appending the buffer to the events slice up to the maximum event size.
The implementation also buffers only up to the batch publishing frequency (currently 5 seconds). If a multiline event has been buffered for longer, it is flushed to the events slice for normal processing at the next batch processing tick. This ensures a multiline message is not held for long when the application sends no further messages for an extended period (the maximum time a message can be buffered is 2 * batch publishing frequency, or 10 seconds).
- How to verify it
Assuming you have AWS access key ID and secret access key with permissions to a pre-existing CloudWatch logs group called
Next run the examples below...
With the following container stdout:
Two CloudWatch log events will be generated:
With the following container stdout:
Three CloudWatch log events will be generated:
- Description for the changelog
Add AWS CloudWatch Logs multiline processing
- A picture of a cute animal (not mandatory but encouraged)
If we were to take this functionality, I'd much prefer to implement this in a way that works on any log driver as there is nothing driver specific about the implementation.
I also find the date/time format vs "specify any regex" kind of weird from a UX point of view.
Implementation should take into account a misbehaving application that either sends really long messages (too big to hold in a single message) or that just does not send the message delimiter.
I also expect that the regex matching will impose a significant performance cost for logging. Some benchmarks here would be helpful, and notes in the documentation on this.
@cpuguy83 - I would tend to agree with a more generic approach; however, as per #22920, that proposal was rejected and the advice was for individual logging drivers to implement this functionality themselves.
I would suggest there is a reasonably strong argument for this in the AWS CloudWatch logs driver, given the native agent supports multiline processing and the date time format processing I have implemented.
The date/time format reflects the native CloudWatch Logs agent implementation (see
The implementation leverages the existing implementation in terms of dealing with log messages that are too long (CloudWatch logs have published limits and the existing implementation adheres to these limitations), and specifically detects if a buffered message has exceeded these limits and hands off processing to the existing implementation. The implementation also will only buffer for the batch publishing frequency (currently 5 seconds as per the existing implementation), so I think all questions around unusual log event processing are covered.
WRT benchmarking, I agree there may be a performance hit and will benchmark to quantify it. Ultimately this is an opt-in feature, designed so that the existing processing logic is applied unchanged if the multiline feature has not been activated, but it will be useful to provide guidance on when this feature may introduce performance issues.
@cpuguy83 - I have added some basic benchmarks. They simulate pushing 10 multiline logs with 100 lines per multiline log, with a reasonably real-world multiline pattern of
Master branch benchmark (see https://github.com/mixja/docker/tree/awslogs-benchmark):
Pull request benchmark:
So yes, as would be expected, there is a performance hit with multiline pattern matching, but no performance hit with the new code base if you don't enable multiline pattern matching.
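For anyone wanting to reproduce a comparison of this shape, a rough Go benchmark sketch follows. It is not the benchmark from the branch linked above: `startPattern` is a hypothetical pattern picked for illustration, and `buildWorkload`, `countStarts`, and the two benchmark functions are made-up names. It only measures the cost of testing each line against a regular expression versus touching each line without matching.

```go
package main

import (
	"fmt"
	"regexp"
	"testing"
)

// startPattern is a hypothetical multiline start pattern (a leading date).
var startPattern = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}`)

// sink keeps results alive so the loops are not optimized away.
var sink int

// buildWorkload returns events*linesPerEvent lines, where only the first
// line of each event matches startPattern.
func buildWorkload(events, linesPerEvent int) []string {
	lines := make([]string, 0, events*linesPerEvent)
	for e := 0; e < events; e++ {
		lines = append(lines, fmt.Sprintf("2017-05-17 ERROR failure %d", e))
		for l := 1; l < linesPerEvent; l++ {
			lines = append(lines, fmt.Sprintf("    at frame %d", l))
		}
	}
	return lines
}

// countStarts counts the lines that would begin a new multiline event.
func countStarts(lines []string) int {
	n := 0
	for _, line := range lines {
		if startPattern.MatchString(line) {
			n++
		}
	}
	return n
}

// BenchmarkWithPattern tests every line against the start pattern.
func BenchmarkWithPattern(b *testing.B) {
	lines := buildWorkload(10, 100)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sink = countStarts(lines)
	}
}

// BenchmarkWithoutPattern is the baseline: each line is visited but no
// pattern matching is performed.
func BenchmarkWithoutPattern(b *testing.B) {
	lines := buildWorkload(10, 100)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		total := 0
		for _, line := range lines {
			total += len(line)
		}
		sink = total
	}
}
```

Placed in a `_test.go` file, the two functions can be run with `go test -bench .` and their ns/op figures compared; the absolute numbers will depend on the pattern and the machine.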
@vdemeester - can we re-run the Windows tests? It looks like an unrelated failure in the Jenkins job output:
Hey @mixja, I just became aware of this PR (not sure how I missed it for so long) and want to thank you for contributing this! I'll take a look at the code today (since I wrote the original code) and I'll also pass this along internally to the Amazon CloudWatch Logs team to see if they have any comments on it.
I just tested it out, and it works well. Here are my test files in case anyone else wants to try it out.
I created a
May 17, 2017
Thank you @mixja! Given that this adds new features to the AWS logging driver, can you also open a pull request in the documentation repository (probably the