Add awslogs multiline support #30891

mixja · 2017-02-10T01:44:13Z

Signed-off-by: Justin Menga justin.menga@gmail.com

- What I did

This PR adds multiline processing to the AWS CloudWatch logs driver, providing functionality similar to the native AWS CloudWatch logs agent.

The feature allows users to optionally specify multiline pattern matching logic using either a regular expression or a Python strftime expression, meaning application events that span multiple lines (e.g. application stack traces) can be published to CloudWatch logs as a single event.

The awslogs-multiline-pattern and awslogs-datetime-format log options have been added:

awslogs-datetime-format - defines a multiline start pattern in Python strftime format. This option always takes precedence if both awslogs-datetime-format and awslogs-multiline-pattern are configured.
awslogs-multiline-pattern - defines a multiline start pattern using a regular expression. This option is ignored if awslogs-datetime-format is also configured.

- How I did it

Implemented an event buffer that buffers log messages until a new multiline start pattern is matched. As soon as a new event is detected, the event buffer is appended/flushed to the pre-existing events slice for normal processing using the pre-existing batch publishing mechanism.

The implementation will immediately flush the event buffer if the maximum CloudWatch logs event size is reached, appending the buffer to the events slice up to the maximum event size.

The implementation also will only buffer up to the batch publishing frequency (currently 5 seconds). If a multiline event has been buffered for longer, it will be flushed to the events slice for normal processing at the next batch processing ticker. This ensures a multiline message will not be buffered for a long period of time in the scenario where no further messages are sent by the application for an extended period of time (the maximum amount of time a message can be buffered is 2 * batch publishing frequency or 10 seconds)

- How to verify it

Assuming you have AWS access key ID and secret access key with permissions to a pre-existing CloudWatch logs group called test, in the Docker development container first start up the Docker daemon with AWS permissions:

# Make sure we have accurate time
apt-get install ntp -y
service ntp start

# Compile and install Docker daemon
hack/make.sh binary install-binary 

# Replace with your AWS settings - must have permissions to write to the CloudWatch logs group
export AWS_REGION=us-west-2
export AWS_ACCESS_KEY_ID=xxxxx
export AWS_SECRET_ACCESS_KEY=xxxxx

# Start Docker daemon with AWS permissions
dockerd -D&

Next run the examples below...

Example using awslogs-multline-pattern

docker run -it --rm --log-driver=awslogs --log-opt awslogs-group=test --log-opt awslogs-multiline-pattern='^ABCD' my-container

With the following container stdout:

ABCD First event
Some more lines
Another line
ABCD Second event

Two CloudWatch log events will be generated:

Event 1

ABCD First event
Some more lines
Another line

Event 2

ABCD Second event

Example using awslogs-datetime-format

docker run -it --rm --log-driver=awslogs --log-opt awslogs-group=test --log-opt awslogs-datetime-format='%Y-%m-%d' my-container

With the following container stdout:

[2017-04-01 12:00:00] First event
Some more lines
Another line
[2017-04-01 12:00:05] Second event 
[2017-04-01 12:00:06] Third event

Three CloudWatch log events will be generated:

Event 1

[2017-04-01 12:00:00] First event
Some more lines
Another line

Event 2

[2017-04-01 12:00:05] Second event

Event 3

[2017-04-01 12:00:06] Third event

- Description for the changelog

Add AWS CloudWatch Logs multiline processing

- A picture of a cute animal (not mandatory but encouraged)

cpuguy83 · 2017-02-14T19:22:31Z

If we were to take this functionality, I'd much prefer to implement this in a way that works on any log driver as there is nothing driver specific about the implementation.

I also find the date/time format vs "specify any regex" kind of weird from a UX point of view.
This should probably error out if both are specified.
Event better to implement the regex and do time separately as it really is different.

Implementation should take into account a misbehaving application that either sends really long messages (too big to hold in a single message) or that just does not send the message delimiter.

I also expect that the regex matching will impose a significant performance cost for logging. Some benchmarks here would be helpful, and notes in the documentation on this.

mixja · 2017-02-14T21:27:19Z

@cpuguy83 - I would tend to agree with a more generic approach, however as per #22920 this proposal was rejected and advice was for individual logging drivers to implement this functionality themselves.

I would suggest there is a reasonably strong argument for this in the AWS CloudWatch logs driver, given the native agent supports multiline processing and the date time format processing I have implemented.

The date/time format reflects the native CloudWatch Logs agent implementation (see datetime_format and multi_line_start_pattern parameters) - and IMHO makes it easier to express a date time pattern in strftime format from an end-users perspective. The implementation compiles this one-time to a regex anyway at container startup so there is no material difference at an implementation level.

The implementation leverages the existing implementation in terms of dealing with log messages that are too long (CloudWatch logs have published limits and the existing implementation adheres to these limitations), and specifically detects if a buffered message has exceeded these limits and hands off processing to the existing implementation. The implementation also will only buffer for the batch publishing frequency (currently 5 seconds as per the existing implementation), so I think all questions around unusual log event processing are covered.

WRT benchmarking, agree there may be a performance hit and will benchmark to quantify this. Ultimately this is an opt-in feature and has been designed such that the same existing processing logic is applied if the multiline feature has not been activated, but will be useful to provide guidance on when this feature may introduce performance issues.

danieljimenez · 2017-02-15T19:58:43Z

👏 thanks for doing this! It's a huge addition for anyone running Java applications in Docker with Amazon CloudWatch Logs. I look forward to seeing it upstream.

mixja · 2017-02-22T11:55:19Z

@cpuguy83 - I have added some basic benchmarks, they just simulate pushing 10 multiline logs with 100 lines per multiline log, with a reasonably real-world multiline pattern of \d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[1,2][0-9]|3[0,1]) (?:[0,1][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]

Master branch benchmark (see https://github.com/mixja/docker/tree/awslogs-benchmark):

BenchmarkCollectBatch-4            1000        1330899 ns/op

Pull request benchmark:

BenchmarkCollectBatch-4                            1000        1322489 ns/op
BenchmarkCollectBatchMultilinePattern-4            1000        2178240 ns/op

So yes as would be expected there is a performance hit with multiline pattern matching, but no performance hit with the new code base if you don't enable multiline pattern matching.

vdemeester · 2017-03-27T19:45:37Z

ping @cpuguy83 @thaJeztah @tonistiigi @icecrime @tiborvass what is the status for this one ?

tiborvass · 2017-03-28T00:42:58Z

+1 for the design. I agree with phemmer in his comment about #22920 (comment) so I wouldn't want this to be generalized in docker, but in the awslogs driver it makes sense.

cpuguy83 · 2017-03-28T01:08:29Z

Design OK, moving to code review.

vdemeester · 2017-04-03T14:18:05Z

@mixja compilation errors

00:24:41 # github.com/docker/docker/daemon/logger/awslogs
00:24:41 daemon/logger/awslogs/cloudwatchlogs.go:146: not enough arguments to return

mixja · 2017-04-03T18:51:44Z

@tiborvass - looks like your commit broke the build?

tiborvass · 2017-04-03T20:20:26Z

@mixja thanks, fixing.

mixja · 2017-04-07T20:33:34Z

@vdemeester - can we re-run the windows tests - looks like an unrelated failure in the Jenkins job output:

Stderr:   Error response from daemon: Cannot stop container d51d40c8f721e8373cd4a8b86c00134d3b17ffededcefc21ea02f7fdc58f536e: Cannot kill container d51d40c8f721e8373cd4a8b86c00134d3b17ffededcefc21ea02f7fdc58f536e: invalid container: d51d40c8f721e8373cd4a8b86c00134d3b17ffededcefc21ea02f7fdc58f536e

kencochrane · 2017-05-16T14:54:52Z

I just tested it out, and it works well, here are my test files incase anyone else wants to try it out.

Dockerfile

FROM alpine:latest
ADD test*.sh /
RUN chmod 755 /test*.sh

test1.sh

#!/bin/ash
echo "ABCD First event"
echo "Some more lines"
echo "Another line"
echo "ABCD Second event"

test2.sh

#! /bin/ash
echo "[2017-04-01 12:00:00] First event"
echo "Some more lines"
echo "Another line"
echo "[2017-04-01 12:00:05] Second event"
echo "[2017-04-01 12:00:06] Third event"

docker run -t --rm --log-driver=awslogs --log-opt awslogs-group=test --log-opt awslogs-multiline-pattern='^ABCD' awslogtest:latest /test1.sh

docker run -it --rm --log-driver=awslogs --log-opt awslogs-group=test --log-opt awslogs-datetime-format='%Y-%m-%d' awslogtest:latest /test2.sh

I created a test log group in cloudformation, and made sure my AWS creds were exported, like it is described above.

mixja · 2017-05-17T07:01:49Z

@crosbymichael - yes I have tested both.

Attached is a screen shot of CloudWatch Logs console showing Java stack traces with the multiline support enabled.

vdemeester

LGTM 🐯
/cc @cpuguy83 @tiborvass

cpuguy83

LGTM

cpuguy83 · 2017-05-17T13:25:33Z

daemon/logger/awslogs/cloudwatchlogs.go

+// batch-calculations.
+func (l *logStream) processEvent(events []wrappedEvent, unprocessedLine []byte, timestamp int64) []wrappedEvent {
+	bytes := 0
+	for len(unprocessedLine) > 0 {


Just a nit on the style here, we could flip this and return early to reduce indentation.

cpuguy83 · 2017-05-17T13:25:58Z

daemon/logger/awslogs/cloudwatchlogs.go

+		}
+		line := unprocessedLine[:lineBytes]
+		unprocessedLine = unprocessedLine[lineBytes:]
+		if (len(events) >= maximumLogEventsPerPut) || (bytes+lineBytes+perEventBytes > maximumBytesPerPut) {


parens are not needed here

thaJeztah · 2017-05-17T13:33:53Z

Thank you @mixja! Given that this adds new features to the AWS logging driver, can you also open a pull request in the documentation repository (probably the vnext-engine branch, because this will be in the upcoming release) to document the new feature? https://github.com/docker/docker.github.io/blob/vnext-engine/engine/admin/logging/awslogs.md

kensaggy · 2017-05-18T07:29:38Z

Sorry for jumping in, but when can we expect this to be released? (we've been looking forward to this feature for a long time!)

thaJeztah · 2017-05-18T07:44:55Z

See the milestone; it's gonna be part of the upcoming 17.06 release, targeted start of June (and release candidate 1 later this week for testing)

mixja · 2017-05-18T08:59:15Z

@thaJeztah - done docker/docs#3319

thaJeztah · 2017-05-18T09:19:16Z

@mixja thanks!

kensaggy · 2017-05-19T07:02:42Z

Thanks @thaJeztah :)

This adds bash completion for moby/moby#30891. Signed-off-by: Harald Albers <github@albersweb.de>

This adds bash completion for moby/moby#30891. Signed-off-by: Harald Albers <github@albersweb.de> (cherry picked from commit 1d21a3dd7c4ffff80c093c005b910a131e7fc05a) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

This adds bash completion for moby/moby#30891. Signed-off-by: Harald Albers <github@albersweb.de>

This adds bash completion for moby/moby#30891. Signed-off-by: Harald Albers <github@albersweb.de> (cherry picked from commit 1d21a3dd7c4ffff80c093c005b910a131e7fc05a) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

This adds bash completion for moby/moby#30891. Signed-off-by: Harald Albers <github@albersweb.de> (cherry picked from commit 1d21a3d) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

GordonTheTurtle added the status/0-triage label Feb 10, 2017

mixja force-pushed the awslogs-multiline-support branch from 221fb60 to ba6fa44 Compare February 10, 2017 02:12

GordonTheTurtle added the dco/no Automatically set by a bot when one of the commits lacks proper signature label Feb 12, 2017

mixja force-pushed the awslogs-multiline-support branch from 7b96e58 to a525b3b Compare February 12, 2017 06:14

GordonTheTurtle removed the dco/no Automatically set by a bot when one of the commits lacks proper signature label Feb 12, 2017

mixja force-pushed the awslogs-multiline-support branch 3 times, most recently from d02b268 to d30da02 Compare February 13, 2017 08:25

vdemeester added status/1-design-review and removed status/0-triage labels Feb 13, 2017

mixja force-pushed the awslogs-multiline-support branch from d30da02 to 2dd776f Compare February 22, 2017 11:42

GordonTheTurtle assigned tiborvass Feb 24, 2017

tiborvass added the status/needs-attention Calls for a collective discussion during a review session label Mar 28, 2017

cpuguy83 added status/2-code-review and removed status/1-design-review status/needs-attention Calls for a collective discussion during a review session labels Mar 28, 2017

tiborvass force-pushed the awslogs-multiline-support branch from a91b345 to 539c4ec Compare April 3, 2017 20:23

vdemeester added the rebuild/windowsRS1 label Apr 10, 2017

GordonTheTurtle removed the rebuild/windowsRS1 label Apr 10, 2017

thaJeztah added rebuild/experimental labels May 17, 2017

GordonTheTurtle removed rebuild/experimental labels May 17, 2017

vdemeester approved these changes May 17, 2017

View reviewed changes

cpuguy83 approved these changes May 17, 2017

View reviewed changes

cpuguy83 merged commit 5034288 into moby:master May 17, 2017

thaJeztah added impact/changelog impact/documentation labels May 17, 2017

thaJeztah added the docs/revisit label May 17, 2017

mixja mentioned this pull request May 18, 2017

Update awslogs.md docker/docs#3319

Merged

albers added a commit to albers/docker-cli that referenced this pull request Jul 3, 2017

Add bash completion for awslogs multiline log driver options

1d21a3d

This adds bash completion for moby/moby#30891. Signed-off-by: Harald Albers <github@albersweb.de>

albers mentioned this pull request Jul 3, 2017

Add bash completion for awslogs multiline log driver options docker/cli#280

Merged

alshabib pushed a commit to alshabib/cli that referenced this pull request Aug 1, 2017

Add bash completion for awslogs multiline log driver options

a60b10c

This adds bash completion for moby/moby#30891. Signed-off-by: Harald Albers <github@albersweb.de>

matrixik mentioned this pull request Aug 2, 2017

Getting logs from all containers into Logstash monasca/monasca-docker#139

Open

riyazdf pushed a commit to riyazdf/cli that referenced this pull request Aug 2, 2017

Add bash completion for awslogs multiline log driver options

309f445

This adds bash completion for moby/moby#30891. Signed-off-by: Harald Albers <github@albersweb.de>

samuelkarp mentioned this pull request Sep 20, 2017

Make the log lines splitting configurable #34855

Open

adnxn mentioned this pull request Jan 17, 2018

Allow newer awslogs driver options aws/amazon-ecs-agent#872

Closed

thaJeztah mentioned this pull request May 8, 2018

Incorrect %L option in awslogs-datetime-format #36796

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add awslogs multiline support #30891

Add awslogs multiline support #30891

mixja commented Feb 10, 2017 •

edited

Loading

cpuguy83 commented Feb 14, 2017

mixja commented Feb 14, 2017 •

edited

Loading

danieljimenez commented Feb 15, 2017 •

edited

Loading

mixja commented Feb 22, 2017

vdemeester commented Mar 27, 2017

tiborvass commented Mar 28, 2017

cpuguy83 commented Mar 28, 2017

vdemeester commented Apr 3, 2017

mixja commented Apr 3, 2017

tiborvass commented Apr 3, 2017

mixja commented Apr 7, 2017

kencochrane commented May 16, 2017

mixja commented May 17, 2017

vdemeester left a comment

cpuguy83 left a comment

cpuguy83 May 17, 2017

cpuguy83 May 17, 2017

thaJeztah commented May 17, 2017

kensaggy commented May 18, 2017

thaJeztah commented May 18, 2017

mixja commented May 18, 2017

thaJeztah commented May 18, 2017

kensaggy commented May 19, 2017 •

edited

Loading

Add awslogs multiline support #30891

Add awslogs multiline support #30891

Conversation

mixja commented Feb 10, 2017 • edited Loading

cpuguy83 commented Feb 14, 2017

mixja commented Feb 14, 2017 • edited Loading

danieljimenez commented Feb 15, 2017 • edited Loading

mixja commented Feb 22, 2017

vdemeester commented Mar 27, 2017

tiborvass commented Mar 28, 2017

cpuguy83 commented Mar 28, 2017

vdemeester commented Apr 3, 2017

mixja commented Apr 3, 2017

tiborvass commented Apr 3, 2017

mixja commented Apr 7, 2017

kencochrane commented May 16, 2017

mixja commented May 17, 2017

vdemeester left a comment

Choose a reason for hiding this comment

cpuguy83 left a comment

Choose a reason for hiding this comment

cpuguy83 May 17, 2017

Choose a reason for hiding this comment

cpuguy83 May 17, 2017

Choose a reason for hiding this comment

thaJeztah commented May 17, 2017

kensaggy commented May 18, 2017

thaJeztah commented May 18, 2017

mixja commented May 18, 2017

thaJeztah commented May 18, 2017

kensaggy commented May 19, 2017 • edited Loading

mixja commented Feb 10, 2017 •

edited

Loading

mixja commented Feb 14, 2017 •

edited

Loading

danieljimenez commented Feb 15, 2017 •

edited

Loading

kensaggy commented May 19, 2017 •

edited

Loading