daemon/logger: read the length header correctly #43043

Merged (2 commits) on Dec 23, 2021

Conversation

@kzys (Member) commented Nov 22, 2021

Before this change, if Decode() couldn't read a log record fully,
the subsequent invocation of Decode() would read the record's non-header part
as a header and cause a huge heap allocation.

This change prevents such a case by keeping an intermediate buffer in
the decoder struct.

Fixes #42125.

Signed-off-by: Kazuyoshi Kato katokazu@amazon.com

- What I did

Changed the decoder struct to fix #42125.

- How I did it

Changed the decoder struct to keep read-but-unused bytes (see the sketch at the end of this description).

- How to verify it

Run docker logs -f $(docker run --rm -d --log-driver local alpine cat /dev/urandom) > /dev/null. Without this change, it kills dockerd after a few minutes.

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

Havam

(from https://www.zooborns.com/zooborns/2021/05/two-very-different-babies-emerge-at-woodland-park-zoo.html)
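
A minimal sketch of the approach, using illustrative names (decoder, readRecord, headerLen and so on are loose stand-ins here, not the exact code in this PR): the decoder owns one buffer plus a little bookkeeping, so a short read leaves its state behind instead of letting the next Decode() misread payload bytes as a length header.

package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

const headerLen = 4 // big-endian uint32 length prefix

type decoder struct {
	rdr    io.Reader
	buf    []byte
	msgLen int // length of the record currently being read; 0 means "expect a header next"
	offset int // bytes of the current header/record already read into buf
}

// readRecord reads exactly size bytes into d.buf[:size], resuming from
// d.offset if a previous call stopped partway through.
func (d *decoder) readRecord(size int) error {
	n, err := io.ReadFull(d.rdr, d.buf[d.offset:size])
	d.offset += n
	if err != nil {
		return err // offset survives, so a later call continues where this one stopped
	}
	d.offset = 0
	return nil
}

// decode returns the payload of the next length-prefixed record.
func (d *decoder) decode() ([]byte, error) {
	if d.msgLen == 0 { // not mid-record: read a fresh length header
		if err := d.readRecord(headerLen); err != nil {
			return nil, err
		}
		d.msgLen = int(binary.BigEndian.Uint32(d.buf[:headerLen]))
		if d.msgLen > len(d.buf) {
			d.buf = make([]byte, d.msgLen)
		}
	}
	if err := d.readRecord(d.msgLen); err != nil {
		return nil, err // msgLen and offset are kept for the next attempt
	}
	n := d.msgLen
	d.msgLen = 0
	return d.buf[:n], nil
}

func main() {
	payload := []byte("hello")
	frame := make([]byte, headerLen+len(payload))
	binary.BigEndian.PutUint32(frame, uint32(len(payload)))
	copy(frame[headerLen:], payload)

	d := &decoder{rdr: bytes.NewReader(frame), buf: make([]byte, 64)}
	msg, err := d.decode()
	fmt.Println(string(msg), err) // hello <nil>
}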

@kzys (Member Author) commented Nov 22, 2021

A few tests are failing on Jenkins' Windows builder. Not really sure whether they are related to my change.

[2021-11-22T17:57:41.327Z] === Failed
[2021-11-22T17:57:41.327Z] === FAIL: github.com/docker/docker/pkg/discovery/file Test/TestWatch (0.01s)
[2021-11-22T17:57:41.327Z]     file_test.go:112: assertion failed: expression is false: <-errCh == nil
[2021-11-22T17:57:41.327Z]     --- FAIL: Test/TestWatch (0.01s)
[2021-11-22T17:57:41.327Z] 
[2021-11-22T17:57:41.327Z] === FAIL: github.com/docker/docker/pkg/discovery/file Test (0.01s)
[2021-11-22T17:57:41.327Z] 
[2021-11-22T17:57:41.327Z] DONE 2075 tests, 70 skipped, 2 failures in 298.601s

Comment on lines 148 to 163
 if len(d.buf) < msgLen+encodeBinaryLen {
 	d.buf = make([]byte, msgLen+encodeBinaryLen)
 } else {
-	d.buf = d.buf[:msgLen+encodeBinaryLen]
+	if msgLen <= initialBufSize {
+		d.buf = d.buf[:initialBufSize]
+	} else {
+		d.buf = d.buf[:msgLen+encodeBinaryLen]
+	}
 }
Member

This buffer-resizing logic could move up into readRecord. We could then additionally make the code that reads d.buf explicitly read d.buf[:size] rather than the entire buffer, meaning we don't have to reslice d.buf to make it smaller for small records and then re-grow it later (e.g. we can keep a single buffer and reuse it for the life of the decoder, only growing it when necessary). (The downside is that a single large message means we grow the buffer up to that maximum size, may never need it that big again, and prevent Go from garbage-collecting the unused portion.)

@cpuguy83 What do you think about that?
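
A small sketch of that suggestion (not the code that ended up being merged): keep one backing buffer for the life of the decoder, grow it only when a record doesn't fit, and always work on buf[:size] so nothing needs to be shrunk and re-grown.

// growBuf returns a slice of at least size bytes, reusing buf's backing array
// whenever it is already big enough. The downside noted above applies: one
// very large record keeps the buffer large for the life of the decoder.
func growBuf(buf []byte, size int) []byte {
	if size > cap(buf) {
		return make([]byte, size)
	}
	return buf[:size]
}

Callers would then read into and decode from d.buf[:size] only, rather than the whole slice.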

Member Author

Does a log record have an upper limit on its size? I'm a bit worried about the downside you've mentioned.

}

msg := protoToMessage(proto)
if msg.PLogMetaData == nil {
	msg.Line = append(msg.Line, '\n')
}

d.msgLen = 0
@apollo13 commented Nov 23, 2021

Would it make more sense to move this above the Unmarshal call? After all, the data has now been read properly, and if Unmarshal were to throw an error, the next Decode would try to decode the read message again and would keep throwing an error that unmarshalling fails.
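
Schematically, with the reset moved up (recordBytes and entry are placeholder names, not identifiers from the actual diff):

d.msgLen = 0 // the record has been fully read off the reader; reset before unmarshalling
if err := proto.Unmarshal(recordBytes, entry); err != nil {
	// the next Decode() now starts from a fresh header instead of
	// re-parsing the same stale record again and again
	return nil, err
}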

Member Author

Good point. Will do.

@kzys force-pushed the decode-3 branch 2 times, most recently from 795db5d to e24ea26, on November 23, 2021 at 17:26
@thaJeztah (Member)

Looks like there's a compile failure:


[2021-11-23T17:35:48.032Z] _=/usr/bin/env
[2021-11-23T17:35:48.032Z] Building test suite binary /go/src/github.com/docker/docker/integration/build/test.main
[2021-11-23T17:37:24.547Z] # github.com/docker/docker/daemon/logger/local
[2021-11-23T17:37:24.547Z] daemon/logger/local/read.go:204:12: undefined: proto
[2021-11-23T17:37:24.547Z] # github.com/docker/docker/daemon/logger/local
[2021-11-23T17:37:24.547Z] daemon/logger/local/read.go:204:12: undefined: proto
[2021-11-23T17:37:56.698Z] Building test suite binary /go/src/github.com/docker/docker/integration/config/test.main
[2021-11-23T17:37:56.959Z] Building test suite binary /go/src/github.com/docker/docker/integration/container/test.main
[2021-11-23T17:38:11.866Z] # github.com/docker/docker/daemon/logger/local
[2021-11-23T17:38:11.866Z] daemon/logger/local/read.go:204:12: undefined: proto

@kzys (Member Author) commented Nov 23, 2021

@thaJeztah Oh sorry. It should be fixed in the last revision.

@kzys (Member Author) commented Nov 24, 2021

One Windows build is failing.

[2021-11-23T22:21:29.904Z] === Failed
[2021-11-23T22:21:29.904Z] === FAIL: github.com/docker/docker/daemon/logger/jsonfilelog TestJSONFileLoggerWithOpts (0.04s)
[2021-11-23T22:21:29.904Z]     jsonfilelog_test.go:187: open C:\Users\ContainerAdministrator\AppData\Local\Temp\docker-logger-1992056636\container.log.1: The process cannot access the file because it is being used by another process.
[2021-11-23T22:21:29.904Z] 
[2021-11-23T22:21:29.904Z] DONE 2075 tests, 68 skipped, 1 failure in 251.343s

Others are all green now!

@samuelkarp (Member)

I don't see an open issue for a flake in TestJSONFileLoggerWithOpts so I've kicked off a rerun to see if that fixes it.

@samuelkarp (Member)

It looks like there are failures related to reading logs in the tests:

[2021-11-24T19:29:54.941Z] === Failed
[2021-11-24T19:29:54.941Z] === FAIL: amd64.integration-cli TestDockerSuite/TestLogsAPIUntilFutureFollow (20.72s)
[2021-11-24T19:29:54.941Z]     docker_api_logs_test.go:155: timeout waiting for logs to exit
[2021-11-24T19:29:54.941Z]     --- FAIL: TestDockerSuite/TestLogsAPIUntilFutureFollow (20.72s)
[2021-11-24T19:29:54.941Z] 
[2021-11-24T19:29:54.941Z] === FAIL: amd64.integration-cli TestDockerSuite/TestLogsFollowSlowStdoutConsumer (1.39s)
[2021-11-24T19:29:54.941Z]     docker_cli_logs_test.go:246: assertion failed: 0 (actual int) != 150000 (expected int)
[2021-11-24T19:29:54.941Z]     --- FAIL: TestDockerSuite/TestLogsFollowSlowStdoutConsumer (1.39s)
[2021-11-24T19:29:54.941Z] 
[2021-11-24T19:29:54.941Z] === FAIL: amd64.integration-cli TestDockerSuite/TestLogsSinceFutureFollow (5.45s)
[2021-11-24T19:29:54.941Z]     docker_cli_logs_test.go:204: assertion failed: len(out) is 0: cannot read from empty log
[2021-11-24T19:29:54.941Z]     --- FAIL: TestDockerSuite/TestLogsSinceFutureFollow (5.45s)

@thaJeztah (Member)

I don't see an open issue for a flake in TestJSONFileLoggerWithOpts so I've kicked off a rerun to see if that fixes it.

I know it was flaky before (similar errors), but there were some fixes merged for that (it could be flaky again though); see #36801

@thaJeztah (Member)

(But it looks green now!)

Comment on lines +153 to +173
if d.rdr == rdr {
	return
}
Member Author

It looks like there are failures related to reading logs in the tests:

Before, I simply removed dec.Reset() in logfile.go's fsnotify.Write case, and that broke the tests.

Contributor

Which test did it break?
Maybe there should be two methods? (dec.FileTruncated() and dec.Reset())

Member Author

The ones @samuelkarp mentioned above were broken.

[2021-11-24T19:29:54.941Z] === Failed
[2021-11-24T19:29:54.941Z] === FAIL: amd64.integration-cli TestDockerSuite/TestLogsAPIUntilFutureFollow (20.72s)
[2021-11-24T19:29:54.941Z]     docker_api_logs_test.go:155: timeout waiting for logs to exit
[2021-11-24T19:29:54.941Z]     --- FAIL: TestDockerSuite/TestLogsAPIUntilFutureFollow (20.72s)
[2021-11-24T19:29:54.941Z] 
[2021-11-24T19:29:54.941Z] === FAIL: amd64.integration-cli TestDockerSuite/TestLogsFollowSlowStdoutConsumer (1.39s)
[2021-11-24T19:29:54.941Z]     docker_cli_logs_test.go:246: assertion failed: 0 (actual int) != 150000 (expected int)
[2021-11-24T19:29:54.941Z]     --- FAIL: TestDockerSuite/TestLogsFollowSlowStdoutConsumer (1.39s)
[2021-11-24T19:29:54.941Z] 
[2021-11-24T19:29:54.941Z] === FAIL: amd64.integration-cli TestDockerSuite/TestLogsSinceFutureFollow (5.45s)
[2021-11-24T19:29:54.941Z]     docker_cli_logs_test.go:204: assertion failed: len(out) is 0: cannot read from empty log
[2021-11-24T19:29:54.941Z]     --- FAIL: TestDockerSuite/TestLogsSinceFutureFollow (5.45s)

I prefer to keep Decoder's interface as is. It takes an io.Reader and doesn't have to know the source of the reader, so mentioning files there doesn't match the rest of the interface.
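
For context, the decoder interface being discussed is roughly this shape (reconstructed from the discussion, so treat the exact method set as an assumption):

type Decoder interface {
	// Reset points the decoder at a new reader and drops any partial state.
	Reset(rdr io.Reader)
	// Decode returns the next complete log message from the reader.
	Decode() (*logger.Message, error)
	// Close releases any resources held by the decoder.
	Close()
}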

@thaJeztah added this to the 21.xx milestone on Dec 2, 2021
@kzys (Member Author) commented Dec 8, 2021

(But it looks green now!)

Yes. Removing dec.Reset() was wrong. Now the code should be correct!

@fredericdalleau (Contributor) left a comment

I gave it a try and at first I was not seeing it crash. It turned out that, with enough RAM, the issue may not be reproducible. There is no guarantee here, and I had to reboot after a few tries. You can see in the top output that dockerd is bigger.

PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1406 root      20   0   20.4g   9.5g  38868 S 202.3  61.1   0:16.56 dockerd

With the patch however, the memory is not exceeded:

2321 root      20   0 1864048  79484  39156 R 195.7   0.5   1:12.73 dockerd                                                                                

My understanding is that in some situations io.ReadFull returns (0, io.EOF), resulting in the upper layer calling Decode() again with what follows.
I'd have expected io.ReadFull not to return until the requested amount of data has been read. Maybe the intent of the writer (@cpuguy83?) was that too. If this function is running in its own goroutine, could it block there, just waiting for the next data, instead?

With a retry (if err == io.EOF { continue }), I was not seeing the crash, but I can't tell whether the code is spinning or blocking.
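
For reference, io.ReadFull is documented to return io.EOF only when nothing was read, and io.ErrUnexpectedEOF when it got some but not all of the requested bytes; a follower tailing a file that is still being written hits both. The retry idea above would look roughly like this (rdr, buf, off and size are placeholders), with the caveat that a bare continue can spin rather than block:

for {
	n, err := io.ReadFull(rdr, buf[off:size])
	off += n
	if err == nil {
		break // got the full record
	}
	if err == io.EOF || err == io.ErrUnexpectedEOF {
		continue // more bytes may arrive later; without a backoff this spins
	}
	return err
}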

@@ -715,6 +715,7 @@ func followLogs(f *os.File, logWatcher *logger.LogWatcher, notifyRotate, notifyE
 	defer func() { oldSize = size }()
 	if size < oldSize { // truncated
 		f.Seek(0, 0)
+		dec.Reset(f)
Contributor

How do you ensure that this won't cause issues with other log drivers?


Isn't any state that a decoder might have at this point broken anyway, and a Reset the only sensible thing to do?

Member Author

I hope the tests ensure it has no regressions. At least they caught my mistake on the earlier revisions.

Regarding the Decoder interface, is it implemented only by

type decoder struct {
	rdr      io.Reader
	dec      *json.Decoder
	jl       *jsonlog.JSONLog
	maxRetry int
}

and

type decoder struct {
	rdr   io.Reader
	proto *logdriver.LogEntry
	buf   []byte
}

?

Member

This seems like the right thing to do.
Can you think of a test that we can add that will fail before this change?

len, err := d.decodeSizeHeader()
if err != nil {
	return nil, err
}
Contributor

I wonder if the value should be sanity-checked here to avoid excessive allocation. len is read from a 32-bit value.

Member Author

decodeSizeHeader internally calls binary.BigEndian.Uint32(), so the value can't be larger than a uint32.

Contributor

And that's enough for the current issue to exist.

Member Author

Yep. As you mentioned, it ultimately depends on the host's memory situation.

Member

I think it would be better to check the size and return an error than to allocate something that is clearly wrong.
There is a maximum for each message (raw log message max is 1MB, IIRC, not sure what that would be in proto+metadata). We could check something like 512MB, which is way too much to bother allocating.
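
A sketch of that check (the 512MB figure comes from this comment; maxMsgLen is a made-up name and buf a placeholder, not the merged code):

const maxMsgLen = 512 * 1024 * 1024 // far beyond any legitimate record

msgLen := int(binary.BigEndian.Uint32(buf[:4]))
if msgLen > maxMsgLen {
	return nil, fmt.Errorf("corrupt log record: implausible length %d", msgLen)
}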

Member Author

Oh, I didn't know about the 1MB cap. Is it defined in the files under docker/daemon/logger?

Member

Oh, it's even smaller:

defaultBufSize = 16 * 1024

Log entries are broken up at newline or 16K.

Member Author

But a line could be longer than 16K, couldn't it?

Member

No, 16K is the max and anything longer is broken up into multiple entries.

Member

The actual proto we record may be larger than 16K, of course, but that has a soft max just by virtue of the raw log message being capped.
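
Illustrative only (the 16 * 1024 figure is the defaultBufSize quoted above; the splitting code below is a sketch, not the copier's actual code): a long line reaches the driver as multiple entries of at most 16K each.

const maxChunk = 16 * 1024

// splitLine breaks one logical line into chunks of at most maxChunk bytes.
func splitLine(line []byte) [][]byte {
	var chunks [][]byte
	for len(line) > maxChunk {
		chunks = append(chunks, line[:maxChunk])
		line = line[maxChunk:]
	}
	return append(chunks, line)
}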

@kzys (Member Author) commented Dec 8, 2021

I gave it a try and at first I was not seeing it crash. It turned out, with enough ram, the issue may not be reproducible.

Yes. Technically it is not a leak. Docker Engine allocates an unnecessarily large block due to the serialization issue (interpreting the non-size part of a message as its size). This would be freed by Go's GC later, but due to golang/go@05e6d28, RSS may not reflect that if your Go version is < 1.16.

)

func (d *decoder) readRecord(size int) error {
	var err error
	for i := 0; i < maxDecodeRetry; i++ {
@fredericdalleau (Contributor) commented Dec 9, 2021

Do you think this loop could be removed? On an unexpected EOF error, io.ReadFull would be called up to maxDecodeRetry times until data is available. If there is no data, the code needs to go back to waiting for more data.

Member Author

It could be. The retry has been there from the very beginning (#37092), though. @cpuguy83, what do you think?

Member

Maybe let's just fix the original bug and make such changes separately?
That makes it easier to backport.

Contributor

Makes sense

@cpuguy83 (Member) left a comment

This looks pretty good. Comments are mostly on organization.

}

msgLen := int(binary.BigEndian.Uint32(d.buf[:encodeBinaryLen]))
if len(d.buf) < msgLen+encodeBinaryLen {
Member

This should probably be in the decodeLogEntry decode logic, since we aren't just decoding the header.

Member Author

You mean the buf-resizing logic below shouldn't be in decodeSizeHeader? I agree.

	if len(d.buf) < msgLen+encodeBinaryLen {
		d.buf = make([]byte, msgLen+encodeBinaryLen)
	} else {
		if msgLen <= initialBufSize {
			d.buf = d.buf[:initialBufSize]
		} else {
			d.buf = d.buf[:msgLen+encodeBinaryLen]
		}
	}

Member Author

How about putting the if block in Decode() instead?

Member

Works for me.
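
The shape being agreed on is roughly this (a sketch only; the decodeSizeHeader and decodeLogEntry signatures here are assumptions, not the merged code):

func (d *decoder) Decode() (*logger.Message, error) {
	msgLen, err := d.decodeSizeHeader()
	if err != nil {
		return nil, err
	}
	if msgLen+encodeBinaryLen > len(d.buf) {
		d.buf = make([]byte, msgLen+encodeBinaryLen) // grow only when needed
	}
	if err := d.readRecord(msgLen); err != nil {
		return nil, err
	}
	return d.decodeLogEntry(msgLen) // unmarshal the record into a logger.Message
}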

 for i := 0; i < maxDecodeRetry; i++ {
 	var n int
-	n, err = io.ReadFull(d.rdr, d.buf[read:encodeBinaryLen])
+	n, err = io.ReadFull(d.rdr, d.buf[d.offset:size])
+	d.offset += n
Member

It seems like we are only using offset inside this function to handle ErrUnexpectedEOF. Do we need to store it on the object at all?

Member Author

Yes. readRecord() has no guarantee about where it ends; it could stop in the middle of a log message. So the decoder has to remember that and read the remaining bytes in that case.
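
That is why the position lives on the struct rather than in a local variable (sketch, not the merged code):

func (d *decoder) readRecord(size int) error {
	n, err := io.ReadFull(d.rdr, d.buf[d.offset:size])
	d.offset += n // persists across Decode() calls, so a retry resumes here
	if err != nil {
		return err
	}
	d.offset = 0 // only a complete record resets the position
	return nil
}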

@kzys (Member Author) commented Dec 9, 2021

Let me think about testing. Luckily followLogs is not tied to any structs. We may be able to test that somehow...

@kzys (Member Author) commented Dec 9, 2021

@cpuguy83 Does 3cf3929, the second commit, work as the test for followLogs' change? Counting the number of invocations is not ideal though.
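
A hedged sketch of that testing idea (not the actual test in 3cf3929): wrap the decoder that followLogs uses so the test can count Reset calls, assuming a Decoder interface like the one sketched earlier in this thread.

type countingDecoder struct {
	Decoder       // embedded decoder doing the real work
	resets  int32 // incremented on every Reset call
}

func (c *countingDecoder) Reset(rdr io.Reader) {
	atomic.AddInt32(&c.resets, 1)
	c.Decoder.Reset(rdr)
}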

Before this change, if Decode() couldn't read a log record fully,
the subsequent invocation of Decode() would read the record's non-header part
as a header and cause a huge heap allocation.

This change prevents such a case by having the intermediate buffer in
the decoder struct.

Fixes moby#42125.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
@kzys (Member Author) commented Dec 16, 2021

@cpuguy83 @thaJeztah The tests are all green now. Can you take a look?

@cpuguy83 (Member) left a comment

LGTM

@thaJeztah (Member) left a comment

LGTM, thanks!

}

func testDecode(t *testing.T, buf []byte, split int) {
	fw, err := ioutil.TempFile("", t.Name())
Member

On master, we already switched uses of ioutil to os, but we can update that in a follow-up (keeping ioutil here may make it easier to backport if we want to).
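
The follow-up would be roughly a one-line change (os.CreateTemp has been the drop-in replacement for ioutil.TempFile since Go 1.16):

	fw, err := os.CreateTemp("", t.Name())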

func (d *lineDecoder) Close() {
}

func TestFollowLogsHandleDecodeErr(t *testing.T) {
Member

Looks like this test is flaky (seen two failures today 😅 )

Member Author

Nah, let me take a look...

Member

I haven't looked closely yet myself, but thought I'd at least leave a comment (thanks!)


Successfully merging this pull request may close these issues:

OOM when following "local" logs of high-log-output container
6 participants