
fix potential log record corruption #30

Merged
merged 2 commits into master on Nov 6, 2014
Conversation

inconshreveable
Owner

So I discovered a pretty horrible bug in log15 the other day that could cause corruption of log records when a logger is used concurrently (or by using the Channel/Buffered handlers).

The bug is in the creation of log records:

Ctx:  append(l.ctx, normalize(ctx)...),

The thing about the append() function is that it feels like an operation that works on an immutable argument because it returns you a new slice. And that's true, really. The slice you pass to append is not mutated.

However, the backing array of the slice is mutated. This can cause log record context corruption in a race like this:

goroutine1:
    r := &Record{
        // ...
        Ctx: append(l.ctx, "foo", "bar"),
    }

goroutine2:
    r := &Record{
        // ...
        Ctx: append(l.ctx, "baz", "quux"),
    }

goroutine1:
    // r could have wrong context values "baz"/"quux" instead of "foo"/"bar"
    l.h.Handle(r)

I say "could have wrong context values" because whether or not the record gets corrupted actually depends on the behavior of the Go runtime. If when we initially allocated the slice for l.ctx the runtime chose to allocate the slice with not enough extra capacity for our new context values, it will allocate a new backing array and copy all of the values to it. In this case, the bug is not triggered.

Amusingly, it seems like for small context lengths (<=32 on Darwin), the runtime never chooses to allocate extra capacity and so this bug both never triggers and is undetectable by the race detector. This is why the test has the magical value of 34, which was the first size I could find that would trigger the bug.
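To make the capacity dependence concrete, here is a small standalone sketch (not from the PR or its tests) showing that two appends to the same slice only interfere when the original slice has spare capacity:

package main

import "fmt"

func main() {
    // Spare capacity: cap > len, so both appends write into the same
    // backing array and the second clobbers the first.
    base := make([]interface{}, 0, 4)
    base = append(base, "k", "v")

    a := append(base, "foo", "bar")
    b := append(base, "baz", "quux")
    fmt.Println(a) // [k v baz quux] -- a was overwritten by the second append
    fmt.Println(b) // [k v baz quux]

    // No spare capacity: cap == len, so each append allocates a fresh
    // backing array and the two results stay independent.
    tight := []interface{}{"k", "v"}
    c := append(tight, "foo", "bar")
    d := append(tight, "baz", "quux")
    fmt.Println(c) // [k v foo bar]
    fmt.Println(d) // [k v baz quux]
}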

@ChrisHines please review
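For reference, the standard way to avoid this class of bug is to copy the shared context into a freshly allocated slice rather than appending to it. Below is a minimal sketch of that approach (a hypothetical helper, not necessarily the exact patch in this PR, reusing the package's existing normalize function):

// newContext copies the logger's shared ctx into a fresh slice so that
// concurrent record creation never shares a backing array.
// (Hypothetical helper; the actual patch may differ.)
func newContext(prefix []interface{}, suffix []interface{}) []interface{} {
    normalizedSuffix := normalize(suffix)
    newCtx := make([]interface{}, len(prefix)+len(normalizedSuffix))
    n := copy(newCtx, prefix)
    copy(newCtx[n:], normalizedSuffix)
    return newCtx
}

The record would then be built with Ctx: newContext(l.ctx, ctx), so each record owns its own backing array.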

const goroutines = 8
var res [goroutines]int
l.SetHandler(SyncHandler(FuncHandler(func(r *Record) error {
    res[r.Ctx[ctxLen+1].(int)]++
@ChrisHines
Collaborator

It took me a while to figure out what this line does, and more generally how this test works. Perhaps some comments explaining the strategy would be helpful.
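For readers following the thread, the strategy behind the quoted lines could be spelled out roughly like this (a sketch reconstructed around the snippet, assuming ctxLen is the length of the logger's base context and that l and the sync package are in scope in the test):

// Each goroutine logs with its own index as the final context value.
// The handler tallies records per index; if append corrupted a record's
// Ctx, some index gets counted the wrong number of times (or the type
// assertion fails on a clobbered value).
const goroutines = 8
var res [goroutines]int
l.SetHandler(SyncHandler(FuncHandler(func(r *Record) error {
    // r.Ctx[ctxLen] is the "id" key, r.Ctx[ctxLen+1] is the goroutine index.
    res[r.Ctx[ctxLen+1].(int)]++
    return nil
})))

var wg sync.WaitGroup
wg.Add(goroutines)
for i := 0; i < goroutines; i++ {
    go func(idx int) {
        defer wg.Done()
        l.Info("test", "id", idx)
    }(i)
}
wg.Wait()
// Expect res[idx] == 1 for every goroutine.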

@ChrisHines
Collaborator

Yes, this is a subtle bug, very good catch.

The fix LGTM.

The new test takes some work to figure out. See if you think my inline comments would help improve readability. It is subjective of course, so my comments are merely suggestions.

@inconshreveable
Owner Author

fixes look okay chris?

@ChrisHines
Collaborator

Yes, very helpful. LGTM.

@inconshreveable inconshreveable merged commit 60f30f8 into master Nov 6, 2014