Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid race in diode.Close with waiter #481

Merged
merged 1 commit into from Sep 15, 2022
Merged

Avoid race in diode.Close with waiter #481

merged 1 commit into from Sep 15, 2022

Conversation

ddn0
Copy link
Contributor

@ddn0 ddn0 commented Sep 15, 2022

In about 5% of instances of

go test -count 100000 -run TestClose -timeout 30s .

the test ends in a deadlock:

panic: test timed out after 30s

goroutine 6465 [running]:
testing.(*M).startAlarm.func1()
        /home/ddn/.conda/envs/katana-dev/go/src/testing/testing.go:2029 +0x8e
created by time.goFunc
        /home/ddn/.conda/envs/katana-dev/go/src/time/sleep.go:176 +0x32

goroutine 1 [chan receive]:
testing.(*T).Run(0xc000636000, {0x54a7e4?, 0x4bba45?}, 0x555b78)
        /home/ddn/.conda/envs/katana-dev/go/src/testing/testing.go:1487 +0x37a
testing.runTests.func1(0xc000636000?)
        /home/ddn/.conda/envs/katana-dev/go/src/testing/testing.go:1839 +0x6e
testing.tRunner(0xc000636000, 0xc000069cd8)
        /home/ddn/.conda/envs/katana-dev/go/src/testing/testing.go:1439 +0x102
testing.runTests(0xc0000a63c0?, {0x63b0e0, 0x2, 0x2}, {0x7fb6ff136108?, 0x40?, 0x643c00?})
        /home/ddn/.conda/envs/katana-dev/go/src/testing/testing.go:1837 +0x457
testing.(*M).Run(0xc0000a63c0)
        /home/ddn/.conda/envs/katana-dev/go/src/testing/testing.go:1719 +0x5d9
main.main()
        _testmain.go:55 +0x1aa

goroutine 27226 [sync.Cond.Wait]:
sync.runtime_notifyListWait(0xc00063a090, 0x0)
        /home/ddn/.conda/envs/katana-dev/go/src/runtime/sema.go:513 +0x13d
sync.(*Cond).Wait(0xc00063a080?)
        /home/ddn/.conda/envs/katana-dev/go/src/sync/cond.go:56 +0x8c
github.com/rs/zerolog/diode/internal/diodes.(*Waiter).Next(0xc0004ee090)
        /home/ddn/w/zerolog/diode/internal/diodes/waiter.go:69 +0x9b
github.com/rs/zerolog/diode.Writer.poll({{0x57a600, 0xc0004ee060}, {0x57ae68, 0xc0004ee090}, 0xc000626040, 0xc000602120})
        /home/ddn/w/zerolog/diode/diode.go:96 +0x83
created by github.com/rs/zerolog/diode.NewWriter
        /home/ddn/w/zerolog/diode/diode.go:70 +0x36f

goroutine 27224 [chan receive]:
github.com/rs/zerolog/diode.Writer.Close({{0x57a600, 0xc0004ee060}, {0x57ae68, 0xc0004ee090}, 0xc000626040, 0xc000602120})
        /home/ddn/w/zerolog/diode/diode.go:86 +0x4a
github.com/rs/zerolog/diode_test.TestClose(0xc0006e6a00?)
        /home/ddn/w/zerolog/diode/diode_test.go:38 +0x1eb
testing.tRunner(0xc0006361a0, 0x555b78)
        /home/ddn/.conda/envs/katana-dev/go/src/testing/testing.go:1439 +0x102
created by testing.(*T).Run
        /home/ddn/.conda/envs/katana-dev/go/src/testing/testing.go:1486 +0x35f
FAIL	github.com/rs/zerolog/diode	30.008s

This is due to a missed wakeup on Close. Avoid this by holding a mutex on Broadcast, which is not necessary in general but is necessary here.

In about 5% of instances of

    go test -count 100000 -run TestClose -timeout 30s .

the test ends in a deadlock:

    panic: test timed out after 30s

    goroutine 6465 [running]:
    testing.(*M).startAlarm.func1()
            /home/ddn/.conda/envs/katana-dev/go/src/testing/testing.go:2029 +0x8e
    created by time.goFunc
            /home/ddn/.conda/envs/katana-dev/go/src/time/sleep.go:176 +0x32

    goroutine 1 [chan receive]:
    testing.(*T).Run(0xc000636000, {0x54a7e4?, 0x4bba45?}, 0x555b78)
            /home/ddn/.conda/envs/katana-dev/go/src/testing/testing.go:1487 +0x37a
    testing.runTests.func1(0xc000636000?)
            /home/ddn/.conda/envs/katana-dev/go/src/testing/testing.go:1839 +0x6e
    testing.tRunner(0xc000636000, 0xc000069cd8)
            /home/ddn/.conda/envs/katana-dev/go/src/testing/testing.go:1439 +0x102
    testing.runTests(0xc0000a63c0?, {0x63b0e0, 0x2, 0x2}, {0x7fb6ff136108?, 0x40?, 0x643c00?})
            /home/ddn/.conda/envs/katana-dev/go/src/testing/testing.go:1837 +0x457
    testing.(*M).Run(0xc0000a63c0)
            /home/ddn/.conda/envs/katana-dev/go/src/testing/testing.go:1719 +0x5d9
    main.main()
            _testmain.go:55 +0x1aa

    goroutine 27226 [sync.Cond.Wait]:
    sync.runtime_notifyListWait(0xc00063a090, 0x0)
            /home/ddn/.conda/envs/katana-dev/go/src/runtime/sema.go:513 +0x13d
    sync.(*Cond).Wait(0xc00063a080?)
            /home/ddn/.conda/envs/katana-dev/go/src/sync/cond.go:56 +0x8c
    github.com/rs/zerolog/diode/internal/diodes.(*Waiter).Next(0xc0004ee090)
            /home/ddn/w/zerolog/diode/internal/diodes/waiter.go:69 +0x9b
    github.com/rs/zerolog/diode.Writer.poll({{0x57a600, 0xc0004ee060}, {0x57ae68, 0xc0004ee090}, 0xc000626040, 0xc000602120})
            /home/ddn/w/zerolog/diode/diode.go:96 +0x83
    created by github.com/rs/zerolog/diode.NewWriter
            /home/ddn/w/zerolog/diode/diode.go:70 +0x36f

    goroutine 27224 [chan receive]:
    github.com/rs/zerolog/diode.Writer.Close({{0x57a600, 0xc0004ee060}, {0x57ae68, 0xc0004ee090}, 0xc000626040, 0xc000602120})
            /home/ddn/w/zerolog/diode/diode.go:86 +0x4a
    github.com/rs/zerolog/diode_test.TestClose(0xc0006e6a00?)
            /home/ddn/w/zerolog/diode/diode_test.go:38 +0x1eb
    testing.tRunner(0xc0006361a0, 0x555b78)
            /home/ddn/.conda/envs/katana-dev/go/src/testing/testing.go:1439 +0x102
    created by testing.(*T).Run
            /home/ddn/.conda/envs/katana-dev/go/src/testing/testing.go:1486 +0x35f
    FAIL	github.com/rs/zerolog/diode	30.008s

This is due to a missed wakeup on Close. Avoid this by holding a mutex
on Broadcast, which is not necessary in general but is necessary here.
@ddn0
Copy link
Contributor Author

ddn0 commented Sep 15, 2022

I tried to use the existing test TestNewWriter rather than introducing TestClose but it is flaky under frequent execution.

I get errors like

--- FAIL: TestNewWriter (0.00s)
    diode_test.go:29: Diode New Writer Test failed. got:, want:{"level":"debug","message":"test"}
        !

To highlight the specific bug that led me to this fix, I decided to write a separate unit test.

@rs rs merged commit 315967f into rs:master Sep 15, 2022
@ddn0 ddn0 deleted the fix/close branch September 16, 2022 01:25
pablitoc pushed a commit to nxcr-org/zerolog that referenced this pull request Apr 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants