otelsarama: Fix WrapAsyncProducer race condition #881

pellared · 2021-07-12T10:21:31Z

Why

I have spotted that otelsarama tests are sometimes failing because not all spans are recorded in oteltest.SpanRecorder.

Before the fix:

$ go test -race -count=1000
--- FAIL: TestWrapAsyncProducer (0.00s)
    --- FAIL: TestWrapAsyncProducer/with_successes_config (0.00s)
        producer_test.go:289: 
                Error Trace:    producer_test.go:289
                Error:          Not equal: 
                                expected: trace.SpanID{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2}
                                actual  : trace.SpanID{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
                            
                                Diff:
                                --- Expected
                                +++ Actual
                                @@ -1,3 +1,3 @@
                                 (trace.SpanID) (len=8) {
                                - 00000000  00 00 00 00 00 00 00 02                           |........|
                                + 00000000  00 00 00 00 00 00 00 00                           |........|
                                 }
                Test:           TestWrapAsyncProducer/with_successes_config
        producer_test.go:293: 
                Error Trace:    producer_test.go:293
                Error:          Not equal: 
                                expected: attribute.Value{vtype:4, numeric:0x0, stringly:"1", array:interface {}(nil)}
                                actual  : attribute.Value{vtype:4, numeric:0x0, stringly:"2", array:interface {}(nil)}
                            
                                Diff:
                                --- Expected
                                +++ Actual
                                @@ -3,3 +3,3 @@
                                  numeric: (uint64) 0,
                                - stringly: (string) (len=1) "1",
                                + stringly: (string) (len=1) "2",
                                  array: (interface {}) <nil>
                Test:           TestWrapAsyncProducer/with_successes_config
                Messages:       messaging.message_id
        producer_test.go:297: 
                Error Trace:    producer_test.go:297
                Error:          Not equal: 
                                expected: int(0)
                                actual  : trace.SpanID(trace.SpanID{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3})
                Test:           TestWrapAsyncProducer/with_successes_config
        producer_test.go:289: 
                Error Trace:    producer_test.go:289
                Error:          Not equal: 
                                expected: trace.SpanID{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
                                actual  : trace.SpanID{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2}
                            
                                Diff:
                                --- Expected
                                +++ Actual
                                @@ -1,3 +1,3 @@
                                 (trace.SpanID) (len=8) {
                                - 00000000  00 00 00 00 00 00 00 00                           |........|
                                + 00000000  00 00 00 00 00 00 00 02                           |........|
                                 }
                Test:           TestWrapAsyncProducer/with_successes_config
        producer_test.go:293: 
                Error Trace:    producer_test.go:293
                Error:          Not equal: 
                                expected: attribute.Value{vtype:4, numeric:0x0, stringly:"2", array:interface {}(nil)}
                                actual  : attribute.Value{vtype:0, numeric:0x0, stringly:"", array:interface {}(nil)}
                            
                                Diff:
                                --- Expected
                                +++ Actual
                                @@ -1,5 +1,5 @@
                                 (attribute.Value) {
                                - vtype: (attribute.Type) 4,
                                + vtype: (attribute.Type) 0,
                                  numeric: (uint64) 0,
                                - stringly: (string) (len=1) "2",
                                + stringly: (string) "",
                                  array: (interface {}) <nil>
                Test:           TestWrapAsyncProducer/with_successes_config
                Messages:       messaging.message_id
        producer_test.go:293: 
                Error Trace:    producer_test.go:293
                Error:          Not equal: 
                                expected: attribute.Value{vtype:2, numeric:0x0, stringly:"", array:interface {}(nil)}
                                actual  : attribute.Value{vtype:0, numeric:0x0, stringly:"", array:interface {}(nil)}
                            
                                Diff:
                                --- Expected
                                +++ Actual
                                @@ -1,3 +1,3 @@
                                 (attribute.Value) {
                                - vtype: (attribute.Type) 2,
                                + vtype: (attribute.Type) 0,
                                  numeric: (uint64) 0,
                Test:           TestWrapAsyncProducer/with_successes_config
                Messages:       messaging.kafka.partition
--- FAIL: TestWrapAsyncProducer_DrainsSuccessesAndErrorsChannels (0.00s)
    producer_test.go:389: 
                Error Trace:    producer_test.go:389
                Error:          "[%!s(*oteltest.Span=&{{{0 0} 0 0 0 0} 0xc0004a6360 {[0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 2] 0 {[]} false} [0 0 0 0 0 0 0 0] true kafka.produce {13849452684140728220 1067214001 0x98dc20} {13849452684140757820 1067243601 0x98dc20} 0  map[messaging.destination:{4 0 test-topic <nil>} messaging.destination_kind:{4 0 topic <nil>} messaging.kafka.partition:{2 0  <nil>} messaging.message_id:{4 0 1 <nil>} messaging.system:{4 0 kafka <nil>}] [] [] 4}) %!s(*oteltest.Span=&{{{0 0} 0 0 0 0} 0xc0004a6360 {[0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 7] 0 {[]} false} [0 0 0 0 0 0 0 0] true kafka.produce {13849452684140828420 1067314101 0x98dc20} {13849452684140909420 1067395101 0x98dc20} 1 test map[messaging.destination:{4 0 test-topic <nil>} messaging.destination_kind:{4 0 topic <nil>} messaging.kafka.partition:{2 0  <nil>} messaging.message_id:{4 0 0 <nil>} messaging.system:{4 0 kafka <nil>}] [] [] 4}) %!s(*oteltest.Span=&{{{0 0} 0 0 0 0} 0xc0004a6360 {[0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 3] 0 {[]} false} [0 0 0 0 0 0 0 0] true kafka.produce {13849452684140764220 1067250001 0x98dc20} {13849452684140920220 1067405901 0x98dc20} 0  map[messaging.destination:{4 0 test-topic <nil>} messaging.destination_kind:{4 0 topic <nil>} messaging.kafka.partition:{2 0  <nil>} messaging.message_id:{4 0 2 <nil>} messaging.system:{4 0 kafka <nil>}] [] [] 4}) %!s(*oteltest.Span=&{{{0 0} 0 0 0 0} 0xc0004a6360 {[0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 4] 0 {[]} false} [0 0 0 0 0 0 0 0] true kafka.produce {13849452684140786620 1067272301 0x98dc20} {13849452684140930020 1067415701 0x98dc20} 0  map[messaging.destination:{4 0 test-topic <nil>} messaging.destination_kind:{4 0 topic <nil>} messaging.kafka.partition:{2 0  <nil>} messaging.message_id:{4 0 3 <nil>} messaging.system:{4 0 kafka <nil>}] [] [] 4}) %!s(*oteltest.Span=&{{{0 0} 0 0 0 0} 0xc0004a6360 {[0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 5] 0 {[]} false} [0 0 0 0 0 0 0 0] true kafka.produce {13849452684140801020 1067286701 0x98dc20} {13849452684140937120 1067422801 0x98dc20} 0  map[messaging.destination:{4 0 test-topic <nil>} messaging.destination_kind:{4 0 topic <nil>} messaging.kafka.partition:{2 0  <nil>} messaging.message_id:{4 0 4 <nil>} messaging.system:{4 0 kafka <nil>}] [] [] 4}) %!s(*oteltest.Span=&{{{0 0} 0 0 0 0} 0xc0004a6360 {[0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 6] 0 {[]} false} [0 0 0 0 0 0 0 0] true kafka.produce {13849452684140811420 1067297101 0x98dc20} {13849452684140944320 1067430101 0x98dc20} 0  map[messaging.destination:{4 0 test-topic <nil>} messaging.destination_kind:{4 0 topic <nil>} messaging.kafka.partition:{2 0  <nil>} messaging.message_id:{4 0 5 <nil>} messaging.system:{4 0 kafka <nil>}] [] [] 4}) %!s(*oteltest.Span=&{{{0 0} 0 0 0 0} 0xc0004a6360 {[0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 8] 0 {[]} false} [0 0 0 0 0 0 0 0] true kafka.produce {13849452684140839120 1067324801 0x98dc20} {13849452684140965820 1067451501 0x98dc20} 1 test map[messaging.destination:{4 0 test-topic <nil>} messaging.destination_kind:{4 0 topic <nil>} messaging.kafka.partition:{2 0  <nil>} messaging.message_id:{4 0 0 <nil>} messaging.system:{4 0 kafka <nil>}] [] [] 4})]" should have 8 item(s), but has 7
                Test:           TestWrapAsyncProducer_DrainsSuccessesAndErrorsChannels
                Messages:       should record all spans
--- FAIL: TestWrapAsyncProducer_DrainsSuccessesAndErrorsChannels (0.00s)
    producer_test.go:389: 
                Error Trace:    producer_test.go:389
                Error:          "[%!s(*oteltest.Span=&{{{0 0} 0 0 0 0} 0xc0004a6780 {[0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 2] 0 {[]} false} [0 0 0 0 0 0 0 0] true kafka.produce {13849452684576586644 1429330501 0x98dc20} {13849452684576611244 1429355101 0x98dc20} 0  map[messaging.destination:{4 0 test-topic <nil>} messaging.destination_kind:{4 0 topic <nil>} messaging.kafka.partition:{2 0  <nil>} messaging.message_id:{4 0 1 <nil>} messaging.system:{4 0 kafka <nil>}] [] [] 4}) %!s(*oteltest.Span=&{{{0 0} 0 0 0 0} 0xc0004a6780 {[0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 7] 0 {[]} false} [0 0 0 0 0 0 0 0] true kafka.produce {13849452684576808244 1429552101 0x98dc20} {13849452684576877244 1429621101 0x98dc20} 1 test map[messaging.destination:{4 0 test-topic <nil>} messaging.destination_kind:{4 0 topic <nil>} messaging.kafka.partition:{2 0  <nil>} messaging.message_id:{4 0 0 <nil>} messaging.system:{4 0 kafka <nil>}] [] [] 4}) %!s(*oteltest.Span=&{{{0 0} 0 0 0 0} 0xc0004a6780 {[0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 8] 0 {[]} false} [0 0 0 0 0 0 0 0] true kafka.produce {13849452684576821244 1429565201 0x98dc20} {13849452684576915544 1429659501 0x98dc20} 1 test map[messaging.destination:{4 0 test-topic <nil>} messaging.destination_kind:{4 0 topic <nil>} messaging.kafka.partition:{2 0  <nil>} messaging.message_id:{4 0 0 <nil>} messaging.system:{4 0 kafka <nil>}] [] [] 4}) %!s(*oteltest.Span=&{{{0 0} 0 0 0 0} 0xc0004a6780 {[0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 3] 0 {[]} false} [0 0 0 0 0 0 0 0] true kafka.produce {13849452684576624644 1429368501 0x98dc20} {13849452684576957844 1429701701 0x98dc20} 0  map[messaging.destination:{4 0 test-topic <nil>} messaging.destination_kind:{4 0 topic <nil>} messaging.kafka.partition:{2 0  <nil>} messaging.message_id:{4 0 2 <nil>} messaging.system:{4 0 kafka <nil>}] [] [] 4}) %!s(*oteltest.Span=&{{{0 0} 0 0 0 0} 0xc0004a6780 {[0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 4] 0 {[]} false} [0 0 0 0 0 0 0 0] true kafka.produce {13849452684576695544 1429439401 0x98dc20} {13849452684576967144 1429711101 0x98dc20} 0  map[messaging.destination:{4 0 test-topic <nil>} messaging.destination_kind:{4 0 topic <nil>} messaging.kafka.partition:{2 0  <nil>} messaging.message_id:{4 0 3 <nil>} messaging.system:{4 0 kafka <nil>}] [] [] 4}) %!s(*oteltest.Span=&{{{0 0} 0 0 0 0} 0xc0004a6780 {[0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 5] 0 {[]} false} [0 0 0 0 0 0 0 0] true kafka.produce {13849452684576723444 1429467301 0x98dc20} {13849452684576989644 1429733601 0x98dc20} 0  map[messaging.destination:{4 0 test-topic <nil>} messaging.destination_kind:{4 0 topic <nil>} messaging.kafka.partition:{2 0  <nil>} messaging.message_id:{4 0 4 <nil>} messaging.system:{4 0 kafka <nil>}] [] [] 4}) %!s(*oteltest.Span=&{{{0 0} 0 0 0 0} 0xc0004a6780 {[0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 6] 0 {[]} false} [0 0 0 0 0 0 0 0] true kafka.produce {13849452684576774344 1429518301 0x98dc20} {13849452684576998144 1429742001 0x98dc20} 0  map[messaging.destination:{4 0 test-topic <nil>} messaging.destination_kind:{4 0 topic <nil>} messaging.kafka.partition:{2 0  <nil>} messaging.message_id:{4 0 5 <nil>} messaging.system:{4 0 kafka <nil>}] [] [] 4})]" should have 8 item(s), but has 7
                Test:           TestWrapAsyncProducer_DrainsSuccessesAndErrorsChannels
                Messages:       should record all spans
--- FAIL: TestWrapAsyncProducerError (0.00s)
    producer_test.go:325: 
                Error Trace:    producer_test.go:325
                Error:          Not equal: 
                                expected: string("test metadata")
                                actual  : trace.SpanID(trace.SpanID{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2})
                Test:           TestWrapAsyncProducerError
                Messages:       should preseve metadata
    producer_test.go:330: 
                Error Trace:    producer_test.go:330
                Error:          "[]" should have 1 item(s), but has 0
                Test:           TestWrapAsyncProducerError
panic: runtime error: index out of range [0] with length 0 [recovered]
        panic: runtime error: index out of range [0] with length 0

goroutine 20163 [running]:
testing.tRunner.func1.2(0x77bc40, 0xc0004c4738)
        /usr/local/go/src/testing/testing.go:1143 +0x49f
testing.tRunner.func1(0xc0006d3c80)
        /usr/local/go/src/testing/testing.go:1146 +0x695
panic(0x77bc40, 0xc0004c4738)
        /usr/local/go/src/runtime/panic.go:971 +0x499
go.opentelemetry.io/contrib/instrumentation/github.com/Shopify/sarama/otelsarama.TestWrapAsyncProducerError(0xc0006d3c80)
        /home/rpajak/repos/opentelemetry-go-contrib/instrumentation/github.com/Shopify/sarama/otelsarama/producer_test.go:332 +0x9c5
testing.tRunner(0xc0006d3c80, 0x7b4630)
        /usr/local/go/src/testing/testing.go:1193 +0x203
created by testing.(*T).Run
        /usr/local/go/src/testing/testing.go:1238 +0x5d8
exit status 2
FAIL    go.opentelemetry.io/contrib/instrumentation/github.com/Shopify/sarama/otelsarama        1.925s

The bug was introduced here: #754

What

Properly synchronize the span's metadata cache (remember the value BEFORE sending to input channel)

After the fix:

$ go test -race -count=1000
PASS
ok      go.opentelemetry.io/contrib/instrumentation/github.com/Shopify/sarama/otelsarama        15.329s

PS. Hopefully, this is the last bug related to concurrency in otelsarama.

codecov · 2021-07-12T10:33:04Z

Codecov Report

Merging #881 (cabb771) into main (cfed0ec) will not change coverage.
The diff coverage is 100.0%.

@@          Coverage Diff          @@
##            main    #881   +/-   ##
=====================================
  Coverage   79.0%   79.0%           
=====================================
  Files         62      62           
  Lines       2731    2731           
=====================================
  Hits        2159    2159           
  Misses       440     440           
  Partials     132     132

Impacted Files	Coverage Δ
...n/github.com/Shopify/sarama/otelsarama/producer.go	`95.7% <100.0%> (ø)`

otelsarama: Fix WrapAsyncProducer race condition

bf0f005

pellared requested review from Aneurysm9, dashpole, evantorrie, jmacd, MadVikingGod, MrAlias, paivagustavo and XSAM as code owners July 12, 2021 10:21

Update producer.go

cabb771

Aneurysm9 approved these changes Jul 15, 2021

View reviewed changes

MrAlias approved these changes Jul 15, 2021

View reviewed changes

MrAlias merged commit 4c90cea into open-telemetry:main Jul 15, 2021

pellared deleted the otelsarama-race-condition branch July 15, 2021 21:01

Aneurysm9 mentioned this pull request Jul 26, 2021

Pre release v0.22.0 #928

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

otelsarama: Fix WrapAsyncProducer race condition #881

otelsarama: Fix WrapAsyncProducer race condition #881

pellared commented Jul 12, 2021 •

edited

codecov bot commented Jul 12, 2021 •

edited

otelsarama: Fix WrapAsyncProducer race condition #881

otelsarama: Fix WrapAsyncProducer race condition #881

Conversation

pellared commented Jul 12, 2021 • edited

Why

What

codecov bot commented Jul 12, 2021 • edited

Codecov Report

pellared commented Jul 12, 2021 •

edited

codecov bot commented Jul 12, 2021 •

edited