Adding locks where context is accessed #528

keer25 · 2020-08-25T20:01:52Z


Resolves issue #526

Signed-off-by: Keerthana Selvakumar <keerukeerthana8@gmail.com>

Which problem is this PR solving?

This commit addresses data races where span.context is accessed without locks in methods that can be called concurrently with setBaggageItem, which modifies the context.

Short description of the changes

Acquired the span's lock around the places where span.context is accessed, to fix the data races.

codecov · 2020-08-25T21:46:59Z

Codecov Report

Merging #528 into master will increase coverage by 0.02%.
The diff coverage is 94.73%.

```diff
@@            Coverage Diff             @@
##           master     #528      +/-   ##
==========================================
+ Coverage   89.28%   89.30%   +0.02%     
==========================================
  Files          61       61              
  Lines        3919     3918       -1     
==========================================
  Hits         3499     3499              
+ Misses        294      292       -2     
- Partials      126      127       +1
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| tracer.go | `95.91% <50.00%> (ø)` | |
| span.go | `96.99% <100.00%> (-0.02%)` | ⬇️ |
| utils/reconnecting_udp_conn.go | `90.00% <0.00%> (-2.50%)` | ⬇️ |
| internal/baggage/remote/restriction_manager.go | `100.00% <0.00%> (+4.22%)` | ⬆️ |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update cf8927b...f2804f4.

yurishkuro

@joe-elliott you might be interested in reviewing this.

yurishkuro · 2020-08-26T02:38:25Z

span.go

```diff
@@ -427,10 +427,10 @@ func (s *Span) serviceName() string {

 func (s *Span) applySamplingDecision(decision SamplingDecision, lock bool) {
 	if !decision.Retryable {
-		s.context.samplingState.setFinal()
+		s.SpanContext().samplingState.setFinal()
```

The presence of the lock parameter in this function indicates that it might be called while the current goroutine already holds the lock on the span. What would happen in that case when we call SpanContext(), which tries to acquire a read lock? I recall that in Go the RWMutex is not re-entrant.
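To make the concern concrete: `sync.RWMutex` does not track which goroutine holds it, so a goroutine that already holds the write lock cannot also take the read lock, and a plain `RLock()` at that point blocks forever. The minimal sketch below probes this with `TryRLock` (available since Go 1.18) instead of actually deadlocking; `nestedReadLockAllowed` is a hypothetical helper, not part of the client.

```go
package main

import (
	"fmt"
	"sync"
)

// nestedReadLockAllowed reports whether a goroutine that already holds
// mu's write lock can additionally acquire its read lock.
func nestedReadLockAllowed() bool {
	var mu sync.RWMutex
	mu.Lock()
	defer mu.Unlock()
	// TryRLock never blocks; it fails because the write lock is held,
	// even though the holder is this same goroutine. A plain mu.RLock()
	// here would block forever (self-deadlock).
	if mu.TryRLock() {
		mu.RUnlock()
		return true
	}
	return false
}

func main() {
	fmt.Println("nested RLock allowed while holding Lock:", nestedReadLockAllowed())
}
```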

Yeah, RWMutex is not re-entrant, but so far this method is not called anywhere the caller already holds the lock. The method is thread-safe by itself, so callers shouldn't need to acquire locks; I added documentation saying so. The lock param is set to false only in the start-span method, where no locks are held either.

Personally, I think it's dangerous to ignore the lock parameter even if right now there doesn't happen to be a codepath that deadlocks. Perhaps access s.context directly here and move the if lock code to the top?

Alternatively the way it's been changed we're always locking and we've already taken the performance hit. We could just remove the lock param. Depending on benchmarks this may be an option.
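A minimal sketch of the first suggestion above, assuming heavily simplified stand-in types (`span`, `spanContext` and `samplingDecision` here are hypothetical, not the library's definitions): the method honors its `lock` parameter by locking once at the top and then reads `s.context` directly, so it never calls a locking accessor while its caller holds the lock.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical, heavily simplified stand-ins for the client's types.
type samplingDecision struct {
	Sample    bool
	Retryable bool
	Tags      []string
}

type spanContext struct {
	final   bool
	sampled bool
}

type span struct {
	sync.RWMutex
	context spanContext
	tags    []string
}

// applySamplingDecision takes the write lock once, at the top, when lock is
// true. When lock is false the caller must already hold the lock or otherwise
// own the span exclusively (e.g. while starting it). Either way it reads
// s.context directly instead of calling a locking accessor such as
// SpanContext(), so it never tries to re-acquire a lock its caller holds.
func (s *span) applySamplingDecision(d samplingDecision, lock bool) {
	if lock {
		s.Lock()
		defer s.Unlock()
	}
	if !d.Retryable {
		s.context.final = true
	}
	if d.Sample {
		s.context.sampled = true
		s.tags = append(s.tags, d.Tags...)
	}
}

func main() {
	s := &span{}
	// Unsynchronized caller: let the method lock for itself.
	s.applySamplingDecision(samplingDecision{Sample: true, Retryable: true, Tags: []string{"sampler.type=const"}}, true)

	// Caller already holds the lock, so it must pass lock=false.
	s.Lock()
	s.applySamplingDecision(samplingDecision{Retryable: false}, false)
	s.Unlock()

	fmt.Println(s.context.final, s.context.sampled, s.tags)
}
```

Removing the parameter, the alternative mentioned above, would correspond to always taking the `lock == true` path.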

Yeah, that is true. I started making changes that way, taking the lock before calling all these methods, but this method also invokes sampler and observer callbacks such as sampler.OnSetOperationName(). Those live outside this package and would then run while the lock is held, and they are more likely to try to acquire the span's locks than the internal callers of these methods are.

We can take the locks around these callbacks instead (releasing before each callback and re-acquiring after it), but then the code locks and unlocks at least twice in each method: once before and once after the sampler callback.

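```go
func (s *Span) setTagInternal(key string, value interface{}, lock bool) opentracing.Span {
	s.observer.OnSetTag(key, value) // observer callback runs with no lock held
	if lock {
		s.RLock()
	}
	if key == string(ext.SamplingPriority) && !setSamplingPriority(s, value) {
		if lock {
			s.RUnlock() // don't leak the read lock on the early return
		}
		return s
	}
	finalized := s.isSamplingFinalized()
	if lock {
		// Release the read lock before calling the sampler or taking the
		// write lock: RWMutex cannot upgrade RLock to Lock in place.
		s.RUnlock()
	}
	if !finalized {
		decision := s.tracer.sampler.OnSetTag(s, key, value)
		if lock {
			s.Lock()
			defer s.Unlock()
		}
		s.applySamplingDecision(decision, lock)
	} else if lock {
		s.Lock()
		defer s.Unlock()
	}
	if s.isWriteable() {
		s.appendTagNoLocking(key, value)
	}
	return s
}
```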

This is roughly how three of the methods in span.go would look; if this looks good, we can go with it.

As a rule, it is not a good idea to hold a lock while calling external methods (like sampler callbacks), but for internal methods we can document whether they must be called with or without the lock held. I believe the lock parameter exists only so that locking can be skipped when concurrency is not a concern, such as when the span is being started.
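The rule about external callbacks can be sketched as follows, assuming a simplified model in which the sampling decision is two booleans and the sampler is a single callback (`onSetTag` is hypothetical, standing in for `sampler.OnSetTag`). State is read under the read lock, the callback runs with no lock held, and the write lock is taken afterwards to apply the result.

```go
package main

import (
	"fmt"
	"sync"
)

// span is a hypothetical, simplified model of the client's Span.
type span struct {
	mu        sync.RWMutex
	finalized bool
	sampled   bool
	tags      map[string]string
	onSetTag  func(key string) (sample bool) // stands in for sampler.OnSetTag
}

func (s *span) isSampled() bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.sampled
}

// setTag never holds s.mu while the external callback runs, so a callback
// that calls back into the span (here via isSampled) cannot deadlock.
func (s *span) setTag(key, value string) {
	s.mu.RLock()
	finalized := s.finalized
	s.mu.RUnlock()

	sample := false
	if !finalized {
		sample = s.onSetTag(key) // external code runs with no lock held
	}

	s.mu.Lock()
	defer s.mu.Unlock()
	// Re-check under the write lock: another goroutine may have finalized
	// the decision while the lock was released.
	if !finalized && !s.finalized {
		s.sampled = sample
		s.finalized = true
	}
	// Writeable if sampled or not yet finalized.
	if s.sampled || !s.finalized {
		s.tags[key] = value
	}
}

func main() {
	s := &span{tags: map[string]string{}}
	s.onSetTag = func(key string) bool {
		_ = s.isSampled() // re-entering the span from the callback is safe
		return key == "debug"
	}
	s.setTag("debug", "true")
	fmt.Println(s.isSampled(), s.tags)
}
```

The price of this design is the re-check after re-locking: the span may have changed while the lock was released, which is also why each method ends up locking at least twice.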

Some benchmarks can tell which approach is better; I will add some micro-benchmarks over the weekend.
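A micro-benchmark along these lines could compare the two approaches. This sketch is hypothetical: it measures an uncontended read with and without the read lock, using `testing.Benchmark`, which can be called outside `go test`. Results vary by machine and say nothing about contended workloads.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// Hypothetical span reduced to a single field guarded by an RWMutex.
type span struct {
	sync.RWMutex
	sampled bool
}

var sink bool // package-level sink so the compiler keeps the reads

func (s *span) isSampled(lock bool) bool {
	if lock {
		s.RLock()
		defer s.RUnlock()
	}
	return s.sampled
}

// benchIsSampled uses testing.Benchmark, which does not need "go test";
// each call runs for roughly one second by default.
func benchIsSampled(lock bool) testing.BenchmarkResult {
	s := &span{sampled: true}
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = s.isSampled(lock)
		}
	})
}

func main() {
	fmt.Println("lock=false:", benchIsSampled(false))
	fmt.Println("lock=true: ", benchIsSampled(true))
}
```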


yurishkuro · 2020-08-26T02:43:24Z

span_test.go

```diff
@@ -363,7 +363,6 @@ func TestSpan_References(t *testing.T) {
 }

 func TestSpanContextRaces(t *testing.T) {
-	t.Skip("Skipped: test will panic with -race, see https://github.com/jaegertracing/jaeger-client-go/issues/526")
```

It might be useful to add a few more mutators to this test, namely those that affect samplingState and flags.

Yeah, I added a few, and also call span.Finish() at the end. That surfaced a few more data races in the reportSpan() method, which I modified as well.
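The shape of such a race test can be sketched with a simplified span (hypothetical types, not the library's): several goroutines call mutators and readers concurrently, and the span is finished at the end. It is meant to be run with `go run -race`, so that the race detector reports any access that is not synchronized.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical, simplified context and span.
type spanContext struct {
	baggage map[string]string
	flags   byte
}

type span struct {
	sync.RWMutex
	context  spanContext
	finished bool
}

// SpanContext returns a copy of the context under the read lock.
func (s *span) SpanContext() spanContext {
	s.RLock()
	defer s.RUnlock()
	return s.context
}

func (s *span) SetBaggageItem(key, value string) {
	s.Lock()
	defer s.Unlock()
	// Copy-on-write: a map handed out by SpanContext is never mutated again.
	nb := make(map[string]string, len(s.context.baggage)+1)
	for k, v := range s.context.baggage {
		nb[k] = v
	}
	nb[key] = value
	s.context.baggage = nb
}

func (s *span) setFlag(f byte) {
	s.Lock()
	defer s.Unlock()
	s.context.flags |= f
}

func (s *span) Finish() {
	s.Lock()
	defer s.Unlock()
	s.finished = true
}

// runRaces runs mutators and readers concurrently, then finishes the span.
func runRaces(n int) *span {
	s := &span{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(3)
		go func(i int) {
			defer wg.Done()
			s.SetBaggageItem(fmt.Sprint("k", i), "v")
		}(i)
		go func() {
			defer wg.Done()
			s.setFlag(1)
		}()
		go func() {
			defer wg.Done()
			_ = len(s.SpanContext().baggage)
		}()
	}
	wg.Wait()
	s.Finish()
	return s
}

func main() {
	s := runRaces(100)
	ctx := s.SpanContext()
	fmt.Println(len(ctx.baggage), ctx.flags, s.finished)
}
```

Copy-on-write for the baggage map is an assumption of this sketch; it lets a reader keep using the context copy returned by `SpanContext()` after the lock has been released.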


joe-elliott · 2020-08-26T13:35:19Z

span.go

```diff
@@ -427,10 +427,10 @@ func (s *Span) serviceName() string {

 func (s *Span) applySamplingDecision(decision SamplingDecision, lock bool) {
 	if !decision.Retryable {
-		s.context.samplingState.setFinal()
+		s.SpanContext().samplingState.setFinal()
```

Personally, I think it's dangerous to ignore the lock parameter even if right now there doesn't happen to be a codepath that deadlocks. Perhaps access s.context directly here and move the if lock code to the top?

Alternatively the way it's been changed we're always locking and we've already taken the performance hit. We could just remove the lock param. Depending on benchmarks this may be an option.

joe-elliott · 2020-08-26T13:39:16Z

span.go

```diff
@@ -120,8 +123,8 @@ func (s *Span) setTagInternal(key string, value interface{}, lock bool) opentrac

 // SpanContext returns span context
 func (s *Span) SpanContext() SpanContext {
-	s.Lock()
-	defer s.Unlock()
+	s.RLock()
```

There are a number of other places that hold .Lock() when only .RLock() is needed: .StartTime(), .Duration() and .Tags(), for instance. Probably not in the scope of this fix, though.
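For read-only accessors like those, a shared read lock is sufficient. A minimal sketch with a hypothetical `span` type: each getter takes `RLock`, and `Tags` returns a copy so that no caller reads the internal map after the lock is released.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Hypothetical span with read-only accessors. Since they only read, they
// take the shared read lock, so concurrent readers do not block one another.
type span struct {
	sync.RWMutex
	startTime time.Time
	duration  time.Duration
	tags      map[string]interface{}
}

func (s *span) StartTime() time.Time {
	s.RLock()
	defer s.RUnlock()
	return s.startTime
}

func (s *span) Duration() time.Duration {
	s.RLock()
	defer s.RUnlock()
	return s.duration
}

// Tags returns a copy so callers never touch the internal map after the
// read lock has been released.
func (s *span) Tags() map[string]interface{} {
	s.RLock()
	defer s.RUnlock()
	out := make(map[string]interface{}, len(s.tags))
	for k, v := range s.tags {
		out[k] = v
	}
	return out
}

func main() {
	s := &span{
		startTime: time.Now(),
		duration:  5 * time.Millisecond,
		tags:      map[string]interface{}{"error": true},
	}
	fmt.Println(s.StartTime().IsZero(), s.Duration(), len(s.Tags()))
}
```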

joe-elliott · 2020-08-26T14:06:53Z

span_test.go

```diff
+	})
+	go accessor(func() {
+		span.SpanContext().samplingState.setFlag(flagDebug)
+	})
```

I believe reset() still races, but I'm unsure whether it's worth protecting. It is only called when the sync.Pool is enabled, and the lifecycle code seems dangerous in general: it would be trivial for the calling application to hold a *Span after .Finish() and do terrible things with it, so I think we have to trust the calling code.

Added a comment to the main thread with some final thoughts.

joe-elliott · 2020-08-26T14:10:21Z

tracer.go

```diff
@@ -439,7 +439,7 @@ func (t *Tracer) emitNewSpanMetrics(sp *Span, newTrace bool) {
 func (t *Tracer) reportSpan(sp *Span) {
 	if !sp.isSamplingFinalized() {
 		t.metrics.SpansFinishedDelayedSampling.Inc(1)
-	} else if sp.context.IsSampled() {
+	} else if sp.SpanContext().IsSampled() {
```

Store the value of SpanContext().IsSampled() at the top of the method, so that SpanContext() is only called once?
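A sketch of that suggestion in a simplified model (all types here are hypothetical): the context is read once through the locking accessor, and every branch, including the final reporting check, uses that snapshot. In the real client the sampling state sits behind a shared pointer, so there the main gain would be fewer lock acquisitions.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical stand-ins: spanContext is a value type and SpanContext()
// returns a copy taken under the span's read lock.
type spanContext struct{ sampled, final bool }

func (c spanContext) IsSampled() bool { return c.sampled }

type span struct {
	sync.RWMutex
	context spanContext
}

func (s *span) SpanContext() spanContext {
	s.RLock()
	defer s.RUnlock()
	return s.context
}

// metrics is a hypothetical set of counters mirroring the branches.
type metrics struct{ delayed, sampled, notSampled, reported int }

// reportSpan takes one snapshot: the lock is acquired a single time and,
// in this model, every check below sees the same state.
func reportSpan(sp *span, m *metrics) {
	ctx := sp.SpanContext()
	switch {
	case !ctx.final:
		m.delayed++
	case ctx.IsSampled():
		m.sampled++
	default:
		m.notSampled++
	}
	if ctx.IsSampled() {
		m.reported++ // stands in for handing the span to a reporter
	}
}

func main() {
	m := &metrics{}
	reportSpan(&span{context: spanContext{sampled: true, final: true}}, m)
	reportSpan(&span{}, m)
	fmt.Printf("%+v\n", *m)
}
```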

joe-elliott · 2020-08-26T14:28:04Z

span.go

```diff
@@ -445,12 +451,12 @@ func (s *Span) applySamplingDecision(decision SamplingDecision, lock bool) {

 // Span can be written to if it is sampled or the sampling decision has not been finalized.
 func (s *Span) isWriteable() bool {
-	state := s.context.samplingState
+	state := s.SpanContext().samplingState
```

isWriteable and isSamplingFinalized are called from setTagInternal, which receives a lock param that is ignored here.

joe-elliott · 2020-08-27T14:27:03Z

So I've spent some time thinking about this and the following makes sense to me.

- Change isWriteable(), isSamplingFinalized() and setSamplingPriority() to take a SpanContext. These functions will no longer have to care about locking, because we consider the SpanContext immutable. The calling code can then choose to use either .SpanContext() or .context, depending on whether or not it should lock.
  - isSamplingFinalized is the only method called in multiple places, and I think it's straightforward in each place whether locking is needed or not.
- Make applySamplingDecision and setTagInternal obey their lock parameters (including context access). I think this can be done relatively cleanly with something like this at the top. Then ctx can just be passed to all the helper methods. (Good call on not making a callback under lock.)
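```go
// At the top of applySamplingDecision / setTagInternal:
var ctx SpanContext
if lock {
	ctx = s.SpanContext() // takes the span's read lock
} else {
	ctx = s.context // caller is responsible for synchronization
}
```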
- FinishWithOptions -> tracer.reportSpan() -> reset() does need to be protected.
- Add reset() and SetOperationName() to TestSpanContextRaces().

@yurishkuro does this make sense?
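Protecting `reset()` could look like the sketch below, assuming a simplified pooled span (`span`, `spanPool`, `acquire` and `release` are hypothetical). `reset` takes the write lock so it cannot interleave with a concurrent accessor. As noted earlier in the thread, this does not make use-after-release safe: code that keeps the pointer after the span is recycled can still see another trace's data.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical pooled span.
type span struct {
	sync.RWMutex
	operationName string
	tags          []string
}

func (s *span) OperationName() string {
	s.RLock()
	defer s.RUnlock()
	return s.operationName
}

// reset clears the span under the write lock before it is reused.
func (s *span) reset() {
	s.Lock()
	defer s.Unlock()
	s.operationName = ""
	s.tags = s.tags[:0] // keep the backing array for reuse
}

var spanPool = sync.Pool{New: func() interface{} { return &span{} }}

// acquire and release are hypothetical helpers; release plays the role of
// the Finish -> reportSpan -> reset path when span pooling is enabled.
func acquire(name string) *span {
	s := spanPool.Get().(*span)
	s.Lock()
	s.operationName = name
	s.Unlock()
	return s
}

func release(s *span) {
	s.reset()
	spanPool.Put(s)
}

func main() {
	s := acquire("GET /users")
	fmt.Println(s.OperationName())
	release(s)
}
```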

yurishkuro · 2020-08-30T20:01:55Z

@joe-elliott

> Change isWriteable(), isSamplingFinalized() and setSamplingPriority() to take a SpanContext.

This bothers me a little. The first two are actually derived from the settings stored in the SpanContext, so passing them the SpanContext seems backwards, could we just call them on the SpanContext directly?

setSamplingPriority: I'd pass the samplingState rather than the SpanContext.

I think doing it this way reduces the coupling.
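Under the shape proposed here, the two predicates become methods on `SpanContext`, and `setSamplingPriority` receives only the `samplingState` it mutates. A minimal sketch, assuming a hypothetical atomic `samplingState` with made-up flag values; this simplified `setSamplingPriority` omits any tracer-specific logic the real function may have.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Hypothetical flag values for this sketch.
const (
	flagSampled int32 = 1 << 0
	flagDebug   int32 = 1 << 1
)

// samplingState is shared by pointer between copies of a SpanContext and is
// only accessed atomically, so reading it needs no span lock.
type samplingState struct {
	flags int32
	final int32 // 1 once the sampling decision is final
}

func (s *samplingState) setFlag(f int32) {
	for {
		old := atomic.LoadInt32(&s.flags)
		if atomic.CompareAndSwapInt32(&s.flags, old, old|f) {
			return
		}
	}
}

func (s *samplingState) unsetFlag(f int32) {
	for {
		old := atomic.LoadInt32(&s.flags)
		if atomic.CompareAndSwapInt32(&s.flags, old, old&^f) {
			return
		}
	}
}

func (s *samplingState) setFinal() { atomic.StoreInt32(&s.final, 1) }

func (s *samplingState) isFinal() bool { return atomic.LoadInt32(&s.final) == 1 }

func (s *samplingState) isSampled() bool { return atomic.LoadInt32(&s.flags)&flagSampled != 0 }

type SpanContext struct {
	samplingState *samplingState
}

// The predicates are methods on SpanContext; the caller decides whether the
// context value came from the locking accessor or from the raw field.
func (c SpanContext) isSamplingFinalized() bool { return c.samplingState.isFinal() }

// isWriteable: writeable if sampled or the decision is not yet final.
func (c SpanContext) isWriteable() bool {
	return !c.samplingState.isFinal() || c.samplingState.isSampled()
}

// setSamplingPriority receives only the state it mutates. Simplified:
// zero clears the sampled flag, any other uint16 forces sampling.
func setSamplingPriority(state *samplingState, value interface{}) bool {
	val, ok := value.(uint16)
	if !ok {
		return false
	}
	if val == 0 {
		state.unsetFlag(flagSampled)
	} else {
		state.setFlag(flagSampled | flagDebug)
	}
	state.setFinal()
	return true
}

func main() {
	ctx := SpanContext{samplingState: &samplingState{}}
	fmt.Println(ctx.isWriteable()) // prints true: not finalized yet
	setSamplingPriority(ctx.samplingState, uint16(0))
	fmt.Println(ctx.isSamplingFinalized(), ctx.isWriteable())
}
```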

joe-elliott · 2020-08-30T22:12:10Z

> This bothers me a little. The first two are actually derived from the settings stored in the SpanContext, so passing them the SpanContext seems backwards, could we just call them on the SpanContext directly?

I see no issue with having a SpanContext as a receiver instead of taking one as a parameter. It's kind of a stylistic choice and I was considering suggesting it as well. 👍

> setSamplingPriority: I'd pass the samplingState rather than the SpanContext.

Agree.

joe-elliott · 2020-10-10T12:40:41Z

@keer25 are you still working on this issue? If not, I would like to take it up.

yurishkuro · 2020-10-14T15:25:35Z

Addressed differently in #544. But thank you @keer25 for the first attempt, it informed the final solution.

keer25 force-pushed the master branch 2 times, most recently from 665f930 to 255ce9d (August 25, 2020 21:17)

yurishkuro reviewed Aug 26, 2020

keer25 force-pushed the master branch from 255ce9d to b138fa1 (August 26, 2020 08:37)

keer25 force-pushed the master branch from b138fa1 to f2804f4 (August 26, 2020 08:39)

joe-elliott reviewed Aug 26, 2020

yurishkuro mentioned this pull request Oct 8, 2020

Lock RemotelyControlledSampler.sampler on callbacks #543

Merged

yurishkuro closed this Oct 14, 2020
