Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data race in github.com/hashicorp/consul/watch library #4357

Closed
KJTsanaktsidis opened this issue Jul 8, 2018 · 2 comments
Closed

Data race in github.com/hashicorp/consul/watch library #4357

KJTsanaktsidis opened this issue Jul 8, 2018 · 2 comments
Labels
type/bug Feature does not function as expected

Comments

@KJTsanaktsidis
Copy link

Overview of the Issue

When using the github.com/hashicorp/consul/watch package as a library in client code, there appears to be a data race detected by the race detector. Full race detector output is in this gist: https://gist.github.com/KJTsanaktsidis/acef6be673f311488c4ee06d80d74b81

It appears to be a race between setting p.cancelFunc here:

  github.com/zendesk/outbound/vendor/github.com/hashicorp/consul/watch.makeQueryOptionsWithContext()
      /go/src/github.com/zendesk/outbound/vendor/github.com/hashicorp/consul/watch/funcs.go:310 +0xd7

and reading it here:

  github.com/zendesk/outbound/vendor/github.com/hashicorp/consul/watch.(*Plan).Stop()
      /go/src/github.com/zendesk/outbound/vendor/github.com/hashicorp/consul/watch/plan.go:136 +0xd1

I guess setting p.cancelFunc in makeQueryOptionsWithContext should be protected by p.stopLock, but it's less obvious to me how to protect the places where p.cancelFunc is called.

One option might be to turn p.stopLock into a RWMutex and do this in the places where it is called:

defer func(){
    p.stopLock.RLock()
   defer p.stopLock.RUnlock()
   if p.cancelFunc != nil {
       p.cancelFunc()
    }
}()

although this is kind of ugly... Another option might be to return a local copy of cancel from makeQueryOptionsWithContext and call that in the places where p.cancelFunc is currently called (except for inside Stop of course).

I'm happy to submit a patch for either of these two options (or any other way you can think of to slice this).

Reproduction Steps

I had this function to setup a Consul watch, and was running in the race detector. It's not a full program but I think it illustrates the issue - see shardelection.go in the gist: https://gist.github.com/KJTsanaktsidis/acef6be673f311488c4ee06d80d74b81

This was using the v1.2.0 tag of Consul.

Consul info for both Client and Server

N/A, it's a client library issue

Operating system and Environment details

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.4 LTS
Release:	16.04
Codename:	xenial
$ uname -a
Linux 8ac6a25517ab 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
@mitchellh
Copy link
Contributor

Thank you, this indeed looks like a real race. Note that only cancelFunc in Stop has a race since all the other calls have a guaranteed non-concurrent read-after-write since they're strictly serial. Regardless, there is a race between Stop and the init.

@pearkes pearkes added the needs-investigation The issue described is detailed and complex. label Jul 26, 2018
@dawxy
Copy link
Contributor

dawxy commented Oct 24, 2018

@mitchellh
I also encountered the same problem. Is there any plan to solve this problem?

@pearkes pearkes added type/bug Feature does not function as expected and removed needs-investigation The issue described is detailed and complex. labels Oct 26, 2018
@dawxy dawxy mentioned this issue Nov 30, 2018
mkeeler pushed a commit that referenced this issue Jan 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug Feature does not function as expected
Projects
None yet
Development

No branches or pull requests

4 participants