runtime/race: race detector misses race condition in case of fmt.Println #27756
@gopherbot please add label RaceDetector
The race detector works by tracking reads and writes to memory addresses from different goroutines. Multiple reads from the same location are not considered race conditions; only after a write has occurred will subsequent reads or writes trigger a report. The modified version of your program below illustrates this, as the race detector will trigger on the second print.
Unfortunately that's a consequence of how the race detector works. The race detector will not issue "false positives" but it is possible for it to miss something ("false negative") in short-lived programs. As the blog post about the race detector concluded:
The race detector should catch this. The race detector should find the race, even though the read gets executed before the write. It remembers reads so that it can issue errors if subsequent writes happen without a happens-before relationship. Fixed repro: make sure the goroutine has a chance to run.
My analysis was incorrect, I apologize. OTOH I think I may have found an even simpler repro, one that doesn't involve fmt at all. The code below does not report a race, however if the last two lines of
I could not find in the documentation whether the channel communication clears the state of the writes to not be racy.
@mirtchovski: your last example doesn't have a race. The channel operations establish a happens-before relationship between the operations in
Thank you, that's the part I was missing.
Also, the race detector should sleep for a few seconds before exiting, specifically to catch such cases.
We also have some acquire/release race annotations in the syscall package around Read/Write. But Printf should only Write?
I observed the same behaviour from @randall77's reproducer if I replace
This is due to
Why are we doing this fdMutex business for blocking descriptors? That's just pure slowdown; we never promised that they can be closed concurrently with read/write, and it never worked. I think it can break things too, because there are surely some weird files in Linux pseudo file systems that require a second write to unblock a preceding write, but fdMutex won't allow that.
Was that intentional? We have the non-blocking flag, so we could check it before we take the fdMutex. But back to races: I think we should ignore fdMutex for the race detector. We don't guarantee any user-visible synchronization there, right? I think we also explicitly catch races between Read/Write and Close for anything other than net.Conn by annotating Close as a write. That's what we used to detect for os.File, because Read/Write read the fd field and Close wrote -1 there.
It's now explicitly OK to call … The upshot of that is that I think we need to keep using the …
Is this new behavior useful for blocking descriptors? It looks like closing a chan concurrently with sends. |
It's useful for descriptors that actually block, yes. See, for example, #18507. It's true that concurrently closing and writing to a descriptor is a race. But it's a race that is mediated by the kernel to be harmless. And it's a race that is difficult to avoid when you aren't in control of the other end of the descriptor. I understand that this is a problem in that it can hide some legitimate races. And it adds a few atomic instructions to every file I/O operation. Is there another problem? |
But is it useful for non-pollable descriptors? It looks like it can only mask bugs. The race is not mediated by the kernel; if we actually race write/close in the kernel, it will be quite harmful, but we mediate it with fdMutex. The problems are: (1) additional contended atomic operations (the kernel operations can be completely independent, e.g. reading random entropy from a per-CPU pool); (2) masking bugs on file descriptors (even if it does not crash now, write success is still non-deterministic); (3) masking other unrelated races; (4) deadlocking when a file descriptor requires 2 parallel writes to proceed, but we block the second write on fdMutex.
Why do you say that racing write and close in the kernel is harmful? That is a normal operation. One or both will fail. The result is non-deterministic but not otherwise harmful. There are non-pollable non-standard descriptors that can block. For these descriptors the ability to close a descriptor that is hanging in a read is useful. I agree with your points 1, 2, and 3, but I don't understand when a descriptor would require 2 parallel writes. |
Because in this case you can't know on what descriptor you actually issue the write. Consider:
Option 1: write gets into the kernel, resolves the fd number to a file description object, and acquires a reference to it first (before close). Everything is fine. Since write and close are concurrent, you can't possibly avoid 2a/2b. Now, fdMutex avoids this race by delaying close, but you really don't want to race write/close in general.
Consider a custom file (like something in devfs, procfs, sysfs, debugfs). Writes to such a file are really just some operation in the kernel (they don't have anything to do with write semantics). Now consider that the semantics are such that a second write unblocks the first write. For example, the first write says "I want to receive a message from a peer" and the second write says "I am sending a message to a peer".
But for non-pollable descriptors we will not issue close while there is an outstanding read/write. Close will only try to evict the fd from the poller (which should be a no-op for non-pollable descriptors) and then delay the actual close until after read/write returns.
But the kind of write/close race you describe cannot happen with the current internal/poll code.
What are you suggesting we should do?
One option is always to do nothing :) Then it depends on whether we promised that Close/Read/Write races are OK for all files or not. You said:
I don't think it's explicitly documented, but I believe that it should be OK to call
What version of Go are you using (go version)?
go version go1.11 linux/amd64

Does this issue reproduce with the latest release?
yes

What operating system and processor architecture are you using (go env)?

What did you do?
cat /home/dd/.GoLand2018.2/config/scratches/scratch_5.go
go run -race /home/dd/.GoLand2018.2/config/scratches/scratch_5.go
What did you expect to see?
What did you see instead?
I think this is related to: #12664
cc @dvyukov