runtime: unable to acquire - semaphore out of sync #16646

Closed
karalabe opened this issue Aug 9, 2016 · 15 comments

9 participants
@karalabe
Contributor

commented Aug 9, 2016

Today one of our CI tests failed on AppVeyor (Windows, Go 1.6.2) with the error message in the title. I don't have a reliable way to reproduce it, since it's inside a huge project, but here's the complete stack dump (https://ci.appveyor.com/project/tgerring/go-ethereum/build/develop.298#L236) in case it helps, at least to provide some hints as to whether it's our code or Go. (I assume the fault is on our end, but I've never seen this message before, so I can't trace it properly.)

@davecheney

Contributor

commented Aug 9, 2016

Are you positive this codebase contains no data races?

On Tue, 9 Aug 2016, 22:27 Péter Szilágyi notifications@github.com wrote:

Today one of our CI tests failed on AppVeyor, Windows, Go 1.6.2 with the
error message seen in the title. I don't have a reliable way to reproduce
it, it's inside a huge project, but here's the complete stack dump
https://ci.appveyor.com/project/tgerring/go-ethereum/build/develop.298#L236
if it helps, at least to provide some hints whether it's our code or Go. (I
assume the fault is on our end but I haven't ever seen this message so
can't trace it properly).



@fjl


commented Aug 9, 2016

We are reasonably confident that the codebase doesn't contain data races. But of course there is no way to prove it to you ;)

@karalabe

Contributor Author

commented Aug 9, 2016

@fjl I did merge in one of your PRs a few hours ago, so it's worth a double check, though it's a very strange error that I haven't seen before.

@davecheney

Contributor

commented Aug 9, 2016

Thank you for confirming. Go 1.7 will be released next week, so any fix to
this issue will land in Go 1.7 if it is not already fixed. Can you please
test with the latest Go 1.7 release candidate?

On Tue, 9 Aug 2016, 22:56 Felix Lange notifications@github.com wrote:

We are reasonably confident that the codebase doesn't contain data races.



@ianlancetaylor ianlancetaylor added this to the Go1.8 milestone Aug 9, 2016

@ianlancetaylor

Contributor

commented Aug 9, 2016

You should have gotten a stack backtrace with the error. Can you attach it here?

@ianlancetaylor

Contributor

commented Aug 9, 2016

Oh, sorry, I see you provided a link to the stack trace above.

@ianlancetaylor

Contributor

commented Aug 9, 2016

I don't know what the bug is, but here is what is going on. Your program is entering the stop-the-world phase of a garbage collection. The goroutine that started it is telling all the other goroutines to stop. It is sleeping on a note, waiting for a notification that a goroutine has stopped, with a deadline of 100us. The code (notetsleep_internal) calls WaitForSingleObject with a deadline of 100us. WaitForSingleObject returned an error, which the code assumes to be a timeout, meaning that the deadline has expired. When the goroutine goes to check the note, it finds that the note has been woken up. It then calls WaitForSingleObject with no deadline, expecting to acquire the semaphore. That call fails unexpectedly.

A call to WaitForSingleObject with no deadline should not fail. I think what we need to do is modify os_windows.go to report the actual failure in that case. That might help clarify what has happened here.

@jboelter


commented Aug 9, 2016

It looks like a few things could be cleaned up here. The return value of CreateEvent isn't checked, so a failed call could leave us with a null event handle.

WaitForSingleObject in semasleep assumes that any non-zero return value is a timeout, but there's some nuance here: a timeout is 0x0102 (WAIT_TIMEOUT), and other values may be returned.
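To make the distinction concrete, here is a small sketch of how the return values could be told apart. The numeric constants are the documented Win32 values; `classifyWait` and the `waitResult` type are hypothetical names introduced for illustration and are not what the runtime uses. The sketch only classifies numbers, so it runs on any platform.

```go
package main

import "fmt"

// Return values of WaitForSingleObject as documented for the Win32 API.
const (
	waitObject0   uint32 = 0x00000000 // WAIT_OBJECT_0: the object was signaled
	waitAbandoned uint32 = 0x00000080 // WAIT_ABANDONED: an abandoned mutex was acquired
	waitTimeout   uint32 = 0x00000102 // WAIT_TIMEOUT: the interval elapsed
	waitFailed    uint32 = 0xFFFFFFFF // WAIT_FAILED: GetLastError has the details
)

// waitResult is a hypothetical three-way outcome for a semaphore wait.
type waitResult int

const (
	acquired waitResult = iota
	timedOut
	failed
)

func (r waitResult) String() string {
	switch r {
	case acquired:
		return "acquired"
	case timedOut:
		return "timed out"
	default:
		return "failed"
	}
}

// classifyWait treats only WAIT_TIMEOUT as a timeout. Any other
// non-zero value is reported as a failure instead of being silently
// folded into the timeout case.
func classifyWait(ret uint32) waitResult {
	switch ret {
	case waitObject0:
		return acquired
	case waitTimeout:
		return timedOut
	default:
		return failed
	}
}

func main() {
	for _, ret := range []uint32{waitObject0, waitTimeout, waitAbandoned, waitFailed} {
		fmt.Printf("0x%08X -> %v\n", ret, classifyWait(ret))
	}
}
```

With a split like this, a caller waiting with no deadline can treat anything other than `acquired` as fatal and print the raw value.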

@gopherbot


commented Aug 10, 2016

CL https://golang.org/cl/26655 mentions this issue.

@fjl


commented Aug 16, 2016

@jboelter


commented Aug 16, 2016

Are you able to test with a custom build from the CL above? It should give you better insight into the failure.

@fjl


commented Aug 16, 2016

Not really. I'll try to get Go built with the CL onto AppVeyor this week.

I haven't been able to reproduce the failure locally; the test passes 300+ iterations in my Windows VM.

gopherbot pushed a commit that referenced this issue Oct 12, 2016

runtime: check for errors returned by windows sema calls
Add checks for failure of CreateEvent, SetEvent or
WaitForSingleObject. Any failures are considered fatal and
will throw() after printing an informative message.

Updates #16646

Change-Id: I3bacf9001d2abfa8667cc3aff163ff2de1c99915
Reviewed-on: https://go-review.googlesource.com/26655
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

@rsc rsc modified the milestones: Go1.9, Go1.8 Nov 11, 2016

@aclements

Member

commented Jun 14, 2017

Hi @fjl. CL 26655 was released as part of Go 1.8. Have you had any failures of this sort on 1.8?

@fjl


commented Jun 14, 2017

No, it hasn't happened again. You can close.

@aclements

Member

commented Jun 14, 2017

Thanks!

@aclements aclements closed this Jun 14, 2017

@golang golang locked and limited conversation to collaborators Jun 14, 2018
