
runtime: potential deadlock cycle caused by scavenge.lock [1.13 backport] #34150

Closed
gopherbot opened this issue Sep 6, 2019 · 3 comments
@gopherbot gopherbot commented Sep 6, 2019

@mknyszek requested issue #34047 to be considered for backport to the next 1.13 minor release.

This has the potential for reduced stability in Go 1.13. While the chance of deadlock is extremely low, when it hits you it will tend to hit you consistently, because stack depth is consistent (for example #32105).

CC @aclements

@gopherbot Please open a backport issue for 1.13.
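
The cycle at issue is a lock-order inversion: two locks that different code paths acquire in opposite orders, so each path can end up waiting on the other. A minimal Go sketch of that shape, with illustrative names rather than the runtime's actual code:

package main

import (
	"sync"
	"time"
)

var (
	heapLock     sync.Mutex // stands in for mheap_.lock
	scavengeLock sync.Mutex // stands in for scavenge.lock
)

func main() {
	go func() {
		scavengeLock.Lock()
		time.Sleep(100 * time.Millisecond) // force the bad interleaving
		heapLock.Lock()                    // blocks: main holds heapLock
		heapLock.Unlock()
		scavengeLock.Unlock()
	}()

	heapLock.Lock()
	time.Sleep(100 * time.Millisecond)
	scavengeLock.Lock() // blocks: the goroutine above holds scavengeLock
	scavengeLock.Unlock()
	heapLock.Unlock()

	// Never reached: with every goroutine blocked, the runtime aborts with
	// "fatal error: all goroutines are asleep - deadlock!".
}

In the real bug the second edge of the cycle is only reachable via a stack growth inside the critical section, which is why it is rare but, once a given workload hits it, repeatable.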

@gallir gallir commented Sep 9, 2019

Just in case.

It seems we were affected by this in https://github.com/gallir/smart-relayer. It was running in production and we were not able to debug/trace it properly, but we found that most of the connections were blocked (each one is managed by one or several goroutines), the RSS grew to the instance's full RAM (32 GB), and the processes were killed by the OOM killer.

We went back to a version compiled with 1.12 and didn't see the issue again:

$ ps axl| grep smart-relayer
4 0 7584 1 20 0 3754992 947680 - Ssl ? 46:10 /usr/local/bin/smart-relayer -c /usr/local/etc/relayer.conf

As you can see, the runtime had started dozens of threads:
$ ps -T -p 7584 | wc
103 515 4522
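
For a hang like this, a goroutine dump from the stuck process is usually the fastest way to see where everything is blocked: sending SIGQUIT to the process prints one, or the service can expose net/http/pprof. A minimal sketch of the latter (not something the original report used):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/ handlers
)

func main() {
	// With this listener up, a stack trace of every goroutine, including
	// ones blocked on locks, is one request away:
	//   curl 'http://localhost:6060/debug/pprof/goroutine?debug=2'
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}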

@bcmills bcmills modified the milestones: Go1.13.1, Go1.13.2 Sep 25, 2019
@mknyszek mknyszek self-assigned this Sep 26, 2019
@gopherbot gopherbot commented Sep 26, 2019

Change https://golang.org/cl/197501 mentions this issue: [release-branch.go1.13] runtime: fix lock acquire cycles related to scavenge.lock

@gopherbot gopherbot commented Oct 2, 2019

Closed by merging 8a6cd7a to release-branch.go1.13.

@gopherbot gopherbot closed this Oct 2, 2019
gopherbot pushed a commit that referenced this issue Oct 2, 2019
[release-branch.go1.13] runtime: fix lock acquire cycles related to scavenge.lock

There are currently two edges in the lock cycle graph caused by
scavenge.lock: with sched.lock and mheap_.lock. These edges appear
because of the call to ready() and stack growths respectively.
Furthermore, there's already an invariant in the code wherein
mheap_.lock must be acquired before scavenge.lock, hence the cycle.

The fix to this is to bring scavenge.lock higher in the lock cycle
graph, such that sched.lock and mheap_.lock are only acquired once
scavenge.lock is already held.

To facilitate this change, we move scavenger waking outside of
gcSetTriggerRatio such that it doesn't have to happen with the heap
locked. Furthermore, we check scavenge generation numbers with the heap
locked by using gopark instead of goparkunlock, and specify a function
which aborts the park should there be any skew in generation count.

Fixes #34150.

Change-Id: I3519119214bac66375e2b1262b36ce376c820d12
Reviewed-on: https://go-review.googlesource.com/c/go/+/191977
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
(cherry picked from commit 62e4156)
Reviewed-on: https://go-review.googlesource.com/c/go/+/197501
Reviewed-by: Austin Clements <austin@google.com>
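
The gopark-with-abort mechanism described in the commit message is runtime-internal, but the pattern has a rough user-level analogue: re-check a generation counter under the lock before sleeping, so a wakeup that raced ahead of the park is never lost. A sketch of that pattern with sync.Cond (hypothetical names; the runtime uses gopark with an abort callback, not sync.Cond):

package main

import (
	"fmt"
	"sync"
)

// scavengeState is a hypothetical stand-in for the runtime's scavenger
// bookkeeping; gen is bumped whenever there is new work to do.
type scavengeState struct {
	mu   sync.Mutex
	gen  uint64
	cond *sync.Cond
}

func newScavengeState() *scavengeState {
	s := &scavengeState{}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// wait parks until the generation advances past seen. The re-check in the
// loop plays the role of gopark's abort callback: if gen has already
// moved, the goroutine never actually sleeps.
func (s *scavengeState) wait(seen uint64) {
	s.mu.Lock()
	for s.gen == seen {
		s.cond.Wait()
	}
	s.mu.Unlock()
}

// bump records a new generation of work and wakes any waiter.
func (s *scavengeState) bump() {
	s.mu.Lock()
	s.gen++
	s.mu.Unlock()
	s.cond.Signal()
}

func main() {
	s := newScavengeState()
	done := make(chan struct{})
	go func() {
		s.wait(0)
		close(done)
	}()
	s.bump()
	<-done
	fmt.Println("woke after generation bump")
}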
@katiehockman katiehockman modified the milestones: Go1.13.2, Go1.13.3 Oct 17, 2019