Skip to content
This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

storage: fix LazyChunkReader.join potential deadlock #1670

Merged
merged 1 commit into from
Aug 15, 2019
Merged

Conversation

janos
Copy link
Member

@janos janos commented Aug 14, 2019

A goroutine leak was observed on a node in private cluster. Full goroutine dump was attached. The problem was a wait on wait group.

goroutine 16761764 [semacquire, 254 minutes]:
sync.runtime_Semacquire(0xc00225a238)
    /usr/local/go/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc00225a230)
    /usr/local/go/src/sync/waitgroup.go:130 +0x65
github.com/ethersphere/swarm/storage.(*LazyChunkReader).join(0xc00258b180, 0x126f1a0, 0xc0042e8090, 0xc0045e6000, 0x20000, 0x20000, 0xde0000, 0xe00000, 0x2, 0x80000, ...)
    /swarm/build/_workspace/src/github.com/ethersphere/swarm/storage/chunker.go:559 +0x41f
github.com/ethersphere/swarm/storage.(*LazyChunkReader).join.func1(0xc000cf2090, 0x68, 0x70, 0xc00258b180, 0x126f1a0, 0xc0042e8090, 0xc0026e24e0, 0xde0000, 0xc001f00088, 0xc0026e2120, ...)
    /swarm/build/_workspace/src/github.com/ethersphere/swarm/storage/chunker.go:556 +0x73f
created by github.com/ethersphere/swarm/storage.(*LazyChunkReader).join
    /swarm/build/_workspace/src/github.com/ethersphere/swarm/storage/chunker.go:533 +0x366
goroutine 17801050 [semacquire, 249 minutes]:
sync.runtime_Semacquire(0xc002dcf7f8)
	/usr/local/go/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc002dcf7f0)
	/usr/local/go/src/sync/waitgroup.go:130 +0x65
github.com/ethersphere/swarm/storage.(*LazyChunkReader).ReadAt.func2(0xc002dcf7f0, 0xc002b7e180)
	/swarm/build/_workspace/src/github.com/ethersphere/swarm/storage/chunker.go:470 +0x2b
created by github.com/ethersphere/swarm/storage.(*LazyChunkReader).ReadAt
	/swarm/build/_workspace/src/github.com/ethersphere/swarm/storage/chunker.go:469 +0x47f

It is possible that in case of error in LazyChunkReader.join wait group counter is not decremented. This PR adds a simple fix.

@janos janos requested review from zelig and acud August 14, 2019 18:13
@zelig zelig requested a review from jmozah August 14, 2019 18:35
@zelig zelig added this to Backlog in Swarm Core - Sprint planning via automation Aug 14, 2019
@zelig zelig added this to the 0.5.0 milestone Aug 14, 2019
Copy link
Member

@acud acud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@acud acud merged commit ec471e8 into master Aug 15, 2019
Swarm Core - Sprint planning automation moved this from Backlog to Done Aug 15, 2019
@acud acud deleted the lcr-join-deadlock branch August 15, 2019 08:26
chadsr added a commit to chadsr/swarm that referenced this pull request Sep 23, 2019
* 'master' of github.com:ethersphere/swarm:
  chunk, storage: chunk.Store multiple chunk Put (ethersphere#1681)
  storage: fix pyramid chunker and hasherstore possible deadlocks (ethersphere#1679)
  pss: Use distance to determine single guaranteed recipient (ethersphere#1672)
  storage: fix possible hasherstore deadlock on waitC channel (ethersphere#1674)
  network: Add adaptive capabilities message (ethersphere#1619)
  p2p/protocols, p2p/testing; conditional propagation of context (ethersphere#1648)
  api/http: remove unnecessary conversion (ethersphere#1673)
  storage: fix LazyChunkReader.join potential deadlock (ethersphere#1670)
  HTTP API support for pinning contents (ethersphere#1658)
  pot: Add Distance methods with tests (ethersphere#1621)
  README.md: Update Vendored Dependencies section (ethersphere#1667)
  network, p2p, vendor: move vendored p2p/testing under swarm (ethersphere#1647)
  build, vendor: use go modules for vendoring (ethersphere#1532)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants