Skip to content

Comments

fix: Fix deadlock raised in #11#13

Closed
said-saifi wants to merge 2 commits intoneilotoole:masterfrom
alpacahq:master
Closed

fix: Fix deadlock raised in #11#13
said-saifi wants to merge 2 commits intoneilotoole:masterfrom
alpacahq:master

Conversation

@said-saifi
Copy link

@said-saifi said-saifi commented Apr 1, 2022

This fixes #11 .

Root cause:

  • When an error occurs we exit a worker by returning from startG function.
  • If the number of workers gCount reaches 0 while all other functions are already in the buffer then a deadlock will be reached since there is no chance a new worker will be created and the current number of workers is 0.

Fix: (Note: there are different ways of achieving this)

  • Before adding to the buffer check if there was already an error, in case there was then just exit and don't try to add to buffer.
  • Unit test added


wg sync.WaitGroup

errOnce sync.Once
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since now we will be reading err continuously then we needed an RWMutex already.
We could keep the sync.Once but the same can be achieved with RWMutex that is already needed, so removed the sync.Once. Let me know if you think it's cleaner to keep both sync.Once and sync.RWMutex.

}

g.qCh <- f
for {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this for loop we simply do the following:

  • Get a read lock on err
  • If err is NOT nil we return as there is no need to send function to buffer
  • If err is nil we try to send to channel, if we can't send to channel then we break the select and go back to checking if err is not nil

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only concern with this is it could become a busy loop. I wonder if sync.Cond can do something better here

return
}

if err := f(); err != nil {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks nicer but as explained above we are already needing the RWMutex so this can be achieved by using the RWMutex instead of also introducing a new field sync.Once.

So let me know if you think its better to stay using sync.Once in addition to the RWMutex just for readability.

@said-saifi
Copy link
Author

said-saifi commented Apr 1, 2022

As stated here this can be tried out using

go mod edit -replace github.com/neilotoole/errgroup=github.com/alpacahq/errgroup@master

err error

// errMu protects err.
errMu sync.RWMutex
Copy link

@umitanuki umitanuki May 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

atomc.Value could also work?

@said-saifi said-saifi closed this Jun 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deadlock with context

2 participants