inotify: Watcher.Remove may deadlock when multiple fs events occur #195

Closed
jerryz920 opened this issue Jan 27, 2017 · 3 comments · Fixed by #203

jerryz920 commented Jan 27, 2017

Before reporting an issue, please ensure you are using the latest release of fsnotify.

Which operating system (GOOS) and version are you using?

Distributor ID: Ubuntu
Description: Ubuntu 14.04.1 LTS
Release: 14.04
Codename: trusty
go version: go1.7.3 linux/amd64

Please describe the issue that occurred.

I ran into this issue, and from the issue list it seems other people have had a similar problem, so I spent some time digging into the cause.

The code I am using is the latest commit, fd9ec7d. The analysis below assumes the select loop for events runs sequentially; see the code example further down.

Inotify may generate multiple events at the same time. When this happens, if the event-handling goroutine calls Watcher.Remove on any valid watch path (e.g. for error handling), Remove blocks on the Watcher's condition variable (inotify.go:158), waiting for a broadcast from the sender side (inotify.go:299). However, the Events channel has a buffer size of 1, so the sender in turn blocks waiting for the select loop to consume events, and that loop is already stuck inside Remove. Deadlock.

I tried increasing the event channel size, which lets the sender side proceed; alternatively, one could call Remove in a separate goroutine to avoid the deadlock. But I am not sure either of these is ideal.

Are you able to reproduce the issue? Please provide steps to reproduce and a code sample if possible.

The code below reproduces the issue: a test folder "tests" contains a file "file1", and the code renames "tests/file1" to "file2" (which generates two events). If Remove is called inside the select loop, the program trips the runtime's deadlock detector. Changing watcher.Remove to go watcher.Remove, or enlarging the buffer of the watcher's Events channel, fixes it.

package main

import (
        "log"
        "os"
        "time"

        "github.com/fsnotify/fsnotify"
)

func main() {
        // Assumes a directory "tests" containing a file "file1" already exists;
        // error checks are omitted to keep the repro short.
        watcher, _ := fsnotify.NewWatcher()
        watcher.Add("tests")
        watcher.Add("tests/file1")

        done := make(chan bool)
        first := true

        go func() {
                for {
                        select {
                        case event := <-watcher.Events:
                                log.Printf("event name: %s", event.String())
                                if first {
                                        // *boom*
                                        watcher.Remove("tests/file1")
                                        first = false
                                }
                                //log.Printf("is this solved?")
                        }
                }
        }()

        time.Sleep(2 * time.Second)
        // Renaming the file out of the watched directory generates two events.
        os.Rename("tests/file1", "file2")
        // Never signalled; main blocks here so the runtime can report the deadlock.
        <-done
}
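
For completeness, here is the same event-handling goroutine with the workaround applied, i.e. Remove moved off the event loop (just a sketch; the Remove error now has to be handled asynchronously):

go func() {
        for {
                select {
                case event := <-watcher.Events:
                        log.Printf("event name: %s", event.String())
                        if first {
                                first = false
                                // Run Remove off the event loop so the loop keeps
                                // draining watcher.Events and the sender side can proceed.
                                go func() {
                                        if err := watcher.Remove("tests/file1"); err != nil {
                                                log.Printf("remove error: %v", err)
                                        }
                                }()
                        }
                }
        }
}()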

aarondl commented Feb 7, 2017

Have the same issue here. Adding a goroutine really isn't a feasible alternative, since you need to check for errors: the goroutine that owns the watcher and the list of watched directories (because we have no recursive API) needs to know whether the Remove succeeded or failed before it can go ahead with deleting the entry from the watched dirs. This really complicates implementing a recursive watcher :(
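
To make it concrete, here is a hypothetical helper from a recursive watcher (the names are made up); it has to observe Remove's error before it can prune its own map of watched directories, so it cannot simply fire and forget:

package recursive

import "github.com/fsnotify/fsnotify"

// removeWatched must see Remove's result before pruning the watched-dir map;
// wrapping the Remove call in `go ...` loses exactly the information it needs,
// but calling it from the event-loop goroutine is what deadlocks.
func removeWatched(w *fsnotify.Watcher, watched map[string]struct{}, dir string) error {
        if err := w.Remove(dir); err != nil {
                return err
        }
        delete(watched, dir)
        return nil
}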


nullbio commented Mar 27, 2017

Having this problem as well; please see the reference above. What can we do about this?


nullbio commented Mar 27, 2017

Same issues: #123 and #115

aarondl added a commit to aarondl/fsnotify that referenced this issue Mar 29, 2017
Several people have reported this issue where if you are using a
single goroutine to watch for fs events and you call Remove in
that goroutine it can deadlock. The cause for this is that the Remove
was made synchronous by PR fsnotify#73. The reason for this was to try and
ensure that maps were no longer leaking.

In this PR: IN_IGNORE was used as the event to ensure map cleanup.
This worked fine when Remove() was called and the next event was
IN_IGNORE, but when a different event was received the main goroutine
that's supposed to be reading from the Events channel would be stuck
waiting for the sync.Cond, which would never be hit because the select
would then block waiting for someone to receive the non-IN_IGNORE event
from the channel so it could proceed to process the IN_IGNORE event that
was waiting in the queue. Deadlock :)

Removing the synchronization then created two nasty races where Remove
followed by Remove would error unnecessarily, and one where Remove
followed by an Add could result in the maps being cleaned up AFTER the
Add call which means the inotify watch is active, but our maps don't
have the values anymore. It then becomes impossible to delete the
watches via the fsnotify code since it checks its local data before
calling InotifyRemove.

This code attempts to use IN_DELETE_SELF as a means to know when a watch
was deleted as part of an unlink(). That means that we didn't delete the
watch via the fsnotify lib and we should clean up our maps since that
watch no longer exists. This allows us to clean up the maps immediately
when calling Remove since we no longer try to synchronize cleanup
using IN_IGNORE as the sync point.

- Fix fsnotify#195
- Fix fsnotify#123
- Fix fsnotify#115
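
Roughly, the read loop now prunes the bookkeeping itself when the kernel reports that a watch disappeared; something along these lines (a simplified sketch, not the exact patch):

// Inside the goroutine that reads raw events from the inotify fd (simplified):
// an IN_DELETE_SELF for a watch we still track means the file was unlinked
// behind our back, so drop the local maps here instead of trying to
// synchronize on IN_IGNORE.
if mask&unix.IN_DELETE_SELF == unix.IN_DELETE_SELF {
        w.mu.Lock()
        if name, ok := w.paths[int(raw.Wd)]; ok {
                delete(w.paths, int(raw.Wd))
                delete(w.watches, name)
        }
        w.mu.Unlock()
}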
markbates pushed a commit that referenced this issue Mar 29, 2017