Watchman goes nuts and consumes lots of CPU #55

Closed
dturner-tw opened this issue Sep 11, 2014 · 6 comments

Comments

@dturner-tw
Contributor

I'm going to apologize in advance for the uselessness of this bug report. I'll update later if I get some better info. We're using watchman with git ( https://github.com/dturner-tw/git/tree/watchman ). Occasionally, on OS X (I think 10.9.4 but I'll update the ticket if I hear otherwise), watchman goes completely nuts and consumes all of the CPU. This tends to be correlated with git commands running, unsurprisingly (probably checkout or diff). Since I personally don't use a Mac, I have to rely on reports from others. Here's a log from one of the times when this happened: https://gist.github.com/dturner-tw/d6e54782978e55241deb (he then killed watchman and went on with his life). I know this log is not from the most recent version of watchman (it's from a late June version, IIRC), so it probably doesn't have the info that you need. I've since upgraded watchman, so hopefully next time we can get a better log.

I asked the same guy to run dtruss on the misbehaving process the next time it happened, to see if it was doing anything interesting. His machine immediately froze hard, so he sent a screenshot taken with his camera:
[Screenshot: 2014-09-11 10 51 10]

Looking at that screenshot, I notice that it's calling psynch_cvwait, which I think probably corresponds to pthread_cond_timedwait. It looks like the two calls to pthread_cond_timedwait in the code don't check to ensure that they are the only thread awake (pthread_cond_signal is allowed to wake up more than one thread). But I didn't thoroughly audit the code -- I just glanced briefly.
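For reference, the usual way to guard against extra or spurious wakeups is to re-check a predicate in a loop around pthread_cond_timedwait. The following is only a minimal sketch of that idiom, not code from watchman: `struct pending_state`, `wait_for_work` and `post_work` are hypothetical names made up for illustration, and gettimeofday is used for the deadline on the assumption that clock_gettime may not be available on the OS X version in question.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical shared state; not taken from the watchman source. */
struct pending_state {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool has_work; /* the predicate protected by `lock` */
};

/* Wait up to timeout_ms for work. Returns true if this thread claimed it. */
static bool wait_for_work(struct pending_state *st, long timeout_ms)
{
    struct timeval now;
    struct timespec deadline;
    int rc = 0;
    bool claimed;

    assert(st != NULL && timeout_ms >= 0);

    /* pthread_cond_timedwait takes an absolute deadline, not a duration. */
    gettimeofday(&now, NULL);
    deadline.tv_sec = now.tv_sec + timeout_ms / 1000;
    deadline.tv_nsec = now.tv_usec * 1000L + (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&st->lock);
    /* Re-check the predicate after every wakeup: another thread may have
     * claimed the work first, or the wakeup may have been spurious. Stop
     * on ETIMEDOUT or any other error so the loop cannot spin. */
    while (!st->has_work && rc == 0) {
        rc = pthread_cond_timedwait(&st->cond, &st->lock, &deadline);
    }
    claimed = st->has_work;
    st->has_work = false; /* only one waiter gets to claim the work */
    pthread_mutex_unlock(&st->lock);
    return claimed;
}

/* Publish work and wake a waiter. */
static void post_work(struct pending_state *st)
{
    pthread_mutex_lock(&st->lock);
    st->has_work = true;
    pthread_cond_signal(&st->cond);
    pthread_mutex_unlock(&st->lock);
}
```

With this shape, a thread that wakes without work simply goes back to sleep until the deadline instead of proceeding as if it had been signalled.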

I'll send more info when I have it.

@wez
Contributor

wez commented Sep 11, 2014

Based on that log, I think 8ff5d80 and b9e6899 will help resolve one error case.

More troubling, though, is kFSEventStreamEventFlagKernelDropped, which is the FSEvents equivalent of the Linux inotify IN_Q_OVERFLOW. I haven't been able to find information on bumping up the equivalent limits on the Mac (I'm not even sure they exist).

If the system is unable to keep on top of the notifications then watchman will spend a bunch of time and resources re-crawling and re-examining the tree to try to keep current.
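To make the overflow-and-recrawl cost concrete, here is a rough sketch of how a batch of event flags could be reduced to at most one scheduled recrawl. It is an illustration under stated assumptions, not watchman's actual logic: `struct root_state`, `process_batch` and `recrawl_finished` are hypothetical, and the `FLAG_*` constants are local copies of the FSEvents flag values so that the sketch compiles without the CoreServices headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of FSEvents flag values (assumption: redefined here so the
 * sketch builds off the Mac; real code would use the CoreServices names). */
#define FLAG_MUST_SCAN_SUBDIRS 0x00000001u /* kFSEventStreamEventFlagMustScanSubDirs */
#define FLAG_USER_DROPPED      0x00000002u /* kFSEventStreamEventFlagUserDropped */
#define FLAG_KERNEL_DROPPED    0x00000004u /* kFSEventStreamEventFlagKernelDropped */

/* Hypothetical per-root bookkeeping. */
struct root_state {
    bool recrawl_pending;        /* a recrawl is queued or running */
    unsigned recrawls_scheduled; /* how many recrawls have been queued */
};

/* True if the flags say events were dropped and the tree can't be trusted. */
static bool flags_indicate_dropped(uint32_t flags)
{
    return (flags & (FLAG_USER_DROPPED | FLAG_KERNEL_DROPPED)) != 0;
}

/* Process one batch of event flags. If any event reports a drop, schedule
 * a recrawl unless one is already pending, and return 0 because the recrawl
 * covers the whole batch. Otherwise return the number of events that can be
 * handled individually. */
static size_t process_batch(struct root_state *root,
                            const uint32_t *flags, size_t n)
{
    size_t i;

    assert(root != NULL);
    for (i = 0; i < n; i++) {
        if (flags_indicate_dropped(flags[i])) {
            if (!root->recrawl_pending) {
                root->recrawl_pending = true;
                root->recrawls_scheduled++;
            }
            return 0;
        }
    }
    return n;
}

/* Called when the recrawl completes; later drops may schedule a new one. */
static void recrawl_finished(struct root_state *root)
{
    root->recrawl_pending = false;
}
```

The point of the `recrawl_pending` flag is that repeated overflow notifications arriving while a recrawl is already queued do not each trigger another full crawl of the tree.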

How many discrete watches are in use on that system? Is .watchmanconfig present in those watches, or is it just running with the defaults?

@wez
Contributor

wez commented Sep 11, 2014

... and are those repos big (lots of files and dirs)? Roughly how many?

@dturner-tw
Contributor Author

If by discrete watches, you mean the length of the list watchman watch-list returns, I'm not sure (I'll ask him to report that as well next time he has issues) -- but I see maybe a couple dozen listed in the log. The watchman stuff I wrote for git does one watch per git repo.

No .watchmanconfig -- just the defaults.

None of the repos are huge -- the largest two are each roughly 25k dirs, 70k files (the rest are much smaller).

@wez
Contributor

wez commented Sep 11, 2014

An interesting data point might also be whether any of those watches are recursive (e.g. watching both /foo/bar and /foo), as this would double the number of updates that need to be processed. We have configuration options to restrict watchman-initiated watches to repo roots if it looks like this is happening.
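A quick way to check a list of watched roots for this kind of overlap is a component-wise prefix test. This is a small sketch, assuming absolute and already-normalized paths; `is_nested_under` and `count_nested_pairs` are hypothetical helpers, not part of watchman.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `child` lies strictly inside `parent`. The comparison respects
 * path component boundaries, so /foo does not match /foobar.
 * Assumes both paths are absolute and normalized (no "." or ".."). */
static bool is_nested_under(const char *parent, const char *child)
{
    size_t plen;

    assert(parent != NULL && child != NULL);
    plen = strlen(parent);
    if (strncmp(parent, child, plen) != 0) {
        return false; /* child does not start with parent */
    }
    if (plen > 0 && parent[plen - 1] == '/') {
        /* parent ends in a slash (e.g. "/"): anything longer is inside */
        return child[plen] != '\0';
    }
    /* otherwise the next character in child must start a new component */
    return child[plen] == '/';
}

/* Count ordered (outer, inner) pairs of roots where inner is nested in outer. */
static size_t count_nested_pairs(const char *const *roots, size_t n)
{
    size_t i, j, count = 0;

    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            if (i != j && is_nested_under(roots[i], roots[j])) {
                count++;
            }
        }
    }
    return count;
}
```

Feeding it the roots reported for the affected machine would show whether any subtree is being watched twice.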

@dturner-tw
Contributor Author

I would be surprised if any of them were recursive; there is never a reason for anyone to do that with how we're using watchman, and as you can see from the log, none of the listed ones (which include the largest directory trees) appear to do it.

@wez
Contributor

wez commented Nov 1, 2014

I'm going to close this out; I think the main contributor here was a couple of bugs in deciding to recrawl and then tripping over ourselves in the overflow-and-recrawl case. If you're still impacted by this on a more recent build, please re-open.

@wez closed this as completed Nov 1, 2014