Stuck in CPU consuming accept loop when out of file descriptors #5025

Closed
jcs opened this Issue Jun 22, 2018 · 5 comments


jcs commented Jun 22, 2018

Syncthing Version: v0.14.47, OpenBSD (64 bit)
OS Version: OpenBSD 6.3-current
Browser Version: N/A

I'm trying to sync a new directory with many subdirectories and files. A few minutes in, syncthing's web interface starts showing:

 2018-06-22 17:26:51: Listen (BEP/tcp): Accepting connection: accept tcp 0.0.0.0:22000: accept: too many open files 

The two syncthing processes are now eating as much CPU as possible:

USER       PID %CPU %MEM   VSZ   RSS TT  STAT  STARTED       TIME COMMAND
jcs      13038 57.4  1.3 108016 104036 p2  RN/1   5:15PM    5:40.42 /usr/local/bin/syncthing -no-browser
jcs      19093 42.5  0.2 28560 19244 p2  R/1    5:15PM    2:21.42 /usr/local/bin/syncthing -no-browser

A ktrace shows the first process spinning on accept and logging that it has too many files open.

jcs@frob:~> fstat -p 13038 | wc -l
    8003

My login class has a limit of 8000 file descriptors, which I've already raised a bunch of times just trying to get this sync going. OpenBSD's default process limit is 1024 files.

Syncthing should use getrlimit at startup to check its open-file limit and avoid opening that many files. At the very least, when it hits the limit, it should back off (close connections, or pause) rather than spinning on the same failing call over and over again.
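The getrlimit check suggested above can be sketched in Go (a minimal illustration, not Syncthing's actual code; the fdSoftLimit helper is a made-up name):

```go
package main

import (
	"fmt"
	"syscall"
)

// fdSoftLimit returns the process's soft limit on open file
// descriptors, as reported by getrlimit(RLIMIT_NOFILE).
func fdSoftLimit() (uint64, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, err
	}
	return uint64(rl.Cur), nil
}

func main() {
	limit, err := fdSoftLimit()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	// A program could refuse to enable per-folder watching when the
	// limit is too low for the number of folders it wants to watch.
	fmt.Println("soft fd limit:", limit)
}
```

A startup check like this could warn the user, or fall back to periodic scans, before the descriptor budget is exhausted.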

AudriusButkevicius (Member) commented Jun 22, 2018

Sadly, inotify/kqueue et al. consume as many descriptors as there are folders. There are already guides on how to increase the limit. Alternatively, disable folder watching and go back to periodic scans if this is an issue for you.

jcs commented Jun 22, 2018

Maybe a better user experience would be to detect this situation and make it easy to go back to periodic scans, so it doesn't just eat all the CPU forever? I wasn't even sure how to disable kqueue watching (I only enabled it because the web interface presented a "this is new, do you want it on?" dialog) and guessed at editing the XML config. The web interface was not very responsive because both processes were eating all the CPU at that point.
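For reference, the watcher toggle lives on the folder element of the XML config. In Syncthing versions around the one reported here the attribute appears to be fsWatcherEnabled (attribute name assumed from that era's config schema; the id and path values below are made up):

```xml
<folder id="default" path="/home/jcs/sync"
        fsWatcherEnabled="false" rescanIntervalS="3600">
    <!-- with the watcher off, periodic rescans apply instead -->
</folder>
```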

calmh (Member) commented Jun 23, 2018

Spinning the CPU in this case is a bug. And yes, the watcher should probably back off automatically here; it does in other situations, so I'm not sure what went wrong in this one.

Option to disable is about as clear as any of our options, though.

[screenshot: folder settings showing the option to disable watching, 2018-06-23]

@calmh calmh changed the title from Using too many file descriptors to Stuck in CPU consuming accept loop when out of file descriptors Jun 23, 2018

calmh (Member) commented Jun 26, 2018

The accept loop is easy to treat; a return instead of a continue will put this in the domain of the supervisor, which enforces a cooldown period after a few failures. That said, I'm not sure how useful that is. Lots and lots of things will fail when we are out of fds, and most places will probably treat it as a temporary error and retry. If everything is going to come down like a house of cards anyway, how much energy do we want to put into handling out-of-fds scenarios?

AudriusButkevicius (Member) commented Jun 26, 2018

Perhaps the accept loop is the only nasty thing. Also, I don't think we should return; perhaps we should just sleep before re-entering the loop. Half working is better than consuming 100% CPU, I guess.

calmh added a commit to calmh/syncthing that referenced this issue Jun 27, 2018

@calmh calmh closed this in ff441d3 Jun 27, 2018

@calmh calmh added the bug label Jun 27, 2018

@calmh calmh added this to the v0.14.50 milestone Jun 27, 2018

calmh added a commit to calmh/syncthing that referenced this issue Jul 4, 2018

Merge branch 'master' into recvonly3
* master:
  vendor: Update github.com/thejerf/suture
  vendor: Update github.com/thejerf/suture
  lib/model: Release both locks when waiting for services to stop (fixes syncthing#5028)
  gui: Ship license files (syncthing#5037)
  gui: Add license verbiage to vendored angular.js (syncthing#5035)
  lib/connections: Don't spin on accept failures (fixes syncthing#5025) (syncthing#5036)
  authors: Patch anonymouse64
  build: Improve snap generation (fixes syncthing#4863, fixes syncthing#5000) (syncthing#5034)
  authors: Add anonymouse64
  gui, man: Update docs & translations
  gui: Use info icon for folder ID (syncthing#5031)
  cmd/syncthing, lib/db: Abort execution if db version is too high (fixes syncthing#4994) (syncthing#5022)
  build: Let "go generate" create assets

@calmh calmh added this to the v0.14.49 milestone Jul 10, 2018
