
Syncthing gets stuck in a high CPU "Failed Files" state - Possibly a symlink issue? #5641

Closed
pineapplemachine opened this issue Apr 8, 2019 · 9 comments · 3 participants
@pineapplemachine commented Apr 8, 2019

Syncthing Version: v1.1.1
OS Version: Windows 7 & OSX 10.9 & Ubuntu 18.04

What I did

I've set up Syncthing on three machines: my desktop and my laptop, which I need to sync files between, and a dedicated AWS EC2 instance for backups, so that things still get synced when both of my computers aren't on at the same time. (They usually aren't.)

First I set it up on the AWS instance, then on my Macbook. I set things synchronizing and, after a couple of hiccups, had a couple of folders duplicated on the dedicated server. (Re: the hiccups: syncing slows down massively when the GUI is open?? I get why this could happen, but some kind of notice in the GUI would have been really helpful; I spent about an hour troubleshooting network problems that might explain the slow transfer rate when in fact it was all down to the GUI.)

Then I set it up on Windows. I have been using Dropbox up to now, and the reason I'm setting up Syncthing is because Dropbox is ending support for my Macbook's OS. This means that all the files I need to sync were already being synchronized by Dropbox. I shared the smaller of the two folders I've been trying out with Syncthing between all the devices and everything was great. Syncthing recognized right away that everything was already synchronized. Then I shared the larger one and things were fine at first, but then my Macbook and the AWS instance both started getting confused.

What happened next

Except when I pause sharing of the larger folder, both my Macbook and the AWS instance report very high CPU usage. The GUI shows the folder in a "Failed Files" state, and Syncthing seems to be repeatedly scanning the folder. I did not have this problem when sharing the folder between only my Macbook and the AWS instance; the issue appeared after adding my Windows machine to the bunch.

The log

Here is a portion of the very noisy log from Syncthing on OSX. The log's frequent mention of symlinks is why I suspect the issue is connected to symlinks.

[NDTXR] 23:08:59 INFO: Puller (folder "Projects" (mxwe3-xw9bp), file "PyDwarf/df/libs/SDL_ttf.framework/Headers/SDL_ttf.h"): checking parent dirs: traverses symlink: PyDwarf/df/libs/SDL_ttf.framework/Headers
[NDTXR] 23:08:59 INFO: Puller (folder "Projects" (mxwe3-xw9bp), file "PyDwarf/df/libs/SDL_ttf.framework/Resources/Info.plist"): checking parent dirs: traverses symlink: PyDwarf/df/libs/SDL_ttf.framework/Resources
[NDTXR] 23:08:59 INFO: Puller (folder "Projects" (mxwe3-xw9bp), file "PyDwarf/df/libs/SDL_ttf.framework/Versions/Current/Headers/SDL_ttf.h"): checking parent dirs: traverses symlink: PyDwarf/df/libs/SDL_ttf.framework/Versions/Current
[NDTXR] 23:08:59 INFO: Puller (folder "Projects" (mxwe3-xw9bp), file "PyDwarf/df/libs/SDL_ttf.framework/Versions/Current/Resources/Info.plist"): checking parent dirs: traverses symlink: PyDwarf/df/libs/SDL_ttf.framework/Versions/Current
[NDTXR] 23:08:59 INFO: Puller (folder "Projects" (mxwe3-xw9bp), file "PyDwarf/df/libs/SDL.framework/Versions/Current/Headers/SDL_syswm.h"): checking parent dirs: traverses symlink: PyDwarf/df/libs/SDL.framework/Versions/Current
[NDTXR] 23:08:59 INFO: Puller (folder "Projects" (mxwe3-xw9bp), file "rust/hello_world/target/debug/hello_world.dSYM/Contents/Resources/DWARF/hello_world-149ec8b9da1f89ba"): checking parent dirs: traverses symlink: rust/hello_world/target/debug/hello_world.dSYM
[NDTXR] 23:08:59 INFO: Puller (folder "Projects" (mxwe3-xw9bp), file "rust/hello_world/target/debug/hello_world.dSYM/Contents/Info.plist"): checking parent dirs: traverses symlink: rust/hello_world/target/debug/hello_world.dSYM
@calmh (Member) commented Apr 9, 2019

So, are there symlinks? Did you add Syncthing to an already existing file structure (on more than one device) that already contained symlinks?

Syncthing doesn't sync files behind symlinks, but the error above indicates that somehow it's been forced to try anyway. Typically that would mean that on one side one of the path components is a symlink while on the other it's not. That shouldn't happen if Syncthing does the initial copying, but if that was already the state when it was set up, I could see some confusion happening.
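The "checking parent dirs: traverses symlink" check in the log amounts to walking each parent component of a file's path and testing whether it is a symlink. A minimal Python sketch of that idea (Syncthing itself is written in Go; the function name and signature here are illustrative, not Syncthing's actual code):

```python
import os

def traverses_symlink(root, relpath):
    """Return the first parent component of relpath (relative to root)
    that is a symlink, or None if no parent directory is a symlink.
    Illustrative sketch only; not Syncthing's actual implementation."""
    parts = relpath.split("/")[:-1]   # parent components, excluding the file itself
    current = root
    for part in parts:
        current = os.path.join(current, part)
        if os.path.islink(current):
            return os.path.relpath(current, root)
    return None
```

With a layout like the one in the log, where `Current` is a symlink to `Versions/A`, a path such as `Current/Headers/SDL_ttf.h` would be reported as traversing the symlink `Current`, while the same file addressed through `Versions/A` would not.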

@pineapplemachine (Author) commented Apr 9, 2019

@calmh Yes, I am using Syncthing to replace Dropbox. These directories had already been synced between the OSX and Windows machines. I don't think I specifically created symlinks in that directory, but evidently some compilers must have done.

I can completely understand that it's a confusing edge case and wouldn't complain if Syncthing asked me to resolve the anomaly manually, but Syncthing failing to recognize the problem and devouring all my CPU cycles in perpetuity feels like a bug to me.

@calmh (Member) commented Apr 9, 2019

Failed items are retried indefinitely, once a minute. If there are a lot of failed items this can be a lot of unnecessary work, but Syncthing doesn't know that. In this case I think you will need to clean up the symlink situation manually; the presence of "failed items" in the GUI is the prompt to do so.

> devouring all my CPU cycles in perpetuity feels like a bug to me.

Perhaps we could recognize that some errors are not recoverable and never retry them, but that's a hard decision to make. Most things can change, especially when a human is involved. As is, I think this is functions-as-designed, even though it's not optimal for you in this case.

@pineapplemachine (Author) commented Apr 9, 2019 (comment minimized)

@calmh (Member) commented Apr 9, 2019 (comment minimized)

@imsodin (Member) commented Apr 9, 2019

Potentially the high CPU usage is due to a discrepancy between the pauses between successive failed tries and how long one try takes: if a try takes e.g. 10 min, you get 10 min activity, 1 min pause, 10 min activity, 2 min pause, 10 min activity, 4 min pause, ... so essentially constant activity. I'd propose timing the try and setting the pause to at least that duration (capped at 1 h). Anything that happens in the meantime (scan, remote changes) triggers a pull anyway, so you shouldn't "miss out" on a resolution like this. Does this sound reasonable?
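The proposed pacing can be sketched in a few lines. This assumes a doubling backoff on top of the current one-minute base, with the pause never shorter than the failed attempt itself took; the names and the doubling base are assumptions for illustration, not Syncthing's actual scheduler:

```python
MIN_PAUSE = 60      # one minute: the current fixed retry interval
MAX_PAUSE = 3600    # proposed cap of one hour

def next_pause(try_duration, prev_pause):
    """Seconds to pause before the next retry of failed items: double
    the previous pause, but never pause for less than the failed try
    itself took, capped at one hour. Sketch only, not Syncthing code."""
    pause = min(prev_pause * 2, MAX_PAUSE)
    return min(max(pause, try_duration, MIN_PAUSE), MAX_PAUSE)
```

With a 10-minute try and a previous 1-minute pause, the next pause becomes 10 minutes rather than 2, so retry work can no longer dominate wall-clock time.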

I think the problem with symlinks/directories is in how we handle (or fail to handle) conflicts for them: if a symlink or file wins a conflict over a directory, we just remove that directory (we never create conflict copies for symlinks or directories). That's obviously problematic if the directory has children. What I think we should do is let directories with children always win conflicts over files/symlinks regardless of mtime, and move the file/symlink aside as a conflict copy. And, not directly related to this problem, I would also create conflict copies for symlinks instead of just deleting them as we do now.
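The proposed rule could be modeled roughly like this. The entry shapes and the function name are hypothetical, for illustration only, not Syncthing's actual model code:

```python
def conflict_winner(local, remote):
    """Decide which side wins a conflict. Each side is a dict like
    {"type": "dir" | "file" | "symlink", "children": int, "mtime": float}.
    Proposed rule: a directory that still has children wins over a
    file/symlink regardless of mtime; otherwise the newer mtime wins.
    Hypothetical sketch, not Syncthing's actual conflict handling."""
    local_dir = local["type"] == "dir" and local.get("children", 0) > 0
    remote_dir = remote["type"] == "dir" and remote.get("children", 0) > 0
    if local_dir != remote_dir:
        return "local" if local_dir else "remote"
    return "local" if local["mtime"] >= remote["mtime"] else "remote"
```

The losing file or symlink would then be moved aside as a conflict copy, rather than being allowed to delete a still-populated directory.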

From the wording of this issue, I'd consider the first part directly relevant and the second part worthy of a separate issue. Does that sound about right?

imsodin added a commit to imsodin/syncthing that referenced this issue Apr 10, 2019

@pineapplemachine (Author) commented Apr 10, 2019

@imsodin This sounds like a reasonable change to me, but it doesn't seem related to the issue I'm encountering. For as long as I was paying attention to system stats, my CPU usage never seemed to go down, and I don't see any Folder %v isn't making sync progress - retrying in %v. messages in my log, not even after the failed paths in the log start to repeat.

[screenshot]

Here also is the list of failed items as it appears within the GUI in case it helps.

[screenshot: failed items list in the GUI]

@pineapplemachine (Author) commented Apr 10, 2019

You can now see my full 150,000-line log, produced by running Syncthing for about 5 minutes, here. The log begins when I start Syncthing; a very short while later I un-pause sync for the problematic folder, and the log ends shortly after I pause sync for that folder again.

http://files.pineapplemachine.com/public/logs/Syncthing_Log_2019_04_10.txt

@imsodin (Member) commented Apr 10, 2019

The list of failed items matches exactly the described symlink/dir conflict.

And the log shows that our logging policy might be a bit questionable: one line for every single error, and at the same time no indication of the start and end of one iteration of trying to pull. Anyway, the log excerpt you show is just one pull operation (and three iterations); the pause would only kick in after that. And some entries say that there is stuff to be scanned, which might trap you in a loop that doesn't let the pause kick in: pull schedules scan, scan doesn't see anything (symlink), but still schedules another pull (that shouldn't happen though, as far as I remember), and we start again. Anyhow, that would be resolved by the proposed conflict handling for directories.
