
Rsync watcher is really slow #3249

Closed
patrickheeney opened this issue Mar 17, 2014 · 70 comments

@patrickheeney

Running version 1.5.1. The issue I have now is the rsync watcher is incredibly slow. It takes at least several minutes to detect that a file has changed and initiate rsync.

After about 10 minutes it sometimes goes rsync-crazy and runs 7 or 8 syncs in a row.

Some others have issues #3159 (comment).

Also it may be helpful to have a whitelist of folders to watch. In my case I only need to watch the www/* folder as the rest are irrelevant:

config.vm.synced_folder ".", "/srv/www/", type: "rsync", rsync__include: ["www/"], rsync__exclude: [".git/", "tmp/", "cache/"]

(My original issue here was the fact that I had a packer .iso file and rsync was not outputting any progress so it looks like it was hanging for 15 minutes.)

@patrickheeney changed the title from "Rsync Issues / Experience" to "Rsync watcher is really slow" on Mar 17, 2014
@drpebcak

👍 for this issue. #3108 has other examples of people hitting this slowness, as well as one person's workaround using grunt.

I'm also having other issues with rsync_auto, documented here: #3196

@smerrill

I can confirm this - it takes a minute or more to detect a file change before rsync-auto fires, for both me and a coworker, under OS X 10.9.2. It's also possible to save a file a few times before an rsync has fired, and then multiple rsyncs happen in quick succession when it "catches up." It seems that OS X might batch up these events before sending them to rsync-auto?

@drpebcak

Supposedly Vagrant watches FSEvents for this. You can use fseventer to see how quickly these events appear on OS X's side.

@smerrill

Yes - I can confirm this. I added a puts _latency line before https://github.com/guard/listen/blob/master/lib/listen/adapter/darwin.rb#L30 in my copy of Vagrant 1.5.1, and the puts fires within 100ms or so of saving a file, but Vagrant takes considerably longer to pick it up.

@smerrill

Interestingly, while this is happening ruby from the vagrant rsync-auto process shows up in my iStat Menus and Activity Monitor as using between 98% and 102% CPU.

@smerrill

The CPU usage starts as soon as a file is saved and does not stop until after the rsync has completed.


@smerrill

dtruss shows that the rsync-auto process is doing an open() and a read() on every file being watched after every fsevent that it receives.

@smerrill

This makes sense - it is listen computing MD5 hashes for all the files. Would it be possible to try upgrading the bundled version of listen to at least 2.6.0? According to guard/listen#184 (comment), it may load MD5 hashes more lazily past that version.

@patrickheeney
Author

That thread is concerning. It seems to struggle with 20k files? I have projects with much more than that. I thought Vagrant's rsync was going to be more performant than NFS according to the blog post. Once you factor in the watcher, it is way off the mark.

@mitchellh
Contributor

Hm this is very odd. Upgrading to 2.6.0 will certainly help. I'll take a look and see what parts may be slow.

@patrickheeney rsync specifically optimizes file read/write performance at the expense of syncing latency. However, this is a lot of sync latency so I'll take a look.

@smerrill

In looking at how guard and guard-shell work, I noticed that new files don't get picked up, and that the MD5 comparison is used to ensure file contents actually changed before syncing, whereas a tool like http://www.fernlightning.com/doku.php?id=software:fseventer:start picks up file creation and writes in a matter of milliseconds. It would be nice if Listen::Adapter::Darwin had an option to call its callbacks unconditionally on fsevents for file creation or writes, without checking the checksum. Perhaps Vagrant could then expose that option.

@patrickheeney
Author

@mitchellh Yeah, I totally get the syncing latency, I was just not sure what an acceptable range would be. I am somewhat spoiled, it seems, by gulpjs and grunt, where everything happens in less than a second - faster than I can alt+tab+refresh.

It seems the number of files it watches has a crucial impact, so I definitely hope we can narrow this down and have something configurable. For example, my frontend guys are not going to be editing a lot of backend code, so being able to run something like vagrant rsync-auto --path=web/content/*.js would be great. Essentially a whitelist for directories on a global scale, and CLI arguments on a local scale. The fewer files we watch, the faster it is going to be. Excluding hundreds of directory structures seems like a frustrating approach.

@smerrill

I've had really good luck with a script just like this:

require 'rb-fsevent'

options = { :latency => 1.5, :no_defer => false }
ignored = /\.(git|idea|svn|hg|cvs)/

fsevent = FSEvent.new
fsevent.watch "/path/to/my/project", options do |directories|
  # Sync only if at least one changed directory is outside the ignored VCS/IDE dirs.
  if directories.select { |i| !i.match(ignored) }.empty?
    puts "Discarding these changes: #{directories.inspect}"
  else
    puts "Detected change inside: #{directories.inspect}"
    system("vagrant rsync")
  end
end
fsevent.run

I know it isn't as cross-platform as listen, but it works very well with a 28k+ file directory (and also has the side effect of showing off the number of fsevents that happen in the various .git directories in a project as I'm working on it).

@ThePixelDeveloper

Throwing it out there. Can this problem not be solved with the rsync daemon? Bootstrap the rsync daemon on the box and then run rsync host side for a speedier transfer. I'm not too familiar with the advantages of the daemon so feel free to say if this isn't going to work out.

@arnaudbreton

As mentioned in #3159, it seems the main issue with Guard/Listen is that it uses a blacklist instead of a whitelist like grunt watch does (see https://gist.github.com/arnaudbreton/9517344), which explains the performance difference when using the latter.

The path option suggested by @patrickheeney is a good one, combined with a way to specify these paths directly in the Vagrantfile.
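
Until something like that exists, the closest workaround is leaning on the exclude option Vagrant already documents. A minimal sketch (paths are illustrative):

config.vm.synced_folder ".", "/srv/www", type: "rsync",
  rsync__exclude: [".git/", ".idea/", "node_modules/", "tmp/", "cache/", "log/"]

It is still a blacklist rather than a whitelist, so it only helps as far as the noisy directories can be enumerated.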

@patrickheeney
Author

@arnaudbreton My solution was the band-aid approach. It should speed things up but it appears there are deeper issues here that @smerrill discovered. Ideally both would be addressed.

@smerrill

@ThePixelDeveloper The problem is on the host side, where monitoring for changes is quite slow. The rsync daemon might help to lower transactional CPU usage, though, so it would probably be worth looking at as a new option for rsynced folders in another issue.

@dougmarcey

I work with @smerrill and have tweaked the script a bit to read from my Vagrantfile.

#!/usr/bin/env ruby
require 'rb-fsevent'

options = { :latency => 1.5, :no_defer => false }

# Pull the host paths of rsync-type synced folders out of the Vagrantfile.
path_regex = /^(?!\s*#).*config\.vm\.synced_folder\s*"(.*?)"\s*,\s*".*?"\s*,.*?type:\s*"rsync".*/
paths = []
File.open('Vagrantfile').each_line do |line|
  paths << File.expand_path($1) if path_regex.match(line)
end
puts "Watching: #{paths}"

ignored = /\.(git|idea|svn|hg|cvs)/
fsevent = FSEvent.new
fsevent.watch paths, options do |directories|
  # Sync only when at least one changed directory is outside the ignored VCS/IDE dirs.
  unless directories.select { |i| !i.match(ignored) }.empty?
    puts "Detected change inside: #{directories.inspect}"
    system("vagrant rsync")
  end
end
fsevent.run

I've dropped this into the directory with my Vagrantfile and am running it via vagrant up && ./watch.rb. It's doing fine on two directories with about 96,500 files total: it isn't killing my performance or IO, and it responds within the latency window to every change I've made so far. It only works on Mac, and YMMV, but it has helped me a lot today.

@djdevin

djdevin commented Mar 20, 2014

I actually tried using the rsync daemon while trying to fix the latency issues, and for me it didn't seem to matter in benchmarks: latency hovered around 1 second over either plain rsync or rsync+ssh with 35,000 files. Surprisingly, the daemon actually takes a little longer when dealing with 75,000+ files (2-3s vs 1-2s) and I'm not sure why. I ended up using a simple inotify-tools wrapper to watch the current directory and run the rsync command on any change. This works really well for me with 35,000 files and finally solves our NFS woes.
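
For anyone who prefers Ruby over the inotify-tools shell utilities, a rough equivalent of such a wrapper using rb-inotify (the same gem Listen relies on for Linux) might look like the sketch below; the path and ignore pattern are illustrative, and there is no batching, so a burst of events will trigger several rsync runs in a row:

require 'rb-inotify'

ignored = /\.(git|idea|svn|hg|cvs)/
notifier = INotify::Notifier.new

# Watch the project tree recursively for writes, creates, deletes and moves.
notifier.watch("/path/to/my/project", :modify, :create, :delete, :move, :recursive) do |event|
  next if event.absolute_name =~ ignored
  puts "Detected change: #{event.absolute_name}"
  system("vagrant rsync")
end

notifier.run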

I think another bottleneck here is that when the number of files grows, you end up having watchers on all the files; then, when those detect a change, an rsync command runs that scans the entire tree again to figure out what to sync. Something like Unison might work better here on large trees because (someone correct me if I'm wrong) it has the same watches but does not try to resync the entire tree on a change, only sending the affected files.

@ThePixelDeveloper

Ah ok, glad someone else looked into the daemon.

I took a similar approach to many of the other people here: I created a Node.js listener script that listens for changes and pings a request to a simple server in the VM, which rebuilds the assets. Works really well, even though it's an icky hack.

@smerrill

@dougmarcey and I have released an alpha implementation of a lighter-weight rsync-auto command that uses the same rb-fsevent and rb-inotify libraries under the covers: https://github.com/smerrill/vagrant-gatling-rsync .

It's been moderately tested on OS X and more lightly tested under Linux. It also outputs copious log messages if you run it with VAGRANT_LOG=info vagrant gatling-rsync-auto.

We'd love your feedback if you want to try it out.

@mitchellh
Contributor

@smerrill Awesome - if this avenue turns out to be much more performant, I may end up wanting to merge your plugin into core! It wouldn't make sense for a 1.5.x release, so I'll keep trying to improve the Listen interaction, but it may come to that in a future version.

@JacobDorman

https://github.com/smerrill/vagrant-gatling-rsync works a lot better than the existing implementation. I'm hoping this can be merged into the core, and extended to allow two-way-sync (#3062 (comment))

I simply can't get the performance I'm looking for using VirtualBox synced folders or NFS. Hoping that bidirectional rsync is the way forward.

current workflow:

  • git applications are cloned into the guest filesystem (per https://github.com/protobox/protobox)
  • after provision these need syncing back to the host
  • development is done on the host but grunt-watch+grunt-livereload is running on the guest
  • files need to be synced back to the guest instantly
  • git operations may be performed on either the host or the guest, so needs to be able to handle syncing the .git/ directories

@emslade

emslade commented Apr 7, 2014

On Linux I was struggling with this. Ended up using https://github.com/hollow/inosync instead of rsync-auto which works brilliantly with ~35k files.

@mitchellh
Contributor

I updated the listen gem requirement to 2.7.1, since that appears to help performance for OS X a bit.

@e2

e2 commented May 18, 2014

First of all, it seems when running vagrant rsync-auto an initial change event is triggered (and in my case logged to the log.txt file) for every single file existing in the shared folder. Does that make sense?

That's Listen creating a snapshot to detect complex changes, like moving whole trees. For example, if you were to move "c:\program files" into "c:\Documents and settings", the only events the operating system will tell you about are "c:\program files was removed" and "c:\documents and settings changed" ...

... so Listen compares the directories with its internal record ("snapshot") and generates additional events - the removal of ALL the files that used to be in "c:\program files" and the addition of all the "new" files in "c:\documents and settings\Program files".

Why is that? Shouldn't it log it only once?

This depends on a lot of things, e.g. what your editor is ACTUALLY doing (backups, swap files, moving, renaming, deleting, setting files to read-only, pid files, ...). Later on (in listener.rb) the duplicate events are removed. In short, saving a single file can generate LOTS of file system events, so it depends on what you're doing.

I guess this is just as fast as it gets when using rsync for syncing files?

Add the logging to listener.rb (_wait_for_changes), because that is the point up to which the delay happens (so the real delay is between the change happening in change.rb and it being forwarded to the callback in listen.rb).

I don't know how Vagrant uses rsync exactly, so these may not apply, but you could try:

  • disabling compression for rsync
  • getting rsync to just check file size
  • changing the crypto algorithm to a faster one (blowfish?) [crypto on Windows used to be horribly slow]
  • ignoring files in Listen (editor, swap files, tmp files, generated files, etc.) - the fewer invocations in Listen, the better
  • more exclude rules for rsync, so it's not comparing irrelevant files (see the sketch below)
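
For the rsync-related items, a rough Vagrantfile sketch (assuming the documented rsync__args and rsync__exclude options; setting rsync__args replaces Vagrant's defaults, which include -z):

config.vm.synced_folder ".", "/srv/www", type: "rsync",
  # no -z here, so compression is off; --size-only makes rsync compare by file size alone
  rsync__args: ["--verbose", "--archive", "--delete", "--size-only"],
  # the more noise excluded, the fewer files rsync has to compare
  rsync__exclude: [".git/", ".idea/", "tmp/", "log/", "*.swp"]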

Is this due to the MD5 hash?

MD5 hashing is just a Mac workaround, because the file timestamps there are unreliable. So if it's doing MD5 hashing in Windows, it's a bug.

@thasmo

thasmo commented May 18, 2014

Just tested listener.rb and the Windows adapter and those seem fine, both logging almost instantly.
I guess it's just the time it needs to sync the file over rsync from the host to the guest. I've already disabled compression - I will try to change some other arguments later.

@e2

e2 commented Jun 4, 2014

For anyone interested in Listen's performance and status: guard/listen#207 (comment)

@e2

e2 commented Jun 5, 2014

This is fixed in Listen v2.7.7

The "slowness" was caused by frequent task/thread switching with many sleeping mutexes/conditionals - now hundreds of thousands of files should be "indexed" within seconds (given the files are cached, of course).

@mitchellh
Contributor

This is really great to hear. I'll include that in the next release of Vagrant and we should hear feedback pretty soon!

@e2

e2 commented Jun 5, 2014

👍

@Taytay

Taytay commented Jun 6, 2014

Thanks all!
Is there a way to temporarily upgrade my Vagrant to listen 2.7.7 to see if it improves things? I didn't know if it was as simple as copying a gem somewhere, or if it required more changes.

@jeanmichelcote

Just to say, you guys are pretty awesome.
Thanks a bunch.

@geerlingguy
Contributor

👍

@alch3m1st

@thasmo I have exactly the same problem and it is very annoying.

I will tell you my "solution" (it is not a real one, but it is something): every time I save a file in the IDE (host), it is automatically copied to the guest.

I use PhpStorm, which has a plugin called "File Watcher". I created a custom watcher with these parameters:

  • Disable immediate synchronization
  • File type: Any
  • Scope: Project
  • Program: PATH_TO/pscp.exe
  • Arguments: -pw YOUR_VM_PASSWORD $FilePath$ vagrant@YOUR_VM_IP:/YOUR_VM_SHARED_FOLDER/$ModuleName$/$FileRelativeDir$

$ModuleName$ (= project name) must map to the folder name inside my shared folder.

This does not handle deletes; for those you must periodically run vagrant rsync.
The result is an instant update. This should work with any interpreted language.

You can also try : https://github.com/GM-Alex/vagrant-winnfsd (details here http://www.jankowfsky.com/blog/2013/11/28/nfs-for-vagrant-under-windows/)

@andriesss

Any idea when the next release is planned? I look forward to this improvement :)

@Globegitter

Having the same issue on Mac; it would be great to know when the next release with some of the improvements mentioned here is planned.

@timglabisch

seems that this works for me:

fswatch -r . | sed -l -E '/(\.git|\.idea|\_\_\_|DS\_Store)/d' | xargs -n1 bash -c "vagrant rsync"

It would be great if vagrant rsync supported specifying a path to sync. -.-'

@kensykora

+1 - very small file set (~200 files) and I'm still seeing ~8 seconds between when I hit save and when rsync-auto picks up and pushes the file over.

@thedotwriter

@Taytay You may test it this way: install Vagrant from source, but before installing the gem, modify the Listen version constraint in the vagrant.gemspec file. I didn't try it yet, but I guess it should work :)
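
Concretely, that would mean bumping the listen constraint in vagrant.gemspec before building and installing, roughly like this (an untested sketch; the exact existing line and spec variable name in your checkout may differ):

# vagrant.gemspec - loosen/raise the listen dependency before installing from source
s.add_dependency "listen", ">= 2.7.7"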

@Globegitter

Yeah, I just looked into https://github.com/guard/listen and it seems version 2.7.7 should fix, or at least improve, this. Has anyone tried it out yet? @mitchellh, are there plans to update this for the next Vagrant release? It would be really useful.

@Globegitter

@mitchellh Is there a reason this is stalling and neither of the last two releases included the update to listen?

@mitchellh
Contributor

@Globegitter Just being careful with updates in patch releases. 1.7 will update listen.

@mitchellh
Contributor

Closing this due to the new version of listen. Please check 1.7 once it's released, and if it's still slow, let's reopen to discuss.

@treyhyde

The latency is way down but this still keeps the CPU around 100%. gatling-rsync-auto is still the only viable sync tool.

@e2

e2 commented Dec 30, 2014

TL;DR - please report any "slowness" in guard/listen as a new issue - with a full debug output (or at least go through the troubleshooting section).

@treyhyde - your case is probably special. Open an issue in https://github.com/guard/listen and link to a Gist with the output from using LISTEN_GEM_DEBUGGING=2.

Listen works pretty much the same way as gatling-rsync-auto, so it's likely a difference in configuration.

E.g. you could be watching log files or database files, which makes no sense - but can be exactly the reason you're getting 100% CPU usage.

Also, you don't even mention whether you're on OSX or Linux, so I'm assuming you're using OSX (which has a few extra issues to know about).

Again, the LISTEN_GEM_DEBUGGING environment variable is the ONLY way we can even attempt to guess what the problem is.

@tslater

tslater commented Apr 28, 2015

I'm having slowness and I ran things with debug on.
I'm wondering if the slowness is related to these errors? I get a bunch of these:
DEBUG -- : unknown: file:/path/to/file ({})
The file exists at that path and rsyncing seems to be working, so I'm trying to figure out what the deal is. I chmod'ed them to 775 and they're owned by the user running the command. Any thoughts?

@e2

e2 commented Apr 28, 2015

@tslater - open a new issue in https://github.com/guard/listen and use the LISTEN_GEM_DEBUGGING=2 environment variable - this should give you full info about what's going on and how long things take. You can then upload the log/output to a gist for analysis. You can also check out the Listen wiki for additional troubleshooting and ideas (slowness can be caused by many things).

@hashicorp hashicorp locked and limited conversation to collaborators Apr 8, 2020