Rsync watcher is really slow #3249

Closed
patrickheeney opened this issue Mar 17, 2014 · 70 comments

Comments

@patrickheeney patrickheeney commented Mar 17, 2014

Running version 1.5.1. The issue I have now is the rsync watcher is incredibly slow. It takes at least several minutes to detect that a file has changed and initiate rsync.

After about 10 minutes sometimes it goes rsync crazy and does 7 or 8 in a row.

Others have reported the same problem in #3159 (comment).

Also it may be helpful to have a whitelist of folders to watch. In my case I only need to watch the www/* folder as the rest are irrelevant:

config.vm.synced_folder ".", "/srv/www/", type: "rsync", rsync__include: ["www/"], rsync__exclude: [".git/", "tmp/", "cache/"]

(My original issue here was the fact that I had a packer .iso file and rsync was not outputting any progress so it looks like it was hanging for 15 minutes.)

@patrickheeney patrickheeney changed the title Rsync Issues / Experience Rsync watcher is really slow Mar 17, 2014

@drpebcak drpebcak commented Mar 17, 2014

👍 for this issue. #3108 has other examples of people hitting this slowness, as well as one person's workaround using grunt.

I'm also having other issues with rsync_auto, documented here: #3196

@smerrill smerrill commented Mar 18, 2014

I can confirm this - it takes a minute or more to detect a file change before rsync-auto fires for myself and a coworker under OS X 10.9.2. It's also possible to save a file a few times before an rsync has fired and then multiple rsyncs happen in quick succession when it "catches up." It seems that OS X might batch up these events before sending them to rsync-auto?

@drpebcak drpebcak commented Mar 18, 2014

Supposedly vagrant watches fsevents for this. You can use fseventer to see how quickly these appear on osx's side.

@smerrill smerrill commented Mar 18, 2014

Yes - I can confirm this. I added a puts _latency line before https://github.com/guard/listen/blob/master/lib/listen/adapter/darwin.rb#L30 in my copy of Vagrant 1.5.1, and the puts fires within 100ms or so of saving a file, but Vagrant takes considerably longer to pick it up.

@smerrill smerrill commented Mar 18, 2014

Interestingly, while this is happening ruby from the vagrant rsync-auto process shows up in my iStat Menus and Activity Monitor as using between 98% and 102% CPU.

@smerrill smerrill commented Mar 18, 2014

The CPU usage starts as soon as a file is saved and does not stop until after the rsync has completed.

@smerrill smerrill commented Mar 18, 2014

dtruss shows that the rsync-auto process is doing an open() and a read() on every file being watched after every fsevent that it receives.

@smerrill smerrill commented Mar 18, 2014

This makes sense - it is listen computing md5 hashes for all the files. Would it be possible to try upgrading the bundled version of listen to at least 2.6.0? According to guard/listen#184 (comment), it may lazily load MD5 hashes better past this version.
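To illustrate what "lazily" computing hashes could look like, here is a minimal sketch. It is not Listen's actual implementation: `FileRecord` and `detect_change` are hypothetical names, and the sketch assumes that file size and mtime are trustworthy enough to skip hashing in the common case, which may not hold on every filesystem.

```ruby
require 'digest/md5'

# Hypothetical record of what was last seen for one file.
FileRecord = Struct.new(:size, :mtime, :md5)

# Returns [changed, new_record]. The file is only read (and hashed)
# when the cheap metadata check cannot decide on its own.
def detect_change(path, previous)
  stat = File.stat(path)
  current = FileRecord.new(stat.size, stat.mtime, nil)

  # No previous record or a different size: changed, no hashing needed.
  return [true, current] if previous.nil? || previous.size != current.size

  # Same size and same mtime: treat as unchanged without reading the file.
  if previous.mtime == current.mtime
    current.md5 = previous.md5
    return [false, current]
  end

  # Same size but a different mtime: only now pay for reading the file.
  current.md5 = Digest::MD5.file(path).hexdigest
  [previous.md5 != current.md5, current]
end
```

With this approach an fsevent for one file costs one `stat` call for most files instead of an `open()` and a `read()` on every watched file.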

Author

@patrickheeney patrickheeney commented Mar 18, 2014

That thread is concerning. It seems to struggle with 20k files? I have projects with much more than that. I thought vagrant rsync was going to be more performant than nfs according to the blog post. Once you factor in the watcher it is way off the mark.

Member

@mitchellh mitchellh commented Mar 18, 2014

Hm this is very odd. Upgrading to 2.6.0 will certainly help. I'll take a look and see what parts may be slow.

@patrickheeney rsync specifically optimizes file read/write performance at the expense of syncing latency. However, this is a lot of sync latency so I'll take a look.

@smerrill smerrill commented Mar 18, 2014

In looking at how guard and guard-shell work, I noticed that new files don't get picked up, and it uses the md5 comparison to ensure that file contents actually change before syncing, whereas a tool like http://www.fernlightning.com/doku.php?id=software:fseventer:start picks up file creation and writes in a matter of milliseconds. It would be nice if there were a configuration for Listen::Adapter::Darwin that would call its callbacks unconditionally upon getting fsevents representing file creation or writes to files without checking the checksum. Perhaps Vagrant could then expose that option.

Author

@patrickheeney patrickheeney commented Mar 18, 2014

@mitchellh Ya totally get the syncing latency, I was just not sure what an acceptable range would be. I am somewhat spoiled it seems with gulpjs and grunt in that everything happens in less than a second, which is faster than I can alt+tab+refresh.

It seems as though the amount of files it watches has a crucial impact so I definitely hope we can narrow this down and have something configurable. For example my frontend guys are not going to be editing a lot of backend code so doing something like this vagrant rsync-auto --path=web/content/*.js would be great. Essentially a whitelist for directories on a global scale and CLI arguments on a local scale. The fewer files we seem to watch the faster it is going to be. Excluding hundreds of directory structures seems like a frustrating approach.
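A minimal sketch of the whitelist idea, assuming a hypothetical list of glob patterns relative to the project root. `WATCH_PATTERNS` and `relevant_changes` are illustrative names, not Vagrant or Listen options. Changed paths that match no pattern are dropped before any sync is triggered.

```ruby
# Hypothetical whitelist of glob patterns, relative to the project root.
WATCH_PATTERNS = ['www/**/*.js', 'www/**/*.css'].freeze

# Keeps only the changed paths that match at least one whitelist pattern.
def relevant_changes(changed_paths, patterns = WATCH_PATTERNS)
  changed_paths.select do |path|
    # FNM_PATHNAME makes "*" stop at "/" and lets "**/" span directories.
    patterns.any? { |pattern| File.fnmatch?(pattern, path, File::FNM_PATHNAME) }
  end
end

changes = ['www/js/app.js', 'tmp/cache/x.js', 'www/js/readme.md']
puts relevant_changes(changes).inspect
```

The design point is that a whitelist scales with the number of directories someone actually edits, while an exclude list has to grow with everything else in the tree.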

@smerrill smerrill commented Mar 18, 2014

I've had really good luck with a script just like this:

require 'rb-fsevent'

# Report changes at most every 1.5 seconds and let FSEvents batch them.
options = { :latency => 1.5, :no_defer => false }

# Directories whose changes should never trigger a sync.
ignored = /\.(git|idea|svn|hg|cvs)/

fsevent = FSEvent.new
fsevent.watch "/path/to/my/project", options do |directories|
  if directories.reject { |dir| dir =~ ignored }.empty?
    puts "Discarding these changes: #{directories.inspect}"
  else
    puts "Detected change inside: #{directories.inspect}"
    system("vagrant rsync")
  end
end
fsevent.run

I know it isn't as cross-platform as listen, but that works very well with a 28k+ file directory (and also has the side-effect of showing off the amount of fsevents that happen in the various .git directories in a project as I'm working on it.)

@ThePixelDeveloper ThePixelDeveloper commented Mar 18, 2014

Throwing it out there. Can this problem not be solved with the rsync daemon? Bootstrap the rsync daemon on the box and then run rsync host side for a speedier transfer. I'm not too familiar with the advantages of the daemon so feel free to say if this isn't going to work out.

@arnaudbreton arnaudbreton commented Mar 19, 2014

As mentioned in #3159, it seems the main issue with Guard/Listen is that it uses a blacklist instead of a whitelist like grunt watch does (see https://gist.github.com/arnaudbreton/9517344), which explains the performance difference when using the latter.

The solution offered by @patrickheeney with the path option is a good one, combined with a way to specify these paths directly in the Vagrantfile.

Author

@patrickheeney patrickheeney commented Mar 19, 2014

@arnaudbreton My solution was the band-aid approach. It should speed things up but it appears there are deeper issues here that @smerrill discovered. Ideally both would be addressed.

@smerrill smerrill commented Mar 19, 2014

@ThePixelDeveloper The problem is on the host side, where monitoring for changes is quite slow. The rsync daemon might help to lower transactional CPU usage, though, so it would probably be worth looking at as a new option for rsynced folders in another issue.

@dougmarcey dougmarcey commented Mar 19, 2014

I work with @smerrill and have tweaked the script a bit to read from my Vagrantfile.

#!/usr/bin/env ruby
require 'rb-fsevent'

options = { :latency => 1.5, :no_defer => false }

# Match uncommented rsync synced_folder lines and capture the host path.
path_regex = /^(?!\s*#).*config\.vm\.synced_folder\s*"(.*?)"\s*,\s*".*?"\s*,.*?type:\s*"rsync".*/
# Directories whose changes should never trigger a sync.
ignored = /\.(git|idea|svn|hg|cvs)/

paths = []
File.foreach('Vagrantfile') do |line|
  paths << File.expand_path($1) if path_regex.match(line)
end
puts "Watching: #{paths}"

fsevent = FSEvent.new
fsevent.watch paths, options do |directories|
  unless directories.reject { |dir| dir =~ ignored }.empty?
    puts "Detected change inside: #{directories.inspect}"
    system("vagrant rsync")
  end
end
fsevent.run

I've dropped this into the directory with my Vagrantfile. I am running it via vagrant up && ./watch.rb. It's doing fine on two directories with about 96,500 files total. No killing my performance or IO, and responds within the latency to every change I've done so far. Only works on mac, and YMMV, but it has helped me a lot today.

@djdevin djdevin commented Mar 20, 2014

I actually tried using the rsync daemon while trying to fix the latency issues, and for me it didn't seem to matter from benchmarks, with the latency hovering around 1 second over either rsync or rsync+ssh with 35,000 files. Surprisingly the daemon actually takes a little longer when dealing with 75,000+ (2-3s vs 1-2s) and I'm not sure why. I ended up using a simple inotify-tools wrapper to watch the current directory and run the rsync command on any change. This works really well for me with 35,000 files and finally solves our NFS woes.

I think another bottleneck here is that when the number of files grows, you end up having watchers on all the files, then when those detect a change, an rsync command runs that tries to detect changes again and sync the entire tree. Something like Unison might work better here on large trees because (someone correct me if I'm wrong) it has the same watches BUT does not try to resync the entire tree on a change but only sends the affected files.
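One way to avoid rescanning the entire tree, sketched under assumptions: hand rsync an explicit list of the changed files through its `--files-from` option instead of letting it walk the whole source directory. This is not what Vagrant's rsync-auto does. `rsync_command` and the destination string are hypothetical, and deleted files would still need a periodic full sync because they cannot be listed as sources.

```ruby
require 'pathname'

# Builds an rsync argv plus the newline-separated file list to feed it on
# stdin. Only the listed files are considered, so rsync skips the tree walk.
def rsync_command(root, changed_paths, destination)
  root = Pathname.new(root)
  relative = changed_paths.map { |p| Pathname.new(p).relative_path_from(root).to_s }.uniq
  # "--files-from=-" tells rsync to read source paths (relative to root) from stdin.
  command = ['rsync', '--archive', '--files-from=-', "#{root}/", destination]
  [command, relative.join("\n") + "\n"]
end

# Hypothetical destination; in practice it would come from the VM's SSH config.
command, file_list = rsync_command('/proj', ['/proj/www/a.js'], 'vagrant@guest:/srv/www/')
puts command.join(' ')
puts file_list
```

To actually run it, the list could be passed with `Open3.capture2(*command, stdin_data: file_list)`.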

@ThePixelDeveloper ThePixelDeveloper commented Mar 20, 2014

Ah ok, glad someone else looked into the daemon.

I took a similar solution to many of the other people here. I created a nodejs listener script that listens for changes and pings off a request to a simple server in the VM which does a rebuild of the assets. Works really well, even though it's an icky hack.

@smerrill smerrill commented Mar 24, 2014

@dougmarcey and I have released an alpha implementation of a lighter-weight rsync-auto command that uses the same rb-fsevent and rb-inotify libraries under the covers: https://github.com/smerrill/vagrant-gatling-rsync .

It's been moderately tested on OS X and more lightly tested under Linux. It also outputs copious log messages if you run it with VAGRANT_LOG=info vagrant gatling-rsync-auto.

We'd love your feedback if you want to try it out.

Member

@mitchellh mitchellh commented Mar 24, 2014

@smerrill Awesome, if this avenue turns out to be much more performant, I may end up wanting to merge your plugin into core! Wouldn't make sense for a 1.5.x so I'll keep continuing to try to improve the Listen interaction but it may come to that in a future version.

@JacobDorman JacobDorman commented Mar 28, 2014

https://github.com/smerrill/vagrant-gatling-rsync works a lot better than the existing implementation. I'm hoping this can be merged into the core, and extended to allow two-way-sync (#3062 (comment))

I simply can't get the performance I'm looking for using VirtualBox synced folders or nfs. Hoping that bidirectional-rsync is the way forward.

current workflow:

  • git applications are cloned into the guest filesystem (per https://github.com/protobox/protobox)
  • after provision these need syncing back to the host
  • development is done on the host but grunt-watch+grunt-livereload is running on the guest
  • files need to be synced back to the guest instantly
  • git operations may be performed on either the host or the guest, so needs to be able to handle syncing the .git/ directories

@emslade emslade commented Apr 7, 2014

On Linux I was struggling with this. Ended up using https://github.com/hollow/inosync instead of rsync-auto which works brilliantly with ~35k files.

mitchellh added a commit that referenced this issue Apr 9, 2014
Member

@mitchellh mitchellh commented Apr 9, 2014

I updated the listen gem requirement to 2.7.1, since that appears to help performance for OS X a bit.

@e2 e2 commented May 18, 2014

> First of all, it seems when running vagrant rsync-auto an initial change event is triggered (and in my case logged to the log.txt file) for every single file existing in the shared folder. Does that make sense?

That's Listen creating a snapshot to detect complex changes, like moving whole trees. For example, if you were to move "c:\program files" into "c:\Documents and settings", the only events the operating system will tell you about are that "c:\program files" was removed and "c:\documents and settings" changed ...

... so Listen compares the directories with its internal record ("snapshot") and generates additional events: the removal of ALL the files that used to be in c:\program files, and the addition of all the "new" files in "c:\documents and settings\Program files".
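The snapshot comparison can be pictured with a small sketch. This is an illustration only, not Listen's code: `diff_snapshots` is a hypothetical helper, and a snapshot is assumed to be a plain hash mapping each file path to its modification time.

```ruby
# Compares two snapshots (path => mtime) and derives per-file events
# that the operating system never reported individually.
def diff_snapshots(before, after)
  {
    added:    (after.keys - before.keys).sort,
    removed:  (before.keys - after.keys).sort,
    modified: (before.keys & after.keys).select { |path| before[path] != after[path] }.sort
  }
end

before = { 'program files/a.exe' => 100, 'documents/notes.txt' => 50 }
after  = { 'documents/program files/a.exe' => 100, 'documents/notes.txt' => 50 }
puts diff_snapshots(before, after).inspect
```

A moved tree therefore shows up as one removal and one addition per file, which is why the first snapshot has to touch every file in the watched folder.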

> Why is that? Shouldn't it log it only once?

This depends on a lot of things, e.g. what your editor ACTUALLY is doing (e.g. backups, swap files, moving, renaming, deleting, setting the files to read-only, pid files, ...). Later on (in listener.rb) the duplicate events are removed. In short - saving a single file can generate LOTS of file system events, so it depends what you're doing.
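A minimal sketch of how a burst of events from one save could be collapsed into a single sync. It is not Listen's listener.rb: `EventBatcher` is a hypothetical class, and timestamps are passed in explicitly so the behavior is easy to follow.

```ruby
# Collects file events and releases them as one de-duplicated batch
# once no new event has arrived for `quiet_period` seconds.
class EventBatcher
  def initialize(quiet_period)
    @quiet_period = quiet_period
    @pending = {}          # path => true; a Hash keeps insertion order
    @last_event_at = nil
  end

  # Remembers the path (duplicates collapse) and restarts the quiet timer.
  def record(path, now)
    @pending[path] = true
    @last_event_at = now
  end

  # Returns the unique pending paths, or nil while events are still arriving.
  def flush(now)
    return nil if @pending.empty? || now - @last_event_at < @quiet_period
    paths = @pending.keys
    @pending = {}
    paths
  end
end
```

An editor that writes a swap file, renames it and touches the original would produce several `record` calls but only one non-nil `flush`, so only one rsync run.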

> I guess this is just as fast as it gets when using rsync for syncing files?

Add the logging to listener.rb (_wait_for_changes) - because it's up to that point that the delay happens (so the real delay is between the change happening in change.rb and being forwarded to the callback in listen.rb).

I don't know how Vagrant uses rsync exactly, so these may not apply, but you could try:

  • disabling compression for rsync
  • getting rsync to just check file size
  • changing the crypto algorithm to a faster one (blowfish?) [crypto on Windows used to be horribly slow]
  • ignoring files in Listen (editor, swap files, tmp files, generated files, etc.) - the fewer invocations in Listen, the better
  • more exclude rules for rsync, so it's not comparing irrelevant files
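Two of these suggestions (no compression, more exclude rules) could be expressed as synced-folder options. This is a sketch that assumes your Vagrant version supports the `rsync__args` and `rsync__exclude` options. The `RSYNC_OPTIONS` name and the extra exclude patterns are illustrative, and the argument list is Vagrant's documented default with `-z` left out.

```ruby
# Options intended for config.vm.synced_folder; the constant name is hypothetical.
RSYNC_OPTIONS = {
  type: 'rsync',
  # Fewer files for rsync to compare on every run (patterns are examples).
  rsync__exclude: ['.git/', 'tmp/', 'cache/', '*.swp', '.idea/'],
  # Documented default arguments minus "-z", i.e. with compression disabled.
  rsync__args: ['--verbose', '--archive', '--delete', '--copy-links']
}

# Inside a Vagrantfile this would be used roughly as:
#   config.vm.synced_folder '.', '/srv/www/', RSYNC_OPTIONS
puts RSYNC_OPTIONS.inspect
```

Whether dropping compression helps depends on the link between host and guest; on a local virtual network it usually only saves CPU.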

> Is this due to the md5-hash?

MD5 hashing is just a Mac workaround, because the file timestamps there are unreliable. So if it's doing MD5 hashing in Windows, it's a bug.

@thasmo thasmo commented May 18, 2014

Just tested listener.rb and the Windows adapter and those seem fine, both logging almost instantly.
I guess it's just the time it needs to sync the file over rsync from the host to the guest. I've already disabled compression - I will try to change some other arguments later.

@e2 e2 commented Jun 4, 2014

For anyone interested in Listen's performance and status: guard/listen#207 (comment)

@e2 e2 commented Jun 5, 2014

This is fixed in Listen v2.7.7

The "slowness" was caused by frequent task/thread switching with many sleeping mutexes/conditionals - now hundreds of thousands of files should be "indexed" within seconds (given the files are cached, of course).

Member

@mitchellh mitchellh commented Jun 5, 2014

This is really great to hear. I'll include that in the next release of Vagrant and we should hear feedback pretty soon!

@e2 e2 commented Jun 5, 2014

👍

@Taytay Taytay commented Jun 6, 2014

Thanks all!
Is there a way to temporarily upgrade my Vagrant to listen 2.7.7 to see if it improves things? I didn't know if it was as simple as copying a gem somewhere, or if it required more changes.

@jeanmichelcote jeanmichelcote commented Jun 6, 2014

Just to say, you guys are pretty awesome.
Thanks a bunch.

Contributor

@geerlingguy geerlingguy commented Jun 6, 2014

👍

@alch3m1st alch3m1st commented Jun 28, 2014

@thasmo I have exactly the same problem and it is very annoying.

I will tell you my "solution" (it is not a real one but it is something). I copy the file I save from IDE (host) to guest every time I save it, automatically.

I use PhpStorm which has a plugin called "File watcher". I created a custom watcher that has the parameters:

Disable immediate synchronization
Filetype:Any
Scope:Project
Program: PATH_TO/pscp.exe
Arguments: -pw YOUR_VM_PASSWORD $FilePath$ vagrant@YOUR_VM_IP:/YOUR_VM_SHARED_FOLDER/$ModuleName$/$FileRelativeDir$

ModuleName (= project name) must match the folder name inside my shared folder

This does not handle deletes. In this case you must periodically do a vagrant rsync.
The result is an instant update. This should work with any interpreted language.

You can also try : https://github.com/GM-Alex/vagrant-winnfsd (details here http://www.jankowfsky.com/blog/2013/11/28/nfs-for-vagrant-under-windows/)

@andriesss andriesss commented Jul 3, 2014

Any idea when the next release is planned? I look forward to this improvement :)

@Globegitter Globegitter commented Jul 3, 2014

Having the same issue on Mac, would be great to know when the next release with some of the improvements mentioned here is planned.

@timglabisch timglabisch commented Jul 24, 2014

seems that this works for me:

fswatch -r . | sed -l -E '/(\.git|\.idea|\_\_\_|DS\_Store)/d' | xargs -n1 bash -c "vagrant rsync"

would be great if vagrant rsync would support a path that should be synced. -.-'

@kensykora kensykora commented Jul 31, 2014

+1, very small file set (~200 files) and am still seeing ~8 seconds between when I hit save and when rsync-auto picks up and pushes the file over.

@thedotwriter thedotwriter commented Aug 12, 2014

@Taytay You may test it that way: Install Vagrant from source but before installing the gem, try to modify the Listen version via the vagrant.gemspec file. Didn't try it yet but I guess it should work :)

@Globegitter Globegitter commented Aug 12, 2014

Yeah, I just looked into https://github.com/guard/listen and it seems version 2.7.7 should fix or at the least improve this. Anyone tried it out yet? @mitchellh are there plans to update this for the next vagrant release? Would be really useful.

@Globegitter Globegitter commented Sep 25, 2014

@mitchellh Is there a reason this is stalling and neither of the last two releases included the update to listen?

Member

@mitchellh mitchellh commented Oct 24, 2014

@Globegitter Just being careful with updates in patch releases. 1.7 will update listen.

Member

@mitchellh mitchellh commented Oct 24, 2014

Closing this due to the new version of listen. Please check 1.7 once released, and if it's still slow let's reopen to discuss.

@mitchellh mitchellh closed this Oct 24, 2014

@treyhyde treyhyde commented Dec 30, 2014

The latency is way down but this still keeps the CPU around 100%. gatling-rsync-auto is still the only viable sync tool.

@e2 e2 commented Dec 30, 2014

TL;DR - please report any "slowness" in guard/listen as a new issue - with a full debug output (or at least go through the troubleshooting section).

@treyhyde - your case is probably special. Open an issue in https://github.com/guard/listen and link to a Gist with the output from using LISTEN_GEM_DEBUGGING=2.

Listen works pretty much the same way as gatling-rsync-auto, so it's likely a difference in configuration.

E.g. you could be watching log files or database files, which makes no sense - but can be exactly the reason you're getting 100% CPU usage.

Also, you don't even mention whether you're on OSX or Linux, so I'm assuming you're using OSX (which has a few extra issues to know about).

Again, the LISTEN_GEM_DEBUGGING environment variable is the ONLY way we can even attempt to guess what the problem is.

@tslater tslater commented Apr 28, 2015

I'm having slowness and I ran things with debug on.
I'm wondering if the slowness is related to these errors? I get a bunch of these:
DEBUG -- : unknown: file:/path/to/file ({})
The file exists at that path and rsyncing seems to be working so I'm trying to figure out what the deal is. I chmod'ed them to 775 and they're owned by my user that is running the command. Any thoughts?

@e2 e2 commented Apr 28, 2015

@tslater - open a new issue in https://github.com/guard/listen, and use LISTEN_GEM_DEBUGGING=2 environment variable - this should give you full info what's going on and how long things take. You can then upload the log/output to a gist for analysis. You can also check out the Listen Wiki for additional troubleshooting and ideas (slowness can be caused by many things).

@hashibot hashibot bot locked and limited conversation to collaborators Apr 8, 2020