
Implements an evented file system monitor #22254

Merged
merged 36 commits into from
Nov 11, 2015
Conversation

fxn
Member

@fxn fxn commented Nov 11, 2015

Implements an evented file system monitor to asynchronously detect changes in the application source code, routes, locales, etc.

To opt in, load the listen gem in the Gemfile:

group :development do
  gem 'listen', '~> 3.0.4'
end

Original work by @puneet24 for GSoC 2015, later iterated by yours truly.

fxn added 30 commits November 8, 2015 22:49
This is the implementation of the file update checker written
by Puneet Agarwal for GSoC 2015 (except for the tiny version
of the listen gem, which was 3.0.2 in the original patch).

Puneet's branch became too out of sync with upstream. This is
the final work in one single clean commit.

Credit goes in the first line using a convention understood
by the contrib app.

3.0.3 has a bug in OS X.

This commit also bases everything on Pathname internally.
In particular, files are no longer created in the current working
directory, but in a temporary folder.

Mac OS X tries by all means to hide that /var is /private/var, and that is
what FSEvents reports back.
This sucks, but otherwise I get occasional F's (test failures) on Mac OS X.

"checker" is the name being used everywhere.
@rymai
Contributor

rymai commented Nov 11, 2015

❤️ 💚 💓 💛 💙

@PapePathe

+1111100000000

@fxposter
Contributor

@fxn shouldn't @updated be an atomic boolean, i.e. https://github.com/ruby-concurrency/concurrent-ruby/blob/master/lib/concurrent/atomic/atomic_boolean.rb? If listen doesn't block itself, that means it will call the changed method in a different thread, and reading and writing from different threads requires the variable to be atomic. It should work in MRI due to its implementation and the GIL, but @jruby will need an atomic var here. Not sure about @rubinius, but it probably needs an atomic too. /cc @headius @brixen

@fxn
Member Author

fxn commented Nov 14, 2015

For the use case of this monitor it probably does not matter. If you edit a file at the same time a request fires... well, you probably can't know whether it should reload or not anyway.

Once the flag is true, it stays true.

Also, dev mode is single-threaded.

But if you guys have a good justification for its need, we could change it.

@fxposter
Contributor

The problem is that if one thread actually sets the variable to true, you can't guarantee that the other thread will read true and not the old false value cached somewhere inside the processor's caches. You can try reading http://shipilev.net/blog/2014/jmm-pragmatics/, but in your case I don't see any guarantee that there is a "happens-before" bridge between writing @updated and reading it, which means it is actually allowed for one thread to "cache" the false value forever, even if other threads actually change it.
And that is before even getting into analyzing the impact of "simultaneous requests and files changing" :)

@fxn
Member Author

fxn commented Nov 14, 2015

@fxposter really? I was not aware of that possibility. I believed it opened the door to a race condition, but that eventually the shared value would be updated; that is, atomic vs. not atomic.

The main use case for these monitors is not multi-threaded, but I'll read that document you linked to. Meanwhile, please anybody feel free to chime in.

@brixen

brixen commented Nov 14, 2015

I don't think it's my place to suggest anything about how you want to implement this, but I will clarify that Rubinius does not have a GIL/GVL and that instance variables do not have volatile semantics. If you want multiple threads to have precisely the same view of a memory location at a point in time, you must use a Mutex, memory fence, or other operation that implies synchronizing the view of that memory location across threads.

@fxn
Member Author

fxn commented Nov 14, 2015

Yes, yes, code that is not thread-safe is not thread-safe: there may be race conditions and the outcome may be non-deterministic. That is clear.

What is beyond my understanding is that without synchronization you are not only exposed to race conditions, but a stale value can be cached by the CPU forever (as said above). Do synchronization idioms actually flush caches in addition to preventing parallel execution?

Note that this particular code is not expected to be run by multiple threads, because its main use case is app reloading, which uses constant autoloading, which is not thread-safe per se as of this writing. But we could certainly consider making the monitor thread-safe, and in any case I find the observation made by @fxposter interesting and would like to understand it (reading the link is pending).

@fxposter
Contributor

That's not a realistic scenario, but if that variable for some reason is not evicted from the CPU cache by something else, the CPU does not need to discard it and "refresh" it; that's not how CPUs work. Anyway, that's not something I'd actually rely upon. :)

@fxn
Member Author

fxn commented Nov 14, 2015

@fxposter but the memory is shared. If you do not synchronize there may be race conditions that affect who writes last (and, in a general scenario, which value ends up being stored). But synchronized or not, aren't the threads just writing to a memory location like anything else does? What is different here?

@fxposter
Contributor

@fxn yes, you are right: memory is shared. CPU caches are not. Threads can write to CPU caches, and nothing forces a cache line to be written back to main memory except synchronization or memory barriers.

@fxn
Member Author

fxn commented Nov 14, 2015

@fxposter ah! That was the bit I didn't know about: that threads can write to caches without going through memory. I thought the CPU cache was an optimization transparent to programs, whose only interface was memory. So a synchronized block forces the write-back?

@fxposter
Contributor

Yes, a synchronized block or volatile (atomic, in our case) variables. But if you want to make it synchronized, you have to synchronize both reads and writes.
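In stdlib terms, the idiom fxposter describes, guarding both sides with the same lock, can be sketched like this (hypothetical class and method names):

```ruby
# Both the write and the read of the shared flag take the same Mutex:
# the lock prevents interleaving and also acts as a memory barrier,
# so the reader cannot see a stale cached value.
class Flag
  def initialize
    @mutex   = Mutex.new
    @updated = false
  end

  # Writer side: e.g. called from the listener thread.
  def mark_updated
    @mutex.synchronize { @updated = true }
  end

  # Reader side: must take the same lock, or the visibility
  # guarantee does not hold.
  def updated?
    @mutex.synchronize { @updated }
  end
end
```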

@fxn
Member Author

fxn commented Nov 15, 2015

Awesome, thanks @fxposter!

@fxposter
Contributor

@fxn Thank you for improving Rails! 👍

@headius
Contributor

headius commented Nov 20, 2015

FWIW, I know Rails 5+ now depends on concurrent-ruby, which provides facilities for atomic and volatile variables. That's the recommended path forward rather than full synchronization, if you don't need locking.

@fxn
Member Author

fxn commented Nov 21, 2015

Revised in 49a5b40.

I chose to lock around the changed callback because, in theory, there could be two invocations of changed at a moment in which no update has happened. If both enter the body of the unless and the one that updates @updated last sets it to false, we would miss a possible true from the first one. Given the purpose of this class, any true has to be detected; you cannot miss one.

@fxposter does it look good to you?
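A simplified sketch of the locking pattern described above (a hypothetical class for illustration, not the exact Rails code): the changed callback takes a lock so that two concurrent invocations cannot race on @updated, and the flag is only reset deliberately, after execution.

```ruby
# Minimal evented-checker shape: listen invokes #changed from its own
# thread; the Mutex guards every access to @updated.
class EventedChecker
  def initialize(&block)
    @updated = false
    @mutex   = Mutex.new
    @block   = block
  end

  # Invoked by the listener from its own thread.
  def changed(modified, added, removed)
    @mutex.synchronize do
      @updated = true unless (modified + added + removed).empty?
    end
  end

  def updated?
    @mutex.synchronize { @updated }
  end

  def execute
    @block.call
  ensure
    @mutex.synchronize { @updated = false }
  end

  def execute_if_updated
    if updated?
      execute
      true
    else
      false
    end
  end
end
```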

@fxposter
Contributor

@fxn Commented in the commit. In general, I don't think we need a mutex there if we change the implementation a bit, but with the current implementation the mutex is needed and it will work.

@fxn
Member Author

fxn commented Nov 22, 2015

@fxposter It seems we are done. Thanks a lot for your feedback.

The fact that I didn't know that reading ivars needs synchronization even if the writer is synchronized tells me I need to dig into portable Ruby multithreading. I'll need to find a trustworthy reference; do you have any recommendation?

@fxposter
Contributor

Well, it's a tough question... I'd recommend reading up on the Java Memory Model and the C++ memory model (the more you read, the better). In terms of people: start with Gil Tene, Martin Thompson, Michael Barker, and Aleksey Shipilëv (he is Russian, but sometimes blogs and talks in English), then you'll probably find others (this is 99% about Java and the JMM). Jeff Preshing has a lot of interesting stuff on his blog (http://preshing.com/archives/; search "memory", "atomic", "barrier", "sync", "volatile"). Watching talks from Java/C++ conferences related to performance and multithreading also helps.

From the Ruby point of view: JRuby uses the JMM, see https://github.com/jruby/jruby/wiki/Concurrency-in-jruby. MRI uses a GIL, which basically means there is a happens-before relation between any read and write of a shared variable, because they are basically "synchronized" by the GIL (it doesn't mean they are atomic, but it does mean that writes from one thread are visible to the others). Otherwise, MRI has no specified behavior for how things should or shouldn't work. :(

Also, you can dig into internals of https://github.com/ruby-concurrency/concurrent-ruby/ and watch @jdantonio's talks on concurrent ruby (there are a couple of them).

@fxn
Member Author

fxn commented Nov 22, 2015

Awesome reply, thanks very much!!!

@jdantonio
Contributor

@fxn The talk I gave at RubyConf last week specifically addresses the GIL and why it doesn't make volatility/visibility promises. It may answer your question. The video isn't online at Confreaks yet, but it should be available within a few days. Keep an eye on this page for it to appear.

@fxposter Thanks for the shout-out!

@fxposter
Contributor

why it doesn't make volatility/visibility promises

@jdantonio wow, so MRI requires synchronization for reading/writing shared vars in different threads? Waiting for your talk too.

@jdantonio
Contributor

The TL;DR is that MRI makes no guarantees: it has no formal memory model. The GIL implicitly makes most variable reads and writes safe, but nothing would prevent future updates from changing that. There are many operations within MRI which silently release the GIL. The best practice is to not depend on the GIL for thread safety.

@fxposter
Contributor

nothing would prevent future updates from changing that

True, but we have no choice. :)

The best practice is to not depend on the GIL for thread safety.

But thread-safety and visibility are not the same thing.

There are many operations within MRI which silently release the GIL.

This is the most interesting thing. :) But as far as I understand, if you are in Ruby-only land (i.e. you don't use extensions) you are safe, because there will always be a "happens-before" relationship between writing a var in one thread and reading it in the other, won't there?

@jdantonio
Contributor

Yes, MRI has happens-before semantics, but not a guarantee. That's a side-effect. If you are only writing for MRI you generally don't have to worry. My personal preference is to use the atomic variables in concurrent-ruby. Then you can take advantage of our stronger guarantees and also support the other runtimes. But I'm biased. :-)
