Implements an evented file system monitor #22254

Merged
merged 36 commits on Nov 11, 2015

Conversation

8 participants
@fxn
Member

fxn commented Nov 11, 2015

Implements an evented file system monitor to asynchronously detect changes in the application source code, routes, locales, etc.

To opt in, load the listen gem in the Gemfile:

group :development do
  gem 'listen', '~> 3.0.4'
end

Original work by @puneet24 for GSoC 2015, later iterated by yours truly.
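The contract the new monitor implements can be sketched in plain Ruby. This is a stdlib-only illustration, not the merged code: EventedChecker and its method bodies are hypothetical; the real class ships in Active Support and gets its changed notifications from listen's background thread.

```ruby
# Stdlib-only sketch of the evented contract (EventedChecker is a
# made-up name; the real checker lives in Active Support and is
# backed by listen's notification thread).
class EventedChecker
  def initialize(&block)
    @block   = block
    @updated = false
  end

  # In the real monitor, listen invokes this asynchronously whenever
  # the file system reports an event, instead of Rails polling mtimes.
  def changed
    @updated = true
  end

  def updated?
    @updated
  end

  # Runs the reload block only if an event has been seen, then resets.
  def execute_if_updated
    return false unless updated?
    @updated = false
    @block.call
    true
  end
end

reloaded = false
checker = EventedChecker.new { reloaded = true }
checker.execute_if_updated # => false, no event yet
checker.changed            # simulate a file system event
checker.execute_if_updated # => true, the block ran
```

The point of the design is that detecting changes costs nothing on the request path: the flag is flipped asynchronously, and the request merely reads it.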

fxn added some commits Oct 12, 2015

implements an evented file update checker [Puneet Agarwal]
  This is the implementation of the file update checker written by Puneet Agarwal for GSoC 2015 (except for the tiny version of the listen gem, which was 3.0.2 in the original patch). Puneet's branch became too out of sync with upstream; this is the final work in one single clean commit. Credit goes in the first line, using a convention understood by the contrib app.

upgrade listen to 3.0.4
  3.0.3 has a bug on OS X.

stop ascending at the longest common subpath
  This commit also bases everything on Pathname internally.

refactors the file monitors test suite
  In particular, files are no longer created in the current working directory, but in a temporary folder.

create the tmpdir under test
  Mac OS X tries by all means to hide that /var is /private/var, and that is what FSEvents reports back.

more ad-hoc sleeps
  This sucks, but otherwise I get occasional failures on Mac OS X.

s/watcher/checker/g
  "checker" is the name being used everywhere.
@rafaelfranca reviewed the changes, leaving now-outdated inline comments on activesupport/lib/active_support/file_evented_update_checker.rb (around the PathHelper refinement of Pathname), activesupport/test/file_update_checker_shared_tests.rb, and activesupport/test/file_update_checker_test.rb.
[for *BSD](https://github.com/guard/listen#on-bsd) and
[for Windows](https://github.com/guard/listen#on-windows).
Note that [some setups are unsupported](https://github.com/guard/listen#issues--limitations).

@rafaelfranca

Member

rafaelfranca commented Nov 11, 2015

There is an extra space before the [

@rafaelfranca


Member

rafaelfranca commented Nov 11, 2015

Implementation looks good to me :shipit:

Awesome work @fxn @puneet24!

@fxn


Member

fxn commented Nov 11, 2015

@rafaelfranca thanks for the review ❤️.

@fxn


Member

fxn commented Nov 11, 2015

Merged.

fxn closed this Nov 11, 2015

fxn merged commit 7edc8d9 into rails:master Nov 11, 2015

@rymai


Contributor

rymai commented Nov 11, 2015

❤️ 💚 💓 💛 💙

@PapePathe


PapePathe commented Nov 11, 2015

+1111100000000

@fxposter


Contributor

fxposter commented Nov 14, 2015

@fxn shouldn't @updated be an atomic boolean, i.e. https://github.com/ruby-concurrency/concurrent-ruby/blob/master/lib/concurrent/atomic/atomic_boolean.rb? If listen doesn't block itself, that means it will call the changed method in a different thread, and reading and writing from different threads requires the variable to be atomic. It should work in MRI due to its implementation and the GIL, but JRuby will need an atomic var here. Not sure about Rubinius, but it probably needs atomic too. /cc @headius @brixen
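The visibility guarantee being asked for here can be sketched with a stdlib-only stand-in for Concurrent::AtomicBoolean. AtomicFlag is a made-up name for illustration; the gem uses native atomics where available, but a Mutex around both the read and the write gives the same cross-thread visibility.

```ruby
# Minimal stand-in for Concurrent::AtomicBoolean: every read and write
# goes through the same Mutex, so a value written by one thread is
# guaranteed to be visible to any other thread that reads it later.
class AtomicFlag
  def initialize(value = false)
    @mutex = Mutex.new
    @value = value
  end

  def make_true
    @mutex.synchronize { @value = true }
  end

  def make_false
    @mutex.synchronize { @value = false }
  end

  def true?
    @mutex.synchronize { @value }
  end
end

flag = AtomicFlag.new

# The writer thread plays the role of listen's callback thread.
Thread.new { flag.make_true }.join

flag.true? # => true
```

With a plain ivar, nothing in Ruby's spec forces the writer's true to ever become visible to a reader on another thread; routing both sides through the Mutex (or using a real atomic) establishes that guarantee.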

@fxn


Member

fxn commented Nov 14, 2015

For the use case of this monitor it probably does not matter. If you edit a file at the same moment a request is fired... well, you can't really know either way whether it should reload or not.

Once the flag is true, it stays true.

Also, dev mode is single-threaded.

But if you guys have a good justification for its need, we could change it.

@fxposter


Contributor

fxposter commented Nov 14, 2015

The problem is that if one thread actually sets the variable to true, you can't guarantee that the other thread will read true and not an old false value cached somewhere inside the processor's caches. You can try reading http://shipilev.net/blog/2014/jmm-pragmatics/, but in your case I don't see any guarantee that there is a "happens-before" bridge between writing @updated and reading it, which means one thread is actually allowed to "cache" the false value forever, even if other threads change it.
And that is before even analyzing the impact of "simultaneous requests and file changes". :)

@fxn


Member

fxn commented Nov 14, 2015

@fxposter really? I was not aware of that possibility; I believed it opened the door to a race condition, but that eventually the shared value would be updated. That is, atomic vs not atomic.

The main use case for these monitors is not multi-threaded, but I'll read that document you linked to. Meanwhile, please anybody feel free to chime in.

@brixen


brixen commented Nov 14, 2015

I don't think it's my place to suggest anything about how you want to implement this, but I will clarify that Rubinius does not have a GIL/GVL and that instance variables do not have volatile semantics. If you want multiple threads to have precisely the same view of a memory location at a point in time, you must use a Mutex, memory fence, or other operation that implies synchronizing the view of that memory location across threads.

@fxn


Member

fxn commented Nov 14, 2015

Yes, yes, code that is not thread-safe is not thread-safe; there may be race conditions, and the outcome may be non-deterministic. That is clear.

What is beyond my understanding is that without synchronization you are not only exposed to race conditions, but a stale value can be cached by the CPU forever (as said above). Do synchronization idioms actually flush caches, in addition to preventing parallel execution?

Note that this particular code is not expected to be run by multiple threads, because its main use case is app reloading, which uses constant autoloading, which is not thread-safe per se as of this writing. But we could certainly consider making the monitor thread-safe, and in any case I find @fxposter's observation interesting and would like to understand it (reading the link is pending).

@fxposter


Contributor

fxposter commented Nov 14, 2015

That's not a realistic scenario, but if that variable for some reason is not evicted from the CPU cache by something else, the CPU does not need to drop it and fetch an "update"; that's not how CPUs work. Anyway, that's not the kind of thing I'd actually rely upon. :)

@fxn


Member

fxn commented Nov 14, 2015

@fxposter but the memory is shared. If you do not synchronize there may be race conditions that may affect who writes last (and in a general scenario which value ends up being stored). But sync'ed or not sync'ed, aren't the threads just writing to a memory location like anything else does? What is different here?

@fxposter


Contributor

fxposter commented Nov 14, 2015

@fxn yes, you are right, memory is shared; CPU caches are not. Threads can write to CPU caches, and nothing forces a cache line to be written back to main memory except synchronization or memory barriers.

@fxn


Member

fxn commented Nov 14, 2015

@fxposter ah! That was the bit I didn't know: that threads can write to caches without going through memory. I thought the CPU cache was an optimization transparent to programs, whose only interface was memory. So a synchronized block forces that write-back?

@fxposter


Contributor

fxposter commented Nov 15, 2015

Yes, a synchronized block or volatile (atomic, in our case) variables. But if you go with synchronization, you need to synchronize both reads and writes.

@fxn


Member

fxn commented Nov 15, 2015

Awesome, thanks @fxposter!

@fxposter


Contributor

fxposter commented Nov 15, 2015

@fxn Thank you for improving Rails! 👍

@headius


Contributor

headius commented Nov 20, 2015

FWIW, I know Rails 5+ now depends on concurrent-ruby, which provides facilities for atomic and volatile variables. That's the recommended path forward rather than full synchronization, if you don't need locking.

@fxn


Member

fxn commented Nov 21, 2015

Revised in 49a5b40.

I chose to lock around the changed callback because, in theory, there could be two invocations of changed at a moment in which no update has happened. If both enter the body of the unless, and the one that updates @updated last sets it to false, we would miss a possible true from the first one. Given the purpose of this class, any true has to be detected; you cannot miss one.

@fxposter does it look good to you?
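That locking can be sketched as follows. Checker, its constructor, and the method bodies here are illustrative, not the code in 49a5b40: the mutex makes the check-then-write in changed atomic, so a concurrent invocation cannot clobber a true with a false, and updated? synchronizes too so the write is visible to the reader.

```ruby
class Checker
  def initialize(watched_files)
    @watched = watched_files
    @updated = false
    @mutex   = Mutex.new
  end

  # Invoked by listen from its own thread, possibly concurrently.
  def changed(modified)
    @mutex.synchronize do
      # Only ever flip the flag to true here; never back to false,
      # so one invocation cannot overwrite another invocation's true.
      @updated = true if modified.any? { |f| @watched.include?(f) }
    end
  end

  # The reader also synchronizes, so it sees the writer's value.
  def updated?
    @mutex.synchronize { @updated }
  end
end

checker = Checker.new(['config/routes.rb'])
checker.changed(['app/models/user.rb']) # not watched
checker.updated? # => false
checker.changed(['config/routes.rb'])   # watched
checker.updated? # => true
```

Resetting the flag back to false would happen on the consumer side (when the app actually reloads), outside the callback, which is why the callback itself never writes false.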

@fxposter


Contributor

fxposter commented Nov 22, 2015

@fxn Commented in the commit. In general I don't think we need a mutex there if we change the implementation a bit, but with the current implementation the mutex is needed, and it will work.

@fxn


Member

fxn commented Nov 22, 2015

@fxposter It seems we are done. Thanks a lot for your feedback.

The fact that I didn't know that reading ivars needs synchronization even when the writer is synchronized tells me I need to dig into portable Ruby multithreading. I'll need to find a trustworthy reference; do you have any recommendations?

@fxposter


Contributor

fxposter commented Nov 22, 2015

Well, it's a tough question... I'd recommend reading up on the Java Memory Model and the C++ memory model (the more you read, the better). In terms of people, start with Gil Tene, Martin Thompson, Michael Barker, and Aleksey Shipilëv (he is Russian, but sometimes blogs and talks in English); then you'll probably find others (this is 99% about Java and the JMM). Jeff Preshing has a lot of interesting stuff on his blog (http://preshing.com/archives/, search "memory", "atomic", "barrier", "sync", "volatile"). Watching talks from Java/C++ conferences on performance and multithreading also helps.

From the Ruby point of view, JRuby uses the JMM; see https://github.com/jruby/jruby/wiki/Concurrency-in-jruby. MRI uses the GIL, which basically means there is a happens-before relation between any read and write to a shared variable, because they are basically "synchronized" by the GIL (it doesn't mean they are atomic, but it means writes from one thread are visible to the others). Beyond that, MRI has no specified behavior for how things should or shouldn't work. :(

Also, you can dig into the internals of https://github.com/ruby-concurrency/concurrent-ruby/ and watch @jdantonio's talks on concurrent-ruby (there are a couple of them).

@fxn


Member

fxn commented Nov 22, 2015

Awesome reply, thanks very much!!!

@jdantonio


Contributor

jdantonio commented Nov 22, 2015

@fxn The talk I gave at RubyConf last week specifically addresses the GIL and why it doesn't make volatility/visibility promises. It may answer your question. The video isn't online at Confreaks yet, but it should be available within a few days. Keep an eye on this page for it to appear.

@fxposter Thanks for the shout-out!

@fxposter


Contributor

fxposter commented Nov 22, 2015

why it doesn't make volatility/visibility promises

@jdantonio wow, so MRI requires synchronization for reading/writing shared vars in different threads? Waiting for your talk too.

@jdantonio


Contributor

jdantonio commented Nov 22, 2015

The TL;DR is that MRI makes no guarantees; it has no formal memory model. The GIL implicitly makes most variable reads and writes safe, but nothing would prevent future updates from changing that. There are many operations within MRI that silently release the GIL. The best practice is to not depend on the GIL for thread safety.

@fxposter


Contributor

fxposter commented Nov 22, 2015

nothing would prevent future updates from changing that

True, but we have no choice. :)

The best practice is to not depend in the GIL for thread safety.

But thread-safety and visibility are not the same thing.

There are many operations within MRI which silently release the GIL.

This is the most interesting thing. :) But as far as I understand, if you stay in Ruby-only land (i.e., you don't use extensions) you are safe, because there will always be a "happens-before" relationship between writing a var in one thread and reading it in another, won't there?

@jdantonio


Contributor

jdantonio commented Nov 22, 2015

Yes, MRI has happens-before semantics, but not as a guarantee; that's a side effect. If you are only writing for MRI you generally don't have to worry. My personal preference is to use the atomic variables in concurrent-ruby: then you take advantage of our stronger guarantees and also support the other runtimes. But I'm biased. :-)
