
Implements an evented file system monitor #22254

Merged
merged 36 commits into from
Nov 11, 2015
Conversation

fxn
Member

@fxn fxn commented Nov 11, 2015

Implements an evented file system monitor to asynchronously detect changes in the application source code, routes, locales, etc.

To opt in, load the listen gem in the Gemfile:

group :development do
  gem 'listen', '~> 3.0.4'
end

Original work by @puneet24 for GSoC 2015, later iterated by yours truly.

fxn added 30 commits November 8, 2015 22:49
This is the implementation of the file update checker written
by Puneet Agarwal for GSoC 2015 (except for the tiny version
of the listen gem, which was 3.0.2 in the original patch).

Puneet's branch became too out of sync with upstream. This is
the final work in one single clean commit.

Credit goes in the first line using a convention understood
by the contrib app.

3.0.3 has a bug in OS X.

This commit also bases everything on Pathname internally.
In particular, files are no longer created in the current working
directory, but in a temporary folder.

Mac OS X tries by all means to hide that /var is /private/var, and that is
what FSEvents reports back.
This sucks, but otherwise I get occasional F's (test failures) on Mac OS X.

"checker" is the name being used everywhere.
@rymai
Contributor

rymai commented Nov 11, 2015

❤️ 💚 💓 💛 💙

@PapePathe

+1111100000000

@fxposter
Contributor

@fxn shouldn't @updated be an atomic boolean, i.e. https://github.com/ruby-concurrency/concurrent-ruby/blob/master/lib/concurrent/atomic/atomic_boolean.rb? If listen doesn't block itself, that means it will call the changed method in a different thread, and reading and writing from different threads requires the variable to be atomic. It should work in MRI due to its implementation and the GIL, but @jruby will need an atomic var here. Not sure about @rubinius, but it probably needs an atomic too. /cc @headius @brixen

@fxn
Member Author

fxn commented Nov 14, 2015

For the use case of this monitor it probably does not matter. If you edit a file at the same time a request fires... well, you probably can't know whether it should reload or not anyway.

Once the flag is true, it stays true.

Also, dev mode is single-threaded.

But if you guys have a good justification for its need, we could change it.

@fxposter
Contributor

The problem is that if one thread actually sets the variable to true, you can't guarantee that the other thread will read true and not the old false value cached somewhere inside the processor's caches. You can try reading http://shipilev.net/blog/2014/jmm-pragmatics/, but in your case I don't see any guarantee that there is a "happens-before" bridge between writing @updated and reading it, which means it is actually allowed for one thread to "cache" the false value forever, even if other threads actually change it.
And that is before even getting into analyzing the impact of "simultaneous requests and files changing" :)

@fxn
Member Author

fxn commented Nov 14, 2015

@fxposter really? I was not aware of that possibility. I believed it opened the door to a race condition, but that eventually the shared value would be updated; that is, atomic vs. not atomic.

The main use case for these monitors is not multi-threaded, but I'll read that document you linked to. Meanwhile, please anybody feel free to chime in.

@brixen

brixen commented Nov 14, 2015

I don't think it's my place to suggest anything about how you want to implement this, but I will clarify that Rubinius does not have a GIL/GVL and that instance variables do not have volatile semantics. If you want multiple threads to have precisely the same view of a memory location at a point in time, you must use a Mutex, memory fence, or other operation that implies synchronizing the view of that memory location across threads.

@fxn
Member Author

fxn commented Nov 14, 2015

Yes, yes, code that is not thread-safe is not thread-safe: there may be race conditions and the outcome may be non-deterministic. That is clear.

What is beyond my understanding is that without synchronization you are not only exposed to race conditions, but a stale value can be cached by the CPU forever (as said above). Do synchronization idioms actually flush caches in addition to preventing parallel execution?

Note that this particular code is not expected to be run by multiple threads, because its main use case is app reloading, which uses constant autoloading, which is not thread-safe per se as of this writing. But we could certainly consider making the monitor thread-safe, and in any case I find the observation made by @fxposter interesting and would like to understand it (reading the link is pending).

@fxposter
Contributor

That's not a realistic scenario, but if that variable for some reason is not evicted from the CPU cache by something else, the CPU does not need to discard it and "refresh" it; that's not how CPUs work. Anyway, that's not something I'd actually rely upon. :)

@fxn
Member Author

fxn commented Nov 14, 2015

@fxposter but the memory is shared. If you do not synchronize there may be race conditions that affect who writes last (and, in a general scenario, which value ends up being stored). But synchronized or not, aren't the threads just writing to a memory location like anything else does? What is different here?

@fxposter
Contributor

@fxn yes, you are right: memory is shared. CPU caches are not. Threads can write to CPU caches, and nothing forces a cache line to be written back to main memory except synchronization or memory barriers.

@fxn
Member Author

fxn commented Nov 14, 2015

@fxposter ah! That was the bit I didn't know about: that threads can write to caches without going through memory. I thought the CPU cache was an optimization transparent to programs, whose only interface was memory. So a synchronized block forces the write-back?

@fxposter
Contributor

Yes, a synchronized block or volatile (atomic, in our case) variables. But if you want to make it synchronized, you have to synchronize both reads and writes.
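In stdlib terms, the idiom fxposter describes, guarding both sides with the same lock, can be sketched like this (hypothetical class and method names):

```ruby
# Both the write and the read of the shared flag take the same Mutex:
# the lock prevents interleaving and also acts as a memory barrier,
# so the reader cannot see a stale cached value.
class Flag
  def initialize
    @mutex   = Mutex.new
    @updated = false
  end

  # Writer side: e.g. called from the listener thread.
  def mark_updated
    @mutex.synchronize { @updated = true }
  end

  # Reader side: must take the same lock, or the visibility
  # guarantee does not hold.
  def updated?
    @mutex.synchronize { @updated }
  end
end
```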

@fxn
Member Author

fxn commented Nov 15, 2015

Awesome, thanks @fxposter!

@fxposter
Contributor

@fxn Thank you for improving Rails! 👍

@headius
Contributor

headius commented Nov 20, 2015

FWIW, I know Rails 5+ now depends on concurrent-ruby, which provides facilities for atomic and volatile variables. That's the recommended path forward rather than full synchronization, if you don't need locking.

@fxn
Member Author

fxn commented Nov 21, 2015

Revised in 49a5b40.

I chose to lock around the changed callback because, in theory, there could be two invocations of changed at a moment in which no update has happened. If both enter the body of the unless and the one that updates @updated last sets it to false, we would miss a possible true from the first one. Given the purpose of this class, any true has to be detected; you cannot miss one.

@fxposter does it look good to you?
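A simplified sketch of the locking pattern described above (a hypothetical class for illustration, not the exact Rails code): the changed callback takes a lock so that two concurrent invocations cannot race on @updated, and the flag is only reset deliberately, after execution.

```ruby
# Minimal evented-checker shape: listen invokes #changed from its own
# thread; the Mutex guards every access to @updated.
class EventedChecker
  def initialize(&block)
    @updated = false
    @mutex   = Mutex.new
    @block   = block
  end

  # Invoked by the listener from its own thread.
  def changed(modified, added, removed)
    @mutex.synchronize do
      @updated = true unless (modified + added + removed).empty?
    end
  end

  def updated?
    @mutex.synchronize { @updated }
  end

  def execute
    @block.call
  ensure
    @mutex.synchronize { @updated = false }
  end

  def execute_if_updated
    if updated?
      execute
      true
    else
      false
    end
  end
end
```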

@fxposter
Contributor

@fxn Commented in the commit. In general, I don't think we need a mutex there if we change the implementation a bit, but with the current implementation the mutex is needed and it will work.

@fxn
Member Author

fxn commented Nov 22, 2015

@fxposter It seems we are done. Thanks a lot for your feedback.

The fact that I didn't know that reading ivars needs synchronization even if the writer is synchronized tells me I need to dig into portable Ruby multithreading. I'll need to find a trustworthy reference; do you have any recommendation?

@fxposter
Contributor

Well, it's a tough question... I'd recommend reading up on the Java Memory Model and the C++ memory model (the more you read, the better). In terms of people: start with Gil Tene, Martin Thompson, Michael Barker, and Aleksey Shipilëv (he is Russian, but sometimes blogs and talks in English), then you'll probably find others (this is 99% about Java and the JMM). Jeff Preshing has a lot of interesting stuff on his blog (http://preshing.com/archives/; search "memory", "atomic", "barrier", "sync", "volatile"). Watching talks from Java/C++ conferences related to performance and multithreading also helps.

From the Ruby point of view: JRuby uses the JMM, see https://github.com/jruby/jruby/wiki/Concurrency-in-jruby. MRI uses a GIL, which basically means there is a happens-before relation between any read and write of a shared variable, because they are basically "synchronized" by the GIL (it doesn't mean they are atomic, but it does mean that writes from one thread are visible to the others). Otherwise, MRI has no specified behavior for how things should or shouldn't work. :(

Also, you can dig into internals of https://github.com/ruby-concurrency/concurrent-ruby/ and watch @jdantonio's talks on concurrent ruby (there are a couple of them).

@fxn
Member Author

fxn commented Nov 22, 2015

Awesome reply, thanks very much!!!

@jdantonio
Contributor

@fxn The talk I gave at RubyConf last week specifically addresses the GIL and why it doesn't make volatility/visibility promises. It may answer your question. The video isn't online at Confreaks yet, but it should be available within a few days. Keep an eye on this page for it to appear.

@fxposter Thanks for the shout-out!

@fxposter
Contributor

why it doesn't make volatility/visibility promises

@jdantonio wow, so MRI requires synchronization for reading/writing shared vars in different threads? Waiting for your talk too.

@jdantonio
Contributor

The TL;DR is that MRI makes no guarantees: it has no formal memory model. The GIL implicitly makes most variable reads and writes safe, but nothing would prevent future updates from changing that. There are many operations within MRI which silently release the GIL. The best practice is to not depend on the GIL for thread safety.

@fxposter
Contributor

nothing would prevent future updates from changing that

True, but we have no choice. :)

The best practice is to not depend on the GIL for thread safety.

But thread-safety and visibility are not the same thing.

There are many operations within MRI which silently release the GIL.

This is the most interesting thing. :) But as far as I understand, if you are in Ruby-only land (i.e. you don't use extensions) you are safe, because there will always be a "happens-before" relationship between writing a var in one thread and reading it in the other, won't there?

@jdantonio
Contributor

Yes, MRI has happens-before semantics, but not a guarantee. That's a side-effect. If you are only writing for MRI you generally don't have to worry. My personal preference is to use the atomic variables in concurrent-ruby. Then you can take advantage of our stronger guarantees and also support the other runtimes. But I'm biased. :-)
