Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If the OS allows it, build the docs (ri and rdoc) in a forked background process #352

Merged
merged 4 commits into from Jul 6, 2012

Conversation

@alexch
Copy link
Contributor

alexch commented Jul 6, 2012

ri and rdoc are amazingly great, but sadly, many people are impatient, so the Ruby world is filled with advice to add --no-rdoc and --no-ri everywhere, and people are then forced to use their web browsers and visit ruby-doc.org when a quick "ri titleize" would suffice, with much less risk of flow interruption.

This patch attempts to resolve this tension between install-time and dev-time impatience by forking a background process to build the docs after the gems have been installed. The only risk will be if someone tries to use ri or rdoc during the seconds or minutes immediately after the install, which is unlikely and at any rate, there are enough messages that people should realize the docs aren't quite built yet. And all that would happen is that the docs wouldn't be found; no risk of corruption or even misleading errors as far as I can tell.

The code is fairly elegant but it currently relies on the fact that the only user of the done_installing_hooks is the documentation builder. If any other code wants to add hooks then we'd have to make it a little more configurable -- possibly allowing different status messages, possibly adding a new hook array for just backgroundable tasks and restoring done_installing_hooks to just run synchronously like before.

Is there a need to add a "--no-background" option? I can't think of a real use case. If we're on Windows (or another forkless Ruby) it'll revert to synchronous. Perhaps rubygems.org needs to build docs synchronously, but if it's using the RDoc code directly this patch won't affect it.

One thing I feel strongly about is that backgrounding doc gen should be the default behavior. If we hide this behind an option then we're just perpetuating the original problem -- the perception that "gem install" is slow so you should just not install docs. (Which you actually should.)

There are no unit tests yet. It's kind of difficult to unit test fork but if you insist I could try to give it a go. Maybe we'd mock fork just to assert that it was called without actually forking anything during the test.

I just tried a clean install of rails and here are some stats:

time to install 2 dozen gems: 13sec
time to build docs for same: 238sec

So I guess it's no wonder people prefer --no-rdoc...

@travisbot

This comment has been minimized.

Copy link

travisbot commented Jul 6, 2012

This pull request fails (merged 322fac3 into f65ea70).

@alexch

This comment has been minimized.

Copy link
Contributor Author

alexch commented Jul 6, 2012

Interesting. Those tests are failing because the temporary dir no longer exists by the time the background process runs. Looks like I need to worry about unit tests after all.

@travisbot

This comment has been minimized.

Copy link

travisbot commented Jul 6, 2012

This pull request passes (merged 55df43d into f65ea70).

@luislavena

This comment has been minimized.

Copy link
Member

luislavena commented Jul 6, 2012

I'm not fond with this change beyond posix or windows aspect.

I would rather invest time in making ri lazily generate docs of newer gems
than firing a background job that will take considerable amount of time (eg
any rails gem documentation).

Spawned process could run wildly without control of you install 30 gems in
one row (like installing rails)

Sorry for top posting. Sent from mobile.
On Jul 6, 2012 12:13 AM, "Alex Chaffee" <
reply@reply.github.com>
wrote:

ri and rdoc are amazingly great, but sadly, many people are impatient, so
the Ruby world is filled with advice to add --no-rdoc and --no-ri
everywhere, and people are then forced to use their web browsers and visit
ruby-doc.org when a quick "ri titleize" would suffice, with much less
risk of flow interruption.

This patch attempts to resolve this tension between install-time and
dev-time impatience by forking a background process to build the docs after
the gems have been installed. The only risk will be if someone tries to use
ri or rdoc during the seconds or minutes immediately after the install,
which is unlikely and at any rate, there are enough messages that people
should realize the docs aren't quite built yet. And all that would happen
is that the docs wouldn't be found; no risk of corruption or even
misleading errors as far as I can tell.

The code is fairly elegant but it currently relies on the fact that the
only user of the done_installing_hooks is the documentation builder. If any
other code wants to add hooks then we'd have to make it a little more
configurable -- possibly allowing different status messages, possibly
adding a new hook array for just backgroundable tasks and restoring
done_installing_hooks to just run synchronously like before.

Is there a need to add a "--no-background" option? I can't think of a real
use case. If we're on Windows (or another forkless Ruby) it'll revert to
synchronous. Perhaps rubygems.org needs to build docs synchronously, but
if it's using the RDoc code directly this patch won't affect it.

One thing I feel strongly about is that backgrounding doc gen should be
the default behavior. If we hide this behind an option then we're just
perpetuating the original problem -- the perception that "gem install" is
slow so you should just not install docs. (Which you actually should.)

There are no unit tests yet. It's kind of difficult to unit test fork
but if you insist I could try to give it a go. Maybe we'd mock fork just to
assert that it was called without actually forking anything during the test.

I just tried a clean install of rails and here are some stats:

time to install 2 dozen gems: 13sec
time to build docs for same: 238sec

So I guess it's no wonder people prefer --no-rdoc...

You can merge this Pull Request by running:

git pull https://github.com/alexch/rubygems fork_docs

Or you can view, comment on it, or merge it online at:

#352

-- Commit Summary --

  • ignore .idea (RubyMine settings) dir
  • if possible, build documentation in a background process, so people can
    start using their newly-installed gems sooner
  • add timing info

-- File Changes --

M .gitignore (3)
M lib/rubygems/dependency_installer.rb (26)

-- Patch Links --

https://github.com/rubygems/rubygems/pull/352.patch
https://github.com/rubygems/rubygems/pull/352.diff


Reply to this email directly or view it on GitHub:
#352

@alexch

This comment has been minimized.

Copy link
Contributor Author

alexch commented Jul 6, 2012

Spawned process could run wildly without control of you install 30 gems in one row (like installing rails)

Nope. Only one process forks; that process generates each gem's docs in turn, just like it happens now (in the original process).

I would rather invest time in making ri lazily generate docs of newer gems
than firing a background job that will take considerable amount of time (eg
any rails gem documentation).

I don't understand. Installing new gems always generates new docs. And the fact that it takes time is entirely the point of this patch.

And this patch is really pretty simple; it took me less than an hour to write and the behavior is isolated to 2 methods (plus a config setting to allow tests to disable it).

@luislavena

This comment has been minimized.

Copy link
Member

luislavena commented Jul 6, 2012

Forked process will run, in the background generating documentation for all
the gems you just installed.

A pristine install of rails means around 26 docs of gems that will be
processed by a single uncontrolled process that is running in the
background that you aren't aware of.

Or is parent process waiting for it to complete? Guess no, if not defeats
the purpose of "async" you stated.

Or I'm missing something? Haven't played with your changes so I'm basing my
comments in your description and a quick look to the chsbges.

There was a lazy rdoc generator for "gem server", I would like something
similar for ri.

Invoking "ri" coud trigger generation of docs for the newer gems ri is not
aware of.

Sorry for top posting. Sent from mobile.
On Jul 6, 2012 1:19 AM, "Alex Chaffee" <
reply@reply.github.com>
wrote:

Spawned process could run wildly without control of you install 30 gems
in one row (like installing rails)

Nope. Only one process forks; that process generates each gem's docs in
turn, just like it happens now (in the original process).

I would rather invest time in making ri lazily generate docs of newer
gems
than firing a background job that will take considerable amount of time
(eg
any rails gem documentation).

I don't understand. Installing new gems always generates new docs. And the
fact that it takes time is entirely the point of this patch.

And this patch is really pretty simple; it took me less than an hour to
write and the behavior is isolated to 2 methods (plus a config setting to
allow tests to disable it).


Reply to this email directly or view it on GitHub:
#352 (comment)

@alexch

This comment has been minimized.

Copy link
Contributor Author

alexch commented Jul 6, 2012

A pristine install of rails means around 26 docs of gems that will be
processed by a single uncontrolled process that is running in the
background that you aren't aware of.

Yes and no. Yes, it's in the background; no, you're not unaware of it,
since it's still emitting progress reports (and eventually a
completion report) to the console. It's a way of giving you back the
steering wheel, maybe letting you start running your tests or your
server or edit some files during the 238 seconds you would otherwise
have been staring at the screen.

If you want to be able to control it, we could emit its pid -- but
that seems excessively nerdy.

Alternately, if it's too noisy, we could redirect its output to a log
file, but personally I like the current behavior.

Lazy ri generation doesn't seem great to me since the first time you
call "ri reverse" it would have to generate docs for all (new) gems
just in case one of them defines a public method named "reverse" --
which would mean people would start thinking "ri is slow" and keep not
using it. I guess lazy gen would work better for rdoc since there
you're always asking for a single gem's docs.

@zenspider

This comment has been minimized.

Copy link
Contributor

zenspider commented Jul 6, 2012

Typing 'ri' and having to wait as it parses and indexes those 30 gems is worse.

I'd rather we actually try this than reject it out of fear of lack of control. We can improve it or roll it back if it doesn't work out.

zenspider added a commit that referenced this pull request Jul 6, 2012
+ If the OS allows it, build the docs (ri and rdoc) in a forked background process. (alexch)
@zenspider zenspider merged commit 31f87d9 into rubygems:master Jul 6, 2012
@luislavena

This comment has been minimized.

Copy link
Member

luislavena commented Jul 6, 2012

I'd rather we actually try this than reject it out of fear of lack of control.

Is not actually fear, generating the docs for gems forks a background process, that process will keep adding stuff to the same output (attached), so is not done, stuff is still going on.

But heck, what do I know about that...

@@ -304,6 +304,8 @@ def test_execute_rdoc
assert_equal 0, e.exit_code
end

Process.wait rescue Errno::ECHILD if Process.respond_to?(:fork)

This comment has been minimized.

Copy link
@bhenderson

bhenderson Jul 6, 2012

what's the Errno::ECHILD for? It seems like this line isn't doing what it was intended to do. (same for line 277)

This comment has been minimized.

Copy link
@alexch

alexch Jul 7, 2012

Author Contributor

If the child process has already exited, Process.wait will raise it. Note that the ri docs for Process.wait say it will raise a SystemError but actually, Errno::ECHILD < SystemCallError < StandardError (tested under 1.9.3 and 1.8.7). You can verify this by simply calling Process.wait from irb.

This comment has been minimized.

Copy link
@alexch

alexch Jul 7, 2012

Author Contributor

To be more precise, the line is to wait for the process to exit, but not to fail if either there never was a child process forked, or if the process has already been reaped by an earlier Process.wait.

This comment has been minimized.

Copy link
@bhenderson

bhenderson Jul 7, 2012

thanks @alexch for getting back to me. What I mean is that when you rescue like that, you're describing the return value, not the Exception class to rescue.

>> Process.wait rescue Errno::ECHILD
=> Errno::ECHILD
>> Process.wait rescue 'return value'
=> "return value"

also,

>> begin Process.wait; rescue; Errno::ECHILD end
=> Errno::ECHILD
>> begin Process.wait; rescue Errno::ECHILD; end
=> nil

(I only have 1.8.7 in front of me to test, but I'm pretty sure 1.9.x does the same)

I realize this is a very minor issue. I do think it would be nice if ruby allowed for specifying the exception class in this way.

thanks for your time.

This comment has been minimized.

Copy link
@alexch

alexch Jul 7, 2012

Author Contributor

Ouch! You just dropped some knowledge on me.

You're right; as far as I can tell the one-line rescue's argument is
merely a return value, not a specifier. I'll make a new PR to fix that
(since in the rare case Process.wait raises a different exception
it'd be nice not to swallow it).

On Sat, Jul 7, 2012 at 9:41 AM, Brian Henderson
reply@reply.github.com
wrote:

@@ -304,6 +304,8 @@ def test_execute_rdoc
assert_equal 0, e.exit_code
end

  • Process.wait rescue Errno::ECHILD if Process.respond_to?(:fork)

thanks @alexch for getting back to me. What I mean is that when you rescue like that, you're describing the return value, not the Exception class to rescue.

>> Process.wait rescue Errno::ECHILD
=> Errno::ECHILD
>> Process.wait rescue 'return value'
=> "return value"

also,

>> begin Process.wait; rescue; Errno::ECHILD end
=> Errno::ECHILD
>> begin Process.wait; rescue Errno::ECHILD; end
=> nil

(I only have 1.8.7 in front of me to test, but I'm pretty sure 1.9.x does the same)

I realize this is a very minor issue. I do think it would be nice if ruby allowed for specifying the exception class in this way.

thanks for your time.


Reply to this email directly or view it on GitHub:
https://github.com/rubygems/rubygems/pull/352/files#r1118440

Alex Chaffee - alex@stinky.com
http://alexchaffee.com
http://twitter.com/alexch

This comment has been minimized.

Copy link
@alexch

alexch Jul 7, 2012

Author Contributor

This comment has been minimized.

Copy link
@bhenderson

bhenderson Jul 7, 2012

great. thanks.

This comment has been minimized.

Copy link
@zenspider

zenspider Jul 7, 2012

Contributor

applied

@raggi

This comment has been minimized.

Copy link
Contributor

raggi commented Jul 7, 2012

This won't work on JRuby.

@alexch

This comment has been minimized.

Copy link
Contributor Author

alexch commented Jul 8, 2012

This won't work on JRuby.

Well, that's a shame. The docs for MRI Process.fork clearly say "If
fork is not usable, Process.respond_to?(:fork) returns false." but on
JRuby 1.6.1 (Ruby 1.8.7) Process.fork raises a NotImplementedError.
Arguably a better solution, but still a departure from spec. We can
check for that exception.

That'll fix it failing. As for not working, well, that's why the
subject says "if the OS allows it". :-)

@alexch

This comment has been minimized.

Copy link
Contributor Author

alexch commented Jul 8, 2012

thanks @raggi ! fixed in 6f87401

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants
You can’t perform that action at this time.