Execute native git commands directly #48

Merged
21 commits merged into from
Dec 23, 2010

rtomayko
Collaborator

This removes as many as 3 fork/execs from all native git calls, makes it possible to retrieve the git process's exit status, removes the need to shell escape command arguments, and allows setting environment variables on the child process without changing the parent process's environment.

Previously, #native calls created a process hierarchy like this:

grit ruby process
- <fork again to reparent under init>
  - /bin/sh -c '/usr/bin/env git ...'
    - /usr/bin/env ...
      - /usr/local/bin/git ...

This #native implementation forks once and execs git directly:

grit ruby process
  - /usr/local/bin/git ...
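
As a rough illustration of the single fork/exec shape described above, here is a minimal sketch. It is not Grit's actual #native code: `run_direct` is a hypothetical helper name, and it assumes a platform that has fork(2). The argv is passed as an array, so no shell is involved and nothing needs escaping, and the child's exit status comes back as a Process::Status.

```ruby
# Hypothetical helper, not Grit's real implementation.
# Runs argv as a direct child of this process (one fork, one exec, no
# /bin/sh or /usr/bin/env in between) and returns [stdout, Process::Status].
def run_direct(*argv)
  reader, writer = IO.pipe
  pid = fork do
    begin
      reader.close
      $stdout.reopen(writer)  # child's stdout now points at the pipe
      writer.close
      # The [path, argv0] form makes exec skip the shell even when the
      # command is a single word with no arguments.
      exec([argv[0], argv[0]], *argv[1..-1])
    rescue SystemCallError
      exit!(127)              # exec failed, e.g. command not found
    end
  end
  writer.close
  out = reader.read           # read until the child closes its stdout
  reader.close
  _, status = Process.wait2(pid)  # also sets $?
  [out, status]
end
```

A call such as `run_direct("git", "rev-parse", "HEAD")` would run git as a direct child, with `status.exitstatus` holding its exit code. This sketch only captures stdout; feeding stdin or capturing stderr needs more care.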

Some light benchmarks show decent gains.

Linux:

$ uname -a
Linux fe1.rs.github.com 2.6.26-2-amd64 #1 SMP Thu Sep 16 15:56:38 UTC 2010 x86_64 GNU/Linux
$ ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]
$ ruby native-benchmark.rb 
                                    user     system      total        real
sh                              0.510000   2.170000   5.520000 ( 13.555887)
execute                         0.240000   1.280000   5.620000 (  6.161315)

Mac:

$ uname -a
Darwin asha.local 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:28:53 PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386
$ ruby -v
ruby 1.8.7 (2010-06-23 patchlevel 299) [i686-darwin10.4.0]
$ ruby native-benchmark.rb 
                                    user     system      total        real
sh                              0.440000   2.510000   3.520000 (  8.327763)
execute                         0.330000   2.370000   4.860000 (  5.868368)

Even without the performance benefits, the ability to retrieve the exit status and set up the child's environment are much-needed additions that will let us clean up some janky code elsewhere.

The old Open3-based Git#sh and Git#run methods remain and are used as a fallback for non-POSIX systems (I assume Open3 works on platforms without fork(2) somehow) and also in cases where #native is called with a pipeline as the last argument.

I'll be testing this out in GitHub staging/production environments this week.

Avoids starting a /bin/sh and /usr/bin/env process on each
native command invocation, and will allow exec'ing the
command directly.
This removes some overhead from all native git calls in the
following ways:

 - Removes a fork previously performed by Open3, which double
   forks to avoid needing to Process::wait.
 - Removes the need to shell escape arguments, since the git
   process's argv is passed explicitly as an array.
 - Removes the /bin/sh process (1 fork/exec)

Additionally, these changes allow obtaining the git process's exit
status, available as $? after any native git command invocations.
This is mostly so it works over RPC.
Pretty awesome. And the select(2) based implementation will fix a
long-standing bug where the grit process will hang when a git
process writes more than PIPE_BUF bytes to stderr or when the input
written to the git process's stdin exceeds PIPE_BUF. The old popen3
based logic writes all of stdin, then reads all of stdout, then
reads all of stderr so everything except stdout had to come in under
PIPE_BUF. This hasn't been much of an issue but is critical to our
plans on using `git cat-file --batch' and writing a bunch of SHA1s
on stdin.
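
The deadlock described above, and the select(2) approach to avoiding it, can be sketched roughly as follows. This is an illustration under assumptions, not the code from this pull request: `communicate` is a hypothetical name, the 4096-byte chunk size is arbitrary, and it uses Process.spawn and String#byteslice, so it needs Ruby 1.9.3 or later.

```ruby
# Hypothetical helper: writes `input` to the child's stdin while draining
# stdout and stderr at the same time, so no pipe can fill up and stall
# either process. Returns [stdout, stderr, Process::Status].
def communicate(argv, input = "")
  in_r, in_w   = IO.pipe
  out_r, out_w = IO.pipe
  err_r, err_w = IO.pipe
  pid = Process.spawn(*argv, :in => in_r, :out => out_w, :err => err_w)
  [in_r, out_w, err_w].each { |io| io.close }  # parent keeps its own ends only

  buffers = { out_r => String.new, err_r => String.new }
  readers = [out_r, err_r]
  writers = [in_w]
  if input.empty?
    in_w.close
    writers.clear
  end
  offset = 0

  until readers.empty? && writers.empty?
    ready_r, ready_w = IO.select(readers, writers)
    ready_r.each do |io|
      begin
        buffers[io] << io.readpartial(4096)
      rescue EOFError
        io.close
        readers.delete(io)
      end
    end
    ready_w.each do |io|
      begin
        offset += io.write_nonblock(input.byteslice(offset, 4096))
      rescue IO::WaitWritable
        next                        # pipe is full right now, try again later
      rescue Errno::EPIPE
        offset = input.bytesize     # child closed stdin early, stop writing
      end
      if offset >= input.bytesize
        io.close                    # signals EOF to the child
        writers.delete(io)
      end
    end
  end

  _, status = Process.wait2(pid)
  [buffers[out_r], buffers[err_r], status]
end
```

With a helper along these lines, a call like `communicate(["git", "cat-file", "--batch"], shas.join("\n") + "\n")` could feed a long list of SHA1s on stdin (`shas` being a hypothetical array of object names) without the sequential write-then-read ordering that caused the hang.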

Also moving toward using a common spawn method interface that's a
compatible subset of the Process.spawn method built into Ruby >=
1.9.1. The hope is that most non-MRI platforms will eventually
support Process.spawn out of the box and that the ones that don't
will have backports.
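
For reference, this is roughly what the built-in Process.spawn interface looks like on Ruby 1.9 and later. The wrapper name `spawn_capture` is hypothetical and is not the spawn method added by these commits; the sketch only shows how an environment hash and the :chdir option apply to the child without touching the parent.

```ruby
# Hypothetical wrapper around the built-in Process.spawn (Ruby >= 1.9).
# env:   Hash of environment variables set in the child only
# argv:  Array holding the command and its arguments (no shell involved
#        when it has more than one element)
# chdir: working directory for the child only
# Returns [stdout, Process::Status].
def spawn_capture(env, argv, chdir = Dir.pwd)
  reader, writer = IO.pipe
  pid = Process.spawn(env, *argv, :chdir => chdir, :out => writer)
  writer.close
  out = reader.read
  reader.close
  _, status = Process.wait2(pid)
  [out, status]
end
```

For example, `spawn_capture({"GIT_DIR" => "/path/to/repo.git"}, ["git", "rev-parse", "HEAD"])` would point git at a repository (the path here is a placeholder) while leaving the parent's ENV and working directory as they were.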
@rtomayko
Collaborator Author

This is ready:

  • Reduces the number of fork/execs by between 2 and 4. Nice perf increase.
  • Removes the need to shell quote input arguments.
  • Ability to retrieve child process exit status.
  • Ability to set child process environment without affecting parent process.
  • Ability to set child process working dir / pwd without affecting parent process.
  • Handles arbitrarily large input, output, and error streams. We can pipe stuff into commands without temp files and without worrying about hitting PIPE_BUF.
  • Based around Ruby 1.9's Process::spawn which will hopefully be a portability win in the future when support is added to JRuby/Windows.

Tested under various versions of MRI 1.8.7, REE 1.8.7, and 1.9.2-p0. Not all of these features work under JRuby yet but it works as well as it did previously.

This pull request was closed.