Cache git remotes #2482

Closed
wants to merge 1 commit into
from

Projects

None yet

4 participants

@guybrush

right now npm install git+ssh://foo.com/bar.git#v0.0.1 will (see https://github.com/isaacs/npm/blob/master/lib/cache.js#L280-326 ):

  • clone the git-repo into /tmp/random
  • then it will checkout the gitref v0.0.1
  • then it will run addLocalDirectory(/tmp/random)

this patch will make it do (u is the git-remote):

  1. path.exists(cache+'/.gitRemotes/'+sha1(u)) ? 3. : 2.
  2. git clone --bare u cache/.gitRemotes/sha1(u)
  3. cd cache/.gitRemotes/sha1(u) && git fetch
  4. git clone cache/.gitRemotes/sha1(u) /tmp/random
  5. cd /tmp/random && git checkout gitref
  6. addLocalDirectory(/tmp/random)

that way npm only needs to clone repositories (per git-remote) 1 time and will use git fetch after that

@guybrush

note that since ca15583 npm install git+ssh://git@github.com:isaacs/npm.git does not work anymore. this PR depends on ca15583

git supports (see http://www.kernel.org/pub/software/scm/git/docs/v1.5.6/git-fetch.html#_git_urls_a_id_urls_a )

  • rsync://host.xz/path/to/repo.git/
  • http://host.xz/path/to/repo.git/
  • https://host.xz/path/to/repo.git/
  • git://host.xz/path/to/repo.git/
  • git://host.xz/~user/path/to/repo.git/
  • ssh://[user@]host.xz[:port]/path/to/repo.git/
  • ssh://[user@]host.xz/path/to/repo.git/
  • ssh://[user@]host.xz/~user/path/to/repo.git/
  • ssh://[user@]host.xz/~/path/to/repo.git

and

  • [user@]host.xz:/path/to/repo.git/
  • [user@]host.xz:~user/path/to/repo.git/
  • [user@]host.xz:path/to/repo.git

so npm should only allow

  • npm install git+ssh://git@github.com/isaacs/npm.git
  • npm install git://git@github.com/isaacs/npm.git
  • npm install git@github.com:isaacs/npm.git

which is, the colon (scp-style) works only without defining a protocol

i might be wrong, any thoughts on that?

related: #2453

@guybrush

i found out about require('zlib') :)

so now it is

  1. path.exists(cache+'/.gitRemotes/'+sha1(u)) ? 3. : 2.
  2. git clone --bare u cache/.gitRemotes/sha1(u)
  3. cd cache/.gitRemotes/sha1(u) && git fetch
  4. git archive /tmp/random.tgz
  5. addLocalTarball(/tmp/random.tgz)
@guybrush

did a git rebase to master

@scriby scriby referenced this pull request Oct 4, 2012
Closed

git URL update speed #2855

@isaacs isaacs and 1 other commented on an outdated diff Oct 7, 2012
lib/cache.js
log.verbose("addRemoteGit", [u, co])
- var tmp = path.join(npm.tmp, Date.now()+"-"+Math.random())
- mkdir(path.dirname(tmp), function (er) {
- if (er) return cb(er)
- exec( npm.config.get("git"), ["clone", u, tmp], null, false
- , function (er, code, stdout, stderr) {
- stdout = (stdout + "\n" + stderr).trim()
- if (er) {
- log.error("git clone " + u, stdout)
- return cb(er)
- }
- log.verbose("git clone "+u, stdout)
- exec( npm.config.get("git"), ["checkout", co], null, false, tmp
+ var p = path.join(npm.config.get("cache"), ".gitRemotes", v)
+ path.exists(p, function (exists) {
@isaacs
isaacs Oct 7, 2012

Don't use path.exists. Use fs.stat, and verify that it's actually the sort of thing you want it to be. (Ie, st.isDirectory() etc.)

@guybrush
guybrush Oct 19, 2012

what should i do if the path exists but is not a directory, just delete it?

@isaacs isaacs commented on an outdated diff Oct 7, 2012
lib/cache.js
log.verbose("addRemoteGit", [u, co])
- var tmp = path.join(npm.tmp, Date.now()+"-"+Math.random())
- mkdir(path.dirname(tmp), function (er) {
- if (er) return cb(er)
- exec( npm.config.get("git"), ["clone", u, tmp], null, false
- , function (er, code, stdout, stderr) {
- stdout = (stdout + "\n" + stderr).trim()
- if (er) {
- log.error("git clone " + u, stdout)
- return cb(er)
- }
- log.verbose("git clone "+u, stdout)
- exec( npm.config.get("git"), ["checkout", co], null, false, tmp
+ var p = path.join(npm.config.get("cache"), ".gitRemotes", v)
@isaacs
isaacs Oct 7, 2012

The folder should not start with a dot, or contain capital letters. _git-remotes would be better.

@isaacs isaacs and 1 other commented on an outdated diff Oct 7, 2012
lib/cache.js
log.verbose("addRemoteGit", [u, co])
- var tmp = path.join(npm.tmp, Date.now()+"-"+Math.random())
- mkdir(path.dirname(tmp), function (er) {
- if (er) return cb(er)
- exec( npm.config.get("git"), ["clone", u, tmp], null, false
- , function (er, code, stdout, stderr) {
- stdout = (stdout + "\n" + stderr).trim()
- if (er) {
- log.error("git clone " + u, stdout)
- return cb(er)
- }
- log.verbose("git clone "+u, stdout)
- exec( npm.config.get("git"), ["checkout", co], null, false, tmp
+ var p = path.join(npm.config.get("cache"), ".gitRemotes", v)
+ path.exists(p, function (exists) {
+ if (exists) return archive()
@isaacs
isaacs Oct 7, 2012

If the path exists and is a folder, then you still ought to do a git fetch -a origin in it, so that you pick up new refs.

@guybrush
guybrush Oct 19, 2012

I do git fetch in the archive() function, will change it to git fetch -a origin.

@scriby

I just wanted to say thanks for working on this! It will really help out on my project.

@guybrush

oh i just noticed the comments from isaacs, sweet! will try to fix the things later today and rebase, this would really help me too :)

@guybrush

to sum up what i changed (its all only in addRemoteGit()):

  • added gitEnv() to all the git-exec()'s
  • .gitRemotes -> _git-remotes
  • path.exists(cacheDir) -> check if all of the following is true, if not rm(chacheDir) and do a fresh clone (i am not sure about this)
    • fs.stat(cacheDir) && stats.isDirectory()
    • exec("git config --get remote.origin.url") == u
  • git fetch -> git fetch -a origin
  • joined stdout and stderr for logging after all the exec's like everywhere else (stdout = (stdout + "\n" + stderr).trim())
  • added some "("+u+")" to some log-messages, to give a hint about the repo
  • listen for close on the git fetch -a origin process before doing addLocalTarball to make sure the gzip-stream ended (i am not sure about this)

so with this patch npm caches git-remotes like this:

  1. cacheDir = path.join(cache,'_git-remotes',sha1(u))
  2. checkGitDir(cacheDir) ? 4. : 3. (rm cacheDir if necessary)
  3. git clone --mirror u cacheDir
  4. cd cacheDir && git fetch -a origin
  5. git archive /tmp/random.tgz --format=tar --prefix=package/
  6. addLocalTarball(/tmp/random.tgz)

since i am very unsure where/how to put tests in npm i made a simple test-module :D

@scriby

poke

@guybrush

@scriby until @isaacs responds you can use the feature by installing the branch with npm install -g git://github.com/guybrush/npm.git#cacheGitRemotes, I will try to keep the branch up2date/rebased to master. at least I am using it in production to install my stuff and it works fine so far - though I dont have a huge server-farm or something where this is super-critical.

I understand that this patch is somehow critical and does not add a popular demanded feature (and is not very clean yet?). so it may take some time to convince @isaacs to even spend some time in reviewing this or even implement this himself, i guess :)

also I will try to poke @isaacs in irc from time to time :p

@guybrush guybrush referenced this pull request in substack/rolling-reduce Dec 7, 2012
Closed

Add component.json #1

@domenic
npm member

+1

@domenic domenic referenced this pull request in requirejs/text Jan 6, 2013
Closed

Publish to npm #32

@isaacs
npm member

Landed on master with a little bit of fixup. Thanks, this is a big improvement!

@isaacs isaacs closed this Jan 15, 2013
@scriby

Thanks!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment