Don't *always* update Git dependencies. #4191

Open
domenic opened this Issue Nov 26, 2013 · 26 comments

Projects

None yet
@domenic
Member
domenic commented Nov 26, 2013

In #4104, @robertkowalski fixed #1727 by making Git dependencies always update (i.e., always show as outdated). This is a big improvement over where we were, when they would never update! But it could be better.

Ideally:

  • If package.json points to a commit hash and the local copy is already at that commit hash, do no HTTP requests.
  • If package.json points to a tag or branch name, go fetch the metadata for that tag/branch from the repo, and check if the commit hash matches the local one. If so, do a full tarball download and unpack; otherwise, do no further HTTP.
  • If package.json points to no tag (i.e. master), go fetch the metadata for master, and proceed as in the previous case.

I'm not sure how much of this is possible given what information and technology we have available. (How do you tell what commit the local copy is on? What does 'getting metadata" mean? Are there easy ways to do this stuff for GitHub, but not for arbitrary Git servers?) I would bet, though, that Git-based package managers like Bower or Volo have already figured this out.

@guybrush
Contributor

also:

  • If package.json points to a commit hash and the local copy is not at that commit, check if the commit exists in the local cache of the repository. if it exists pull out the tarball, otherwise fetch the remote and try to pull out tarball afterwards

and in general check if the target-commit (if provided) exists in the cache before fetching

@vjpr
vjpr commented Jan 23, 2014

Is there some way to prevent all the dependencies of git packages being re-downloaded when a package changes?

Say I split my project into modules and deploy to Heroku, node_modules is cached for my main module, but other modules as git dependencies have all their dependencies re-downloaded and rebuilt.

Maybe an option to tell npm not to clobber nested node_modules when a git dep changes. This would allow a custom build script to handle the nested node_modules caching and invalidation.

@guybrush
Contributor

ok i tried to fix it and i am hanging at a problem where i dont see a good solution right now: how do we distinguish a commit-hash from a branch/tag-name? because without knowing if a commit-hash is meant or a git-ref we have to fetch the origin to be sure that whatever is up-to-date when you define a git-url-dependency: "foo" : "git/url#whatever".

so the first idea that pops up in my head would be to introduce another seperator (besides #) which indicates that you mean a commit-hash and no git-ref (which seems a bad idea..).

another idea would be, to look for a commit in the local repo with that hash. if it is there just use it and assume that whatever is a commit-hash

thoughts?

(if the way it works now sucks really hard for you, i recommend to bundle the dependencies)

@qraynaud

Once you have downloaded the ref once, you can know. And since you need to download it at least the first time that's not really a problem (you just have to checkout the repository, pull to the indicated ref and compare the current hash to the provided reference).

Or am I missing something?

@scottnonnenberg

+1 for a fix. This is slowing down installs for any package with git dependencies, even if the references are using tags. Just reverted to npm@1.3.11. :0(

@guybrush
Contributor

@scottnonnenberg with my PR #4667 it will skip fetching but only when you use commit-hashes (since we cant be sure if tags/branches haven't moved). also the defined commit-hash has to be at least 4 characters (currently).

@scottnonnenberg

@guybrush Glad to see some movement on this stuff. However, restricting the right behavior to commit-hashes seems like we're optimizing for the degenerative case - it's a bad idea idea to change published history, and that includes tags: http://sethrobertson.github.io/GitBestPractices/#pubonce

Seems to me, if a tag changes, it's reasonable to ask folks to manually remove that dependency and reinstall it. Bower behaves this way - it's entirely built on git tags, and keeps a local cache of that tag for future installs of that version until you manually remove it.

Then you're only left with branch vs. tag differentiation. If you're pointing to a branch, yes, reinstall all the time. If a tag, leave it.

@domenic
Member
domenic commented Mar 25, 2014

it's reasonable to ask folks to manually remove that dependency and reinstall it.

-1, this is unreasonable. The fact that bower does this simply means that bower is conflating a version control system with a versioned registry.

@scottnonnenberg

@domenic I think you've hit the nail on the head. The problem is that we're acting like we can be 100% clean with a dependency straight to git. But git has behaviors that don't map directly to the npm registry. For example, a commit hash can be removed outright from a remote repository. It's another bad practice, but it can be done.

The point is that we need to design a system that works as most people expect it would, and works consistently. Given the definition of a tag, and the fact that it takes interesting longhand commands to change a tag in a remote repository, I think we'll continue to butt up against user expectation without a change.

@guybrush
Contributor

when you remove a git-hash like you said, it will just not install anything and throw an error (i.e. npm). it wont just install something.

also see http://blog.npmjs.org/post/77758351673/no-more-npm-publish-f

Even if a package is 100% unpublished by the author, and you publish a new version of a brand new thing, you won’t be able to use the version numbers that the previous author used.

The only way around this will be for a server admin to manually intervene.

The net benefit is that you won’t be surprised by having different things show up when you and someone else both have foo@1.2.3 installed. Either it’ll work, or it won’t, but it won’t ever be a completely different thing.

which comes pretty close to a git-hash, when you put a hash in your package.json you can be pretty sure you get what you meant (or nothing)

also my PR doesnt prevent you from using tags, its just pessimistic about tags/branches since they can move and thus the git-repo must be fetched every time

@scottnonnenberg

Hey guys. I appreciate that you took the time to talk through this with me thus far.

Full disclosure: I am using git+ssh dependencies with semver tags as a kind of local npm registry. It worked until that recent npm change, so I'm on npm@1.3.11 until either the previous tag behavior is restored, or I find a lightweight private npm registry I can use. I've found Reggie and sinopia so far - perhaps you guys have recommendations?

One final bit of support for the previous behavior: There's a lot of history on the side of caching tags - the second-most popular answer at this StackOverflow question on private npm modules, from 2011, talks about using git+ssh/tags. I'm pretty sure that the majority of the people using this technique either haven't upgraded to the latest node/npm yet, or just haven't noticed that their deploys are taking longer. They will, as I did, and consider it a breaking change.

So if we don't go back to the previous behavior, we should at least have a good set of alternative solutions for the private modules scenario.

@othiym23 othiym23 added this to the cache rewrite milestone Oct 3, 2014
@sandinmyjoints

What if there were a flag --no-refetch-git that skipped refetching already-installed git-specified deps entirely? Then using checked-in node_modules/ with git+ssh private repos as a poor man's private registry could use tags and branches, instead of only being able to use commit hashes. (Can't wait until private npm packages are available without having to run your own registry!)

@rluba
rluba commented Mar 11, 2015

This issue slows us down massively.

If you have a Git dependency that has any child dependency with native extensions, those extensions are recompiled upon every npm install. For example depending on bunyan or restify causes dtrace-provider to be compiled every time, karma recompiles fsevents, socket.io recompiles ws and so on.

Even fetching the branch/tag every time to check if the commit hash has changed before removing and reinstalling a Git dependency would already be a huge improvement.

@aearly
aearly commented Mar 19, 2015

👍 for a npm flag that skips re-checking git dependencies if the tag/branch id hasn't changed.

@othiym23 othiym23 added the git label Apr 24, 2015
@estani
estani commented Apr 29, 2015 edited

"--cache-only" or something that does not check anything (not only git) provided the already cached versions satisfied the requirements.
This also implies that once built, it will always build as long as the main dependencies are not modifed.

@gumallett
gumallett commented Jul 21, 2015 edited

The documentation for npm install should at least note this "quirk". The algorithm list here: https://docs.npmjs.com/cli/install, explicitly states that the dependency is only installed if not already in node_modules.

@wmertens
Contributor

So… Any news on one of the "light" solutions proposed? This is a huge waste of resources, for everyone.

@jordwalke

I would really love to see this finally addressed.

@benjarwar

+1 for @sandinmyjoints's idea of adding a flag.

@wmertens
Contributor

AFAICT, this seems to be solved in NPMv3. Simply running npm i does not update the git-based deps.

@jordwalke

@wmertens: But you do sometimes want to update the git dependencies, if the hash installed is not the same as the hash specified.

@wmertens
Contributor

I assume that is exactly what happens, I haven't tested that yet.

Wout.
(typed on mobile, excuse terseness)

@danmazzella

Now my Git repo is never updated after performing "npm i". Is this the desired outcome?

@aarononeal

Seems like we've gone from "don't always" to "never". npm install is never updating my git repo dependencies and npm outdated doesn't show them. I have to specifically call npm install <mymodule> to force the update which invalidates caching and triggers build updates even when there are none. I assume that's a bug and not by design?

@wmertens
Contributor
wmertens commented May 17, 2016 edited

Yes, I can confirm that this is the current behavior, even if you specify
tags and change them (imho indicating that you are very interested in an
update)

Ideally, npm should check if the git repo changed tag and rebuild.

@dgreensp

Is there any way to change a git URL so that npm install will cause an update? Commit hash to a different commit hash with a different semver version doesn't seem to trigger an update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment