This repository has been archived by the owner on Aug 11, 2022. It is now read-only.

multi-stage install #5919

Closed
isaacs opened this issue Aug 11, 2014 · 9 comments

@isaacs
Contributor

isaacs commented Aug 11, 2014

When the user types npm install:

  1. Read the current tree of installed packages. Mark each of these as already existing.
  2. Walk this tree checking the dependencies of each node.
  3. For each missing dep, fetch the package data and create a "phantom" node at its point in the tree. Then, repeat step 2 for the new phantom nodes.
  4. De-duplicate all phantom nodes (leaving extant nodes in place)
  5. Now we have a de-duplicated list of packages. Fetch them all to cache. (All networking happens here.)
  6. BFS-walk the tree, unpacking into phantom nodes, making them real. (If there is any error at this point or beyond, abort and rimraf all phantom nodes.)
  7. BFS-walk the tree, running "preinstall" scripts on each newly installed node.
  8. BFS-walk the tree, running "install" scripts on each newly installed node.
  9. BFS-walk the tree, running "postinstall" scripts on each newly installed node.
  10. Success. Print info about what was installed.
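
Steps 7 to 9 imply one full breadth-first pass per lifecycle stage, rather than running all three scripts on a package before moving to the next. A minimal sketch of that ordering follows; the `Node` class and the function names are hypothetical, not npm internals.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    phantom: bool = False  # True for nodes added during this install
    children: list = field(default_factory=list)


def bfs(root):
    """Yield nodes level by level, starting at the root."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(node.children)


def script_schedule(root):
    """Return (stage, package) pairs: one BFS pass over new nodes per stage."""
    new_nodes = [n for n in bfs(root) if n.phantom]
    return [(stage, n.name)
            for stage in ("preinstall", "install", "postinstall")
            for n in new_nodes]
```

With this ordering, every newly installed package is unpacked and has run its "preinstall" script before any "install" script starts.
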
othiym23 added this to the multi-stage install milestone Aug 11, 2014
@othiym23
Contributor

I'd like to see a clear separation between building up the representation of the "real" (what's available to the app on disk) tree and building up the "ideal" (what's specified in npm-shrinkwrap.json and / or package.json) tree, and then having a reconciliation process that generates the list of operations necessary to converge the real on the ideal. This creates a simple, deterministic constraint-solving step in the middle that can be pushed in a whole bunch of different directions in the future.
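
To make the reconciliation idea concrete, here is a minimal sketch that assumes both trees have been flattened into `{install path: version}` dictionaries. That representation and the operation names are assumptions for illustration only; the real trees are nested.

```python
def reconcile(real, ideal):
    """Return the operations that converge `real` on `ideal`.

    Both arguments map an install path to a version string. Paths are
    visited in sorted order so the same inputs always yield the same plan.
    """
    operations = []
    for path in sorted(real.keys() | ideal.keys()):
        have, want = real.get(path), ideal.get(path)
        if have == want:
            continue  # already converged
        if have is None:
            operations.append(("add", path, want))
        elif want is None:
            operations.append(("remove", path, have))
        else:
            operations.append(("replace", path, want))
    return operations
```

Because the function only reads two data structures and returns a third, it can be tested without touching the disk or the network.
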

@isaacs
Contributor Author

isaacs commented Aug 11, 2014

Good point. I think that the "read, eval, operate" sections are somewhat implicit in the algorithm above, but they should be explicit. How's this?

read stage

  1. Read the current tree of installed packages. (Mark each of these as already existing.)

eval stage

  1. Walk this tree checking the dependencies of each node.
  2. For each missing dep, fetch the package data and create a "phantom" node at its point in the tree. Then, repeat step 1 for the new phantom nodes.
  3. De-duplicate all phantom nodes (leaving extant nodes in place)
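
Steps 1 and 2 of the eval stage can be sketched as a breadth-first walk that adds a phantom child wherever a dependency does not resolve. This is only an illustration: `Node`, `resolves`, `add_phantoms` and the in-memory `registry` dictionary (package name to list of dependency names) are hypothetical, versions are ignored, and the de-duplication in step 3 is left out.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    name: str
    deps: list
    parent: "Node | None" = field(default=None, repr=False)
    phantom: bool = False
    children: dict = field(default_factory=dict)


def resolves(node, dep):
    """True if `dep` is a child of `node` or of any of its ancestors."""
    while node is not None:
        if dep in node.children:
            return True
        node = node.parent
    return False


def add_phantoms(root, registry):
    """BFS the tree, adding a phantom child for every dep that does not resolve."""
    added = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for dep in node.deps:
            if not resolves(node, dep):
                child = Node(dep, registry[dep], parent=node, phantom=True)
                node.children[dep] = child
                added.append(child)
        # New phantom children are queued too, so their deps get checked as well.
        queue.extend(node.children.values())
    return added
```
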

generate list of operations

  1. For each required tarball, add "fetch tarball" to operation list. (Make sure this has no duplicates.)
  2. BFS walk the tree to get set of phantom nodes.
  3. For each phantom node, add "unpack tarball into place" to operation list. Add "rimraf phantom node" to fail list.
  4. For each phantom node, add "run preinstall" to operation list.
  5. For each phantom node, add "run install" to operation list.
  6. For each phantom node, add "run postinstall" to operation list.

do operations

  1. Do all the things.
  2. If anything fails, do the fail list.
  3. Else, success! Print info about what was installed.
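
The "do operations" stage could look roughly like the sketch below, assuming each entry in the operation list and the fail list is a zero-argument callable. `run_plan` is a hypothetical name, not an npm function.

```python
def run_plan(operations, fail_list):
    """Run operations in order; on the first error, run the fail list.

    Returns (True, None) on success or (False, error) on failure.
    """
    try:
        for operation in operations:
            operation()
    except Exception as err:
        for undo in fail_list:
            try:
                undo()
            except Exception:
                pass  # best effort: keep undoing the remaining entries
        return False, err
    return True, None
```
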

@isaacs
Contributor Author

isaacs commented Aug 12, 2014

Of course, this is only the simplest situation (install missing deps). Nothing to clobber, no shrinkwraps, etc.

The "eval stage" for a shrinkwrap install:

  1. Walk this tree checking where a node does not match the corresponding node in the shrinkwrap tree.
  2. IF the node is missing, add the phantom node to the tree.
  3. ELSE if the node is different, we have to replace a node.
  4. ELSE if the node exists and shouldn't, we have to remove a node.

To "replace" a node, we do a "remove" operation followed by an "add" operation.

To "remove" a node:

  1. Generate a temporary name $tmp. (Something like ./node_modules/.path-to-pkg-$pid maybe?)
  2. Add "move $folder to $tmp" to the operation list.
  3. Add "move $tmp to $folder" to the rollback list.
  4. Add "remove $tmp" to the cleanup list.

So, changing a node for a shrinkwrap swap would be something like:

  1. generate temp name $tmp
  2. Add "move $folder to $tmp" to operation list
  3. Add "unpack tgz to $folder" to operation list
  4. Add "rimraf $folder" to rollback list.
  5. Add "mv $tmp to $folder" to rollback list.
  6. Add "remove $tmp" to cleanup list.

And then I guess after doing the "operation" list, we do the "cleanup" list. Those should be things that can fail without borking the install: if one does fail, we exit non-zero but don't try to roll back, since the "cleanup" items would bork a rollback and won't be attempted until after we have everything in the desired state anyway.
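
As a rough illustration of the six-step swap above, the sketch below builds the three lists as plain data without touching the disk. The function name, the tuple shapes and the exact temp-name pattern are assumptions; the comment above only suggests "something like" a dot-prefixed name with the pid.

```python
import os
import posixpath


def plan_replace(folder, tarball, pid=None):
    """Build the operation, rollback and cleanup lists for swapping one package."""
    pid = os.getpid() if pid is None else pid
    parent, name = posixpath.split(folder.rstrip("/"))
    tmp = posixpath.join(parent, f".{name}-{pid}")  # hypothetical temp-name scheme

    operations = [("move", folder, tmp), ("unpack", tarball, folder)]
    rollback = [("rimraf", folder), ("move", tmp, folder)]
    cleanup = [("remove", tmp)]
    return operations, rollback, cleanup
```

Keeping the plan as data means the same lists can be printed for a dry run or handed to whatever code actually performs the moves.
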

@othiym23
Contributor

This is what I've had kicking around my head for the last month or two. It's not entirely dissimilar from what you're proposing, but there are some key differences. It also probably needs another couple iterations to incorporate more of the edge cases around shrinkwrapping and deduping.

All traversals are breadth-first, unless otherwise specified.

[R]ead

  1. Read the current tree of installed packages:
    1. Traverse node_modules, reading each package's manifest (if available)
  2. Build the ideal tree of desired packages:
    1. If npm-shrinkwrap.json exists, convert that into the in-memory tree.
    2. If npm-shrinkwrap.json does not exist but package.json does exist:
      1. Read the dependencies from package.json.
      2. For each dependency, read the package's metadata (from the cache and / or the registry) and use that to populate the dependency tree.
    3. If neither npm-shrinkwrap.json nor package.json exists, create an ideal tree consisting solely of the package(s) provided as arguments to npm install and their dependencies, as described in step 2.2.

[E]valuate

  1. Evaluate the current tree:
    1. Traverse the current tree, resolving missing packages using node's module resolution algorithm
    2. Assign any remaining missing modules to a list.
  2. Evaluate the ideal tree:
    1. Insert nodes for all of the missing modules from the list provided in step 1.
    2. Check the ideal tree for peerDependency conflicts.
    3. If npm-shrinkwrap.json does not exist, dedupe the ideal tree.
  3. Converge the current tree on the ideal tree:
    1. Perform a depth-first traversal of the current and ideal trees in lockstep.
    2. Anywhere there are child nodes in the ideal tree but not the current tree, find the highest ancestor in the tree and add it and all its children to the list of installs to be performed.
    3. Anywhere there are nodes in the ideal tree that differ from the current tree, add an install for the ideal tree's version to the list of installs to be performed.
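
The converge step might be sketched as below, assuming each tree is a nested dictionary of the form `{"version": ..., "deps": {name: subtree}}`. That shape and the function names are hypothetical. The sketch also assumes that after a version mismatch the walk keeps descending into the children, which the description above leaves open.

```python
def subtree_installs(path, node):
    """List (path, version) for a node and everything below it, parents first."""
    installs = [(path, node["version"])]
    for name, child in sorted(node.get("deps", {}).items()):
        installs += subtree_installs(f"{path}/node_modules/{name}", child)
    return installs


def converge(current, ideal, path="."):
    """Depth-first walk of both trees in lockstep; returns the installs needed."""
    installs = []
    for name, want in sorted(ideal.get("deps", {}).items()):
        child_path = f"{path}/node_modules/{name}"
        have = current.get("deps", {}).get(name)
        if have is None:
            # Highest missing node: install it together with all its children.
            installs += subtree_installs(child_path, want)
            continue
        if have["version"] != want["version"]:
            installs.append((child_path, want["version"]))
        installs += converge(have, want, child_path)
    return installs
```
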

[A]pply

  1. For each entry in the list of installs to be performed:
    1. Ensure that the package's tarball is cached and valid.
    2. If an install will overwrite an existing dependency:
      1. Safely rename the existing module.
      2. Push (original name, renamed name) onto the cleanup list
    3. Unpack the package's tarball into place.
  2. After all unpacking has been done:
    1. Call run preinstall for the package.
    2. Call run install for the package.
    3. Call run postinstall for the package.
  3. Run through the cleanup list, unlinking the renamed directories.

On error:

  1. For each entry in the list of installs to be performed, remove its package directory, if it exists.
  2. Run through the cleanup list, restoring the renamed directories to their original locations.
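
A minimal sketch of the unpack, rename and restore parts of the apply stage follows. It assumes the caller supplies an `unpack(package, dest)` callable; that callable, the function name and the backup naming are all hypothetical, and lifecycle scripts are left out. Unlike the "On error" step 1 as written, the sketch only removes directories it has already started installing, so an existing package that was not yet renamed is never deleted.

```python
import os
import shutil


def apply_installs(installs, unpack):
    """installs: list of (dest_dir, package). Restores the old state on error."""
    renamed = []  # cleanup list of (original name, renamed name)
    touched = []  # destinations this run has started writing to
    try:
        for dest, package in installs:
            if os.path.exists(dest):
                backup = f"{dest}.old-{os.getpid()}"
                os.rename(dest, backup)
                renamed.append((dest, backup))
            touched.append(dest)
            unpack(package, dest)
    except Exception:
        # Remove anything this run created, then put the old directories back.
        for dest in touched:
            shutil.rmtree(dest, ignore_errors=True)
        for original, backup in renamed:
            os.rename(backup, original)
        raise
    # Success: the renamed directories are no longer needed.
    for _, backup in renamed:
        shutil.rmtree(backup, ignore_errors=True)
```
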

[D]isplay

Print out the list of what was installed (and optionally the list of what was cleaned up, as well as the current and ideal trees).

@bmeck
Contributor

bmeck commented Aug 25, 2014

I would just like to note that this seems like it will allow temporary directories to be used for building during apply, which is one of the main pain points on Windows currently, so +1

iarna self-assigned this Sep 8, 2014
iarna mentioned this issue Sep 18, 2014
@NickHeiner

I think this will make things much nicer. It would also be nice if programmatic users could plug into the pipeline at multiple points, similar to how browserify allows for extensibility: https://github.com/substack/node-browserify#bpipeline.

@iarna
Contributor

iarna commented Sep 23, 2014

@NickHeiner Finer-grained detail than the current lifecycle scripts is plausible. If you'd like to open an issue to discuss what your use case is, I'll bring it over into this milestone.

@manvalls

Let's say we have two packages, A and B. A depends upon B, and B depends upon A. Someone wants to contribute to A, grabs its code from GitHub, navigates to its directory and runs npm install. Wouldn't we end up with the following tree?

  • A
    • B
      • A

Is there any possibility of ending up with a tree like this?

  • A
    • B
      • symlink to A, something like ../../..

And if that were possible, would it be cached by require? Thanks for all the awesome work :)

EDIT: it seems that placing the A folder downloaded from GitHub inside a directory named "node_modules" solves the problem.
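
The workaround in the EDIT is consistent with how Node looks up bare module names: it checks a `node_modules` directory in each ancestor of the requiring file. The sketch below is a simplified, POSIX-only approximation of that lookup list (global folders are ignored, and the `/work` paths are made up for illustration).

```python
def node_modules_paths(start):
    """Directories searched for a bare require() made from `start`, nearest first."""
    parts = [p for p in start.split("/") if p]
    paths = []
    for i in range(len(parts), 0, -1):
        if parts[i - 1] == "node_modules":
            continue  # never look in node_modules/node_modules
        paths.append("/" + "/".join(parts[:i] + ["node_modules"]))
    paths.append("/node_modules")
    return paths
```

If A is checked out at `/work/A`, a require of A from `/work/A/node_modules/B` searches `/work/A/node_modules/B/node_modules`, `/work/A/node_modules`, `/work/node_modules` and `/node_modules`, none of which contains A. If the checkout lives at `/work/node_modules/A` instead, `/work/node_modules` is on the list, so B finds the outer A and no nested copy is needed.
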

@iarna
Contributor

iarna commented Dec 15, 2014

As the multi-stage install project has its own milestone and is now broken out into specific issues, I'm going to close this.

iarna closed this as completed Dec 15, 2014
npm locked and limited conversation to collaborators Jun 24, 2015