Packaging and resolution (buildTimeOnly dependencies) #816

Closed
jordwalke opened this issue Oct 19, 2016 · 10 comments

@jordwalke
Member

jordwalke commented Oct 19, 2016

We need to support multiple versions of packages being installed. Supporting multiple versions being installed does not mean we support linking multiple versions of a package into one final executable (that is not even well defined at this point, though it's worth solving eventually).
So what then do we need to support immediately? Often, dependencies are only required to help generate build artifacts that will eventually be linked into the final executable. This build tooling itself is never linked into a final executable. This opens up a lot of opportunities to relax the requirement that there be one global version per package.

This is not just something that is unique to Reason or compiled languages. Anything with an equivalent "link" step (such as bundling of web resources) would benefit from the same ability to flatten, but relax flattening constraints so that dependencies merely required for building artifacts do not increase the surface area for flattening conflicts.

Build time dependencies are a good opportunity to relax flattening requirements because build tooling could have tons of dependencies, all of which might conflict with versions that our app requires at run-time. Since there's no conceptual conflict, we want to allow multiple package versions that are only used at build time (and with no runtime component) to exist on disk, and be used by their dependees in ways we know are always well defined.

Let's assume that findlib is powering all of the lookups for "where my dependencies are". We have complete control over constructing what findlib sees while building a particular package, and that's what I'll describe here - how we construct what is visible, to which packages, and at which steps of the build process. Most packages use findlib to find packages. rebel does not by default, but we can change that.

Some definitions:

  • compiling: Producing a single linkable compilation artifact.
  • linking: Joining all of those individual linkable compilation artifacts into a single executable artifact.
  • artifact: Either a single compilation unit or a final executable; both are considered "artifacts".
  • build-time dependency: A --> B means A has a "build-time" dependency on another package B.
    A --> B iff: Some of B's artifacts are required in order to generate A's artifacts (either individual compilation artifacts, or an executable artifact). Build-time dependencies are not inherently transitive, meaning that just because A has a build-time dependency on B, it does not follow that A has any kind of dependency on B's dependencies. The intuition is that build-time dependencies are typically dependencies on build utilities such as jenga/rebel/make, and you only care about that immediate dependency's (executable) artifact and have no idea how that executable artifact was created, or what dependencies were needed to generate it.
  • run-time dependency: A ==> B means A has a run-time dependency on B.
    A ==> B iff: linking A's compilation artifacts into an executable implies that B's artifacts must also be linked into that executable. It follows that A ==> B && B ==> C implies A ==> C. run-time dependencies are inherently transitive, but we can further distinguish "immediate run-time dependency" from "transitive run-time dependency" if needed.

We start with some constraints:

  • Turning A into an executable, requires supplying all of A's run-time dependencies (which by definition must be transitively computed).
  • There may be at most one version for any package P, whose artifacts are linked into any single executable.
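The second constraint can be checked mechanically once the set of packages to be linked is known. A minimal sketch, assuming packages are identified by `(name, version)` pairs; the `flappy-bird` version number is made up for the example.

```python
from collections import defaultdict

def link_conflicts(linked):
    """Given (name, version) pairs that would be linked into one executable,
    return {name: sorted versions} for every name with more than one version."""
    versions = defaultdict(set)
    for name, version in linked:
        versions[name].add(version)
    return {n: sorted(v) for n, v in versions.items() if len(v) > 1}

# Hypothetical link set that violates the constraint.
bad = [("flappy-bird", "0.1.0"), ("yojson", "1.0.0"), ("yojson", "1.2.0")]
# Hypothetical link set that satisfies it.
good = [("flappy-bird", "0.1.0"), ("yojson", "1.0.0")]

print(link_conflicts(bad))   # {'yojson': ['1.0.0', '1.2.0']}
print(link_conflicts(good))  # {}
```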

The package manager will ask every package to "build yourself". When asked to do so, we apply a convention which allows multiple versions of a package P to, in some cases, be installed in the sandbox, but while maintaining the constraints above. We break the build process into two stages, "compiling" and "linking".

  • compiling a package A means producing its linkable/executable artifacts, and requires running A's "compiling step". During the compiling step, A should only be allowed to "see" artifacts from the subset of run-time dependencies that are immediate, as well as the artifacts of its build-time dependencies (which are by definition always immediate).
  • linking an executable for package A means joining all of the transitive run-time dependencies of A into one binary. During the linking step, A should be able to "see" all of the artifacts of its run-time dependencies (which means transitively) and the artifacts of its build-time dependencies (which are by definition always immediate). Not every package will want to produce an executable, so the linking portion of the build step may be skipped.
  • By default any package listed in dependencies is considered a run-time ==> dependency.
  • To mark A's dependency on B as build-time --> only, add the name "B" to A's package.json's buildTimeOnly field. This prevents B's artifacts (and B's transitive artifacts) from being visible to package Z's linking step (supposing that Z has a run-time dependency on A).
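A rough sketch of the visibility rules above, assuming each package.json has been loaded into a dictionary. Only the `dependencies` and `buildTimeOnly` fields come from the proposal; the packages `Z`, `A`, `B`, `C`, `D`, `E` and the helper names are hypothetical.

```python
# Hypothetical manifests keyed by package name (stand-ins for package.json).
MANIFESTS = {
    "Z": {"dependencies": {"A": "*"}},
    "A": {"dependencies": {"B": "*", "C": "*"}, "buildTimeOnly": ["B"]},
    "B": {"dependencies": {"D": "*"}},
    "C": {"dependencies": {"E": "*"}},
    "D": {},
    "E": {},
}

def split_deps(name):
    """Return (run-time ==> deps, build-time --> deps) declared by one package."""
    manifest = MANIFESTS[name]
    deps = set(manifest.get("dependencies", {}))
    build = deps & set(manifest.get("buildTimeOnly", []))
    return deps - build, build

def visible_when_compiling(name):
    """Immediate run-time deps plus immediate build-time deps."""
    run, build = split_deps(name)
    return run | build

def visible_when_linking(name):
    """Transitive run-time deps plus immediate build-time deps."""
    run, build = split_deps(name)
    seen, stack = set(), list(run)
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(split_deps(dep)[0])  # follow ==> edges only
    return seen | build

print(sorted(visible_when_compiling("A")))  # ['B', 'C']
print(sorted(visible_when_linking("A")))    # ['B', 'C', 'E']
print(sorted(visible_when_linking("Z")))    # ['A', 'C', 'E']
```

Because A lists B in `buildTimeOnly`, neither B nor B's dependency D is visible to Z's linking step, even though Z has a run-time dependency on A.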

Here's how this solves various problems we encounter:

  1. Suppose flappy-bird ==> yojson-1.0.0 (run-time) and flappy-bird produces an executable. There cannot be two versions of yojson linked into the final flappy-bird executable. However, that doesn't mean that there can't be two versions of yojson existing inside the package sandbox, and being used at other times in the whole build process.

For example, suppose flappy-bird ==> yojson-1.0.0 (run-time) and flappy-bird --> rebel because flappy-bird included rebel in its buildTimeOnly list (this makes sense because flappy-bird just needs the rebel binary to start its build process). Then suppose that rebel ==> yojson-1.2.0 (run-time). Normally, if our package manager had to resolve everything to a single version per package, we'd be out of luck because yojson-1.0.0 and yojson-1.2.0 are both needed at some point. But in this case there is no real conflict, because rebel needs yojson-1.2.0 at its run-time in order to generate a single executable artifact, rebel. flappy-bird then depends on that rebel artifact in order to build, but does not require that rebel artifact at run time. Furthermore, since build-time dependencies aren't transitive, while building, flappy-bird sees rebel and its own yojson-1.0.0, and does not see any of rebel's dependencies.

  2. Then suppose that same flappy-bird depends on reason-1.0.0, and suppose that yojson-1.0.0 depends on reason-1.2.0. Reason introduces no runtime artifacts that are linked into the final executable, so flappy-bird and yojson both mark reason as a buildTimeOnly dependency. flappy-bird only needs reason-1.0.0 so that it can use the binary refmt-1.0.0 at build time, and yojson only depends on reason-1.2.0 so that it can use refmt-1.2.0 at build time.
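The two flappy-bird scenarios can be combined into one small worked example. The sketch below assumes a `name@version` string encoding (rebel is left unversioned because the text gives no version for it) and compares what ends up linked with what ends up installed in the sandbox.

```python
from collections import defaultdict

RUN_TIME = {    # ==> edges from the examples above
    "flappy-bird": ["yojson@1.0.0"],
    "rebel": ["yojson@1.2.0"],
}
BUILD_TIME = {  # --> edges (buildTimeOnly) from the examples above
    "flappy-bird": ["rebel", "reason@1.0.0"],
    "yojson@1.0.0": ["reason@1.2.0"],
}

def reachable(root, *edge_maps):
    """Every package reachable from root through the given edge maps."""
    seen, stack = set(), [root]
    while stack:
        pkg = stack.pop()
        for edges in edge_maps:
            for dep in edges.get(pkg, ()):
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
    return seen

def versions_by_name(pkgs):
    """Group 'name@version' strings into {name: sorted versions}."""
    out = defaultdict(set)
    for pkg in pkgs:
        name, _, version = pkg.partition("@")
        out[name].add(version)
    return {n: sorted(v) for n, v in out.items()}

# What gets linked into the flappy-bird executable: run-time edges only.
linked = versions_by_name(reachable("flappy-bird", RUN_TIME))
# What must exist on disk in the sandbox: every edge kind.
installed = versions_by_name(reachable("flappy-bird", RUN_TIME, BUILD_TIME))

print(linked)               # {'yojson': ['1.0.0']}
print(installed["yojson"])  # ['1.0.0', '1.2.0']
print(installed["reason"])  # ['1.0.0', '1.2.0']
```

Two versions of yojson and of reason coexist on disk, yet the link set contains exactly one version of each linked package.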

(devDependencies have nothing to do with any of this - they are useless to us and don't do what we want at all).
(peerDependencies could have been useful but npm3 completely changed their behavior so let's ignore them).
(yarn's --flat installation is not what we want/need here, because we want to allow multiple versions to exist, but we wish that yarn had a --flatten-runtime option which would flatten runtime-dependencies, but allow multiple versions of buildTimeOnly dependencies. For now, we can just live with npm installing multiple versions for everything and then flatten after the fact. Hopefully yarn can provide better support here).

@yunxing @vramana

@jordwalke
Member Author

cc Yarn peeps @kittens @bestander @cpojer @dxu
and opam peeps @avsm

jordwalke changed the title from "Packaging and package resolution needs." to "Packaging and resolution (buildTimeOnly dependencies)" on Oct 19, 2016

@dxu
Contributor

dxu commented Oct 19, 2016

This makes a lot of sense, the big problem with the dedupe of the dependencies during resolution was the transitive dependencies you explain above allowing for potentially tons of different package versions.

The approach of adding the buildTimeOnly field seems like it'd break compatibility for the majority of existing cases though. Also, this is probably due to my own lack of personal experience releasing projects that used npm as a primary build tool, but is it viable to rely on developers to properly specify the packages as buildTimeOnly? I feel like there are already people who list as dependencies packages that should be devDependencies or peerDependencies; having them understand the above and keep track of it might not be feasible. I don't know if this group of users and the group of users publishing packages would necessarily overlap, though. I ask because this seems like something that you'd need to be fairly strictly all in on in order to work: if one package incorrectly lists a buildTimeOnly dependency as a regular dependency, then it'll potentially break anything it's included in.

I'm wondering if there's a way to automate this. It would be nice if there were some post-processing step during the publishing of the npm package where it just went through and maybe parsed the scripts and checked against the bin's of the dependencies or something to see if they were build time only. That way you wouldn't have to rely on the developer specifying fields, and also would be an offloaded task that you could potentially just run on all existing packages as well if it worked.
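A very rough sketch of what such a post-processing step might look like, assuming manifests are already loaded as dictionaries. It relies on the npm `scripts` and `bin` fields; the function names, the example manifests, and the `refmt src/main.re` command line are all made up for illustration. It can only surface candidates, since a package invoked as a binary may also be linked as a library.

```python
import shlex

def bin_names(dep_name, dep_manifest):
    """Command names a dependency exposes through its `bin` field.
    A string `bin` exposes one command named after the package
    (scoped package names are ignored in this sketch)."""
    bins = dep_manifest.get("bin", {})
    if isinstance(bins, str):
        return {dep_name}
    return set(bins)

def build_time_candidates(manifest, dep_manifests):
    """Dependencies whose binaries are invoked by name in the package's scripts."""
    tokens = set()
    for command in manifest.get("scripts", {}).values():
        tokens.update(shlex.split(command))
    return {
        name
        for name in manifest.get("dependencies", {})
        if bin_names(name, dep_manifests.get(name, {})) & tokens
    }

# Hypothetical manifests.
app = {
    "dependencies": {"reason": "1.0.0", "yojson": "1.0.0"},
    "scripts": {"build": "refmt src/main.re"},
}
deps = {
    "reason": {"bin": {"refmt": "./refmt"}},
    "yojson": {},
}

print(build_time_candidates(app, deps))  # {'reason'}
```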

Also, I'm not really sure what the plans are for yarn and compatibility with the existing registry, but it seems like this would force it to start diverging further from npm packages. E.g., why not just have runTimeDependencies and buildTimeDependencies instead of dependencies at that point?

@jordwalke
Member Author

> but is it viable to rely on developers to properly specify the packages as buildTimeOnly

I'm glad you asked. We discussed allowing a package to mark itself as a buildTimeOnly package. Then the final classification would be the logical OR of the two markings (the dependent's and the package's own). It's good to be able to mark another package as buildTimeOnly even if it didn't mark itself. I described it as I did above just for simplicity.

@jordwalke
Member Author

> but is it viable to rely on developers to properly specify the packages as buildTimeOnly? I feel like there are already people who list as dependencies packages that should be devDependencies or peerDependencies,

peerDependencies is misused because it's very complicated, isn't implemented correctly (the solver doesn't find solutions), and its behavior was greatly changed in npm3. As for devDependencies, I don't know why people don't use them correctly. But in npm with JS, there's little incentive to use them correctly because people are okay with just including 50 copies of jQuery. We would have a --flatten-runtime option that would make those cases stand out, and people would hopefully fix them.

@jordwalke
Member Author

> The approach of adding the buildTimeOnly field seems like it'd break compatibility for the majority of existing cases though

I don't believe this is true, but feel free to explain a specific case.

@yunxing
Contributor

yunxing commented Oct 19, 2016

I think we still need "peerDependencies" here somehow for cases like this:

A --> reason ==> ocaml@4.2
   --> ocaml@*

In this case, reason only produces artifacts with the layout of ocaml@4.2, so if A resolves to ocaml@4.3, it won't understand the artifacts produced by reason.

@jordwalke
Member Author

Maybe this issue applies more broadly (does it?) but in the case of reason, we want it to parse to a compiler version agnostic format that can work with any ocaml version. I thought that would solve this problem, but maybe it still comes up other places?

@dxu
Contributor

dxu commented Oct 21, 2016

> I don't believe this is true, but feel free to explain a specific case.

Sorry, my choice of words was misleading here. I meant to say that by adding the new buildTimeOnly field, all existing cases of packages that have any dependencies that should be marked buildTimeOnly, won't follow the convention, and can likely cause problems with any packages that require them (or any package that requires a package that requires them, etc).

> We discussed allowing a package to mark itself as buildTimeOnly package.

Does this mean the developer marks the package itself as buildTimeOnly, or do we somehow automatically detect whether a package (or dependency) is buildTimeOnly? The former would have the same issue for existing packages, but the latter would be completely fine.

@jordwalke
Member Author

jordwalke commented Oct 23, 2016

@dxu My thinking was that a buildTimeOnly field in package B's package.json would allow B to say "Hey, you probably want to use me at build time only".

This allows B the ability to make itself hard to misuse.

We have a couple of options for how to best use this information.

  • Option 1: We can let B's marking of itself as buildTimeOnly be sufficient for all tooling to properly consider all uses of B to be buildTimeOnly.
  • Option 2: B's self-marking is not sufficient for it to be treated as buildTimeOnly. A package A that depends on B, must include B in A/package.json's buildTimeOnlyDependencies list in order for it to fully be considered a build time only package. Then what good was B's buildTimeOnly: true field? Well, we can use B's buildTimeOnly field to validate that people have remembered to mark B in their buildTimeOnlyDependencies field.

I prefer option 2 as it would be much more clear to people who read A's package.json.
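The validation described in option 2 could look roughly like the sketch below. The field names `buildTimeOnly: true` and `buildTimeOnlyDependencies` are the ones used in this comment; the function name and the example manifests are hypothetical.

```python
def unmarked_build_tools(manifest, dep_manifests):
    """Dependencies that mark themselves `buildTimeOnly: true` but are missing
    from this package's `buildTimeOnlyDependencies` list (option 2 validation)."""
    declared = set(manifest.get("buildTimeOnlyDependencies", []))
    return sorted(
        name
        for name in manifest.get("dependencies", {})
        if dep_manifests.get(name, {}).get("buildTimeOnly") is True
        and name not in declared
    )

# Hypothetical manifests: rebel marks itself, yojson does not.
dep_manifests = {
    "rebel": {"buildTimeOnly": True},
    "yojson": {},
}
forgot = {"dependencies": {"rebel": "*", "yojson": "*"}}
correct = {
    "dependencies": {"rebel": "*", "yojson": "*"},
    "buildTimeOnlyDependencies": ["rebel"],
}

print(unmarked_build_tools(forgot, dep_manifests))   # ['rebel']
print(unmarked_build_tools(correct, dep_manifests))  # []
```

Under this reading, A's own list is what classifies the dependency, and B's self-marking only produces a warning when A forgot to list it.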

So what can we do once we classify and validate B as a build time only dependency? We've gained more flexible/relaxed flattening in the case of B. It's nice if B's version can be flattened globally per npm sandbox and the package manager should try, but it's not necessary. This avoids more conflicts.

> all existing cases of packages that have any dependencies that should be marked buildTimeOnly, won't follow the convention, and can likely cause problems with any packages that require them (or any package that requires a package that requires them, etc).

Since I preferred option 2 above, it means that B marking itself isn't the thing that makes it a build-time-only dependency. A's package.json makes B a build-time dependency. B's self-marking just allows us to validate that usage is correct. So with option 2, you can still use all existing build tooling even if it didn't mark itself as build-time-only tooling.

@jaredly
Contributor

jaredly commented Jun 14, 2018

This is quite old, and some of these things are going into the esy package manager. I'm going to close this for now.

jaredly closed this as completed on Jun 14, 2018