.. proposal-number:: Leave blank. This will be filled in when the proposal is accepted.
.. trac-ticket:: Leave blank. This will eventually be filled with the Trac ticket number which will track the progress of the implementation of the feature.
.. implemented:: Leave blank. This will be filled in with the first GHC version which implements the described feature.
This is a proposal for adding per-component configure to the Cabal Setup script API. A prototype implementation is available at #3644.
A package (e.g.,
bytestring) consists of multiple components (e.g., a
library and its test suite). There are several motivations behind having
multiple components in a package:
- It is convenient to not have to write a second package just to hold the test suite for a library. Part of this is historically motivated, as in the old days it was far more difficult to do development on a collection of packages than it was on a single package. But there is large amount of metadata, e.g., authorship, versioning, the package description, which would be needlessly duplicated if you needed to publish a separate package.
- The package serves as an abstraction barrier: components internal to a package have access to APIs that package authors don't want to expose to the wider world. One common "trick" which is used to achieve this (e.g. as seen in cabal-install), is that a test library will literally recompile the library modules it depends on to get access to private APIs that are not otherwise available. Similarly, test suites and benchmarks are not externally useful, except as support code for a library; it makes little sense to put them into separate packages.
- A test suite will usually have a different set of dependencies than the library proper. Assuming that you want both the library and the test suite in the same package, you need a way to specify distinct sets of dependencies for each of them, and the component is the logical distinction between the two.
Unfortunately, components were not designed into Cabal at the beginning,
and thus Cabal's support for components was not so much designed as
evolved. But there is a basic guiding principle that components are an
internal implementation detail of a package: the
configures and builds a package as a whole, and internally this may mean
that several components get built. Over the years we have accumulated
options which violate this principle (
ComponentDeps in cabal-install) but the core
architectural principle still holds.
In this document, I propose a component oriented design for Cabal, where every essential operation is done on a per-component basis. I am not the first to have done so: @snoyberg has previously asked for this very feature. The purpose of this proposal is to fully flesh out the implications of the change.
This design has multiple benefits:
- It permits building only specific components for packages, without needing all of the dependencies of the package as a whole to be built. (#1725) This is helpful, e.g., if a library also has executable components: if a downstream package only needs the library, time and space can be saved by building the library without building the components.
- It makes it possible for package managers like cabal-install and Stack to organize builds on a per-component basis, improving parallelism (e.g., when a package contains many independent executables) and making it easier to avoid unnecessary rebuilds when a new component is built. (#2775) One particular example is that you can build a test suite for a package without building the library.
- It solves the cyclic dependency problem where a test suite depends on a testing library that depends on the the library being tested. Now the test suite can be built independently from the library, after the testing library has been built.
- It allows for an architecturally sound implementation of Backpack, as the process of instantiating requirements of a Backpack only makes sense for libraries, rather than an entire package (an executable and test suite cannot be instantiated in any meaningful sense).
I have divided this proposal into two parts: first the changes necessary for the Cabal library (which is shared between cabal-install and Stack), and then a description of how cabal-install can take advantage of this new feature. There is not a Stack portion to this proposal, but commentary from the Stack team would be appreciated.
Proposed Change: Cabal
The status quo. Currently, the Cabal library defines an command line
API via the
./Setup configure command. A conventional invocation of
this command from, e.g.,
cabal-install, might look like this:
./Setup configure --ghc --prefix=$HOME/.cabal --user --ipid=package-name-1.0-abcdefg --dependency=array=array-0.5.1.0-abcdef1234567 --dependency=base=base-188.8.131.52-abcdef123456 --dependency=tasty=tasty-1.0-abcdef123456 --enable-tests --exact-configuration --disable-benchmarks
There are a few important flags:
--ipid specifies the installed
package identity that the package will be built under: this is
identifier is a unique identifier chosen by the package manager (for
cabal new-build computes this identifier by hashing all of
the possible inputs into the build, with the hope that the hash uniquely
determines all of the build products of the package), and
--dependency=pkgname=ipid-1.0-abcdef specifies a specific dependency
to be used during the build of all the components.
--exact-configuration requires ALL dependencies of all enabled
components to be specified explicitly. Notice that this particular
--enable-tests passed (thus enabling the building of the
test suite) and specifies which version of
tasty (a test library)
should be used. This command configures both the library and the test
suite of this package simultaneously.
configure command takes no positional arguments.
The proposal. We extend
configure to take a single argument
specifying which component is to be configured:
./Setup configure test-suite
It is invalid to pass multiple arguments. Under this mode, the configure command behaves differently in the following ways:
Subsequent commands, e.g., build, no longer use an inplace package database (conventionally in
dist/package.conf.inplace). It is expected that all dependencies live in the database stack specified by the user.
The set of required
--dependencyflags for an
--exact-configurationis changed to precisely the
build-dependsof the component being enabled. Dependencies from non-enabled components do not have to be specified (similar to how
--disable-testsworks today). However, an internal dependency, e.g., from a test suite to the library, must now be specified as
--dependency=packagename=pkgipid-1.0-abcdefg123. In the presence of convenience libraries, there may be multiple such extra "internal" dependencies. These are disambiguated from ordinary dependencies by inspection of the package description.
We replace the
--ipidflag with a
--cidflag, which specifies the component identifier of the component being built. The user must pick a distinct
--cidfor each component in the package, as distinct from an installed package identifier which was global for a package. This flag is only valid when a component is explicitly specified to be built. For backwards compatibility, the
--ipidflag retains its original meaning as an identifier for the package as a whole: thus when we build a component with it, the IPID is qualified with the name of the component to form a component id.
If a library contains a
build-toolsdependency on an executable, if the library is being configured by itself, it is expected that the caller arrange for the executable to be present on the
PATH, so that the build tool configuration process succeeds. (For example, if I have an internal exe
fooand a component
barwhich has a
build-toolsdependency on it, you are guaranteed that
foois part of the build graph for
While a user can still explicitly specify installation paths for various files the component may install, we will apply different defaults to these paths:
Variable Current New ----------------------------------------- bindir $prefix/bin (same) libsubdir $abi/$libname $abi/$cid datasubdir $abi/$pkgid $abi/$cid docdir $datadir/doc/$abi/$pkgid $datadir/doc/$abi/$cid
As you might expect, subsequent
register, etc commands operate only on the enabled component.
There are a few expectations as to how this per-component configure interface is intended to be used by a client:
- If the client reuses the same source directory to build multiple
components, it must assign a distinct build directory
--distdirto each component build.
- The installation directories configured for each component are expected to be disjoint for each component. This is encouraged by the different default install directories.
- A component must be copied and registered to a package database, before any of its (same-package) dependents can be built. (This is unlike the current behavior, where the component is automatically registered to an inplace database.)
- The ordering components are built should respect internal
Interaction with hooks. This proposal leaves the hooks API
unchanged. Thus, just as before, hooks are responsible for determining
what components are enabled before performing the operations they need.
If they operate on the available
ComponentLocalBuildInfo hooks should
automatically work properly with these changes (even better than
manually specifying which components to build at
./Setup build time)
Interaction with Haddock. Initially,
haddock will be kept as it is
today: a command which builds Haddock documentation for every enabled
component (one in this case). However, eventually, it would be good to
treat building Haddock documentation as a "component" in and of itself,
so it can be generated and installed independently of building any
component (indeed, Haddock is completely independent of a build).
Thus, every component would also have a corresponding Haddock component
for the documentation.
The primary complication is that cross-linking for Haddock documentation
is implemented by storing paths for haddock interfaces and HTMLs in
the package database
haddock-html). Clearly, since we want
to build Haddocks independently of libraries, this should no longer
be done. I suggest we workaround this problem by making "fake"
installed package database entries to represent Haddock documentation:
Haddock components form an independent, parallel dependency tree
to the library tree.
Interaction with package common files. Cabal packages can be bundled
with package-common files (specified by the
data-files field), which
are installed to a location common over all components inside a package.
Initially, our plan is that
./Setup copy for a single enabled
component will copy those package-common files to the install
directory of that component; thus, for a package with multiple
installable components, the common files will be duplicated across
each component install. A refinement would be to treat data file
installation as a "component" of its own, which the libraries and
executables of a package can depend on.
Another way to solve this problem is to introduce per-component
Interaction with Backpack. A component id encodes all of the
non-Backpack dependency configuration about a component; in the absence
of Backpack, it uniquely identifies the unit of compilation that is,
e.g., installed to a (now in-aptly named) installed package database.
With Backpack, we further qualify these identifiers with a module
substitution, which specifies how we instantiate the various
requirements (unfilled module implementations) of a component (using the
Advisory note on convenience libraries. Internal dependencies
permit a package to shadow packages that exist on Hackage: for
example, if I define an internal library named
and references to
build-depends refer to
the internal library, not the Hackage library.
When these internal libraries are installed to the package database,
it is expedient to distinguish an internal library from the
actual Hackage package. The way this is done is that the registration
information for an internal library has its package name mangled
into the form
Proposed Change: cabal-install
We now describe how cabal-install can be engineered to make use of this feature. The primary complication is the relationship between dependency solving, which is necessarily a per-package affair, and building, which is a per-component affair.
Dependency solving status quo. The dependency solving process picks which versions of packages to use, and produces a solver install plan, which is a graph of resolver packages. Resolver packages are either pre-existing packages from the package database or planned packages that need to be built: planned packages are identified only by a package identifier, and also store other solver information such as the flag assignment that was picked. These solver plan is the converted into an install plan in a one-to-one correspondence (mapping package identifiers into installed package identifiers).
When setup dependencies were added to cabal-install, cabal-install's
dependency solver was also refined to keep track of dependencies on a
per-component basis (
ComponentDeps). Thus, although the graph of
resolver packages is determined by the full, combined dependencies of
all components in a package, it's also possible to determine the
dependencies for a specific component. At the moment, this is only used
to build Setup scripts (which constitute a component of its own) with
only the necessary packages.
The proposal. Dependency solving naturally is a package-level affair, since versions are ascribed to packages, not components. However, we would like build components individually: thus, we need to expand a package-based solver plan into a component-based install plan. This is how it works:
- For each package, we resolve the conditionals (using the solver
provided flag assignments) in order to produce a
PackageDescription, which is essentially a Cabal file with all conditionals stripped off. Prior to this step, we don't even know what the components of the package necessarily will be (a component can be marked as un-buildable through the dependency solving process).
- We can now convert a graph of packages into a graph of components. Each
resulting component inherits only the dependencies of that component,
as specified by
ComponentDeps. The component identity is defined by looking at the input dependencies of the component, as well other options which would affect the build. Note that in a solver plan, the dependencies refer to packages, not components: these package references are resolved to the public library component of a package in question. After componentization,
ComponentDepsis eliminated from the install plan.
- A component is built by configuring its member package for solely the component in question and then building it in a distinct build directory. Execution of the install plan can now be parallelized over components, rather than packages.
Determining if non-libraries are already installed. In general, we
can determine if a library is already installed by consulting the
installed package database; however, no such database exists for
executables. However, in
new-build, executables are installed to a
deterministic directory in the Nix store; thus, we can simply check if
the directory already exists in order to determine if an executable has
already been built.
Note that we cannot use this strategy for libraries, because we do need to load the InstalledPackageInfo (in particular, this information is used by Backpack when mix-in linking).
Interaction with packages with legacy Custom scripts. We can only build a package on a per-component basis if the Setup script is sufficiently new and supports this interface. If it does not, we cannot do so. We determine the version of the Setup script in (1) and if it is not supported, we treat the package as a legacy node in the component graph. Its component ID is simply the component ID of its public library (if it has one; if it does not, no matter, as it's not possible to refer to this package as a dependency in any case.)
Interaction with Backpack. Backpack needs to perform mix-in linking on components, and then a further expansion step to instantiate components. Mix-in linking occurs as we are expanding the package graph into a component graph, while instantiation happens as a separate step after expansion.
- This feature will not be compatible with Custom setup scripts that are linked against an old version of Cabal.
data-fileswill be duplicated until we consider these files a "component" in-and-of-themselves.
- File paths for installed things will change from where they are currently being stored.
- We need to
configurethe package for each component, rather than once for all components. This is particularly terrible for packages with setup type
Configure, since the configure script can take quite a long time to run. Passing
-Cto the configure script (via
cabal.project.local) can help if you are rebuilding.
- This may require BC-breaking changes to the Cabal API (though I will work hard to minimize these.)
build-dependsto point to libraries rather than packages means that depending on a package with an executable does not imply that the executable will also be installed. (This misfeature is occasionally used in the wild on Hackage. More information at cabal#3661.)
An alternate design I considered was to not extend
with a per-component mode. Instead, a package would be configured once,
and then the package manager would use a newly added
flag to build components individually (or in parallel.) However, I
decided that this approach would not be hermetic enough. It also
turned out to be difficult to work into the existing
code, although that is arguably a bug.
./Setupscripts accept the arguments
--enable-library-profilingapply to a package as a whole. With per-component configure, these options can be applied to a component specifically; thus,
--enable-executable-profilingdoesn't make much sense if you're just configuring a library. Should we introduce new variants of these flags which are not "component" qualified? How in
cabal-installcan you ask for only a specific executable to be profiled?