Package Environments

23Skidoo edited this page Mar 20, 2013 · 2 revisions

Imported from Trac wiki; be wary of outdated information or markup mishaps.

This page tries to clarify some ideas and a design for how cabal should handle the sets of packages that people want to build.

Goal

Recently people have written various cabal extension/wrapper tools to address use cases that the main Cabal/cabal-install tool does not handle very well. These tools are each designed to address a few use cases, and the different tools overlap somewhat in what they cover.

The goal here is:

  • to see if we can identify some more unified idea and explain what the various existing tools are trying to do in terms of that idea;
  • then to come up with a design (mechanisms) that implements that idea; and
  • a user interface that covers the main use cases in a reasonably simple way, while remaining flexible enough to expose what the design can do.

Example use cases

Developer working on a single package

The developer is actively developing a package. The package is in some intermediate state, i.e. it doesn't make sense for other, not-in-development projects to depend on it (e.g. it might be buggy or have an unstable API at the moment). The developer might want, in addition to building the package, to run tests and/or benchmarks.

Current workflow (assuming that cabal repl exists):

cabal configure --enable-tests --enable-benchmarks
cabal install --only-dependencies  # does this pull in test/benchmark deps?
cabal build
-or-
cabal build && cabal test
-or-
cabal build && cabal benchmark
-or-
cabal repl

Problems with current workflow:

  • Dependencies might be added or removed during development, forcing the developer to re-run cabal configure --enable-tests --enable-benchmarks and cabal install --only-dependencies.
  • It's not possible to build the package without first configuring it and installing its dependencies, so those steps should be implied by build.
  • It's not possible to run tests/benchmarks without first building the package, so building should be implied by running cabal test/bench.

Ideal workflow:

cabal build
-or-
cabal test
-or-
cabal bench
-or-
cabal repl
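Until cabal itself implies these steps, the chaining can be approximated with a few shell functions. This is only a sketch: the names cabal_build, cabal_test and cabal_bench and the CABAL variable are assumptions made for illustration and are not part of cabal. Dependencies are installed before configuring, which differs from the order listed in the current workflow.

```sh
# Hypothetical wrapper functions, not part of cabal.
# CABAL can be overridden, e.g. CABAL=echo for a dry run.
CABAL="${CABAL:-cabal}"

# Install dependencies, configure, then build.
# Dependencies are installed first so that configure can find them.
# Whether this also installs test/benchmark dependencies is the open
# question noted in the current workflow.
cabal_build() {
    "$CABAL" install --only-dependencies &&
    "$CABAL" configure --enable-tests --enable-benchmarks &&
    "$CABAL" build
}

# Running tests or benchmarks implies a build.
cabal_test()  { cabal_build && "$CABAL" test; }
cabal_bench() { cabal_build && "$CABAL" bench; }
```

Each function stops at the first failing step, so cabal_test never runs the tests against a package that failed to build.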

Developer working on multiple local packages

Like "Developer working on a single package", except the developer also has to change one or more dependencies in order to make the current package work, e.g. perhaps he/she needs to add a new function to one of the dependencies and use it in the package under development. The dependencies might not be owned by the developer, so he/she can't simply change them and make a new release. He/she needs to use the modified version until the changes have been accepted upstream.

Current workflow:

cd dep
cabal configure && cabal build
cd ../pkg
cabal configure --package-db=../dep/dist/package.conf.inplace
# As in the previous use case when it comes to running tests, installing
# other dependencies, etc.

Problems with current workflow:

  • Every time the package depended upon needs to be changed, the four steps above have to be repeated.
  • Otherwise the same as in "Developer working on a single package"

Ideal workflow (UI up for discussion):

cabal build --package-root=$HOME/src  # Looks for additional packages here

Similar for cabal test, cabal bench, and cabal repl
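In the meantime, the repeated steps of the current workflow can be wrapped in a single function. The sketch below is an illustration only: the function name build_against and the CABAL variable are assumptions, and it relies on the dist/package.conf.inplace location used in the current workflow above.

```sh
# CABAL can be overridden, e.g. CABAL=echo for a dry run.
CABAL="${CABAL:-cabal}"

# Rebuild a modified dependency in place, then configure and build the
# package against the dependency's in-place package db.
# Usage: build_against DEP_DIR PKG_DIR
build_against() {
    # Resolve both directories to absolute paths so that the package db
    # path stays valid after changing directory.
    dep_dir=$(cd "$1" && pwd) || return 1
    pkg_dir=$(cd "$2" && pwd) || return 1
    (cd "$dep_dir" && "$CABAL" configure && "$CABAL" build) || return 1
    (cd "$pkg_dir" &&
        "$CABAL" configure --package-db="$dep_dir/dist/package.conf.inplace" &&
        "$CABAL" build)
}
```

A call such as build_against ../dep . would then replace the manual sequence each time the dependency changes.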

In-house developer team working on multiple packages with local hackage server

This is similar to the above "Developer working on multiple local packages" case but there is a team of programmers producing and consuming packages. Development versions of packages are shared using source control, and developers can publish versions of packages to the local hackage server. Developers work using a combination of packages from hackage.haskell.org, the local hackage server (which has some bug fix versions of 3rd party packages as well as packages published by the team), and packages checked out from source control. There are also testing servers which build from the local hackage server rather than from source repos.

Existing tools

cabal-dev

cabal-dev provides what it calls a sandbox for source packages and installed packages. It has a command line interface that is very similar to that of cabal, so that it can be used as a drop-in replacement. It is implemented as a wrapper around the cabal command.

For source packages, the sandboxing means providing a local source package set that overrides the global package index. Tarballs can be added to this index. It provides a command cabal-dev add-source /path/to/source/code which generates an sdist snapshot of the given package and adds that tarball to the local source index.

For installed packages, the sandboxing means that packages are not registered into the user or global ghc package database. The global package db is still used, so it is recommended that it contain only the ghc core libraries. This approach conflicts with using distribution packages for non-core libraries, because they are installed into the global db.

The user interface provides two ways to get a package into a sandbox: either add the source package to the sandbox, or install the package into the sandbox. In the latter case the source is not available if something needs to be rebuilt (e.g. if a profiling version is needed later).

Note that when source packages are added to the sandbox, it is a snapshot of the package, not a live link to another build tree. This is probably not by design, but a limitation of cabal that cabal-dev cannot easily fix.

The default install location for cabal-dev is the sandbox. This means it only works with packages that are prefix independent. Libraries or programs that use the Paths_pkgname module (e.g. to find data files) would expect to find those files in the sandbox as well. This is ok for running inplace but not if applications will eventually be installed to some system location. This would work better if we had a reliable way to build prefix-independent packages (or to fail for ones that are not prefix-independent).

When you do cabal-dev add-source to add a source package to the sandbox, we think that just makes that version available, and it does not mask all other versions of that package. One has to rely on the added source package having a higher version, and rely on the solver to pick the highest version (which it will if possible, but it will fall back to older versions if necessary).

cabal-src

cabal-src is intended to solve the problem that cabal does not know about the source versions of local packages, so it cannot use those source packages in its dependency planning. It has a command line interface that is very similar to that of cabal, so that it can be used as a drop-in replacement. It is implemented as a wrapper around the cabal command.

Ordinarily, if you cabal install in a local directory, cabal knows only about the packages that are already installed and the source packages available from hackage. It does not know about other local build trees. If you make a change to a package and install it, and then go to build another local package, cabal will usually use the instance of the package that you just installed. However this is not always possible: to use consistent versions of dependencies it is sometimes necessary to rebuild a package. This is where the problem occurs: if cabal cannot see that source package then it cannot rebuild from that source. This is the problem that cabal-src tries to address.

cabal-src addresses the problem by taking a snapshot of a source package and inserting it into cabal's local source index. It modifies the ~/.cabal/config file to tell cabal to look in this local index.

This is in a way similar to what cabal-dev add-source does, but it does it for the user's default environment rather than for a local sandbox.

cabal-meta

cabal-meta is intended to solve the same problem as cabal-src but it solves it in a different way. It has a command line interface that is very similar to that of cabal, so that it can be used as a drop-in replacement. It is implemented as a wrapper around the cabal-src or cabal-dev commands (which are themselves wrappers around the cabal command).

cabal-meta lets the user list the locations of local source build trees in a file. When the user runs cabal-meta install it compiles each of the packages and uses cabal-src to add the source package into the local source package index. In a sense, cabal-meta is a declarative approach compared to the imperative approach of cabal-src. Where cabal-src actively inserts source packages into the local source package index, with cabal-meta you declare what source packages should be used.

Additionally, it allows per-package configuration flags to be specified in the local file. Another feature is that git repositories can be used as locations of source packages. Each git repo is synced every time the user runs cabal-meta install.

virthualenv / hsenv

virthualenv (recently renamed hsenv but not released at the time of writing) is a tool to create what it calls isolated Haskell environments.

It lets the user start shell sessions where the usual Haskell tools (ghc, cabal, etc.) only use and install packages from the isolated environment. It is implemented by setting the environment variables PATH and GHC_PACKAGE_PATH so that the ghc and cabal commands will install packages only within the local environment.

This solves part of the same problem that cabal-dev solves, providing a sandbox for installed packages (but not for source packages as cabal-dev does).
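The environment-variable mechanism can be sketched as a shell function that is run in the current shell session. The function name activate_env and the directory layout (a bin directory and a package.conf.d package database inside the environment directory) are assumptions for illustration, not the layout the tool actually uses.

```sh
# Hypothetical layout: ENV_DIR/bin holds the environment's executables and
# ENV_DIR/package.conf.d is its package database.
# Usage: activate_env ENV_DIR GLOBAL_PACKAGE_DB
activate_env() {
    env_dir=$1
    global_db=$2
    # Executables installed into the environment shadow the system ones.
    PATH="$env_dir/bin:$PATH"
    # List of package databases that ghc should consult, separated by
    # colons on Unix-like systems.
    GHC_PACKAGE_PATH="$env_dir/package.conf.d:$global_db"
    export PATH GHC_PACKAGE_PATH
}
```

Because the function only changes exported variables, the isolation lasts for the current shell session and its child processes.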

cab

The cab command is described as a MacPorts-like maintenance command of Haskell cabal packages. It can do various things that cabal, cabal-dev and ghc-pkg can do, but with a different command line interface. Its stated goal is for the command line interface to cover the functionality of these other tools in a more consistent way.

In particular it provides commands for:

  • unregistering packages (cabal only lets you install packages, and users have to use ghc-pkg unregister to remove them)
  • listing "outdated" packages
  • listing dependencies and reverse dependencies of a package
  • checking consistency of installed packages (like ghc-pkg check)

It also has a flag for using a sandbox (making use of cabal-dev).

sandboxer

sandboxer is a simple shell script wrapper around cabal-dev that overrides the default sandbox location with one of the user's choice. This way, multiple projects can share the same sandbox, even if the development/build environment doesn't provide options for specifying the cabal-dev sandbox. It works by setting GHC_PACKAGE_PATH to $SANDBOX_LOCATION:$SYSTEM_LOCATION and replacing the system cabal-dev with a wrapper that calls cabal-dev --sandbox=$SANDBOX_LOCATION. sandboxer works only on *nix-like systems.
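The wrapper described above could look roughly like the following. This is a sketch, not the actual sandboxer script: the function name cabal_dev_shared and the CABAL_DEV variable are assumptions, while SANDBOX_LOCATION and SYSTEM_LOCATION are the names used in the description.

```sh
# CABAL_DEV can be overridden, e.g. CABAL_DEV=echo for a dry run.
CABAL_DEV="${CABAL_DEV:-cabal-dev}"

# Run cabal-dev against the shared sandbox instead of the default one.
# Expects SANDBOX_LOCATION and SYSTEM_LOCATION to be set by the caller.
cabal_dev_shared() {
    GHC_PACKAGE_PATH="$SANDBOX_LOCATION:$SYSTEM_LOCATION" \
        "$CABAL_DEV" --sandbox="$SANDBOX_LOCATION" "$@"
}
```

All remaining arguments are passed through unchanged, so cabal_dev_shared install foo behaves like cabal-dev install foo with the sandbox redirected.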

Package environments idea

Roles: The package author and package builder are distinct roles (though in many use cases the same individual may fill both roles):

  • The package author specifies information in the package .cabal file.
  • The package environment is controlled by the person/agent doing the build.

A package environment consists of:

  • source package set
  • installed package store
  • constraints for package versions and flags
  • other build configuration (profiling, optimisation, C lib locations etc)

The source package set is a finite mapping of source package id to a location where we can find a source package. This includes references to source tarballs (remotely or locally) and references to local build trees.

The installed package store is a location where packages are installed, plus a package database where installed packages are registered.

The constraints include:

  • package version constraints, like "foo == 1.0"
  • configuration flags for particular packages (that is the "flags" stuff in .cabal files)
  • enabling/disabling of test suites or benchmarks

These constraints influence which source packages can be used, and the configuration influences what the dependencies of those packages are.

Other build configuration includes most of the other command line flags that package builders can currently specify when they do cabal configure:

  • enable/disable profiling
  • dynamic or static linking
  • where to look for C libraries
  • optimisation settings
  • etc
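To make the source package set concrete, the finite mapping from source package id to location can be pictured as a plain text table with a lookup function. The file format (one "package-id location" pair per line) and the name lookup_source are assumptions made for this sketch; they are not an existing cabal format.

```sh
# Hypothetical plain-text source package set: one "package-id location"
# pair per line. Locations containing spaces are not supported.
# Prints the location for PACKAGE_ID, or fails if it is not in the set.
# Usage: lookup_source INDEX_FILE PACKAGE_ID
lookup_source() {
    awk -v id="$2" '
        $1 == id { print $2; found = 1; exit }
        END      { exit !found }
    ' "$1"
}
```

A location may be a tarball or a local build tree, which matches the two kinds of reference the source package set is meant to hold.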

Mechanisms for an implementation

Extended source package index format

The existing hackage index format gives us a source package set but has the limitation that it cannot refer to local build trees.

It is relatively straightforward to generalise the existing hackage index format to make it possible to refer to local tarballs or directories.

Design summary here: http://www.haskell.org/pipermail/cabal-devel/2011-May/007557.html

TODO: what about references to repositories (darcs, git etc) ?

Representation of package environment

The obvious choice here is a config-style file that holds the same information that one can usually specify on the cabal command line. It could use the same format as the per-user ~/.cabal/config file, or a similar one (with some usability improvements). The nice thing about this approach is that these sets of configurations can be merged: we can merge with the per-user config file (or not, if we're doing an isolated build), and could similarly merge in other local config files (via some include mechanism).

User interface

Package environment file

Straw man proposal

inherit: ~/.cabal/config
-- Or don't inherit to make an isolated environment

-- The source package index
local-repo: ./cabalenv/index.tar

-- The installed package store
install-dirs
  prefix: ./cabalenv/
package-db: ./cabalenv/package.conf.d

-- constraints for package versions and flags
constraints:
  foo == 1.0,
  foo +use_blah -use_blurb

-- Other build configuration
with-compiler: /usr/local/bin/ghc-7.4.0.20111219
optimisation: O2
library-profiling: True
extra-include-dirs:
extra-lib-dirs:

Issues:

  • some of the field names are perhaps not what we really want. The names above are what the current ~/.cabal/config uses. We can change them, though we should keep them consistent with each other.
  • using the local env as the prefix is wrong. We probably don't really want the sandbox to be the final install location; it's just a staging area.

Cabal-meta combines listing the location of packages with constraints for those packages. It's not clear if that's a good idea. However it does seem like a good idea to be able to have them in the same file at least. Perhaps instead of just allowing local-repo: ./cabalenv/index.tar and the links only being mentioned in there, we should allow listing them directly in this file. This would have the limitation that you cannot include whole .cabal files or tarballs by value, but you could still list links to local packages.

sandbox command

Sandboxes are created, inspected, modified, and deleted through the cabal sandbox command:

  • cabal sandbox init creates a new sandbox.

  • cabal sandbox delete deletes the sandbox, including all packages installed in the sandbox (i.e. in .cabal-sandbox).

  • cabal sandbox add-source adds a source package to the sandbox package database.

  • In the future we might add commands for showing the packages installed in the sandbox, the sandbox location (i.e. directory), etc.
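A typical session with these commands might be scripted as follows. The function name sandbox_with_sources and the CABAL variable are assumptions for illustration; only the sandbox init and sandbox add-source subcommands come from the list above.

```sh
# CABAL can be overridden, e.g. CABAL=echo for a dry run.
CABAL="${CABAL:-cabal}"

# Create a sandbox in the current package directory and link the given
# local source packages into it.
# Usage: sandbox_with_sources [SOURCE_DIR...]
sandbox_with_sources() {
    "$CABAL" sandbox init || return 1
    for src in "$@"; do
        "$CABAL" sandbox add-source "$src" || return 1
    done
}
```

For example, sandbox_with_sources ../dep would create the sandbox and link the build tree in ../dep into it.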

sandbox init creates a new sandbox by creating a specially named file, cabal.sandbox.config, in the root directory of the package. This file contains a local-repo and an install-dirs prefix that point to the sandbox (i.e. to .cabal-sandbox/packages and .cabal-sandbox respectively). If this file is present, all builds in the package's directory will use the sandbox.

The cabal.sandbox.config file is not normally edited by users and should contain a warning at the top of the file saying that it's machine generated. If the user wants to add e.g. package constraints, she should use the (new) cabal.config file, which doesn't exist by default but can be added to the package's root directory by the user.

Using two different package environment files (both recognized by Cabal through their special names) allows us to have one file that the user can edit without worrying about the system overwriting any changes she made (including comments and whitespace layout).

Even though the cabal.sandbox.config file is normally not edited by the user we decided to keep it visible (i.e. not in .cabal-sandbox) so the user can edit it if desired. We can revisit this choice in the future if this proves to confuse users.

The config files have the following priorities, from higher to lower:

  1. cabal.sandbox.config
  2. cabal.config
  3. ~/.cabal/config
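The priority order can be illustrated with a lookup that takes a value from the first file that defines it. This is a deliberately simplified sketch and not how cabal reads its config: the function name lookup_field is an assumption, and it only handles top-level, single-line "field: value" entries, ignoring sections and multi-line fields.

```sh
# Print the value of FIELD from the first of the given files that defines
# it with a non-empty value. Files that do not exist are skipped.
# Usage: lookup_field FIELD FILE...   (files in decreasing priority)
lookup_field() {
    field=$1
    shift
    for file in "$@"; do
        [ -f "$file" ] || continue
        value=$(sed -n "s/^$field:[[:space:]]*//p" "$file" | head -n 1)
        if [ -n "$value" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done
    return 1
}
```

With the priorities above, the call would be lookup_field optimisation cabal.sandbox.config cabal.config "$HOME/.cabal/config".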

Once a sandbox has been created, all cabal commands (e.g. configure, build, and install) will make use of it automatically.

Source packages that have been added ("linked") into the sandbox by add-source are rebuilt (and reconfigured, if needed) and installed into the sandbox each time the main package is built. We most likely want to optimize this process some time in the future, to avoid rebuilding/relinking when it is not necessary.
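One possible shape for such an optimization is a timestamp check: rebuild a linked source package only if one of its files is newer than a stamp file written after the last successful build. This is a hypothetical sketch, not a description of what cabal does; the function name needs_rebuild and the stamp file are assumptions.

```sh
# Succeeds (exit status 0) if SRC_DIR contains a file modified after
# STAMP_FILE, or if STAMP_FILE does not exist yet.
# Usage: needs_rebuild SRC_DIR STAMP_FILE
needs_rebuild() {
    src_dir=$1
    stamp=$2
    # No stamp means the package has never been built.
    [ -e "$stamp" ] || return 0
    # Non-empty output means at least one file is newer than the stamp.
    [ -n "$(find "$src_dir" -type f -newer "$stamp" | head -n 1)" ]
}
```

After a successful rebuild the caller would touch the stamp file. A real implementation would also need to ignore build outputs such as the dist directory, otherwise every build would make the package look modified.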