Imported from Trac wiki; be wary of outdated information or markup mishaps.
Guide to the Cabal source code
At first glance the Cabal code seems large and intimidating. This page is intended to give you a head start in understanding it.
All the Cabal modules live under
The modules can be roughly divided into two groups:
- The declarative modules: these are mostly concerned with data structures like package descriptions. These modules live under `Distribution.*`. Much of the code in these modules is utility functions for handling the data types, plus functions for parsing and showing them.
- The active modules: these are concerned with actually doing things like configuring, building and installing packages. These modules live under
According to SLOCCount, Cabal is currently about 23,500 lines of code: about 7,000 lines for the declarative part and about 16,500 for the active part. Most modules are less than a few hundred lines, though there are a couple of monsters nearer 1,000 lines.
Language features and packages
Cabal is 100% Haskell. It uses hierarchical modules, a little FFI in places and some CPP; otherwise it is Haskell 98. This is important since it has to work with Hugs, nhc98 and jhc as well as ghc.
A further constraint is that because Cabal is used by both GHC and
Hugs to bootstrap the libraries, it can itself only depend on other
boot libraries, and only those shipped with all compilers and
available on all OSs. This means we cannot depend on various other
common packages like parsec or mtl, or GHC-specific packages like
template-haskell. We also avoid the package
the equivalent hierarchical modules in the
base package. That
currently leaves array, base, bytestring, containers, directory,
filepath, old-locale, old-time, pretty, process and random, but the
fewer dependencies the better.
Really dull modules
- Distribution/GetOpt.hs (source) (no docs - hidden module): This should live under `Compat/`; it's just a bundled version of the standard `GetOpt`. Not very interesting.
Some simple data types
Distribution/Version.hs (source) (docs): exports the
`Version` type along with a parser and pretty printer. A version is something like "1.3.3". It also defines
`Dependency` data types. Version ranges are like ">= 1.2 && < 2". A dependency is a package name and a version range, like "foo >= 1.2 && < 2".
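To make the version, version range and dependency ideas concrete, here is a minimal self-contained sketch. It is not Cabal's actual definition. The constructor names (`ThisVersion`, `LaterThan`, `EarlierThan`, ...) and the helpers `withinRange`, `orLaterVersion` and `satisfies` are illustrative assumptions. A version is modelled as a list of numbers and a range as a small expression tree.

```haskell
import Data.List (intercalate)

-- Hypothetical model: a version such as "1.3.3" is a list of components.
-- The derived Ord compares the lists lexicographically.
newtype Version = Version [Int]
  deriving (Eq, Ord)

showVersion :: Version -> String
showVersion (Version ns) = intercalate "." (map show ns)

-- Hypothetical model of a range such as ">= 1.2 && < 2".
data VersionRange
  = AnyVersion
  | ThisVersion Version                   -- == v
  | LaterThan Version                     -- > v
  | EarlierThan Version                   -- < v
  | UnionRanges VersionRange VersionRange -- ||
  | IntersectRanges VersionRange VersionRange -- &&

-- ">= v" built from the primitive constructors.
orLaterVersion :: Version -> VersionRange
orLaterVersion v = UnionRanges (ThisVersion v) (LaterThan v)

withinRange :: Version -> VersionRange -> Bool
withinRange _ AnyVersion            = True
withinRange v (ThisVersion w)       = v == w
withinRange v (LaterThan w)         = v > w
withinRange v (EarlierThan w)       = v < w
withinRange v (UnionRanges a b)     = withinRange v a || withinRange v b
withinRange v (IntersectRanges a b) = withinRange v a && withinRange v b

-- A dependency is a package name plus a version range.
data Dependency = Dependency String VersionRange

-- Does a concrete (name, version) pair satisfy a dependency?
satisfies :: (String, Version) -> Dependency -> Bool
satisfies (name, v) (Dependency depName range) =
  name == depName && withinRange v range

-- "foo >= 1.2 && < 2"
fooDep :: Dependency
fooDep = Dependency "foo"
  (IntersectRanges (orLaterVersion (Version [1, 2])) (EarlierThan (Version [2])))
```

In this model `satisfies ("foo", Version [1,3,3]) fooDep` holds and `satisfies ("foo", Version [2]) fooDep` does not.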
Distribution/Package.hs (source) (docs): defines a package identifier along with a parser and pretty printer for it.
`PackageIdentifier`s consist of a name and an exact version (an exact version, as opposed to a dependency like the one above, which uses a version range).
Distribution/Verbosity.hs (source) (docs): a simple
`Verbosity` type with associated utilities. There are four standard verbosity levels from
Deafening. This is used for deciding what logging messages to print in the active parts.
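An ordered enumeration is a natural fit for a small set of verbosity levels. With one, a log message is printed only when the current level is at least the level the message requires. The sketch below is hypothetical. Only `Deafening` is named in the text above. The other three constructor names and the `shouldLog`/`logAt` helpers are illustrative assumptions.

```haskell
-- Hypothetical four-level verbosity type, ordered from quietest to loudest.
-- Only "Deafening" comes from the text; the other names are illustrative.
data Verbosity = Silent | Normal | Verbose | Deafening
  deriving (Eq, Ord, Show, Enum, Bounded)

-- A message tagged with a required level is shown only when the
-- current verbosity is at least that level.
shouldLog :: Verbosity -> Verbosity -> Bool
shouldLog required current = current >= required

logAt :: Verbosity -> Verbosity -> String -> IO ()
logAt required current msg
  | shouldLog required current = putStrLn msg
  | otherwise                  = pure ()
```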
Distribution/Compiler.hs (source) (docs): This has an enumeration of the various compilers that Cabal knows about. It also specifies the default compiler. Sadly you'll often see code that does case analysis on this compiler flavour enumeration like:

    case compilerFlavor comp of
      GHC -> GHC.getInstalledPackages verbosity packageDb progconf
      JHC -> JHC.getInstalledPackages verbosity packageDb progconf

Obviously it would be better to use the proper `Compiler` abstraction because that would keep all the compiler-specific code together. Unfortunately we cannot make this change yet without breaking the `UserHooks` API, which would break all custom `Setup.hs` files, so for the moment we just have to live with this deficiency. If you're interested, see issue #57.
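One common shape for such an abstraction is a record of per-compiler operations. Callers then invoke an operation on whichever record they were given, instead of every call site doing case analysis on the flavour. The sketch below only illustrates that general technique. `CompilerOps`, `opsGetInstalledPackages` and the stub package lists are hypothetical and not part of Cabal.

```haskell
-- Hypothetical sketch: each compiler supplies a record of its own operations.
data CompilerFlavor = GHC | JHC
  deriving (Eq, Show)

data CompilerOps = CompilerOps
  { opsFlavor               :: CompilerFlavor
  , opsGetInstalledPackages :: IO [String]  -- installed package ids
  }

-- Stubs: a real implementation would query the compiler's package tool.
ghcOps :: CompilerOps
ghcOps = CompilerOps { opsFlavor = GHC, opsGetInstalledPackages = pure ["ghc-stub"] }

jhcOps :: CompilerOps
jhcOps = CompilerOps { opsFlavor = JHC, opsGetInstalledPackages = pure ["jhc-stub"] }

-- No case expression over the flavour is needed at the call site.
listPackages :: CompilerOps -> IO [String]
listPackages = opsGetInstalledPackages
```

With this design, supporting a new compiler means writing one new record rather than editing every `case` expression scattered through the code.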
Distribution/System.hs (source) (docs): Cabal often needs to do slightly different things on specific platforms. You probably know about `System.Info.os :: String`; however, using it is very inconvenient because it is a string and different Haskell implementations do not agree on using the same strings for the same platforms! (In particular, see the controversy over "windows" vs "mingw32".) So to make it more consistent and easy to use we have an
Distribution/License.hs (source) (docs): The `.cabal` file allows you to specify a license file. Of course you can use any license you like, but people often pick common open source licenses and it's useful if we can automatically recognise that (e.g. so we can display it on the Hackage web pages). So you can also specify the license itself in the `.cabal` file from a short enumeration defined in this module. It includes
The package description data types
Distribution/ParseUtils.hs (source) (no docs - hidden module): The `.cabal` file format is not trivial, especially with the introduction of configurations and the section syntax that goes with them. This module has a bunch of parsing functions that are used by the `.cabal` parser and a couple of others. It has the parsing framework code and also little parsers for many of the formats we get in various `.cabal` file fields, like module names, comma-separated lists, etc.
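The "little parsers" idea can be illustrated with the `Text.ParserCombinators.ReadP` module from `base`. The sketch below parses a comma-separated list field such as `Foo.Bar, Baz ,Qux`. It is not Cabal's code. The `token`, `commaList` and `parseField` names are assumptions, and the token rule (letters, digits, dots, underscores) is deliberately simplified.

```haskell
import Data.Char (isAlphaNum)
import Text.ParserCombinators.ReadP

-- Simplified field token: letters, digits, dots and underscores
-- (enough for dotted module names like "Distribution.Simple").
token :: ReadP String
token = munch1 (\c -> isAlphaNum c || c == '.' || c == '_')

-- A comma-separated list, allowing whitespace around the commas.
commaList :: ReadP a -> ReadP [a]
commaList p = sepBy p (skipSpaces >> char ',' >> skipSpaces)

-- Run a field parser over the whole string; Nothing if it does not
-- parse completely or unambiguously.
parseField :: ReadP a -> String -> Maybe a
parseField p s =
  case [x | (x, "") <- readP_to_S (skipSpaces *> p <* skipSpaces <* eof) s] of
    [x] -> Just x
    _   -> Nothing
```

For example, `parseField (commaList token) "Foo.Bar, Baz ,Qux"` gives `Just ["Foo.Bar","Baz","Qux"]`, while a stray double comma makes the whole field fail.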
Distribution/PackageDescription.hs (source) (docs): This defines the data structure for the `.cabal` file format. There are several parts to this structure. It has top-level info and then
`Executable` sections, each of which has associated `BuildInfo` data that's used to build the library or exe. To further complicate things there is both a
`GenericPackageDescription`. This distinction relates to [Cabal configurations](Cabal configurations). When we initially read a `.cabal` file we get a `GenericPackageDescription`, which has all the conditional sections. Before actually building a package we have to decide on each conditional. Once we've done that we get a `PackageDescription`. It was done this way initially to avoid breaking too much stuff when the feature was introduced. It could probably do with being rationalised at some point to make it simpler.
Distribution/PackageDescription/Configuration.hs (source) (docs): This is about the [Cabal configurations](Cabal configurations) feature. It exports
`flattenPackageDescription`, which are functions for converting
`PackageDescription`s. It has code for working with the tree of conditions and resolving or flattening conditions.
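The following heavily simplified, hypothetical model shows what "resolving" and "flattening" a tree of conditions can mean. A node carries some data (here a list of dependency names) plus children guarded by conditions. Resolving keeps only the branches selected by a flag assignment; flattening, as this sketch interprets it, ignores the conditions and merges everything. The types, the function names and the choice to treat unknown flags as false are all assumptions, not Cabal's definitions.

```haskell
import Data.Maybe (fromMaybe)

-- Conditions that can guard a section (simplified).
data Condition
  = Flag String
  | Not Condition
  | And Condition Condition
  | Or Condition Condition
  | Lit Bool

-- A node: some data plus conditional children, each with a "then"
-- subtree and an optional "else" subtree.
data CondTree a = CondNode a [(Condition, CondTree a, Maybe (CondTree a))]

-- Evaluate a condition; unknown flags count as False (an assumption).
evalCond :: [(String, Bool)] -> Condition -> Bool
evalCond flags cond = case cond of
  Flag f  -> fromMaybe False (lookup f flags)
  Not c   -> not (evalCond flags c)
  And x y -> evalCond flags x && evalCond flags y
  Or x y  -> evalCond flags x || evalCond flags y
  Lit b   -> b

-- Resolve all conditionals, keeping only the branches actually taken.
resolve :: Monoid a => [(String, Bool)] -> CondTree a -> a
resolve flags (CondNode x children) = x <> mconcat (map pick children)
  where
    pick (c, thenT, elseT)
      | evalCond flags c = resolve flags thenT
      | otherwise        = maybe mempty (resolve flags) elseT

-- Flatten: ignore the conditions and merge every branch.
flatten :: Monoid a => CondTree a -> a
flatten (CondNode x children) =
  x <> mconcat [flatten t <> maybe mempty flatten e | (_, t, e) <- children]

-- Hypothetical dependencies guarded by a flag called "fast".
exampleDeps :: CondTree [String]
exampleDeps = CondNode ["base"]
  [ (Flag "fast", CondNode ["dep-a"] [], Just (CondNode ["dep-b"] [])) ]
```

With the flag `fast` set, `resolve` yields `["base","dep-a"]`; with no flags set it yields `["base","dep-b"]`; `flatten` yields all three names.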
Distribution/PackageDescription/Parse.hs (source) (docs): This defines parsers and partial pretty printers for the `.cabal` format. Some of the complexity in this module is due to the fact that we have to be backwards compatible with old `.cabal` files, so there's code to translate them into the newer structure.
Distribution/PackageDescription/Check.hs (source) (docs): This has code for checking for various problems in packages. There is one set of checks that just looks at a `PackageDescription` in isolation and another set of checks that also looks at files in the package. Some of the checks are basic sanity checks; others are portability standards that we'd like to encourage. There is a `PackageCheck` type that distinguishes the different kinds of check, so we can see which ones are appropriate to report in different situations. This code gets used when configuring a package, where we consider only basic problems. The higher standard is used when preparing a source tarball and by Hackage when uploading new packages. The reason for this is that we want to hold packages that are expected to be distributed to a higher standard than packages that are only ever expected to be used in the author's own environment.
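The sketch below illustrates the "different standards in different situations" idea: each check result carries a kind, and the context decides which kinds get reported. It is a hypothetical simplification. The `CheckKind` constructors, `Context`, `relevant` and `report` are invented for illustration and are not Cabal's real constructors.

```haskell
-- Hypothetical kinds of check result (names are illustrative).
data CheckKind
  = BuildImpossible      -- basic problem, reported even when configuring
  | DistributionProblem  -- only matters for sdist / Hackage upload
  deriving (Eq, Show)

data PackageCheck = PackageCheck
  { checkKind :: CheckKind
  , checkMsg  :: String
  } deriving (Eq, Show)

data Context = Configuring | Distributing
  deriving (Eq, Show)

-- Configuring reports only basic problems; distribution applies the
-- higher standard and reports everything.
relevant :: Context -> PackageCheck -> Bool
relevant Configuring  c = checkKind c == BuildImpossible
relevant Distributing _ = True

report :: Context -> [PackageCheck] -> [String]
report ctx = map checkMsg . filter (relevant ctx)
```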
Distribution/InstalledPackageInfo.hs (source) (docs): The `.cabal` file format is for describing a package that is not yet installed. It has a lot of flexibility, like conditionals and dependency ranges. As such, that format is not at all suitable for describing a package that has already been built and installed. By the time we get to that stage we have resolved all conditionals and resolved dependency version constraints to exact versions of dependent packages. So this module defines the `InstalledPackageInfo` data structure that contains all the info we keep about an installed package. There is a parser and pretty printer. The textual format is rather simpler than the `.cabal` format; there are no sections, for example. This is the format that
Useful internal abstractions
Distribution/Simple/Program.hs (source) (docs): This provides an abstraction which deals with configuring and running programs. A `Program` is a static notion of a known program. A
`Program` that has been found on the current machine and is ready to be run (possibly with some user-supplied default args). Configuring a program involves finding its location and, if necessary, finding its version. There is also a `ProgramConfiguration` type which holds configured and not-yet-configured programs. It is the parameter to lots of actions elsewhere in Cabal that need to look up and run programs. If we had a Cabal monad, the `ProgramConfiguration` would probably be a reader or state component of it.
The module also defines all the known built-in
`defaultProgramConfiguration`, which contains them all.
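The split between a static program description, a configured program and a configuration holding both can be sketched as follows. This is a hypothetical simplification, not Cabal's API. The field names, `configureProgram` and `lookupConfigured` are assumptions. The search action is passed in as a plain `IO` value so the example does not touch the real file system.

```haskell
import qualified Data.Map as Map

-- Static description of a known program (hypothetical, simplified).
data Program = Program
  { programName         :: String
  , programFindLocation :: IO (Maybe FilePath)  -- how to search for it
  }

-- A program that has been found and is ready to run.
data ConfiguredProgram = ConfiguredProgram
  { configuredName        :: String
  , configuredPath        :: FilePath
  , configuredDefaultArgs :: [String]
  } deriving (Eq, Show)

-- Known-but-unconfigured programs plus the ones already configured.
data ProgramConfiguration = ProgramConfiguration
  { unconfigured :: Map.Map String Program
  , configured   :: Map.Map String ConfiguredProgram
  }

-- Look the program up, search for it, and record it if it was found.
configureProgram :: String -> ProgramConfiguration -> IO ProgramConfiguration
configureProgram name conf =
  case Map.lookup name (unconfigured conf) of
    Nothing   -> pure conf
    Just prog -> do
      found <- programFindLocation prog
      pure $ case found of
        Nothing   -> conf
        Just path -> conf
          { configured = Map.insert name (ConfiguredProgram name path []) (configured conf) }

lookupConfigured :: String -> ProgramConfiguration -> Maybe ConfiguredProgram
lookupConfigured name = Map.lookup name . configured
```

Threading the configuration through functions like this is exactly the plumbing a reader or state monad would hide.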
Distribution/Simple/Command.hs (source) (docs): This is to do with command line handling. The Cabal command line is organised into a number of named sub-commands (much like darcs). The `Command` abstraction represents one of these sub-commands, with a name, a description and a set of flags. Commands can be associated with actions and run. It handles some common stuff automatically, like the `--help` and command line completion flags. It is designed to allow other tools to make derived commands. This feature is used heavily in cabal-install.
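A minimal, hypothetical model of named sub-commands with generic `--help` handling is shown below. The `Command` record here has only a name, a description and a list of flag names. `Outcome` and `dispatch` are invented for illustration and do not reflect the real module's types.

```haskell
-- Hypothetical sub-command description: name, description, flags.
data Command = Command
  { commandName        :: String
  , commandDescription :: String
  , commandFlags       :: [String]  -- e.g. ["--verbose"]
  }

-- What the front end decided to do with the command line.
data Outcome
  = Run String [String]  -- command name and its remaining arguments
  | Help String          -- generated help text
  | Unknown String
  deriving (Eq, Show)

helpText :: Command -> String
helpText c = commandName c ++ ": " ++ commandDescription c
          ++ "\nflags: " ++ unwords (commandFlags c)

-- Pick the sub-command by name; "--help" is handled generically.
dispatch :: [Command] -> [String] -> Outcome
dispatch _ [] = Unknown ""
dispatch cmds (name : args) =
  case [c | c <- cmds, commandName c == name] of
    [] -> Unknown name
    (c : _)
      | "--help" `elem` args -> Help (helpText c)
      | otherwise            -> Run name args
```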
Distribution/Simple/InstallDirs.hs (source) (docs): This manages everything to do with where files get installed (though it does not get involved with actually doing any installation). It provides an `InstallDirs` type, which is a set of directories for where to install things. It also handles the fact that we use templates in these install dirs. For example, most install dirs are relative to some `$prefix`, so changing the prefix changes all the other dirs appropriately. So it provides a `PathTemplate` type and functions for substituting for these templates.
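Template substitution can be sketched as parsing a path like `$prefix/lib/$pkgid` into literal pieces and variables, then substituting from an environment. This sketch is hypothetical. The `PathComponent` constructors, `parseTemplate` and `substTemplate` are assumptions, variable names are simplified to runs of letters, and unknown variables are left unexpanded.

```haskell
import Data.Char (isAlpha)
import Data.Maybe (fromMaybe)

-- A template is a sequence of literal text and "$name" variables.
data PathComponent = Literal String | Variable String
  deriving (Eq, Show)

type PathTemplate = [PathComponent]

-- Simplification: a variable is "$" followed by a run of letters.
parseTemplate :: String -> PathTemplate
parseTemplate [] = []
parseTemplate ('$' : rest) =
  let (var, rest') = span isAlpha rest
  in Variable var : parseTemplate rest'
parseTemplate s =
  let (lit, rest) = break (== '$') s
  in Literal lit : parseTemplate rest

-- Substitute known variables; unknown ones stay as "$name".
substTemplate :: [(String, String)] -> PathTemplate -> FilePath
substTemplate env = concatMap piece
  where
    piece (Literal s)  = s
    piece (Variable v) = fromMaybe ('$' : v) (lookup v env)
```

Because dirs such as `$prefix/bin` and `$prefix/lib` are stored as templates, substituting a different `prefix` moves all of them at once.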
Distribution/Simple/Compiler.hs (source) (docs): This should be a much more sophisticated abstraction than it is. Currently it's just a bit of data about the compiler, like its flavour, name and version. The reason it's just data is that currently it has to be in `Show` so it can be saved along with the `LocalBuildInfo`. The only interesting bit of info it contains is a mapping between language extensions and compiler command line flags. This module also defines a `PackageDB` type, which is used to refer to package databases. Most compilers only know about a single global package collection, but GHC has a global and a per-user one, and it lets you create arbitrary other package databases. We do not yet support this latter feature very much.
Distribution/Simple/PreProcess.hs (source) (docs): This defines a `PreProcessor` abstraction, which represents a pre-processor that can transform one kind of file into another. There is also a `PPSuffixHandler`, which is a combination of a file extension and a function for configuring a `PreProcessor`. It defines a bunch of known built-in preprocessors like cpp, cpphs, c2hs, hsc2hs, happy, alex etc. and lists them in `knownSuffixHandlers`. On top of this it provides a function for actually preprocessing some sources given a bunch of known suffix handlers. This module is not as good as it could be; it could really do with a rewrite to address some of the problems we have with pre-processors.
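The suffix-handler idea can be sketched as a lookup from a file's extension to a transformation. In this hypothetical model a pre-processor is just a pure function on file contents. `handlerFor` is an invented helper. `toyUnlit` (which keeps bird-track lines) is a toy chosen because it is easy to show; it is not one of the built-in handlers listed above.

```haskell
import Data.List (isPrefixOf, isSuffixOf)

-- Hypothetical: a pre-processor transforms file contents.
newtype PreProcessor = PreProcessor { runPreProcessor :: String -> String }

-- A suffix handler pairs a file extension (without the dot) with a pre-processor.
type PPSuffixHandler = (String, PreProcessor)

-- Find the first handler whose extension matches the file name.
handlerFor :: [PPSuffixHandler] -> FilePath -> Maybe PreProcessor
handlerFor handlers file =
  case [pp | (ext, pp) <- handlers, ('.' : ext) `isSuffixOf` file] of
    (pp : _) -> Just pp
    []       -> Nothing

-- Toy handler invented for illustration: keep only "> " lines, minus the marker.
toyUnlit :: PreProcessor
toyUnlit = PreProcessor (unlines . map (drop 2) . filter ("> " `isPrefixOf`) . lines)

knownHandlers :: [PPSuffixHandler]
knownHandlers = [("lhs", toyUnlit)]
```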
Distribution/Simple/Utils.hs (source) (docs): A large and somewhat miscellaneous collection of utility functions used throughout the rest of the Cabal lib and in other tools that use the Cabal lib, like cabal-install. It has a very simple set of logging actions, low-level functions for running programs, and a bunch of wrappers for various directory and file functions that do extra logging.
Distribution/Simple/LocalBuildInfo.hs (source) (docs): Once a package has been configured, we have resolved conditionals and dependencies and configured the compiler and other needed external programs. The `LocalBuildInfo` is used to hold all this information. It holds the install dirs, the compiler, the exact package dependencies, the configured programs, the package database to use and a bunch of miscellaneous configure flags. It gets saved to and reloaded from a file (`dist/setup-config`). It gets passed in to very many subsequent build actions.
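Because the compiler info has to be in `Show` so it can be saved with the `LocalBuildInfo`, the simplest persistence scheme is `show` to write and `read` to load. Below is a hypothetical, cut-down sketch of that scheme. The record fields, the example values and the `saveBuildInfo`/`loadBuildInfo` helpers are assumptions. The real type holds much more, and the file handling in the real code may differ.

```haskell
import Text.Read (readMaybe)

-- Hypothetical, cut-down types; everything derives Show and Read.
data CompilerFlavor = GHC | Hugs | NHC | JHC
  deriving (Eq, Show, Read)

data CompilerInfo = CompilerInfo
  { compilerFlavor  :: CompilerFlavor
  , compilerVersion :: [Int]
  } deriving (Eq, Show, Read)

data LocalBuildInfo = LocalBuildInfo
  { installPrefix :: FilePath
  , compiler      :: CompilerInfo
  , packageDeps   :: [String]
  } deriving (Eq, Show, Read)

-- Save with show, reload with read (Nothing if the file is malformed).
saveBuildInfo :: FilePath -> LocalBuildInfo -> IO ()
saveBuildInfo path = writeFile path . show

loadBuildInfo :: FilePath -> IO (Maybe LocalBuildInfo)
loadBuildInfo path = readMaybe <$> readFile path

-- Example values, for illustration only.
exampleLbi :: LocalBuildInfo
exampleLbi = LocalBuildInfo
  { installPrefix = "/usr/local"
  , compiler      = CompilerInfo GHC [6, 8]
  , packageDeps   = ["base"]
  }
```

A consequence of this design is that any type stored in the saved info must itself be `Show`/`Read`-able. That is why the compiler abstraction is currently "just data" rather than a record of functions.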
Particular phases or actions within the build process
- configure the compiler
- resolve any conditionals in the package description
- resolve the package dependencies
- check that all the extensions used by this package are supported by the compiler
- check that all the build tools are available (including version checks if appropriate)
- check for any required pkg-config packages (updating the `BuildInfo` with the results)

Then, based on all this, it saves the info in the `LocalBuildInfo` and writes it out to a file. It also displays various details to the user; the amount of information displayed depends on the verbosity level.
Distribution/Simple/Build.hs (source) (docs): This is the entry point to actually building the modules in a package. It doesn't actually do much itself; most of the work is delegated to compiler-specific actions. It does do some non-compiler-specific bits, like running pre-processors. There's some stuff to do with generating makefiles, which is a well-hidden feature used to build libraries inside the GHC build system, but which we'd like to kill off and replace with something better (doing our own dependency analysis properly). Half the module is dedicated to generating the `Paths_pkgname` module. This is a module that Cabal generates for the benefit of packages. It enables them to find their version number and find any installed data files at runtime. This code should probably be split off into another module.
Distribution/Simple/Haddock.hs (source) (docs): This module deals with the haddock and hscolour commands. Sadly this is a rather complicated module. It deals with two versions of haddock (0.x and 2.x). It has to do pre-processing for haddock 0.x, which involves unliting and using `-D__HADDOCK__` for any source code that uses cpp. It has to call ghc-pkg to find the locations of documentation for dependent packages, so it can create links. The hscolour support allows generating HTML versions of the original source, with coloured syntax highlighting.
Distribution/Simple/Register.hs (source) (docs): This module deals with registering and unregistering packages. There are a couple of ways it can do this. One is to do it directly; another is to generate a script that can be run later to do it. The idea here is that the user is shielded from the details of what command to use for package registration for a particular compiler. In practice this aspect was not especially popular, so we also provide a way to simply generate the package registration file, which then must be manually passed to ghc-pkg. It is possible to generate registration information for where the package is to be installed, or alternatively to register the package inplace in the build tree. The latter is occasionally handy, and will become more important when we try to build multi-package systems. This module does not delegate anything to the per-compiler modules but just mixes it all into this module, which is rather unsatisfactory. The script generation and the unregister feature are not well used or tested.
Distribution/Simple/SrcDist.hs (source) (docs): This handles the `sdist` command. The module exports an `sdist` action but also some of the phases that make it up, so that other tools can use just the bits they need. In particular, the preparation of the tree of files to go into the source tarball is separated from actually building the source tarball. The `sdist` action also does some distribution QA checks.
Distribution/Simple/GHC.hs (source) (docs): This is a fairly large module. It contains most of the GHC-specific code for configuring, building and installing packages. It also exports a function for finding out what packages are already installed. Configuring involves finding the ghc and ghc-pkg programs, finding what language extensions this version of ghc supports and returning a
`getInstalledPackages` involves calling the ghc-pkg program to find out what packages are installed. Building is somewhat complex as there is quite a bit of information to take into account. We have to build libs and programs, possibly for profiling and shared libs. We have to support building libraries that will be usable by GHCi and also ghc's `-split-objs` feature. We have to compile any C files using ghc. Linking, especially for `split-objs`, is remarkably complex, partly because there tend to be thousands of `.o` files and this can often be more than we can pass to the `ld` or `ar` programs in one go. There is also some code for generating `Makefile`s, but the less said about that the better. Installing libs and exes involves finding the right files and copying them to the right places. One of the trickier things about this module is remembering the layout of files in the build directory (which is not explicitly documented) and thus what search dirs are used for various kinds of files.
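One general way to cope with too many object files for a single `ld` or `ar` invocation is to split the argument list into batches whose combined length stays under a limit, then run the tool once per batch, for example appending each batch to the same archive. The helper below is a hypothetical sketch of that batching step only, not Cabal's code. The limit is a parameter because the real maximum command-line length varies by platform.

```haskell
-- Hypothetical: split arguments into batches whose total length
-- (counting one separating space per argument) stays within `limit`.
-- Every batch holds at least one argument, so the function always
-- makes progress even if a single argument exceeds the limit.
chunkArgs :: Int -> [String] -> [[String]]
chunkArgs limit = go
  where
    go [] = []
    go xs = let (batch, rest) = takeBatch 0 xs in batch : go rest

    takeBatch :: Int -> [String] -> ([String], [String])
    takeBatch _ [] = ([], [])
    takeBatch used (a : more)
      | used > 0 && used + len > limit = ([], a : more)
      | otherwise = let (b, r) = takeBatch (used + len) more in (a : b, r)
      where
        len = length a + 1  -- +1 for the separating space
```

For example, `chunkArgs 10 ["a.o","b.o","c.o","d.o"]` gives `[["a.o","b.o"],["c.o","d.o"]]`.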
Stuff related to the front end
Distribution/Simple/UserHooks.hs (source) (docs): This defines the API that `Setup.hs` scripts can use to customise the way the build works. This module just defines the `UserHooks` type. The predefined sets of hooks that implement the
`Configure` build systems are defined in
`UserHooks` is a big record of functions. There are three for each action: a pre-hook, a post-hook and the action itself. There are a few other miscellaneous hooks: ones to extend the set of programs and preprocessors, and one to override the function used to read the `.cabal` file. This hooks type is widely agreed not to be the right solution. Partly this is because changes to it usually break custom `Setup.hs` files, and yet many internal code changes do require changes to the hooks. For example, we cannot pass any extra parameters to most of the functions that implement the various phases because it would involve changing the types of the corresponding hooks. At some point it will have to be replaced.
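The pre/action/post structure can be shown with a tiny record of hooks for a single action. A custom script overrides one field and keeps the rest. This is a hypothetical sketch. The real record has many more fields, and its hooks take arguments such as flags. To keep the example pure and testable, the hooks here are polymorphic in the monad and are exercised with the `([String], ())` writer-like monad from `base`. It also hints at the fragility described above: changing the type of any field breaks every script that sets it.

```haskell
-- Hypothetical hooks for one action: pre-hook, the action, post-hook.
data Hooks m = Hooks
  { preBuild  :: m ()
  , buildHook :: m ()
  , postBuild :: m ()
  }

-- Run the three stages in order.
runBuild :: Monad m => Hooks m -> m ()
runBuild h = do
  preBuild h
  buildHook h
  postBuild h

-- A custom script typically overrides one field of the default hooks.
withPostBuild :: m () -> Hooks m -> Hooks m
withPostBuild act h = h { postBuild = act }
```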
Distribution/Simple/Setup.hs (source) (docs): This is a big module, but not very complicated. The code is very regular and repetitive. It defines the command line interface for all the Cabal commands. For each command (like
`build` etc.) it defines a type that holds all the flags, the default set of flags and a `Command` that maps command line flags to and from the corresponding flags type. All the flags types are instances of `Monoid`; see http://www.haskell.org/pipermail/cabal-devel/2007-December/001509.html for an explanation. The types defined here get used in the front end and especially in `cabal-install`, which has to do quite a bit of manipulating sets of command line flags. This is actually relatively nice; it works quite well. The main change it needs is to unify it with the code for managing sets of fields that can be read and written from files. This would allow us to save configure flags in config files.
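One way a `Monoid` instance makes flag handling pleasant is layering: defaults, then config-file settings, then command-line flags, where a flag that was actually set overrides earlier layers and an unset one leaves them alone. The sketch below illustrates that "last set value wins" idea. The `Flag`/`NoFlag` wrapper, the `BuildFlags` fields and `fromFlagOrDefault` are assumptions for illustration, not necessarily the exact definitions in the module.

```haskell
-- Hypothetical flag wrapper: set or unset; combining keeps the later
-- value if it is set ("last one wins").
data Flag a = Flag a | NoFlag
  deriving (Eq, Show)

instance Semigroup (Flag a) where
  x <> NoFlag = x
  _ <> y      = y

instance Monoid (Flag a) where
  mempty = NoFlag

-- A cut-down flags record for one command, combined field by field.
data BuildFlags = BuildFlags
  { buildVerbose :: Flag Bool
  , buildDistDir :: Flag FilePath
  } deriving (Eq, Show)

instance Semigroup BuildFlags where
  a <> b = BuildFlags
    { buildVerbose = buildVerbose a <> buildVerbose b
    , buildDistDir = buildDistDir a <> buildDistDir b
    }

instance Monoid BuildFlags where
  mempty = BuildFlags NoFlag NoFlag

defaultBuildFlags :: BuildFlags
defaultBuildFlags = BuildFlags (Flag False) (Flag "dist")

fromFlagOrDefault :: a -> Flag a -> a
fromFlagOrDefault _ (Flag x) = x
fromFlagOrDefault d NoFlag   = d
```

Layering is then just `defaultBuildFlags <> configFileFlags <> commandLineFlags`, where the last two names stand for hypothetical values parsed elsewhere.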
Distribution/Simple/SetupWrapper.hs (source) (docs): This is a wrapper around calling `Setup.hs` scripts. It is slightly more cunning than just calling `runghc Setup.hs args...`. First, it checks the `.cabal` file to see whether it specifies any particular version of Cabal. It also checks the `build-type`. If the `build-type` is anything other than `Custom` and the version of Cabal required is compatible, then it does not run `Setup.hs` at all; instead it directly calls `defaultMainArgs`. This is a good deal quicker than compiling the `Setup.hs` script. On the other hand, if the `build-type` is `Custom` or the version of Cabal specified is not compatible with the version being used, then it tries to compile the `Setup.hs` script with an appropriate version of the Cabal library. This aspect is currently only implemented for ghc. Nothing in the Cabal lib uses this module, it is provided for
Distribution/Simple.hs (source) (docs): This is the command line front end to the `Simple` build system. The original idea was that there could be different build systems that all presented the same compatible command line interface. There is still a `Make` system (see below), but in practice no packages use it. This module exports the main functions that `Setup.hs` scripts use. It re-exports the `UserHooks` type, the standard entry points like `defaultMainWithHooks` and the predefined sets of
`Setup.hs` scripts can extend to add their own behaviour.
Distribution/Make.hs (source) (docs): This is an alternative build system that delegates everything to the `make` program. All the commands just end up calling make with appropriate arguments. The intention was to allow preexisting packages that used makefiles to be wrapped into Cabal packages. In practice, essentially all such packages were converted over to the Simple build system instead. Consequently this module is probably not used much, and it certainly sees only cursory maintenance and no testing. Perhaps at some point we should stop pretending that it works.