Add a user guide section on package concepts

ghc · Sep 20, 2013 · abed563 · abed563
1 parent 1c183eb
commit abed563
Showing 1 changed file with 226 additions and 0 deletions.
diff --git a/Cabal/doc/developing-packages.markdown b/Cabal/doc/developing-packages.markdown
@@ -244,6 +244,232 @@ program may contain multiple modules with the same name if they come
 from separate packages; in all other current Haskell systems packages
 may not overlap in the modules they provide, including hidden modules.
 
+
+# Package concepts #
+
+Before diving into the details of writing packages it helps to
+understand a bit about packages in the Haskell world and the particular
+approach that Cabal takes.
+
+### The point of packages ###
+
+Packages are a mechanism for organising and distributing code. Packages
+are particularly suited for "programming in the large", that is building
+big systems by using and re-using code written by different people at
+different times.
+
+People organise code into packages based on functionality and
+dependencies. Social factors are also important: most packages have a
+single author, or a relatively small team of authors.
+
+Packages are also used for distribution: the idea is that a package can
+be created in one place and be moved to a different computer and be
+usable in that different environment. There are a surprising number of
+details that have to be got right for this to work, and a good package
+system helps to simply this process and make it reliable.
+
+Packages come in two main flavours: libraries of reusable code, and
+complete programs. Libraries present a code interface, an API, while
+programs can be run directly. In the Haskell world, library packages
+expose a set of Haskell modules as their public interface. Cabal
+packages can contain a library or executables or both.
+
+Some programming languages have packages as a builtin language concept.
+For example in Java, a package provides a local namespace for types and
+other definitions. In the Haskell world, packages are not a part of the
+language itself. Haskell programs consist of a number of modules, and
+packages just provide a way to partition the modules into sets of
+related functionality. Thus the choice of module names in Haskell is
+still important, even when using packages.
+
+### Package names and versions ###
+
+All packages have a name, e.g. "HUnit". Package names are assumed to be
+unique. Cabal package names can use letters, numbers and hyphens, but
+not spaces. The namespace for Cabal packages is flat, not hierarchical.
+
+Packages also have a version, e.g "1.1". This matches the typical way in
+which packages are developed. Strictly speaking, each version of a
+package is independent, but usually they are very similar. Cabal package
+versions follow the conventional numeric style, consisting of a sequence
+of digits such as "1.0.1" or "2.0". There are a range of common
+conventions for "versioning" packages, that is giving some meaning to
+the version number in terms of changes in the package. Section [TODO]
+has some tips on package versioning.
+
+The combination of package name and version is called the _package ID_
+and is written with a hyphen to separate the name and version, e.g.
+"HUnit-1.1". 
+
+For Cabal packages, the combination of the package name and version
+_uniquely_ identifies each package. Or to put it another way: two
+packages with the same name and version are considered to _be_ the same.
+
+Strictly speaking, the package ID only identifies each Cabal _source_
+package; the same Cabal source package can be configured and built in
+different ways. There is a separate installed package ID that uniquely
+identifies each installed package instance. Most of the time however,
+users need not be aware of this detail.
+
+### Kinds of package: Cabal vs GHC vs system ###
+
+It can be slightly confusing at first because there are various
+different notions of package floating around. Fortunately the details
+are not very complicated.
+
+Cabal packages
+:   Cabal packages are really source packages. That is they contain
+    Haskell (and sometimes C) source code.
+
+    Cabal packages can be compiled to produce GHC packages. They can
+    also be translated into operating system packages.
+
+GHC packages
+:   This is GHC's view on packages. GHC only cares about library
+    packages, not executables. Library packages have to be registered
+    with GHC for them to be available in GHCi or to be used when
+    compiling other programs or packages.
+
+    The low-level tool `ghc-pkg` is used to register GHC packages and to
+    get information on what packages are currently registered.
+
+    You never need to make GHC packages manually. When you build and
+    install a Cabal package containing a library then it gets registered
+    with GHC automatically.
+
+    Haskell implementations other than GHC have essentially the same
+    concept of registered packages. For the most part, Cabal hides the
+    slight differences.
+
+Operating system packages
+:   On operating systems like Linux and Mac OS X, the system has a
+    specific notion of a package and there are tools for installing and
+    managing packages.
+
+    The Cabal package format is designed to allow Cabal packages to be
+    translated, mostly-automatically, into operating system packages.
+    They are usually translated 1:1, that is a single Cabal package
+    becomes a single system package.
+
+    It is also possible to make Windows installers from Cabal packages,
+    though this is typically done for a program together with all of its
+    library dependencies, rather than packaging each library separately.
+
+
+### Unit of distribution ###
+
+The Cabal package is the unit of distribution. What this means is that
+each Cabal package can be distributed on its own in source or binary
+form. Of course there may dependencies between packages, but there is
+usually a degree of flexibility in which versions of packages can work
+together so distributing them independently makes sense.
+
+It is perhaps easiest to see what being ``the unit of distribution''
+means by contrast to an alternative approach. Many projects are made up
+of several interdependent packages and during development these might
+all be kept under one common directory tree and be built and tested
+together. When it comes to distribution however, rather than
+distributing them all together in a single tarball, it is required that
+they each be distributed independently in their own tarballs.
+
+Cabal's approach is to say that if you can specify a dependency on a
+package then that package should be able to be distributed
+independently. Or to put it the other way round, if you want to
+distribute it as a single unit, then it should be a single package.
+
+
+### Explicit dependencies and automatic package management ###
+
+Cabal takes the approach that all packages dependencies are specified
+explicitly and specified in a declarative way. The point is to enable
+automatic package management. This means tools like `cabal` can resolve
+dependencies and install a package plus all of its dependencies
+automatically. Alternatively, it is possible to mechanically (or mostly
+mechanically) translate Cabal packages into system packages and let the
+system package managager install dependencies automatically.
+
+It is important to track dependencies accurately so that packages can
+reliably be moved from one system to another system and still be able to
+build it there. Cabal is therefore relatively strict about specifying
+dependencies. For example Cabal's default build system will not even let
+code build if it tries to import a module from a package that isn't
+listed in the `.cabal` file, even if that package is actually installed.
+This helps to ensure that there are no "untracked dependencies" that
+could cause the code to fail to build on some other system.
+
+The explicit dependency approach is in contrast to the traditional
+"./configure" approach where instead of specifying dependencies
+declarativly, the `./configure` script checks if the dependencies are
+present on the system. Some manual work is required to transform a
+`./configure` based package into a Linux distribution package (or
+similar). This conversion work is usually done by people other than the
+package author(s). The practical effect of this is that only the most
+popular packages will benefit from automatic package managment. Instead,
+Cabal forces the original author to specify the dependencies but the
+advantage is that every package can benefit from automatic package
+managment.
+
+The "./configure" approach tends to encourage packages that adapt
+themselves to the environment in which they are built, for example by
+disabling optional features so that they can continue to work when a
+particular dependency is not available. This approach makes sense in a
+world where installing additional dependencies is a tiresome manual
+process and so minimising dependencies is important. The automatic
+package managment view is that packages should just declare what they
+need and the package manager will take responsibility for ensuring that
+all the dependencies are installed.
+
+Sometimes of course optional features and optional dependencies do make
+sense. Cabal packages can have optional features and varying
+dependencies. These conditional dependencies are still specified in a
+declarative way however and remain compatible with automatic package
+management. The need to remain compatible with automatic package
+management means that Cabal's conditional dependencies system is a bit
+less flexible than with the "./configure" approach.
+
+### Portability ###
+
+One of the purposes of Cabal is to make it easier to build packages on
+different platforms (operating systems and CPU architectures), with
+different compiler versions and indeed even with different Haskell
+implementations. (Yes, there are Haskell implementations other than
+GHC!)
+
+Cabal provides abstractions of features present in different Haskell
+implementations and wherever possible it is best to take advantage of
+these to increase portability. Where necessary however it is possible to
+use specific features of specific implementations.
+
+For example a package author can list in the package's `.cabal` what
+language extensions the code uses. This allows Cabal to figure out if
+the language extension is supported by the Haskell implementation that
+the user picks. Additionally, certain language extensions such as
+Template Haskell require special handling from the build system and by
+listing the extension it provides the build system with enough
+information to do the right thing.
+
+Another similar example is linking with foreign libraries. Rather than
+specifying GHC flags directly, the package author can list the libraries
+that are needed and the build system will take care of using the right
+flags for the compiler. Additionally this makes it easier for tools to
+discover what system C libraries a package needs, which is useful for
+tracking dependencies on system libraries (e.g. when translating into
+linux distro packages).
+
+In fact both of these examples fall into the category of explicitly
+specifying dependencies. Not all dependencies are other Cabal packages.
+Foreign libraries are clearly another kind of dependency. It's also
+possible to think of language extensions as dependencies: the package
+depends on a Haskell implementation that supports all those extensions.
+
+Where compiler-specific options are needed however, there is an "escape
+hatch" available. The developer can specify implementation-specific
+options and more generally there is a configuration mechanism to
+customise many aspects of how a package is built depending on the
+Haskell implementation, the operating system, computer architecture and
+user-specified configuration flags.
+
+
 ## Creating a package ##
 
 Suppose you have a directory hierarchy containing the source files that