Permalink
Fetching contributors…
Cannot retrieve contributors at this time
683 lines (526 sloc) 30.5 KB
=====================================
The Internal Structure of Python Eggs
=====================================
STOP! This is not the first document you should read!
.. contents:: **Table of Contents**
----------------------
Eggs and their Formats
----------------------
A "Python egg" is a logical structure embodying the release of a
specific version of a Python project, comprising its code, resources,
and metadata. There are multiple formats that can be used to physically
encode a Python egg, and others can be developed. However, a key
principle of Python eggs is that they should be discoverable and
importable. That is, it should be possible for a Python application to
easily and efficiently find out what eggs are present on a system, and
to ensure that the desired eggs' contents are importable.
There are two basic formats currently implemented for Python eggs:
1. ``.egg`` format: a directory or zipfile *containing* the project's
code and resources, along with an ``EGG-INFO`` subdirectory that
contains the project's metadata
2. ``.egg-info`` format: a file or directory placed *adjacent* to the
project's code and resources, that directly contains the project's
metadata.
Both formats can include arbitrary Python code and resources, including
static data files, package and non-package directories, Python
modules, C extension modules, and so on. But each format is optimized
for different purposes.
The ``.egg`` format is well-suited to distribution and the easy
uninstallation or upgrades of code, since the project is essentially
self-contained within a single directory or file, unmingled with any
other projects' code or resources. It also makes it possible to have
multiple versions of a project simultaneously installed, such that
individual programs can select the versions they wish to use.
The ``.egg-info`` format, on the other hand, was created to support
backward-compatibility, performance, and ease of installation for system
packaging tools that expect to install all projects' code and resources
to a single directory (e.g. ``site-packages``). Placing the metadata
in that same directory simplifies the installation process, since it
isn't necessary to create ``.pth`` files or otherwise modify
``sys.path`` to include each installed egg.
Its disadvantage, however, is that it provides no support for clean
uninstallation or upgrades, and of course only a single version of a
project can be installed to a given directory. Thus, support from a
package management tool is required. (This is why setuptools' "install"
command refers to this type of egg installation as "single-version,
externally managed".) Also, they lack sufficient data to allow them to
be copied from their installation source. easy_install can "ship" an
application by copying ``.egg`` files or directories to a target
location, but it cannot do this for ``.egg-info`` installs, because
there is no way to tell what code and resources belong to a particular
egg -- there may be several eggs "scrambled" together in a single
installation location, and the ``.egg-info`` format does not currently
include a way to list the files that were installed. (This may change
in a future version.)
Code and Resources
==================
The layout of the code and resources is dictated by Python's normal
import layout, relative to the egg's "base location".
For the ``.egg`` format, the base location is the ``.egg`` itself. That
is, adding the ``.egg`` filename or directory name to ``sys.path``
makes its contents importable.
For the ``.egg-info`` format, however, the base location is the
directory that *contains* the ``.egg-info``, and thus it is the
directory that must be added to ``sys.path`` to make the egg importable.
(Note that this means that the "normal" installation of a package to a
``sys.path`` directory is sufficient to make it an "egg" if it has an
``.egg-info`` file or directory installed alongside of it.)
Project Metadata
=================
If eggs contained only code and resources, there would of course be
no difference between them and any other directory or zip file on
``sys.path``. Thus, metadata must also be included, using a metadata
file or directory.
For the ``.egg`` format, the metadata is placed in an ``EGG-INFO``
subdirectory, directly within the ``.egg`` file or directory. For the
``.egg-info`` format, metadata is stored directly within the
``.egg-info`` directory itself.
The minimum project metadata that all eggs must have is a standard
Python ``PKG-INFO`` file, named ``PKG-INFO`` and placed within the
metadata directory appropriate to the format. Because it's possible for
this to be the only metadata file included, ``.egg-info`` format eggs
are not required to be a directory; they can just be a ``.egg-info``
file that directly contains the ``PKG-INFO`` metadata. This eliminates
the need to create a directory just to store one file. This option is
*not* available for ``.egg`` formats, since setuptools always includes
other metadata. (In fact, setuptools itself never generates
``.egg-info`` files, either; the support for using files was added so
that the requirement could easily be satisfied by other tools, such
as distutils).
In addition to the ``PKG-INFO`` file, an egg's metadata directory may
also include files and directories representing various forms of
optional standard metadata (see the section on `Standard Metadata`_,
below) or user-defined metadata required by the project. For example,
some projects may define a metadata format to describe their application
plugins, and metadata in this format would then be included by plugin
creators in their projects' metadata directories.
Filename-Embedded Metadata
==========================
To allow introspection of installed projects and runtime resolution of
inter-project dependencies, a certain amount of information is embedded
in egg filenames. At a minimum, this includes the project name, and
ideally will also include the project version number. Optionally, it
can also include the target Python version and required runtime
platform if platform-specific C code is included. The syntax of an
egg filename is as follows::
name ["-" version ["-py" pyver ["-" required_platform]]] "." ext
The "name" and "version" should be escaped using the ``to_filename()``
function provided by ``pkg_resources``, after first processing them with
``safe_name()`` and ``safe_version()`` respectively. These latter two
functions can also be used to later "unescape" these parts of the
filename. (For a detailed description of these transformations, please
see the "Parsing Utilities" section of the ``pkg_resources`` manual.)
The "pyver" string is the Python major version, as found in the first
3 characters of ``sys.version``. "required_platform" is essentially
a distutils ``get_platform()`` string, but with enhancements to properly
distinguish Mac OS versions. (See the ``get_build_platform()``
documentation in the "Platform Utilities" section of the
``pkg_resources`` manual for more details.)
Finally, the "ext" is either ``.egg`` or ``.egg-info``, as appropriate
for the egg's format.
Normally, an egg's filename should include at least the project name and
version, as this allows the runtime system to find desired project
versions without having to read the egg's PKG-INFO to determine its
version number.
Setuptools, however, only includes the version number in the filename
when an ``.egg`` file is built using the ``bdist_egg`` command, or when
an ``.egg-info`` directory is being installed by the
``install_egg_info`` command. When generating metadata for use with the
original source tree, it only includes the project name, so that the
directory will not have to be renamed each time the project's version
changes.
This is especially important when version numbers change frequently, and
the source metadata directory is kept under version control with the
rest of the project. (As would be the case when the project's source
includes project-defined metadata that is not generated from by
setuptools from data in the setup script.)
Egg Links
=========
In addition to the ``.egg`` and ``.egg-info`` formats, there is a third
egg-related extension that you may encounter on occasion: ``.egg-link``
files.
These files are not eggs, strictly speaking. They simply provide a way
to reference an egg that is not physically installed in the desired
location. They exist primarily as a cross-platform alternative to
symbolic links, to support "installing" code that is being developed in
a different location than the desired installation location. For
example, if a user is developing an application plugin in their home
directory, but the plugin needs to be "installed" in an application
plugin directory, running "setup.py develop -md /path/to/app/plugins"
will install an ``.egg-link`` file in ``/path/to/app/plugins``, that
tells the egg runtime system where to find the actual egg (the user's
project source directory and its ``.egg-info`` subdirectory).
``.egg-link`` files are named following the format for ``.egg`` and
``.egg-info`` names, but only the project name is included; no version,
Python version, or platform information is included. When the runtime
searches for available eggs, ``.egg-link`` files are opened and the
actual egg file/directory name is read from them.
Each ``.egg-link`` file should contain a single file or directory name,
with no newlines. This filename should be the base location of one or
more eggs. That is, the name must either end in ``.egg``, or else it
should be the parent directory of one or more ``.egg-info`` format eggs.
As of setuptools 0.6c6, the path may be specified as a platform-independent
(i.e. ``/``-separated) relative path from the directory containing the
``.egg-link`` file, and a second line may appear in the file, specifying a
platform-independent relative path from the egg's base directory to its
setup script directory. This allows installation tools such as EasyInstall
to find the project's setup directory and build eggs or perform other setup
commands on it.
-----------------
Standard Metadata
-----------------
In addition to the minimum required ``PKG-INFO`` metadata, projects can
include a variety of standard metadata files or directories, as
described below. Except as otherwise noted, these files and directories
are automatically generated by setuptools, based on information supplied
in the setup script or through analysis of the project's code and
resources.
Most of these files and directories are generated via "egg-info
writers" during execution of the setuptools ``egg_info`` command, and
are listed in the ``egg_info.writers`` entry point group defined by
setuptools' own ``setup.py`` file.
Project authors can register their own metadata writers as entry points
in this group (as described in the setuptools manual under "Adding new
EGG-INFO Files") to cause setuptools to generate project-specific
metadata files or directories during execution of the ``egg_info``
command. It is up to project authors to document these new metadata
formats, if they create any.
``.txt`` File Formats
=====================
Files described in this section that have ``.txt`` extensions have a
simple lexical format consisting of a sequence of text lines, each line
terminated by a linefeed character (regardless of platform). Leading
and trailing whitespace on each line is ignored, as are blank lines and
lines whose first nonblank character is a ``#`` (comment symbol). (This
is the parsing format defined by the ``yield_lines()`` function of
the ``pkg_resources`` module.)
All ``.txt`` files defined by this section follow this format, but some
are also "sectioned" files, meaning that their contents are divided into
sections, using square-bracketed section headers akin to Windows
``.ini`` format. Note that this does *not* imply that the lines within
the sections follow an ``.ini`` format, however. Please see an
individual metadata file's documentation for a description of what the
lines and section names mean in that particular file.
Sectioned files can be parsed using the ``split_sections()`` function;
see the "Parsing Utilities" section of the ``pkg_resources`` manual for
for details.
Dependency Metadata
===================
``requires.txt``
----------------
This is a "sectioned" text file. Each section is a sequence of
"requirements", as parsed by the ``parse_requirements()`` function;
please see the ``pkg_resources`` manual for the complete requirement
parsing syntax.
The first, unnamed section (i.e., before the first section header) in
this file is the project's core requirements, which must be installed
for the project to function. (Specified using the ``install_requires``
keyword to ``setup()``).
The remaining (named) sections describe the project's "extra"
requirements, as specified using the ``extras_require`` keyword to
``setup()``. The section name is the name of the optional feature, and
the section body lists that feature's dependencies.
Note that it is not normally necessary to inspect this file directly;
``pkg_resources.Distribution`` objects have a ``requires()`` method
that can be used to obtain ``Requirement`` objects describing the
project's core and optional dependencies.
``setup_requires.txt``
----------------------
Much like ``requires.txt`` except represents the requirements
specified by the ``setup_requires`` parameter to the Distribution.
``dependency_links.txt``
------------------------
A list of dependency URLs, one per line, as specified using the
``dependency_links`` keyword to ``setup()``. These may be direct
download URLs, or the URLs of web pages containing direct download
links, and will be used by EasyInstall to find dependencies, as though
the user had manually provided them via the ``--find-links`` command
line option. Please see the setuptools manual and EasyInstall manual
for more information on specifying this option, and for information on
how EasyInstall processes ``--find-links`` URLs.
``depends.txt`` -- Obsolete, do not create!
-------------------------------------------
This file follows an identical format to ``requires.txt``, but is
obsolete and should not be used. The earliest versions of setuptools
required users to manually create and maintain this file, so the runtime
still supports reading it, if it exists. The new filename was created
so that it could be automatically generated from ``setup()`` information
without overwriting an existing hand-created ``depends.txt``, if one
was already present in the project's source ``.egg-info`` directory.
``namespace_packages.txt`` -- Namespace Package Metadata
========================================================
A list of namespace package names, one per line, as supplied to the
``namespace_packages`` keyword to ``setup()``. Please see the manuals
for setuptools and ``pkg_resources`` for more information about
namespace packages.
``entry_points.txt`` -- "Entry Point"/Plugin Metadata
=====================================================
This is a "sectioned" text file, whose contents encode the
``entry_points`` keyword supplied to ``setup()``. All sections are
named, as the section names specify the entry point groups in which the
corresponding section's entry points are registered.
Each section is a sequence of "entry point" lines, each parseable using
the ``EntryPoint.parse`` classmethod; please see the ``pkg_resources``
manual for the complete entry point parsing syntax.
Note that it is not necessary to parse this file directly; the
``pkg_resources`` module provides a variety of APIs to locate and load
entry points automatically. Please see the setuptools and
``pkg_resources`` manuals for details on the nature and uses of entry
points.
The ``scripts`` Subdirectory
============================
This directory is currently only created for ``.egg`` files built by
the setuptools ``bdist_egg`` command. It will contain copies of all
of the project's "traditional" scripts (i.e., those specified using the
``scripts`` keyword to ``setup()``). This is so that they can be
reconstituted when an ``.egg`` file is installed.
The scripts are placed here using the distutils' standard
``install_scripts`` command, so any ``#!`` lines reflect the Python
installation where the egg was built. But instead of copying the
scripts to the local script installation directory, EasyInstall writes
short wrapper scripts that invoke the original scripts from inside the
egg, after ensuring that sys.path includes the egg and any eggs it
depends on. For more about `script wrappers`_, see the section below on
`Installation and Path Management Issues`_.
Zip Support Metadata
====================
``native_libs.txt``
-------------------
A list of C extensions and other dynamic link libraries contained in
the egg, one per line. Paths are ``/``-separated and relative to the
egg's base location.
This file is generated as part of ``bdist_egg`` processing, and as such
only appears in ``.egg`` files (and ``.egg`` directories created by
unpacking them). It is used to ensure that all libraries are extracted
from a zipped egg at the same time, in case there is any direct linkage
between them. Please see the `Zip File Issues`_ section below for more
information on library and resource extraction from ``.egg`` files.
``eager_resources.txt``
-----------------------
A list of resource files and/or directories, one per line, as specified
via the ``eager_resources`` keyword to ``setup()``. Paths are
``/``-separated and relative to the egg's base location.
Resource files or directories listed here will be extracted
simultaneously, if any of the named resources are extracted, or if any
native libraries listed in ``native_libs.txt`` are extracted. Please
see the setuptools manual for details on what this feature is used for
and how it works, as well as the `Zip File Issues`_ section below.
``zip-safe`` and ``not-zip-safe``
---------------------------------
These are zero-length files, and either one or the other should exist.
If ``zip-safe`` exists, it means that the project will work properly
when installed as an ``.egg`` zipfile, and conversely the existence of
``not-zip-safe`` means the project should not be installed as an
``.egg`` file. The ``zip_safe`` option to setuptools' ``setup()``
determines which file will be written. If the option isn't provided,
setuptools attempts to make its own assessment of whether the package
can work, based on code and content analysis.
If neither file is present at installation time, EasyInstall defaults
to assuming that the project should be unzipped. (Command-line options
to EasyInstall, however, take precedence even over an existing
``zip-safe`` or ``not-zip-safe`` file.)
Note that these flag files appear only in ``.egg`` files generated by
``bdist_egg``, and in ``.egg`` directories created by unpacking such an
``.egg`` file.
``top_level.txt`` -- Conflict Management Metadata
=================================================
This file is a list of the top-level module or package names provided
by the project, one Python identifier per line.
Subpackages are not included; a project containing both a ``foo.bar``
and a ``foo.baz`` would include only one line, ``foo``, in its
``top_level.txt``.
This data is used by ``pkg_resources`` at runtime to issue a warning if
an egg is added to ``sys.path`` when its contained packages may have
already been imported.
(It was also once used to detect conflicts with non-egg packages at
installation time, but in more recent versions, setuptools installs eggs
in such a way that they always override non-egg packages, thus
preventing a problem from arising.)
``SOURCES.txt`` -- Source Files Manifest
========================================
This file is roughly equivalent to the distutils' ``MANIFEST`` file.
The differences are as follows:
* The filenames always use ``/`` as a path separator, which must be
converted back to a platform-specific path whenever they are read.
* The file is automatically generated by setuptools whenever the
``egg_info`` or ``sdist`` commands are run, and it is *not*
user-editable.
Although this metadata is included with distributed eggs, it is not
actually used at runtime for any purpose. Its function is to ensure
that setuptools-built *source* distributions can correctly discover
what files are part of the project's source, even if the list had been
generated using revision control metadata on the original author's
system.
In other words, ``SOURCES.txt`` has little or no runtime value for being
included in distributed eggs, and it is possible that future versions of
the ``bdist_egg`` and ``install_egg_info`` commands will strip it before
installation or distribution. Therefore, do not rely on its being
available outside of an original source directory or source
distribution.
------------------------------
Other Technical Considerations
------------------------------
Zip File Issues
===============
Although zip files resemble directories, they are not fully
substitutable for them. Most platforms do not support loading dynamic
link libraries contained in zipfiles, so it is not possible to directly
import C extensions from ``.egg`` zipfiles. Similarly, there are many
existing libraries -- whether in Python or C -- that require actual
operating system filenames, and do not work with arbitrary "file-like"
objects or in-memory strings, and thus cannot operate directly on the
contents of zip files.
To address these issues, the ``pkg_resources`` module provides a
"resource API" to support obtaining either the contents of a resource,
or a true operating system filename for the resource. If the egg
containing the resource is a directory, the resource's real filename
is simply returned. However, if the egg is a zipfile, then the
resource is first extracted to a cache directory, and the filename
within the cache is returned.
The cache directory is determined by the ``pkg_resources`` API; please
see the ``set_cache_path()`` and ``get_default_cache()`` documentation
for details.
The Extraction Process
----------------------
Resources are extracted to a cache subdirectory whose name is based
on the enclosing ``.egg`` filename and the path to the resource. If
there is already a file of the correct name, size, and timestamp, its
filename is returned to the requester. Otherwise, the desired file is
extracted first to a temporary name generated using
``mkstemp(".$extract",target_dir)``, and then its timestamp is set to
match the one in the zip file, before renaming it to its final name.
(Some collision detection and resolution code is used to handle the
fact that Windows doesn't overwrite files when renaming.)
If a resource directory is requested, all of its contents are
recursively extracted in this fashion, to ensure that the directory
name can be used as if it were valid all along.
If the resource requested for extraction is listed in the
``native_libs.txt`` or ``eager_resources.txt`` metadata files, then
*all* resources listed in *either* file will be extracted before the
requested resource's filename is returned, thus ensuring that all
C extensions and data used by them will be simultaneously available.
Extension Import Wrappers
-------------------------
Since Python's built-in zip import feature does not support loading
C extension modules from zipfiles, the setuptools ``bdist_egg`` command
generates special import wrappers to make it work.
The wrappers are ``.py`` files (along with corresponding ``.pyc``
and/or ``.pyo`` files) that have the same module name as the
corresponding C extension. These wrappers are located in the same
package directory (or top-level directory) within the zipfile, so that
say, ``foomodule.so`` will get a corresponding ``foo.py``, while
``bar/baz.pyd`` will get a corresponding ``bar/baz.py``.
These wrapper files contain a short stanza of Python code that asks
``pkg_resources`` for the filename of the corresponding C extension,
then reloads the module using the obtained filename. This will cause
``pkg_resources`` to first ensure that all of the egg's C extensions
(and any accompanying "eager resources") are extracted to the cache
before attempting to link to the C library.
Note, by the way, that ``.egg`` directories will also contain these
wrapper files. However, Python's default import priority is such that
C extensions take precedence over same-named Python modules, so the
import wrappers are ignored unless the egg is a zipfile.
Installation and Path Management Issues
=======================================
Python's initial setup of ``sys.path`` is very dependent on the Python
version and installation platform, as well as how Python was started
(i.e., script vs. ``-c`` vs. ``-m`` vs. interactive interpreter).
In fact, Python also provides only two relatively robust ways to affect
``sys.path`` outside of direct manipulation in code: the ``PYTHONPATH``
environment variable, and ``.pth`` files.
However, with no cross-platform way to safely and persistently change
environment variables, this leaves ``.pth`` files as EasyInstall's only
real option for persistent configuration of ``sys.path``.
But ``.pth`` files are rather strictly limited in what they are allowed
to do normally. They add directories only to the *end* of ``sys.path``,
after any locally-installed ``site-packages`` directory, and they are
only processed *in* the ``site-packages`` directory to start with.
This is a double whammy for users who lack write access to that
directory, because they can't create a ``.pth`` file that Python will
read, and even if a sympathetic system administrator adds one for them
that calls ``site.addsitedir()`` to allow some other directory to
contain ``.pth`` files, they won't be able to install newer versions of
anything that's installed in the systemwide ``site-packages``, because
their paths will still be added *after* ``site-packages``.
So EasyInstall applies two workarounds to solve these problems.
The first is that EasyInstall leverages ``.pth`` files' "import" feature
to manipulate ``sys.path`` and ensure that anything EasyInstall adds
to a ``.pth`` file will always appear before both the standard library
and the local ``site-packages`` directories. Thus, it is always
possible for a user who can write a Python-read ``.pth`` file to ensure
that their packages come first in their own environment.
Second, when installing to a ``PYTHONPATH`` directory (as opposed to
a "site" directory like ``site-packages``) EasyInstall will also install
a special version of the ``site`` module. Because it's in a
``PYTHONPATH`` directory, this module will get control before the
standard library version of ``site`` does. It will record the state of
``sys.path`` before invoking the "real" ``site`` module, and then
afterwards it processes any ``.pth`` files found in ``PYTHONPATH``
directories, including all the fixups needed to ensure that eggs always
appear before the standard library in sys.path, but are in a relative
order to one another that is defined by their ``PYTHONPATH`` and
``.pth``-prescribed sequence.
The net result of these changes is that ``sys.path`` order will be
as follows at runtime:
1. The ``sys.argv[0]`` directory, or an empty string if no script
is being executed.
2. All eggs installed by EasyInstall in any ``.pth`` file in each
``PYTHONPATH`` directory, in order first by ``PYTHONPATH`` order,
then normal ``.pth`` processing order (which is to say alphabetical
by ``.pth`` filename, then by the order of listing within each
``.pth`` file).
3. All eggs installed by EasyInstall in any ``.pth`` file in each "site"
directory (such as ``site-packages``), following the same ordering
rules as for the ones on ``PYTHONPATH``.
4. The ``PYTHONPATH`` directories themselves, in their original order
5. Any paths from ``.pth`` files found on ``PYTHONPATH`` that were *not*
eggs installed by EasyInstall, again following the same relative
ordering rules.
6. The standard library and "site" directories, along with the contents
of any ``.pth`` files found in the "site" directories.
Notice that sections 1, 4, and 6 comprise the "normal" Python setup for
``sys.path``. Sections 2 and 3 are inserted to support eggs, and
section 5 emulates what the "normal" semantics of ``.pth`` files on
``PYTHONPATH`` would be if Python natively supported them.
For further discussion of the tradeoffs that went into this design, as
well as notes on the actual magic inserted into ``.pth`` files to make
them do these things, please see also the following messages to the
distutils-SIG mailing list:
* http://mail.python.org/pipermail/distutils-sig/2006-February/006026.html
* http://mail.python.org/pipermail/distutils-sig/2006-March/006123.html
Script Wrappers
---------------
EasyInstall never directly installs a project's original scripts to
a script installation directory. Instead, it writes short wrapper
scripts that first ensure that the project's dependencies are active
on sys.path, before invoking the original script. These wrappers
have a #! line that points to the version of Python that was used to
install them, and their second line is always a comment that indicates
the type of script wrapper, the project version required for the script
to run, and information identifying the script to be invoked.
The format of this marker line is::
"# EASY-INSTALL-" script_type ": " tuple_of_strings "\n"
The ``script_type`` is one of ``SCRIPT``, ``DEV-SCRIPT``, or
``ENTRY-SCRIPT``. The ``tuple_of_strings`` is a comma-separated
sequence of Python string constants. For ``SCRIPT`` and ``DEV-SCRIPT``
wrappers, there are two strings: the project version requirement, and
the script name (as a filename within the ``scripts`` metadata
directory). For ``ENTRY-SCRIPT`` wrappers, there are three:
the project version requirement, the entry point group name, and the
entry point name. (See the "Automatic Script Creation" section in the
setuptools manual for more information about entry point scripts.)
In each case, the project version requirement string will be a string
parseable with the ``pkg_resources`` modules' ``Requirement.parse()``
classmethod. The only difference between a ``SCRIPT`` wrapper and a
``DEV-SCRIPT`` is that a ``DEV-SCRIPT`` actually executes the original
source script in the project's source tree, and is created when the
"setup.py develop" command is run. A ``SCRIPT`` wrapper, on the other
hand, uses the "installed" script written to the ``EGG-INFO/scripts``
subdirectory of the corresponding ``.egg`` zipfile or directory.
(``.egg-info`` eggs do not have script wrappers associated with them,
except in the "setup.py develop" case.)
The purpose of including the marker line in generated script wrappers is
to facilitate introspection of installed scripts, and their relationship
to installed eggs. For example, an uninstallation tool could use this
data to identify what scripts can safely be removed, and/or identify
what scripts would stop working if a particular egg is uninstalled.