
Add outline / rough draft of new Overview page #519

Merged
merged 24 commits into from Aug 12, 2018
Commits
59ea5fc
adding mostly-skeleton overview
mahmoud May 27, 2018
7412ac5
draft talk about frameworks/platforms
mahmoud May 30, 2018
37d6422
expand a little on each section
mahmoud May 31, 2018
3303dbf
address most of @ncoghlan's initial review points at https://github.c…
mahmoud Jun 3, 2018
4a3a1b8
fix @pradyunsg's very valuable copyedits
mahmoud Jun 5, 2018
71b7efd
update title for searchability, add a bit to the binary distribution …
mahmoud Jun 14, 2018
6494e53
flesh out sections, clarify zipapp qualification, add a few links
mahmoud Jul 22, 2018
a4e0ada
expand image sections, add a few related links
mahmoud Jul 22, 2018
3c0a08a
refining a lot of phrasing, expanding on hardware, adding figures
mahmoud Jul 23, 2018
44a27ec
lots of links, and expand on considerations (i think wheels might be …
mahmoud Jul 23, 2018
48e61ed
link freezers
mahmoud Jul 23, 2018
12624cd
more links, slight tweaks to the intro and other language, provide a …
mahmoud Jul 26, 2018
d7d65fd
Merge branch 'master' of https://github.com/pypa/python-packaging-use…
mahmoud Jul 26, 2018
9e53a2f
link to overview from home page
mahmoud Jul 26, 2018
d366f79
various rephrasals and addition of a couple asides
mahmoud Jul 27, 2018
2123ee2
fix typo
mahmoud Jul 27, 2018
a629f96
address most copyediting concerns, tweak a header, and add thea's exp…
mahmoud Jul 29, 2018
8d3aae2
try to tie sdists and wheels more closely together as recommended pra…
mahmoud Jul 29, 2018
acc6807
new -> separate, add a bit more detail
mahmoud Jul 30, 2018
3f2f231
improve frameworks section intro
mahmoud Jul 30, 2018
88a1bc9
add additional encouragement to dual-publish wheels and sdists
mahmoud Aug 10, 2018
ee13f10
adjust heading consistency and add notes for future editors
mahmoud Aug 11, 2018
d1e7111
a few tweaks for flow, plus a favorite quote from Wikipedia for the e…
mahmoud Aug 11, 2018
db86788
add link to jupyterlab writing advice
mahmoud Aug 11, 2018
256 changes: 256 additions & 0 deletions source/overview.rst
@@ -0,0 +1,256 @@
================
Packaging Python
Member

Could you make the title something like "An overview of Python packaging"? It makes it better for linking and SEO.

================

Python is a general-purpose programming language, meaning you can use
it for many things. You can build robots or server software or a game
for your friends to play.

For this reason, the first step in every Python project must be to
Member

Nit: this paragraph should merge with the one above (just delete the newlines).

think about the project's audience and the corresponding target
environment. Using this information, this overview will guide you to
the packaging technologies best suited to your project.

It might seem strange to think about packaging before writing code,
but this process does wonders for avoiding headaches later on.

* Who are your software's users? Are they other developers doing
Member

Introduce this list with something like:

"Some of the questions you may need to ask to determine your packaging strategy are:"

software development, operations people in a datacenter, or some
less software-savvy group?
* Is your software meant for servers, desktops, or embedded devices?
* Is your software installed individually, or to many computers at once?

Packaging is all about target environment and deployment
experience. There are many answers to the questions above and each
combination of circumstances has its own solutions.

Packaging libraries and tools
-----------------------------

You may have heard about PyPI, ``setup.py``, and wheel files. These
are just a few of the tools Python's ecosystem provides for
distributing Python code to developers.

The following classes of code are libraries and tools, meant for a
technical audience in a development setting. Skip ahead to
*Packaging Applications* if you're looking for ways to package Python
for a production setting.

Python modules
Member

This section reads a bit bare and doesn't quite mesh with the context. What are we trying to convey to the user here? What action do we expect them to take with this information?

Contributor Author

So this section is under two pressures:

  1. Python developers from their first pip install are steeped in setup.py and "for-other-developers" packaging. I vividly remember the times I really struggled to make the mental leap from "I should build an egg" to "I should build an RPM" for my production application. It wasn't until much later that I realized the deployment model and even the audience were completely different. So this section is trying to place these technologies in the decision tree, as early and separate as possible from the applications. Without it, the immediate question becomes "well, what about wheels/sdists/pip/eggs/etc."?
  2. The section is on the bare side (even shorter than the sections in the initial post) because while I want to acknowledge the technologies, the rest of the site is a much better guide. I wanted it to be more content-light and link-heavy.

If you think it could stand to be expanded a bit more without overshadowing more extensive writings on the rest of the guide, I'd be happy to do what I can. Alternatively, as someone who knows the guide much better than me, I would really welcome more links to things that need linking!

Member

Yeah, I'm thinking we can either (1) remove it, or (2) expand on it just a little:

In the most simple of cases such as a single Python file that only depends on the Python standard library, you can simply copy and use the script however you'd like. This is great for sharing simple scripts between people who both have compatible Python versions (such as via email, StackOverflow, or GitHub gists). There are even some entire Python libraries that offer this as an option, such as Bottle. However, this pattern doesn't hold well for projects that consist of multiple files, need specific additional Python or OS libraries, or need a specific version of Python.

Member

ncoghlan Jul 28, 2018

I like @theacodes's suggested additional paragraph here - it does a bit of extra scene-setting to make it clear that even within the "fellow Python users" audience there's still quite a bit of potential variety in needs and expectations.

Contributor Author

Absolutely! Added, with a link to bottle, and boltons, only because I specifically talk about this in the docs.

^^^^^^^^^^^^^^

A Python file, provided it only relies on the standard library, can be
redistributed and reused. You will also need to ensure it's written
for the right version of Python.
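For illustration only, here is a sketch of such a single-file module. The name ``textstats.py`` and the Python 3.6 minimum are assumptions; the point is that the file depends only on the standard library and fails with a clear message on an interpreter that is too old, so it can be shared by simply copying it.

```python
"""textstats.py: a hypothetical single-file module using only the stdlib."""
import sys
from collections import Counter

# Fail early with a clear message on an interpreter this file wasn't
# written for. The 3.6 minimum is an illustrative assumption.
if sys.version_info < (3, 6):
    raise RuntimeError("textstats requires Python 3.6 or newer")


def word_counts(text):
    """Return a Counter mapping lower-cased words to how often they occur."""
    return Counter(text.lower().split())


if __name__ == "__main__":
    # The same file doubles as a script: count the words given as arguments.
    for word, count in word_counts(" ".join(sys.argv[1:])).most_common(5):
        print(count, word)
```

Recipients with a compatible Python can drop the file next to their own code and ``import textstats``, or run ``python textstats.py some words here``.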

Python source distributions
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If your code consists of multiple Python files, it's usually organized
into a directory structure. Any directory containing Python files,
provided one of those files is named ``__init__.py``, comprises an
:term:`import package`.
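A small sketch of the difference, assuming Python 3.3 or newer and made-up directory names: it builds two directories in a temporary location, one with an ``__init__.py`` (a regular package) and one without (a namespace package, per PEP 420), and imports a submodule from each.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout created in a temporary directory:
#   root/demo_regular_pkg/__init__.py   (explicit, regular package)
#   root/demo_regular_pkg/helpers.py
#   root/demo_namespace_pkg/helpers.py  (no __init__.py: PEP 420 namespace package)
root = Path(tempfile.mkdtemp())
for pkg in ("demo_regular_pkg", "demo_namespace_pkg"):
    (root / pkg).mkdir()
    (root / pkg / "helpers.py").write_text(
        "def greet():\n    return 'hello from %s'\n" % pkg
    )
(root / "demo_regular_pkg" / "__init__.py").write_text("")

# Put the root on the import path and clear finder caches for the new files.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

regular = importlib.import_module("demo_regular_pkg.helpers")
namespace = importlib.import_module("demo_namespace_pkg.helpers")
print(regular.greet())    # hello from demo_regular_pkg
print(namespace.greet())  # hello from demo_namespace_pkg

# Only the regular package is backed by an __init__.py file.
print(Path(sys.modules["demo_regular_pkg"].__file__).name)           # __init__.py
print(getattr(sys.modules["demo_namespace_pkg"], "__file__", None))  # None
```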
Member

Since the introduction of namespace packages, this qualifier isn't necessary - __init__.py just marks the distinction between a self-contained package and a native namespace package, rather than being a prerequisite for making an import package at all.

Contributor Author

I've removed the qualifier, but it's kind of tricky. Maybe I'm getting philosophical here, but a directory without an expression of intent like __init__.py doesn't necessarily become an import package until someone imports it. And that was more a choice of the importer than the publisher.

I think I'll always prefer using an __init__.py for explicit package creation, to avoid this sort of quandary. :)

Member

Yeah, I default to __init__.py as well, and there are still plenty of tools that only handle explicit import packages by default.

Perhaps we could say something like:

Any directory containing Python files can optionally be accessed at runtime as an :term:`import package`, rather than as a collection of independent scripts (and if one of the files in the directory is named ``__init__.py``, it makes it explicit that the submodules are intended to be accessed via the package rather than directly).


Because packages consist of multiple files, they are harder to
distribute. Most protocols support transferring only one file at a
time (when was the last time you clicked a link and it downloaded
multiple files?). It's easier to get incomplete transfers, and harder
to guarantee code integrity at the destination.
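One way to picture the fix is to bundle the whole directory into a single archive and publish a digest alongside it, so the receiver has one file to fetch and one hash to check. The sketch below uses only the standard library and a hypothetical ``mypkg`` package; it is not how sdists are actually built (packaging tools also add metadata such as ``PKG-INFO``), just an illustration of the single-file idea.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def sha256_of(path):
    """Stream a file through SHA-256 so large archives needn't fit in memory."""
    digest = hashlib.sha256()
    with open(str(path), "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bundle(src_dir, archive_path):
    """Pack everything under src_dir into one .tar.gz; return its SHA-256 digest."""
    src_dir = Path(src_dir)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname keeps member paths relative, e.g. "mypkg/core.py".
        tar.add(str(src_dir), arcname=src_dir.name)
    return sha256_of(archive_path)


# Build a throwaway multi-file package to bundle.
work = Path(tempfile.mkdtemp())
pkg = work / "mypkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "core.py").write_text("VALUE = 42\n")

archive = work / "mypkg.tar.gz"
published_digest = bundle(pkg, archive)

# On the receiving side: one file to fetch, one digest to compare.
assert sha256_of(archive) == published_digest
with tarfile.open(str(archive)) as tar:
    print(sorted(tar.getnames()))  # ['mypkg', 'mypkg/__init__.py', 'mypkg/core.py']
```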

So long as your code contains nothing but pure Python code, and you
Member

Hmm, this is a bit weird. Even packages with native components have source distributions, and conversely, pure-Python packages can still have built distributions (wheels).

Contributor Author

Absolutely. I've added a graf to the (very underdeveloped) binary distribution section about how binary is best served up with source available, too. Still, because this is an overview, I'm going to leave some of the complexity (e.g., pure-Python wheels) offloaded to some of the other pages. I think next week sometime I'll be able to trawl the full guide for all of the crossrefs :)

Member

I think this section is currently fairly confusing as well, since it ends up conflating two different aspects:

  1. If you're publishing a pure Python package, you may not technically need a wheel file, since it's unlikely anybody will have issues installing your file. However, it's still a good idea to publish one, since wheels have a secondary purpose: they provide much richer static metadata than sdists do. Publishing a universal wheel also makes your project compatible with pip's --only-binary option, whereas even pure Python sdists will fail to pass that check.
  2. However, if you're publishing a package with native extensions, then the reason for publishing an sdist in addition to any wheel files you publish is to give folks the option of building it for themselves.

So rather than saying "Pure Python projects don't need to publish wheel archives", I think we want to instead have this section convey the perspective:

  1. Even pure Python projects should publish a universal wheel archive, as it's the presence of that wheel archive that makes it clear that they're either a pure Python project, or else the built archive contains only cross-platform binaries accessed through platform-independent mechanisms.
  2. Even projects that provide pre-compiled binary archives should still publish platform-independent source archives to handle the cases that the pre-compiled archives don't cover.

This change does mean the image from the Packaging Gradient article won't be usable as is, but it was also the image that I found confusing (which then allowed me to notice the issue in the text), so that may not be a bad thing :)

know your deployment environment supports your version of Python, then
you can use Python's native packaging tools to create a *source*
:term:`distribution package`, or *sdist* for short.

Python's *sdists* are compressed archives (``.tar.gz`` files)
containing one or more packages or modules. If your code is
pure-Python, and you only depend on other Python packages, you can `go
here to learn more <TODO>`_.

If you rely on any non-Python code, or non-Python packages (such as
libxml2 in the case of lxml, or BLAS libraries in the case of numpy),
you will want to read on.

.. TODO: "Did you know?" about distributions providing multiple
versions of the same package. Python packaging superpower!

Python binary distribution
^^^^^^^^^^^^^^^^^^^^^^^^^^

Python's real power comes from its ability to integrate with the
Member

"Python's real power" can read as dismissive. I'd recommend "A great feature of Python is its ability to..."

Contributor Author

Totally. This section is very underdeveloped, but I've adjusted this in my most recent commit.

software ecosystem, in particular libraries written in C, C++,
Fortran, Rust, and other languages. This is why wheels exist.
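Wheels encode their compatibility in the filename, following the PEP 427 pattern ``{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl``. The sketch below splits a filename into those fields; the two filenames in the usage loop are illustrative, and treating a ``none`` ABI plus ``any`` platform as "pure Python" is a simplifying heuristic, not a full compatibility check.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename):
    """Split a wheel filename into the PEP 427 fields (minimal, no validation)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    # The spec escapes "-" inside each component to "_", so splitting on "-"
    # yields 5 fields, or 6 when the optional build tag is present.
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py_tag, abi_tag, plat_tag = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py_tag, abi_tag, plat_tag = parts
    else:
        raise ValueError("unexpected number of fields in %r" % filename)
    return WheelName(dist, version, build, py_tag, abi_tag, plat_tag)


def looks_pure_python(wheel):
    """Heuristic: no ABI requirement and any platform suggests no compiled code."""
    return wheel.abi_tag == "none" and wheel.platform_tag == "any"


for name in ("requests-2.19.1-py2.py3-none-any.whl",
             "numpy-1.15.0-cp37-cp37m-manylinux1_x86_64.whl"):
    w = parse_wheel_filename(name)
    print(w.distribution, w.python_tag, w.platform_tag, looks_pure_python(w))
# requests py2.py3 any True
# numpy cp37 manylinux1_x86_64 False
```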


Packaging Applications
----------------------

So far we've only discussed Python's native distribution tools. Based
on our introduction, you would be correct to infer we're only
targeting environments which have Python. More importantly, we're
assuming an audience who knows how to install Python packages.

With the variety of operating systems, configurations, and people out
there, this assumption is only safe when targeting a developer
audience.

Python's native packaging is mostly built for distributing reusable
code, called libraries, between developers. We can piggyback
**tools**, or basic applications for developers, on top of Python's
library packaging, using technologies like `setuptools entry_points
<http://setuptools.readthedocs.io/en/latest/setuptools.html#automatic-script-creation>`_.
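As a sketch of how that piggybacking looks, assume a hypothetical single-module tool, ``greet_tool.py``. The module only needs an ordinary function; the ``console_scripts`` entry point, declared in ``setup.py`` as shown in the docstring, is what tells the installer to generate a ``greet`` command that imports the module and calls ``main()``.

```python
"""greet_tool.py: a hypothetical developer tool shipped as a library.

A setup.py could expose it as a command with setuptools entry points:

    from setuptools import setup

    setup(
        name="greet-tool",
        version="0.1",
        py_modules=["greet_tool"],
        entry_points={"console_scripts": ["greet = greet_tool:main"]},
    )

When the project is installed, the installer generates a small ``greet``
launcher that imports greet_tool and exits with the value main() returns.
"""
import argparse


def main(argv=None):
    """Entry point: parse arguments, print a greeting, return an exit status."""
    parser = argparse.ArgumentParser(prog="greet")
    parser.add_argument("name", nargs="?", default="world")
    # argv=None makes argparse read sys.argv[1:], as an installed command would.
    args = parser.parse_args(argv)
    print("Hello, %s!" % args.name)
    return 0
```

After ``pip install .`` in the project directory, running ``greet Ada`` would print ``Hello, Ada!``.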

Generally libraries are building blocks, and not complete
applications. For distributing applications, there's a whole world of
technologies out there.

The best way to organize these application packaging options is by the
way they depend on the target environment. That's how we'll approach
the coming sections.

.. TODO: Another way of thinking about packaging solutions is by how
much they include. All solutions include your code, plus some
amount of your code's library and service dependencies. PEX
includes Python libraries. RPM includes a list of dependencies on
libraries and local services. Images can be built to include
everything.

Depending on a framework
Member

"deployment system" might be a better phrase here (I initially thought you meant web framework)

Contributor Author

I was kind of looking for a term that could encapsulate the conventions of Heroku/PaaS as well as stuff like Kivy/Beeware, where as long as you stay within the lines, stuff should just work.

Contributor Author

I guess I could include a clarification that not all frameworks include deployment systems, so evaluate carefully, especially if the option isn't on the list.

^^^^^^^^^^^^^^^^^^^^^^^^

Some types of Python applications, like web sites and services, are
common enough that they have frameworks to enable their development
and packaging. Other types of applications, like web and mobile
clients, are advanced enough that the framework is more or less a
necessity.

In all these cases, it makes sense to work backwards, from the
framework's packaging and deployment story. Some frameworks include a
deployment system which wraps the technologies outlined in the rest of
the guide. In these cases, you'll want to defer to your framework's
packaging guide for the easiest and most reliable production experience.

If you ever wonder how these platforms and frameworks work under the
hood, you can always read the sections beyond.

Service platforms
*****************

If you're developing for a "Platform-as-a-Service" or "PaaS" like
Heroku or Google App Engine, you are going to want to follow their
respective packaging guides.

* Heroku
* Google App Engine
* PythonAnywhere
* OpenShift
Member

Zappa for AWS Lambda would also be worth mentioning here.

* "Serverless" frameworks like Zappa

In all these setups, the platform takes care of packaging and
deployment, as long as you follow their patterns. Most software does
not fit these templates, hence the existence of all the other options
below.

If you're developing software that will be deployed to machines you
own, users' personal computers, or any other arrangement, read on.

Web browsers and mobile applications
************************************

Python's steady advances are leading it into new spaces. These days
you can write a mobile app or web application frontend in
Python. While the language may be familiar, the packaging and
deployment practices are brand new.

If you're planning on releasing to these new frontiers, you'll want to
check out the following frameworks, and refer to their packaging
guides:

* Kivy
Member

Should we link to these project's pages?

Contributor Author

Definitely! I actually have quite a few projects I still haven't mentioned, but when I get a spare hour or two, I plan on going through and hyperlinking everything :)

* Beeware
* Brython
* Flexx

If you are *not* interested in using a framework or platform, or just
wonder about some of the technologies and techniques utilized by the
frameworks above, continue reading below.

Depending on a pre-installed Python
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This approach depends on the host system having Python installed. It
is common in controlled environments like data centers, and in the
local environments of tech-savvy people. Technically, that covers
pretty much every major Linux distribution and Mac OS release for many
years now.

* PEX
* zipapp (doesn't include library dependencies, requires Python 3.5+)
Member

s/include/help manage/ (if you use pip install --target to add zip-compatible pure Python deps, zipapp will happily add them to the archive)

Contributor Author

Yeah, the bullets in these sections are really anemic. I'll be sure to include more discussion of the pros and cons of each option. Even then, I might summarize by saying "minimal support" or something, because that is a really small subset of the libraries out there :)

* shiv (requires Python 3)
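To make the zipapp option concrete, here is a sketch using the standard library's ``zipapp.create_archive`` (Python 3.5+) on a hypothetical one-module application. Any third-party pure-Python dependencies would first have to be copied into the source directory yourself, for example with ``pip install --target``; zipapp will then include them, but it will not resolve them for you.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path

# Hypothetical application source: one module exposing a main() function.
src = Path(tempfile.mkdtemp()) / "hello_app"
src.mkdir()
(src / "cli.py").write_text(
    "import sys\n"
    "def main():\n"
    "    print('running from', sys.argv[0])\n"
)

# Bundle the directory into a single runnable archive. `main` makes zipapp
# write a __main__.py that imports cli and calls cli.main(); `interpreter`
# adds a shebang line so the file can be executed directly on POSIX systems.
target = src.parent / "hello.pyz"
zipapp.create_archive(src, target, interpreter="/usr/bin/env python3",
                      main="cli:main")

# Run the archive with the current interpreter, just like a script.
result = subprocess.run([sys.executable, str(target)],
                        stdout=subprocess.PIPE, universal_newlines=True,
                        check=True)
print(result.stdout.strip())
```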

Depending on a new Python ecosystem
Member

It would be good to adjust the title here, as "new" is ambiguous: it could refer to "new-to-the-target-environment" or "new-to-the-world-at-large" (and neither is entirely accurate for the scope of the section anyway).

My suggestion would be "Depending on another software distribution ecosystem". That way if we later decide to add other options like Nix or homebrew in here, they can just be an extra paragraph at the end of this section, rather than needing their own section.

Contributor Author

(working from bottom to top has its disadvantages, I see what you mean from down below)

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This approach depends on the host system having an alternative
ecosystem installed, like Anaconda. It is increasingly common in
academic, analytical, and other data-oriented environments, and is
also used in production services.

* conda/Anaconda
Member

Possible alternative structure here: rather than saying "new Python ecosystem", perhaps say "language independent packaging ecosystem", and mention:

  • conda/Anaconda (cross-platform)
  • homebrew (Mac OS X only)
  • rpm (selected Linux distros only)
  • deb (selected Linux distros only)

Contributor Author

Interesting. Yeah, I suppose there are Python applications out there that depend on RPM-packaged/managed libraries (python-*.i386.rpm, etc.). I'm not sure if I'd recommend it. I probably would want to give conda its own heading regardless, sort of like omnibus, below, because it does represent a pretty unique solution.


Bringing your own Python
^^^^^^^^^^^^^^^^^^^^^^^^

This approach depends on the host system being able to run a program
in which we've embedded Python. Operating systems have been designed
to run programs for a very long time, so this approach offers wide
compatibility, if you're willing to work at it.

* Freezers
* Omnibus
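Frozen applications often need to find data files that were bundled alongside the code. A common idiom, sketched below under the assumption that you use PyInstaller or cx_Freeze (both set ``sys.frozen``; PyInstaller also exposes its bundle directory as ``sys._MEIPASS``), is to pick the base directory at runtime. Check your freezer's documentation for where it actually places data files.

```python
import os
import sys


def resource_path(relative):
    """Return an absolute path to a bundled data file.

    Assumes PyInstaller or cx_Freeze: both set ``sys.frozen`` in frozen
    builds, and PyInstaller also sets ``sys._MEIPASS`` to the directory
    holding bundled files. Other freezers may behave differently.
    """
    if getattr(sys, "frozen", False):
        # Frozen build: PyInstaller's bundle dir if present, otherwise the
        # folder containing the executable (a common place for data files).
        base = getattr(sys, "_MEIPASS", os.path.dirname(sys.executable))
    else:
        # Running from source: resolve relative to this module's directory.
        base = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(base, relative)
```

Application code would then call something like ``open(resource_path("defaults.json"))``, where the file name is hypothetical, and get the right path in both development and frozen builds.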

Bringing your own userspace
^^^^^^^^^^^^^^^^^^^^^^^^^^^

This approach depends on the host system being able to run a
lightweight image, in a relatively modern arrangement often referred
to as containerization.

* AppImage
* Flatpak
* Snappy
* Docker

Bringing your own kernel
^^^^^^^^^^^^^^^^^^^^^^^^

This approach depends on the host system having a hypervisor that can
run a virtual machine. This type of virtualization is mature and
widespread in data center environments.

* Vagrant
* AMIs
* OpenStack

Bringing your own hardware
^^^^^^^^^^^^^^^^^^^^^^^^^^

This approach depends only on your host having electricity.

Embed your code on an Adafruit board, a MicroPython-compatible
microcontroller, or some other hardware, and just ship it to the
datacenter, or your users' homes, and call it good.
Member

Hey, I resemble this remark :)

(DC fast chargers are a teensy bit bigger than a micro:bit, though)

Contributor Author

I need to find more projects like that to link to from here. I'm pretty confident we can find open-source exemplars of applications for all the solutions above, but do keep an eye out for good hardware applications.


What about...
-------------

* Operating-system packages (deb/rpm)
* virtualenv
* Security considerations

Summary
-------

Packaging in Python has a bit of a reputation for being a bumpy
ride. This is mostly a confused side effect of Python's
versatility. Once you understand the natural boundaries between each
packaging solution, you begin to realize that the varied landscape is
a small price Python programmers pay for using the most balanced,
Member

I'd tweak the praise of the language in this last sentence to be something unequivocally true. Something like:

... for using one of the most balanced, flexible, and broadly applicable languages available.

(The "one of" does the equivocation work there: "most" is debatable, "one of the most" is not)

Contributor Author

Heh, I'd take that debate any day! But sure, not trying to make any enemies of anyone who read to the bottom of the page :)

flexible language available.