# Chapter 19: Building and Publishing Python Packages

This notebook covers the process of turning your Python project into distributable
packages. We explore source distributions, wheels, semantic versioning, version
constraints, package data, and the publishing workflow.

## Key Concepts
- **sdist**: Source distribution -- contains raw source code
- **wheel**: Pre-built distribution -- faster to install, no build step
- **Semantic versioning**: MAJOR.MINOR.PATCH version scheme
- **Publishing**: Upload packages to PyPI or private registries

## Section 1: Source Distributions vs Wheels

Python packages are distributed in two primary formats:

### Source Distribution (sdist)
- A `.tar.gz` archive of your project source code
- Must be **built** on the target machine during installation
- Includes `pyproject.toml`, source files, and any build scripts
- Format: `package-1.0.0.tar.gz`

### Wheel (bdist_wheel)
- A `.whl` file (actually a ZIP archive with a special naming convention)
- **Pre-built** -- no compilation needed during installation
- Much faster to install than sdists
- Format: `package-1.0.0-py3-none-any.whl`

| Aspect | sdist | wheel |
|---|---|---|
| **File extension** | `.tar.gz` | `.whl` |
| **Build required?** | Yes, on install | No, pre-built |
| **Install speed** | Slower | Faster |
| **C extensions** | Compiled on install | Pre-compiled for target platform |
| **Use case** | Fallback, source archive | Primary distribution format |
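
Because a wheel is an ordinary ZIP archive, the standard `zipfile` module can list what is inside one. The sketch below builds a tiny stand-in archive in memory instead of reading a real wheel; the package name `demo_pkg` and its file contents are made up for illustration.

```python
import io
import zipfile

# Hypothetical contents of a minimal pure-Python wheel (illustration only).
FAKE_WHEEL_FILES: dict[str, str] = {
    "demo_pkg/__init__.py": "__version__ = '1.0.0'\n",
    "demo_pkg-1.0.0.dist-info/METADATA": "Name: demo-pkg\nVersion: 1.0.0\n",
    "demo_pkg-1.0.0.dist-info/WHEEL": "Wheel-Version: 1.0\n",
    "demo_pkg-1.0.0.dist-info/RECORD": "",
}


def build_fake_wheel(files: dict[str, str]) -> bytes:
    """Write the given files into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def list_archive(data: bytes) -> list[str]:
    """Return the member names of a ZIP archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


wheel_bytes = build_fake_wheel(FAKE_WHEEL_FILES)
for member in list_archive(wheel_bytes):
    print(member)
```

The same `list_archive` helper should work on the bytes of a real `.whl` file, since it relies only on the ZIP format.
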

In [None]:
from dataclasses import dataclass


@dataclass
class WheelFilename:
    """Parse and represent a wheel filename per PEP 427.

    Format: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    name: str
    version: str
    python_tag: str    # e.g. 'py3', 'cp312'
    abi_tag: str       # e.g. 'none', 'cp312'
    platform_tag: str  # e.g. 'any', 'manylinux_2_17_x86_64', 'macosx_14_0_arm64'

    @classmethod
    def parse(cls, filename: str) -> "WheelFilename":
        """Parse a wheel filename into its components."""
        base = filename.removesuffix(".whl")
        parts = base.split("-")
        # Handle optional build tag
        if len(parts) == 6:
            name, version, _build, python, abi, platform = parts
        else:
            name, version, python, abi, platform = parts
        return cls(name, version, python, abi, platform)

    @property
    def is_pure_python(self) -> bool:
        """Pure Python wheels work on any platform."""
        return self.abi_tag == "none" and self.platform_tag == "any"

    @property
    def is_platform_specific(self) -> bool:
        """Platform-specific wheels contain compiled extensions."""
        return self.platform_tag != "any"


# Parse some real-world wheel filenames
examples: list[str] = [
    "requests-2.31.0-py3-none-any.whl",
    "numpy-1.26.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
    "pydantic_core-2.14.6-cp312-cp312-macosx_11_0_arm64.whl",
]

for filename in examples:
    whl = WheelFilename.parse(filename)
    print(f"File: {filename}")
    print(f"  Name:     {whl.name}")
    print(f"  Version:  {whl.version}")
    print(f"  Python:   {whl.python_tag}")
    print(f"  ABI:      {whl.abi_tag}")
    print(f"  Platform: {whl.platform_tag}")
    print(f"  Pure:     {whl.is_pure_python}")
    print()

## Section 2: Building Packages with `python -m build`

The `build` package, maintained by the PyPA, creates both an sdist and a wheel from your project:

```bash
# Install the build tool
pip install build

# Build both sdist and wheel (output goes to dist/)
python -m build

# Build only a wheel
python -m build --wheel

# Build only a source distribution
python -m build --sdist
```

After building, your `dist/` directory looks like:
```
dist/
  my_package-1.0.0.tar.gz       # sdist
  my_package-1.0.0-py3-none-any.whl  # wheel
```

The `build` tool reads `[build-system]` from `pyproject.toml` and delegates to
the declared backend (setuptools, hatchling, etc.).

In [None]:
from pathlib import Path
from dataclasses import dataclass


@dataclass
class BuildArtifact:
    """Represents a built distribution file."""
    path: Path
    format: str  # 'sdist' or 'wheel'

    @property
    def size_kb(self) -> float:
        if self.path.exists():
            return self.path.stat().st_size / 1024
        return 0.0


def classify_artifact(filename: str) -> str:
    """Determine if a dist file is an sdist or wheel."""
    if filename.endswith(".whl"):
        return "wheel"
    elif filename.endswith(".tar.gz"):
        return "sdist"
    elif filename.endswith(".zip"):
        return "sdist (zip)"
    return "unknown"


# Demonstrate classification
sample_files: list[str] = [
    "my_package-1.0.0.tar.gz",
    "my_package-1.0.0-py3-none-any.whl",
    "legacy_pkg-0.5.0.zip",
]

print(f"{'Filename':<45} {'Format'}")
print("-" * 60)
for f in sample_files:
    print(f"{f:<45} {classify_artifact(f)}")

## Section 3: Semantic Versioning (SemVer)

Many Python projects follow **Semantic Versioning** for the release number, within the version syntax that PEP 440 defines:

```
MAJOR.MINOR.PATCH
  │     │     └── Bug fixes, no API changes
  │     └──────── New features, backwards compatible
  └────────────── Breaking changes
```

### Version Progression Examples
- `1.0.0` -> `1.0.1`: Bug fix
- `1.0.1` -> `1.1.0`: New feature added
- `1.1.0` -> `2.0.0`: Breaking API change
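
The progression rules above can be sketched as a small classifier. This is a simplified illustration that only handles plain `MAJOR.MINOR.PATCH` strings; the helper names `parse_release` and `classify_bump` are made up for this example.

```python
def parse_release(version: str) -> tuple[int, int, int]:
    """Split a plain 'MAJOR.MINOR.PATCH' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_bump(old: str, new: str) -> str:
    """Name the kind of change implied by moving from old to new."""
    old_v, new_v = parse_release(old), parse_release(new)
    if new_v <= old_v:
        return "not an upgrade"
    if new_v[0] != old_v[0]:
        return "major (breaking changes)"
    if new_v[1] != old_v[1]:
        return "minor (new features)"
    return "patch (bug fixes)"


for old, new in [("1.0.0", "1.0.1"), ("1.0.1", "1.1.0"), ("1.1.0", "2.0.0")]:
    print(f"{old} -> {new}: {classify_bump(old, new)}")
```
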

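### Pre-release, Post-release, and Development Versions
- `1.0.0a1` -- Alpha release
- `1.0.0b2` -- Beta release
- `1.0.0rc1` -- Release candidate
- `1.0.0.post1` -- Post-release (documentation fix, etc.)
- `1.0.0.dev3` -- Development release

These suffixes use the PEP 440 spelling. For the same release number they sort as
dev < alpha < beta < rc < final < post.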

In [None]:
from dataclasses import dataclass
from functools import total_ordering


@total_ordering
@dataclass(frozen=True)
class SemanticVersion:
    """A simplified semantic version implementation."""
    major: int
    minor: int
    patch: int
    pre_release: str = ""  # e.g. 'a1', 'b2', 'rc1'

    @classmethod
    def parse(cls, version_str: str) -> "SemanticVersion":
        """Parse a version string like '1.2.3' or '2.0.0rc1'."""
        pre = ""
        # Extract pre-release suffix
        for marker in ("rc", "b", "a", "dev", "post"):
            if marker in version_str:
                idx = version_str.index(marker)
                pre = version_str[idx:]
                version_str = version_str[:idx].rstrip(".")
                break
        parts = version_str.split(".")
        return cls(
            major=int(parts[0]),
            minor=int(parts[1]) if len(parts) > 1 else 0,
            patch=int(parts[2]) if len(parts) > 2 else 0,
            pre_release=pre,
        )

    def __str__(self) -> str:
        base = f"{self.major}.{self.minor}.{self.patch}"
        return f"{base}{self.pre_release}" if self.pre_release else base

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, SemanticVersion):
            return NotImplemented
        return (self.major, self.minor, self.patch, self.pre_release) == (
            other.major, other.minor, other.patch, other.pre_release
        )

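    def _sort_key(self) -> tuple[int, int, int, int, int]:
        """Rank suffixes as dev < a < b < rc < final < post; compare numbers numerically."""
        ranks = {"dev": 0, "a": 1, "b": 2, "rc": 3, "": 4, "post": 5}
        label = self.pre_release.rstrip("0123456789")
        number = int(self.pre_release[len(label):] or 0)
        return (self.major, self.minor, self.patch, ranks.get(label, 0), number)

    def __lt__(self, other: "SemanticVersion") -> bool:
        if not isinstance(other, SemanticVersion):
            return NotImplemented
        # Pre-releases sort before their release, post-releases after it
        return self._sort_key() < other._sort_key()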

    def bump_major(self) -> "SemanticVersion":
        return SemanticVersion(self.major + 1, 0, 0)

    def bump_minor(self) -> "SemanticVersion":
        return SemanticVersion(self.major, self.minor + 1, 0)

    def bump_patch(self) -> "SemanticVersion":
        return SemanticVersion(self.major, self.minor, self.patch + 1)


# Parse and compare versions
versions: list[SemanticVersion] = [
    SemanticVersion.parse("2.0.0rc1"),
    SemanticVersion.parse("1.0.0"),
    SemanticVersion.parse("2.0.0"),
    SemanticVersion.parse("1.1.0"),
    SemanticVersion.parse("1.0.1"),
    SemanticVersion.parse("2.0.0a1"),
]

print("Versions sorted by precedence:")
for v in sorted(versions):
    print(f"  {v}")

# Demonstrate version bumping
current = SemanticVersion.parse("1.4.2")
print(f"\nCurrent version:    {current}")
print(f"After patch bump:   {current.bump_patch()}")
print(f"After minor bump:   {current.bump_minor()}")
print(f"After major bump:   {current.bump_major()}")

## Section 4: Version Constraints and Specifiers

When declaring dependencies, you use **version specifiers** to control which
versions are acceptable:

| Specifier | Meaning | Example | Matches |
|---|---|---|---|
| `==` | Exact match | `==1.0.0` | Only 1.0.0 |
| `>=` | Minimum | `>=1.0` | 1.0, 1.1, 2.0, ... |
| `<=` | Maximum | `<=2.0` | 0.1, 1.5, 2.0 |
| `!=` | Exclude | `!=1.5.0` | Any except 1.5.0 |
| `~=` | Compatible release | `~=1.4` | >=1.4, <2.0 |
| `~=` | Compatible release | `~=1.4.2` | >=1.4.2, <1.5.0 |

### The Compatible Release Operator (`~=`)

The `~=` operator is particularly useful. It means "compatible with this version":
- `~=1.4` is equivalent to `>=1.4, <2.0`
- `~=1.4.2` is equivalent to `>=1.4.2, <1.5.0`

It drops the last component and increments the previous one for the upper bound.
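
The upper-bound rule can be sketched with plain tuple comparison. This is a simplified illustration for purely numeric versions (no pre-release suffixes); the helper names are hypothetical, and real tools should rely on the `packaging` library instead.

```python
def to_tuple(version: str) -> tuple[int, ...]:
    """Convert a purely numeric version like '1.4.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def compatible_bounds(version: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Return (lower, upper) for '~=version'; the upper bound is exclusive."""
    lower = to_tuple(version)
    if len(lower) < 2:
        raise ValueError("~= needs at least two components, e.g. ~=1.4")
    prefix = lower[:-1]                      # drop the last component
    upper = prefix[:-1] + (prefix[-1] + 1,)  # increment the new last component
    return lower, upper


def matches_compatible(candidate: str, version: str) -> bool:
    """Check whether candidate satisfies '~=version' (numeric versions only)."""
    lower, upper = compatible_bounds(version)
    cand = to_tuple(candidate)
    # Pad with zeros so that '1.4' compares equal to '1.4.0'.
    cand += (0,) * (len(lower) - len(cand))
    return lower <= cand < upper


for candidate in ["1.4.1", "1.4.2", "1.4.9", "1.5.0"]:
    print(f"{candidate} satisfies ~=1.4.2: {matches_compatible(candidate, '1.4.2')}")
```
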

In [None]:
from dataclasses import dataclass


@dataclass
class VersionSpecifier:
    """Represents a version constraint like '>=1.0,<2.0'."""
    raw: str

    def explain(self) -> str:
        """Provide a human-readable explanation of the specifier."""
        explanations: dict[str, str] = {
            "==": "exactly",
            ">=": "at least",
            "<=": "at most",
            ">": "greater than",
            "<": "less than",
            "!=": "anything except",
        }
        # Handle compatible release specially
        if self.raw.startswith("~="):
            version = self.raw[2:]
            parts = version.split(".")
            # Increment second-to-last, drop last
            upper_parts = parts[:-1]
            upper_parts[-1] = str(int(upper_parts[-1]) + 1)
            upper = ".".join(upper_parts)
            return f">={version}, <{upper}"

        # Handle compound specifiers
        parts_list: list[str] = []
        for part in self.raw.split(","):
            part = part.strip()
            for op, word in sorted(explanations.items(), key=lambda x: -len(x[0])):
                if part.startswith(op):
                    ver = part[len(op):]
                    parts_list.append(f"{word} {ver}")
                    break
        return ", ".join(parts_list)


# Demonstrate version specifiers
specifiers: list[str] = [
    "==1.0.0",
    ">=1.0,<2.0",
    "~=1.4",
    "~=1.4.2",
    ">=2.28,!=2.29.0",
    ">=3.10",
]

print(f"{'Specifier':<25} {'Equivalent / Explanation'}")
print("-" * 60)
for spec in specifiers:
    vs = VersionSpecifier(spec)
    print(f"{spec:<25} {vs.explain()}")

In [None]:
# Using the packaging library (standard tool for version handling)
# This is what pip uses internally
try:
    from packaging.version import Version
    from packaging.specifiers import SpecifierSet

    # Create a specifier set
    spec = SpecifierSet(">=1.0,<2.0,!=1.5.0")
    print(f"Specifier: {spec}")
    print()

    # Test versions against the specifier
    test_versions: list[str] = ["0.9", "1.0", "1.4.2", "1.5.0", "1.9.9", "2.0.0"]
    for v_str in test_versions:
        v = Version(v_str)
        matches: bool = v in spec
        status = "MATCH" if matches else "no match"
        print(f"  {v_str:10s} -> {status}")

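    # Compatible release operator: ~=1.4.2 behaves like >=1.4.2,<1.5.0
    compat = SpecifierSet("~=1.4.2")
    print(f"\nVersions accepted by {compat}:")
    for v_str in ["1.4.1", "1.4.2", "1.4.9", "1.5.0"]:
        print(f"  {v_str:10s} -> {Version(v_str) in compat}")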

except ImportError:
    print("The 'packaging' library is not installed.")
    print("Install it with: pip install packaging")
    print("It provides robust version parsing and comparison.")

## Section 5: MANIFEST.in and Package Data

Build backends differ in what they include by default. With setuptools, non-Python
files (templates, config files, data) generally need to be declared.

### Modern approach: pyproject.toml
```toml
# For setuptools
[tool.setuptools.package-data]
my_package = ["templates/*.html", "data/*.json", "py.typed"]

# For hatchling
[tool.hatch.build.targets.wheel]
packages = ["src/my_package"]
```

### Legacy approach: MANIFEST.in (for sdist)
```
include LICENSE
include README.md
recursive-include my_package/templates *.html
recursive-include my_package/data *.json
prune tests
```
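
One way to confirm that declared data files actually made it into a build is to compare the archive's file list against the patterns you expect. The sketch below uses a hard-coded, hypothetical file list rather than a real archive, and the helper name `unmatched_patterns` is made up for illustration.

```python
from fnmatch import fnmatch

# Hypothetical list of files found in a built archive (illustration only).
ARCHIVE_MEMBERS: list[str] = [
    "my_package/__init__.py",
    "my_package/templates/base.html",
    "my_package/data/countries.json",
]

# Patterns we expect to be satisfied, relative to the archive root.
EXPECTED_PATTERNS: list[str] = [
    "my_package/templates/*.html",
    "my_package/data/*.json",
    "my_package/py.typed",
]


def unmatched_patterns(members: list[str], patterns: list[str]) -> list[str]:
    """Return the patterns that match no file in the archive."""
    return [
        pattern
        for pattern in patterns
        if not any(fnmatch(member, pattern) for member in members)
    ]


missing = unmatched_patterns(ARCHIVE_MEMBERS, EXPECTED_PATTERNS)
print("Patterns with no matching file:", missing)
```

Note that `fnmatch` lets `*` match across `/`, so this check is looser than the glob rules a build backend applies; treat it as a quick sanity check only.
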

### Accessing package data at runtime
Use `importlib.resources` (Python 3.9+) instead of `__file__` hacks:

In [None]:
# importlib.resources -- the modern way to access package data
import importlib.resources as resources

# Demonstrate the API on the standard library's email package.
# In your own package, you would do:
#   files = resources.files("my_package")
#   template = (files / "templates" / "base.html").read_text()
email_files = resources.files("email")
print(f"email package has __init__.py: {(email_files / '__init__.py').is_file()}")
print()

# Show the API pattern
print("importlib.resources usage patterns:")
print()
print("  # Access a file in your package")
print('  files = importlib.resources.files("my_package")')
print('  config = (files / "config.json").read_text()')
print()
print("  # Access a subdirectory")
print('  templates = files / "templates"')
print('  html = (templates / "base.html").read_text()')
print()
print("  # Binary data")
print('  icon = (files / "icon.png").read_bytes()')
print()

# MANIFEST.in directives
directives: dict[str, str] = {
    "include": "Include specific files (e.g., include LICENSE)",
    "exclude": "Exclude specific files",
    "recursive-include": "Include files matching a pattern in a directory tree",
    "recursive-exclude": "Exclude files matching a pattern",
    "graft": "Include an entire directory tree",
    "prune": "Exclude an entire directory tree",
}

print("MANIFEST.in directives:")
for directive, desc in directives.items():
    print(f"  {directive:<25s} {desc}")

## Section 6: The Publishing Workflow

Publishing a package to PyPI follows a consistent workflow:

```
1. Prepare    -->  Update version, changelog, metadata
2. Build      -->  python -m build (creates sdist + wheel)
3. Check      -->  twine check dist/* (validate metadata)
4. Test       -->  Upload to TestPyPI first
5. Publish    -->  twine upload dist/* (upload to PyPI)
```
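
Because `twine upload dist/*` sends every file in `dist/`, it can help to confirm the directory holds exactly the artifacts for the version being released. The sketch below works on a plain list of filenames; the helper name `check_dist_files` and the sample names are hypothetical.

```python
def check_dist_files(filenames: list[str], name: str, version: str) -> list[str]:
    """Return a list of problems found in a dist/ listing (empty if none)."""
    prefix = f"{name}-{version}"
    sdists = [f for f in filenames if f == f"{prefix}.tar.gz"]
    wheels = [
        f for f in filenames
        if f.startswith(f"{prefix}-") and f.endswith(".whl")
    ]

    problems: list[str] = []
    if not sdists:
        problems.append(f"missing sdist {prefix}.tar.gz")
    if not wheels:
        problems.append(f"missing wheel for {prefix}")
    # Anything else is probably left over from an earlier build.
    for f in filenames:
        if f not in sdists and f not in wheels:
            problems.append(f"unexpected file: {f}")
    return problems


listing = ["my_package-0.9.0.tar.gz", "my_package-1.0.0.tar.gz"]
for problem in check_dist_files(listing, "my_package", "1.0.0"):
    print(problem)
```

In practice the listing could come from `os.listdir("dist")`; clearing `dist/` before each build avoids the stale-file case entirely.
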

### Tools
- **`build`**: Creates sdist and wheel artifacts
- **`twine`**: Securely uploads packages to PyPI
- **`twine check`**: Validates package metadata before upload

### TestPyPI
Always test your publishing workflow against TestPyPI first:
```bash
# Upload to TestPyPI
twine upload --repository testpypi dist/*

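# Install from TestPyPI to verify. Dependencies may exist only on the real PyPI,
# so add it as an extra index.
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ my-package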
```

In [None]:
from dataclasses import dataclass, field
from enum import Enum, auto


class PublishStep(Enum):
    PREPARE = auto()
    BUILD = auto()
    CHECK = auto()
    TEST_UPLOAD = auto()
    PUBLISH = auto()


@dataclass
class PublishWorkflow:
    """Represents the steps to publish a Python package."""
    package_name: str
    version: str
    steps: list[tuple[PublishStep, str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.steps = [
            (
                PublishStep.PREPARE,
                "Update version and changelog",
                f'Set version = "{self.version}" in pyproject.toml',
            ),
            (
                PublishStep.BUILD,
                "Build distributions",
                "python -m build",
            ),
            (
                PublishStep.CHECK,
                "Validate package metadata",
                "twine check dist/*",
            ),
            (
                PublishStep.TEST_UPLOAD,
                "Upload to TestPyPI",
                "twine upload --repository testpypi dist/*",
            ),
            (
                PublishStep.PUBLISH,
                "Upload to PyPI",
                "twine upload dist/*",
            ),
        ]

    def display(self) -> None:
        print(f"Publishing workflow for {self.package_name} v{self.version}:")
        print("=" * 60)
        for i, (step, description, command) in enumerate(self.steps, 1):
            print(f"\n  Step {i}: {description}")
            print(f"  Command: {command}")


workflow = PublishWorkflow("my-awesome-package", "1.0.0")
workflow.display()

## Section 7: Private Package Registries and Index URLs

Not all packages belong on the public PyPI. Organizations often host **private
registries** for internal packages.

### Common Private Registry Options
- **Azure Artifacts**: Integrated with Azure DevOps
- **AWS CodeArtifact**: Managed artifact repository
- **Google Artifact Registry**: GCP-hosted
- **GitLab Package Registry**: Built into GitLab
- **JFrog Artifactory**: Enterprise artifact management
- **devpi**: Self-hosted, open source

### Configuration

```bash
# Install from a private index
pip install --index-url https://private.registry.com/simple/ my-private-pkg

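# Search a private index in addition to PyPI. pip does not prioritize one index
# over the other, so a same-named package on PyPI can win (dependency confusion).
pip install --extra-index-url https://private.registry.com/simple/ my-pkg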
```

### pip.conf / pip.ini
```ini
[global]
index-url = https://private.registry.com/simple/
extra-index-url = https://pypi.org/simple/
# trusted-host is only needed when the host lacks valid HTTPS
trusted-host = private.registry.com
```
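
Since `pip.conf` uses INI syntax, the standard `configparser` module can read it. The sketch below is an illustration only: it inlines the settings shown above, and the helper name `summarize_indexes` is hypothetical. It splits values on whitespace as a simplifying assumption, so that a key listing several URLs yields several entries.

```python
import configparser

# Same settings as the pip.conf above, inlined for a self-contained example.
PIP_CONF = """
[global]
index-url = https://private.registry.com/simple/
extra-index-url = https://pypi.org/simple/
trusted-host = private.registry.com
"""


def summarize_indexes(conf_text: str) -> dict[str, list[str]]:
    """Collect index-related settings from the [global] section."""
    # RawConfigParser avoids '%' interpolation, which could mangle URLs.
    parser = configparser.RawConfigParser()
    parser.read_string(conf_text)
    summary: dict[str, list[str]] = {}
    for key in ("index-url", "extra-index-url", "trusted-host"):
        raw = parser.get("global", key, fallback="")
        summary[key] = raw.split()  # assumption: values separated by whitespace
    return summary


summary = summarize_indexes(PIP_CONF)
for key, values in summary.items():
    print(f"{key:<16} {values}")
```
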

In [None]:
from dataclasses import dataclass
from pathlib import Path
import os


@dataclass
class PipConfig:
    """Represents pip configuration for package indices."""
    index_url: str = "https://pypi.org/simple/"
    extra_index_urls: list[str] | None = None
    trusted_hosts: list[str] | None = None

    def to_pip_conf(self) -> str:
        """Generate pip.conf content."""
        lines: list[str] = ["[global]", f"index-url = {self.index_url}"]
        if self.extra_index_urls:
            for url in self.extra_index_urls:
                lines.append(f"extra-index-url = {url}")
        if self.trusted_hosts:
            for host in self.trusted_hosts:
                lines.append(f"trusted-host = {host}")
        return "\n".join(lines)

    def install_command(self, package: str) -> str:
        """Generate the pip install command."""
        parts: list[str] = ["pip", "install"]
        if self.index_url != "https://pypi.org/simple/":
            parts.extend(["--index-url", self.index_url])
        if self.extra_index_urls:
            for url in self.extra_index_urls:
                parts.extend(["--extra-index-url", url])
        parts.append(package)
        return " ".join(parts)


# Corporate setup with private registry
corporate_config = PipConfig(
    index_url="https://artifacts.corp.example.com/pypi/simple/",
    extra_index_urls=["https://pypi.org/simple/"],
    trusted_hosts=["artifacts.corp.example.com"],
)

print("Generated pip.conf:")
print(corporate_config.to_pip_conf())
print()
print("Install command:")
print(f"  {corporate_config.install_command('internal-utils')}")
print()

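# Per-user pip config location (pip also reads site-wide and virtualenv-level files)
if os.name == "nt":
    # Windows: %APPDATA%\pip\pip.ini
    appdata = os.environ.get("APPDATA", str(Path.home() / "AppData" / "Roaming"))
    config_path = Path(appdata) / "pip" / "pip.ini"
else:
    # Linux default; macOS may use ~/Library/Application Support/pip/pip.conf instead
    config_path = Path.home() / ".config" / "pip" / "pip.conf"
print(f"Per-user pip config location: {config_path}")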

## Section 8: Putting It All Together -- A Complete pyproject.toml

Here is a complete, production-ready `pyproject.toml` that ties together everything
we have covered across all three notebooks in this chapter.

In [None]:
complete_pyproject: str = """\
[project]
name = "example-cli"
version = "1.0.0"
description = "A production-ready CLI tool example"
readme = "README.md"
license = "MIT"
requires-python = ">=3.10"
authors = [
    {name = "Jane Developer", email = "jane@example.com"},
]
classifiers = [
    "Development Status :: 4 - Beta",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Typing :: Typed",
]
dependencies = [
    "click>=8.0",
    "rich>=13.0",
    "pydantic>=2.0,<3.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.0",
    "pytest-cov>=4.0",
    "mypy>=1.0",
    "ruff>=0.1",
]
docs = [
    "sphinx>=7.0",
    "furo",
]

[project.scripts]
example-cli = "example_cli.main:app"

[project.urls]
Homepage = "https://github.com/jane/example-cli"
Documentation = "https://example-cli.readthedocs.io"
Repository = "https://github.com/jane/example-cli"
Issues = "https://github.com/jane/example-cli/issues"

[build-system]
requires = ["setuptools>=77.0"]  # string-form license (PEP 639) needs setuptools 77+
build-backend = "setuptools.build_meta"

[tool.setuptools.packages.find]
where = ["src"]

[tool.setuptools.package-data]
example_cli = ["py.typed", "templates/*.html"]

[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "--strict-markers -v"

[tool.mypy]
strict = true

[tool.ruff]
target-version = "py310"
line-length = 88
"""

print(complete_pyproject)

In [None]:
# Quick-reference checklist before publishing

checklist: list[tuple[str, str]] = [
    ("Version updated", 'version = "X.Y.Z" in pyproject.toml'),
    ("Changelog updated", "Document changes in CHANGELOG.md"),
    ("Tests passing", "pytest --strict-markers"),
    ("Type checks passing", "mypy src/"),
    ("Linter clean", "ruff check src/"),
    ("Build artifacts created", "python -m build"),
    ("Metadata valid", "twine check dist/*"),
    ("TestPyPI upload", "twine upload --repository testpypi dist/*"),
    ("Test install works", "pip install --index-url https://test.pypi.org/simple/ pkg"),
    ("PyPI upload", "twine upload dist/*"),
    ("Git tag created", 'git tag -a v1.0.0 -m "Release 1.0.0"'),
]

print("Pre-publish checklist:")
print("=" * 65)
for i, (task, command) in enumerate(checklist, 1):
    print(f"  [ ] {i:2d}. {task}")
    print(f"         {command}")

## Summary

### Distribution Formats
- **sdist** (`.tar.gz`): Source archive, requires build on install
- **wheel** (`.whl`): Pre-built, fast to install, platform-specific for C extensions
- Always publish **both** sdist and wheel to PyPI

### Versioning
- **Semantic versioning**: `MAJOR.MINOR.PATCH` with clear upgrade semantics
- **`~=` (compatible release)**: A concise way to accept compatible updates while excluding the next release series
- **Pre-release tags**: `a` (alpha), `b` (beta), `rc` (release candidate)

### Package Data
- Declare non-Python files in `[tool.setuptools.package-data]` or `MANIFEST.in`
- Access package data at runtime with `importlib.resources.files()`

### Publishing
- Build with `python -m build`, validate with `twine check`, upload with `twine upload`
- Always test on **TestPyPI** before publishing to the real PyPI
- Use **private registries** for internal packages (`--index-url` or `pip.conf`)