# Chapter 40: Package Utilities and Metadata

This notebook covers Python's tools for discovering, inspecting, and querying metadata about installed packages and modules. You will learn to enumerate available modules, distinguish packages from modules, query installed package versions, and identify built-in modules.

## Key Concepts
- **`pkgutil.iter_modules`**: Enumerate importable modules and packages
- **`importlib.metadata`**: Query installed package versions and distributions
- **`PackageNotFoundError`**: Raised when querying metadata for a missing package
- **Package vs. module**: Packages have a `__path__` attribute; plain modules do not
- **`sys.builtin_module_names`**: Tuple of modules compiled into the interpreter

## Section 1: Enumerating Modules with `pkgutil`

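`pkgutil.iter_modules` yields a `ModuleInfo` for each module and package found on the given search path. When called with no arguments, it lists the top-level modules on `sys.path`. Only finders that implement `iter_modules()` contribute entries, so the result is not guaranteed to cover everything that `import` can load.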

In [None]:
import pkgutil

# iter_modules() yields ModuleInfo objects for all importable modules
all_modules: list[pkgutil.ModuleInfo] = list(pkgutil.iter_modules())

print(f"Total importable modules/packages: {len(all_modules)}")
print(f"Type of each item: {type(all_modules[0]).__name__}")
print()

# Show the first 10
print("First 10 modules:")
for info in all_modules[:10]:
    kind: str = "package" if info.ispkg else "module"
    print(f"  {info.name:>30} ({kind})")

In [None]:
import pkgutil

# Each ModuleInfo has: module_finder, name, ispkg
first_info: pkgutil.ModuleInfo = list(pkgutil.iter_modules())[0]

print("ModuleInfo attributes:")
print(f"  name:   {first_info.name}")
print(f"  ispkg:  {first_info.ispkg}")
print(f"  has 'name' attr:  {hasattr(first_info, 'name')}")
print(f"  has 'ispkg' attr: {hasattr(first_info, 'ispkg')}")

In [None]:
import pkgutil

# Count how many are packages vs plain modules
all_modules: list[pkgutil.ModuleInfo] = list(pkgutil.iter_modules())

packages: list[str] = [m.name for m in all_modules if m.ispkg]
modules: list[str] = [m.name for m in all_modules if not m.ispkg]

print(f"Packages: {len(packages)}")
print(f"Modules:  {len(modules)}")
print()

# Classify some well-known names ("os" is a plain module, not a package)
well_known: list[str] = ["json", "os", "email", "http", "unittest", "xml"]
print("Well-known names, classified:")
for name in well_known:
    if name in packages:
        kind: str = "package"
    elif name in modules:
        kind = "module"
    else:
        kind = "not found on sys.path"
    print(f"  {name:>12}: {kind}")

## Section 2: Iterating Submodules of a Package

You can pass a package's `__path__` to `pkgutil.iter_modules` to list only the submodules within that specific package.

In [None]:
import pkgutil
import email

# List all submodules inside the 'email' package
email_submodules: list[pkgutil.ModuleInfo] = list(
    pkgutil.iter_modules(email.__path__)
)

print(f"Submodules in 'email' package: {len(email_submodules)}")
print()

for info in email_submodules:
    kind: str = "package" if info.ispkg else "module"
    print(f"  email.{info.name:>20} ({kind})")

In [None]:
import pkgutil
import json
import http

# Both json and http are packages (each has __path__), so both list submodules
for mod_name, mod in [("json", json), ("http", http)]:
    if hasattr(mod, "__path__"):
        submodules: list[pkgutil.ModuleInfo] = list(
            pkgutil.iter_modules(mod.__path__)
        )
        names: list[str] = [m.name for m in submodules]
        print(f"{mod_name} submodules: {names}")
    else:
        print(f"{mod_name} is not a package (no __path__)")
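
The names yielded above are relative to the package. As a minimal sketch, the optional `prefix` argument of `pkgutil.iter_modules` can prepend the package name so that each entry is a fully qualified, importable name. The helper name `qualified_submodules` is an assumption introduced for illustration; it also relies on `ModuleInfo` being a named tuple that unpacks into `(module_finder, name, ispkg)`.

```python
import email
import pkgutil
import types


def qualified_submodules(package: types.ModuleType) -> list[tuple[str, bool]]:
    """Return (fully qualified name, ispkg) pairs for a package's direct children."""
    # Hypothetical helper: prefix each yielded name with "<package>."
    prefix = package.__name__ + "."
    return [
        (name, ispkg)
        # ModuleInfo unpacks as (module_finder, name, ispkg)
        for _finder, name, ispkg in pkgutil.iter_modules(package.__path__, prefix)
    ]


for name, ispkg in qualified_submodules(email):
    print(f"{name} ({'package' if ispkg else 'module'})")
```

Only direct children are listed; the contents of a nested package such as `email.mime` are not expanded.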

## Section 3: Package vs. Module

The key distinction between a package and a plain module is the `__path__` attribute. A regular package is a directory containing an `__init__.py`; a namespace package is a directory without one. Both have `__path__` set. Plain modules are single `.py` files, compiled extension modules, or built-in modules, and do not have `__path__`.

In [None]:
import email
import math
import json
import os

# Check __path__ to determine package vs module
test_modules: list[tuple[str, object]] = [
    ("email", email),
    ("math", math),
    ("json", json),
    ("os", os),
]

for name, mod in test_modules:
    has_path: bool = hasattr(mod, "__path__")
    kind: str = "package" if has_path else "module"
    print(f"{name:>8}: {kind:>7}  (has __path__: {has_path})")
    if has_path:
        print(f"          __path__ = {mod.__path__}")

In [None]:
import importlib
import types

def is_package(module_name: str) -> bool:
    """Check if a module name refers to a package."""
    mod: types.ModuleType = importlib.import_module(module_name)
    return hasattr(mod, "__path__")

# Test several modules
names: list[str] = ["email", "math", "json", "http", "os", "sys", "xml"]

for name in names:
    result: bool = is_package(name)
    print(f"{name:>8}: is_package = {result}")
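
The `is_package` helper above imports the module in order to inspect it, which runs the module's top-level code. A possible alternative, sketched here under the assumption that only top-level names are checked, is to look at the import spec instead: `importlib.util.find_spec` returns a `ModuleSpec` whose `submodule_search_locations` is `None` for plain modules. The function name `is_package_without_import` is hypothetical.

```python
import importlib.util


def is_package_without_import(module_name: str) -> bool:
    """Report whether a top-level name refers to a package, using its spec."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        # The name cannot be found at all
        return False
    # Packages carry a list of locations to search for submodules
    return spec.submodule_search_locations is not None


for name in ["email", "math", "nonexistent_package_xyz_123"]:
    print(f"{name:>28}: {is_package_without_import(name)}")
```

For dotted names such as `email.mime`, `find_spec` imports the parent package first, so this sketch avoids imports only for top-level names.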

## Section 4: Built-in Module Names

`sys.builtin_module_names` is a tuple of strings listing all modules that are compiled directly into the Python interpreter. These modules have no `.py` file -- they are implemented in C. The exact contents vary by platform and by how the interpreter was built.

In [None]:
import sys

# sys.builtin_module_names is a tuple of strings
builtin_names: tuple[str, ...] = sys.builtin_module_names

print(f"Type: {type(builtin_names).__name__}")
print(f"Count: {len(builtin_names)} built-in modules")
print()

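# Check a few names. On CPython, sys, builtins, and _io are built in;
# math depends on the platform and build; json is pure Python, so never built in.
candidates: list[str] = ["sys", "builtins", "_io", "math", "json"]

for name in candidates: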
    is_builtin: bool = name in builtin_names
    print(f"  {name:>12} is built-in: {is_builtin}")

In [None]:
import sys

# Print all built-in module names sorted
print("All built-in module names:")
for i, name in enumerate(sorted(sys.builtin_module_names)):
    # Print in columns
    end_char: str = "\n" if (i + 1) % 4 == 0 else ""
    print(f"  {name:<25}", end=end_char)
print()  # Final newline

## Section 5: Querying Package Versions with `importlib.metadata`

The `importlib.metadata` module (added in Python 3.8) lets you query metadata about installed distribution packages -- version strings, entry points, and more.

In [None]:
import importlib.metadata

# Get the version of an installed package
# (raises PackageNotFoundError if pip is not installed in this environment)
pip_version: str = importlib.metadata.version("pip")

print(f"pip version: {pip_version}")
print(f"Type: {type(pip_version).__name__}")
print(f"Length > 0: {len(pip_version) > 0}")

In [None]:
import importlib.metadata

# Query versions for several known packages
package_names: list[str] = ["pip", "setuptools"]

for name in package_names:
    try:
        version: str = importlib.metadata.version(name)
        print(f"{name:>15}: version {version}")
    except importlib.metadata.PackageNotFoundError:
        print(f"{name:>15}: not installed")
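
Beyond single lookups, `importlib.metadata.distributions()` iterates over every installed distribution. The following is a minimal sketch of listing names and versions; the helper name `installed_distributions` is an assumption, and entries with an unreadable name or version are skipped rather than reported.

```python
import importlib.metadata


def installed_distributions() -> list[tuple[str, str]]:
    """Return sorted (name, version) pairs for installed distributions."""
    found: set[tuple[str, str]] = set()
    for dist in importlib.metadata.distributions():
        name = dist.metadata["Name"]
        version = dist.version
        # Skip entries whose metadata is missing a name or version
        if name and version:
            found.add((name, version))
    return sorted(found)


pairs = installed_distributions()
print(f"Installed distributions: {len(pairs)}")
for name, version in pairs[:10]:
    print(f"  {name:>25}: {version}")
```

The output depends entirely on the environment, so the count and names will differ between machines.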

## Section 6: Handling Missing Packages

When you query metadata for a package that is not installed, `importlib.metadata` raises `PackageNotFoundError`. Catching this exception is a reliable way to check whether a distribution package is available.

In [None]:
import importlib.metadata

# Trying to get the version of a nonexistent package
try:
    version: str = importlib.metadata.version("nonexistent_package_xyz_123")
    print(f"Version: {version}")
except importlib.metadata.PackageNotFoundError as e:
    print(f"PackageNotFoundError raised: {e}")
    print(f"Exception type: {type(e).__name__}")

In [None]:
import importlib.metadata

def get_package_version(name: str) -> str | None:
    """Safely get a package's version, returning None if not installed."""
    try:
        return importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        return None

# Test with real and fake packages
test_packages: list[str] = ["pip", "setuptools", "fake_package_abc", "another_missing"]

for name in test_packages:
    version: str | None = get_package_version(name)
    if version is not None:
        print(f"{name:>20}: {version}")
    else:
        print(f"{name:>20}: not installed")

## Section 7: Package Distributions Mapping

`importlib.metadata.packages_distributions()` (added in Python 3.10) returns a mapping from top-level importable package names to the distribution packages that provide them. This is useful for figuring out which PyPI package provides a given import, since the import name and the distribution name can differ.

In [None]:
import importlib.metadata

# Get the mapping of importable names -> distribution packages
distributions: dict[str, list[str]] = importlib.metadata.packages_distributions()

print(f"Type: {type(distributions).__name__}")
print(f"Total importable package names mapped: {len(distributions)}")
print()

# Show some entries
count: int = 0
for pkg_name, dist_names in sorted(distributions.items()):
    if count >= 10:
        break
    print(f"  {pkg_name:>25} -> {dist_names}")
    count += 1

In [None]:
import importlib.metadata

def find_distribution(import_name: str) -> list[str]:
    """Find which distribution package provides a given importable name."""
    distributions: dict[str, list[str]] = importlib.metadata.packages_distributions()
    return distributions.get(import_name, [])

# Look up some importable names
search_names: list[str] = ["pip", "setuptools", "nonexistent_xyz"]

for name in search_names:
    dists: list[str] = find_distribution(name)
    if dists:
        print(f"{name:>20} is provided by: {dists}")
    else:
        print(f"{name:>20}: no distribution found")

## Section 8: Practical Patterns

The tools from this notebook can be combined to solve real-world problems such as checking dependencies, auditing installed packages, and introspecting the module system.

In [None]:
import sys
import importlib
import importlib.metadata
import types

def module_report(name: str) -> dict[str, str | bool]:
    """Generate a comprehensive report about a module."""
    report: dict[str, str | bool] = {"name": name}

    # Is it a built-in?
    report["is_builtin"] = name in sys.builtin_module_names

    # Can we import it?
    try:
        mod: types.ModuleType = importlib.import_module(name)
        report["importable"] = True
        report["is_package"] = hasattr(mod, "__path__")
        report["has_file"] = hasattr(mod, "__file__") and mod.__file__ is not None
    except ImportError:
        report["importable"] = False
        return report

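    # Try to get version from metadata. version() expects a distribution name,
    # which may differ from the import name; here we assume they match.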
    try:
        report["version"] = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        report["version"] = "(stdlib or no metadata)"

    return report

# Generate reports for various modules
for name in ["sys", "json", "email", "math", "pip"]:
    info: dict[str, str | bool] = module_report(name)
    print(f"{name}:")
    for key, value in info.items():
        if key != "name":
            print(f"  {key:>14}: {value}")
    print()

In [None]:
import pkgutil
import sys

def count_importable_modules() -> dict[str, int]:
    """Count top-level packages and modules on sys.path, plus built-in modules.

    The sys.path totals include third-party packages, not only the stdlib.
    """
    all_mods: list[pkgutil.ModuleInfo] = list(pkgutil.iter_modules())

    stats: dict[str, int] = {
        "total_importable": len(all_mods),
        "packages": sum(1 for m in all_mods if m.ispkg),
        "modules": sum(1 for m in all_mods if not m.ispkg),
        "builtin_c_modules": len(sys.builtin_module_names),
    }
    return stats

stats: dict[str, int] = count_importable_modules()
print("Module system statistics:")
for key, value in stats.items():
    label: str = key.replace("_", " ").title()
    print(f"  {label:>25}: {value}")
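
Dependency checking can be sketched with the same `PackageNotFoundError` pattern from Section 6. The function name `missing_distributions` and the list of required names are assumptions for illustration. The sketch checks presence only and does not compare versions.

```python
import importlib.metadata


def missing_distributions(required: list[str]) -> list[str]:
    """Return the names from `required` that have no installed distribution."""
    missing: list[str] = []
    for name in required:
        try:
            importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Hypothetical requirement list; the second name is deliberately fake
required = ["pip", "nonexistent_package_xyz_123"]
missing = missing_distributions(required)

if missing:
    print(f"Missing distributions: {missing}")
else:
    print("All required distributions are installed")
```

The names passed in are distribution names, as on PyPI, and not necessarily the names used in `import` statements.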

## Summary

### Module Discovery
- **`pkgutil.iter_modules()`**: Yields `ModuleInfo(module_finder, name, ispkg)` for each top-level module and package on `sys.path`
- **`pkgutil.iter_modules(path)`**: Pass a package's `__path__` to list only its submodules
- **`ModuleInfo.name`**: The module's importable name
- **`ModuleInfo.ispkg`**: `True` if the entry is a package, `False` for a plain module

### Package Metadata
- **`importlib.metadata.version(name)`**: Returns the version string for an installed distribution
- **`importlib.metadata.packages_distributions()`**: Maps importable names to their distribution packages
- **`importlib.metadata.PackageNotFoundError`**: Raised when a distribution is not installed

### Package vs. Module
- **Packages** have a `__path__` attribute (regular packages are directories with `__init__.py`; namespace packages are directories without one)
- **Modules** do not have `__path__` (they are single files, extension modules, or built-in)
- Use `hasattr(mod, '__path__')` to distinguish between them

### Built-in Modules
- **`sys.builtin_module_names`**: A `tuple` of module names compiled into the interpreter (C modules)
- These modules have no `.py` file and are always available
- Common built-ins include `sys`, `builtins`, `_io`, and platform-specific modules