# Chapter 40: Custom Importers and the Import Machinery

This notebook dives into the internals of Python's import machinery. You will learn how `sys.meta_path` finders work, how to use `find_spec` and `module_from_spec` to manually walk through the import process, and how `sys.modules` and `sys.path` govern module resolution.

## Key Concepts
- **`sys.meta_path`**: The list of finder objects that Python consults when importing
- **`sys.path`**: The list of directories and zip files searched for modules
- **`importlib.util.find_spec`**: Locate a module's spec without importing it
- **`importlib.util.module_from_spec`**: Create a new module object from a spec
- **`sys.modules`**: The cache of all previously imported modules
- **Custom finders and loaders**: How to hook into the import system

## Section 1: The `sys.meta_path` Finders

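When Python encounters an `import` statement for a module that is not already in `sys.modules`, it iterates through `sys.meta_path`, a list of finder objects, calling each finder's `find_spec` method until one returns a module spec. The older `find_module` method was deprecated in Python 3.4, and Python 3.12 removed it from the standard finders and stopped falling back to it.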

In [None]:
import sys

# sys.meta_path contains the finders Python uses
print(f"Number of finders: {len(sys.meta_path)}")
print(f"Type: {type(sys.meta_path)}")
print()

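for i, finder in enumerate(sys.meta_path):
    # The default finders are registered as classes, not instances, so
    # type(finder) would just be `type`; use the class's own name instead
    finder_type: str = finder.__name__ if isinstance(finder, type) else type(finder).__name__
    has_find_spec: bool = hasattr(finder, "find_spec")
    # Legacy API, removed from the standard finders in Python 3.12
    has_find_module: bool = hasattr(finder, "find_module")
    print(f"  [{i}] {finder_type}")
    print(f"      has find_spec:   {has_find_spec}")
    print(f"      has find_module: {has_find_module}")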

In [None]:
import sys

# The three default finders and what they handle
# (Jupyter, setuptools, and other tools may insert extra finders)
# 1. BuiltinImporter: handles built-in C modules (sys, builtins, etc.)
# 2. FrozenImporter: handles frozen modules (used during interpreter startup)
# 3. PathFinder: searches sys.path entries for source files, bytecode,
#    extension modules, and packages

for finder in sys.meta_path:
    # Default finders are classes; third-party finders are usually instances
    cls: type = finder if isinstance(finder, type) else type(finder)
    print(f"{cls.__name__} (from {cls.__module__})")

    # Show the first line of the class docstring, if available
    doc: str = (cls.__doc__ or "No docstring").strip().split("\n")[0]
    print(f"  -> {doc}")
    print()

## Section 2: The `sys.path` Search Path

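`sys.path` is a list of strings specifying where the `PathFinder` looks for modules. Its first entry is typically the directory of the script being run (or an empty string, meaning the current working directory, in interactive sessions), followed by any `PYTHONPATH` entries, the standard library, and site-packages. Directories can also be added at runtime.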

In [None]:
import sys

# sys.path is a mutable list of directory strings
print(f"Type: {type(sys.path).__name__}")
print(f"Length: {len(sys.path)} entries")
print()

# Display each entry
for i, path in enumerate(sys.path):
    label: str = "(empty string = cwd)" if path == "" else ""
    print(f"  [{i:2d}] {path!r} {label}")

In [None]:
import sys
import importlib
import importlib.util
import tempfile
import os

# Demonstrate that modifying sys.path affects what can be imported.
# TemporaryDirectory removes the directory even after Python writes a
# __pycache__/ subdirectory into it during the import (os.rmdir would fail).
with tempfile.TemporaryDirectory() as tmp_dir:
    mod_file: str = os.path.join(tmp_dir, "path_demo_mod.py")
    with open(mod_file, "w") as f:
        f.write("GREETING: str = 'Hello from sys.path!'\n")

    # Before adding to sys.path: module is not findable
    spec_before = importlib.util.find_spec("path_demo_mod")
    print(f"Before adding to sys.path: find_spec = {spec_before}")

    # Add our directory to sys.path and drop any stale finder caches
    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()
    try:
        # Now it is findable
        spec_after = importlib.util.find_spec("path_demo_mod")
        print(f"After adding to sys.path:  find_spec = {spec_after}")

        # Import and use it
        mod = importlib.import_module("path_demo_mod")
        print(f"GREETING = {mod.GREETING}")
    finally:
        # Clean up so later cells are unaffected
        sys.path.remove(tmp_dir)
        sys.modules.pop("path_demo_mod", None)

## Section 3: Using `find_spec` to Locate Modules

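`importlib.util.find_spec` is the modern way to locate a module without importing it. It returns a `ModuleSpec` if the module is found, or `None` if it cannot be located. Two caveats apply: if the name is already in `sys.modules`, it returns that module's existing `__spec__`; and for a dotted name such as `os.path`, it imports the parent package (`os`) first, because the parent's `__path__` is needed to search for the submodule.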
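
Both caveats can be checked directly. The sketch below assumes nothing beyond the standard library; `find_spec_caveats` and the throwaway package `caveat_demo_pkg` are hypothetical names created only for this check, and the function cleans up `sys.path` and `sys.modules` afterwards.

```python
import importlib
import importlib.util
import json
import os
import sys
import tempfile


def find_spec_caveats() -> dict[str, bool]:
    """Check two documented behaviours of importlib.util.find_spec."""
    results: dict[str, bool] = {}

    # 1. A cached module: find_spec hands back the module's existing __spec__
    results["cached_spec_reused"] = importlib.util.find_spec("json") is json.__spec__

    # 2. A dotted name: the parent package is imported, the child is not
    with tempfile.TemporaryDirectory() as tmp_dir:
        pkg_dir = os.path.join(tmp_dir, "caveat_demo_pkg")
        os.mkdir(pkg_dir)
        for filename in ("__init__.py", "child.py"):
            with open(os.path.join(pkg_dir, filename), "w") as f:
                f.write("")
        sys.path.insert(0, tmp_dir)
        importlib.invalidate_caches()
        try:
            spec = importlib.util.find_spec("caveat_demo_pkg.child")
            results["child_found"] = spec is not None
            results["parent_imported"] = "caveat_demo_pkg" in sys.modules
            results["child_imported"] = "caveat_demo_pkg.child" in sys.modules
        finally:
            sys.path.remove(tmp_dir)
            sys.modules.pop("caveat_demo_pkg", None)
    return results


print(find_spec_caveats())
```

The practical consequence is that `find_spec("pkg.sub")` can run arbitrary code in `pkg/__init__.py`, so it is not a purely passive lookup for dotted names.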

In [None]:
import importlib.util

# find_spec for an existing module
spec = importlib.util.find_spec("json")

if spec is not None:
    print(f"Name:    {spec.name}")
    print(f"Origin:  {spec.origin}")
    print(f"Loader:  {type(spec.loader).__name__}")
    print(f"Parent:  {spec.parent!r}")
else:
    print("Module not found")

In [None]:
import importlib.util

# find_spec returns None for nonexistent modules
spec_missing = importlib.util.find_spec("nonexistent_module_xyz_123")
print(f"find_spec('nonexistent_module_xyz_123'): {spec_missing}")
print(f"Result is None: {spec_missing is None}")
print()

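# Compare built-in vs file-based modules.
# 'os.path' is already cached, so its spec is that of posixpath (ntpath on Windows).
for name in ["sys", "json", "os.path"]:
    spec = importlib.util.find_spec(name)
    if spec is not None:
        origin: str = spec.origin or "(no origin)"
        # Built-in and frozen modules use the importer class itself as the loader
        loader = spec.loader
        loader_name: str = loader.__name__ if isinstance(loader, type) else type(loader).__name__
        print(f"{name:>10}: loader={loader_name}, origin={origin}")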

## Section 4: Creating Modules from Specs

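`importlib.util.module_from_spec` creates a new module object from a `ModuleSpec`. It sets the import-related attributes (`__name__`, `__spec__`, `__loader__`, `__package__`, and `__file__` or `__path__` where applicable) but does not run the module's code, so the namespace is otherwise empty. Combined with `spec.loader.exec_module`, this lets you control each step of the import process manually.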
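
A common use of this two-step API is loading a module from an explicit file path. The sketch below pairs `module_from_spec` with `importlib.util.spec_from_file_location`; the helper name `load_from_path` and the file `from_path_demo.py` are hypothetical. Note that nothing is added to `sys.modules`, so the caller decides whether the module should be cached.

```python
import importlib.util
import os
import sys
import tempfile
import types


def load_from_path(name: str, path: str) -> tuple[types.ModuleType, bool]:
    """Build a module from a file path without touching sys.path or sys.modules.

    Returns the executed module and whether VALUE existed before exec_module ran.
    """
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build a spec for {path!r}")
    module = importlib.util.module_from_spec(spec)
    # Import-related attributes are already set, but the code has not run yet
    had_value_before = hasattr(module, "VALUE")
    spec.loader.exec_module(module)
    return module, had_value_before


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "from_path_demo.py")
    with open(path, "w") as f:
        f.write("VALUE = 21 * 2\n")
    module, had_value_before = load_from_path("from_path_demo", path)
    print(f"__name__={module.__name__!r}, __file__ set: {hasattr(module, '__file__')}")
    print(f"VALUE before exec_module: {had_value_before}, after: {module.VALUE}")
    print(f"registered in sys.modules: {'from_path_demo' in sys.modules}")
```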

In [None]:
import importlib.util
import types

# Step 1: Find the spec for the json module
spec = importlib.util.find_spec("json")
print(f"Step 1 - Found spec: {spec is not None}")

if spec is not None:
    print(f"  spec.name = {spec.name}")
    print(f"  spec.loader = {spec.loader}")
    print()

    # Step 2: Create a module object from the spec; its code has not run yet
    module: types.ModuleType = importlib.util.module_from_spec(spec)
    print(f"Step 2 - Created module: {module}")
    print(f"  type: {type(module).__name__}")
    print(f"  __name__: {module.__name__}")
    print(f"  isinstance(module, ModuleType): {isinstance(module, types.ModuleType)}")

In [None]:
import importlib.util
import sys
import types

# Full manual import process: find, create, execute
spec = importlib.util.find_spec("json")

if spec is not None and spec.loader is not None:
    # Create the module
    module: types.ModuleType = importlib.util.module_from_spec(spec)
    print("Before exec_module:")
    print(f"  has 'dumps': {hasattr(module, 'dumps')}")
    print()

    # Execute the module code to populate its namespace
    spec.loader.exec_module(module)
    print("After exec_module:")
    print(f"  has 'dumps': {hasattr(module, 'dumps')}")
    print(f"  has 'loads': {hasattr(module, 'loads')}")

    # The module works normally
    result: str = module.dumps({"key": "value"})
    print(f"\n  module.dumps({{'key': 'value'}}) = {result}")

    # It was never added to sys.modules, so it is a second, independent
    # json module object rather than the one `import json` returns
    print(f"  same object as sys.modules['json']? {module is sys.modules.get('json')}")

## Section 5: The `sys.modules` Cache

`sys.modules` is a dictionary that caches every module that has been imported. When you import a module, Python first checks this cache. If the module is already there, it returns the cached version instead of loading it again.
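
Because the cache is consulted before any finder, whatever object is stored in `sys.modules` under a name is what `import` returns. A documented corollary is that storing `None` under a name makes the next import of that name raise `ModuleNotFoundError`. The sketch below checks both using a hypothetical module name, `cache_demo_placeholder`, that has no file on disk.

```python
import importlib
import sys
import types


def cache_demo(name: str = "cache_demo_placeholder") -> tuple[bool, bool]:
    """Return (import served the cached object, None entry blocked the import)."""
    placeholder = types.ModuleType(name)
    placeholder.ANSWER = 42
    sys.modules[name] = placeholder
    try:
        # No finder is consulted: the cached object is returned as-is
        served_from_cache = importlib.import_module(name) is placeholder

        # A None entry tells the import system this module must not be imported
        sys.modules[name] = None
        try:
            importlib.import_module(name)
            blocked = False
        except ModuleNotFoundError:
            blocked = True
    finally:
        sys.modules.pop(name, None)  # leave the cache as we found it
    return served_from_cache, blocked


print(cache_demo())  # (True, True)
```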

In [None]:
import sys
import json

# sys.modules is a dict mapping names to module objects
print(f"Type: {type(sys.modules).__name__}")
print(f"Total cached modules: {len(sys.modules)}")
print()

# Check that json is in the cache
print(f"'json' in sys.modules: {'json' in sys.modules}")
print(f"sys.modules['json'] is json: {sys.modules['json'] is json}")
print()

# Show some of the cached module names
cached_names: list[str] = sorted(sys.modules.keys())
print("First 15 cached modules:")
for name in cached_names[:15]:
    print(f"  {name}")

In [None]:
import sys
import importlib
import types

# Demonstrate that importing returns the cached object
import json
json_id_1: int = id(json)

# Importing again returns the exact same object
json_again: types.ModuleType = importlib.import_module("json")
json_id_2: int = id(json_again)

print(f"First import id:  {json_id_1}")
print(f"Second import id: {json_id_2}")
print(f"Same object: {json_id_1 == json_id_2}")
print()

# You can remove a module from the cache to force re-import
# (This is generally not recommended in production code)
print(f"'json' in sys.modules before del: {'json' in sys.modules}")
cached_json: types.ModuleType = sys.modules["json"]
del sys.modules["json"]
print(f"'json' in sys.modules after del:  {'json' in sys.modules}")

# Re-import executes json/__init__.py again and creates a new module object
import json
print(f"\nAfter re-import, same object? {json is cached_json}")
print(f"'json' back in sys.modules: {'json' in sys.modules}")

# Restore the original object so that modules which imported json earlier
# and code that imports it later share the same module again
sys.modules["json"] = cached_json
json = cached_json

## Section 6: Writing a Custom Finder

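You can add your own finder to `sys.meta_path` to intercept imports. A finder must implement `find_spec(fullname, path, target=None)`, which returns a `ModuleSpec` when the finder can handle the module, or `None` to let the next finder try. Here `path` is `None` for top-level modules and the parent package's `__path__` for submodules.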

In [None]:
import sys
import importlib
import importlib.abc
import importlib.machinery
import types
from typing import Sequence


class LoggingFinder(importlib.abc.MetaPathFinder):
    """A finder that logs import attempts without handling them."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def find_spec(
        self,
        fullname: str,
        path: Sequence[str] | None,
        target: types.ModuleType | None = None,
    ) -> importlib.machinery.ModuleSpec | None:
        """Log the import attempt and return None to pass to next finder."""
        self.log.append(fullname)
        # Return None so the normal import machinery handles it
        return None


# Install our logging finder
logger: LoggingFinder = LoggingFinder()
sys.meta_path.insert(0, logger)

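try:
    # Only modules not already in sys.modules reach the finders, so anything
    # imported earlier in this session (and its dependencies) will not be logged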
    import csv
    import html
    import html.parser

    print("Import attempts logged:")
    for name in logger.log:
        print(f"  {name}")
finally:
    # Always clean up: remove our finder
    sys.meta_path.remove(logger)

In [None]:
import sys
import importlib
import importlib.abc
import importlib.machinery
import importlib.util
import types
from typing import Sequence


class VirtualModuleFinder(importlib.abc.MetaPathFinder):
    """A finder that creates virtual modules on the fly."""

    PREFIX: str = "virtual_"

    def find_spec(
        self,
        fullname: str,
        path: Sequence[str] | None,
        target: types.ModuleType | None = None,
    ) -> importlib.machinery.ModuleSpec | None:
        if fullname.startswith(self.PREFIX):
            return importlib.machinery.ModuleSpec(
                fullname,
                loader=VirtualModuleLoader(),
            )
        return None


class VirtualModuleLoader(importlib.abc.Loader):
    """A loader that populates virtual modules."""

    def exec_module(self, module: types.ModuleType) -> None:
        """Set up the virtual module's namespace."""
        module.MESSAGE = f"I am a virtual module named '{module.__name__}'"
        module.created_by = "VirtualModuleLoader"


# Install the virtual module finder
finder: VirtualModuleFinder = VirtualModuleFinder()
sys.meta_path.insert(0, finder)

try:
    # Import virtual modules that do not exist as files
    virtual_hello: types.ModuleType = importlib.import_module("virtual_hello")
    virtual_world: types.ModuleType = importlib.import_module("virtual_world")

    print(f"virtual_hello.MESSAGE: {virtual_hello.MESSAGE}")
    print(f"virtual_world.MESSAGE: {virtual_world.MESSAGE}")
    print(f"created_by: {virtual_hello.created_by}")
finally:
    sys.meta_path.remove(finder)
    # Clean up sys.modules
    for key in list(sys.modules.keys()):
        if key.startswith("virtual_"):
            del sys.modules[key]

## Section 7: The Complete Import Protocol

Putting it all together: here is a simplified version of the sequence Python follows when you write `import something`. It omits details such as import locks and binding submodules onto their parent packages.

In [None]:
import sys
import importlib
import importlib.util
import types

def manual_import(name: str) -> types.ModuleType:
    """Walk through the import protocol step by step."""
    # Step 1: Check sys.modules cache
    if name in sys.modules:
        print(f"  Step 1: Found '{name}' in sys.modules cache")
        return sys.modules[name]

    print(f"  Step 1: '{name}' not in cache")

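    # Step 2: Find the module spec using finders in sys.meta_path
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named '{name}'")
    loader = spec.loader
    # Built-in and frozen modules use the importer class itself as the loader
    loader_name: str = loader.__name__ if isinstance(loader, type) else type(loader).__name__
    print(f"  Step 2: Found spec via {loader_name}")

    # Step 3: Create the module object
    module: types.ModuleType = importlib.util.module_from_spec(spec)
    print("  Step 3: Created module object")

    # Step 4: Add to sys.modules BEFORE executing, so a circular import
    # finds this partially initialized module instead of loading it again
    sys.modules[name] = module
    print("  Step 4: Added to sys.modules")

    # Step 5: Execute the module code; on failure, drop the half-initialized
    # module from the cache, as the real import system does
    try:
        if spec.loader is not None:
            spec.loader.exec_module(module)
    except BaseException:
        sys.modules.pop(name, None)
        raise
    print("  Step 5: Executed module code")

    # Return whatever is cached, since a module may replace its own entry
    return sys.modules[name]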


# Remove 'textwrap' from cache to demonstrate full import
if "textwrap" in sys.modules:
    del sys.modules["textwrap"]

print("Importing 'textwrap' manually:")
tw: types.ModuleType = manual_import("textwrap")
print(f"\nResult: {tw.shorten('Hello World, this is a test', width=20)}")

print("\nImporting 'textwrap' again (cached):")
tw2: types.ModuleType = manual_import("textwrap")
print(f"Same object: {tw is tw2}")
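
The `manual_import` function above handles top-level names, but for a dotted name it leaves the parent import to `find_spec` and never binds the submodule as an attribute of its parent package, which the real import system does. The following sketch adds both steps. It is an illustration under assumptions: `manual_import_dotted`, `dotted_demo`, and the throwaway package `dotted_demo_pkg` are hypothetical names, and the sketch still ignores import locks and namespace packages.

```python
import importlib
import importlib.util
import os
import sys
import tempfile
import types


def manual_import_dotted(name: str) -> types.ModuleType:
    """Sketch of the import protocol for dotted names (no locks, no namespace packages)."""
    if name in sys.modules:
        return sys.modules[name]

    # Import parent packages first, recursively
    parent_name, _, child_name = name.rpartition(".")
    parent = manual_import_dotted(parent_name) if parent_name else None

    # With the parent cached, find_spec searches parent.__path__ for the child
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)

    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        sys.modules.pop(name, None)  # never leave a half-initialized module cached
        raise
    module = sys.modules[name]

    # Bind the submodule on its parent so `parent.child` attribute access works
    if parent is not None:
        setattr(parent, child_name, module)
    return module


def dotted_demo() -> tuple[int, bool, bool]:
    """Import a throwaway 'dotted_demo_pkg.child' and inspect the result."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        pkg_dir = os.path.join(tmp_dir, "dotted_demo_pkg")
        os.mkdir(pkg_dir)
        with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
            f.write("")
        with open(os.path.join(pkg_dir, "child.py"), "w") as f:
            f.write("VALUE = 7\n")
        sys.path.insert(0, tmp_dir)
        importlib.invalidate_caches()
        try:
            child = manual_import_dotted("dotted_demo_pkg.child")
            bound = sys.modules["dotted_demo_pkg"].child is child
            cached = manual_import_dotted("dotted_demo_pkg.child") is child
            return child.VALUE, bound, cached
        finally:
            sys.path.remove(tmp_dir)
            for key in ("dotted_demo_pkg.child", "dotted_demo_pkg"):
                sys.modules.pop(key, None)


print(dotted_demo())  # (7, True, True)
```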

In [None]:
import sys
import importlib.util

# Show which loader handles each kind of module
test_modules: list[str] = ["sys", "_frozen_importlib", "json", "os.path"]

for name in test_modules:
    spec = importlib.util.find_spec(name)
    if spec is not None:
        loader = spec.loader
        # BuiltinImporter and FrozenImporter serve as loaders via the class itself
        loader_type: str = loader.__name__ if isinstance(loader, type) else type(loader).__name__
        origin: str = spec.origin or "(no origin)"
        print(f"{name:>25}: loader={loader_type:>20}, origin={origin}")
    else:
        print(f"{name:>25}: not found")

## Summary

### The Import Protocol
1. **Check `sys.modules`**: If the module is cached, return it immediately
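2. **Search `sys.meta_path`**: Each finder's `find_spec` is called until one returns a `ModuleSpec` (for a submodule, the parent package is imported first and its `__path__` is passed to the finders)
3. **Create module**: `module_from_spec(spec)` creates a module object with its import-related attributes set but its code not yet run
4. **Cache early**: The module is added to `sys.modules` before execution (prevents circular import loops)
5. **Execute**: `spec.loader.exec_module(module)` runs the module's code; if it raises, the module is removed from `sys.modules`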
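
The cache-early step is what makes circular imports survivable. In the sketch below, two throwaway modules with hypothetical names, `circ_demo_a` and `circ_demo_b`, import each other. When `circ_demo_b` runs `import circ_demo_a`, it receives the partially initialized module already in `sys.modules`, so names that `circ_demo_a` defines after its own import statement do not exist yet.

```python
import importlib
import os
import sys
import tempfile

MODULE_SOURCES = {
    # circ_demo_a imports circ_demo_b *before* defining A_DONE
    "circ_demo_a.py": "import circ_demo_b\nA_DONE = True\n",
    # circ_demo_b records whether circ_demo_a had finished when it looked
    "circ_demo_b.py": (
        "import circ_demo_a\n"
        "SAW_A_DONE = hasattr(circ_demo_a, 'A_DONE')\n"
    ),
}


def circular_demo() -> tuple[bool, bool]:
    """Return (circ_demo_a.A_DONE, circ_demo_b.SAW_A_DONE) after importing circ_demo_a."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        for filename, source in MODULE_SOURCES.items():
            with open(os.path.join(tmp_dir, filename), "w") as f:
                f.write(source)
        sys.path.insert(0, tmp_dir)
        importlib.invalidate_caches()
        try:
            a = importlib.import_module("circ_demo_a")
            b = sys.modules["circ_demo_b"]
            return a.A_DONE, b.SAW_A_DONE
        finally:
            sys.path.remove(tmp_dir)
            for key in ("circ_demo_a", "circ_demo_b"):
                sys.modules.pop(key, None)


print(circular_demo())  # (True, False)
```

Without the early `sys.modules` entry, `circ_demo_b`'s import would start a fresh load of `circ_demo_a`, which would import `circ_demo_b` again, and so on without end.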

### Key Objects
- **`sys.meta_path`**: List of finders; by default the classes `BuiltinImporter`, `FrozenImporter`, and `PathFinder`
- **`sys.path`**: List of locations the `PathFinder` searches for source files, bytecode, extension modules, and packages
- **`sys.modules`**: Dict cache of `{name: module}` for all imported modules
- **`ModuleSpec`**: Describes a module's name, origin, loader, and whether it is a package

### Key Functions
- **`importlib.util.find_spec(name)`**: Locate a module without importing it (returns `None` if not found)
- **`importlib.util.module_from_spec(spec)`**: Create a new `ModuleType` from a spec
- **`spec.loader.exec_module(module)`**: Execute module code to populate its namespace

### Custom Importers
- Implement `importlib.abc.MetaPathFinder` with a `find_spec` method
- Implement `importlib.abc.Loader` with an `exec_module` method
- Insert your finder into `sys.meta_path` and always clean up when done