# Chapter 38: Garbage Collection

This notebook covers Python's garbage collector (`gc` module), which supplements reference counting by detecting and cleaning up circular references. You will also learn about `weakref.finalize` for cleanup callbacks and `gc.freeze` for performance optimization.

## Key Concepts
- **`gc` module**: Controls the cyclic garbage collector
- **Circular references**: Objects that reference each other, preventing ref count from reaching zero
- **Generational collection**: Tracked objects are grouped by age into generations, and younger generations are collected more often than older ones
- **`weakref.finalize`**: Register a callback that runs when an object is collected
- **`gc.freeze` / `gc.unfreeze`**: Move objects to a permanent generation for performance

## Section 1: The `gc` Module Basics

Python's garbage collector is enabled by default and runs automatically. You can inspect its state and trigger collection manually.

In [None]:
import gc

# The garbage collector is enabled by default
enabled: bool = gc.isenabled()
print(f"GC enabled: {enabled}")

# You can disable and re-enable it
gc.disable()
print(f"After disable: {gc.isenabled()}")

gc.enable()
print(f"After enable:  {gc.isenabled()}")
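A common use of `gc.disable()` / `gc.enable()` is to pause automatic collection around an allocation-heavy block. Below is a minimal sketch of that pattern as a context manager. The name `gc_paused` is hypothetical and not part of the `gc` module. It remembers whether the collector was already on, so it does not accidentally switch on a collector that some other code had turned off.

```python
import gc
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def gc_paused() -> Iterator[None]:
    """Disable automatic collection inside the block, then restore the prior state."""
    was_enabled: bool = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Re-enable only if the collector was on when we entered,
        # so callers that had already disabled it are left untouched.
        if was_enabled:
            gc.enable()


with gc_paused():
    inside: bool = gc.isenabled()
    batch: list[list[int]] = [[i] for i in range(10_000)]  # allocation-heavy work
after: bool = gc.isenabled()

print(f"Inside block: {inside}")
print(f"After block:  {after}")
```

The `try`/`finally` ensures the collector's previous state is restored even if the block raises. Manual `gc.collect()` calls still work while automatic collection is disabled.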

In [None]:
import gc

# gc.collect() manually triggers a full collection (all generations)
# It returns the number of unreachable objects it found
collected: int = gc.collect()
print(f"Objects collected: {collected}")
print(f"Return type is int: {isinstance(collected, int)}")
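The value returned by `gc.collect()` is easier to read when you know how much garbage exists. In the sketch below, a helper (hypothetical name `make_self_cycles`) creates 100 lists that each contain themselves. Automatic collection is disabled so that it cannot free them before the explicit call. The reported count should therefore be at least 100.

```python
import gc


def make_self_cycles(n: int) -> None:
    """Create n lists that each contain themselves, then let them go out of scope."""
    for _ in range(n):
        lst: list[object] = []
        lst.append(lst)  # the list references itself, so its refcount never drops to zero


gc.collect()   # clear pre-existing garbage so the next count is (mostly) ours
gc.disable()   # keep automatic collection from freeing the cycles before we ask
try:
    make_self_cycles(100)
    found: int = gc.collect()
finally:
    gc.enable()

print(f"Unreachable objects found: {found}")  # at least 100
```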

## Section 2: Circular References

Reference counting cannot free objects that form a cycle: each object in the cycle holds a reference to the next, so no count ever reaches zero. The garbage collector detects such isolated cycles and frees them. We use `weakref.ref` to verify that the objects were actually collected.

In [None]:
import gc
import weakref


class Node:
    """A simple node that can reference another node."""

    def __init__(self, name: str) -> None:
        self.name: str = name
        self.ref: "Node | None" = None

    def __repr__(self) -> str:
        return f"Node({self.name!r})"


# Create a circular reference: a -> b -> a
a: Node = Node("A")
b: Node = Node("B")
a.ref = b
b.ref = a

# Create weak references to track whether objects are collected
weak_a: weakref.ref[Node] = weakref.ref(a)
weak_b: weakref.ref[Node] = weakref.ref(b)

print(f"Before del -- weak_a alive: {weak_a() is not None}")
print(f"Before del -- weak_b alive: {weak_b() is not None}")

# Delete the strong references
del a, b

# Force garbage collection to break the cycle
collected: int = gc.collect()
print(f"\nObjects collected: {collected}")
print(f"After gc -- weak_a alive: {weak_a() is not None}")
print(f"After gc -- weak_b alive: {weak_b() is not None}")
print(f"\nBoth nodes were freed: {weak_a() is None and weak_b() is None}")
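When debugging a suspected cycle, you can ask the collector to keep what it finds instead of freeing it. With the `gc.DEBUG_SAVEALL` flag set, every unreachable object goes into `gc.garbage`, where you can inspect it. The sketch below reuses a `Node`-style class. The helper name `make_cycle` is hypothetical. After inspection, it resets the debug flags and clears `gc.garbage` so the cycle can be freed normally.

```python
import gc


class Node:
    """A node that can reference another node (same shape as above)."""

    def __init__(self, name: str) -> None:
        self.name: str = name
        self.ref: "Node | None" = None


def make_cycle() -> None:
    x, y = Node("X"), Node("Y")
    x.ref, y.ref = y, x  # the cycle becomes unreachable when this function returns


gc.collect()                    # free any earlier garbage first
gc.set_debug(gc.DEBUG_SAVEALL)  # save unreachable objects in gc.garbage instead of freeing
try:
    make_cycle()
    gc.collect()
    found: list[str] = sorted(o.name for o in gc.garbage if isinstance(o, Node))
    print(f"Nodes found in cyclic garbage: {found}")
finally:
    gc.set_debug(0)     # restore normal behaviour
    gc.garbage.clear()  # drop the saved objects...
    gc.collect()        # ...so the collector can free the cycle for real
```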

## Section 3: The `__del__` Finalizer

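The `__del__` method is called when an object is about to be destroyed. Before Python 3.4, a reference cycle containing an object with `__del__` could not be collected and ended up in `gc.garbage`. PEP 442 removed that restriction, but `__del__` is still discouraged for cleanup. The order in which finalizers in a cycle run is undefined, and exceptions raised inside `__del__` are only printed as warnings. There is also no guarantee that it runs for objects still alive at interpreter exit. Use `weakref.finalize` instead (shown in the next section).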

In [None]:
import gc


class Resource:
    """A class with a __del__ finalizer."""

    def __init__(self, name: str) -> None:
        self.name: str = name
        print(f"  Created {self.name}")

    def __del__(self) -> None:
        print(f"  Finalized {self.name}")


print("Creating and deleting a Resource:")
r: Resource = Resource("my_resource")
del r

print("\nNote: in CPython, __del__ runs as soon as the ref count hits zero.")
print("Cycles with __del__ are collectable since Python 3.4 (PEP 442),")
print("but the order of finalizers inside a cycle is undefined.")
print("Prefer weakref.finalize for cleanup callbacks.")
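The sketch below shows how `__del__` interacts with cycles on a modern interpreter. It builds a two-object cycle in which both members define `__del__`. The helper `make_cycle` and the `partner` attribute are illustrative. On CPython 3.4 or later (PEP 442), a full `gc.collect()` should finalize and free both objects and leave nothing in `gc.garbage`. The two finalizers run in an unspecified order, so the output is sorted.

```python
import gc

finalized: list[str] = []


class Resource:
    def __init__(self, name: str) -> None:
        self.name: str = name
        self.partner: "Resource | None" = None

    def __del__(self) -> None:
        finalized.append(self.name)


def make_cycle() -> None:
    r1, r2 = Resource("r1"), Resource("r2")
    r1.partner, r2.partner = r2, r1  # cycle: r1 <-> r2


make_cycle()
gc.collect()
print(f"Finalized (sorted): {sorted(finalized)}")
print(f"Resources left in gc.garbage: {[o for o in gc.garbage if isinstance(o, Resource)]}")
```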

## Section 4: `weakref.finalize` -- Safe Cleanup Callbacks

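`weakref.finalize` registers a callback that runs when the target object is garbage collected. Unlike `__del__`, the callback never receives the object itself, so it cannot accidentally resurrect it. The finalizer runs at most once and can be triggered early by calling it. By default, it also runs at interpreter exit if the object is still alive. The callback and its arguments must not hold a reference to the target object; otherwise the object is never collected.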

In [None]:
import gc
import weakref


class Resource:
    """A resource tracked with weakref.finalize."""

    def __init__(self, name: str) -> None:
        self.name: str = name


cleaned_up: list[str] = []


def cleanup_callback(name: str) -> None:
    """Called when the Resource is collected."""
    cleaned_up.append(name)
    print(f"  Cleanup callback fired for: {name}")


# Create a resource and register a finalizer
obj: Resource = Resource("database_connection")
weakref.finalize(obj, cleanup_callback, "database_connection")

print("Before deletion:")
print(f"  cleaned_up: {cleaned_up}")

# Delete the object -- the callback fires
del obj
gc.collect()

print("\nAfter deletion:")
print(f"  cleaned_up: {cleaned_up}")
print(f"  Callback was called: {len(cleaned_up) == 1}")
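Because a finalizer runs at most once and can be called directly, it pairs well with an explicit `close()` method. The sketch below follows a pattern suggested in the `weakref` documentation. The `Connection` class, its `_release` helper, and the `log` list are hypothetical stand-ins for real cleanup. Calling `close()` runs the cleanup immediately. Later calls, and the object's eventual collection, do nothing.

```python
import weakref


class Connection:
    """Hypothetical resource wrapper; `_release` stands in for real cleanup."""

    def __init__(self, name: str, log: list[str]) -> None:
        self.name: str = name
        # Pass only the data cleanup needs: never `self` or a bound method of self,
        # or the finalizer would keep the object alive forever.
        self._finalizer = weakref.finalize(self, Connection._release, name, log)

    @staticmethod
    def _release(name: str, log: list[str]) -> None:
        log.append(f"released {name}")

    def close(self) -> None:
        self._finalizer()  # runs the callback now; later calls are no-ops

    @property
    def closed(self) -> bool:
        return not self._finalizer.alive


log: list[str] = []
conn = Connection("db", log)
conn.close()
print(f"closed={conn.closed}, log={log}")
conn.close()  # second call does nothing
del conn      # the finalizer is already dead, so nothing runs here
print(f"log after del: {log}")
```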

In [None]:
import gc
import weakref


class Resource:
    pass


# finalize with a lambda callback
events: list[bool] = []
obj: Resource = Resource()
weakref.finalize(obj, lambda: events.append(True))

print(f"Events before del: {events}")

del obj
gc.collect()

print(f"Events after del:  {events}")
print(f"Callback ran:      {len(events) == 1}")
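One common mistake is to pass a bound method of the object as the callback. The bound method holds a strong reference to its instance, and the finalizer registry keeps the bound method alive. As a result, the object can never be collected. The class `Leaky` below is hypothetical and exists only to show this pitfall. `finalize.detach()` unregisters the callback and returns `(obj, func, args, kwargs)`, which releases that hidden reference.

```python
import gc
import weakref


class Leaky:
    """Hypothetical class used only to illustrate the bound-method pitfall."""

    def close(self) -> None:
        print("closing")


obj = Leaky()
probe: weakref.ref[Leaky] = weakref.ref(obj)
# WRONG: obj.close is a bound method that keeps obj alive through the finalizer.
fin = weakref.finalize(obj, obj.close)

del obj
gc.collect()
still_alive: bool = probe() is not None
print(f"Object still alive after del + collect: {still_alive}")

# detach() removes the registration and hands back (obj, func, args, kwargs)
detached = fin.detach()
del detached  # drop the last strong references
gc.collect()
print(f"Finalizer alive: {fin.alive}, object freed: {probe() is None}")
```

Passing a plain function plus the data it needs avoids the leak entirely.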

## Section 5: Generational Collection and `gc.get_stats()`

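CPython's garbage collector uses a generational scheme with three generations (0, 1, 2). Only container objects that can take part in cycles (lists, dicts, class instances, and so on) are tracked, and new ones start in generation 0. Objects that survive a collection are promoted to the next generation, and older generations are collected less frequently. (This describes CPython up to 3.13; CPython 3.14 switched to an incremental collector with a young and an old generation.)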

In [None]:
import gc

# gc.get_stats() returns one dict per generation
stats: list[dict[str, int]] = gc.get_stats()
print(f"Number of generations: {len(stats)}")
print()

for i, gen_stats in enumerate(stats):
    print(f"Generation {i}:")
    print(f"  collections: {gen_stats['collections']}")
    print(f"  collected:   {gen_stats['collected']}")
    print(f"  uncollectable: {gen_stats['uncollectable']}")
    print()

# Verify expected keys
print(f"'collections' in stats[0]: {'collections' in stats[0]}")
print(f"'collected' in stats[0]:   {'collected' in stats[0]}")
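You can check which objects the collector tracks with `gc.is_tracked()`. Atomic objects such as ints and strings cannot form cycles, so the collector never tracks them. Lists and instances of user-defined classes can form cycles, so it does track them. `gc.get_count()` reports the per-generation counters that the automatic thresholds are compared against. The `Point` class is purely illustrative.

```python
import gc


class Point:
    def __init__(self, x: int, y: int) -> None:
        self.x, self.y = x, y


# Atomic objects cannot form cycles, so they are never tracked
print(f"int tracked:      {gc.is_tracked(42)}")
print(f"str tracked:      {gc.is_tracked('hello')}")
# Containers and instances of user-defined classes can, so they are tracked
print(f"list tracked:     {gc.is_tracked([])}")
print(f"instance tracked: {gc.is_tracked(Point(1, 2))}")

# Current collection counters, one per generation
print(f"gc.get_count():   {gc.get_count()}")
```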

In [None]:
import gc

# gc.get_threshold() shows the allocation thresholds for each generation
thresholds: tuple[int, int, int] = gc.get_threshold()
print(f"Collection thresholds: {thresholds}")
print(f"  Generation 0: collect when allocations minus deallocations exceed {thresholds[0]}")
print(f"  Generation 1: collect after {thresholds[1]} gen-0 collections")
print(f"  Generation 2: collect after {thresholds[2]} gen-1 collections")

# You can collect a specific generation
collected_gen0: int = gc.collect(generation=0)
print(f"\nGen-0 collection freed: {collected_gen0} objects")
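To see collections as they happen, you can register a function in `gc.callbacks`. The collector calls each registered function with a phase (`"start"` or `"stop"`) and an info dict containing `"generation"`, `"collected"`, and `"uncollectable"`. The observer below (hypothetical name `on_gc`) records the result of every finished collection and is always unregistered afterwards. The number of events depends on the interpreter, so treat the printed lines as illustrative.

```python
import gc
from typing import Any

# Records (generation, collected) for every finished collection
events: list[tuple[int, int]] = []


def on_gc(phase: str, info: dict[str, Any]) -> None:
    """Called by the collector at the start and stop of each collection."""
    if phase == "stop":
        events.append((info["generation"], info["collected"]))


gc.callbacks.append(on_gc)
try:
    gc.collect(0)  # young generation only
    gc.collect()   # full collection
finally:
    gc.callbacks.remove(on_gc)  # always unregister the observer

for generation, collected in events:
    print(f"Generation {generation} collection found {collected} unreachable objects")
```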

## Section 6: `gc.freeze` and `gc.unfreeze`

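`gc.freeze()` moves all currently tracked objects into a permanent generation that the collector never scans. This helps long-running applications (like web servers) whose startup objects will never become garbage, because skipping those objects shortens collection cycles. The Python documentation also recommends it for servers that `fork()` worker processes. Frozen objects are never scanned in the child, so the child's collector does not write to memory pages shared with the parent, and those pages are not copied.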

In [None]:
import gc

# Freeze all currently tracked objects
gc.freeze()
frozen_count: int = gc.get_freeze_count()
print(f"Frozen objects: {frozen_count}")
print(f"Frozen count >= 0: {frozen_count >= 0}")

# Frozen objects are excluded from future GC scans
# This reduces pause times in long-running processes

# gc.unfreeze() moves the frozen objects back into the oldest generation
gc.unfreeze()
after_unfreeze: int = gc.get_freeze_count()
print(f"\nAfter unfreeze: {after_unfreeze}")
print(f"All unfrozen: {after_unfreeze == 0}")
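Because frozen objects are never scanned, a frozen cycle that later becomes garbage is not reclaimed until you unfreeze it. The sketch below makes this visible with a weak reference to one member of a two-node cycle. The cycle survives a full collection while frozen and is freed once `gc.unfreeze()` returns it to the oldest generation. This is why freezing suits objects that are meant to live for the whole process.

```python
import gc
import weakref


class Node:
    def __init__(self) -> None:
        self.other: "Node | None" = None


a, b = Node(), Node()
a.other, b.other = b, a  # two-node cycle
probe: weakref.ref[Node] = weakref.ref(a)

gc.freeze()   # the cycle (and everything else tracked) goes to the permanent generation
del a, b
gc.collect()  # frozen objects are skipped, so the cycle survives
alive_while_frozen: bool = probe() is not None

gc.unfreeze() # objects return to the oldest generation
gc.collect()  # a full collection can now reclaim the unreachable cycle
alive_after_unfreeze: bool = probe() is not None

print(f"Alive while frozen:   {alive_while_frozen}")
print(f"Alive after unfreeze: {alive_after_unfreeze}")
```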

In [None]:
import gc

# Practical pattern: freeze after application startup
# This prevents the GC from repeatedly scanning long-lived objects

# Simulate startup: create some long-lived data
config: dict[str, str] = {"db_host": "localhost", "db_port": "5432"}
cache: dict[int, str] = {i: f"value_{i}" for i in range(100)}

# Freeze everything created so far
gc.collect()  # Clean up any garbage first
gc.freeze()
frozen: int = gc.get_freeze_count()
print(f"Frozen {frozen} objects after startup")

# New objects created during runtime are still tracked normally
runtime_data: list[int] = [1, 2, 3]

# Unfreeze when done (for notebook cleanup)
gc.unfreeze()
print(f"Unfrozen. Freeze count: {gc.get_freeze_count()}")

## Section 7: Weak References and the GC

Weak references (`weakref.ref`) allow you to reference an object without preventing it from being garbage collected. They are essential for caches and observer patterns where you do not want to keep objects alive artificially.

In [None]:
import gc
import weakref


class CacheItem:
    """An item that can be weakly referenced."""

    def __init__(self, value: str) -> None:
        self.value: str = value

    def __repr__(self) -> str:
        return f"CacheItem({self.value!r})"


# Create an object and a weak reference to it
item: CacheItem = CacheItem("important_data")
weak_item: weakref.ref[CacheItem] = weakref.ref(item)

# The weak reference can be dereferenced to get the object
print(f"Weak ref alive: {weak_item() is not None}")
print(f"Value: {weak_item()}")

# Delete the strong reference
del item
gc.collect()

# Now the weak reference returns None
print("\nAfter del:")
print(f"Weak ref alive: {weak_item() is not None}")
print(f"Value: {weak_item()}")
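For the cache use case, the standard library provides `weakref.WeakValueDictionary`. It is a mapping that holds its values weakly, so an entry disappears once nothing else refers to the value. The sketch below mirrors the `CacheItem` example. The key name is arbitrary, and the explicit `gc.collect()` only makes the demo independent of CPython's immediate reference-count cleanup.

```python
import gc
import weakref


class CacheItem:
    """Same shape as the CacheItem class above."""

    def __init__(self, value: str) -> None:
        self.value: str = value


# Values are held weakly: an entry vanishes once its value has no strong references
cache: "weakref.WeakValueDictionary[str, CacheItem]" = weakref.WeakValueDictionary()

item = CacheItem("important_data")
cache["key"] = item
in_cache_before: bool = "key" in cache

del item      # drop the only strong reference
gc.collect()
in_cache_after: bool = "key" in cache

print(f"Cached while referenced: {in_cache_before}")
print(f"Cached after del:        {in_cache_after}")
```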

## Summary

### `gc` Module Essentials
- **`gc.isenabled()`**: Check if the garbage collector is active (default: `True`)
- **`gc.collect()`**: Manually trigger a collection; returns the number of unreachable objects found
- **`gc.get_stats()`**: Returns a list of 3 dicts (one per generation) with `collections`, `collected`, and `uncollectable` keys
- **`gc.get_threshold()`**: Shows allocation thresholds that trigger automatic collection

### Circular References
- Reference counting cannot break cycles (A references B, B references A)
- The garbage collector detects and frees cycles automatically
- Use `weakref.ref` to verify that objects have been collected

### Cleanup Callbacks
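- **`__del__`**: Called on object destruction; cycles containing it are collectable since Python 3.4 (PEP 442), but finalizer order is undefined and exceptions are only printed as warnings
- **`weakref.finalize`**: Preferred alternative; runs at most once, can be called early, and never receives the object itself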

### Performance: `gc.freeze` / `gc.unfreeze`
- `gc.freeze()` moves tracked objects to a permanent generation (never scanned)
- `gc.get_freeze_count()` reports how many objects are frozen
- `gc.unfreeze()` moves frozen objects back into the oldest generation
- Useful after application startup (and just before `fork()` in pre-forking servers) to cut GC work and avoid copy-on-write of shared memory