# Chapter 20: Memory Management and Garbage Collection

Understanding how CPython manages memory is essential for writing efficient Python code.
This notebook explores object identity, memory measurement, reference counting,
garbage collection, and weak references.

## Topics Covered
- **Object identity**: `id()`, `is` vs `==`
- **Interning**: Small integer and string caching
- **Memory measurement**: `sys.getsizeof()`
- **Reference counting**: `sys.getrefcount()` and CPython's mechanism
- **Garbage collection**: The `gc` module and cycle detection
- **Weak references**: `weakref` module
- **Practical**: Tracking object lifecycles with `__del__`

## id() and Identity: `is` vs `==`

Every Python object has an identity that is unique for as long as the object exists (in CPython,
its memory address). Because an address can be reused once an object is freed, two objects with
non-overlapping lifetimes may report the same `id()`. The `is` operator checks whether two names
refer to the **same object** (identity), while `==` checks whether two objects have the
**same value** (equality).
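
The difference matters most for `None` checks. The sketch below assumes a hypothetical class, `AlwaysEqual`, whose `__eq__` answers `True` for everything; `==` is fooled by it, while `is` cannot be overridden.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other: object) -> bool:
        return True


obj = AlwaysEqual()

equal_to_none = (obj == None)  # noqa: E711 - deliberate, to show the pitfall
same_as_none = (obj is None)

print(f"obj == None: {equal_to_none}")  # True, because __eq__ says so
print(f"obj is None: {same_as_none}")   # False, obj is not the None singleton
```

This is why `is None` is the conventional spelling: it asks about the object itself rather than trusting a user-defined comparison.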

In [None]:
# Identity vs equality
a: list[int] = [1, 2, 3]
b: list[int] = [1, 2, 3]
c = a  # c is an alias for a

print(f"a == b: {a == b}")   # True - same value
print(f"a is b: {a is b}")   # False - different objects
print(f"a is c: {a is c}")   # True - same object

print(f"\nid(a): {id(a)}")
print(f"id(b): {id(b)}")
print(f"id(c): {id(c)}")

# Mutating through one alias affects the other
c.append(4)
print(f"\na after c.append(4): {a}")  # [1, 2, 3, 4]
print(f"b is unaffected:      {b}")   # [1, 2, 3]

## Small Integer Interning and String Interning

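CPython caches ("interns") small integers in the range **-5 to 256** at startup, so every name
bound to one of these values points to the same pre-allocated object. Similarly, string constants
that look like identifiers are usually interned when the code is compiled, and you can force
interning of any string with `sys.intern()`. Equal constants inside one compiled block may also
share an object, so the `is` results below can differ between a script, a notebook cell, and the REPL.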
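
Strings built at runtime are not compile-time constants, so they are the case where `sys.intern()` makes a visible difference. A minimal sketch, with illustrative variable names:

```python
import sys

# Build two equal strings at runtime so the compiler cannot merge them
# as constants. Whether they share an object is an implementation detail.
parts = ["hello", "world"]
r1 = " ".join(parts)
r2 = " ".join(parts)

print(f"r1 == r2: {r1 == r2}")  # True: same value
print(f"r1 is r2: {r1 is r2}")  # Not guaranteed either way

# sys.intern() returns the canonical copy, so both results are one object
i1 = sys.intern(r1)
i2 = sys.intern(r2)
print(f"i1 is i2: {i1 is i2}")  # True
```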

In [None]:
import sys

# Small integer interning: -5 to 256 are cached singletons
x: int = 256
y: int = 256
print(f"256 is 256: {x is y}")  # True - interned
print(f"id(x)={id(x)}, id(y)={id(y)}")

# Outside the cached range, identity is not guaranteed
a: int = 257
b: int = 257
print(f"\n257 is 257: {a is b}")  # True or False (equal constants in one cell may be shared)
print(f"257 == 257: {a == b}")    # Always True

# String interning for identifier-like strings
s1: str = "hello"
s2: str = "hello"
print(f"\n'hello' is 'hello': {s1 is s2}")  # Typically True (auto-interned)

# Strings with spaces are not guaranteed to be auto-interned
s3: str = "hello world"
s4: str = "hello world"
print(f"'hello world' is 'hello world': {s3 is s4}")  # True or False (implementation detail)

# Force interning with sys.intern()
s5: str = sys.intern("hello world")
s6: str = sys.intern("hello world")
print(f"\nAfter sys.intern():")
print(f"  s5 is s6: {s5 is s6}")  # True - forced interning

## sys.getsizeof(): Measuring Object Memory

`sys.getsizeof()` returns the memory consumption of an object in bytes. Note that it only
measures the **shallow** size of the object itself, not the objects it references. For containers,
it measures the container overhead plus its internal array of pointers, but not the elements.
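
To see how much the shallow figure leaves out, the size can be summed recursively. The helper below, `deep_getsizeof`, is a hypothetical sketch rather than a standard-library function, and it assumes that following only the built-in containers is good enough for a rough estimate.

```python
import sys
from typing import Optional


def deep_getsizeof(obj: object, seen: Optional[set] = None) -> int:
    """Approximate total size of obj plus the built-in containers it holds.

    Hypothetical helper: it follows only dict, list, tuple, set, and
    frozenset, and ignores instance attributes and other object types.
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:  # Count shared or cyclic objects only once
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_getsizeof(key, seen) + deep_getsizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_getsizeof(item, seen)
    return size


nested = [[1.5, 2.5], [3.5, 4.5], "some text"]
shallow = sys.getsizeof(nested)
deep = deep_getsizeof(nested)
print(f"shallow: {shallow} bytes")
print(f"deep:    {deep} bytes")
```

The exact numbers depend on the Python version and platform; the point is only that the deep total exceeds the shallow one.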

In [None]:
import sys

# Size of basic types
print("Basic type sizes (bytes):")
print(f"  int(0):       {sys.getsizeof(0)}")
print(f"  int(1):       {sys.getsizeof(1)}")
print(f"  int(2**30):   {sys.getsizeof(2**30)}")
print(f"  int(2**100):  {sys.getsizeof(2**100)}")
print(f"  float(3.14):  {sys.getsizeof(3.14)}")
print(f"  bool(True):   {sys.getsizeof(True)}")
print(f"  None:         {sys.getsizeof(None)}")

# Strings grow with length
print(f"\nString sizes:")
print(f"  '':         {sys.getsizeof('')}")
print(f"  'a':        {sys.getsizeof('a')}")
print(f"  'hello':    {sys.getsizeof('hello')}")
print(f"  'a' * 100:  {sys.getsizeof('a' * 100)}")

# Container sizes (shallow - does NOT include element sizes)
print(f"\nContainer sizes:")
print(f"  []:            {sys.getsizeof([])}")
print(f"  [1,2,3]:       {sys.getsizeof([1, 2, 3])}")
print(f"  list(range(100)): {sys.getsizeof(list(range(100)))}")
print(f"  ():            {sys.getsizeof(())}")
print(f"  (1,2,3):       {sys.getsizeof((1, 2, 3))}")
print(f"  {{}}:            {sys.getsizeof({})}")
print(f"  set():         {sys.getsizeof(set())}")

## sys.getrefcount(): Reference Counting

CPython uses **reference counting** as its primary memory management strategy. Every object
has a reference count that tracks how many names, containers, or other objects point to it.
When the count drops to zero, the object is deallocated immediately; its memory goes back to
CPython's allocator, which does not necessarily return it to the operating system right away.

`sys.getrefcount(obj)` returns the current reference count. Note that passing `obj` to the
function itself creates a temporary reference, so the reported count is generally one higher
than you might expect. Exact values can vary between Python versions.
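
Because absolute counts vary, comparing counts before and after a change is often more informative. A small sketch, with illustrative names, showing that storing an object in a container adds exactly one reference:

```python
import sys

payload = ["some", "data"]
baseline = sys.getrefcount(payload)

holder = {"key": payload}  # A dict value is one more reference
with_holder = sys.getrefcount(payload)

del holder  # Dropping the dict releases that reference
after_del = sys.getrefcount(payload)

print(f"added by holder:  {with_holder - baseline}")
print(f"back to baseline: {after_del == baseline}")
```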

In [None]:
import sys

# Basic reference counting
a = [1, 2, 3]
print(f"After 'a = [1,2,3]':     refcount = {sys.getrefcount(a)}")
# The count includes: a + the argument to getrefcount = 2

b = a  # Another reference to the same list
print(f"After 'b = a':           refcount = {sys.getrefcount(a)}")

c = a  # Yet another reference
print(f"After 'c = a':           refcount = {sys.getrefcount(a)}")

del b  # Remove one reference
print(f"After 'del b':           refcount = {sys.getrefcount(a)}")

del c  # Remove another
print(f"After 'del c':           refcount = {sys.getrefcount(a)}")

# Small ints and None are shared everywhere, so their counts are very high.
# On Python 3.12+ they are "immortal" (PEP 683) and report a fixed large value.
print(f"\nRefcount of int 1:  {sys.getrefcount(1)}")
print(f"Refcount of int 0:  {sys.getrefcount(0)}")
print(f"Refcount of None:   {sys.getrefcount(None)}")

## Garbage Collection: The gc Module

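Reference counting alone cannot reclaim **circular references** (object A references B, and B
references A): each object keeps the other's count above zero even after every outside
reference is gone. CPython's cyclic garbage collector uses a **generational** algorithm to
detect and break such reference cycles.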

The `gc` module provides control over the garbage collector:
- `gc.collect()` - Force a collection cycle
- `gc.get_referrers(obj)` - Find objects that reference `obj`
- `gc.get_referents(obj)` - Find objects that `obj` references
- `gc.get_threshold()` - View collection thresholds
- `gc.isenabled()` / `gc.disable()` / `gc.enable()` - Control the collector
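
One way to confirm that a cycle survives `del` and is reclaimed only by the collector is to watch it through a weak reference. The sketch below uses a hypothetical `Link` class and temporarily disables automatic collection so the result does not depend on timing; it describes CPython behavior.

```python
import gc
import weakref


class Link:
    """Hypothetical minimal object that can point at another Link."""

    def __init__(self) -> None:
        self.other = None


was_enabled = gc.isenabled()
gc.disable()  # Keep automatic collection from interfering
try:
    first, second = Link(), Link()
    first.other, second.other = second, first  # first <-> second cycle
    probe = weakref.ref(first)  # Observe without keeping the object alive

    del first, second
    alive_before = probe() is not None  # The cycle still holds both objects
    gc.collect()
    alive_after = probe() is not None  # The cycle detector reclaimed them
finally:
    if was_enabled:
        gc.enable()  # Restore the collector's previous state

print(f"alive before gc.collect(): {alive_before}")
print(f"alive after gc.collect():  {alive_after}")
```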

In [None]:
import gc

# Check GC status and thresholds
print(f"GC enabled: {gc.isenabled()}")
print(f"GC thresholds (gen0, gen1, gen2): {gc.get_threshold()}")
print(f"GC counts (gen0, gen1, gen2):     {gc.get_count()}")

# Create a circular reference
class Node:
    """A node that can create circular references."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.partner: Node | None = None

    def __repr__(self) -> str:
        return f"Node({self.name!r})"

# Create a cycle: a -> b -> a
a = Node("A")
b = Node("B")
a.partner = b
b.partner = a

print(f"\na.partner = {a.partner}")
print(f"b.partner = {b.partner}")
print(f"Cycle: a.partner.partner is a = {a.partner.partner is a}")

# Without GC, deleting a and b would leak memory
# because the cycle keeps refcounts above zero
del a, b

# Force garbage collection to clean up the cycle
collected = gc.collect()
print(f"\nForced GC collected {collected} objects")

In [None]:
import gc

# gc.get_referrers() and gc.get_referents()
target: list[int] = [10, 20, 30]
container: dict[str, list[int]] = {"data": target}
alias = target

# Find what refers TO our target list
referrers = gc.get_referrers(target)
print(f"Number of referrers to target: {len(referrers)}")
for ref in referrers:
    if isinstance(ref, dict) and "data" in ref:
        print(f"  Found container dict: {ref}")

# Find what target refers TO (its contents)
referents = gc.get_referents(target)
print(f"\nObjects that target references: {referents}")

# gc.get_referents on a dict shows keys and values
dict_referents = gc.get_referents(container)
print(f"Dict referents: {dict_referents}")

## weakref: References That Don't Prevent GC

A **weak reference** allows you to refer to an object without incrementing its reference count,
so it does not keep the object alive. Once the object has been destroyed, calling the weak
reference returns `None`.

Weak references are useful for caches, observer patterns, and avoiding circular reference leaks.
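
As an illustration of the observer case, a publisher can hold its listeners in a `weakref.WeakSet` so that subscribing never extends a listener's lifetime. The `Listener` and `Publisher` classes are hypothetical, and the immediate removal shown here relies on CPython's reference counting.

```python
import weakref


class Listener:
    """Hypothetical observer that records the events it receives."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.received: list[str] = []

    def notify(self, event: str) -> None:
        self.received.append(event)


class Publisher:
    """Hypothetical subject that holds its listeners only weakly."""

    def __init__(self) -> None:
        self._listeners: weakref.WeakSet = weakref.WeakSet()

    def subscribe(self, listener: Listener) -> None:
        self._listeners.add(listener)

    def publish(self, event: str) -> int:
        """Notify live listeners and return how many were reached."""
        live = list(self._listeners)  # Snapshot holds strong refs briefly
        for listener in live:
            listener.notify(event)
        return len(live)


publisher = Publisher()
first = Listener("first")
second = Listener("second")
publisher.subscribe(first)
publisher.subscribe(second)

reached_both = publisher.publish("started")

del second  # No strong references remain, so the WeakSet drops it
reached_one = publisher.publish("stopped")

print(f"reached before del: {reached_both}")
print(f"reached after del:  {reached_one}")
```

With a plain `set`, the publisher would keep `second` alive indefinitely unless it was explicitly unsubscribed.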

In [None]:
import weakref


class ExpensiveResource:
    """Simulates a resource-heavy object."""
    def __init__(self, name: str) -> None:
        self.name = name
        print(f"  Created: {self.name}")

    def __repr__(self) -> str:
        return f"ExpensiveResource({self.name!r})"

    def __del__(self) -> None:
        print(f"  Destroyed: {self.name}")


# Create a strong reference and a weak reference
print("Creating resource:")
obj = ExpensiveResource("big_data")
weak = weakref.ref(obj)

print(f"\nWeak ref alive: {weak()}")
print(f"weak() is obj:  {weak() is obj}")

# Delete the strong reference
print("\nDeleting strong reference:")
del obj

# The weak reference now returns None
print(f"Weak ref after del: {weak()}")

In [None]:
import weakref


class CachedItem:
    """An item that can be weakly cached."""
    def __init__(self, item_id: int, data: str) -> None:
        self.item_id = item_id
        self.data = data

    def __repr__(self) -> str:
        return f"CachedItem(id={self.item_id}, data={self.data!r})"


# WeakValueDictionary: values are weak references
# Items are automatically removed when no strong refs remain
cache: weakref.WeakValueDictionary[int, CachedItem] = weakref.WeakValueDictionary()

# Create items and cache them
item1 = CachedItem(1, "first")
item2 = CachedItem(2, "second")
item3 = CachedItem(3, "third")

cache[1] = item1
cache[2] = item2
cache[3] = item3

print(f"Cache keys: {list(cache.keys())}")
print(f"Cache[1]: {cache[1]}")

# Delete a strong reference - the entry disappears from cache
del item2
print(f"\nAfter del item2, cache keys: {list(cache.keys())}")

del item1
print(f"After del item1, cache keys: {list(cache.keys())}")

# item3 is still alive because we hold a strong reference
print(f"cache[3]: {cache[3]}")

## Practical: Tracking Object Lifecycles with `__del__`

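The `__del__` method (finalizer) is called when an object is about to be destroyed.
It should not be relied on for critical cleanup (use context managers for that), partly because
it is not guaranteed to run for objects that still exist when the interpreter exits. It is still
useful for understanding when objects are actually destroyed.

**Warning**: `__del__` needs care with circular references. Objects in a cycle are finalized
only when the cyclic collector runs, and their finalizers are called in no guaranteed order.
Before Python 3.4, cycles containing objects with `__del__` were not collected at all and
ended up in `gc.garbage`.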
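
For comparison, here is a minimal sketch of the context-manager alternative. `ManagedResource` is a hypothetical class; the point is that `__exit__` runs at a known moment, even when the block raises, instead of whenever the object happens to be destroyed.

```python
class ManagedResource:
    """Hypothetical resource released deterministically by a with block."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.is_open = False

    def __enter__(self) -> "ManagedResource":
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.is_open = False  # Runs even if the block raises
        return False  # Do not suppress exceptions


resource = ManagedResource("connection")
try:
    with resource:
        open_inside = resource.is_open
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

open_after = resource.is_open
print(f"open inside the block: {open_inside}")
print(f"open after the block:  {open_after}")
```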

In [None]:
import gc
import weakref


class TrackedObject:
    """An object that reports its own lifecycle events."""
    _instances: list[str] = []  # Class-level tracking

    def __init__(self, name: str) -> None:
        self.name = name
        TrackedObject._instances.append(name)
        print(f"  [CREATED]   {name} (total alive: {len(TrackedObject._instances)})")

    def __del__(self) -> None:
        if self.name in TrackedObject._instances:
            TrackedObject._instances.remove(self.name)
        print(f"  [DESTROYED] {self.name} (total alive: {len(TrackedObject._instances)})")

    def __repr__(self) -> str:
        return f"TrackedObject({self.name!r})"


# Watch the lifecycle
print("Creating objects:")
obj_a = TrackedObject("alpha")
obj_b = TrackedObject("beta")
obj_c = TrackedObject("gamma")

print(f"\nAlive: {TrackedObject._instances}")

print("\nDeleting beta:")
del obj_b

print(f"\nAlive: {TrackedObject._instances}")

print("\nDeleting remaining objects:")
del obj_a
del obj_c

print(f"\nAlive: {TrackedObject._instances}")

In [None]:
import sys
import weakref


class LifecycleDemo:
    """Demonstrates ref counting + weak refs + finalizers together."""
    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return f"LifecycleDemo({self.name!r})"

    def __del__(self) -> None:
        print(f"  Finalizer called for {self.name}")


# Weak reference with a callback
def weak_callback(ref: weakref.ref) -> None:
    """Called when the weakly-referenced object is collected."""
    print(f"  Weak reference callback triggered: object was collected")


print("Step 1: Create object with weak ref + callback")
obj = LifecycleDemo("demo")
weak = weakref.ref(obj, weak_callback)
print(f"  obj = {obj}")
print(f"  weak() = {weak()}")
print(f"  refcount = {sys.getrefcount(obj)}")

print("\nStep 2: Delete the strong reference")
del obj

print("\nStep 3: Check weak reference")
print(f"  weak() = {weak()}")

## Summary

### Key Takeaways

| Concept | Tool | Purpose |
|---------|------|---------|
| **Identity** | `id()`, `is` | Check if two names refer to the same object |
| **Equality** | `==` | Check if two objects have the same value |
| **Interning** | Small ints, `sys.intern()` | Reuse immutable objects to save memory |
| **Memory size** | `sys.getsizeof()` | Measure shallow memory of an object |
| **Ref counting** | `sys.getrefcount()` | See how many references point to an object |
| **GC control** | `gc.collect()`, `gc.get_referrers()` | Force collection, inspect object graph |
| **Weak refs** | `weakref.ref()`, `WeakValueDictionary` | Reference objects without preventing GC |
| **Finalizers** | `__del__` | Run code when an object is destroyed (timing not guaranteed) |

### Best Practices
- Use `==` for value comparison, `is` only for `None` checks and identity checks
- Never rely on interning behavior in application logic -- it is an implementation detail
- Use `sys.getsizeof()` for quick estimates, but remember it is shallow
- Prefer context managers over `__del__` for resource cleanup
- Use `weakref` for caches to avoid memory leaks
- Let the garbage collector do its job; only call `gc.collect()` when you have a specific reason