# Chapter 38: Memory Profiling

This notebook covers tools for measuring and optimizing memory usage in Python. You will learn how to inspect object sizes with `sys.getsizeof`, reduce per-instance memory with `__slots__`, and trace allocations with `tracemalloc`.

## Key Concepts
- **`sys.getsizeof()`**: Returns the shallow size of an object in bytes (objects it references are not counted)
- **`__slots__`**: Eliminates the per-instance `__dict__`, saving memory
- **`tracemalloc`**: Traces memory allocations to find where memory is used
- **`tracemalloc.take_snapshot()`**: Captures a snapshot of all current allocations
- **`tracemalloc.get_traced_memory()`**: Reports current and peak memory usage

## Section 1: `sys.getsizeof` -- Measuring Object Size

`sys.getsizeof()` returns the size of an object in bytes. This is the **shallow** size -- it does not include the size of objects that the target references.
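Because `sys.getsizeof()` is shallow, a list of strings reports only the list itself and its pointer array, not the strings. The sketch below uses a hypothetical helper, `deep_getsizeof`, that recurses into built-in containers and counts each object once by `id()`. It assumes only `list`, `tuple`, `set`, `frozenset`, and `dict` need traversing; custom objects and `__slots__` are not followed, and shared cached objects (such as small ints) are counted as if owned, so treat the result as an estimate.

```python
import sys
from typing import Any, Optional


def deep_getsizeof(obj: Any, seen: Optional[set[int]] = None) -> int:
    """Estimate the size of obj plus everything it references.

    Hypothetical helper: only recurses into built-in containers.
    """
    if seen is None:
        seen = set()
    # Count each object once, even if it is referenced many times (or cyclically)
    if id(obj) in seen:
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_getsizeof(key, seen) + deep_getsizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_getsizeof(item, seen)
    return size


words: list[str] = ["alpha", "beta", "gamma"]
shallow: int = sys.getsizeof(words)
deep: int = deep_getsizeof(words)
print(f"shallow: {shallow} bytes, deep: {deep} bytes")
```

The `seen` set both avoids double-counting shared objects and stops infinite recursion on self-referencing containers.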

In [None]:
import sys

# Basic types and their sizes
print("Sizes of basic types (in bytes):")
print(f"  int(0):           {sys.getsizeof(0)}")
print(f"  int(1):           {sys.getsizeof(1)}")
print(f"  int(2**30):       {sys.getsizeof(2**30)}")
print(f"  float(3.14):      {sys.getsizeof(3.14)}")
print(f"  bool(True):       {sys.getsizeof(True)}")
print(f"  None:             {sys.getsizeof(None)}")

# All sizes are positive
print(f"\ngetsizeof(0) > 0: {sys.getsizeof(0) > 0}")

In [None]:
import sys

# Container sizes grow with their contents
empty_list: list[int] = []
small_list: list[int] = [1, 2, 3, 4, 5]

print("List sizes:")
print(f"  []:              {sys.getsizeof(empty_list)}")
print(f"  [1,2,3,4,5]:     {sys.getsizeof(small_list)}")
print(f"  Empty < filled:  {sys.getsizeof(empty_list) < sys.getsizeof(small_list)}")

# String sizes grow with length
empty_str: str = ""
hello_str: str = "hello world"

print(f"\nString sizes:")
print(f"  '':              {sys.getsizeof(empty_str)}")
print(f"  'hello world':   {sys.getsizeof(hello_str)}")
print(f"  Empty < filled:  {sys.getsizeof(empty_str) < sys.getsizeof(hello_str)}")

# Tuple sizes
empty_tuple: tuple[()] = ()
small_tuple: tuple[int, int, int] = (1, 2, 3)

print(f"\nTuple sizes:")
print(f"  ():              {sys.getsizeof(empty_tuple)}")
print(f"  (1, 2, 3):       {sys.getsizeof(small_tuple)}")
print(f"  Empty < filled:  {sys.getsizeof(empty_tuple) < sys.getsizeof(small_tuple)}")

## Section 2: Comparing Container Type Sizes

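Different container types carry different fixed overhead. An empty tuple is smaller than an empty list because a tuple's size is fixed at creation, while a list needs extra bookkeeping (a separate, resizable pointer array and its capacity) so it can grow. Sets and dicts are both hash tables, so they typically cost more than sequences even when empty; the exact numbers depend on the CPython version.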
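Part of a list's extra cost is over-allocation: CPython reserves spare room in the pointer array so that repeated `append()` calls are amortized O(1). The sketch below appends items one at a time and records each point where `sys.getsizeof()` jumps. The exact growth sizes are a CPython implementation detail and vary by version.

```python
import sys

items: list[int] = []
last_size: int = sys.getsizeof(items)
growth_points: list[tuple[int, int]] = []  # (length after append, new size in bytes)

for i in range(32):
    items.append(i)
    size = sys.getsizeof(items)
    # The size only changes when the list reallocates its pointer array
    if size != last_size:
        growth_points.append((len(items), size))
        last_size = size

for length, size in growth_points:
    print(f"len={length:>2}: {size} bytes")

# A tuple built from the same items is allocated at its exact final size
print(f"\ntuple of 32: {sys.getsizeof(tuple(items))} bytes")
print(f"list of 32:  {sys.getsizeof(items)} bytes")
```

Far fewer reallocations than appends occur, which is the point of over-allocation; the price is unused capacity that a tuple never carries.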

In [None]:
import sys

# Compare base sizes of empty containers
containers: dict[str, object] = {
    "list()":       [],
    "tuple()":      (),
    "dict()":       {},
    "set()":        set(),
    "frozenset()": frozenset(),
    "bytearray()": bytearray(),
}

print("Empty container sizes (bytes):")
for name, obj in containers.items():
    size: int = sys.getsizeof(obj)
    print(f"  {name:15s} {size:>4}")

print(f"\nTuples are lighter than lists: {sys.getsizeof(()) < sys.getsizeof([])}")

## Section 3: `__slots__` -- Reducing Per-Instance Memory

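By default, each Python instance stores its attributes in a per-instance `__dict__` dictionary. Defining `__slots__` declares a fixed set of attribute names whose values are stored directly in the instance, so no `__dict__` is created; for classes with many instances this saves a significant amount of memory. Note that `sys.getsizeof()` on a regular instance does not include its `__dict__`, so a fair comparison has to add the two together.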
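The trade-off is flexibility: with no `__dict__`, assigning an attribute that is not declared in `__slots__` raises `AttributeError`. The class name `SlottedPoint2D` below is illustrative.

```python
class SlottedPoint2D:
    """Illustrative class: only "x" and "y" may be set on instances."""

    __slots__ = ("x", "y")

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


p = SlottedPoint2D(1.0, 2.0)
p.x = 10.0  # fine: "x" is a declared slot

raised = False
try:
    p.z = 3.0  # type: ignore[attr-defined]  # "z" has no slot and there is no __dict__
except AttributeError as exc:
    raised = True
    print(f"AttributeError: {exc}")

print(f"p.x = {p.x}, has __dict__: {hasattr(p, '__dict__')}")
```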

In [None]:
import sys


class Regular:
    """A regular class with a __dict__."""

    def __init__(self, x: int) -> None:
        self.x: int = x


class Slotted:
    """A class using __slots__ instead of __dict__."""

    __slots__ = ("x",)

    def __init__(self, x: int) -> None:
        self.x: int = x


regular: Regular = Regular(1)
slotted: Slotted = Slotted(1)

# getsizeof() does not count a regular instance's __dict__, so add it explicitly
regular_size: int = sys.getsizeof(regular) + sys.getsizeof(regular.__dict__)
slotted_size: int = sys.getsizeof(slotted)

print(f"Regular (instance + __dict__): {regular_size} bytes")
print(f"Slotted instance:              {slotted_size} bytes")
print(f"Slotted is smaller:            {slotted_size < regular_size}")
print(f"Savings:                       {regular_size - slotted_size} bytes per instance")

In [None]:
import sys


class RegularPoint:
    def __init__(self, x: float, y: float, z: float) -> None:
        self.x: float = x
        self.y: float = y
        self.z: float = z


class SlottedPoint:
    __slots__ = ("x", "y", "z")

    def __init__(self, x: float, y: float, z: float) -> None:
        self.x: float = x
        self.y: float = y
        self.z: float = z


# Regular instances have a __dict__
rp: RegularPoint = RegularPoint(1.0, 2.0, 3.0)
print(f"Regular has __dict__: {hasattr(rp, '__dict__')}")
print(f"Regular __dict__: {rp.__dict__}")
print(f"Regular __dict__ size: {sys.getsizeof(rp.__dict__)} bytes")

# Slotted instances do not
sp: SlottedPoint = SlottedPoint(1.0, 2.0, 3.0)
print(f"\nSlotted has __dict__: {hasattr(sp, '__dict__')}")
print(f"Slotted has __slots__: {hasattr(sp, '__slots__')}")

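# Memory comparison: a regular instance's footprint includes its __dict__
print(f"\nRegular size:  {sys.getsizeof(rp) + sys.getsizeof(rp.__dict__)} bytes (instance + __dict__)")
print(f"Slotted size:  {sys.getsizeof(sp)} bytes")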

In [None]:
import sys


class RegularItem:
    def __init__(self, val: int) -> None:
        self.val: int = val


class SlottedItem:
    __slots__ = ("val",)

    def __init__(self, val: int) -> None:
        self.val: int = val


# At scale, the savings add up significantly
n: int = 10_000

regular_items: list[RegularItem] = [RegularItem(i) for i in range(n)]
slotted_items: list[SlottedItem] = [SlottedItem(i) for i in range(n)]

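# Estimate total memory (per-instance size * count). A regular instance also
# owns a __dict__, which getsizeof() on the instance does not include.
regular_per: int = sys.getsizeof(regular_items[0]) + sys.getsizeof(regular_items[0].__dict__)
slotted_per: int = sys.getsizeof(slotted_items[0])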

regular_total: int = regular_per * n
slotted_total: int = slotted_per * n

print(f"Per-instance: Regular={regular_per}B, Slotted={slotted_per}B")
print(f"Total for {n:,} instances:")
print(f"  Regular: {regular_total:>10,} bytes ({regular_total / 1024:.1f} KB)")
print(f"  Slotted: {slotted_total:>10,} bytes ({slotted_total / 1024:.1f} KB)")
print(f"  Savings: {regular_total - slotted_total:>10,} bytes ({(regular_total - slotted_total) / 1024:.1f} KB)")

## Section 4: `tracemalloc` -- Tracing Memory Allocations

The `tracemalloc` module tracks memory allocations made by Python. It lets you take snapshots, compare them, and find the source of memory usage down to the file and line number.
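By default `tracemalloc.start()` keeps only one frame per allocation. The sketch below passes a frame count to `start()`, removes allocations made by the `tracemalloc` module itself with `Snapshot.filter_traces()` and `tracemalloc.Filter`, and groups the remaining statistics by `"traceback"` so the top entry shows its call chain. The function `build_table` is a hypothetical workload.

```python
import tracemalloc


def build_table(rows: int) -> list[dict[str, int]]:
    """Hypothetical workload that allocates many small dicts."""
    return [{"id": i, "value": i * i} for i in range(rows)]


tracemalloc.start(10)  # keep up to 10 frames per allocation traceback

table = build_table(2_000)
snapshot = tracemalloc.take_snapshot()  # must be taken while tracing is active
tracemalloc.stop()

# Exclude allocations attributed to the tracemalloc module's own source file
snapshot = snapshot.filter_traces([
    tracemalloc.Filter(False, tracemalloc.__file__),
])

top_stats = snapshot.statistics("traceback")
largest = top_stats[0]
print(f"{largest.count} blocks, {largest.size / 1024:.1f} KiB")
for line in largest.traceback.format():
    print(line)
```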

In [None]:
import tracemalloc

# Start tracing memory allocations
tracemalloc.start()

# Allocate some memory
data: list[int] = [i for i in range(1000)]
text: str = "hello " * 1000

# Take a snapshot of current allocations
snapshot: tracemalloc.Snapshot = tracemalloc.take_snapshot()

# Get statistics grouped by line number
stats: list[tracemalloc.Statistic] = snapshot.statistics("lineno")
print(f"Number of allocation sites: {len(stats)}")
print(f"Has entries: {len(stats) > 0}")

# Show the top 5 memory consumers
print("\nTop 5 allocation sites:")
for stat in stats[:5]:
    print(f"  {stat}")

# Stop tracing
tracemalloc.stop()

In [None]:
import tracemalloc

# tracemalloc can also group by filename
tracemalloc.start()

numbers: list[float] = [float(i) for i in range(5000)]

snapshot: tracemalloc.Snapshot = tracemalloc.take_snapshot()

# Group by filename instead of line number
stats_by_file: list[tracemalloc.Statistic] = snapshot.statistics("filename")
print("Top 3 allocations by filename:")
for stat in stats_by_file[:3]:
    print(f"  {stat}")

tracemalloc.stop()

## Section 5: `get_traced_memory` -- Current and Peak Usage

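`tracemalloc.get_traced_memory()` returns a `(current, peak)` tuple: the number of bytes currently held by traced allocations, and the highest value that number has reached since tracing started. This is useful for monitoring memory usage over time or finding the high-water mark of a function.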
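Because the peak is a running maximum, measuring several phases in one tracing session needs `tracemalloc.reset_peak()` (available since Python 3.9), which sets the peak back to the current value. The two phases below are illustrative.

```python
import tracemalloc

tracemalloc.start()

# Phase 1: a large temporary list that is discarded right after sum()
total: int = sum(list(range(200_000)))
_, phase1_peak = tracemalloc.get_traced_memory()

# Reset the high-water mark so phase 2 is measured on its own
tracemalloc.reset_peak()

# Phase 2: a much smaller allocation
small: list[int] = list(range(1_000))
_, phase2_peak = tracemalloc.get_traced_memory()

tracemalloc.stop()

print(f"Phase 1 peak: {phase1_peak:>10,} bytes")
print(f"Phase 2 peak: {phase2_peak:>10,} bytes")
```

Without the reset, phase 2 would report phase 1's peak, since the temporary list was the largest thing ever held during the session.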

In [None]:
import tracemalloc

tracemalloc.start()

# Check initial memory
current_before: int
peak_before: int
current_before, peak_before = tracemalloc.get_traced_memory()
print(f"Before allocation:")
print(f"  Current: {current_before:>10,} bytes")
print(f"  Peak:    {peak_before:>10,} bytes")

# Allocate a large data structure
big_data: list[int] = list(range(100_000))

current_after: int
peak_after: int
current_after, peak_after = tracemalloc.get_traced_memory()
print(f"\nAfter allocation:")
print(f"  Current: {current_after:>10,} bytes")
print(f"  Peak:    {peak_after:>10,} bytes")

# Verify invariants
print(f"\ncurrent >= 0: {current_after >= 0}")
print(f"peak >= 0:    {peak_after >= 0}")
print(f"peak >= current: {peak_after >= current_after}")

tracemalloc.stop()

In [None]:
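import tracemalloc
from collections.abc import Callable


def measure_peak_memory(func: Callable[[], object]) -> tuple[int, int]:
    """Return (current, peak) traced bytes observed while running func()."""
    tracemalloc.start()
    func()  # the return value is discarded, so "current" excludes it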
    current: int
    peak: int
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current, peak


def create_list() -> list[int]:
    """Create a list of integers."""
    return list(range(50_000))


def create_generator_sum() -> int:
    """Sum integers using a generator (lower peak memory)."""
    return sum(i for i in range(50_000))


list_current, list_peak = measure_peak_memory(create_list)
gen_current, gen_peak = measure_peak_memory(create_generator_sum)

print("List approach:")
print(f"  Current: {list_current:>10,} bytes")
print(f"  Peak:    {list_peak:>10,} bytes")

print("\nGenerator approach:")
print(f"  Current: {gen_current:>10,} bytes")
print(f"  Peak:    {gen_peak:>10,} bytes")

print(f"\nGenerator uses less peak memory: {gen_peak < list_peak}")

## Section 6: Comparing Snapshots

You can take two snapshots at different points and compare them to see what was allocated in between. This is invaluable for detecting memory leaks.

In [None]:
import tracemalloc

tracemalloc.start()

# Take a baseline snapshot
snapshot1: tracemalloc.Snapshot = tracemalloc.take_snapshot()

# Allocate some memory
new_data: list[str] = [f"item_{i}" for i in range(5000)]
more_data: dict[int, str] = {i: f"val_{i}" for i in range(2000)}

# Take a second snapshot
snapshot2: tracemalloc.Snapshot = tracemalloc.take_snapshot()

# Compare the two snapshots
diff_stats: list[tracemalloc.StatisticDiff] = snapshot2.compare_to(snapshot1, "lineno")

print("Top 5 memory increases between snapshots:")
for stat in diff_stats[:5]:
    print(f"  {stat}")

tracemalloc.stop()

## Section 7: Memory Optimization Strategies

Practical techniques for reducing memory usage in Python programs.
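Before applying any of these strategies, it helps to measure. A small context manager around `tracemalloc` makes before/after comparisons easy. The names `track_peak` and `MemoryReport` are hypothetical, and the sketch assumes nothing else is using `tracemalloc` at the same time, because it calls `start()` and `stop()` itself.

```python
import tracemalloc
from collections.abc import Iterator
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class MemoryReport:
    """Filled in when the with-block exits."""

    current: int = 0
    peak: int = 0


@contextmanager
def track_peak() -> Iterator[MemoryReport]:
    """Hypothetical helper: trace allocations made inside the with-block."""
    report = MemoryReport()
    tracemalloc.start()
    try:
        yield report
    finally:
        report.current, report.peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()


with track_peak() as as_list:
    squares_list = [i * i for i in range(100_000)]
    total_list = sum(squares_list)

with track_peak() as as_gen:
    total_gen = sum(i * i for i in range(100_000))

print(f"list peak:      {as_list.peak:>10,} bytes")
print(f"generator peak: {as_gen.peak:>10,} bytes")
```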

In [None]:
import sys

# Strategy 1: Use tuples instead of lists for immutable data
list_data: list[int] = [1, 2, 3, 4, 5]
tuple_data: tuple[int, ...] = (1, 2, 3, 4, 5)

print("Tuples vs lists:")
print(f"  list [1..5]:  {sys.getsizeof(list_data)} bytes")
print(f"  tuple (1..5): {sys.getsizeof(tuple_data)} bytes")
print(f"  Tuple smaller: {sys.getsizeof(tuple_data) < sys.getsizeof(list_data)}")

In [None]:
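import array
import sys
from collections.abc import Iterator

# Strategy 2: Use lazy iterables (generators, range) instead of materializing large sequences
materialized: list[int] = [x * 2 for x in range(10_000)]
lazy_gen: Iterator[int] = (x * 2 for x in range(10_000))
lazy_range: range = range(10_000)

print("Materialized list vs lazy iterables:")
print(f"  list:      {sys.getsizeof(materialized):>10,} bytes")
print(f"  generator: {sys.getsizeof(lazy_gen):>10,} bytes")
print(f"  range:     {sys.getsizeof(lazy_range):>10,} bytes")

# Strategy 3: Use the array module for homogeneous numeric data
int_list: list[int] = list(range(1000))
int_array: array.array = array.array("l", range(1000))  # "l" = signed C long

# getsizeof() on a list counts only its pointers, so add the int objects it
# references (an over-estimate, since CPython caches small ints)
list_total: int = sys.getsizeof(int_list) + sum(sys.getsizeof(i) for i in int_list)
array_total: int = sys.getsizeof(int_array)  # values are stored inline as raw C longs

print(f"\nlist of 1000 ints (with elements): {list_total:>10,} bytes")
print(f"array of 1000 ints:                {array_total:>10,} bytes")
print(f"array is smaller: {array_total < list_total}")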

In [None]:
import sys

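# Strategy 4: Use __slots__ for data-heavy classes (recap)
# Strategy 5: Drop references to large objects as soon as they are no longer needed


def process_data() -> int:
    """Process data and release intermediate results before the next step."""
    # Large intermediate data
    raw: list[int] = list(range(100_000))
    result: int = sum(raw)

    # `del` removes the name; CPython frees the list once no references remain.
    # It pays off here only because more work follows: locals are released on
    # return anyway, so a `del` right before `return` would gain nothing.
    del raw

    # Further memory-heavy work now runs without `raw` still held in memory
    squares: list[int] = [i * i for i in range(100_000)]
    return result + len(squares)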


total: int = process_data()
print(f"Result: {total:,}")
print("\nMemory optimization strategies:")
print("  1. Use tuples for immutable sequences")
print("  2. Use generators / range for iteration")
print("  3. Use array.array for homogeneous numeric data")
print("  4. Use __slots__ on classes with many instances")
print("  5. del large objects when they are no longer needed")

## Summary

### `sys.getsizeof()`
- Returns the **shallow** size of an object in bytes
- Does not account for referenced objects (e.g., list elements)
- Different types have different base sizes; containers grow with their contents

### `__slots__`
- Replaces the per-instance `__dict__` with a fixed-size structure
- Reduces memory per instance, significant when creating many objects
- Instances cannot have attributes not listed in `__slots__` (unless `"__dict__"` is itself listed in `__slots__`, or a base class lacks `__slots__`)

### `tracemalloc`
- **`tracemalloc.start()`** / **`tracemalloc.stop()`**: Enable/disable allocation tracing
- **`tracemalloc.take_snapshot()`**: Capture current allocations for analysis
- **`snapshot.statistics("lineno")`**: Group allocations by source line
- **`snapshot.compare_to(other, key)`**: Diff two snapshots to find leaks
- **`tracemalloc.get_traced_memory()`**: Returns `(current, peak)` bytes
- `peak >= current` always holds; useful for finding high-water marks

### Optimization Strategies
- Prefer tuples over lists for immutable data
- Use generators and `range` instead of materializing large sequences
- Use `array.array` for large homogeneous numeric collections
- Apply `__slots__` to classes instantiated many times
- Explicitly `del` large intermediate objects when the enclosing code still has memory-heavy work to do