# 🧠 Section 8: Memory Mapping, Shared Arrays, and Performance Profiling

As datasets grow into gigabytes or terabytes, you can't always load everything into memory at once. NumPy, together with Python's standard library, provides tools for handling **large arrays efficiently**, especially for:
- **Memory mapping** (loading large data from disk without fully reading it into memory)
- **Shared memory arrays** (allowing multiple processes to share data efficiently)
- **Performance profiling** (measuring and optimizing speed and memory usage)

In this section, you'll:
1. Work with `np.memmap` to stream huge datasets.
2. Use `multiprocessing` with shared memory arrays for parallel computation.
3. Apply profiling techniques (`%timeit`, `tracemalloc`, `time.perf_counter`) to optimize performance.
4. Explore real-world applications like **satellite image processing** and **sensor data pipelines**.

## 💾 1. Memory Mapping with `np.memmap`

Memory mapping allows you to access parts of a file as if it were a NumPy array, without fully loading it into memory. This is crucial for working with large binary datasets such as climate, genomics, or image sensor data.

In [ ]:
import numpy as np
import os

# Simulate a large dataset (e.g., satellite pixel data)
filename = 'large_satellite_data.dat'
shape = (5000, 5000)  # 25 million pixels

# Create the file using memmap and fill with simulated data
if not os.path.exists(filename):
    data = np.memmap(filename, dtype='float32', mode='w+', shape=shape)
    data[:] = np.random.random(shape)
    data.flush()  # Write pending changes to disk
    del data      # Release the mapping

# Now reopen it in read-only mode
mapped_data = np.memmap(filename, dtype='float32', mode='r', shape=shape)

# Access a small part efficiently (no full load)
sample = mapped_data[1000:1010, 1000:1010]
print("Sample block:\n", np.round(sample, 3))

👉 The demo file `large_satellite_data.dat` is only about 100 MB (5000 × 5000 `float32` values), but the same code works for files of many GB: `np.memmap` reads only the parts you access, on demand.

This is extremely useful for **out-of-core processing**, that is, working with data larger than RAM (e.g., satellite images, MRI scans, or time-series logs).
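
To make the on-demand behavior concrete, here is a minimal sketch of a write/read round trip. The temporary path and the small `(1000, 1000)` shape are assumptions chosen so the example runs quickly; they are not part of the satellite example above.

```python
import os
import tempfile

import numpy as np

# Hypothetical small file so the example runs quickly; real files would be far larger.
path = os.path.join(tempfile.mkdtemp(), "demo.dat")
shape = (1000, 1000)

# Write phase: mode="w+" creates the file and maps it for reading and writing.
writer = np.memmap(path, dtype="float32", mode="w+", shape=shape)
writer[:] = np.arange(shape[0] * shape[1], dtype="float32").reshape(shape)
writer.flush()  # explicitly push pending changes to disk
del writer

# Read phase: slicing returns another memmap backed by the file, not an in-memory copy.
reader = np.memmap(path, dtype="float32", mode="r", shape=shape)
view = reader[10:12, 0:3]
copy = np.array(view)  # materialize only this small block in RAM

file_size = os.path.getsize(path)
print(type(view).__name__, type(copy).__name__, file_size)
```

The slice stays file-backed until you convert it with `np.array`, so only the block you actually need ends up as a regular in-memory array.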

## ⚙️ 2. Real-World Example: Streaming Sensor Data

Imagine a factory producing high-frequency sensor readings (temperature, vibration, and pressure) every millisecond. Instead of loading all the data, you can map the file and process it in blocks.

In [ ]:
# Simulate memory-mapped sensor data file
sensor_file = 'sensor_data.dat'
n_rows = 10_000_000  # 10 million readings

# Create synthetic data file if not exists
if not os.path.exists(sensor_file):
    sensors = np.memmap(sensor_file, dtype='float32', mode='w+', shape=(n_rows, 3))
    sensors[:] = np.random.normal(loc=[25, 0.1, 100], scale=[2, 0.05, 10], size=(n_rows, 3))
    del sensors

# Map and process in chunks
sensor_data = np.memmap(sensor_file, dtype='float32', mode='r', shape=(n_rows, 3))

batch_size = 1_000_000
means = []
for start in range(0, n_rows, batch_size):
    block = sensor_data[start:start + batch_size]
    means.append(block.mean(axis=0))

print("Mean sensor values (temperature, vibration, pressure):")
print(np.round(np.mean(means, axis=0), 3))

✅ Only one batch at a time is loaded into memory, allowing efficient computation on **huge sensor datasets** that would otherwise exceed system RAM. Averaging the per-batch means is exact here only because every batch has the same size.
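
When the row count is not a multiple of the batch size, a mean of batch means is slightly biased toward the short final batch. One possible way around this is to accumulate sums and counts instead. The sketch below assumes a hypothetical helper `chunked_mean` and a small in-memory array standing in for the memory-mapped file.

```python
import numpy as np

def chunked_mean(arr, batch_size):
    """Column-wise mean computed block by block (hypothetical helper)."""
    total = np.zeros(arr.shape[1], dtype=np.float64)
    count = 0
    for start in range(0, arr.shape[0], batch_size):
        block = arr[start:start + batch_size]
        # Accumulate in float64 to limit rounding error on float32 input.
        total += block.sum(axis=0, dtype=np.float64)
        count += block.shape[0]
    return total / count

# 10,007 rows: deliberately not a multiple of the batch size.
rng = np.random.default_rng(0)
readings = rng.normal(loc=[25, 0.1, 100], scale=[2, 0.05, 10], size=(10_007, 3)).astype("float32")

result = chunked_mean(readings, batch_size=1_000)
print(np.round(result, 3))
```

The same function should work unchanged on an `np.memmap`, since it only relies on slicing and `sum`.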

## 🧩 3. Shared Memory Arrays for Multiprocessing

When you use Python's `multiprocessing`, each process has its own memory space, so passing a large array to every worker duplicates it and wastes RAM.

NumPy arrays can be **shared across processes** using the standard-library module `multiprocessing.shared_memory` (Python 3.8+), avoiding redundant copies and making parallel computation fast and memory-efficient.

In [ ]:
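from multiprocessing import shared_memory, Process

# Worker function operating on the shared array.
# Note: on platforms that spawn processes (Windows, macOS), define this
# function in an importable .py module when running from a notebook.
def worker(start, end, name, shape, dtype):
    shm = shared_memory.SharedMemory(name=name)  # Attach to the existing block
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    arr[start:end] = np.sqrt(arr[start:end])  # Example transformation
    del arr  # Release the buffer before closing
    shm.close()

if __name__ == "__main__":  # Guard needed where child processes re-import this module
    # Create shared memory NumPy array
    shape = (10_000_000,)
    data = np.random.random(shape)

    shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
    shared_arr = np.ndarray(shape, dtype=data.dtype, buffer=shm.buf)
    shared_arr[:] = data[:]

    # Run parallel workers, one contiguous chunk each
    n_workers = 4
    chunk = len(shared_arr) // n_workers
    processes = [
        Process(target=worker, args=(i * chunk, (i + 1) * chunk, shm.name, shape, data.dtype))
        for i in range(n_workers)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

    print(f"Shared array processed by {n_workers} workers in parallel.")
    del shared_arr  # Drop the view so the buffer can be closed
    shm.close()
    shm.unlink()  # Free the block; only the creating process should do this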

✅ Because each worker is a separate process, the chunks are transformed **in parallel** without duplicating the array, which makes this technique a good fit for **image preprocessing**, **financial Monte Carlo simulations**, or **scientific modeling**.
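
The zero-copy behavior can be checked without starting any processes. The sketch below attaches a second handle to the same block by name, the way a worker would, and writes through it. Running both handles in one process is an assumption made purely to keep the example short; the tiny 8-element array is likewise illustrative.

```python
import numpy as np
from multiprocessing import shared_memory

source = np.arange(8, dtype=np.float64)

shm = shared_memory.SharedMemory(create=True, size=source.nbytes)
try:
    owner_view = np.ndarray(source.shape, dtype=source.dtype, buffer=shm.buf)
    owner_view[:] = source

    # Attach a second handle by name, as a worker process would.
    attached = shared_memory.SharedMemory(name=shm.name)
    try:
        worker_view = np.ndarray(source.shape, dtype=source.dtype, buffer=attached.buf)
        worker_view[:4] = np.sqrt(worker_view[:4])  # write through the second handle
        seen_by_owner = owner_view.copy()           # snapshot of what the owner now sees
        del worker_view                             # release the buffer before closing
    finally:
        attached.close()
    del owner_view
finally:
    shm.close()
    shm.unlink()  # free the block once nobody needs it

print(seen_by_owner)
```

Deleting the NumPy views before calling `close()` matters: closing a block while an array still references its buffer raises a `BufferError`.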

## 🧮 4. Performance Profiling

NumPy is fast, but optimizing your pipeline often requires measuring performance precisely.

You can profile your operations using:
- `%timeit` / `%%timeit` (Jupyter magic commands)
- `time.perf_counter` (basic wall-clock timing)
- `tracemalloc` (memory tracking)
- `timeit` module (repeated timing outside Jupyter)
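
Outside Jupyter, the standard-library `timeit` module provides repeated timing similar to `%timeit`. The sketch below is one way to use it; the function name `zscore`, the array size, and the repeat counts are arbitrary choices for illustration.

```python
import timeit

import numpy as np

x = np.random.default_rng(0).random(100_000)

def zscore():
    return (x - x.mean()) / x.std()

# repeat() returns one total time per repetition; taking the minimum reduces noise
# from other activity on the machine.
n_calls = 20
runs = timeit.repeat(zscore, number=n_calls, repeat=5)
best_per_call = min(runs) / n_calls
print(f"z-score: {best_per_call * 1e6:.1f} µs per call (best of 5)")
```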

In [ ]:
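import tracemalloc, time

# Example: comparing two methods for normalization
x = np.random.random(10_000_000)

tracemalloc.start()

# Z-score: record the baseline, then measure the peak above it
base1, _ = tracemalloc.get_traced_memory()  # returns (current, peak)
tracemalloc.reset_peak()  # Python 3.9+
start = time.perf_counter()
x_norm1 = (x - np.mean(x)) / np.std(x)
time1 = time.perf_counter() - start
_, peak1 = tracemalloc.get_traced_memory()
mem1 = peak1 - base1

# Min-max: measured the same way, so x_norm1 is not counted again
base2, _ = tracemalloc.get_traced_memory()
tracemalloc.reset_peak()
start = time.perf_counter()
x_norm2 = (x - x.min()) / (x.max() - x.min())
time2 = time.perf_counter() - start
_, peak2 = tracemalloc.get_traced_memory()
mem2 = peak2 - base2
tracemalloc.stop()

print(f"Z-score normalization: {time1:.4f}s, peak {mem1/1e6:.2f} MB")
print(f"Min-max normalization: {time2:.4f}s, peak {mem2/1e6:.2f} MB")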

🔍 Profiling helps identify bottlenecks and memory peaks. You can then apply techniques like:
- Using `out=` parameters to avoid unnecessary copies
- Processing data in blocks
- Leveraging compiled functions (Numba, Cython)
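
As an illustration of the first technique, the z-score normalization can be written with `out=` so that only one result buffer is allocated instead of a temporary per operation. This is a minimal sketch; the array size is arbitrary.

```python
import numpy as np

x = np.random.default_rng(0).random(1_000_000)

# Straightforward version: (x - mean) creates a temporary array before the division.
expected = (x - x.mean()) / x.std()

# out= version: one preallocated buffer, reused by both ufunc calls.
mean, std = x.mean(), x.std()
result = np.empty_like(x)
np.subtract(x, mean, out=result)    # result = x - mean
np.divide(result, std, out=result)  # result = result / std, in place

print(np.allclose(result, expected))
```

If the original values are no longer needed, passing `out=x` would avoid even the single extra buffer, at the cost of overwriting the input.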

## 🌍 5. Real-World Example: Climate Data Aggregation Pipeline

Let's simulate a scenario where we process large temperature grids across multiple months.

We'll use `memmap` to stream the data and compute monthly means without loading everything into RAM.

In [ ]:
n_days, n_lat, n_lon = 365, 180, 360
climate_file = 'climate_temp_data.dat'

# Create large climate dataset if missing (180x360 grid for 365 days)
if not os.path.exists(climate_file):
    data = np.memmap(climate_file, dtype='float32', mode='w+', shape=(n_days, n_lat, n_lon))
    data[:] = np.random.normal(loc=15, scale=10, size=(n_days, n_lat, n_lon))
    del data

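# Map file and compute monthly means using calendar month lengths (non-leap year),
# so that all 365 days are covered
mapped = np.memmap(climate_file, dtype='float32', mode='r', shape=(n_days, n_lat, n_lon))
days_in_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
edges = np.concatenate(([0], np.cumsum(days_in_month)))  # Start index of each month
month_means = []
for start, end in zip(edges[:-1], edges[1:]):
    month_means.append(mapped[start:end].mean())

print("Monthly average temperatures (°C):")
print(np.round(month_means, 2))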

✅ This mimics **climate model post-processing**, a common task in Earth science where global temperature grids can be **hundreds of GBs**.
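
A single global number per month hides spatial structure. A possible variation keeps the grid and averages over the time axis only, producing one mean map per month. The sketch below assumes a tiny hypothetical `(365, 4, 8)` grid in a temporary file and non-leap-year calendar month lengths.

```python
import os
import tempfile

import numpy as np

n_days, n_lat, n_lon = 365, 4, 8  # tiny hypothetical grid so the example runs quickly
path = os.path.join(tempfile.mkdtemp(), "grid.dat")

rng = np.random.default_rng(0)
grid = np.memmap(path, dtype="float32", mode="w+", shape=(n_days, n_lat, n_lon))
grid[:] = rng.normal(loc=15, scale=10, size=grid.shape)
grid.flush()

# Calendar month lengths for a non-leap year; edges[i] is the first day index of month i.
days_in_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
edges = np.concatenate(([0], np.cumsum(days_in_month)))

# Average over time (axis 0) only, one month of days at a time.
monthly_maps = np.stack([
    np.asarray(grid[start:end].mean(axis=0))
    for start, end in zip(edges[:-1], edges[1:])
])

print(monthly_maps.shape)
```

Each iteration touches only one month of the file, so the working set stays at roughly one month of data plus the small `(12, n_lat, n_lon)` result.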

## ⚡ Summary

- `np.memmap` allows streaming large data directly from disk.
- `multiprocessing.shared_memory` lets you share large arrays efficiently between processes.
- Profiling (`time`, `tracemalloc`, `timeit`) helps detect bottlenecks.
- Real-world applications include **sensor analytics**, **climate modeling**, **image processing**, and **financial simulations**.

## 🧩 Challenge Exercise

1. Create a 10 GB file-backed array using `np.memmap`.
2. Process it in 100 MB chunks, computing a running average.
3. Measure memory use and runtime with `tracemalloc`.
4. Compare the results with a fully loaded in-memory version.

*Goal:* Understand how streaming computation saves memory and scales with dataset size.

# --- End of Section 8 ---

Next up → **Section 9: Advanced Linear Algebra, Eigenvalues, and Decompositions**

We'll dive into matrix factorization techniques (SVD, PCA, eigen-decomposition), which are essential for machine learning and scientific computing.