# 🧠 Section 2: Memory Layout, Strides, and Order Control

In this section, we explore **how NumPy stores and accesses data in memory**. Understanding array **strides** and **memory order** can help you write code that runs many times faster, especially on large datasets.

We'll cover:
- How NumPy arranges data in memory (C vs. Fortran order)
- What strides are and how they affect slicing
- How to reshape, transpose, and change memory order
- How to optimize cache-friendly operations

## ⚙️ 1. Memory Layout Basics

A newly created NumPy array stores its elements in a single **contiguous block of memory**. How the elements are arranged within that block depends on the **order**:

- **C-order (row-major)**: Last axis changes fastest (like C, Python).
- **Fortran-order (column-major)**: First axis changes fastest (like MATLAB, Fortran).

You can check this with the `.flags` attribute of any NumPy array.

In [ ]:
import numpy as np

a_c = np.arange(12).reshape(3, 4)  # Default: C order
a_f = np.asfortranarray(a_c)       # Explicit Fortran order

print("C-order array:\n", a_c)
print("\nFortran-order array:\n", a_f)

print("\nC-order flags:", a_c.flags)
print("\nFortran-order flags:", a_f.flags)

Notice that both arrays contain the same values, but their **underlying memory layout** differs: `a_c` stores each row contiguously, while `a_f` stores each column contiguously. This matters when interfacing with compiled libraries (e.g., BLAS, Cython, Fortran).
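To make the difference visible, one option is to read each array back in the order its elements sit in memory. The sketch below relies on `ravel(order='K')`, which for contiguous arrays like these returns the elements in memory order; the names `mem_c` and `mem_f` are only illustrative.

```python
import numpy as np

a_c = np.arange(12).reshape(3, 4)   # C order (row-major)
a_f = np.asfortranarray(a_c)        # same values, column-major layout

# order='K' reads the elements in the order they occur in memory
mem_c = a_c.ravel(order='K')
mem_f = a_f.ravel(order='K')

print("Memory order (C):", mem_c)   # rows laid out one after another
print("Memory order (F):", mem_f)   # columns laid out one after another

# Indexing is unaffected by layout: the arrays still compare equal
print("Equal element-wise:", np.array_equal(a_c, a_f))
```

Only the physical arrangement changes; `a_c[i, j]` and `a_f[i, j]` refer to the same logical element.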

## 🔍 2. Understanding Strides

Strides are the **number of bytes to skip in memory** to move along each dimension of an array. You can view them with the `.strides` attribute.

Example: In a C-ordered `(3, 4)` float64 array (8 bytes per element):
- Moving to the next column → +8 bytes
- Moving to the next row → +32 bytes (4 columns × 8 bytes each)
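The arithmetic above can be checked directly. This is a minimal sketch: `byte_offset` is a hypothetical helper written for illustration (not a NumPy function), and the array `x` is given an explicit float64 dtype so the numbers match the example.

```python
import numpy as np

x = np.arange(12, dtype=np.float64).reshape(3, 4)  # C order, 8-byte elements
print("Strides:", x.strides)  # bytes per step along (rows, columns)

def byte_offset(arr, index):
    """Offset in bytes of arr[index] from the start of arr's data."""
    return sum(i * s for i, s in zip(index, arr.strides))

i, j = 2, 1
offset = byte_offset(x, (i, j))
print("Byte offset of x[2, 1]:", offset)

# For a C-contiguous array, offset // itemsize is the position in the flat buffer
flat_position = offset // x.itemsize
print("Flat position:", flat_position, "value:", x.ravel()[flat_position])
```

Here `x[2, 1]` sits 2 × 32 + 1 × 8 = 72 bytes into the buffer, which is element 9 of the flat data.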

In [ ]:
print("C-order strides:", a_c.strides)
print("Fortran-order strides:", a_f.strides)

You can think of strides as the **jump size** along each axis. Smaller strides mean data points are closer in memory, which improves **cache efficiency**.

When you slice an array with basic slicing (integers and `start:stop:step`), NumPy doesn't copy the data; it just returns a view with a different **shape and strides**.

In [ ]:
# Create a slice (view)
b = a_c[:, ::2]  # Take every 2nd column
print(b)
print("\nShape:", b.shape)
print("Strides:", b.strides)

# Confirm it's a view
b[0, 0] = 999
print("\nOriginal array after modifying view:\n", a_c)

👉 Because slicing returns **views, not copies**, NumPy avoids unnecessary memory use, but be aware that modifying a view also modifies the original array!
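When it isn't obvious whether an operation produced a view or a copy, `np.shares_memory` offers one way to check. The sketch below contrasts basic slicing with integer-array ("fancy") indexing, which always copies; the variable names are illustrative.

```python
import numpy as np

base = np.arange(12).reshape(3, 4)

view = base[:, ::2]             # basic slicing: a view on base's buffer
fancy = base[:, [0, 2]]         # integer-array indexing: an independent copy
explicit = base[:, ::2].copy()  # an explicit copy of a view

print("slice shares memory:   ", np.shares_memory(base, view))
print("fancy shares memory:   ", np.shares_memory(base, fancy))
print("explicit shares memory:", np.shares_memory(base, explicit))

# Writing through the view changes base; the copies are unaffected
view[0, 0] = -1
print(base[0, 0], fancy[0, 0], explicit[0, 0])
```

Calling `.copy()` on a slice is the usual way to detach it when you need to modify it without touching the original.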

## 🧩 3. Reshaping and Transposing

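When you use `.reshape()` or `.transpose()`, NumPy tries to **reuse the existing memory buffer**. `.transpose()` always succeeds, because it only permutes the shape and strides. `.reshape()` succeeds only when the new shape can be described by strides over the existing buffer, which depends on the array's layout.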

If reshaping requires reordering data, NumPy **creates a copy**.

In [ ]:
a = np.arange(12).reshape(3, 4)

# Transpose (switch axes)
a_T = a.T
print("Original strides:", a.strides)
print("Transposed strides:", a_T.strides)

# Reshape with compatible memory
b = a.reshape(4, 3, order='C')
print("\nReshaped (C order) strides:", b.strides)

# Reshape with incompatible order → copy required
c = np.reshape(a, (4, 3), order='F')
print("\nReshaped (F order) strides:", c.strides)
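Strides alone don't say whether a reshape reused the buffer. One way to find out, sketched below, is to test each result against the source array with `np.shares_memory`; the variable names are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

same_order = a.reshape(4, 3)                 # C index order on a C-contiguous array
other_order = a.reshape((4, 3), order='F')   # F index order on a C-contiguous array
transposed = a.T                             # only shape and strides are permuted
flat_from_t = a.T.reshape(12)                # flattening the transposed view

for name, result in [("reshape (C)", same_order),
                     ("reshape (F)", other_order),
                     ("transpose", transposed),
                     ("transpose then flatten", flat_from_t)]:
    print(f"{name:24s} shares memory with a: {np.shares_memory(a, result)}")
```

The two cases that need elements in an order the buffer cannot express with strides come back as copies.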

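You can **force the memory order** using the `order` argument of functions that create or copy arrays (such as `np.array`, `np.copy`, `ndarray.copy`, and `ndarray.astype`):
- `'C'`: row-major (the default for `reshape` and `ndarray.copy`)
- `'F'`: column-major
- `'A'`: column-major if the input is Fortran-contiguous, row-major otherwise
- `'K'`: keep the input's memory order as closely as possible (respect strides)

Note that `reshape` accepts only `'C'`, `'F'`, and `'A'`, and there the argument sets the index order in which elements are read and placed rather than the layout of the result.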
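A minimal sketch of how these `order` values behave when copying with `ndarray.copy`. The arrays `f_src` and `permuted` are made up for illustration; `permuted` is a transposed 3D array that is neither C- nor Fortran-contiguous, which is where `'A'` and `'K'` differ.

```python
import numpy as np

f_src = np.ones((2, 3), order='F')

as_c = f_src.copy(order='C')   # force row-major
as_a = f_src.copy(order='A')   # input is Fortran-contiguous, so the copy is too
print("C copy is C-contiguous:", as_c.flags['C_CONTIGUOUS'])
print("A copy is F-contiguous:", as_a.flags['F_CONTIGUOUS'])

# A permuted 3D array: neither C- nor Fortran-contiguous
permuted = np.ones((2, 3, 4)).transpose(1, 0, 2)
keep = permuted.copy(order='K')    # preserves the axis ordering of the strides
adapt = permuted.copy(order='A')   # falls back to C order

print("source strides:", permuted.strides)
print("'K' strides:   ", keep.strides)
print("'A' strides:   ", adapt.strides)
```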

## 🧠 4. Why Strides and Order Matter for Performance

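Modern CPUs use **caches** that load memory one cache line (typically 64 bytes) at a time. When the elements an operation touches are contiguous (as in row-wise access on a C-order array), every loaded line is fully used and NumPy's inner loops can be **vectorized** efficiently.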

Let's compare iteration speeds for C-order vs Fortran-order arrays.

In [ ]:
import time

size = (2000, 2000)
arr_c = np.ones(size, order='C')
arr_f = np.ones(size, order='F')

def sum_rows(a):
    total = 0.0
    for row in a:
        total += row.sum()
    return total

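# perf_counter() is a high-resolution clock intended for timing short intervals
t0 = time.perf_counter(); sum_rows(arr_c); t1 = time.perf_counter()
t2 = time.perf_counter(); sum_rows(arr_f); t3 = time.perf_counter()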

print(f"Row-wise sum (C order): {t1 - t0:.4f}s")
print(f"Row-wise sum (F order): {t3 - t2:.4f}s")

In most cases, C-order is faster for **row-wise** access (Python-style loops), while Fortran-order wins for **column-wise** operations.
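The column-wise half of that claim can be measured the same way. The sketch below is one possible setup, not a rigorous benchmark: `sum_columns` and `best_time` are hypothetical helpers, the array size is arbitrary, and the timings will vary by machine.

```python
import timeit
import numpy as np

shape = (1000, 1000)
grid_c = np.ones(shape, order='C')
grid_f = np.ones(shape, order='F')

def sum_columns(a):
    """Sum the array one column at a time."""
    total = 0.0
    for j in range(a.shape[1]):
        total += a[:, j].sum()
    return total

def best_time(func, arg, repeats=5):
    """Best of several runs, which is less sensitive to background noise."""
    return min(timeit.repeat(lambda: func(arg), number=1, repeat=repeats))

t_c = best_time(sum_columns, grid_c)
t_f = best_time(sum_columns, grid_f)

print(f"Column-wise sum (C order): {t_c:.4f}s")
print(f"Column-wise sum (F order): {t_f:.4f}s")
```

Taking the minimum over several repeats is a common way to reduce the effect of one-off delays on a single measurement.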

Understanding this helps when optimizing scientific or image-processing code, especially when using nested loops or integrating with C/Fortran routines.

## 🧩 5. Creating Arrays with Specific Memory Layouts

You can create arrays with the desired memory order from the start, which is useful for optimization or compatibility with external libraries.

In [ ]:
# Create Fortran-ordered arrays directly
f_array = np.zeros((4, 3), order='F')
print(f_array.flags)

# Convert an existing array to a different layout (copies only if needed)
c_copy = np.ascontiguousarray(a_f)
f_copy = np.asfortranarray(a_c)

print("\nConverted layouts:")
print("C contiguous:", c_copy.flags['C_CONTIGUOUS'])
print("F contiguous:", f_copy.flags['F_CONTIGUOUS'])

## ⚡ 6. Stride Tricks and Advanced Views

The `np.lib.stride_tricks` module lets you create **custom views** by manipulating strides directly.

This is powerful but risky: `as_strided` does not check that the requested shape and strides stay inside the array's buffer, so an incorrect configuration can read or write memory that does not belong to the array.

In [ ]:
from numpy.lib.stride_tricks import as_strided

# Rolling window example: 1D moving window of size 3
arr = np.arange(10)
window_size = 3

shape = (arr.size - window_size + 1, window_size)
strides = (arr.strides[0], arr.strides[0])

# writeable=False guards against accidental writes through overlapping windows
windows = as_strided(arr, shape=shape, strides=strides, writeable=False)
print(windows)

This technique avoids copying data: each row is a view into the original array, and neighboring rows overlap in memory. You can use it for rolling computations, convolution, or pattern extraction.
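For the common rolling-window case, NumPy 1.20 and later also provide `sliding_window_view` in the same module, which computes the shape and strides for you and returns a read-only view by default. A minimal sketch of the same size-3 window, assuming a recent enough NumPy:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.arange(10)

# Same 1D moving window of size 3, without hand-written strides
windows = sliding_window_view(arr, window_shape=3)
print(windows.shape)
print("Writeable:", windows.flags['WRITEABLE'])
print("Shares memory with arr:", np.shares_memory(arr, windows))

# Rolling mean over each window, computed without a Python loop
rolling_mean = windows.mean(axis=-1)
print(rolling_mean)
```

Because the view is read-only unless `writeable=True` is passed, accidental writes through overlapping windows raise an error instead of silently changing several windows at once.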

## 💡 Summary

| Concept | Description |
|----------|--------------|
| **C-order** | Row-major, last index changes fastest |
| **Fortran-order** | Column-major, first index changes fastest |
| **Strides** | Byte jumps per axis; they define how data is walked in memory |
| **Views** | Slices that share memory; they change shape and strides, not data |
| **Performance** | Contiguous memory → faster vectorized operations |

Key takeaway: *Data layout and strides define performance at the hardware level.*

## 🧩 Challenge Exercise

**Task:**
1. Create a 2D array of shape `(6, 6)`.
2. Generate a 3×3 rolling window view using `as_strided()`.
3. Compute the mean of each window without using loops.

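*(Hint: reshape your view so each window is flattened into the last axis and use `np.mean(windows, axis=-1)`, or average over both window axes with `np.mean(windows, axis=(-2, -1))`.)*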

# --- End of Section 2 ---

Next up → **Section 3: Broadcasting and Advanced Indexing**

We'll explore how NumPy generalizes operations between arrays of different shapes, and how to use indexing tricks to extract, filter, and transform data efficiently.