# 🧩 Section 1: Custom Data Types and Structured Arrays

NumPy isn't limited to homogeneous numerical arrays: it can also store **heterogeneous records** (like rows in a database or C structs) efficiently in a single contiguous block of memory.

In this section, you'll learn how to:
- Define **custom `dtype` objects** with named fields.
- Create and manipulate **structured arrays**.
- Access fields by name with key-based indexing, or attribute-style through `np.recarray`.
- Understand **memory layout**, **alignment**, and **binary compatibility**.

## 🔧 1. Why Structured Arrays?

Structured arrays allow you to store and operate on mixed-type tabular data without leaving NumPy. They're well suited to scenarios like:
- Representing sensor data (timestamp, reading, status)
- Binary I/O (reading C structs or binary logs)
- Lightweight replacements for pandas DataFrames in low-level or embedded environments

In [ ]:
import numpy as np

# Define a structured dtype with multiple fields
sensor_dtype = np.dtype([
    ('id', np.int32),
    ('temperature', np.float64),
    ('humidity', np.float64),
    ('status', 'U10')  # Unicode string, up to 10 chars
])

# Create a structured array
data = np.array([
    (101, 21.4, 45.2, 'OK'),
    (102, 19.8, 47.1, 'OK'),
    (103, 28.3, 40.3, 'FAIL'),
], dtype=sensor_dtype)

print(data)
print("\nField names:", data.dtype.names)

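## 🎯 2. Accessing and Modifying Fields

Indexing a structured array with a single field name returns a **view** into the same memory block, so no copying occurs.

Fields of a plain structured array are accessed with dictionary-style indexing (`data['temperature']`). Dot notation (`data.temperature`) is only available on record arrays, covered in part 4 below.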

In [ ]:
# Accessing fields
print("Temperatures:", data['temperature'])

# Modify one field (affects original array)
data['humidity'] *= 1.05  # Increase by 5%
print("\nUpdated data:\n", data)

# Filtering based on a field
failed = data[data['status'] == 'FAIL']
print("\nFailed sensors:\n", failed)
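
The view-versus-copy distinction is easy to check directly. The following is a minimal sketch that rebuilds a small array shaped like the sensor example (the name `readings` and its values are illustrative, not part of the original data) and shows which operations share memory with the source array.

```python
import numpy as np

# Illustrative dtype and values, mirroring the sensor example.
sensor_dtype = np.dtype([
    ('id', np.int32),
    ('temperature', np.float64),
    ('humidity', np.float64),
    ('status', 'U10'),
])
readings = np.array([
    (101, 21.4, 45.2, 'OK'),
    (102, 19.8, 47.1, 'OK'),
    (103, 28.3, 40.3, 'FAIL'),
], dtype=sensor_dtype)

# Single-field indexing returns a view onto the same buffer.
temps = readings['temperature']
print(np.shares_memory(temps, readings))  # True

# Writing through the view changes the original array.
temps[0] = 22.0
print(readings['temperature'][0])

# Boolean-mask selection returns a copy, so edits do not propagate back.
failed = readings[readings['status'] == 'FAIL']
failed['temperature'] = 0.0
print(readings['temperature'][2])

# A plain structured array has no attribute-style field access.
print(hasattr(readings, 'temperature'))  # False
```

In short, `readings['temperature']` is cheap and writable in place, while a filtered selection such as `failed` is an independent array.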

## 🧠 3. Nested and Aligned dtypes

Structured dtypes can be nested or aligned to match low-level C structs.
By default NumPy packs fields back to back with no padding. Setting `align=True` pads the layout so that each field's offset respects platform alignment requirements, as a C compiler would, which is crucial when sharing memory with C or binary data.
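
To see what alignment changes, it helps to compare the same field list with and without `align=True`. This is a rough sketch using made-up field names (`flag`, `value`, `count`). The packed offsets are fixed by the field sizes, while the aligned offsets depend on the platform, so the code prints them rather than assuming particular values.

```python
import numpy as np

# Illustrative fields with mixed widths, so that padding can matter.
fields = [('flag', np.int8), ('value', np.float64), ('count', np.int16)]

packed = np.dtype(fields)               # default: no padding between fields
aligned = np.dtype(fields, align=True)  # padded like a typical C struct

def offsets(dt):
    """Map each field name to its byte offset within one record."""
    return {name: dt.fields[name][1] for name in dt.names}

print("packed :", offsets(packed), "itemsize =", packed.itemsize)
print("aligned:", offsets(aligned), "itemsize =", aligned.itemsize)

# Each aligned field starts at a multiple of its type's alignment.
value_alignment = np.dtype(np.float64).alignment
print(aligned.fields['value'][1] % value_alignment == 0)  # True
```

In the nested robot example that follows, every leaf field is 4 bytes wide, so on typical platforms the aligned and packed layouts coincide. The difference only shows up when narrower and wider fields are mixed, as here.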

In [ ]:
# Define a nested dtype (position is a substructure)
position_dtype = np.dtype([
    ('x', np.float32),
    ('y', np.float32),
])

robot_dtype = np.dtype([
    ('id', np.int32),
    ('position', position_dtype),
    ('battery', np.float32)
], align=True)

robots = np.array([
    (1, (12.5, 8.2), 77.5),
    (2, (3.3, 4.4), 45.2)
], dtype=robot_dtype)

print(robots)
print("\nMemory offsets:", [robot_dtype.fields[name][1] for name in robot_dtype.names])

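NumPy stores `robots` in a **binary layout** that, with `align=True`, matches what a C compiler typically produces for a struct with the same field order and types. This means you can use `.tobytes()` and `np.frombuffer()` for direct file or socket I/O, provided both sides agree on byte order.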

In [ ]:
# Example: binary serialization
raw = robots.tobytes()
print("Raw byte length:", len(raw))

# Deserialize back from binary (the result is a read-only view over `raw`; call .copy() to modify it)
loaded = np.frombuffer(raw, dtype=robot_dtype)
print("\nReloaded from bytes:\n", loaded)
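
The same round trip works when the bytes come from outside NumPy. As a sketch, the standard-library `struct` module stands in below for a C program writing records. The format string `'<ifff'` and the explicit little-endian field codes are assumptions chosen for this illustration, not something the robot example prescribes.

```python
import struct
import numpy as np

# Explicit little-endian types, so the layout does not depend on the host.
position_dtype = np.dtype([('x', '<f4'), ('y', '<f4')])
robot_dtype = np.dtype([
    ('id', '<i4'),
    ('position', position_dtype),
    ('battery', '<f4'),
])

# Pack two records the way an external writer might:
# one int32 followed by three float32 values, little-endian, no padding.
record = struct.Struct('<ifff')
payload = record.pack(1, 12.5, 8.25, 77.5) + record.pack(2, 3.5, 4.75, 45.25)

# Both sides must agree on the record size before reinterpreting bytes.
print(record.size, robot_dtype.itemsize)

# Interpret the raw bytes as structured records without copying.
robots = np.frombuffer(payload, dtype=robot_dtype)
print(robots['id'])
print(robots['position']['x'])

# An array created over immutable bytes is read-only; copy it to edit.
print(robots.flags.writeable)  # False
editable = robots.copy()
editable['battery'][0] = 80.0
```

Spelling out byte order (`'<i4'` rather than `np.int32`) is a deliberate choice here: it keeps the dtype valid for the same bytes on both little-endian and big-endian hosts.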

## 🧩 4. Working with Record Arrays (`recarray`)

`np.recarray` adds **attribute-style field access** (`arr.fieldname`) for convenience, but uses the same underlying memory.

Use it for readability when field names are frequently accessed, but avoid it in performance-critical code: attribute lookup adds some overhead compared with key-based indexing.

In [ ]:
rec = np.rec.array(data)
print(rec.temperature)
print(rec.status)
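
If copying is a concern, an existing structured array can also be viewed as a record array. The sketch below uses a small illustrative array (`plain`) and `ndarray.view(np.recarray)` to show that both objects refer to the same memory.

```python
import numpy as np

# Illustrative two-field array.
dtype = np.dtype([('id', np.int32), ('temperature', np.float64)])
plain = np.array([(101, 21.5), (102, 19.75)], dtype=dtype)

# Reinterpret the same buffer as a recarray; no data is copied.
rec = plain.view(np.recarray)
print(np.shares_memory(rec, plain))  # True

# Attribute access and key access reach the same field.
print(rec.temperature)
print(rec['id'])

# A write through the recarray is visible in the original array.
rec.temperature[0] = 30.0
print(plain['temperature'][0])
```

One caveat: when a field name collides with an existing array attribute (for example a field called `shape` or `size`), attribute access returns the array attribute, so such fields still need key-based indexing.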

## 🧬 Under the Hood: How Structured Arrays Work

- Each `dtype` field maps to a **byte offset** within the contiguous memory buffer.
- NumPy doesn't store Python objects for each record. Instead, it interprets raw bytes according to the dtype schema.
- Single-field access returns **views**, not copies, meaning fast access and minimal overhead.
- Structured arrays are extremely efficient for fixed-schema binary data, similar to C structs or database rows.
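
The points above can be inspected from Python. The following sketch uses an illustrative three-field dtype and prints each field's offset, the record size, and the stride of a field view. It then reads one field back out of the raw bytes by hand.

```python
import numpy as np

# Illustrative packed dtype: 4 + 8 + 4 bytes per record.
dtype = np.dtype([('id', np.int32), ('temperature', np.float64), ('status', 'S4')])
arr = np.zeros(3, dtype=dtype)
arr['id'] = [1, 2, 3]

# Each field is described by its own dtype and a byte offset.
for name in dtype.names:
    field_dtype, offset = dtype.fields[name][:2]
    print(f"{name:12s} dtype={field_dtype} offset={offset}")

print("itemsize:", dtype.itemsize)  # bytes per record
print("nbytes  :", arr.nbytes)      # bytes in the whole buffer

# A field view steps through memory one full record at a time.
temps = arr['temperature']
print("field strides:", temps.strides)

# The same buffer seen as raw bytes, one row per record.
raw = arr.view(np.uint8).reshape(len(arr), dtype.itemsize)

# Reinterpreting the first 4 bytes of record 1 recovers its 'id'.
second_id = raw[1, :4].view(np.int32)[0]
print("id of record 1:", second_id)
```

The stride of `temps` equals the record size rather than 8 bytes, which is what lets a field act as a view: NumPy skips over the other fields instead of copying the floats out.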

## ⚙️ Best Practices & Pitfalls

- ✅ Define `dtype`s explicitly, including byte order (e.g. `'<i4'` rather than `np.int32`), to keep the binary layout stable across machines.
- ✅ Use `align=True` when interoperating with compiled code or memory-mapped files.
- ✅ Keep string fields fixed-length (`'U10'`, `'S20'`) for predictable layout.
- ⚠️ Avoid frequent resizing or appending: structured arrays are **static in shape and schema**, so growing one means allocating a new array.
- ⚠️ Don't rely on field order implicitly; always reference by name.
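
A couple of these recommendations can be illustrated together. The sketch below assumes a hypothetical log-record schema (`log_dtype`) with explicit little-endian fields, builds the array in one step from a list of rows, and checks the serialized bytes against the standard-library `struct` module.

```python
import struct
import numpy as np

# Hypothetical schema with explicit byte order and fixed-width fields.
log_dtype = np.dtype([
    ('code', 'S4'),    # fixed-length byte string
    ('value', '<f8'),  # little-endian float64
    ('count', '<i4'),  # little-endian int32
])

# Collect rows in a Python list and build the array once,
# rather than appending to a structured array record by record.
rows = [(b'ab', 1.5, 7), (b'cd', 2.5, 9)]
log = np.array(rows, dtype=log_dtype)

# The bytes match an explicitly little-endian, unpadded struct layout.
expected = b''.join(struct.pack('<4sdi', *row) for row in rows)
print(log.tobytes() == expected)  # True

# Growing later allocates a new array, for example via np.concatenate.
more = np.array([(b'ef', 3.5, 11)], dtype=log_dtype)
combined = np.concatenate([log, more])
print(len(combined), combined['count'])
```

This example uses a packed layout. When the consumer is compiled code with natural struct padding, the same schema would be declared with `align=True` instead.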

## 💪 Challenge Exercise

**Task:** Create a structured array to represent stock trades with fields:
- `symbol` (string, up to 5 chars)
- `price` (float)
- `volume` (int)
- `timestamp` (float, UNIX time)

Then:
1. Populate the array with at least 5 records.
2. Compute the total traded volume per symbol.
3. Serialize the array to bytes and reload it using `np.frombuffer()`.

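*(Hint: use `np.unique(..., return_inverse=True)` together with `np.bincount(..., weights=...)`, or boolean masking. Note that `return_counts=True` counts trades per symbol rather than summing their volume.)*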

# --- End of Section 1 ---

Next up → **Section 2: Memory Layout, Strides, and Order Control**

You'll learn how NumPy arranges memory internally (C vs. Fortran order), how to manipulate strides, and why cache locality can make a 10× performance difference.