# Lab 1A: NumPy Fundamentals

## Comprehensive NumPy for Machine Learning

**Duration:** 90 minutes | **Difficulty:** Beginner to Intermediate | **Prerequisites:** Basic Python

---

### Learning Objectives

By the end of this lab, you will be able to:

1. **Create arrays** using various NumPy functions
2. **Understand broadcasting** and how it enables efficient operations
3. **Reshape and slice** arrays for data manipulation
4. **Use boolean indexing** to filter data
5. **Apply aggregation functions** with axis parameters
6. **Perform linear algebra** operations essential for ML
7. **Visualize** array operations with matplotlib

---

## Setup

Run this cell first to import libraries and configure matplotlib for inline display.

In [None]:
# Setup - Run this cell first
import numpy as np
import matplotlib.pyplot as plt

# Configure matplotlib for inline display
%matplotlib inline
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = [10, 6]
plt.rcParams['font.size'] = 12

# Set random seed for reproducibility
np.random.seed(42)

print(f"NumPy version: {np.__version__}")
print("✓ Setup complete! Matplotlib configured for inline display.")

---

## Section 1: Array Creation

NumPy provides many ways to create arrays. Understanding these is fundamental to working with data.

### Exercise 1.1: Basic Array Creation

Create the following arrays:

1. `arr_list` - Create from a Python list: [1, 2, 3, 4, 5]
2. `arr_zeros` - A 4x4 array of zeros
3. `arr_ones` - A 3x5 array of ones
4. `arr_full` - A 2x3 array filled with the value 7
5. `arr_eye` - A 4x4 identity matrix

In [None]:
# Exercise 1.1: Basic Array Creation

# YOUR CODE HERE
arr_list = None      # Use np.array()
arr_zeros = None     # Use np.zeros()
arr_ones = None      # Use np.ones()
arr_full = None      # Use np.full()
arr_eye = None       # Use np.eye()

# Test your code
print("From list:", arr_list)
print(f"\nZeros shape: {arr_zeros.shape}")
print(f"Ones shape: {arr_ones.shape}")
print(f"\nFull array:\n{arr_full}")
print(f"\nIdentity matrix:\n{arr_eye}")

### Exercise 1.2: Range-Based Arrays

Create arrays using ranges and linear spacing:

1. `arr_range` - Values from 0 to 19 using `np.arange()`
2. `arr_step` - Values from 0 to 20 (inclusive) with step 2 using `np.arange()` (its stop value is excluded, so choose it accordingly)
3. `arr_linspace` - 11 evenly spaced values from 0 to 1 using `np.linspace()`
4. `arr_logspace` - 5 logarithmically spaced values from 10^0 to 10^4 using `np.logspace()`
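Two details often trip people up here: `np.arange()` never includes its stop value, while `np.linspace()` includes the endpoint unless you pass `endpoint=False`. A small sketch with different numbers than the exercise (variable names are illustrative only):

```python
import numpy as np

# np.arange(start, stop, step): stop is EXCLUDED
multiples_of_3 = np.arange(0, 9, 3)
print(multiples_of_3)           # [0 3 6]

# np.linspace(start, stop, num): stop is INCLUDED by default
with_end = np.linspace(0, 9, 4)
print(with_end)                 # [0. 3. 6. 9.]

# endpoint=False drops the stop value, so the spacing becomes (9 - 0) / 3
without_end = np.linspace(0, 9, 3, endpoint=False)
print(without_end)              # [0. 3. 6.]
```

For non-integer steps, `np.linspace()` is generally the safer choice, because floating-point rounding can make `np.arange()` return one element more or fewer than you expect.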

In [None]:
# Exercise 1.2: Range-Based Arrays

# YOUR CODE HERE
arr_range = None     # 0 to 19
arr_step = None      # 0, 2, 4, ..., 20
arr_linspace = None  # 11 values from 0 to 1
arr_logspace = None  # 5 values: 1, 10, 100, 1000, 10000

# Test your code
print("Range (0-19):", arr_range)
print("Step of 2:", arr_step)
print("Linspace (0 to 1):", arr_linspace)
print("Logspace (10^0 to 10^4):", arr_logspace)

### Exercise 1.3: Random Arrays

Random arrays are crucial for ML (weight initialization, data augmentation, etc.):

1. `rand_uniform` - 3x4 array with uniform random values in [0, 1)
2. `rand_normal` - 3x4 array with standard normal distribution (mean=0, std=1)
3. `rand_int` - 3x4 array with random integers from 1 to 10 (inclusive; the upper bound of `np.random.randint()` is exclusive)
4. `rand_choice` - 10 random samples from ['A', 'B', 'C']

In [None]:
# Exercise 1.3: Random Arrays
np.random.seed(42)  # For reproducibility

# YOUR CODE HERE
rand_uniform = None  # Use np.random.rand()
rand_normal = None   # Use np.random.randn()
rand_int = None      # Use np.random.randint()
rand_choice = None   # Use np.random.choice()

# Test your code
print("Uniform [0,1):\n", rand_uniform)
print("\nNormal distribution:\n", rand_normal)
print("\nRandom integers (1-10):\n", rand_int)
print("\nRandom choice:", rand_choice)

### Visualization: Random Distributions

Let's visualize the difference between uniform and normal distributions.

In [None]:
# Visualize random distributions
np.random.seed(42)
uniform_data = np.random.rand(10000)
normal_data = np.random.randn(10000)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

axes[0].hist(uniform_data, bins=50, edgecolor='black', alpha=0.7, color='steelblue')
axes[0].set_title('Uniform Distribution [0, 1)')
axes[0].set_xlabel('Value')
axes[0].set_ylabel('Frequency')

axes[1].hist(normal_data, bins=50, edgecolor='black', alpha=0.7, color='coral')
axes[1].set_title('Normal Distribution (μ=0, σ=1)')
axes[1].set_xlabel('Value')
axes[1].set_ylabel('Frequency')

plt.tight_layout()
plt.show()

---

## Section 2: Broadcasting

Broadcasting allows NumPy to perform element-wise operations on arrays of different shapes without first copying them to a common shape. This is one of NumPy's most powerful features!

### Exercise 2.1: Scalar Broadcasting

When you operate on an array with a scalar, the scalar is "broadcast" to every element.

Given `arr = np.array([1, 2, 3, 4, 5])`, create:
1. `plus_10` - Add 10 to each element
2. `times_3` - Multiply each element by 3
3. `squared` - Square each element
4. `normalized` - Subtract mean and divide by std (standardization)
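Standardization broadcasts two scalars, the mean and the standard deviation, across the whole array. A worked example on a different array, chosen so the statistics come out as round numbers:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# x.mean() and x.std() are scalars; each is broadcast to every element of x
z = (x - x.mean()) / x.std()

print(x.mean(), x.std())   # 5.0 2.0
print(z)                   # each value expressed in standard deviations from the mean
print(x.std(ddof=1))       # sample standard deviation (divides by n - 1), slightly larger
```

Note that `.std()` defaults to the population standard deviation (`ddof=0`); pass `ddof=1` if you need the sample version.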

In [None]:
# Exercise 2.1: Scalar Broadcasting
arr = np.array([1, 2, 3, 4, 5])

# YOUR CODE HERE
plus_10 = None    # arr + 10
times_3 = None    # arr * 3
squared = None    # arr ** 2
normalized = None # (arr - arr.mean()) / arr.std()

# Test your code
print("Original:", arr)
print("Plus 10:", plus_10)
print("Times 3:", times_3)
print("Squared:", squared)
print(f"Normalized: {normalized} (mean={normalized.mean():.2f}, std={normalized.std():.2f})")

### Exercise 2.2: Row and Column Broadcasting

Broadcasting works with higher dimensions too. Given a matrix, you can add a row vector to every row, or a column vector to every column.

```
matrix (3x4):        row (1x4):           result (3x4):
[[1, 2, 3, 4],       [10, 20, 30, 40]  →  [[11, 22, 33, 44],
 [5, 6, 7, 8],                             [15, 26, 37, 48],
 [9, 10, 11, 12]]                          [19, 30, 41, 52]]
```

Create:
1. `row_added` - Add `row_vec` to each row of `matrix`
2. `col_added` - Add `col_vec` to each column of `matrix`
3. `row_multiplied` - Multiply each row by `row_vec`

In [None]:
# Exercise 2.2: Row and Column Broadcasting
matrix = np.arange(1, 13).reshape(3, 4)
row_vec = np.array([10, 20, 30, 40])
col_vec = np.array([[100], [200], [300]])  # Shape (3, 1)

print("Matrix (3x4):")
print(matrix)
print("\nRow vector:", row_vec)
print("Column vector:", col_vec.flatten())

# YOUR CODE HERE
row_added = None      # matrix + row_vec
col_added = None      # matrix + col_vec
row_multiplied = None # matrix * row_vec

print("\nRow added:")
print(row_added)
print("\nColumn added:")
print(col_added)
print("\nRow multiplied:")
print(row_multiplied)

### Exercise 2.3: Broadcasting Rules

**Broadcasting Rules:**
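1. If the arrays have different numbers of dimensions, the shape of the one with fewer dimensions is padded with 1s on the left
2. Shapes are then compared dimension by dimension, starting from the trailing (rightmost) dimension
3. Two dimensions are compatible if they are equal OR one of them is 1; the result takes the larger size. If any pair is incompatible, NumPy raises a `ValueError`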
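The rules are mechanical enough to code up. The sketch below is a hypothetical helper, `broadcast_shape` (not part of NumPy), that applies them by hand; it is checked against NumPy's own `np.broadcast_shapes()` (available since NumPy 1.20) on shapes that differ from the exercise:

```python
import numpy as np

def broadcast_shape(shape1, shape2):
    """Hypothetical helper: apply the broadcasting rules by hand."""
    n = max(len(shape1), len(shape2))
    # Rule 1: left-pad the shorter shape with 1s
    s1 = (1,) * (n - len(shape1)) + tuple(shape1)
    s2 = (1,) * (n - len(shape2)) + tuple(shape2)
    result = []
    # Rules 2 and 3: after padding, positions line up from the trailing end,
    # so each aligned pair must be equal or contain a 1
    for d1, d2 in zip(s1, s2):
        if d1 != d2 and 1 not in (d1, d2):
            raise ValueError(f"incompatible dimensions {d1} and {d2}")
        result.append(d2 if d1 == 1 else d1)  # a size-1 dimension stretches to match
    return tuple(result)

print(broadcast_shape((5, 1, 3), (4, 1)))      # (5, 4, 3)
print(np.broadcast_shapes((5, 1, 3), (4, 1)))  # (5, 4, 3)
```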

Predict which operations will work and what the result shape will be:

In [None]:
# Exercise 2.3: Broadcasting Rules

a = np.ones((3, 4))
b = np.ones((4,))
c = np.ones((3, 1))
d = np.ones((1, 4))
e = np.ones((2, 3, 4))

# Predict the shape before running!
print("a shape:", a.shape)
print("b shape:", b.shape)
print("c shape:", c.shape)
print("d shape:", d.shape)
print("e shape:", e.shape)

print("\n--- Results ---")
print("a + b shape:", (a + b).shape)  # (3,4) + (4,) → ?
print("a + c shape:", (a + c).shape)  # (3,4) + (3,1) → ?
print("a + d shape:", (a + d).shape)  # (3,4) + (1,4) → ?
print("c + d shape:", (c + d).shape)  # (3,1) + (1,4) → ?
print("e + b shape:", (e + b).shape)  # (2,3,4) + (4,) → ?

### Visualization: Broadcasting in Action

In [None]:
# Visualize broadcasting: centering data (subtracting column means)
np.random.seed(42)
data = np.random.randn(100, 3) * [1, 2, 3] + [0, 5, 10]  # Different scales

# Center data using broadcasting
column_means = data.mean(axis=0)  # Shape: (3,)
centered_data = data - column_means  # Broadcasting!

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Original data
for i in range(3):
    axes[0].hist(data[:, i], bins=20, alpha=0.5, label=f'Column {i}')
axes[0].set_title('Original Data')
axes[0].set_xlabel('Value')
axes[0].legend()

# Centered data
for i in range(3):
    axes[1].hist(centered_data[:, i], bins=20, alpha=0.5, label=f'Column {i}')
axes[1].axvline(x=0, color='black', linestyle='--', label='Zero')
axes[1].set_title('Centered Data (mean subtracted)')
axes[1].set_xlabel('Value')
axes[1].legend()

plt.tight_layout()
plt.show()

print(f"Original means: {data.mean(axis=0)}")
print(f"Centered means: {centered_data.mean(axis=0)}")

---

## Section 3: Reshaping and Slicing

Data often needs to be reshaped for ML models. Understanding array dimensions is crucial.

### Exercise 3.1: Reshaping Arrays

Given `arr = np.arange(24)`, create:
1. `shape_6x4` - Reshape to 6 rows, 4 columns
2. `shape_2x3x4` - Reshape to 3D: 2 x 3 x 4
3. `shape_auto` - Reshape to 4 rows, auto-calculate columns (use -1)
4. `flattened` - Flatten back to 1D

In [None]:
# Exercise 3.1: Reshaping Arrays
arr = np.arange(24)
print("Original:", arr)

# YOUR CODE HERE
shape_6x4 = None    # arr.reshape(6, 4)
shape_2x3x4 = None  # arr.reshape(2, 3, 4)
shape_auto = None   # arr.reshape(4, -1)
flattened = None    # shape_6x4.flatten()

print(f"\n6x4 shape: {shape_6x4.shape}")
print(shape_6x4)
print(f"\n2x3x4 shape: {shape_2x3x4.shape}")
print(shape_2x3x4)
print(f"\nAuto shape: {shape_auto.shape}")
print(f"Flattened: {flattened}")

### Exercise 3.2: Array Slicing

Slicing syntax: `arr[start:stop:step]`, where `stop` is excluded and negative indices count from the end.

Given the matrix below, extract:
1. `first_row` - The first row
2. `last_col` - The last column
3. `center` - The center 2x2 subarray
4. `corners` - The four corner elements
5. `reversed_rows` - Matrix with rows in reverse order
6. `every_other` - Every other row and column
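Two rules behind the slices in this exercise, shown on a smaller grid: the `stop` index is excluded, and basic slices return views while integer-array ("fancy") indexing, as used for `corners`, returns a copy. Names are illustrative:

```python
import numpy as np

g = np.arange(9).reshape(3, 3)   # [[0 1 2], [3 4 5], [6 7 8]]

top_left = g[0:2, 0:2]           # rows 0-1, cols 0-1 (stop index 2 excluded); a view
diag = g[[0, 1, 2], [0, 1, 2]]   # fancy indexing picks (0,0), (1,1), (2,2); a copy
print(diag)                      # [0 4 8]

top_left[0, 0] = -1              # modifies g through the view
diag[1] = 99                     # modifies only the copy
print(g[0, 0], g[1, 1])          # -1 4
```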

In [None]:
# Exercise 3.2: Array Slicing
matrix = np.arange(16).reshape(4, 4)
print("Matrix:")
print(matrix)

# YOUR CODE HERE
first_row = None     # matrix[0, :] or matrix[0]
last_col = None      # matrix[:, -1]
center = None        # matrix[1:3, 1:3]
corners = None       # matrix[[0, 0, -1, -1], [0, -1, 0, -1]]
reversed_rows = None # matrix[::-1]
every_other = None   # matrix[::2, ::2]

print("\nFirst row:", first_row)
print("Last column:", last_col)
print("\nCenter 2x2:")
print(center)
print("\nCorners:", corners)
print("\nReversed rows:")
print(reversed_rows)
print("\nEvery other:")
print(every_other)

### Exercise 3.3: Stacking and Splitting

Combine and split arrays:
1. `vstacked` - Stack `a` and `b` vertically
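2. `hstacked` - Stack `a` with itself horizontally (`a` and `b` have different numbers of rows, so they cannot be stacked side by side)
3. `concatenated` - Concatenate `a` and `b` along axis 0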
4. `split_3` - Split `vstacked` into 3 parts along axis 0
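The shape requirements for stacking, and the difference between `np.split()` (which requires an even division) and `np.array_split()` (which allows uneven parts), shown on arrays unrelated to the exercise:

```python
import numpy as np

top = np.ones((2, 3))
bottom = np.zeros((1, 3))

tall = np.vstack([top, bottom])             # column counts must match: (2,3) + (1,3) -> (3,3)
wide = np.hstack([top, np.zeros((2, 1))])   # row counts must match: (2,3) + (2,1) -> (2,4)
print(tall.shape, wide.shape)               # (3, 3) (2, 4)

seven = np.arange(7)
try:
    np.split(seven, 3)                      # 7 elements cannot be divided into 3 equal parts
except ValueError as err:
    print("ValueError:", err)

parts = np.array_split(seven, 3)            # uneven split is allowed
print([p.tolist() for p in parts])          # [[0, 1, 2], [3, 4], [5, 6]]
```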

In [None]:
# Exercise 3.3: Stacking and Splitting
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[7, 8, 9]])

print("a:")
print(a)
print("\nb:")
print(b)

# YOUR CODE HERE
vstacked = None    # np.vstack([a, b])
hstacked = None    # np.hstack([a, a])  # Use a twice: a and b have different row counts
concatenated = None # np.concatenate([a, b], axis=0)
split_3 = None     # np.split(vstacked, 3, axis=0)

print("\nVertically stacked:")
print(vstacked)
print("\nHorizontally stacked (a with a):")
print(hstacked)
print("\nConcatenated:")
print(concatenated)
print("\nSplit into 3:")
for i, part in enumerate(split_3):
    print(f"  Part {i}: {part}")

---

## Section 4: Boolean Indexing

Boolean indexing lets you filter arrays based on conditions - essential for data cleaning and feature selection.

### Exercise 4.1: Basic Boolean Masks

Given `data = np.array([3, 7, 2, 9, 1, 5, 8, 4, 6])`, create:
1. `greater_5` - Elements greater than 5
2. `between_3_7` - Elements between 3 and 7 (inclusive)
3. `even` - Even elements only
4. `not_in_range` - Elements NOT between 3 and 7

In [None]:
# Exercise 4.1: Basic Boolean Masks
data = np.array([3, 7, 2, 9, 1, 5, 8, 4, 6])
print("Data:", data)

# YOUR CODE HERE
greater_5 = None     # data[data > 5]
between_3_7 = None   # data[(data >= 3) & (data <= 7)]
even = None          # data[data % 2 == 0]
not_in_range = None  # data[(data < 3) | (data > 7)]

print("\nGreater than 5:", greater_5)
print("Between 3 and 7:", between_3_7)
print("Even numbers:", even)
print("Not in [3,7]:", not_in_range)

### Exercise 4.2: Boolean Operations on 2D Arrays

Apply boolean indexing to filter a dataset:
1. `positive_rows` - Rows where all values are positive
2. `has_negative` - Rows that contain at least one negative value
3. `col_sum_positive` - Columns where the sum is positive

In [None]:
# Exercise 4.2: Boolean Operations on 2D Arrays
np.random.seed(42)
matrix = np.random.randn(5, 4).round(2)
print("Matrix:")
print(matrix)

# YOUR CODE HERE
# Hint: np.all(condition, axis=1) checks if all elements in each row satisfy condition
# Hint: np.any(condition, axis=1) checks if any element in each row satisfies condition
positive_rows = None     # matrix[np.all(matrix > 0, axis=1)]
has_negative = None      # matrix[np.any(matrix < 0, axis=1)]
col_sum_positive = None  # matrix[:, matrix.sum(axis=0) > 0]

print("\nRows where ALL values are positive:")
print(positive_rows)
print("\nRows that contain ANY negative:")
print(has_negative)
print("\nColumns where sum > 0:")
print(col_sum_positive)

### Exercise 4.3: np.where() - Conditional Selection

`np.where(condition, x, y)` returns x where condition is True, y otherwise.

1. `clipped` - Replace values < 0 with 0, keep others
2. `categorized` - Convert values: <0 → -1, 0 → 0, >0 → 1
3. `indices` - Get indices where values > 0
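`np.where()` has two forms: with three arguments it selects element-wise, and with only a condition it returns a tuple of index arrays, one per dimension (which is why the exercise hint takes `[0]`). For simple clipping, `np.clip()` is an equivalent alternative. A sketch on different values:

```python
import numpy as np

v = np.array([-3, 0, 5, -1, 2])

print(np.where(v > 0, v, 0))      # [0 0 5 0 2]: keep v where True, else 0
print(np.clip(v, 0, None))        # [0 0 5 0 2]: same result, no upper bound

idx = np.where(v > 0)             # one-argument form returns a tuple
print(len(idx), idx[0])           # 1 [2 4]

grid = np.array([[0, 7], [8, 0]])
rows, cols = np.where(grid > 0)   # 2-D input gives one index array per dimension
print(rows, cols)                 # [0 1] [1 0]
```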

In [None]:
# Exercise 4.3: np.where()
values = np.array([-2, -1, 0, 1, 2, 3])
print("Values:", values)

# YOUR CODE HERE
clipped = None      # np.where(values < 0, 0, values)
# For categorized, you might need nested where or np.sign()
categorized = None  # np.sign(values) or np.where(values < 0, -1, np.where(values > 0, 1, 0))
indices = None      # np.where(values > 0)[0]  # Returns tuple, get first element

print("Clipped (negative → 0):", clipped)
print("Categorized (-1, 0, 1):", categorized)
print("Indices where > 0:", indices)

---

## Section 5: Aggregation Functions

Understanding the `axis` parameter is crucial for aggregating data correctly.

### Exercise 5.1: Aggregation with Axis

- `axis=None` (default): Aggregate over all elements, returning a single value
- `axis=0`: Aggregate down the rows (collapses axis 0 → one result per column)
- `axis=1`: Aggregate across the columns (collapses axis 1 → one result per row)
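A useful way to remember `axis`: the axis you pass is the one that disappears from the result's shape. Passing `keepdims=True` keeps it as a length-1 dimension, so the result broadcasts back against the original. A sketch on a small array:

```python
import numpy as np

t = np.arange(6).reshape(2, 3)   # [[0 1 2]
                                 #  [3 4 5]]

print(t.sum(axis=0))             # [3 5 7]   shape (3,): axis 0 removed
print(t.sum(axis=1))             # [ 3 12]   shape (2,): axis 1 removed
print(np.zeros((2, 3, 4)).sum(axis=1).shape)   # (2, 4)

row_sums = t.sum(axis=1, keepdims=True)        # shape (2, 1) instead of (2,)
shares = t / row_sums                          # broadcasting divides each row by its own sum
print(shares.sum(axis=1))                      # each row now sums to 1
```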

Given a matrix representing student scores (rows=students, cols=subjects), calculate:
1. `total_all` - Sum of all scores
2. `mean_per_subject` - Average score per subject (mean of each column)
3. `mean_per_student` - Average score per student (mean of each row)
4. `max_per_subject` - Highest score in each subject
5. `min_per_student` - Lowest score for each student

In [None]:
# Exercise 5.1: Aggregation with Axis
# Rows = students, Columns = subjects (Math, Science, English, History)
scores = np.array([
    [85, 90, 78, 92],  # Student 0
    [76, 88, 95, 70],  # Student 1
    [90, 85, 82, 88],  # Student 2
    [65, 70, 75, 80],  # Student 3
])
subjects = ['Math', 'Science', 'English', 'History']

print("Scores (rows=students, cols=subjects):")
print(scores)

# YOUR CODE HERE
total_all = None          # np.sum(scores)
mean_per_subject = None   # np.mean(scores, axis=0)
mean_per_student = None   # np.mean(scores, axis=1)
max_per_subject = None    # np.max(scores, axis=0)
min_per_student = None    # np.min(scores, axis=1)

print(f"\nTotal of all scores: {total_all}")
print(f"\nMean per subject: {mean_per_subject}")
for subj, score in zip(subjects, mean_per_subject):
    print(f"  {subj}: {score:.1f}")
print(f"\nMean per student: {mean_per_student}")
print(f"Max per subject: {max_per_subject}")
print(f"Min per student: {min_per_student}")

### Exercise 5.2: Finding Indices

Sometimes you need to know WHERE the min/max values are, not just what they are.

1. `best_student_per_subject` - Index of highest-scoring student for each subject
2. `best_subject_per_student` - Index of best subject for each student
3. `overall_best` - (student_idx, subject_idx) of highest score overall
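`np.argmax()` without an `axis` returns an index into the *flattened* array; `np.unravel_index()` converts it back to a (row, column) pair. When the maximum appears more than once, the first occurrence wins. A sketch on a different matrix:

```python
import numpy as np

g = np.array([[3, 9, 2],
              [9, 1, 4]])

flat_idx = np.argmax(g)                     # index into the flattened array [3, 9, 2, 9, 1, 4]
print(flat_idx)                             # 1: the maximum 9 appears twice; the first occurrence wins

r, c = np.unravel_index(flat_idx, g.shape)  # convert back to (row, column)
print(r, c, g[r, c])                        # 0 1 9

print(np.argmax(g, axis=0))                 # [1 0 1]: row index of each column's maximum
print(np.argmin(g, axis=1))                 # [2 1]: column index of each row's minimum
```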

In [None]:
# Exercise 5.2: Finding Indices
print("Scores:")
print(scores)

# YOUR CODE HERE
best_student_per_subject = None  # np.argmax(scores, axis=0)
best_subject_per_student = None  # np.argmax(scores, axis=1)
# For overall: use np.unravel_index(np.argmax(scores), scores.shape)
overall_best = None

print(f"\nBest student per subject: {best_student_per_subject}")
for subj, student_idx in zip(subjects, best_student_per_subject):
    print(f"  {subj}: Student {student_idx} (score: {scores[student_idx, subjects.index(subj)]})")

print(f"\nBest subject per student: {best_subject_per_student}")
for i, subj_idx in enumerate(best_subject_per_student):
    print(f"  Student {i}: {subjects[subj_idx]} (score: {scores[i, subj_idx]})")

print(f"\nOverall best: Student {overall_best[0]}, {subjects[overall_best[1]]} = {scores[overall_best]}")

### Visualization: Score Analysis

In [None]:
# Visualize scores
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Bar chart of average per subject
x = np.arange(len(subjects))
axes[0].bar(x, scores.mean(axis=0), color='steelblue', edgecolor='black')
axes[0].set_xticks(x)
axes[0].set_xticklabels(subjects)
axes[0].set_ylabel('Average Score')
axes[0].set_title('Average Score per Subject')
axes[0].set_ylim(0, 100)

# Heatmap of all scores
im = axes[1].imshow(scores, cmap='YlOrRd', aspect='auto')
axes[1].set_xticks(np.arange(len(subjects)))
axes[1].set_xticklabels(subjects)
axes[1].set_yticks(np.arange(4))
axes[1].set_yticklabels([f'Student {i}' for i in range(4)])
axes[1].set_title('Score Heatmap')
plt.colorbar(im, ax=axes[1], label='Score')

# Add score values to heatmap
for i in range(4):
    for j in range(4):
        axes[1].text(j, i, scores[i, j], ha='center', va='center', color='black')

plt.tight_layout()
plt.show()

---

## Section 6: Linear Algebra Operations

Linear algebra is the foundation of machine learning. Matrix operations are used everywhere!

### Exercise 6.1: Matrix Multiplication

There are multiple ways to multiply matrices:
- `*` - Element-wise multiplication
- `@` or `np.matmul()` - Matrix multiplication
- `np.dot()` - Same result as `@` for 1-D and 2-D arrays, but it behaves differently for stacks of matrices (3-D and higher)

Given matrices A (2x3) and B (3x2), compute:
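1. `elem_wise` - Element-wise multiplication of A with A (the shapes must be equal or broadcast-compatible; A (2x3) and B (3x2) are not)
2. `mat_mul` - Matrix multiplication A @ B (inner dimensions must match: (2x3) @ (3x2) → (2x2))
3. `dot_product` - Dot product of the vectors `v1` and `v2`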
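The shape rule for `@` is (m, n) @ (n, p) → (m, p), and each entry of the result is a row-times-column dot product. The sketch below uses different matrices than the exercise and also shows where `@` and `np.dot()` part ways, namely on stacks of matrices:

```python
import numpy as np

P = np.arange(6).reshape(2, 3)          # shape (2, 3)
Q = np.arange(12).reshape(3, 4)         # shape (3, 4)

PQ = P @ Q                              # (2, 3) @ (3, 4) -> (2, 4); inner sizes (3) must match
print(PQ.shape)                         # (2, 4)

# Entry (i, j) is the dot product of row i of P with column j of Q
i, j = 1, 2
print(PQ[i, j] == P[i, :] @ Q[:, j])    # True

try:
    Q @ P                               # (3, 4) @ (2, 3): inner sizes 4 and 2 differ
except ValueError as err:
    print("ValueError:", err)

# For stacks of matrices, @ multiplies matching pairs, while np.dot pairs every matrix with every other
stack_a = np.ones((5, 2, 3))
stack_b = np.ones((5, 3, 4))
print((stack_a @ stack_b).shape)        # (5, 2, 4)
print(np.dot(stack_a, stack_b).shape)   # (5, 2, 5, 4)
```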

In [None]:
# Exercise 6.1: Matrix Multiplication
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[1, 2], [3, 4], [5, 6]])
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])

print("A (2x3):")
print(A)
print("\nB (3x2):")
print(B)

# YOUR CODE HERE
elem_wise = None    # A * A
mat_mul = None      # A @ B or np.dot(A, B)
dot_product = None  # v1 @ v2 or np.dot(v1, v2)

print("\nElement-wise A*A:")
print(elem_wise)
print(f"\nMatrix multiplication A @ B (shape: {mat_mul.shape}):")
print(mat_mul)
print(f"\nDot product v1 · v2: {dot_product}")

### Exercise 6.2: Transpose and Inverse

1. `transposed` - Transpose of A (from Exercise 6.1)
2. `inverse` - Inverse of the square matrix M
3. `pseudo_inverse` - Pseudo-inverse of A (works for non-square matrices)
4. `identity_check` - Verify that M @ M_inv = I
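Two practical notes, sketched on small matrices chosen for illustration: to solve a linear system `R x = rhs`, `np.linalg.solve()` avoids forming the inverse and is generally more accurate; and `np.linalg.pinv()` agrees with `np.linalg.inv()` on an invertible square matrix while still working for non-square ones.

```python
import numpy as np

R = np.array([[2.0, 1.0],
              [1.0, 3.0]])
rhs = np.array([3.0, 5.0])

x_inv = np.linalg.inv(R) @ rhs          # works, but forms the inverse explicitly
x_solve = np.linalg.solve(R, rhs)       # solves R x = rhs directly
print(x_solve)                          # approximately [0.8 1.4]
print(np.allclose(x_inv, x_solve))      # True

# For an invertible square matrix, the pseudo-inverse equals the inverse
print(np.allclose(np.linalg.pinv(R), np.linalg.inv(R)))   # True

# For a non-square matrix with full row rank, pinv acts as a right inverse
W = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])         # shape (2, 3), rank 2
print(np.allclose(W @ np.linalg.pinv(W), np.eye(2)))      # True
```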

In [None]:
# Exercise 6.2: Transpose and Inverse
M = np.array([[4, 7], [2, 6]])

print("Matrix M:")
print(M)

# YOUR CODE HERE
transposed = None       # A.T or np.transpose(A)
inverse = None          # np.linalg.inv(M)
pseudo_inverse = None   # np.linalg.pinv(A)  # Works for non-square
identity_check = None   # M @ inverse

print(f"\nA transposed (shape {A.shape} → {transposed.shape}):")
print(transposed)
print("\nM inverse:")
print(inverse)
print("\nM @ M_inv (should be identity):")
print(identity_check.round(10))  # Round to handle floating point errors

### Exercise 6.3: Eigenvalues and Determinant

These operations are used in PCA, covariance analysis, and more.

1. `determinant` - Determinant of M
2. `eigenvalues, eigenvectors` - Eigendecomposition of M
3. `norm` - Frobenius norm of M
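Each of these results can be checked against its definition. A sketch on a symmetric matrix chosen so the answers are easy to verify by hand (eigenvalues 1 and 3, determinant 3, Frobenius norm √10). For symmetric matrices, `np.linalg.eigh()` is the specialized routine and returns eigenvalues in ascending order; `np.linalg.eig()` makes no ordering guarantee.

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

vals, vecs = np.linalg.eig(S)            # eigenvectors are the COLUMNS of vecs
for k in range(len(vals)):
    v = vecs[:, k]
    print(np.allclose(S @ v, vals[k] * v))               # True: S v = lambda v

print(np.isclose(np.prod(vals), np.linalg.det(S)))       # True: det = product of eigenvalues
print(np.isclose(np.linalg.norm(S), np.sqrt(10)))        # True: sqrt(2**2 + 1**2 + 1**2 + 2**2)

vals_sym, vecs_sym = np.linalg.eigh(S)   # symmetric-matrix routine, ascending order
print(vals_sym)                          # approximately [1. 3.]
```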

In [None]:
# Exercise 6.3: Eigenvalues and Determinant
M = np.array([[4, 2], [1, 3]])
print("Matrix M:")
print(M)

# YOUR CODE HERE
determinant = None       # np.linalg.det(M)
eigenvalues = None       # np.linalg.eig(M)[0]
eigenvectors = None      # np.linalg.eig(M)[1]
norm = None              # np.linalg.norm(M) or np.linalg.norm(M, 'fro')

print(f"\nDeterminant: {determinant}")
print(f"Eigenvalues: {eigenvalues}")
print("Eigenvectors (columns):")
print(eigenvectors)
print(f"\nFrobenius norm: {norm:.4f}")

### Visualization: Linear Transformation

In [None]:
# Visualize a linear transformation
# Start with a unit square and transform it
square = np.array([[0, 1, 1, 0, 0], 
                   [0, 0, 1, 1, 0]])  # Unit square vertices

# Transformation matrix (rotation + scaling)
theta = np.pi / 6  # 30 degrees
scale = 1.5
transform = scale * np.array([[np.cos(theta), -np.sin(theta)],
                               [np.sin(theta), np.cos(theta)]])

transformed = transform @ square

fig, ax = plt.subplots(figsize=(8, 8))
ax.plot(square[0], square[1], 'b-', linewidth=2, label='Original')
ax.plot(transformed[0], transformed[1], 'r-', linewidth=2, label='Transformed')
ax.scatter([0], [0], color='black', s=100, zorder=5)

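# Draw where the basis vectors (1, 0) and (0, 1) land: the columns of `transform`
for i in range(2):
    col = transform[:, i]
    ax.arrow(0, 0, col[0], col[1], head_width=0.1, head_length=0.05,
             fc='green', ec='green', linewidth=2)

# A 30° rotation has complex eigenvalues, so there are no real eigenvectors to draw;
# their magnitude equals the scale factor
eigenvalues = np.linalg.eigvals(transform)
print(f"Eigenvalues: {eigenvalues}, magnitudes: {np.abs(eigenvalues)}")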

ax.set_xlim(-2, 3)
ax.set_ylim(-1, 3)
ax.set_aspect('equal')
ax.axhline(y=0, color='gray', linestyle='--', alpha=0.5)
ax.axvline(x=0, color='gray', linestyle='--', alpha=0.5)
ax.legend()
ax.set_title(f'Linear Transformation: {scale}x scale + {np.degrees(theta):.0f}° rotation')
ax.grid(True, alpha=0.3)
plt.show()

---

## Challenge: Implement Linear Regression with NumPy

Use what you've learned to implement linear regression using the normal equation:

$$\theta = (X^T X)^{-1} X^T y$$

This combines matrix multiplication, transpose, and inverse!
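Explicitly inverting $X^T X$ works for this small problem, but it can be numerically fragile when features are nearly collinear. Two common alternatives are sketched below on a tiny noise-free dataset, so the answer is known exactly (intercept 1, slope 2): solve the normal equations with `np.linalg.solve()`, or let `np.linalg.lstsq()` solve the least-squares problem directly. The names are illustrative and chosen so they do not overwrite the challenge's variables:

```python
import numpy as np

x_demo = np.array([0.0, 1.0, 2.0, 3.0])
y_demo = 1.0 + 2.0 * x_demo                                # exact line: intercept 1, slope 2
X_demo = np.column_stack([np.ones_like(x_demo), x_demo])   # bias column + feature, shape (4, 2)

# Normal equation (X^T X) theta = X^T y, solved without forming an inverse
theta_solve = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)

# Least squares directly on X; lstsq returns (solution, residuals, rank, singular values)
theta_lstsq, residuals, rank, sv = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print(theta_solve, theta_lstsq)   # both approximately [1. 2.]
```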

In [None]:
# Challenge: Linear Regression with NumPy
np.random.seed(42)

# Generate synthetic data: y = 3x + 2 + noise
X = np.random.rand(100, 1) * 10
y = 3 * X + 2 + np.random.randn(100, 1) * 2

# Add bias column (ones) to X
X_b = np.c_[np.ones((100, 1)), X]  # Shape: (100, 2)

# YOUR CODE HERE: Implement normal equation
# theta = (X^T X)^(-1) X^T y
theta = None  # np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

print(f"Learned parameters: intercept = {theta[0, 0]:.2f}, slope = {theta[1, 0]:.2f}")
print(f"True parameters: intercept = 2, slope = 3")

# Visualize
plt.figure(figsize=(10, 6))
plt.scatter(X, y, alpha=0.5, label='Data')
X_line = np.array([[0], [10]])
X_line_b = np.c_[np.ones((2, 1)), X_line]
y_pred = X_line_b @ theta
plt.plot(X_line, y_pred, 'r-', linewidth=2, label=f'Fit: y = {theta[1,0]:.2f}x + {theta[0,0]:.2f}')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Linear Regression with NumPy')
plt.legend()
plt.show()

---

## Lab Summary

Congratulations! You've practiced the essential NumPy operations for machine learning:

| Topic | Key Functions |
|-------|---------------|
| **Array Creation** | `np.array()`, `np.zeros()`, `np.ones()`, `np.eye()`, `np.arange()`, `np.linspace()` |
| **Random** | `np.random.rand()`, `np.random.randn()`, `np.random.randint()`, `np.random.choice()` |
| **Broadcasting** | Scalar, row, and column broadcasting with compatible shapes |
| **Reshaping** | `reshape()`, `flatten()`, `.T`, `np.vstack()`, `np.hstack()`, `np.concatenate()`, `np.split()` |
| **Boolean Indexing** | `arr[condition]`, `&`, `\|`, `~`, `np.where()`, `np.all()`, `np.any()` |
| **Aggregation** | `sum()`, `mean()`, `std()`, `min()`, `max()` with `axis` parameter |
| **Finding Indices** | `np.argmax()`, `np.argmin()`, `np.argsort()`, `np.unravel_index()` |
| **Linear Algebra** | `@`, `np.dot()`, `.T`, `np.linalg.inv()`, `np.linalg.pinv()`, `np.linalg.det()`, `np.linalg.eig()`, `np.linalg.norm()` |

### Next Steps

- **Lab 1B:** Pandas for data manipulation
- **Lab 2:** Machine Learning with PyTorch

---

*Remember to save your work! (Ctrl+S)*