Setup

In [None]:
import numpy as np
import time

Section 1 — ndarray Fundamentals

Task 1.1: Array Creation & Shapes

In [2]:
# TODO:
# Create a 1D array with values 0 to 99 (no loops)

# HINT:
# - Use np.arange

arr_1d = np.arange(100)
# print(arr_1d)


In [3]:
# TODO:
# Reshape arr_1d into a (10, 10) array

# HINT:
# - reshape returns a view (no copy) whenever the memory layout allows it

arr_2d = arr_1d.reshape(10, 10)
# print(arr_2d)


In [4]:
# TODO:
# Create a 3D array of shape (4, 5, 3)

# HINT:
# - Total elements must match
arr_3d = np.arange(4*5*3).reshape(4,5,3)
# print(arr_3d)
arr_3d.shape


(4, 5, 3)

**Explain:**
- What does `.shape` represent?
  - In NumPy, .shape tells you the dimensions of an array — that is, how many elements it has along each axis.
- Why does contiguous memory matter?
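  - Elements sit next to each other in memory, so the CPU can stream them sequentially through its cache, vectorized (SIMD) loops can process several at once, and operations such as `reshape` can return a view instead of copying.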
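A minimal sketch of what contiguity buys in practice, reusing the same `arange`/`reshape` construction as above. Reshaping a C-contiguous array returns a view, while flattening its transpose (which is column-major in memory) forces NumPy to copy. `np.shares_memory` and the `flags` attribute are standard NumPy; the variable names `t` and `flat` are illustrative.

```python
import numpy as np

arr_1d = np.arange(100)
arr_2d = arr_1d.reshape(10, 10)

# reshape of a C-contiguous array is a view: same buffer, no copy
print(arr_2d.flags["C_CONTIGUOUS"])        # True
print(np.shares_memory(arr_1d, arr_2d))    # True

# The transpose is also a view, but its memory order is column-major
t = arr_2d.T
print(t.flags["C_CONTIGUOUS"])             # False

# Flattening it in C order cannot be expressed with strides alone, so NumPy copies
flat = t.reshape(100)
print(np.shares_memory(flat, arr_2d))      # False
```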


Section 1.2 — dtype & Memory

In [5]:
# TODO:
# Create two arrays with same values but different dtypes

arr = np.arange(1000)
# arr_int = np.array([1, 2, 3, 4], dtype=np.int32)
# arr_float = np.array([1, 2, 3, 4], dtype=np.float32)
arr_int = arr.astype(np.int64)
arr_float = arr.astype(np.float64)
print(arr_int.dtype, arr_float.dtype)


int64 float64


In [6]:
# TODO:
# Compare memory usage

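# HINT:
# - Use .nbytes
# Both int64 and float64 use 8 bytes per element, so each array is 1000 * 8 = 8000 bytes;
# a 4-byte dtype such as int32 or float32 would halve that.
print(arr_int.nbytes, arr_float.nbytes)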


8000 8000


**Interview Question:**  
Why does dtype selection matter in large ML pipelines?
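  - In large ML pipelines dtype matters for three reasons:
    - Memory: smaller dtypes use less RAM (float32 needs half the bytes of float64), so larger datasets, batches, or models fit in memory.
    - Speed: fewer bytes have to move through memory, and GPUs typically offer much higher throughput for float32/float16 than for float64.
    - Accuracy: low-precision types have limited range and precision (float16 tops out at 65504), which can cause overflow, underflow, or rounding errors that hurt training.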
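The memory and accuracy points can be checked directly. This is a small sketch, assuming the same 1000-element array as in the cell above: `nbytes` halves with each step down in float width, and float16 shows both its range limit (overflow to `inf`) and its coarse precision (integers above 2048 are no longer all representable).

```python
import numpy as np

arr = np.arange(1000)

# Same 1000 values, three float widths: nbytes scales with itemsize
sizes = {np.dtype(dt).name: arr.astype(dt).nbytes
         for dt in (np.float64, np.float32, np.float16)}
print(sizes)  # {'float64': 8000, 'float32': 4000, 'float16': 2000}

# float16 has a small range: values above its max overflow to inf
print(np.finfo(np.float16).max)
with np.errstate(over="ignore"):  # silence the overflow warning for the demo
    big = np.array([70000.0]).astype(np.float16)
print(big)  # [inf]

# ...and coarse precision: 2049 rounds to the nearest representable value, 2048.0
print(np.float16(2049))
```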


Section 2 — Indexing, Views & Copies

Task 2.1: Views vs Copies

In [7]:
# TODO:
# Create a 2D array and slice every alternate row

# HINT:
# - Use slicing, not fancy indexing

A = np.arange(16).reshape(4, 4)
A_slice = A[::2, :]
print(A)
print(A_slice)



[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
[[ 0  1  2  3]
 [ 8  9 10 11]]


In [8]:
# TODO:
# Modify A_slice and observe A
A = np.arange(16).reshape(4, 4)
# A_slice = A[::2, :].copy()
A_slice = A[::2, :]
A_slice[:] = -1

A_slice2 = A[::2, :].copy()
A_slice2[:] = -1

print(A)
print(A_slice2)



[[-1 -1 -1 -1]
 [ 4  5  6  7]
 [-1 -1 -1 -1]
 [12 13 14 15]]
[[-1 -1 -1 -1]
 [-1 -1 -1 -1]]


Explain:
- Why did the original array change (or not)?
  - The original array changed because A_slice is a view created by basic slicing: it shares memory with A, so writing to A_slice writes to the sliced rows of A.
  - A_slice2, created with .copy(), is an independent copy of the data, not a view:
    - A[::2, :].copy() allocates new memory and duplicates those rows.
    - Modifying A_slice2 therefore leaves A unchanged, because the two arrays no longer share memory.
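The hint "use slicing, not fancy indexing" matters because only basic slicing produces a view. A short sketch contrasting the three common ways of selecting rows; `np.shares_memory` reports whether the result still points into `A`'s buffer.

```python
import numpy as np

A = np.arange(16).reshape(4, 4)

basic = A[::2, :]        # basic slicing -> view
fancy = A[[0, 2], :]     # integer-array ("fancy") indexing -> copy
masked = A[A % 2 == 0]   # boolean masking -> copy

print(np.shares_memory(A, basic))   # True
print(np.shares_memory(A, fancy))   # False
print(np.shares_memory(A, masked))  # False

fancy[:] = -1            # writes go to the copy only
print(A[0, 0])           # 0
```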


Section 2.2 — Boolean Masking

In [9]:
# TODO:
# Create random array of size 1000

# X = np.random.rand(1000)
# X = np.random.randint(0, 500, size=1000)
rng = np.random.default_rng(0)
X = rng.standard_normal(1000)
# print(X)


In [10]:
# TODO:
# Extract values greater than mean

# HINT:
# - Mean first
# - Boolean mask
m = X.mean()
val_grt_m = X[X>m]
# print(val_grt_m)




In [11]:
# TODO:
# Replace negative values with 0 (no loops)
X_copy = X.copy()
X_copy[X_copy < 0] = 0

print("mean: ", m)
print("values greater than mean : ", val_grt_m.size)
print("negative values after replacement : ", (X_copy < 0).sum())

mean:  -0.04802827676298692
values greater than mean :  488
negative values after replacement :  0
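Masked assignment is one loop-free option; a sketch of three equivalent one-liners that return a new array instead of modifying a copy in place. `np.maximum(X, 0)` is exactly the ReLU function, which is why this pattern shows up so often in ML code.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(1000)

# Three loop-free ways to clamp negatives to 0; each returns a new array
a = np.where(X < 0, 0.0, X)   # select elementwise by a boolean mask
b = np.maximum(X, 0.0)        # elementwise max (this is ReLU)
c = np.clip(X, 0.0, None)     # clip with no upper bound

print(np.array_equal(a, b), np.array_equal(b, c))  # True True

# A boolean mask also doubles as a counter: True sums as 1
print((X > X.mean()).sum(), (X < 0).sum())
```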


Section 3 — Broadcasting

Task 3.1: Broadcasting Rules

In [12]:
# TODO:
# Create A (1000, 50) and b (50,)

# A = np.arange(1000 * 50).reshape(1000, 50)
# b = np.arange(50,)

rng = np.random.default_rng(1)
A = rng.standard_normal((1000, 50))
b = rng.standard_normal((50,))


In [13]:
# TODO:
# Add b to each row of A

# HINT:
# - No reshape required
Ab = A + b
# print(Ab)

In [14]:
# TODO:
# Normalize each row of A

# HINT:
# - Axis matters
# - Keep dimensions in mind
# A_norm = A / A.sum(axis=1, keepdims=True)
row_mean = A.mean(axis=1, keepdims=True)
row_std = A.std(axis=1, keepdims=True)
A_norm = (A - row_mean) / row_std
# print(A_norm)



Explain broadcasting step-by-step.
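NumPy compares shapes starting from the trailing (rightmost) dimension. Two dimensions are compatible if they are equal or one of them is 1, and a missing leading dimension is treated as 1. For `A + b`: (1000, 50) vs (50,) -> b is treated as (1, 50) -> its size-1 axis is stretched to 1000 -> result (1000, 50). For `(A - row_mean) / row_std`, `keepdims=True` keeps the row statistics as (1000, 1), which stretch across the 50 columns. The sketch below checks those shapes with `np.broadcast_shapes` (available in NumPy 1.20 and later) and shows that the stretching is virtual: `np.broadcast_to` returns a read-only view with stride 0 along the broadcast axis.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((1000, 50))
b = rng.standard_normal(50)

# Align trailing dims, pad missing leading dims with 1, stretch size-1 dims
print(np.broadcast_shapes(A.shape, b.shape))          # (1000, 50)

row_mean = A.mean(axis=1, keepdims=True)               # (1000, 1)
print(np.broadcast_shapes(A.shape, row_mean.shape))   # (1000, 50)

# Without keepdims the row means are (1000,), which lines up against the 50 columns and fails
try:
    np.broadcast_shapes(A.shape, A.mean(axis=1).shape)
except ValueError as e:
    print("incompatible:", e)

# Stretching is virtual: stride 0 along the broadcast axis, no data copied
b_rows = np.broadcast_to(b, A.shape)
print(b_rows.strides)                                  # (0, 8)
print(np.array_equal(A + b, A + b_rows))               # True
```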


Section 3.2 — Broadcasting Trap

In [15]:
# TODO:
# Intentionally trigger a broadcasting error
# Then fix it
col = np.arange(1000)
try:
  _ = A - col
except ValueError as e:
  print("Expected broadcasting error: ", e)

col2 = col.reshape(-1, 1)
A_modified = A - col2
A_modified.shape





Expected broadcasting error:  operands could not be broadcast together with shapes (1000,50) (1000,) 


(1000, 50)

What was wrong with the original shapes?
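  - Broadcasting compares shapes from the trailing dimension: A is (1000, 50) and col is (1000,), so NumPy tried to align 50 with 1000. They are neither equal nor 1, so the operation failed.
  - Reshaping col from (1000,) to (1000, 1) makes the trailing dimensions 50 and 1 (compatible) and the leading dimensions 1000 and 1000 (equal), so each row of A now has its own value of col subtracted.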
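The error above is the helpful case. The more dangerous trap is when the shapes happen to be compatible and broadcasting silently produces a matrix. A sketch with made-up `y_true` / `y_pred` values, assuming a model that returns predictions as an (n, 1) column vector:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = y_true.reshape(-1, 1) + 0.5   # hypothetical model output as a column vector

# (4,) - (4, 1) does NOT error: it silently broadcasts to (4, 4)
diff = y_true - y_pred
print(diff.shape)                      # (4, 4)
print(np.mean(diff ** 2))              # not the mean squared error we wanted

# Fix: make the shapes agree before subtracting
diff_ok = y_true - y_pred.ravel()
print(diff_ok.shape)                   # (4,)
print(np.mean(diff_ok ** 2))           # 0.25
```

Checking `.shape` after any subtraction between a 1D and a 2D array is a cheap guard against this class of bug.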


Section 4 — Vectorization vs Loops

Task 4.1: Loop vs Vectorized

In [16]:
# TODO:
# Create large array X of size 1,000,000

rng = np.random.default_rng(2)
X = rng.standard_normal(1_000_000)



In [17]:
# TODO:
# Normalize using Python loop

# HINT:
# - Time it

X_mean = X.mean()
sigma = X.std()
t0 = time.perf_counter()   # monotonic, high-resolution clock for measuring intervals
out_loop = np.empty_like(X)
for i in range(X.shape[0]):
  out_loop[i] = (X[i] - X_mean) / sigma
total_time = time.perf_counter() - t0
print(round(total_time, 4))



0.7518


In [18]:
# TODO:
# Normalize using vectorization
t1 = time.perf_counter()
out_vec = (X - X_mean) / sigma
vec_time = time.perf_counter() - t1
print('speedup : ', round(total_time / max(vec_time, 1e-12), 1))
print('allclose : ', np.allclose(out_loop, out_vec))



speedup :  71.2
allclose :  True


Why is vectorization faster?
- Runs computations in optimized C code instead of Python loops
- Avoids Python interpreter overhead per iteration
- Uses CPU optimizations (SIMD, cache efficiency)
- Processes many elements in one operation
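A single wall-clock measurement is noisy. A sketch of a slightly fairer comparison using the standard-library `timeit.repeat` and taking the best of several runs; the array is shrunk to 100,000 elements (an assumption, purely to keep the Python loop quick), so the exact speedup will differ from the number above.

```python
import timeit
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal(100_000)
mu, sigma = X.mean(), X.std()

def loop_norm():
    out = np.empty_like(X)
    for i in range(X.shape[0]):          # one interpreted iteration per element
        out[i] = (X[i] - mu) / sigma
    return out

def vec_norm():
    return (X - mu) / sigma              # whole array in compiled code

# Best of several repeats reduces noise from other processes and warm-up effects
t_loop = min(timeit.repeat(loop_norm, number=1, repeat=3))
t_vec = min(timeit.repeat(vec_norm, number=1, repeat=3))
print(f"loop {t_loop:.4f}s  vectorized {t_vec:.6f}s  speedup ~{t_loop / t_vec:.0f}x")
```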


Task 4.2: Pairwise Distance (FAANG Classic)

In [19]:
# TODO:
# Compute pairwise Euclidean distance matrix without loops

# HINT:
# - Use (x - y)^2 expansion
# - Broadcasting is key

def pairwise_distance(X):
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 * x_i . x_j
    sq_dist = np.sum(X * X, axis=1, keepdims=True)       # (n, 1) squared norms
    total_dist = sq_dist + sq_dist.T - 2 * (X @ X.T)    # (n, n) via broadcasting
    total_dist = np.maximum(total_dist, 0.0)            # clamp tiny negatives from rounding
    return np.sqrt(total_dist)

rng = np.random.default_rng(3)
Y = rng.standard_normal((200, 10))
Distance = pairwise_distance(Y)
print(Distance.shape)
# print(Distance)
print('diag ~ 0 : ', np.allclose(np.diag(Distance), 0, atol = 1e-7))
print('symmetric : ', np.allclose(Distance, Distance.T, atol = 1e-7))


(200, 200)
diag ~ 0 :  True
symmetric :  True
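A sketch of how to cross-check the expansion trick against the direct broadcasting formula. The direct version builds an (n, n, d) intermediate, so it uses O(n^2 d) memory, which is why the expansion is preferred at scale. The tolerance is loosened to `atol=1e-6` because the expansion can leave diagonal entries around 1e-7 (the square root of rounding error) instead of exactly 0; `np.fill_diagonal(D, 0)` is one way to force them to zero if that matters.

```python
import numpy as np

def pairwise_distance(X):
    # Expansion: ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i . x_j
    sq = np.sum(X * X, axis=1, keepdims=True)           # (n, 1)
    d2 = sq + sq.T - 2.0 * (X @ X.T)                    # (n, n)
    return np.sqrt(np.maximum(d2, 0.0))                 # clamp tiny negatives

def pairwise_distance_naive(X):
    # Direct broadcasting: (n, 1, d) - (1, n, d) -> (n, n, d)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

rng = np.random.default_rng(3)
Y = rng.standard_normal((200, 10))
print(np.allclose(pairwise_distance(Y), pairwise_distance_naive(Y), atol=1e-6))  # True
```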


Section 5 — Numerical Stability

Task 5.1: Softmax

In [20]:
# TODO:
# Implement naive softmax

def softmax_naive(X):
  exp_val = np.exp(X)
  return exp_val/exp_val.sum(axis=1, keepdims=True)


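X = np.array([[1000.0, 1001.0, 1002.0]])
try:
  # By default NumPy only warns on overflow (exp(1000) -> inf, then inf/inf -> nan);
  # errstate(over='raise') turns the overflow into a FloatingPointError we can catch.
  with np.errstate(over='raise'):
    softmax_naive(X)
except FloatingPointError as e:
  print('overflow : ', e)


overflow :  overflow encountered in exp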


In [21]:
# TODO:
# Fix numerical instability

# HINT:
# - Subtract max per row

def softmax_stable(X):
  new_X = X - X.max(axis=1, keepdims=True)
  # print(new_X)
  exp_val2 = np.exp(new_X)
  return exp_val2/exp_val2.sum(axis = 1, keepdims=True)

print('Stable : ', softmax_stable(X))
print('Row sums : ', softmax_stable(X).sum(axis = 1))

Stable :  [[0.09003057 0.24472847 0.66524096]]
Row sums :  [1.]


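Why does subtracting max work?
- Softmax is shift-invariant: subtracting the same constant c from every entry of a row multiplies each exp term and their sum by the same factor e^(-c), which cancels in the ratio, so the output is unchanged.
- Choosing c = the row-wise max makes the largest exponent exp(0) = 1 and every other term at most 1, so exp can no longer overflow and the denominator is at least 1.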
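Written out, softmax(x - c)_i = exp(x_i - c) / sum_j exp(x_j - c) = e^(-c) exp(x_i) / (e^(-c) sum_j exp(x_j)) = softmax(x)_i. A sketch that checks this numerically and applies the same max-shift to log-sum-exp, which appears in cross-entropy losses. The `logsumexp` helper here is a hand-written illustration, not a NumPy function.

```python
import numpy as np

def softmax_stable(X):
    Z = X - X.max(axis=1, keepdims=True)   # largest entry per row becomes 0
    E = np.exp(Z)                          # every term in (0, 1], so no overflow
    return E / E.sum(axis=1, keepdims=True)

def logsumexp(X):
    # Same trick: log(sum(exp(x))) = m + log(sum(exp(x - m))) with m = max(x)
    m = X.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(X - m).sum(axis=1, keepdims=True))).ravel()

x = np.array([[1.0, 2.0, 3.0]])
print(np.allclose(softmax_stable(x), softmax_stable(x + 1000.0)))  # True: shift-invariant
print(logsumexp(np.array([[1000.0, 1000.0]])))  # approximately 1000 + log(2)
```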


Section 6 — Linear Algebra

Task 6.1: Matrix Multiplication

In [22]:
# TODO:
# Try valid and invalid matrix multiplications
A = np.random.default_rng(3).standard_normal((2,4))
B = np.random.default_rng(4).standard_normal((4, 5))
matrix_result = A @ B
print(matrix_result)
print(matrix_result.shape)

try:
  _ = A @ A
except ValueError as e:
  print('Invalid matrix multiplication : ', e)
print('A @ B == np.dot    =>', np.allclose(np.dot(A, B), matrix_result))
print('A @ B == np.matmul =>', np.allclose(np.matmul(A, B), matrix_result))




[[-2.49757386  2.98319337  2.52249303  5.85602501 -4.09211309]
 [-0.7017924  -2.52493983 -1.68029009 -0.90642378  3.91117743]]
(2, 5)
Invalid matrix multiplication :  matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 4)
A @ B == np.dot    => True
A @ B == np.matmul => True


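Explain the difference between dot, @, and matmul.

  - @ and np.matmul(): the same operation (a @ b == np.matmul(a, b)). Arrays with more than 2 dimensions are treated as stacks of matrices, and the leading (batch) dimensions broadcast. Scalars are not allowed.
  - np.dot(): an older, more general rule:
    - 1D · 1D -> scalar inner product (same result as @)
    - 2D · 2D -> matrix multiplication (same as @)
    - N-D with N > 2 -> sum-product over the last axis of a and the second-to-last axis of b, giving shape a.shape[:-1] + b.shape[:-2] + b.shape[-1:], which differs from matmul's batched result
    - Scalars are allowed (equivalent to elementwise multiply)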
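The N-D difference is easiest to see with a batch of matrices. A sketch with random (2, 3, 4) and (2, 4, 5) stacks: `@` multiplies matching pairs, while `np.dot` multiplies every matrix in `a` with every matrix in `b`, so the batched result sits on the "diagonal" of `dot`'s output.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((2, 3, 4))   # a batch of two 3x4 matrices
b = rng.standard_normal((2, 4, 5))   # a batch of two 4x5 matrices

# matmul / @: leading axes are a batch, multiply matching pairs
print((a @ b).shape)                 # (2, 3, 5)

# dot: sum over last axis of a and second-to-last of b, for EVERY pair of batch entries
d = np.dot(a, b)
print(d.shape)                       # (2, 3, 2, 5)
print(all(np.allclose(d[i, :, i, :], (a @ b)[i]) for i in range(2)))  # True

# Scalars: dot allows them, matmul does not
print(np.dot(3.0, 2.0))              # 6.0
try:
    np.matmul(3.0, 2.0)
except ValueError as e:
    print("matmul rejects scalars:", e)
```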


Task 6.2: Solving Linear Systems

In [23]:
# TODO:
# Solve Ax = b and verify solution
rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5))
b = rng.standard_normal((5, ))
X = np.linalg.solve(A, b)
resid = np.linalg.norm(A @ X - b)
print(X)
print('residual : ',resid)


[-3.36038187  0.99835524  0.56598579  7.10732223  2.42696993]
residual :  1.7702748954631059e-15
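A common alternative is `np.linalg.inv(A) @ b`, but forming the inverse explicitly does extra work and adds rounding error; NumPy's documentation notes that `solve` uses the LAPACK routine `_gesv`, which factorizes A and solves directly. A sketch on the same seeded system, plus a check of the condition number and of solving several right-hand sides at once (the extra matrix `B` is illustrative).

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)

x_solve = np.linalg.solve(A, b)        # factorize once, solve directly
x_inv = np.linalg.inv(A) @ b           # explicit inverse: more work, more rounding

print(np.allclose(x_solve, x_inv))     # agree on a reasonably conditioned system
print(np.linalg.cond(A))               # large values mean errors get amplified

# Several right-hand sides at once: stack them as columns of a (5, k) matrix
B = rng.standard_normal((5, 3))
X = np.linalg.solve(A, B)
print(np.allclose(A @ X, B))           # True
```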


Section 7 — Performance & Memory

Task 7.1: In-Place Operations

In [24]:
# TODO:
# Compare in-place vs out-of-place operations
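X = np.random.default_rng(8).standard_normal(2_000_000)
Y = X.copy()

t0 = time.perf_counter()
Z = X + 1.0            # out-of-place: allocates a new 2,000,000-element array
out_time = time.perf_counter() - t0

t1 = time.perf_counter()
Y += 1.0               # in-place: reuses Y's existing buffer
in_time = time.perf_counter() - t1

print('out-of-place : ', out_time)
print('in-place     : ', in_time)




out-of-place :  0.016837120056152344
in-place     :  0.0040743350982666016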
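The timing gap comes from allocation: `Y += 1.0` writes into the existing buffer, while `Y = Y + 1.0` builds a new array and then rebinds the name. A sketch that makes this visible with an identity check, and shows the ufunc `out=` parameter for reusing one preallocated buffer across several steps.

```python
import numpy as np

X = np.random.default_rng(8).standard_normal(2_000_000)

Y = X.copy()
Y_before = Y
Y += 1.0                       # in-place: same object, same buffer
print(Y is Y_before)           # True

Y = Y + 1.0                    # out-of-place: new array, name rebound
print(Y is Y_before)           # False

# ufuncs accept out= to write into a preallocated buffer
out = np.empty_like(X)
np.add(X, 1.0, out=out)
np.multiply(out, 2.0, out=out)  # chain further steps without new allocations
print(np.allclose(out, (X + 1.0) * 2.0))  # True
```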


Task 7.2: Strides

In [25]:
# TODO:
# Inspect array strides and explain
A = np.arange(24).reshape(6, 4)
B = A[:, ::2]

print('A shape :', A.shape, '-----  A stride : ', A.strides)
print('B shape :', B.shape, '-----  B stride : ', B.strides)



A shape : (6, 4) -----  A stride :  (32, 8)
B shape : (6, 2) -----  B stride :  (32, 16)
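Strides are the number of bytes to step in memory to move one position along each axis. The output above implies 8-byte integers (the default integer dtype on the machine that produced it; a 4-byte default would give (16, 4) instead). Moving down one row of A skips 4 elements x 8 bytes = 32 bytes, and moving across one column skips 8. `B = A[:, ::2]` is a view that takes every second column, so its column stride doubles to 16 while the row stride stays 32: no data is copied, but B is no longer contiguous. A sketch with the dtype fixed to `int64` so the numbers do not depend on the platform:

```python
import numpy as np

A = np.arange(24, dtype=np.int64).reshape(6, 4)   # fixed dtype: platform-independent strides
B = A[:, ::2]                                     # every other column: a strided view

print(A.strides, B.strides)            # (32, 8) (32, 16)
print(np.shares_memory(A, B))          # True: no data copied
print(B.flags["C_CONTIGUOUS"])         # False: row elements are 16 bytes apart

# Transposing just swaps the strides; again no copy
print(A.T.strides)                     # (8, 32)

# ascontiguousarray copies B into a fresh, packed buffer
Bc = np.ascontiguousarray(B)
print(Bc.strides, np.shares_memory(A, Bc))   # (16, 8) False
```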


Section 8 — Mini Case Study

In [26]:
# TODO:
# Given X (10000, 100):
rng = np.random.default_rng(8)
X = rng.standard_normal((10_000, 100))

# - Normalize features
X_mean = X.mean(axis = 0, keepdims=True)
X_std = X.std(axis=0, keepdims=True)
Xn = (X - X_mean)/ X_std
# print(Xn)

# - Compute covariance
Covariance = (Xn.T @ Xn) / (Xn.shape[0] - 1)

# - Extract top-k eigenvectors
eigen_val, eigen_vec = np.linalg.eigh(Covariance)
# print(eigen_vec.shape)
# print(eigen_val.shape)
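k = 10
# eigh returns eigenvalues in ascending order, so the largest k are at the end
top_idx = np.argsort(eigen_val)[-k:][::-1]
top_vecs = eigen_vec[:, top_idx]      # eigenvectors are the columns of eigen_vec
print('Top-k eigenvalue indices : ', top_idx)
print('Top-k eigenvectors shape : ', top_vecs.shape)



Top-k eigenvalue indices :  [99 98 97 96 95 94 93 92 91 90]
Top-k eigenvectors shape :  (100, 10)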


Explain each step and its ML relevance.
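These steps are the core of PCA. Standardizing keeps features with large units from dominating; the covariance of standardized features is (up to the ddof=0 vs n-1 mismatch in the cell above) the correlation matrix; `eigh` is the right routine for a symmetric matrix and returns orthonormal eigenvectors as columns; the top-k eigenvectors are the principal directions, and projecting onto them gives k-dimensional features for downstream models. A sketch of the missing projection step, assuming the same seeded data, with two sanity checks: each projected column's variance equals its eigenvalue, and the SVD of `Xn` yields the same spectrum as s**2 / (n - 1).

```python
import numpy as np

rng = np.random.default_rng(8)
X = rng.standard_normal((10_000, 100))
k = 10

# 1. Standardize each feature (column)
Xn = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance of standardized features
C = (Xn.T @ Xn) / (Xn.shape[0] - 1)

# 3. Eigendecomposition of the symmetric matrix; take the k largest
vals, vecs = np.linalg.eigh(C)
top_idx = np.argsort(vals)[::-1][:k]
W = vecs[:, top_idx]                       # (100, k) principal directions

# 4. Project: each sample becomes a k-dimensional feature vector
Z = Xn @ W                                 # (10000, k)
explained = vals[top_idx].sum() / vals.sum()
print(Z.shape, round(explained, 3))

# Cross-check against SVD: eigenvalues of C equal s**2 / (n - 1)
s = np.linalg.svd(Xn, compute_uv=False)
print(np.allclose(np.sort(s**2 / (Xn.shape[0] - 1)), np.sort(vals)))  # True
```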


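1. Where did NumPy save memory?
  - Broadcasting the (1, 100) mean and std against X without materializing tiled copies; more generally, slicing views (Section 2) and in-place operations (Section 7.1) avoid allocating new arrays.
2. Where did it avoid Python overhead?
  - In the vectorized operations: mean, std, @ (matmul), and np.linalg.eigh all run in compiled code rather than in Python loops.
3. Which operation would break at scale?
  - The covariance and its eigendecomposition: for d features the covariance matrix needs O(d^2) memory, Xn.T @ Xn costs O(n d^2), and np.linalg.eigh costs O(d^3), which becomes infeasible for very large d. Truncated or randomized methods that compute only the top-k components are the usual workaround.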
