# NumPy Interview Questions

**Module 08 | Notebook 04**

---

## Overview
Common NumPy interview questions with detailed answers. Covers:
- Conceptual questions
- Coding challenges
- Performance optimization
- Debugging scenarios

In [None]:
import numpy as np
np.set_printoptions(precision=3)

---
## Part 1: Conceptual Questions

### Q1: What is the difference between a Python list and a NumPy array?

**Answer:**

| Feature | Python List | NumPy Array |
|---------|------------|-------------|
| Data Type | Mixed types allowed | Homogeneous (one dtype) |
| Memory | Array of pointers to separately allocated objects | One contiguous buffer of raw values |
| Speed | Slow for math | Fast (vectorized) |
| Size | Dynamic | Fixed after creation |
| Operations | Math needs explicit loops (`+` concatenates) | Vectorized element-wise ops with broadcasting |
| Functionality | Basic | Rich math/linear algebra |

In [None]:
# Speed comparison
import time

size = 1000000
py_list = list(range(size))
np_array = np.arange(size)

# Sum with Python
start = time.perf_counter()
sum(py_list)
py_time = time.perf_counter() - start

# Sum with NumPy
start = time.perf_counter()
np.sum(np_array)
np_time = time.perf_counter() - start

print(f"Python list: {py_time*1000:.2f}ms")
print(f"NumPy array: {np_time*1000:.4f}ms")
print(f"NumPy is {py_time/np_time:.0f}x faster")
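
A single `perf_counter` measurement is noisy because caches and background load vary from run to run. A slightly more robust sketch uses the standard-library `timeit` module to repeat each measurement and keep the best trial. The trial and call counts below are arbitrary choices, and the exact ratio will depend on your machine and NumPy build.

```python
import timeit

import numpy as np

size = 1_000_000
py_list = list(range(size))
# Explicit int64 avoids overflow on platforms where the default integer is 32-bit
np_array = np.arange(size, dtype=np.int64)

# timeit.repeat runs the callable `number` times per trial, `repeat` trials;
# the fastest trial is the one least disturbed by other system activity
trials = 5
calls = 10
py_best = min(timeit.repeat(lambda: sum(py_list), number=calls, repeat=trials)) / calls
np_best = min(timeit.repeat(lambda: np_array.sum(), number=calls, repeat=trials)) / calls

print(f"Python sum():  {py_best * 1000:.3f} ms per call")
print(f"NumPy .sum():  {np_best * 1000:.3f} ms per call")
print(f"Ratio: {py_best / np_best:.0f}x")
```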

### Q2: Explain views vs copies in NumPy.

**Answer:**

- **View**: Points to same data, no memory copy. Changes affect original.
- **Copy**: New memory allocation. Changes are independent.

**Views created by:**
- Basic slicing: `arr[1:5]`
- Reshape (when the memory layout allows): `arr.reshape(2, 3)`
- Transpose: `arr.T`

**Copies created by:**
- Fancy indexing: `arr[[1,3,5]]`
- Boolean indexing: `arr[arr > 0]`
- `.copy()` method
- `.flatten()` (always copies; `.ravel()` returns a view when possible)

In [None]:
arr = np.arange(10)

# View
view = arr[2:5]
view[0] = 100
print(f"After view modification: {arr}")  # Original changed!

# Copy
arr = np.arange(10)
copy = arr[2:5].copy()
copy[0] = 100
print(f"After copy modification: {arr}")  # Original unchanged
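
Rather than memorizing the view/copy list, you can check it empirically: `np.shares_memory` reports whether two arrays overlap in memory. A small sketch applying it to each operation listed above (the `candidates` names are just labels for this example):

```python
import numpy as np

arr = np.arange(12)
grid = arr.reshape(3, 4)  # C-contiguous view of arr

candidates = {
    "slice arr[2:5]": arr[2:5],
    "arr.reshape(3, 4)": grid,
    "grid.T": grid.T,
    "grid.ravel()": grid.ravel(),
    "grid.flatten()": grid.flatten(),
    "fancy arr[[1, 3, 5]]": arr[[1, 3, 5]],
    "boolean arr[arr > 5]": arr[arr > 5],
}

# A result is a view of arr if it overlaps arr's memory, otherwise a copy
kinds = {name: ("view" if np.shares_memory(res, arr) else "copy")
         for name, res in candidates.items()}

for name, kind in kinds.items():
    print(f"{name:22s} -> {kind}")
```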

### Q3: What is broadcasting and what are its rules?

**Answer:**

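Broadcasting lets NumPy apply element-wise operations to arrays of different shapes by virtually stretching size-1 dimensions, without copying data.

**Rules:**
1. If the arrays differ in number of dimensions, pad the shorter shape with 1s on the left
2. Compare dimensions from right to left; two dimensions are compatible if:
   - They are equal, OR
   - One of them is 1
3. Each size-1 dimension is stretched to match the other; any incompatible pair raises a `ValueError`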

**Example shapes that broadcast:**
- (3, 4) + (4,) -> (3, 4)
- (3, 1) + (1, 4) -> (3, 4)
- (5, 3, 1) + (3, 4) -> (5, 3, 4)

In [None]:
# Broadcasting examples
a = np.array([[1], [2], [3]])  # Shape (3, 1)
b = np.array([10, 20, 30])     # Shape (3,)

# (3, 1) + (3,) broadcasts to (3, 3)
result = a + b
print(f"Result shape: {result.shape}")
print(result)
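
The rules above are mechanical enough to code directly. A minimal sketch, where `broadcast_shape` is a hypothetical helper written for illustration and checked against NumPy's own `np.broadcast_shapes` (available in NumPy 1.20 and later):

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rules by hand."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with 1s on the left
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    # Rules 2-3: compare aligned dimensions pairwise
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"incompatible dimensions {da} and {db}")
    return tuple(result)

examples = [((3, 4), (4,)), ((3, 1), (1, 4)), ((5, 3, 1), (3, 4)), ((3, 1), (3,))]
for sa, sb in examples:
    mine = broadcast_shape(sa, sb)
    assert mine == np.broadcast_shapes(sa, sb)  # same answer as NumPy
    print(f"{sa} + {sb} -> {mine}")
```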

### Q4: Difference between `np.array()` and `np.asarray()`?

**Answer:**

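- `np.array()`: Copies the input by default (`copy=True`), even if it is already an array
- `np.asarray()`: Returns the input itself if it is already an ndarray with the requested dtype and order (no copy); otherwise converts it to a new array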

In [None]:
original = np.array([1, 2, 3])

arr1 = np.array(original)
arr2 = np.asarray(original)

print(f"np.array same object: {arr1 is original}")
print(f"np.asarray same object: {arr2 is original}")
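
The "no copy" guarantee of `np.asarray` only holds when no conversion is needed. A short sketch showing that requesting a different dtype forces a new array:

```python
import numpy as np

original = np.array([1, 2, 3])  # integer dtype

same = np.asarray(original)                        # already an ndarray: no copy
as_float = np.asarray(original, dtype=np.float64)  # dtype differs: converted into a new array
copied = np.array(original)                        # np.array copies by default

print(f"asarray, same dtype:    same object? {same is original}")
print(f"asarray, dtype=float64: same object? {as_float is original}, "
      f"shares memory? {np.shares_memory(as_float, original)}")
print(f"np.array:               shares memory? {np.shares_memory(copied, original)}")
```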

### Q5: What is the difference between `reshape()` and `resize()`?

**Answer:**

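| Feature | `reshape()` | `np.resize()` (function) | `ndarray.resize()` (method) |
|---------|------------|------------|------------|
| Returns | New array (view if possible) | New array (always a copy) | None (modifies in place) |
| Size change | Must match original | Can change total size | Can change total size |
| Fill behavior | N/A | Repeats elements cyclically or truncates | Pads with zeros or truncates |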

In [None]:
arr = np.arange(12)

# reshape - must keep same number of elements
reshaped = arr.reshape(3, 4)
print(f"Reshaped: {reshaped.shape}")

# np.resize (function) - can change size; returns a NEW array,
# filling it by repeating the input elements
arr2 = np.arange(6)
resized = np.resize(arr2, (3, 4))
print(f"np.resize(6 elements, (3,4)):\n{resized}")
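
Two details interviewers often probe: `reshape` can infer one dimension from `-1`, and "view if possible" means it silently falls back to a copy when the strides cannot express the new shape. The sketch below also shows the in-place `ndarray.resize` method, which returns `None` and zero-pads; note that by default it raises `ValueError` if other arrays or names reference the array, so it is demonstrated on a fresh array.

```python
import numpy as np

arr = np.arange(12)

# -1 lets reshape infer one dimension: 12 elements / 3 rows -> 4 columns
grid = arr.reshape(3, -1)
print(grid.shape, np.shares_memory(grid, arr))   # a view of arr

# The transpose is a non-contiguous view; flattening it in C order cannot be
# expressed with strides alone, so reshape returns a copy instead
flat_t = grid.T.reshape(-1)
print(flat_t, np.shares_memory(flat_t, arr))

# ndarray.resize (method) works in place, returns None, and pads with zeros
buf = np.arange(6)
ret = buf.resize((3, 4))
print(ret)
print(buf)
```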

---
## Part 2: Coding Challenges

### Q6: Find all local maxima in a 1D array

A local maximum here is an interior element strictly greater than both of its neighbors (the first and last elements are not candidates).

In [None]:
arr = np.array([1, 3, 2, 4, 1, 5, 2, 3])

# Your solution here


In [None]:
# Solution
arr = np.array([1, 3, 2, 4, 1, 5, 2, 3])

# Compare with neighbors
greater_than_left = arr[1:-1] > arr[:-2]
greater_than_right = arr[1:-1] > arr[2:]
local_max_mask = greater_than_left & greater_than_right

# Get indices (add 1 because we started from index 1)
local_max_indices = np.where(local_max_mask)[0] + 1

print(f"Array: {arr}")
print(f"Local maxima at indices: {local_max_indices}")
print(f"Local maxima values: {arr[local_max_indices]}")
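
An equivalent formulation looks at the slope instead of comparing shifted slices: a strict peak is a rising step immediately followed by a falling step. A sketch using the same array and the same strict definition (flat plateaus are not counted by either version):

```python
import numpy as np

arr = np.array([1, 3, 2, 4, 1, 5, 2, 3])

# np.sign(np.diff(arr)) is +1 on rising steps, -1 on falling steps, 0 on flats.
# A rise followed by a fall is a change from +1 to -1, which np.diff reports
# as -2; slope_sign[j] describes the step from arr[j] to arr[j + 1], so shift by 1.
slope_sign = np.sign(np.diff(arr))
peaks = np.where(np.diff(slope_sign) == -2)[0] + 1

print(f"Local maxima at indices: {peaks}")
```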

### Q7: Implement one-hot encoding without loops

Convert an array of integer class labels into a one-hot encoded matrix.

In [None]:
labels = np.array([0, 2, 1, 0, 3, 2])

# Your solution here


In [None]:
# Solution
labels = np.array([0, 2, 1, 0, 3, 2])

n_classes = labels.max() + 1
one_hot = np.zeros((len(labels), n_classes))
one_hot[np.arange(len(labels)), labels] = 1

print(f"Labels: {labels}")
print(f"One-hot:\n{one_hot.astype(int)}")

# Alternative using eye
one_hot_alt = np.eye(n_classes)[labels]
print(f"\nUsing np.eye:\n{one_hot_alt.astype(int)}")
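
A third loop-free option leans on broadcasting: compare a column of labels against a row of class ids. A sketch:

```python
import numpy as np

labels = np.array([0, 2, 1, 0, 3, 2])
n_classes = labels.max() + 1

# labels[:, None] has shape (6, 1) and np.arange(n_classes) has shape (4,);
# the comparison broadcasts to (6, 4) and is True where column == label
one_hot_bc = (labels[:, None] == np.arange(n_classes)).astype(int)
print(one_hot_bc)
```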

### Q8: Find the most frequent value in an array

Return the mode (most common element).

In [None]:
arr = np.array([1, 2, 2, 3, 3, 3, 4, 4])

# Your solution here


In [None]:
# Solution
arr = np.array([1, 2, 2, 3, 3, 3, 4, 4])

unique, counts = np.unique(arr, return_counts=True)
mode = unique[np.argmax(counts)]

print(f"Array: {arr}")
print(f"Mode: {mode} (appears {counts.max()} times)")
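
When the values are small non-negative integers, `np.bincount` is a common alternative: it counts occurrences of every value from 0 to the maximum in one pass. Its output has `max + 1` entries, so it suits small value ranges. In both versions ties resolve to the smallest value, because `argmax` returns the first maximum and the counts are ordered by value.

```python
import numpy as np

arr = np.array([1, 2, 2, 3, 3, 3, 4, 4])

# bincount only accepts non-negative integers; counts[v] = occurrences of v
counts = np.bincount(arr)
mode = counts.argmax()
print(f"Mode: {mode} (appears {counts[mode]} times)")
```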

### Q9: Normalize each row of a matrix to sum to 1

Make each row a probability distribution.

In [None]:
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

# Your solution here


In [None]:
# Solution
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

# Use keepdims for broadcasting
row_sums = matrix.sum(axis=1, keepdims=True)
normalized = matrix / row_sums

print(f"Normalized:\n{normalized}")
print(f"Row sums: {normalized.sum(axis=1)}")
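
A common follow-up is a row of all zeros: `matrix / row_sums` would produce `0 / 0 = NaN` there. A sketch that divides only where the row sum is non-zero, leaving such rows at zero (treating them as zeros is an assumption about what the caller wants):

```python
import numpy as np

matrix = np.array([[1, 2, 3], [0, 0, 0], [7, 8, 9]], dtype=float)
row_sums = matrix.sum(axis=1, keepdims=True)   # shape (3, 1)

# The `where` mask broadcasts like the inputs; masked-out positions keep the
# value already in `out` (zeros here) instead of becoming NaN
safe = np.divide(matrix, row_sums, out=np.zeros_like(matrix), where=row_sums != 0)
print(safe)
```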

### Q10: Compute the rolling/moving average

Compute the moving average over each window of k consecutive elements (full windows only).

In [None]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
k = 3

# Your solution here


In [None]:
# Solution
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
k = 3

# Using convolution
kernel = np.ones(k) / k
moving_avg = np.convolve(arr, kernel, mode='valid')

print(f"Array: {arr}")
print(f"Moving average (k={k}): {moving_avg}")

# Alternative using cumsum
cumsum = np.cumsum(arr)
cumsum[k:] = cumsum[k:] - cumsum[:-k]
moving_avg_2 = cumsum[k-1:] / k
print(f"Using cumsum: {moving_avg_2}")
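
On NumPy 1.20 or later, `sliding_window_view` offers a more literal version: it builds a strided view with one row per window, so any reduction (mean, max, std) can be applied per window without copying the data first.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
k = 3

# windows is a read-only (8, 3) view: row i is arr[i:i + k]
windows = sliding_window_view(arr, k)
moving_avg_3 = windows.mean(axis=1)
print(windows.shape, moving_avg_3)
```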

---
## Part 3: Performance Questions

### Q11: Why is this code slow? How to fix it?

In [None]:
# Slow code
def slow_euclidean(a, b):
    result = 0
    for i in range(len(a)):
        result += (a[i] - b[i]) ** 2
    return np.sqrt(result)

# Test
a = np.random.rand(10000)
b = np.random.rand(10000)

In [None]:
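# Solution: vectorize! The loop runs in the Python interpreter and wraps each
# a[i] in a NumPy scalar object; vectorized operations run the loop in compiled code.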
import time

def fast_euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

# Typically faster still: np.linalg.norm avoids the temporary array of squares
def fastest_euclidean(a, b):
    return np.linalg.norm(a - b)

a = np.random.rand(10000)
b = np.random.rand(10000)

start = time.perf_counter()
for _ in range(100):
    slow_euclidean(a, b)
slow_time = time.perf_counter() - start

start = time.perf_counter()
for _ in range(100):
    fast_euclidean(a, b)
fast_time = time.perf_counter() - start

start = time.perf_counter()
for _ in range(100):
    fastest_euclidean(a, b)
fastest_time = time.perf_counter() - start

print(f"Slow (loop): {slow_time*1000:.2f}ms")
print(f"Fast (vectorized): {fast_time*1000:.4f}ms")
print(f"Fastest (linalg.norm): {fastest_time*1000:.4f}ms")
print(f"Speedup: {slow_time/fastest_time:.0f}x")
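
The vectorized variants should agree to floating-point tolerance, and the same idea extends to the usual follow-up question: distances from many points to one point at once. A sketch with arbitrary sizes and a fixed seed:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(10_000)
b = rng.random(10_000)

d = a - b
via_sum = np.sqrt(np.sum(d ** 2))
via_dot = np.sqrt(d @ d)          # dot product of d with itself = sum of squares
via_norm = np.linalg.norm(d)
print(np.isclose(via_sum, via_dot), np.isclose(via_sum, via_norm))

# Batched: distance from each row of X (1000, 3) to y (3,).
# X - y broadcasts to (1000, 3); axis=1 reduces each row to one distance.
X = rng.random((1000, 3))
y = rng.random(3)
dists = np.linalg.norm(X - y, axis=1)
print(dists.shape)
```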

### Q12: What's wrong with this code?

In [None]:
# Buggy code
arr = np.arange(10)
result = []
for i in range(1000):
    result = np.append(result, arr[i % 10])

# What's the problem?

In [None]:
# Solution
print("""
Problem: np.append copies the entire array on every call!
Building an n-element array this way costs O(n^2) total work.

Solution 1: Use Python list, convert at end
Solution 2: Preallocate array
Solution 3: Vectorize the operation
""")

import time

# Bad approach
arr = np.arange(10)
start = time.perf_counter()
result = []
for i in range(1000):
    result = np.append(result, arr[i % 10])
bad_time = time.perf_counter() - start

# Solution 1: append to a Python list, convert once at the end
start = time.perf_counter()
result = []
for i in range(1000):
    result.append(arr[i % 10])
result = np.array(result)
list_time = time.perf_counter() - start

# Solution 2: preallocate (still a Python loop, but no repeated copying)
start = time.perf_counter()
result = np.empty(1000, dtype=arr.dtype)
for i in range(1000):
    result[i] = arr[i % 10]
prealloc_time = time.perf_counter() - start

# Solution 3: vectorize (no Python-level loop at all)
start = time.perf_counter()
result = arr[np.arange(1000) % 10]
vec_time = time.perf_counter() - start

print(f"Bad (np.append): {bad_time*1000:.2f}ms")
print(f"List + convert: {list_time*1000:.4f}ms")
print(f"Preallocate: {prealloc_time*1000:.4f}ms")
print(f"Vectorized: {vec_time*1000:.5f}ms")
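
Two related patterns, sketched below: when the goal is simply to repeat an array, `np.tile` builds the result in one call; and when chunk sizes are only known inside a loop, collecting chunks in a list and calling `np.concatenate` once keeps the total copying linear. The variable-length chunks here are hypothetical placeholders.

```python
import numpy as np

arr = np.arange(10)

# Repeating a whole array: one call, same result as the indexing trick above
tiled = np.tile(arr, 100)
assert np.array_equal(tiled, arr[np.arange(1000) % 10])

# Chunks produced inside a loop: collect them, concatenate once at the end
chunks = []
for i in range(5):
    chunks.append(np.full(i + 1, i))   # hypothetical variable-length chunks
combined = np.concatenate(chunks)
print(combined)
```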

---
## Part 4: Debug These Errors

### Q13: Broadcasting error

In [None]:
# This raises an error - why and how to fix?
a = np.arange(12).reshape(3, 4)
b = np.array([1, 2, 3])

try:
    result = a + b  # Error!
except ValueError as e:
    print(f"Error: {e}")

In [None]:
# Solution
print("""
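Problem: a has shape (3, 4) and b has shape (3,).
Broadcasting aligns trailing dimensions: 4 vs 3 are unequal and neither is 1, so they are incompatible.

Solutions:
1. To add b[i] to every element of row i, reshape b to (3, 1), e.g. b[:, np.newaxis]
2. To add one value per column instead, b needs 4 elements: shape (4,) or (1, 4)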
""")

a = np.arange(12).reshape(3, 4)
b = np.array([1, 2, 3])

# Add b[i] to every element of row i: (3, 4) + (3, 1) -> (3, 4)
result = a + b[:, np.newaxis]
print(f"Row i offset by b[i]:\n{result}")
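
There are several equivalent spellings for turning a (3,) vector into a (3, 1) column, and the per-column case simply needs a vector whose length matches the last dimension. A sketch (the column offsets in `c` are arbitrary example values):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b = np.array([1, 2, 3])

# Three equivalent ways to turn b (3,) into a column vector (3, 1)
col_1 = b[:, np.newaxis]
col_2 = b.reshape(-1, 1)
col_3 = np.expand_dims(b, axis=1)
assert col_1.shape == col_2.shape == col_3.shape == (3, 1)

# One value per column needs 4 values: (3, 4) + (4,) -> (3, 4)
c = np.array([10, 20, 30, 40])
print(a + c)
```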

### Q14: Why does this not modify the array?

In [None]:
arr = np.arange(10)

# Try to double the even elements
evens = arr[arr % 2 == 0]
evens *= 2

print(f"Result: {arr}")  # Not modified!

In [None]:
# Solution
print("""
Problem: Boolean indexing creates a COPY, not a view!
Modifying 'evens' doesn't affect 'arr'.

Solution: Apply the mask on the left side of the assignment; arr[mask] *= 2 writes the result back into arr.
""")

arr = np.arange(10)
arr[arr % 2 == 0] *= 2
print(f"Correct result: {arr}")
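
Two other correct spellings, sketched below: `np.where` builds a new array without touching the original, and a ufunc's `out`/`where` arguments update only the masked positions in place.

```python
import numpy as np

arr = np.arange(10)

# Non-mutating: take arr * 2 where the mask is True, otherwise arr
doubled = np.where(arr % 2 == 0, arr * 2, arr)

# In place: positions where the mask is False keep their current value in `out`
arr2 = np.arange(10)
np.multiply(arr2, 2, out=arr2, where=arr2 % 2 == 0)

print(doubled)
print(arr2)
```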

---
## Part 5: Quick Fire Questions

In [None]:
# Q15: How to get the n largest elements?
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5])
n = 3

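# argsort sorts ascending, so the last n indices point to the n largest values
largest = arr[np.argsort(arr)[-n:]]
# Or sort the values directly
largest_sorted = np.sort(arr)[-n:]
print(f"{n} largest via argsort: {largest}")
print(f"{n} largest via sort (ascending): {largest_sorted}")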
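
A full sort is more work than the question requires. `np.partition` only guarantees that the n largest values land in the last n slots (in no particular order), which runs in linear time; `np.argpartition` does the same for indices. A sketch, sorting only the n selected values at the end:

```python
import numpy as np

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5])
n = 3

# Last n slots hold the n largest values, unordered
top_n = np.partition(arr, -n)[-n:]
top_n_sorted = np.sort(top_n)

# Indices of the n largest values, also unordered
top_idx = np.argpartition(arr, -n)[-n:]

print(top_n_sorted, np.sort(arr[top_idx]))
```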

In [None]:
# Q16: How to find common elements between two arrays?
a = np.array([1, 2, 3, 4, 5])
b = np.array([3, 4, 5, 6, 7])

common = np.intersect1d(a, b)
print(f"Common elements: {common}")
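
`np.intersect1d` returns the sorted unique common values. If you also need where they occur, `return_indices=True` returns their positions in each input; `np.isin` instead gives a boolean mask over the first array that preserves its order and duplicates. A sketch:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([3, 4, 5, 6, 7])

# Common values plus their positions in a and in b
common, idx_a, idx_b = np.intersect1d(a, b, return_indices=True)

# Boolean mask over a: True where the element also appears in b
mask = np.isin(a, b)

print(common, idx_a, idx_b)
print(a[mask])
```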

In [None]:
# Q17: How to swap two rows of a matrix?
matrix = np.arange(12).reshape(3, 4)
print(f"Before:\n{matrix}")

# The right-hand side uses fancy indexing, which makes a copy, so the swap is safe
matrix[[0, 2]] = matrix[[2, 0]]
print(f"After swapping rows 0 and 2:\n{matrix}")
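
Why fancy indexing matters here: the tuple-swap idiom that works for Python lists silently fails for NumPy rows, because basic indexing returns views. A sketch of the pitfall and a fix:

```python
import numpy as np

m = np.arange(12).reshape(3, 4)

# Pitfall: m[2] and m[0] on the right are VIEWS. The first assignment
# overwrites row 0, so the "saved" row 0 now holds row 2's values, and the
# second assignment copies those same values back into row 2.
m[0], m[2] = m[2], m[0]
broken = m.copy()
print(broken)   # rows 0 and 2 both end up as [8, 9, 10, 11]

# Fix when using basic indexing: copy the right-hand rows first
m = np.arange(12).reshape(3, 4)
m[0], m[2] = m[2].copy(), m[0].copy()
print(m)
```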

In [None]:
# Q18: How to get unique rows from a 2D array?
arr = np.array([[1, 2], [3, 4], [1, 2], [5, 6], [3, 4]])

unique_rows = np.unique(arr, axis=0)
print(f"Unique rows:\n{unique_rows}")
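
`np.unique(..., axis=0)` returns the rows sorted lexicographically, not in their original order. If first-occurrence order matters, `return_index=True` gives the index of each unique row's first appearance, and sorting those indices restores input order. A sketch with a different example array chosen so the two orders differ:

```python
import numpy as np

arr = np.array([[5, 6], [1, 2], [5, 6], [3, 4]])

# first_idx[j] is where the j-th (sorted) unique row first appears in arr
_, first_idx = np.unique(arr, axis=0, return_index=True)
unique_in_order = arr[np.sort(first_idx)]
print(unique_in_order)
```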

In [None]:
# Q19: How to check if array contains NaN?
arr = np.array([1, 2, np.nan, 4, 5])

has_nan = np.isnan(arr).any()
nan_indices = np.where(np.isnan(arr))[0]

print(f"Has NaN: {has_nan}")
print(f"NaN at indices: {nan_indices}")
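
A classic follow-up is why `arr == np.nan` cannot be used for this: NaN compares unequal to everything, including itself. A sketch of the pitfall, plus counting NaNs and using `np.isfinite`, which also rejects infinities:

```python
import numpy as np

arr = np.array([1, 2, np.nan, 4, 5])

# Pitfall: an equality test never finds NaN
print((arr == np.nan).any())   # False

# Count NaNs; np.isfinite is False for NaN and for +/-inf
n_nan = np.count_nonzero(np.isnan(arr))
print(n_nan, np.isfinite(arr).all())
```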

In [None]:
# Q20: How to replace NaN with mean?
arr = np.array([1.0, 2.0, np.nan, 4.0, np.nan, 6.0])

mean_val = np.nanmean(arr)
arr[np.isnan(arr)] = mean_val

print(f"After replacing NaN with the mean ({mean_val:.2f}): {arr}")
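
For a 2D dataset the usual variant is per-column imputation. A sketch combining `np.nanmean(..., axis=0)` with broadcasting in `np.where`; note that a column containing only NaN would have a NaN mean (NumPy also warns), so that case would need separate handling:

```python
import numpy as np

X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan],
              [np.nan, 8.0, 9.0]])

# Column means ignoring NaN: shape (3,)
col_means = np.nanmean(X, axis=0)

# col_means (3,) broadcasts across the rows of X (3, 3)
X_filled = np.where(np.isnan(X), col_means, X)
print(X_filled)
```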

---
## Summary: Key Interview Topics

**Must Know:**
1. Views vs copies
2. Broadcasting rules
3. Vectorization benefits
4. Basic indexing vs fancy indexing
5. Common pitfalls (`np.append` in loops, boolean indexing returns a copy)

**Should Know:**
1. Memory layout (C vs F order)
2. Strides and contiguity
3. Structured arrays
4. np.einsum basics
5. Performance optimization patterns

**Nice to Know:**
1. memmap for large files
2. Custom ufuncs
3. FFT and signal processing
4. Linear algebra operations
5. Random number generation details

---
## Congratulations!

You have completed the NumPy Revision Repository!

**Modules Completed:**
1. NumPy Basics
2. Array Manipulation
3. Mathematical Operations
4. Broadcasting and Vectorization
5. Advanced Indexing
6. File I/O
7. Performance Optimization
8. Practice Problems

**Total: 28 notebooks covering NumPy from basics to advanced concepts!**