***PHASE 5: Random Module, File Handling & Performance***

**Why This Phase Matters**

In real-world work you will:

- Simulate data

- Work with large datasets

- Save intermediate results

- Optimize slow code

**NumPy Random Module**

Why Does NumPy Random Exist?

- Generate synthetic data

- Simulations

- Testing ML pipelines

- Statistical experiments

In [None]:
import numpy as np


In [None]:
# 3x3 array of uniform random floats in [0, 1)
np.random.rand(3, 3)


In [None]:
# Five random integers from 1 to 99 (the upper bound is exclusive)
np.random.randint(1, 100, size=5)


In [None]:
# Fix the seed, then draw a 3x3 array of integers from 1 to 9
np.random.seed(0)
np.random.randint(1, 10, size=(3, 3))


**What Is a Seed?**

*A seed fixes the random number generator’s starting point, so the same seed always produces the same sequence of numbers.*

In [None]:
# Re-seeding with the same value reproduces the same array as the earlier cell
np.random.seed(0)
np.random.randint(1, 10, size=(3, 3))

**Why Seeding Matters**

- Reproducible results

- Debugging

- Scientific experiments

- Model comparison

**Interview Answer:**

*seed() ensures reproducibility by initializing the random number generator to a fixed state.*
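The reproducibility claim can be checked directly. The sketch below resets the global seed and compares two draws, then repeats the idea with `np.random.default_rng`, the Generator API that NumPy's documentation recommends for new code. The seed value `42` and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Legacy global state: resetting the seed replays the same sequence.
np.random.seed(0)
first = np.random.randint(1, 10, size=(3, 3))
np.random.seed(0)
second = np.random.randint(1, 10, size=(3, 3))
print(np.array_equal(first, second))  # True

# Generator API: each generator object carries its own independent state,
# so two generators built from the same seed produce the same draws.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
draw_a = rng_a.integers(1, 10, size=(3, 3))  # upper bound exclusive, like randint
draw_b = rng_b.integers(1, 10, size=(3, 3))
print(np.array_equal(draw_a, draw_b))  # True
```

A local generator avoids surprises when other code also touches the global seed.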

In [None]:
np.random.rand(10)


**Random Distributions**

In [None]:
# Uniform distribution (discrete): integers from 40 to 99
scores = np.random.randint(40, 100, size=50)

# Min-max scaling maps the scores into the range [0, 1]
scaled = (scores - scores.min()) / (scores.max() - scores.min())
print(scaled)


In [None]:
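# Normal Distribution

np.random.randn(5)

# Drawn from the standard normal distribution (mean 0, std 1).
# With only 5 samples, the sample mean and std will only roughly match.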


**Interview Answer:**

*randn() is commonly used to initialize neural network weights, usually scaled by a small factor.*
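As a minimal sketch of the points above: a large sample from `randn()` has mean and standard deviation close to 0 and 1, and a shift and scale gives any other normal distribution. The values `mu`, `sigma`, the layer shape `(4, 3)`, and the `0.01` factor are illustrative assumptions, not a recommended initialization scheme.

```python
import numpy as np

np.random.seed(0)

# A large sample makes the "mean ~ 0, std ~ 1" property visible.
samples = np.random.randn(100_000)
print(samples.mean(), samples.std())

# Shift and scale to get a normal distribution with another mean and std.
mu, sigma = 70.0, 10.0  # hypothetical values
shifted = mu + sigma * np.random.randn(100_000)
print(shifted.mean(), shifted.std())

# Hypothetical weight matrix for a layer with 4 inputs and 3 outputs,
# scaled down so the initial weights are small.
weights = 0.01 * np.random.randn(4, 3)
print(weights.shape)  # (4, 3)
```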

**Saving & Loading NumPy Arrays**

*Why Not CSV?*

- CSV is text, so reading and writing it is slow

- Loses dtype (the file stores only text)

- Usually larger on disk

*NumPy’s solution: .npy and .npz*
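The dtype point can be seen in a short round trip. This sketch writes the same `int32` array as CSV (with `np.savetxt`) and as `.npy`, then reloads both with default settings. The file names and the temporary directory are assumptions chosen so the example does not touch `data.npy`.

```python
import os
import tempfile

import numpy as np

arr = np.arange(12, dtype=np.int32).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "demo.csv")  # hypothetical file names
    npy_path = os.path.join(tmp, "demo.npy")

    np.savetxt(csv_path, arr, delimiter=",")  # written as text
    np.save(npy_path, arr)                    # written as binary with a header

    from_csv = np.loadtxt(csv_path, delimiter=",")
    from_npy = np.load(npy_path)

print(from_csv.dtype)  # float64: the original int32 dtype is gone
print(from_npy.dtype)  # int32: dtype is stored in the .npy header
```

The CSV values can be cast back by hand, but the `.npy` file needs no extra bookkeeping.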

In [None]:
# Save Single Array
array = np.random.randn(10)
np.save("data.npy", array)


In [None]:
# Load Array

loaded = np.load("data.npy")
print(loaded)

In [None]:
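# Save a 2-D array (np.save stores one array per file; shape and dtype are preserved)

rand = np.random.randint(1, 100, (5, 5))

# Note: this overwrites the data.npy saved earlier
np.save("data.npy", rand)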


In [None]:
# Save multiple arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

np.savez("data.npz", arr1=a, arr2=b)




In [None]:
# Load the archive; each array is retrieved by the keyword name used in savez
data = np.load("data.npz")
print(data["arr1"])
print(data["arr2"])
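Two related calls are worth knowing when working with `.npz` archives: the `files` attribute lists the stored names, and `np.savez_compressed` writes a compressed archive with the same interface. A minimal sketch, using a temporary directory and a hypothetical file name:

```python
import os
import tempfile

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_compressed.npz")  # hypothetical file name
    np.savez_compressed(path, arr1=a, arr2=b)

    # The context manager closes the archive once the arrays are read.
    with np.load(path) as data:
        names = sorted(data.files)
        restored = {name: data[name] for name in names}

print(names)  # ['arr1', 'arr2']
print(restored["arr2"])
```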


**Vectorization vs Loops**

*What is Vectorization?*

*Vectorization means performing operations on entire arrays at once instead of writing explicit Python loops.*

In [None]:
# Using a Loop (Slow)

arr = np.arange(1, 6)
result = []

for i in arr:
    result.append(i * 2)

print(result)


In [None]:
# Using Vectorization (Fast)

arr = np.arange(1, 6)
result = arr * 2
print(result)



*Why faster?*



- No Python interpreter overhead on each iteration

- The loop runs in optimized, compiled C code

- Better CPU cache usage, because array elements typically sit in contiguous memory

**Performance Comparison**

| Approach         | Speed  | Readability |
| ---------------- | ------ | ----------- |
| Python loop      | ❌ Slow | ❌ Verbose   |
| NumPy vectorized | ✅ Fast | ✅ Clean     |
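The speed difference can be measured rather than taken on faith. The sketch below times both approaches with the standard-library `timeit` module. The array size and repeat count are arbitrary, the helper function names are made up for this example, and the actual timings depend on the machine.

```python
import timeit

import numpy as np

arr = np.arange(200_000)


def double_with_loop(values):
    """Double each element one at a time in Python."""
    result = []
    for v in values:
        result.append(v * 2)
    return result


def double_vectorized(values):
    """Double every element in a single NumPy operation."""
    return values * 2


loop_time = timeit.timeit(lambda: double_with_loop(arr), number=3)
vec_time = timeit.timeit(lambda: double_vectorized(arr), number=3)
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```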


**Interview Answer:**

*Vectorization avoids Python loop overhead and leverages low-level optimizations.*

**Broadcasting + Vectorization**

In [None]:
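scores = np.random.randint(40, 100, size=50)

# scores.min() and scores.max() are scalars; broadcasting applies them
# to all 50 elements without an explicit loop
scaled = (scores - scores.min()) / (scores.max() - scores.min())
print(scaled)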
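Broadcasting becomes more visible with a 2-D array. In this sketch, each column is scaled by its own minimum and maximum: the per-column arrays of shape `(3,)` are broadcast across the rows of a `(3, 3)` array. The score values are made up for illustration.

```python
import numpy as np

# Rows are students, columns are subjects (hypothetical data).
scores = np.array([
    [40, 55, 70],
    [60, 75, 90],
    [80, 95, 50],
])

col_min = scores.min(axis=0)  # shape (3,): one minimum per column
col_max = scores.max(axis=0)  # shape (3,): one maximum per column

# (3, 3) combined with (3,): the column statistics are broadcast over every row.
scaled = (scores - col_min) / (col_max - col_min)
print(scaled)
```

Each column of `scaled` now runs from 0 to 1 independently of the others.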

In [None]:


# Seed
np.random.seed(0)

# Generate random array
rand = np.random.randint(1, 100, (5, 5))
print("Random Array:\n", rand)

# Save array
np.save("data.npy", rand)

# Load array
loaded = np.load("data.npy")
print("Loaded Array:\n", loaded)

# Vectorization
result = loaded * 2
print("Vectorized Result:\n", result)


**Interview Questions**

Q: Why is .npy better than .csv?
- Faster, usually smaller, and preserves dtype and shape.

Q: What happens if seed is not set?
- Results differ every run → non-reproducible.

Q: Why is vectorization faster than loops?
- Eliminates Python overhead and uses optimized C implementations.

Q: When would you still use loops?
- Complex logic that cannot be vectorized cleanly, such as steps that depend on the previous iteration's result.
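As one illustration of the last answer, here is a running total that is clamped at zero after every step. Each value depends on the previous clamped value, so a plain loop is the most readable form, even though array tricks for this particular case exist. The data and variable names are made up.

```python
import numpy as np

changes = np.array([5, -8, 3, -1, -4, 6])  # hypothetical step-by-step changes
levels = np.empty(len(changes), dtype=np.int64)

level = 0
for i, change in enumerate(changes):
    # The new level depends on the previous clamped level.
    level = max(0, level + change)
    levels[i] = level

print(levels)  # [5 0 3 2 0 6]
```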