
# Lesson 8 (Mini): Scaling, Clipping, and Composite Scores — Student Grades

**Goal (~15 minutes):** Practice NumPy fundamentals for ML-style preprocessing using a friendly dataset (Joe, Ted, Sue).  
You’ll compute per-student averages, derive a **study_hours** feature using a formula + noise, and build a **final_score** with clipping.

---


## 1) Setup — Students & Grades

In [None]:

import numpy as np

# Students and their 4 test grades
students = np.array(["Joe","Ted","Sue"])
grades = np.array([
    [85, 92, 78, 90],  # Joe
    [88, 76, 85, 83],  # Ted
    [90, 95, 92, 88]   # Sue
], dtype=float)

print("Students:", students.tolist())
print("Grades matrix:\n", grades)
print("Shape:", grades.shape)  # (3 students, 4 tests)



## 2) Average Grade Per Student (axis=1)

- `axis=1` means "go **across** each row": the 4 test columns are collapsed, leaving one value per student.
- This gives us the baseline academic performance for each student.


In [None]:

avg = grades.mean(axis=1)
for name, a in zip(students, avg):
    print(f"{name}'s average grade: {a:.2f}")
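
To make the `axis` argument concrete, here is a small sketch that contrasts `axis=1` (one mean per student) with `axis=0` (one mean per test). It reuses the grades from Section 1; the names `per_student` and `per_test` are introduced here only for illustration.

```python
import numpy as np

grades = np.array([
    [85, 92, 78, 90],  # Joe
    [88, 76, 85, 83],  # Ted
    [90, 95, 92, 88],  # Sue
], dtype=float)

per_student = grades.mean(axis=1)  # collapses the 4 test columns -> shape (3,)
per_test = grades.mean(axis=0)     # collapses the 3 student rows -> shape (4,)

print("Per student:", per_student)            # values: 86.25, 83.0, 91.25
print("Per test:   ", np.round(per_test, 2))  # values: 87.67, 87.67, 85.0, 87.0
```

A quick way to check which axis you want: the axis you pass is the one that disappears from the shape.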



## 3) Derive `study_hours` with Formula + Jitter + `np.clip`

We mimic a derived feature like `years_experience` by shifting and scaling the average, then flooring the result:

\[
\text{baseline\_hours} = \left\lfloor \frac{\text{avg} - 70}{5} \right\rfloor
\]

Then add small random jitter in \{-1, 0, +1\}, and **clip** at 0 so it never goes negative:


In [None]:

# Optional: fix randomness for reproducibility while you learn
np.random.seed(42)

baseline_hours = np.floor((avg - 70) / 5)

# Jitter: integers in {-1, 0, +1}; randint's upper bound (2) is exclusive
jitter = np.random.randint(-1, 2, size=avg.shape)

study_hours_unclipped = baseline_hours + jitter
study_hours = np.clip(study_hours_unclipped, 0, None)  # minimum 0

print("Averages:          ", np.round(avg, 2))
print("Baseline hours:    ", baseline_hours.astype(int))
print("Jitter (-1..+1):   ", jitter)
print("Unclipped hours:   ", study_hours_unclipped)
print("Clipped study_hours:", study_hours)
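
The three students all average above 70, so the clip never actually triggers on the baseline alone. The sketch below works through the formula on a hypothetical array (`avg_demo`, which adds two made-up averages below 70) to show where flooring and clipping matter. It also compares `np.floor` with `np.trunc`, since the two differ for negative values.

```python
import numpy as np

# Hypothetical averages: the first three match Joe, Ted, and Sue;
# the last two are made up to exercise the negative case.
avg_demo = np.array([86.25, 83.0, 91.25, 68.0, 62.0])

scaled = (avg_demo - 70) / 5        # 3.25, 2.6, 4.25, -0.4, -1.6
floored = np.floor(scaled)          # 3, 2, 4, -1, -2  (rounds toward -infinity)
truncated = np.trunc(scaled)        # 3, 2, 4, -0, -1  (rounds toward zero)
hours = np.clip(floored, 0, None)   # 3, 2, 4, 0, 0    (negatives raised to 0)

print("Floored:  ", floored)
print("Truncated:", truncated)
print("Clipped:  ", hours)
```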



**Why `np.clip`?**  
- `np.clip(x, 0, None)` enforces a **lower bound** of 0 (no negatives).
- `None` for the upper bound means "no maximum".
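
As a minimal sketch of the clipping variants, the block below applies a lower-only, an upper-only, and a two-sided clip to a small made-up array `x`. The upper bound of 8 is an arbitrary value chosen for illustration.

```python
import numpy as np

x = np.array([-2.0, 0.0, 3.0, 12.0])  # made-up values for illustration

lower_only = np.clip(x, 0, None)  # [0, 0, 3, 12]  only a minimum
upper_only = np.clip(x, None, 8)  # [-2, 0, 3, 8]  only a maximum
both = np.clip(x, 0, 8)           # [0, 0, 3, 8]   minimum and maximum

# A lower-bound-only clip gives the same result as an elementwise maximum.
same_as_maximum = np.array_equal(lower_only, np.maximum(x, 0))

print(lower_only, upper_only, both, same_as_maximum)
```

`np.clip` returns a new array here, so `x` itself is left unchanged.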



## 4) Composite `final_score` (Analogous to `credit_rating`)

We combine multiple parts:
- Base = 50  
- Weighted average grade: `0.4 * avg`  
- Attendance (0–10), where stronger attendance helps: `0.8 * attendance`  
- Study hours: `2 * study_hours`  
- Random integer noise in `[-3, +3]`

Finally, **clip** to the valid range `[0, 100]`.


In [None]:

# Deterministic attendance for the demo (you can change these)
attendance = np.array([8, 7, 9], dtype=float)

# Noise in [-3, 3]
noise = np.random.randint(-3, 4, size=avg.shape).astype(float)

final_score_raw = 50 + 0.4*avg + 0.8*attendance + 2*study_hours + noise
final_score = np.clip(final_score_raw, 0, 100)

for i, name in enumerate(students):
    print(f"{name:>3} | avg={avg[i]:5.2f} | study_hours={int(study_hours[i])} | "
          f"att={attendance[i]:.0f} | noise={int(noise[i]):+} | raw={final_score_raw[i]:6.2f} | final={final_score[i]:6.2f}")
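
To see the formula without any randomness, here is a worked sketch under two simplifying assumptions that are not part of the cell above: jitter is dropped (so study hours equal the floored baseline) and noise is set to zero. The names `hours_demo`, `noise_demo`, `raw`, and `final` are introduced only for this example.

```python
import numpy as np

avg = np.array([86.25, 83.0, 91.25])      # per-student averages (Joe, Ted, Sue)
attendance = np.array([8.0, 7.0, 9.0])    # same demo attendance values
hours_demo = np.floor((avg - 70) / 5)     # 3, 2, 4 (assumption: no jitter)
noise_demo = np.zeros_like(avg)           # assumption: no noise

raw = 50 + 0.4 * avg + 0.8 * attendance + 2 * hours_demo + noise_demo
final = np.clip(raw, 0, 100)

# Joe: 50 + 34.5 + 6.4 + 6 =  96.9 -> stays  96.9
# Ted: 50 + 33.2 + 5.6 + 4 =  92.8 -> stays  92.8
# Sue: 50 + 36.5 + 7.2 + 8 = 101.7 -> clipped to 100.0
print("raw:  ", np.round(raw, 2))
print("final:", np.round(final, 2))
```

Under these assumptions only Sue's raw score exceeds 100, so she is the only student affected by the upper clip.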



## 5) (Optional) Simple Visualization

A quick look at `final_score` as a sanity check.


In [None]:

import matplotlib.pyplot as plt

plt.figure()
plt.bar(students, final_score)
plt.title("Final Score (Clipped to [0, 100])")
plt.xlabel("Student")
plt.ylabel("Final Score")
plt.show()



## 6) Exercises (≈5 minutes)

1. Change `jitter` to always be `[-2, -2, -2]`. Does `np.clip` keep `study_hours` from going negative?  
2. Increase the weight on attendance from `0.8` to `1.5`. Who benefits most? Why?  
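3. Set the noise to zero for everyone with `noise = np.zeros_like(avg)` (a plain scalar `noise = 0` would break `noise[i]` in the print loop). What happens to `final_score_raw` vs `final_score`?  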
4. Change Joe’s first grade to 20 (a low outlier) and re-run cells. How do `avg`, `study_hours`, and `final_score` change?  
5. Try an **upper bound** clip on `study_hours` (e.g., `np.clip(..., 0, 8)`) to limit max hours. What changes?



## Appendix — Handy One-Liners

- Per-student averages: `avg = grades.mean(axis=1)`  
- Lower-bound clip: `np.clip(x, 0, None)`  
- Two-sided clip: `np.clip(x, 0, 100)`  
- Reproducible randomness: `np.random.seed(42)`  
- Vectorized arithmetic: expressions like `0.4 * avg + 0.8 * attendance` work elementwise on NumPy arrays.
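
This lesson uses `np.random.seed(42)` with `np.random.randint`. As an optional alternative, the sketch below uses NumPy's `Generator` interface via `np.random.default_rng`. Note that it draws from a different random stream, so the numbers will not match the cells above even with the same seed.

```python
import numpy as np

rng = np.random.default_rng(42)  # a seeded generator, independent of np.random.seed

# high is exclusive by default: values come from {-1, 0, 1}
jitter = rng.integers(-1, 2, size=3)

# endpoint=True makes high inclusive: values come from -3..3
noise = rng.integers(-3, 3, size=3, endpoint=True)

# A second generator with the same seed reproduces the same first draw.
rng_again = np.random.default_rng(42)
jitter_again = rng_again.integers(-1, 2, size=3)

print(jitter, noise, np.array_equal(jitter, jitter_again))
```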
