# Lab 3: Production-Grade Testing Checklist

**Class 3 — Non-Functional Testing & AI Security**

---

## Lab Overview

In this lab, you will produce **four deliverables** that together form a production testing checklist for the ADAS model API:

| Deliverable | Description |
|-------------|-------------|
| **1. Performance Benchmark Report** | DataFrame + chart: latency, throughput, memory by batch size |
| **2. Security Checklist** | 8-item security assessment with pass/fail status |
| **3. Automated Test Scripts** | Three pytest-style test classes (API response, accuracy, latency) |
| **4. Quantized Model Comparison** | Side-by-side comparison: FP32 vs INT8 (size, speed, accuracy) |

---

## Instructions

1. Complete all cells marked with `# TODO:`
2. Run the entire notebook from top to bottom after completing each section
3. Review the **Final Checklist** at the end and fill in the pass/fail column
4. Save the notebook — it IS your submission

**Reference materials:**
- Part 1: `Part_1_Performance_Benchmarking.ipynb`
- Part 2: `Part_2_Security_Hardening.ipynb`
- Part 3: `Part_3_Automated_Testing.ipynb`

---

> **Tip**: Don't just copy-paste from the Part notebooks. Understand each piece before you write it.

In [None]:
# ── Setup (provided — do not modify) ────────────────────────────────────────
import os, io, sys, time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import psutil
from pathlib import Path
from PIL import Image
from datetime import datetime, timedelta
from typing import Optional, Dict
from collections import defaultdict

import torch
import torch.nn as nn
import torchvision.transforms as transforms
from torchvision.models import resnet18
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

from fastapi import FastAPI, File, UploadFile, HTTPException, Depends, Request
from fastapi.security import OAuth2PasswordBearer, OAuth2PasswordRequestForm
from fastapi.testclient import TestClient
from pydantic import BaseModel
from jose import JWTError, jwt
import bcrypt as _bcrypt

np.random.seed(42)
torch.manual_seed(42)

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
CLASS_NAMES = ['animal', 'name_board', 'other_vehicle', 'pedestrian',
               'pothole', 'road_sign', 'speed_breaker']
NUM_CLASSES = len(CLASS_NAMES)
# NOTE: this path is machine-specific; point it at your local copy of the dataset before running.
DATASET_PATH = r'C:\Users\Lucifer\python_workspace\BITS\AI_Quality_Engineering\dataset'
TEST_PATH = os.path.join(DATASET_PATH, 'test')

# Model (random weights for demo)
class ADASModel(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()
        self.resnet = resnet18(weights=None)
        self.resnet.fc = nn.Linear(512, num_classes)
    def forward(self, x): return self.resnet(x)

transform = transforms.Compose([
    transforms.Resize((128, 128)), transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
model = ADASModel(NUM_CLASSES).to(DEVICE)
model.eval()

def create_test_image(size=(100, 100), color='red', fmt='PNG'):
    img = Image.new('RGB', size, color=color)
    buf = io.BytesIO()
    img.save(buf, format=fmt)
    buf.seek(0)
    return buf, f'test.{fmt.lower()}'

print(f'Setup complete. Device: {DEVICE}')

---
## Deliverable 1: Performance Benchmark Report

Complete the benchmarking functions and produce a DataFrame + chart.
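
The timing pattern this deliverable relies on (untimed warm-up runs, then a timed loop, then percentiles) can be sketched on a toy workload. This is a minimal standard-library sketch, not the lab solution: `measure_latency_ms` and `toy_workload` are hypothetical names, and `statistics.quantiles` stands in for the `np.percentile` call suggested in the hints.

```python
import statistics
import time


def measure_latency_ms(fn, n_warmup=5, n_runs=50):
    """Time fn() and return p50/p95/p99/mean latency in milliseconds."""
    # Warm-up calls run but are not timed, so one-off start-up costs
    # do not distort the measurements.
    for _ in range(n_warmup):
        fn()

    latencies = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)

    # quantiles(..., n=100) returns the 99 cut points p1..p99;
    # method="inclusive" interpolates linearly between observed values.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    return {
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "mean_ms": statistics.fmean(latencies),
    }


def toy_workload():
    # Hypothetical stand-in for model(batch_tensor).
    return sum(i * i for i in range(10_000))


stats = measure_latency_ms(toy_workload)
print(stats)
```

When the model runs on a GPU, CUDA kernels execute asynchronously, so calling `torch.cuda.synchronize()` before reading the clock is typically needed for the timings to reflect real inference time. Throughput then follows from the same timings as `batch_size / mean_time_per_batch_in_seconds`.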

In [None]:
# ── Deliverable 1: Performance Benchmarking ──────────────────────────────────

def measure_latency(model, batch_tensor, n_warmup=5, n_runs=50):
    """
    Measure inference latency in milliseconds.
    Returns dict with p50, p95, p99, mean latency.
    """
    # TODO: Implement warmup loop (n_warmup runs, no timing)
    # Hint: use torch.no_grad() and call model(batch_tensor)
    # YOUR CODE HERE
    pass

    # TODO: Implement timed measurement loop (n_runs runs, record latencies)
    # Hint: use time.perf_counter() before and after each model call
    # Convert to milliseconds (* 1000)
    # YOUR CODE HERE
    latencies = []

    # TODO: Return dict with p50, p95, p99, mean in ms
    # Hint: use np.percentile(latencies, 50) etc.
    return {
        'p50_ms':  0.0,  # TODO: replace with np.percentile
        'p95_ms':  0.0,  # TODO: replace
        'p99_ms':  0.0,  # TODO: replace
        'mean_ms': 0.0,  # TODO: replace
    }


def measure_throughput(model, batch_size, n_runs=20):
    """
    Measure throughput in images/second.
    Formula: batch_size / mean_time_per_batch_in_seconds
    """
    dummy = torch.randn(batch_size, 3, 128, 128).to(DEVICE)
    # TODO: Time n_runs model calls, compute mean time, return throughput
    # YOUR CODE HERE
    return 0.0  # TODO: return batch_size / mean_time


def get_model_size_mb(model, path='_tmp.pth'):
    """Save model weights to temp file, measure size, delete."""
    # TODO: Save model.state_dict() to path, measure file size, remove file, return MB
    # YOUR CODE HERE
    return 0.0  # TODO: return size in MB


# TODO: Run the benchmark loop for batch_sizes = [1, 2, 4, 8, 16, 32]
# For each batch size, measure: latency (p50/p95/p99), throughput, memory delta
# Store results in a list of dicts, then convert to pd.DataFrame
# YOUR CODE HERE
BATCH_SIZES = [1, 2, 4, 8, 16, 32]
benchmark_results = []

# --- your benchmark loop here ---

# Expected columns: batch_size, p50_ms, p95_ms, p99_ms, throughput_img_per_s, plus a memory-delta column
df_bench = pd.DataFrame(benchmark_results)
print('Benchmark DataFrame:')
print(df_bench)

In [None]:
# TODO: Create a 2-panel visualization:
#   Panel 1: Latency (p50, p95, p99) vs batch size — line chart
#   Panel 2: Throughput (img/s) vs batch size — line chart
# Save as 'lab3_benchmark.png'
# YOUR CODE HERE

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
fig.suptitle('Lab 3 — Benchmark Report', fontsize=14)

# TODO: Plot latency chart in axes[0]
# TODO: Plot throughput chart in axes[1]

plt.tight_layout()
plt.savefig('lab3_benchmark.png', dpi=150, bbox_inches='tight')
plt.show()
print('Saved: lab3_benchmark.png')

In [None]:
# ── Quantized Model Comparison (Deliverable 4 prereq) ────────────────────────

# TODO: Apply dynamic quantization to the model
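# Hint: torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
# Note: dynamically quantized layers run on CPU, so quantize a CPU copy of the model
#       rather than a model that currently lives on the GPU.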
# YOUR CODE HERE
model_quantized = None  # TODO: replace

# TODO: Compare sizes
size_original  = get_model_size_mb(model)
size_quantized = get_model_size_mb(model_quantized) if model_quantized is not None else 0.0

# TODO: Compare latency (use batch size 1 on CPU)
model_cpu = model.to('cpu')
dummy_cpu = torch.randn(1, 3, 128, 128)
lat_orig  = measure_latency(model_cpu, dummy_cpu)
lat_quant = {'p50_ms': 0.0, 'p95_ms': 0.0, 'p99_ms': 0.0, 'mean_ms': 0.0}  # TODO: measure quantized

# TODO: Evaluate accuracy of both models on test set
test_dataset = ImageFolder(TEST_PATH, transform=transform)
test_loader  = DataLoader(test_dataset, batch_size=32, shuffle=False, num_workers=0)
acc_original  = 0.0  # TODO: evaluate model accuracy
acc_quantized = 0.0  # TODO: evaluate model_quantized accuracy

# TODO: Print comparison table
print('=== Quantization Comparison ===')
print(f'Original   — Size: {size_original:.2f}MB, Acc: {acc_original:.2f}%, p50: {lat_orig["p50_ms"]:.2f}ms')
print(f'Quantized  — Size: {size_quantized:.2f}MB, Acc: {acc_quantized:.2f}%, p50: {lat_quant["p50_ms"]:.2f}ms')
model = model.to(DEVICE)  # restore

---
## Deliverable 2: Security Checklist

Build and test the secure API, then complete the security assessment.
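
To make the authentication step less of a black box, the sketch below shows roughly what an HS256 token with an `exp` claim looks like and how verification can fail. It uses only the standard library and is an illustration, not the lab solution: in the notebook you should use `jwt.encode` and `jwt.decode` from `jose`, and the names `sign_token` and `verify_token` are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that was stripped during encoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"),
                    hashlib.sha256).digest()


def sign_token(claims: dict, secret: str, ttl_seconds: int,
               now: Optional[float] = None) -> str:
    """Build header.payload.signature with an expiry claim."""
    now = time.time() if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = dict(claims, exp=int(now + ttl_seconds))
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_signature(signing_input, secret))}"


def verify_token(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Return the claims, or raise ValueError if the token is invalid or expired."""
    now = time.time() if now is None else now
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = _signature(f"{header_b64}.{payload_b64}", secret)
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] <= now:
        raise ValueError("token expired")
    return payload


token = sign_token({"sub": "labuser"}, "demo-secret", ttl_seconds=60)
print(verify_token(token, "demo-secret")["sub"])
```

In the API, both failure cases (bad signature and expired token) would map to the same HTTP 401 response, so a client cannot tell which check rejected it.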

In [None]:
# ── Deliverable 2: Security Checklist ────────────────────────────────────────

# --- JWT Setup ---
SECRET_KEY = 'lab3-demo-secret-key'
ALGORITHM  = 'HS256'
oauth2_scheme = OAuth2PasswordBearer(tokenUrl='token')

FAKE_USERS_DB = {
    'labuser': {'username': 'labuser',
                'hashed_password': _bcrypt.hashpw(b'labpass', _bcrypt.gensalt()),
                'disabled': False}
}

# TODO: Implement create_access_token(data, expires_delta)
# Hint: follow the pattern from Part 2 or api_secure.py
def create_access_token(data: dict, expires_delta: Optional[timedelta] = None) -> str:
    # YOUR CODE HERE
    pass  # TODO


# TODO: Implement get_current_user(token) as a FastAPI dependency
# It should raise HTTP 401 if the token is invalid or expired
async def get_current_user(token: str = Depends(oauth2_scheme)):
    # YOUR CODE HERE
    pass  # TODO


# --- Rate Limiter ---
# TODO: Implement RateLimiter class with is_allowed(client_id) method
# Use a sliding window (defaultdict of timestamp lists)
class RateLimiter:
    def __init__(self, max_requests=5, window_seconds=60):
        self.max_requests   = max_requests
        self.window_seconds = window_seconds
        self._requests = defaultdict(list)

    def is_allowed(self, client_id: str) -> bool:
        # TODO: prune old timestamps, check limit, record new timestamp
        # YOUR CODE HERE
        return True  # TODO: replace with real logic

    def reset(self):
        self._requests.clear()


rate_limiter = RateLimiter(max_requests=3, window_seconds=60)

# --- Build the secure FastAPI app ---
class Token(BaseModel):
    access_token: str
    token_type: str

class PredResponse(BaseModel):
    prediction: str
    confidence: float
    model_version: str
    latency_ms: float

secure_app = FastAPI(title='Lab3 Secure API')

@secure_app.post('/token', response_model=Token)
async def login(form_data: OAuth2PasswordRequestForm = Depends()):
    user = FAKE_USERS_DB.get(form_data.username)
    if not user or not _bcrypt.checkpw(form_data.password.encode('utf-8'), user['hashed_password']):
        raise HTTPException(status_code=401, detail='Invalid credentials')
    token = create_access_token({'sub': user['username']}, timedelta(minutes=30))
    return {'access_token': token, 'token_type': 'bearer'}

@secure_app.get('/health')
async def health(): return {'status': 'healthy'}

# TODO: Implement the /predict endpoint with:
#   - JWT authentication (Depends(get_current_user))
#   - Rate limiting (check rate_limiter.is_allowed)
#   - File validation (extension, corruption, size)
#   - Model inference
@secure_app.post('/predict', response_model=PredResponse)
async def predict(
    request: Request,
    file: UploadFile = File(...),
    current_user: dict = Depends(get_current_user),
):
    # TODO: Implement prediction endpoint
    # YOUR CODE HERE
    raise HTTPException(status_code=501, detail='Not implemented yet')


secure_client = TestClient(secure_app)
print('Secure app skeleton ready. Complete the TODOs above.')

In [None]:
# ── Security Verification Tests ───────────────────────────────────────────────

print('Running security verification...')
results = {}

# Test 1: Unauthenticated request
buf, fname = create_test_image()
r = secure_client.post('/predict', files={'file': (fname, buf, 'image/png')})
results['JWT blocks unauthenticated'] = r.status_code == 401

# Test 2: Wrong password
r = secure_client.post('/token', data={'username': 'labuser', 'password': 'wrongpass'})
results['Auth rejects wrong password'] = r.status_code == 401

# Test 3: Valid login
r = secure_client.post('/token', data={'username': 'labuser', 'password': 'labpass'})
results['Valid login returns token'] = r.status_code == 200
token = r.json().get('access_token', '') if r.status_code == 200 else ''
headers = {'Authorization': f'Bearer {token}'}

# Test 4: Authenticated predict
if token:
    buf, fname = create_test_image()
    r = secure_client.post('/predict', headers=headers, files={'file': (fname, buf, 'image/png')})
    results['Authenticated predict works'] = r.status_code == 200
else:
    results['Authenticated predict works'] = False

# Test 5: Rate limiting
rate_limiter.reset()
statuses = []
for _ in range(5):  # max=3, so last 2 should be 429
    buf, fname = create_test_image()
    r = secure_client.post('/predict', headers=headers, files={'file': (fname, buf, 'image/png')})
    statuses.append(r.status_code)
results['Rate limit returns 429'] = 429 in statuses

# Print results
print('\nSecurity Verification Results:')
for check, passed in results.items():
    status = '✓ PASS' if passed else '✗ FAIL'
    print(f'  {status}  {check}')
all_pass = all(results.values())
print(f'\nOverall: {"ALL PASSED" if all_pass else "SOME FAILED — review TODOs"}')

In [None]:
# ── Security Checklist ────────────────────────────────────────────────────────
# TODO: For each item, set True (implemented) or False (not yet)
# Base your answers on the secure_app you built above

security_checklist = [
    # (description, implemented: True/False, your_notes)
    ('JWT Authentication on /predict',     None,  'TODO: set True or False'),
    ('Rate limiting (sliding window)',      None,  'TODO: set True or False'),
    ('File type validation',                None,  'TODO: set True or False'),
    ('Image size validation (min 32x32)',   None,  'TODO: set True or False'),
    ('Empty file handling',                 None,  'TODO: set True or False'),
    ('No secrets hardcoded in source',      None,  'TODO: set True or False — check your SECRET_KEY'),
    ('Error messages do not expose stack',  None,  'TODO: set True or False'),
    ('HTTPS in production deployment',      None,  'TODO: set True or False'),
]

print('\n=== Security Checklist ===')
passed = sum(1 for _, done, _ in security_checklist if done is True)
for item, done, note in security_checklist:
    if done is None:
        status = '? PENDING'
    elif done:
        status = '✓ DONE'
    else:
        status = '✗ TODO'
    print(f'  {status}  {item:<40}  {note}')
total = len(security_checklist)
print(f'\nScore: {passed}/{total} items completed')

---
## Deliverable 3: Automated Test Scripts

Complete the three test classes. The `client` below uses a simple (non-secure) API for testing — you can also test your `secure_client` by adding auth headers.
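
As a small illustration of the test style used below, here is a pytest-style class that checks a probability vector with a tolerance instead of exact equality. It is a sketch on a toy `softmax` written for this example (a hypothetical helper, not the model's output), so the tolerance and inputs are assumptions to adapt to your own tests.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TestProbabilities:
    """pytest-style class: each test_ method makes one focused assertion."""

    def test_probabilities_sum_to_one(self):
        probs = softmax([2.0, -1.0, 0.5, 0.0, 3.2, -0.7, 1.1])
        # Exact equality is brittle for floats, so compare with a tolerance.
        assert math.isclose(sum(probs), 1.0, abs_tol=1e-6)

    def test_probabilities_in_valid_range(self):
        probs = softmax([2.0, -1.0, 0.5])
        assert all(0.0 <= p <= 1.0 for p in probs)


# Run the methods directly, the same way the notebook's test runner does.
suite = TestProbabilities()
suite.test_probabilities_sum_to_one()
suite.test_probabilities_in_valid_range()
print("toy probability tests passed")
```

Under pytest itself, `pytest.approx(1.0, abs=1e-4)` expresses the same tolerance check. A looser tolerance is usually appropriate for values that pass through float32 tensors and JSON serialization.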

In [None]:
# ── Simple API for tests (no auth — focus on test writing) ───────────────────
simple_app = FastAPI(title='Lab3 Test API')

@simple_app.get('/health')
async def s_health(): return {'status': 'healthy', 'version': '3.0'}

@simple_app.post('/predict')
async def s_predict(file: UploadFile = File(...)):
    start = time.perf_counter()  # monotonic, high-resolution clock for latency
    # Validate the file extension before reading the payload
    ext = Path(file.filename).suffix.lower() if file.filename else ''
    if ext not in {'.jpg', '.jpeg', '.png', '.gif', '.bmp'}:
        raise HTTPException(status_code=400, detail=f'Invalid file type: {ext}')
    contents = await file.read()
    if not contents:
        raise HTTPException(status_code=400, detail='Empty file')
    # Reject corrupt or non-image payloads
    try:
        img = Image.open(io.BytesIO(contents)).convert('RGB')
    except Exception:
        raise HTTPException(status_code=400, detail='Cannot open image')
    if img.size[0] < 32 or img.size[1] < 32:
        raise HTTPException(status_code=400, detail='Image too small')
    tensor = transform(img).unsqueeze(0).to(DEVICE)
    with torch.no_grad():
        out = model(tensor)
        probs = torch.softmax(out, dim=1)
        conf, idx = torch.max(probs, dim=1)
    lat = (time.perf_counter() - start) * 1000
    class_probs = {CLASS_NAMES[i]: float(probs[0, i]) for i in range(NUM_CLASSES)}
    return {'prediction': CLASS_NAMES[idx.item()], 'confidence': round(float(conf.item()), 4),
            'class_probabilities': class_probs, 'model_version': 'v3.0-lab', 'latency_ms': round(lat, 2)}

client = TestClient(simple_app)
print('Simple test API ready.')

In [None]:
# ── Deliverable 3: Test Class 1 — API Response Tests ─────────────────────────

class TestAPIResponse:
    """Tests for API endpoint correctness and error handling."""

    def test_health_endpoint_returns_200(self):
        # TODO: Call GET /health and assert status_code == 200
        # YOUR CODE HERE
        raise NotImplementedError('TODO: implement this test')

    def test_predict_returns_200_for_valid_image(self):
        # TODO: Create a test image, POST to /predict, assert 200
        # YOUR CODE HERE
        raise NotImplementedError('TODO: implement this test')

    def test_predict_response_has_prediction_field(self):
        # TODO: POST valid image, assert 'prediction' key in json response
        # AND assert prediction is in CLASS_NAMES
        # YOUR CODE HERE
        raise NotImplementedError('TODO: implement this test')

    def test_predict_class_probabilities_sum_to_one(self):
        # TODO: POST valid image, sum class_probabilities values, assert ≈ 1.0
        # YOUR CODE HERE
        raise NotImplementedError('TODO: implement this test')

    def test_invalid_file_type_returns_400(self):
        # TODO: POST a .txt file, assert status_code == 400
        # YOUR CODE HERE
        raise NotImplementedError('TODO: implement this test')

    def test_corrupt_image_returns_400(self):
        # TODO: POST random bytes as .png, assert status_code == 400
        # YOUR CODE HERE
        raise NotImplementedError('TODO: implement this test')

    def test_too_small_image_returns_400(self):
        # TODO: POST a 16x16 image, assert status_code == 400
        # YOUR CODE HERE
        raise NotImplementedError('TODO: implement this test')


print('TestAPIResponse defined (complete the TODO methods).')

In [None]:
# ── Test Class 2 — Accuracy Threshold Tests ──────────────────────────────────

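# NOTE: With random weights, expected accuracy is ≈ 14% (1/7 chance level, assuming balanced classes)
# The threshold is set to 10% so the test can pass with the untrained demo model;
# on an imbalanced test set, random-weight accuracy may still fall below it.
# In production with a trained model: set ACCURACY_THRESHOLD = 70.0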
ACCURACY_THRESHOLD = 10.0

class TestAccuracyThreshold:
    """Model quality gate tests — block deployment if accuracy is too low."""

    def test_overall_accuracy_above_threshold(self):
        # TODO: Load ImageFolder from TEST_PATH, create DataLoader,
        #       evaluate model accuracy, assert >= ACCURACY_THRESHOLD
        # YOUR CODE HERE
        raise NotImplementedError('TODO: implement this test')

    def test_model_is_deterministic(self):
        # TODO: Call /predict 5 times with the same image
        #       Assert all predictions are identical
        # YOUR CODE HERE
        raise NotImplementedError('TODO: implement this test')

    def test_confidence_values_in_valid_range(self):
        # TODO: Call /predict, assert 0.0 <= confidence <= 1.0
        # YOUR CODE HERE
        raise NotImplementedError('TODO: implement this test')


print('TestAccuracyThreshold defined (complete the TODO methods).')

In [None]:
# ── Test Class 3 — Latency Threshold Tests ───────────────────────────────────

SINGLE_LATENCY_MS  = 200   # ms — conservative SLA for CPU demo
LATENCY_P99_SLA_MS = 500   # ms — p99 across 50 requests

class TestLatencyThreshold:
    """Performance SLA tests — block deployment if latency is too high."""

    def test_single_request_under_sla(self):
        # TODO: Time a single /predict call (wall-clock)
        #       Assert elapsed_ms < SINGLE_LATENCY_MS
        # YOUR CODE HERE
        raise NotImplementedError('TODO: implement this test')

    def test_p99_latency_under_sla(self):
        # TODO: Make 50 requests, collect latencies
        #       Compute p99 = np.percentile(latencies, 99)
        #       Assert p99 < LATENCY_P99_SLA_MS
        # YOUR CODE HERE
        raise NotImplementedError('TODO: implement this test')

    def test_api_reported_latency_positive(self):
        # TODO: POST image, assert response.json()['latency_ms'] > 0
        # YOUR CODE HERE
        raise NotImplementedError('TODO: implement this test')


print('TestLatencyThreshold defined (complete the TODO methods).')

In [None]:
# ── Run All Tests ─────────────────────────────────────────────────────────────

def run_test_class(test_class):
    instance = test_class()
    results  = []
    methods  = [m for m in dir(instance) if m.startswith('test_')]
    for name in sorted(methods):
        try:
            getattr(instance, name)()
            results.append((name, 'PASS', None))
        except (AssertionError, NotImplementedError, TypeError) as e:
            results.append((name, 'FAIL/TODO', str(e)[:60]))
        except Exception as e:
            results.append((name, 'ERROR', str(e)[:60]))
    passed = sum(1 for _, s, _ in results if s == 'PASS')
    return passed, len(results) - passed, results

print('=' * 65)
print('  Lab 3 — Test Results')
print('=' * 65)
total_p = total_f = 0
for cls in [TestAPIResponse, TestAccuracyThreshold, TestLatencyThreshold]:
    p, f, r = run_test_class(cls)
    total_p += p; total_f += f
    status = '✓ ALL PASS' if f == 0 else f'✗ {f} FAIL/TODO'
    print(f'\n  {cls.__name__}: {p}/{p+f} passed  [{status}]')
    for name, state, err in r:
        icon = '  ✓' if state == 'PASS' else '  ✗'
        print(f'{icon} {name}')
        if err: print(f'       → {err}')
print(f'\n  TOTAL: {total_p} passed, {total_f} fail/todo')
print('=' * 65)
print('\nComplete all TODO methods to get 13/13 passing.')

---
## Final: Production Checklist

Fill in the `status` column for each item based on your work above.

In [None]:
# ── Final Checklist ───────────────────────────────────────────────────────────
# TODO: Update each status to 'PASS', 'FAIL', or 'PARTIAL' based on your work

checklist = [
    # (Category, Item, Status, Notes)
    ('Performance',  'Latency measured (p50/p95/p99)',      'TODO', ''),
    ('Performance',  'Throughput measured (img/s)',          'TODO', ''),
    ('Performance',  'Dynamic quantization applied',         'TODO', ''),
    ('Performance',  'Benchmark chart saved (PNG)',          'TODO', ''),
    ('Security',     'JWT authentication on /predict',       'TODO', ''),
    ('Security',     'Rate limiting (HTTP 429)',             'TODO', ''),
    ('Security',     'Input validation (type/size)',         'TODO', ''),
    ('Security',     'Security checklist 6/8 completed',    'TODO', ''),
    ('Testing',      'API response tests written (7)',       'TODO', ''),
    ('Testing',      'Accuracy threshold test written',      'TODO', ''),
    ('Testing',      'Latency SLA test written',             'TODO', ''),
    ('Testing',      'All tests pass',                       'TODO', ''),
    ('Quantization', 'Size comparison table shown',          'TODO', ''),
    ('Quantization', 'Accuracy delta computed',              'TODO', ''),
]

df_checklist = pd.DataFrame(checklist, columns=['Category', 'Item', 'Status', 'Notes'])
df_checklist.index += 1

print('=== Production Testing Checklist ===')
print(df_checklist.to_string())

passed_items = (df_checklist['Status'] == 'PASS').sum()
print(f'\nScore: {passed_items}/{len(checklist)} items PASS')
print('Replace TODO with PASS/FAIL/PARTIAL as you complete each item.')

---
## Submission Requirements

Before submitting, verify:

- [ ] All `# TODO:` cells are completed with working code
- [ ] `lab3_benchmark.png` was generated successfully
- [ ] Security verification shows all 5 checks passing
- [ ] All 13 test methods pass in the test runner output
- [ ] Quantization comparison table is filled in
- [ ] Final checklist shows at least 10/14 items as PASS

---

**Good luck!** Refer to the Part notebooks for guidance. The solution notebook (`Lab_3_Production_Testing_Solution.ipynb`) is available after submission.