# Tier B: ML Quickfire Cards

**Goal**: 60-second automatic answers for credibility questions.

These are not "gotcha" questions; interviewers use them to verify you've done real ML work.  
Memorize the ONE-LINER, understand the CONTEXT.

---

## Card 1: Leakage vs Overfitting

### One-Liner
> **Leakage** = using future/test info during training (data problem).  
> **Overfitting** = model memorizes training data, fails on new data (model problem).

### How to Detect
| Issue | Detection |
|-------|-----------|
| Leakage | Suspiciously high validation score, drops in production |
| Overfitting | Training acc >> validation acc (gap grows with complexity) |
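
The overfitting check in the table can be sketched as follows. This is a minimal illustration assuming scikit-learn; the synthetic dataset, the tree depths, and the `gaps` dictionary are illustrative choices, not part of the card.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise, so memorizing the training set hurts generalization.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

gaps = {}
for depth in (2, 5, None):  # None = grow the tree until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    train_acc = tree.score(X_train, y_train)
    val_acc = tree.score(X_val, y_val)
    gaps[depth] = train_acc - val_acc
    print(f"max_depth={depth}: train={train_acc:.3f} val={val_acc:.3f} gap={gaps[depth]:.3f}")
```

A gap that widens as `max_depth` increases is the overfitting signature; leakage would instead show up as a validation score that looks too good for the problem.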

### Quick Fixes
- **Leakage**: Audit feature pipeline, check temporal splits, remove target-derived features
- **Overfitting**: Regularization, dropout, early stopping, more data, simpler model
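
Two of the leakage fixes (keeping preprocessing inside the training fold and using temporal splits) can be sketched like this. It assumes scikit-learn and rows already sorted by time; the random data and variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))  # assumption: rows are ordered oldest to newest
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# The scaler lives inside the pipeline, so each fold fits it on training rows only;
# validation rows never influence the scaling statistics.
pipe = make_pipeline(StandardScaler(), LogisticRegression())

# TimeSeriesSplit always validates on rows that come after the training rows.
cv = TimeSeriesSplit(n_splits=5)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")

no_future_rows = all(train_idx.max() < val_idx.min() for train_idx, val_idx in cv.split(X))
print(scores.round(3), no_future_rows)
```

Fitting the scaler on the full dataset before splitting is a common, subtle form of leakage; the pipeline form avoids it by construction.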


## Card 2: ROC-AUC vs PR-AUC

### One-Liner
> **ROC-AUC**: Use when classes are balanced, care about overall ranking.  
> **PR-AUC**: Use when positive class is rare (imbalanced), care about precision at different recalls.

### The Key Insight
ROC-AUC can look great (0.95+) even when precision on the rare positive class is poor, because the false positive rate is diluted by the large number of true negatives.  
PR-AUC is harsh and honest about performance on the minority class.

### Decision Rule
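```python
def pick_metric(positive_rate: float) -> str:
    # Rule of thumb, not a hard cutoff; positive_rate is a fraction (0.10 = 10%).
    return "PR-AUC" if positive_rate < 0.10 else "ROC-AUC"
```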

### Visual Intuition
- **ROC**: TPR vs FPR (can hide poor precision in imbalanced data)
- **PR**: Precision vs Recall (directly shows what you care about)
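
To see the two metrics disagree, here is a small simulation assuming scikit-learn and a 1% positive rate. The score distributions are made up for illustration, and `average_precision_score` is used as a common estimate of PR-AUC.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
n_neg, n_pos = 99_000, 1_000  # 1% positive rate
y_true = np.r_[np.zeros(n_neg), np.ones(n_pos)]
# Hypothetical model scores: positives score higher on average, with overlap.
scores = np.r_[rng.normal(0.0, 1.0, n_neg), rng.normal(2.0, 1.0, n_pos)]

roc_auc = roc_auc_score(y_true, scores)
pr_auc = average_precision_score(y_true, scores)
baseline = n_pos / (n_neg + n_pos)  # PR-AUC of a random ranker equals the positive rate
print(f"ROC-AUC={roc_auc:.3f}  PR-AUC={pr_auc:.3f}  random PR-AUC baseline={baseline:.3f}")
```

The same ranking produces a high ROC-AUC and a much lower PR-AUC, because even a small false positive rate on 99,000 negatives swamps the 1,000 positives.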


## Card 3: Calibration

### One-Liner
> **Calibration** = of all the cases where the model says "70% probability", about 70% should turn out positive.

### Why It Matters
- Uncalibrated models give **rankings**, not real probabilities
- Critical for: risk assessment, decision thresholds, combining predictions

### How to Check
- **Reliability diagram**: Plot predicted prob vs actual frequency in bins
- Perfect calibration = diagonal line
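
The numbers behind a reliability diagram can be computed with scikit-learn's `calibration_curve`. This is a sketch on synthetic data; Gaussian Naive Bayes is chosen only because it tends to be over-confident when features are correlated.

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=5000, n_features=20, n_informative=2,
                           n_redundant=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

probs = GaussianNB().fit(X_train, y_train).predict_proba(X_val)[:, 1]

# frac_pos[i] is the observed positive rate in bin i; mean_pred[i] is the mean
# predicted probability in that bin. Empty bins are dropped.
frac_pos, mean_pred = calibration_curve(y_val, probs, n_bins=10)

for p, f in zip(mean_pred, frac_pos):
    print(f"predicted={p:.2f}  observed={f:.2f}  gap={abs(p - f):.2f}")
```

Plotting `mean_pred` against `frac_pos` gives the reliability diagram; points far from the diagonal mark the probability ranges where the model is over- or under-confident.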

### How to Fix
| Method | When to Use |
|--------|-------------|
| Platt Scaling | Binary, need sigmoid fit on validation set |
| Isotonic Regression | More flexible, needs more data |
| Temperature Scaling | Neural nets, single parameter |
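
Temperature scaling is simple enough to sketch directly: divide the logits by one scalar `T` chosen to minimize negative log-likelihood on a validation set. The version below assumes NumPy and SciPy; the helper names (`nll`, `fit_temperature`) and the simulated over-confident logits are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nll(temperature, logits, labels):
    """Mean negative log-likelihood of softmax(logits / temperature)."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def fit_temperature(val_logits, val_labels):
    """Find the single scalar T > 0 that minimizes validation NLL."""
    result = minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded",
                             args=(val_logits, val_labels))
    return result.x

# Simulated validation logits that are deliberately scaled up (over-confident).
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=2000)
base = rng.normal(size=(2000, 3))
base[np.arange(2000), labels] += 1.5
logits = 3.0 * base

T = fit_temperature(logits, labels)
print(f"fitted temperature: {T:.2f}")
```

A fitted `T` above 1 softens over-confident predictions. Because every logit is divided by the same positive scalar, the predicted class and the ranking within a sample do not change.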

### Code Snippet
```python
from sklearn.calibration import CalibratedClassifierCV

# `model` is any unfitted scikit-learn classifier; call calibrated.fit(X_train, y_train) next.
calibrated = CalibratedClassifierCV(model, method='isotonic', cv=5)
```


## Card 4: Class Imbalance Levers

### One-Liner
> Multiple tools exist; choose based on problem constraints and where in the pipeline you can intervene.

### The Toolkit (in order of preference)

| Lever | How | When |
|-------|-----|------|
| **Threshold tuning** | Adjust decision threshold post-training | Always try first, no retraining |
| **Class weights** | `class_weight='balanced'` | Built into most models, easy |
| **Evaluation metrics** | PR-AUC, F1, balanced accuracy | Ensures you measure the right thing |
| **Undersampling majority** | Random or Tomek links | Fast, loses information |
| **Oversampling minority** | SMOTE, ADASYN | Can overfit, synthetic artifacts |
| **Focal Loss** | Down-weight easy (well-classified) examples | Deep learning; verify calibration afterward |
| **Collect more data** | Active learning on minority | Best long-term, expensive |
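
The first two levers (threshold tuning and class weights) can be combined in a few lines. This sketch assumes scikit-learn and picks the threshold that maximizes F1 on a validation split; F1 is an illustrative objective, and a real project would substitute its own cost trade-off.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, weights=[0.97], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

model = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_train, y_train)
probs = model.predict_proba(X_val)[:, 1]

precision, recall, thresholds = precision_recall_curve(y_val, probs)
# precision and recall have one more entry than thresholds; drop the final (1, 0) point.
p, r = precision[:-1], recall[:-1]
f1 = 2 * p * r / np.clip(p + r, 1e-12, None)
best = int(np.argmax(f1))
best_threshold = thresholds[best]
print(f"threshold={best_threshold:.3f}  precision={p[best]:.3f}  recall={r[best]:.3f}")
```

Tune the threshold on validation data and report final numbers on a separate test set, otherwise the chosen threshold is itself a mild form of leakage.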

### Red Flags
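- Never resample (SMOTE or otherwise) the validation or test set; resample training data only
- Threshold tuning often matches or beats resampling, and it needs no retraining, so try it first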
- Metrics matter more than tricks
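
The "training data only" rule can be sketched as split first, then resample. The example below uses plain random oversampling via `sklearn.utils.resample` as a stand-in for SMOTE (which lives in the separate imbalanced-learn package); the data and variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
# Split first, so the test set keeps the real-world class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

minority = y_train == 1
n_majority = int((~minority).sum())
# Random oversampling with replacement, applied to the training split only.
X_min_up, y_min_up = resample(X_train[minority], y_train[minority],
                              replace=True, n_samples=n_majority, random_state=0)

X_train_bal = np.vstack([X_train[~minority], X_min_up])
y_train_bal = np.concatenate([y_train[~minority], y_min_up])
print(f"train positive rate={y_train_bal.mean():.2f}  test positive rate={y_test.mean():.2f}")
```

The training set ends up balanced while the test set still reflects the original imbalance, so reported metrics stay honest.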


## Card 5: Slice-Based Error Analysis

### One-Liner
> Overall metrics hide where your model fails. Slice by meaningful subgroups to find systematic errors.

### The Process
1. **Define slices**: demographics, input features, data sources, edge cases
2. **Compute metrics per slice**: precision, recall, error rate
3. **Find underperformers**: slices where performance << overall
4. **Diagnose root cause**: data quality? feature coverage? distribution shift?
5. **Apply a targeted fix**: more data for the slice, slice-specific features, or a model ensemble
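
Steps 1 to 3 of the process can be sketched with a pandas `groupby`. The evaluation log, the `device` slice, and the column names below are hypothetical.

```python
import pandas as pd

# Hypothetical evaluation log: one row per prediction.
df = pd.DataFrame({
    "device": ["mobile", "mobile", "mobile", "desktop", "desktop", "desktop", "tablet", "tablet"],
    "y_true": [1, 0, 1, 1, 0, 0, 1, 1],
    "y_pred": [1, 0, 0, 1, 0, 0, 0, 0],
})
df["correct"] = df["y_true"] == df["y_pred"]

overall_acc = df["correct"].mean()
per_slice = (
    df.groupby("device")["correct"]
      .agg(accuracy="mean", n="size")                          # metric and sample size per slice
      .assign(gap=lambda t: t["accuracy"] - overall_acc)       # distance from overall accuracy
      .sort_values("gap")                                      # worst slices first
)
print(per_slice)
worst_slice = per_slice.index[0]
```

Keeping the sample size `n` next to each metric matters: a slice with a handful of rows can look bad by chance, so check the count before diagnosing a root cause.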

### Common Slices
- User segments (new vs returning, geo, device)
- Input characteristics (length, language, category)
- Temporal (weekday/weekend, hour, seasonality)
- Label source (manual vs auto-labeled)

### Tools
- Pandas groupby + custom metrics
- `slicefinder` library
- ML monitoring platforms (Arize, Fiddler, WhyLabs)

### Interview Answer Template
> "I slice the data by [meaningful dimension], compute [metric] per slice, identify underperforming segments, then investigate whether it's a data coverage or feature gap."


---

## Quick Self-Test

Can you answer each in under 60 seconds?

1. "Your model has 0.98 AUC but performs poorly in production. What's happening?"
2. "When would you use PR-AUC over ROC-AUC?"
3. "How do you know if your model's probabilities are calibrated?"
4. "You have 1% positive rate. Walk me through your approach."
5. "How do you find where your model is failing?"

**If you hesitate on any, re-read that card.**
