# Lab 6, Module 0: What Is Saliency?

**Estimated time:** 5 minutes

---

## **Opening: The "Why" Behind AI Decisions**

Imagine this scenario:

- **A doctor** reviews an X-ray and diagnoses pneumonia. When asked, they can point to the specific cloudy regions that indicate infection.
- **A loan officer** denies a mortgage application. By law, they must explain which factors (income, credit score, debt ratio) led to the decision.
- **An AI system** analyzes the same X-ray and predicts pneumonia with 95% confidence. But when asked "why?", it can't answer.

**This is a problem.**

When AI systems make important decisions (medical diagnoses, loan approvals, criminal sentencing recommendations, hiring decisions), we need to understand **what they're looking at** and **what drives their predictions**.

That's where **saliency maps** come in.

---

## 📘 **What Is a Saliency Map?**

A **saliency map** is a visualization that shows which parts of an input were most important to a model's decision.

Think of it as **highlighting the parts the model "looked at" when making a prediction**.

### **Examples Across Different Input Types:**

| Input Type | What Saliency Reveals | Example |
|------------|------------------------|----------|
| **Text** | Which words drove the sentiment? | In "The movie was **excellent** and the acting was **terrible**", which words matter most for positive/negative prediction? |
| **Images** | Which pixels identified the object? | When classifying a dog photo, does the model focus on the ears, the snout, or the background? |
| **Tabular Data** | Which features influenced the decision? | For loan approval, does the model rely on income, credit score, or... zip code? |

**The key insight:** Saliency maps let us **peek inside the black box** and see which features the model relies on.

---

## 🧠 **How Do Saliency Maps Work? (Intuition)**

The core idea is simple:

> **If I change or remove a part of the input, how much does the prediction change?**

### **Method 1: Masking/Perturbation** (Module 1 & 3)
- Remove one word at a time from a sentence
- See how much the sentiment score drops
- Words that cause big drops are "important"

Example:
- Original: "The movie was **excellent**" → 95% positive
- Remove "excellent": "The movie was" → 50% positive
- **Importance of "excellent"**: 45 percentage points (95% − 50%)
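The word-removal procedure can be sketched in a few lines of Python. This is a minimal sketch, assuming a hypothetical toy scorer (`toy_positive_probability`, with made-up word weights) in place of the trained classifier you will build in Module 1; the masking loop itself does not care which model you plug in.

```python
import math

# Hypothetical word weights standing in for a trained sentiment model (Module 1).
WORD_WEIGHTS = {"excellent": 2.5, "terrible": -2.5, "movie": 0.1}

def toy_positive_probability(words):
    """Hypothetical classifier: sigmoid of the summed word weights."""
    score = sum(WORD_WEIGHTS.get(w.lower(), 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-score))

def masking_saliency(sentence, predict):
    """Return (word, importance) pairs, where importance is the drop in
    P(positive) when that single word is removed from the sentence."""
    words = sentence.split()
    base = predict(words)
    results = []
    for i, word in enumerate(words):
        masked = words[:i] + words[i + 1:]  # the sentence without word i
        results.append((word, base - predict(masked)))
    return results

for word, drop in masking_saliency("The movie was excellent", toy_positive_probability):
    print(f"{word:>10}: {drop:+.3f}")
```

A positive importance means removing the word lowers the positive score. A negative importance (for example, "terrible" in a negative sentence) means the word was pushing the prediction toward negative.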

### **Method 2: Gradients** (Module 2)
- For image models, we can compute the **gradient** of the class score with respect to each pixel: how much the score would change if that pixel changed slightly
- Pixels with large gradient magnitudes are "important"
- Plotting these magnitudes creates a heatmap showing where the model focuses
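Here is a minimal sketch of gradient saliency on a hypothetical tiny network (4 input "pixels", 2 tanh hidden units, 1 class score; all weights are invented). The gradient is computed by hand with the chain rule so every step is visible; a framework such as PyTorch would compute the same quantity with automatic differentiation. Taking the absolute value is one common convention for building the heatmap.

```python
import math

# Hypothetical tiny "image model": 4 pixels -> 2 tanh hidden units -> 1 class score.
W = [[0.9, -0.2, 0.0, 0.1],
     [0.3, 0.8, -0.5, 0.0]]
b = [0.0, 0.1]
v = [1.5, -0.7]

def pre_activations(x):
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def class_score(x):
    return sum(vk * math.tanh(p) for vk, p in zip(v, pre_activations(x)))

def gradient_saliency(x):
    """|d score / d x_j| for each pixel j, via the chain rule (backprop by hand)."""
    pre = pre_activations(x)
    # d score / d pre_k = v_k * (1 - tanh(pre_k)^2), the derivative of tanh
    dpre = [vk * (1.0 - math.tanh(p) ** 2) for vk, p in zip(v, pre)]
    # d score / d x_j = sum_k (d score / d pre_k) * W[k][j]
    grad = [sum(dpre[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]
    return [abs(g) for g in grad]

pixels = [0.5, 0.2, 0.9, 0.4]
print(gradient_saliency(pixels))
```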

**Don't worry about the math!** The key idea is:
> Saliency measures **sensitivity**: small changes to important parts → big changes to predictions

---

## ⚠️ **What Saliency Maps Are NOT**

Before we dive in, it's critical to understand the **limitations** of saliency maps:

### **1. Saliency ≠ Causality**
Just because a feature is important to the model doesn't mean it *causes* the real-world outcome.

**Example:** A hiring model might find "years of experience" salient, but that doesn't mean experience *causes* good job performance; the two might just be correlated.

### **2. Saliency ≠ Complete Explanation**
Saliency shows **what** the model uses, but not **how** it combines features or **why** it learned to focus on them.

**Example:** Knowing a medical AI focuses on "cloudy regions" in an X-ray doesn't tell you if it understands lung anatomy or just memorized patterns.

### **3. Saliency Can Reveal Spurious Correlations**
Sometimes models focus on **the wrong things**: artifacts, watermarks, or biased proxies.

**Example:** An image classifier trained on hospital X-rays might focus on the **hospital logo** in the corner rather than actual anatomy! Saliency maps would reveal this problem.
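To illustrate how a saliency map can expose such a shortcut, here is a sketch of **occlusion** saliency, which applies the masking idea from Method 1 to pixels: blank out one pixel at a time and record how much the score drops. The 4×4 "X-ray" and `shortcut_classifier` are hypothetical, built on purpose to rely on a corner logo.

```python
# Hypothetical "shortcut" classifier: it weights the top-right corner (where a
# hospital logo sits) far more than the central lung region.
def shortcut_classifier(img):
    lung_evidence = sum(img[r][c] for r in (1, 2) for c in (1, 2)) / 4
    logo_evidence = img[0][3]
    return 0.2 * lung_evidence + 0.8 * logo_evidence

def occlusion_map(img, predict, fill=0.0):
    """Saliency of each pixel = drop in score when that pixel is set to `fill`."""
    base = predict(img)
    rows, cols = len(img), len(img[0])
    saliency = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            occluded = [row[:] for row in img]  # copy so the original is untouched
            occluded[r][c] = fill
            saliency[r][c] = base - predict(occluded)
    return saliency

xray = [[0.1, 0.1, 0.1, 1.0],   # top-right 1.0 = bright hospital logo
        [0.1, 0.7, 0.6, 0.1],
        [0.1, 0.8, 0.7, 0.1],
        [0.1, 0.1, 0.1, 0.1]]

for row in occlusion_map(xray, shortcut_classifier):
    print(" ".join(f"{s:.2f}" for s in row))
```

The largest value lands on the top-right logo pixel rather than the lung region, which is exactly the kind of red flag this section describes. Real occlusion methods usually slide a larger patch instead of a single pixel.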

---

## 🎯 **Why Saliency Matters: Ethics & Responsible AI**

Explainability isn't just a technical curiosity; it's an **ethical and legal requirement** in many domains.

### **Real-World Stakes:**

#### **Medical Diagnosis**
- **Problem:** An AI predicts skin cancer from photos
- **Saliency reveals:** The model focuses on rulers/measuring tapes, which appear more often in clinic photos of malignant lesions
- **Consequence:** The model learned a spurious correlation, not actual cancer indicators
- **Action:** Retrain with diverse images, remove artifacts

#### **Criminal Justice**
- **Problem:** An AI predicts recidivism risk for sentencing
- **Saliency reveals:** Zip code and neighborhood are highly salient
- **Consequence:** The model encodes systemic bias (redlining, over-policing)
- **Action:** Remove proxy features, audit for fairness

#### **Hiring Systems**
- **Problem:** An AI screens resumes
- **Saliency reveals:** Names, college names, and graduation years drive decisions
- **Consequence:** Potential discrimination by ethnicity, age, or socioeconomic status
- **Action:** Remove identifying information, focus on skills

### **Legal Requirements**
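- **GDPR (EU):** Rights around solely automated decisions, including access to "meaningful information about the logic involved" (often called a "right to explanation")
- **Equal Credit Opportunity Act (US):** Lenders must give applicants the specific reasons for a credit denial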
- **Emerging AI regulations:** Many jurisdictions are requiring explainability for high-stakes AI

**The bottom line:** If you can't explain what your model is doing, you probably shouldn't deploy it in high-stakes settings.

---

## ⚖️ **The Interpretability-Performance Tradeoff**

There's often a tension between **how well a model performs** and **how well we can understand it**:

| Model Type | Performance | Interpretability |
|------------|-------------|------------------|
| **Linear regression** | ⭐⭐ (low) | ⭐⭐⭐⭐⭐ (very high) |
| **Decision tree** | ⭐⭐⭐ (medium) | ⭐⭐⭐⭐ (high) |
| **Random forest** | ⭐⭐⭐⭐ (high) | ⭐⭐⭐ (medium) |
| **Neural network** | ⭐⭐⭐⭐⭐ (very high) | ⭐⭐ (low) |
| **Large language model (GPT-4)** | ⭐⭐⭐⭐⭐ (very high) | ⭐ (very low) |

### **The Dilemma**
- Simple models (linear regression, decision trees) are easy to understand but often underperform on complex data such as images and text
- Complex models (neural networks, transformers) perform very well but are "black boxes"
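The first bullet is easy to make concrete. For a linear model, each feature's contribution to the score is exactly its weight times its value, so the explanation can be read directly off the model. A minimal sketch, assuming a hypothetical loan-scoring model (feature names, weights, and bias are invented):

```python
# Hypothetical linear loan-scoring model: score = bias + sum(weight * feature).
WEIGHTS = {"income_k": 0.04, "credit_score": 0.01, "debt_ratio": -3.0}
BIAS = -8.0

def linear_contributions(applicant):
    """Each feature's exact contribution to the score is weight * value."""
    return {name: w * applicant[name] for name, w in WEIGHTS.items()}

applicant = {"income_k": 55, "credit_score": 680, "debt_ratio": 0.45}
contrib = linear_contributions(applicant)
score = BIAS + sum(contrib.values())

for name, value in contrib.items():
    print(f"{name:>12}: {value:+.2f}")
print(f"{'total score':>12}: {score:+.2f}")
```

These contributions are measured relative to a feature value of zero, and together with the bias they add up exactly to the score. Neural networks have no such exact decomposition, which is why approximate saliency methods are needed.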

### **Saliency as a Bridge**
Saliency methods let us:
- Use powerful models when we need high performance
- Still get **some** insight into their decision-making
- Detect when they're using problematic features
- Build trust and meet regulatory requirements

**It's not perfect understanding, but it's better than nothing.**

---

## 🔗 **Connection to Previous Labs**

Saliency builds on everything you've learned so far:

### **Lab 1: Models and Parameters**
- You learned models make predictions using parameters
- **Lab 6:** Saliency reveals **which inputs affect those predictions most**

### **Lab 2: Gradient Descent**
- You learned how gradients point in the direction of steepest change
- **Lab 6:** Saliency applies the same idea to the **inputs** instead of the parameters, using gradients to find **which input changes affect the output most**

### **Lab 3: Activation Functions**
- You learned how nonlinearities transform input space
- **Lab 6:** Because of those nonlinearities, a feature's importance can change from one input to the next, so saliency is computed **for each individual prediction**

### **Lab 4: Hidden Layers**
- You learned hidden layers create internal representations
- **Lab 6:** Saliency reveals **what aspects of the input those representations capture**

### **Lab 5: Embeddings**
- You learned how words and sentences become vectors
- **Lab 6:** Saliency shows **which words (through their embeddings) drive predictions**
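One way to get a single saliency number per word from embedding vectors is **gradient × input**: multiply the gradient of the score with respect to each word's embedding by the embedding itself, then sum over dimensions. This is one common choice among several, not necessarily the method used later in this lab. The sketch below assumes a hypothetical classifier that mean-pools 3-dimensional embeddings (all values invented); for this model the gradient with respect to every word's embedding is simply the weight vector divided by the sentence length.

```python
# Hypothetical 3-dimensional word embeddings and classifier weights.
EMBEDDINGS = {
    "the": [0.1, 0.0, 0.1],
    "acting": [0.2, 0.3, -0.1],
    "was": [0.0, 0.1, 0.0],
    "terrible": [-0.9, -0.4, 0.2],
}
CLASSIFIER_W = [1.0, 0.5, -0.2]

def sentence_score(words):
    """Mean-pool the word embeddings, then take a dot product with CLASSIFIER_W."""
    n = len(words)
    dims = len(CLASSIFIER_W)
    pooled = [sum(EMBEDDINGS[w][d] for w in words) / n for d in range(dims)]
    return sum(wd * pd for wd, pd in zip(CLASSIFIER_W, pooled))

def gradient_x_input(words):
    """Per-word saliency: sum over dims of (d score / d embedding) * embedding."""
    n = len(words)
    # For mean pooling + linear layer, d score / d e_i = CLASSIFIER_W / n for every word i.
    grad = [wd / n for wd in CLASSIFIER_W]
    return [(w, sum(g * e for g, e in zip(grad, EMBEDDINGS[w]))) for w in words]

words = "the acting was terrible".split()
for word, s in gradient_x_input(words):
    print(f"{word:>9}: {s:+.3f}")
```

For this linear model the per-word scores add up exactly to the sentence score, and "terrible" receives the most negative value.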

**Today, you're completing the picture:** Not just "how does the model work?" but "what does the model actually look at?"

---

## 📋 **What You'll Do in This Lab**

Today's lab has **5 modules**:

| Module | Focus | Time |
|--------|-------|------|
| 0 (this one) | Conceptual foundation | 5 min |
| 1 | Text saliency via word masking | 15 min |
| 2 | Image saliency via gradient visualization | 20 min |
| 3 | Tabular saliency via feature perturbation | 10 min |
| 4 | Ethics & explainability in practice | 10 min |

**Total: ~60 minutes**

By the end, you'll understand:
- How to compute saliency for different data types
- What saliency can and cannot tell you
- Why explainability is crucial for responsible AI
- How to spot when models focus on the wrong features

---

## 📝 **Questions (Q1-Q3)**

Before moving on to the hands-on modules, let's reflect on these conceptual questions. Record your answers in the **Answer Sheet**.

---

### **Q1. In your own words, what is a saliency map and why would we want one?**

*Think about: What problem does saliency solve? Why isn't just knowing a model's accuracy enough?*

**Record your answer in the Answer Sheet.**

---

### **Q2. Give an example of a real-world application where model explainability is ethically important.**

*Think about: Where are the stakes high? Where might bias or spurious correlations cause harm?*

**Record your answer in the Answer Sheet.**

---

### **Q3. Why might a saliency map not tell the whole story about how a model makes decisions?**

*Think about: What are the limitations we discussed? What can saliency show, and what can't it show?*

**Record your answer in the Answer Sheet.**

---

## ✅ Module 0 Complete!

You now understand:
- **What saliency maps are** (visualizations of input importance)
- **Why they matter** (ethics, debugging, trust, legal requirements)
- **Their limitations** (not causality, not complete explanations)
- **The interpretability tradeoff** (simple models vs. powerful models)

**Ready for hands-on work?**

Move on to **Module 1: Text Saliency with Word Masking**, where you'll train a sentiment classifier and see which words drive its predictions!

---