# Lab 6, Module 2: Image Saliency with MobileNetV2

**Estimated time:** 20 minutes

---

## **Opening: From Words to Pixels**

In **Module 1**, you learned how to find which **words** matter most for text predictions.

Now let's ask the same question for images:

> When a model classifies an image as "Golden Retriever", **which pixels** drove that decision?

### **Why This Matters:**

Imagine a medical AI that analyzes X-rays for lung cancer. If it achieves 95% accuracy, that's impressive, but:
- **Is it looking at the lung tissue?** ✓ Good
- **Or is it focusing on the hospital logo in the corner?** ✗ Spurious correlation

Saliency maps let us **see what the model sees**, revealing both successes and failures.

---

## 📘 **How Image Saliency Works**

Unlike text (where we could just remove words), we can't simply "delete" pixels. Instead, we use **gradients** (remember them from Lab 2?).

### **The Core Idea:**

Recall from **Lab 2** that gradients measure **sensitivity to changes**:
- When training, we compute gradients with respect to **weights** (how to change parameters)
- For saliency, we compute gradients with respect to **input pixels** (which pixels affect the output most)

### **The Math (Intuition Only, No Calculus Required):**

```
Saliency(pixel) = How much would the class score change if we changed this pixel slightly?
```

- **High gradient** → small pixel change → big prediction change → **important pixel**
- **Low gradient** → small pixel change → almost no prediction change → **unimportant pixel**
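
To make the "nudge a pixel, watch the score" idea concrete, here is a minimal NumPy sketch. It assumes a made-up 3×3 "image" and a hypothetical `toy_class_score` function standing in for a real classifier; neither is part of this lab's model. It estimates sensitivity by actually nudging each pixel, which is what a gradient computes in one shot.

```python
import numpy as np

def toy_class_score(pixels):
    """Hypothetical stand-in for a classifier: the centre pixel matters most."""
    weights = np.array([[0.0, 0.1, 0.0],
                        [0.1, 2.0, 0.1],
                        [0.0, 0.1, 0.0]])
    return float(np.sum(weights * pixels))

def finite_difference_saliency(score_fn, pixels, eps=1e-3):
    """Nudge each pixel by eps and record how much the score moves."""
    base = score_fn(pixels)
    saliency = np.zeros_like(pixels)
    for idx in np.ndindex(pixels.shape):
        nudged = pixels.copy()
        nudged[idx] += eps
        saliency[idx] = abs(score_fn(nudged) - base) / eps
    return saliency

pixels = np.full((3, 3), 0.5)  # a flat grey toy image
saliency = finite_difference_saliency(toy_class_score, pixels)
print(np.round(saliency, 2))   # the centre pixel should stand out
```

Nudging pixels one at a time would be far too slow for a 224×224×3 image, which is why the lab relies on automatic differentiation instead.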

### **Visual Example:**

For an image of a dog:
- **Ears, snout, eyes:** High saliency (distinctive dog features)
- **Grass, sky background:** Low saliency (doesn't help identify "dog")

---

## 🧱 **Setup: Loading a Pre-Trained Image Classifier**

We'll use **MobileNetV2**, a lightweight image classifier:
- Pre-trained on **ImageNet** (1.4 million images, 1000 classes)
- Recognizes objects like dogs, cats, vehicles, food, etc.
- Only **14 MB** in size (perfect for Colab!)
- Runs on CPU in ~2 seconds per image

This is a production-quality model used in mobile apps and edge devices.

In [None]:
# Install required packages
!pip install tensorflow matplotlib numpy pillow -q

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
from PIL import Image
import os

print(f"✓ TensorFlow version: {tf.__version__}")
print("✓ Libraries loaded successfully!")

In [None]:
# Load pre-trained MobileNetV2
print("Loading MobileNetV2 model (this may take a few seconds to download)...")
model = MobileNetV2(weights='imagenet', include_top=True)
print("✓ Model loaded successfully!")
print(f"Model can recognize {model.output_shape[1]} different object classes.")

---

## 🖼️ **Loading and Classifying Images**

First, let's see how the model classifies images. We'll start with some example images.

### **Option 1: Load from URL (Built-in Examples)**

We'll provide some sample images for demonstration:

In [None]:
def load_and_preprocess_image_from_url(url, target_size=(224, 224)):
    """
    Load an image from URL and preprocess for MobileNetV2.
    """
    import urllib.request
    from io import BytesIO
    
    # Download image
    with urllib.request.urlopen(url) as url_response:
        img_data = url_response.read()
    
    # Load as PIL Image
    img = Image.open(BytesIO(img_data))
    img = img.convert('RGB')  # Ensure RGB format
    img = img.resize(target_size)
    
    # Convert to array and preprocess
    img_array = image.img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)
    img_array = preprocess_input(img_array)
    
    return img, img_array

def classify_image(img_array, model, top_k=3):
    """
    Classify an image and return top predictions.
    """
    predictions = model.predict(img_array, verbose=0)
    decoded = decode_predictions(predictions, top=top_k)[0]
    return decoded

print("✓ Image loading functions defined!")
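
For MobileNetV2, `preprocess_input` rescales pixel values from the 0 to 255 range into the -1 to 1 range. The sketch below reproduces that arithmetic in plain NumPy so you can see what the model actually receives. The helper names `scale_to_unit_range` and `undo_scaling` are illustrative, not part of Keras.

```python
import numpy as np

def scale_to_unit_range(pixels_0_255):
    """Map pixel values from [0, 255] to [-1, 1]."""
    return pixels_0_255.astype("float32") / 127.5 - 1.0

def undo_scaling(pixels_unit):
    """Map values from [-1, 1] back to displayable [0, 255] integers."""
    restored = (pixels_unit + 1.0) * 127.5
    return np.rint(np.clip(restored, 0, 255)).astype("uint8")

rgb = np.array([[[0, 127, 255]]], dtype="uint8")  # one pixel: black, mid, and full intensity channels
scaled = scale_to_unit_range(rgb)
restored = undo_scaling(scaled)

print(scaled)    # values now lie between -1 and 1
print(restored)  # back to the original integers
```

This is also why the lab keeps the original PIL image around for plotting: the preprocessed array is no longer in a range that displays directly.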

### **Example Image 1: Golden Retriever**

Let's start with a dog image:

In [None]:
# Sample dog image from a public URL
dog_url = "https://hips.hearstapps.com/hmg-prod/images/golden-retriever-royalty-free-image-506756303-1560962726.jpg"
print("Loading dog image...")
dog_img, dog_array = load_and_preprocess_image_from_url(dog_url)

# Display image
plt.figure(figsize=(6, 6))
plt.imshow(dog_img)
plt.axis('off')
plt.title("Input Image: Golden Retriever", fontsize=14)
plt.show()

# Classify
predictions = classify_image(dog_array, model)
print("\nTop 3 predictions:")
for i, (imagenet_id, label, score) in enumerate(predictions, 1):
    print(f"{i}. {label}: {score*100:.2f}%")
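
Under the hood, "top 3 predictions" just means sorting the model's 1000 output probabilities and keeping the largest. Here is a small sketch of that step, assuming a made-up four-class probability vector and illustrative labels (not real model output):

```python
import numpy as np

def top_k_classes(probabilities, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    order = np.argsort(probabilities)[::-1][:k]  # indices, highest first
    return [(labels[i], float(probabilities[i])) for i in order]

# Illustrative labels and made-up probabilities, not actual MobileNetV2 output
labels = ["golden_retriever", "Labrador_retriever", "tennis_ball", "tabby"]
probs = np.array([0.70, 0.20, 0.06, 0.04])

top3 = top_k_classes(probs, labels)
for rank, (label, p) in enumerate(top3, 1):
    print(f"{rank}. {label}: {p * 100:.2f}%")
```

The index of the top entry is what the saliency code later uses as `class_idx`.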

---

## 🔥 **Computing Saliency Maps**

Now let's compute which pixels matter most for the model's prediction.

We'll use **TensorFlow's GradientTape**, a tool that automatically computes gradients.

In [None]:
def compute_saliency_map(img_array, model, class_idx):
    """
    Compute a saliency map using the vanilla gradient method
    (absolute gradient of the class score with respect to the input pixels).
    
    Args:
        img_array: Preprocessed image (1, 224, 224, 3)
        model: Trained Keras model
        class_idx: Index of target class
    
    Returns:
        saliency_map: 2D array showing pixel importance
    """
    # Convert to tensor
    img_tensor = tf.convert_to_tensor(img_array)
    
    # Compute gradients
    with tf.GradientTape() as tape:
        tape.watch(img_tensor)
        predictions = model(img_tensor)
        target_class_score = predictions[0, class_idx]
    
    # Get gradients of target class score with respect to input image
    gradients = tape.gradient(target_class_score, img_tensor)
    
    # Compute saliency as maximum absolute gradient across color channels
    saliency = tf.reduce_max(tf.abs(gradients), axis=-1)
    saliency = saliency.numpy()[0]  # Remove batch dimension
    
    # Normalize to [0, 1] for visualization
    saliency = (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)
    
    return saliency

print("✓ Saliency computation function defined!")
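
The last two steps of `compute_saliency_map` (collapsing the color channels and rescaling to 0-1) can be tried in isolation. The sketch below uses random numbers as a stand-in for real gradients, so the values themselves mean nothing. It only shows the shapes and why the small `1e-8` term is there.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-in for gradients with shape (batch, height, width, channels)
fake_gradients = rng.normal(size=(1, 4, 4, 3))

# One value per pixel: the largest absolute gradient across R, G, B
per_pixel = np.max(np.abs(fake_gradients), axis=-1)[0]

# Min-max normalization to [0, 1] for display
normalized = (per_pixel - per_pixel.min()) / (per_pixel.max() - per_pixel.min() + 1e-8)
print(per_pixel.shape, normalized.min(), normalized.max())

# If every gradient were zero, max == min; the 1e-8 avoids dividing by zero
flat = np.zeros((4, 4))
flat_normalized = (flat - flat.min()) / (flat.max() - flat.min() + 1e-8)
print(np.isnan(flat_normalized).any())
```

Because of this rescaling, brightness is relative within one image: you cannot compare absolute brightness between two different saliency maps.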

### **Visualizing Saliency Maps**

Let's create a function to show:
1. Original image
2. Saliency heatmap
3. Overlay (saliency on top of original)

In [None]:
def visualize_saliency(original_img, img_array, model, class_idx, class_label):
    """
    Visualize saliency map with 3 views: original, heatmap, overlay.
    """
    # Compute saliency
    saliency_map = compute_saliency_map(img_array, model, class_idx)
    
    # Create figure with 3 subplots
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    
    # 1. Original image
    axes[0].imshow(original_img)
    axes[0].set_title("Original Image", fontsize=14)
    axes[0].axis('off')
    
    # 2. Saliency heatmap
    im = axes[1].imshow(saliency_map, cmap='hot', interpolation='bilinear')
    axes[1].set_title(f"Saliency Map\n(Prediction: {class_label})", fontsize=14)
    axes[1].axis('off')
    plt.colorbar(im, ax=axes[1], fraction=0.046, pad=0.04)
    
    # 3. Overlay
    axes[2].imshow(original_img)
    axes[2].imshow(saliency_map, cmap='hot', alpha=0.5, interpolation='bilinear')
    axes[2].set_title("Overlay (Original + Saliency)", fontsize=14)
    axes[2].axis('off')
    
    plt.tight_layout()
    plt.show()
    
    print(f"\n✓ Saliency map computed for class '{class_label}'")
    print("  Bright (yellow/white) regions = high saliency (important for prediction)")
    print("  Dark (black/deep red) regions = low saliency (not important)")

print("✓ Visualization function defined!")

---

## 📊 **Example 1: Dog Saliency**

Let's see which pixels matter for recognizing the Golden Retriever:

In [None]:
# Get top prediction
predictions = classify_image(dog_array, model)
top_class_id = np.argmax(model.predict(dog_array, verbose=0)[0])
top_class_label = predictions[0][1]

# Visualize saliency
visualize_saliency(dog_img, dog_array, model, top_class_id, top_class_label)

### **What to Notice:**

Look at the saliency map and overlay:
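- **Dog's face (ears, snout, eyes)** → Likely HIGH saliency (bright yellow/white)
- **Fur texture** → Moderate saliency (red/orange)
- **Background (grass, sky)** → Likely LOW saliency (black/dark red)

**If that is what you see, it makes sense:** the model is focusing on distinctive dog features, not the background. Gradient maps are often speckled, so look at where the bright pixels cluster rather than at individual dots.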

---

## 📝 **Question 9 (Observation)**

**Q9.** For the dog image, which parts of the image had the highest saliency? Does this make sense for recognizing a dog?

*Look at the bright (yellow/white) regions in the heatmap. Are they on the dog's body or the background? Which specific features (ears, face, legs, etc.) are highlighted?*

*Record your answer in the Answer Sheet.*

---

## 📊 **Example 2: Handwritten Digit**

Let's try a different type of image, a simple hand-drawn-style digit:

In [None]:
# An MNIST sample sheet is available at the URL below, but it shows many digits
# at once, so this cell does not download it. Instead, we draw a simple "8".
digit_url = "https://upload.wikimedia.org/wikipedia/commons/f/f7/MnistExamples.png"  # reference only (unused)

print("Creating digit image...")

# ImageDraw lets us draw shapes directly onto a blank PIL image
from PIL import Image, ImageDraw

# Create a simple digit image
digit_img = Image.new('RGB', (224, 224), color='white')
draw = ImageDraw.Draw(digit_img)

# Draw a large "8" shape
draw.ellipse([60, 40, 164, 120], outline='black', width=8)
draw.ellipse([60, 104, 164, 184], outline='black', width=8)

# Preprocess
digit_array = image.img_to_array(digit_img)
digit_array = np.expand_dims(digit_array, axis=0)
digit_array = preprocess_input(digit_array)

# Display
plt.figure(figsize=(5, 5))
plt.imshow(digit_img)
plt.axis('off')
plt.title("Input Image: Handwritten '8'", fontsize=14)
plt.show()

# Classify
predictions = classify_image(digit_array, model)
print("\nTop 3 predictions:")
for i, (imagenet_id, label, score) in enumerate(predictions, 1):
    print(f"{i}. {label}: {score*100:.2f}%")

print("\nNote: MobileNetV2 wasn't trained on digits, so predictions may be unexpected!")
print("But we can still compute saliency to see which pixels matter.")

In [None]:
# Visualize saliency for digit
top_class_id = np.argmax(model.predict(digit_array, verbose=0)[0])
predictions = classify_image(digit_array, model)
top_class_label = predictions[0][1]

visualize_saliency(digit_img, digit_array, model, top_class_id, top_class_label)

---

## 📝 **Question 10 (Observation)**

**Q10.** For the handwritten digit, which parts had high saliency? Why would those regions be important?

*Look at where the bright regions are. Are they on the loops? The edges? What do these regions tell you about what the model "sees"?*

*Record your answer in the Answer Sheet.*

---

## 📤 **Upload Your Own Image!**

Now it's your turn! Upload any image and see:
1. What the model classifies it as
2. Which pixels drove that classification

### **Instructions:**
1. Run the cell below
2. Click "Choose Files" and select an image from your computer
3. The image will be classified and saliency will be computed automatically

**Tips:**
- Try photos of animals, objects, vehicles, food
- MobileNetV2 recognizes 1000 classes from ImageNet
- Works best with clear, centered objects

In [None]:
# Image upload widget for Colab
from google.colab import files
from PIL import Image
import io

def process_uploaded_image():
    """
    Handle image upload and process it.
    """
    print("Please upload an image (JPG, PNG, etc.):\n")
    uploaded = files.upload()
    
    if not uploaded:
        print("No file uploaded.")
        return
    
    # Get the first uploaded file
    filename = list(uploaded.keys())[0]
    
    # Load and preprocess
    img = Image.open(io.BytesIO(uploaded[filename]))
    img = img.convert('RGB')
    img_resized = img.resize((224, 224))
    
    img_array = image.img_to_array(img_resized)
    img_array = np.expand_dims(img_array, axis=0)
    img_array = preprocess_input(img_array)
    
    # Display original
    plt.figure(figsize=(6, 6))
    plt.imshow(img)
    plt.axis('off')
    plt.title(f"Your Uploaded Image: {filename}", fontsize=14)
    plt.show()
    
    # Classify
    predictions = classify_image(img_array, model, top_k=5)
    print("\nTop 5 predictions:")
    for i, (imagenet_id, label, score) in enumerate(predictions, 1):
        print(f"{i}. {label}: {score*100:.2f}%")
    
    # Compute and visualize saliency
    print("\nComputing saliency map...\n")
    top_class_id = np.argmax(model.predict(img_array, verbose=0)[0])
    top_class_label = predictions[0][1]
    
    visualize_saliency(img_resized, img_array, model, top_class_id, top_class_label)
    
    return img, predictions

# Run the upload process
# Returns (image, predictions), or None if nothing was uploaded
upload_result = process_uploaded_image()

---

## 📝 **Questions 11-12 (Experimentation)**

**Q11.** Upload your own image using the cell above. What object did the model classify it as? What was the top prediction?

- **My uploaded image:** (describe it) _____________________
- **Top prediction:** _______________
- **Confidence:** ______%

*Record your answer in the Answer Sheet.*

---

**Q12.** For your uploaded image, which parts had the highest saliency? Does this reveal how the model recognized the object?

*Think about: Did the model focus on the right features? Or did it focus on something unexpected (background, texture, etc.)?*

*Record your answer in the Answer Sheet.*

---

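## ⚠️ **When Saliency Reveals Problems**

Sometimes saliency maps reveal that models focus on the **wrong features**. This usually happens because the model has learned a **spurious correlation**: a pattern that happens to predict the label in the training data but has nothing to do with the object itself.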

### **Real-World Example: Medical Imaging**

**Problem:** An AI trained to detect pneumonia from chest X-rays achieved 90% accuracy.

**Saliency revealed:** The model focused on:
- Hospital logos and watermarks in the corners
- Patient positioning markers
- Image quality artifacts

**Why?** Different hospitals used different imaging equipment. Certain hospitals happened to have more pneumonia cases. The model learned to recognize **hospitals** instead of **lung pathology**!

**Solution:** Retrain with more diverse data, remove watermarks, use data augmentation.
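
One rough way to flag the corner-focused behavior described above is to measure how much of a saliency map's total "mass" sits in the four corners. The sketch below is a hypothetical helper, not a standard metric, and it runs on synthetic maps rather than real X-ray saliency.

```python
import numpy as np

def corner_saliency_fraction(saliency_map, corner_size=32):
    """Fraction of total saliency that falls inside the four corner squares."""
    h, w = saliency_map.shape
    c = corner_size
    corners = (saliency_map[:c, :c].sum() + saliency_map[:c, w - c:].sum()
               + saliency_map[h - c:, :c].sum() + saliency_map[h - c:, w - c:].sum())
    total = saliency_map.sum()
    return float(corners / total) if total > 0 else 0.0

# Synthetic map 1: all saliency in the centre (what we hope to see)
centered = np.zeros((224, 224))
centered[80:144, 80:144] = 1.0

# Synthetic map 2: all saliency on a "logo" in the top-left corner
logo_focused = np.zeros((224, 224))
logo_focused[:20, :20] = 1.0

print(corner_saliency_fraction(centered))
print(corner_saliency_fraction(logo_focused))
```

For comparison, a perfectly uniform 224×224 map would score about 0.08 with these settings (4 × 32² / 224²), so values far above that suggest the corners deserve a closer look.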

### **Another Example: Husky vs. Wolf**

A famous case study found that a "Husky vs. Wolf" classifier focused on:
- **Background snow** (wolves photographed in snow)
- **Green grass** (huskies photographed in yards)

Not on the animals themselves!

---

## 📝 **Question 13 (Critical Thinking)**

**Q13.** Looking at the saliency maps you've seen, can you think of a scenario where a model might focus on the "wrong" features?

*Hint: Think about spurious correlations: when two things appear together by coincidence, not causation.*

*Examples: Backgrounds, watermarks, image quality, lighting, etc.*

*Record your answer in the Answer Sheet.*

---

## 🔗 **Comparing Text and Image Saliency**

Let's reflect on what's **similar** and **different** between Module 1 (text) and Module 2 (images):

| Aspect | Text Saliency (Module 1) | Image Saliency (Module 2) |
|--------|--------------------------|---------------------------|
| **Input type** | Words in a sentence | Pixels in an image |
| **Method** | Masking (remove words) | Gradients (compute sensitivity) |
| **Output** | Importance per word | Importance per pixel |
| **Visualization** | Bar chart | Heatmap overlay |
| **Interpretation** | Which words drive sentiment? | Which pixels drive classification? |

### **What's Similar:**
- Both measure **sensitivity**: how much the prediction changes when we change the input
- Both reveal **which features matter most**
- Both help **debug models** and **detect spurious correlations**

### **What's Different:**
- Text is **discrete** (remove whole words) vs. images are **continuous** (use gradients)
- Text has clear **semantic units** (words) vs. pixels have no inherent meaning
- Text saliency is **sparse** (few important words) vs. image saliency is **dense** (regions of pixels)
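
The two methods can also disagree about what matters. The sketch below compares them on a hypothetical four-feature linear scorer (made-up weights and inputs, not a model from either module). Masking measures what happens when a feature is removed entirely, while the gradient measures the effect of a tiny nudge.

```python
import numpy as np

# Hypothetical linear scorer: score = weights . inputs
weights = np.array([0.1, 2.0, -1.5, 0.0])
inputs = np.array([1.0, 0.5, 1.0, 3.0])

def score(x):
    return float(weights @ x)

# Masking (Module 1 style): zero out one feature at a time and measure the score change
masking_importance = []
for i in range(len(inputs)):
    masked = inputs.copy()
    masked[i] = 0.0
    masking_importance.append(abs(score(inputs) - score(masked)))
masking_importance = np.array(masking_importance)

# Gradient (Module 2 style): for a linear scorer, d(score)/d(input_i) is just weight_i
gradient_importance = np.abs(weights)

print("masking: ", masking_importance)
print("gradient:", gradient_importance)
```

Here masking ranks feature 2 highest while the gradient ranks feature 1 highest, because masking also depends on how large the input value is.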

---

## 📝 **Question 14 (Synthesis)**

**Q14.** How is image saliency different from the word saliency you explored in Module 1? What's similar?

*Think about: The methods used, the visualizations, what they reveal, and their limitations.*

*Record your answer in the Answer Sheet.*

---

## ✅ Module 2 Complete!

You've learned:
- **How gradient-based saliency works** (sensitivity to pixel changes)
- **Which pixels matter for image classification** (objects vs. backgrounds)
- **How to compute and visualize saliency heatmaps**
- **Why saliency is crucial for debugging** (detecting spurious correlations)
- **How to interpret saliency overlays** (bright = important, dark = unimportant)

**Key Insight:** Saliency reveals **what the model actually looks at**. Sometimes it's the right features, sometimes it's not!

**Next up:** Module 3, where you'll explore **tabular saliency** for structured data and discover how saliency can reveal bias in features like zip codes and demographics.

---