### Week 2 Exercise — ImageNet ResNet‑50 on Sentinel‑2 (Guided Examples)

- Kernel: "Python (geoai)"
- AOI: `data/external/aoi.geojson`
- Outputs: `figures/`, `reports/`

Sections (run top-to-bottom):
1) Imports, paths, reproducibility
2) Earth Engine init and AOI load
3) Sentinel‑2 median RGB composite
4) Define fixed land-cover points and fetch 128×128 patches
5) Load ResNet‑50 (ImageNet) + labels
6) Inference: per-sample top‑5 predictions
7) Visualization: multi-panel patches + predictions (save)
8) Grad‑CAM on one sample (save)
9) Save prediction summary CSV


# 1. Imports, paths, reproducibility
Objective:
This code cell initializes the notebook environment — importing the necessary libraries, ensuring reproducibility, and establishing consistent file paths for data, figures, and reports.

Key Components:
	•	pathlib: Modern, object-oriented way to manage file paths (safer and cleaner than string paths).
	•	geopandas, earthengine-api (ee), geemap: Tools for spatial data handling and access to Google Earth Engine datasets.
	•	torch & torchvision: Deep learning framework and model zoo (ResNet will appear later).
	•	Reproducibility seeds: Fixing the random states of Python, NumPy, and PyTorch makes results repeatable across runs (some GPU operations can still be slightly nondeterministic).
	•	find_repo_root(): A convenience function to automatically locate the repository root, supporting relative path resolution — a hallmark of reproducible research environments.
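A quick way to check that seeding works is to reseed and redraw: the same seed should replay the same numbers. This is a minimal sketch; `seed_everything` is a helper name made up for illustration, not part of any library.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Seed Python, NumPy and PyTorch RNGs (mirrors the notebook's seed calls)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # PyTorch seeds the generators on all devices, CUDA included


seed_everything(42)
first = (random.random(), np.random.rand(), torch.rand(1).item())
seed_everything(42)
second = (random.random(), np.random.rand(), torch.rand(1).item())
print(first == second)  # True: re-seeding replays the same random draws
```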


In [None]:
# 1) Imports, paths, reproducibility

# Core utilities for file handling, randomness, and basic data types
from pathlib import Path
import os, io, json, random
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import requests

# Geospatial libraries
import geopandas as gpd       # For handling vector spatial data (GeoJSON, shapefiles)
import ee, geemap             # For accessing and visualizing Earth Engine datasets

# Deep learning / model utilities
import torch
import torchvision as tv
from torchvision.models import resnet50, ResNet50_Weights  # Pretrained CNN and associated weights

# Reproducibility: set fixed seeds for Python, NumPy, and PyTorch
random.seed(42)
np.random.seed(42)
torch.manual_seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(42)

# --- Repository Path Management ---

# Function to detect repo root by looking upward for known folder names
def find_repo_root(start: Path) -> Path:
    for p in [start] + list(start.parents):
        # We assume a repo structure containing 'data' and 'figures'
        if (p / 'data').exists() and (p / 'figures').exists():
            return p
    # If nothing found, default to current working directory
    return start

# Resolve key paths
CWD = Path.cwd()                # Current working directory (where notebook is running)
REPO = find_repo_root(CWD)      # Root folder of the project/repo
DATA = REPO / 'data'            # Data folder
FIGS = REPO / 'figures'         # Folder for saving output images
REPORTS = REPO / 'reports'      # Folder for CSVs and summaries

# Define Area of Interest (AOI) path (used later for Sentinel-2 imagery)
AOI_PATH = DATA / 'external' / 'aoi.geojson'

# Ensure output directories exist (idempotent)
FIGS.mkdir(exist_ok=True, parents=True)
REPORTS.mkdir(exist_ok=True, parents=True)

# Diagnostic prints for sanity check
print('CWD:', CWD)
print('Repo root:', REPO)
print('Resolved AOI:', AOI_PATH)
print('AOI exists:', AOI_PATH.exists())

**Outcome:**
After running this cell, the environment is ready for geospatial and deep learning operations. The output folders `figures/` and `reports/` exist (they are created if missing), and the printout tells you whether the AOI GeoJSON was found.

- If `AOI exists: True`, you can proceed to Earth Engine initialization.
- If `False`, fix the AOI path before fetching Sentinel-2 data.

## 🌟 The More You Know! — ResNet‑50 and Weights

🧠 What is ResNet50?

ResNet50 (short for Residual Network, 50 layers deep) is a convolutional neural network (CNN) architecture introduced by He et al. (2015) in the paper “Deep Residual Learning for Image Recognition.”

The ResNet family was among the first architectures to successfully train very deep networks (50+ layers), thanks to skip connections (also called residual connections): shortcuts that let the signal bypass one or more layers.
These connections make very deep CNNs much easier to optimize (each block only has to learn a residual correction, and gradients can flow through the shortcuts), which enabled better accuracy with efficient training.

In short:

ResNet50 is a 50-layer deep CNN that uses residual learning to improve performance and stability.
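To make the idea of a skip connection concrete, here is a simplified residual block in PyTorch: the block computes a correction F(x) and adds the input x back before the final activation. This is a teaching sketch with a hypothetical class name (`ResidualBlock`); ResNet-50 itself stacks a three-layer “bottleneck” variant (1×1, 3×3, 1×1 convolutions) and uses projection shortcuts where the tensor shape changes.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Simplified residual block: output = relu(F(x) + x)."""

    def __init__(self, channels: int):
        super().__init__()
        # F(x): two 3x3 convolutions that keep spatial size and channel count unchanged
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection adds the untouched input back to the learned residual
        return self.relu(self.body(x) + x)


block = ResidualBlock(64)
out = block(torch.randn(1, 64, 56, 56))
print(out.shape)  # torch.Size([1, 64, 56, 56])
```

Because the shortcut is an identity, a block that learns F(x) ≈ 0 simply passes its input through, which is part of why stacking many blocks does not make optimization harder.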

🧩 How it’s used

In your notebook:

`from torchvision.models import resnet50`

You’ll later call it like this:

`model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)`

That means:
	•	You’re instantiating the ResNet50 model architecture.
	•	You’re initializing it with pretrained weights (parameters learned from training on ImageNet — a dataset of 1.2 million images across 1,000 categories).

The model then acts as a feature extractor or classifier that can recognize common objects (e.g., “dog,” “airplane,” “traffic light”) or be adapted to new domains (like satellite imagery).

⸻

🏋️ What is ResNet50_Weights?

ResNet50_Weights is an enum in torchvision that lists the available pretrained weight configurations for ResNet50.

It’s part of the multi-weight API introduced in `torchvision>=0.13`, which replaced the old `pretrained=True` syntax with something clearer and more controlled.

You can inspect it directly:

`from torchvision.models import ResNet50_Weights`

`print(ResNet50_Weights.DEFAULT)`

Commonly used options include:
	•	ResNet50_Weights.IMAGENET1K_V1: Original weights trained on ImageNet-1k (2015 baseline, about 76.1% top-1 accuracy).
	•	ResNet50_Weights.IMAGENET1K_V2: Updated weights trained with an improved recipe (about 80.9% top-1 accuracy; recommended).
	•	ResNet50_Weights.DEFAULT: Alias for the best available set (currently IMAGENET1K_V2).

When you use one, you can also access its preprocessing transforms (normalization, resizing, etc.):

`weights = ResNet50_Weights.IMAGENET1K_V2`

`preprocess = weights.transforms()`

That gives you a pipeline that prepares your input images exactly the way the model expects, ensuring consistent predictions.

⸻

🧩 TL;DR — Quick Reference

| Tool | What it is | How you use it | Why it matters |
| --- | --- | --- | --- |
| `resnet50` | CNN architecture with 50 layers and residual (skip) connections | `model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)` | Extracts visual features or performs classification |
| `ResNet50_Weights` | Pretrained-weight enum + transform helper | `weights = ResNet50_Weights.DEFAULT` | Provides pretrained parameters and image preprocessing |


# 2. Earth Engine Initialization

Objective:
This block connects your notebook to Google Earth Engine (EE) and loads the Area of Interest (AOI) GeoJSON file into both GeoPandas (for local Python-side manipulation) and Earth Engine (for server-side geospatial computation).

Key Components:
	•	ee.Initialize() / ee.Authenticate() → Log in and establish a connection to Google Earth Engine.
	•	geopandas → Reads and checks your AOI GeoJSON locally and reprojects it to WGS84 (EPSG:4326) if needed.
	•	geemap.geopandas_to_ee() → Converts the GeoDataFrame into an Earth Engine FeatureCollection for use in cloud-based image queries.
	•	geemap.Map() → Creates an interactive map widget right inside your notebook, letting you visualize your AOI and, later, image layers.

Why this matters:
Think of this cell as connecting your lab notebook to the Earth Engine satellite archive. Once this works, you can query massive datasets (e.g., Sentinel-2) without downloading gigabytes of imagery.

In [None]:
# 2) Earth Engine init and AOI load

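try:
    # Attempt to initialize an existing Earth Engine session.
    # Note: recent earthengine-api versions may require a Cloud project, e.g.
    # ee.Initialize(project='your-project-id')  (placeholder ID)
    ee.Initialize()
    print('EE initialized')
except Exception:
    # If initialization fails (e.g., first use or expired token), authenticate and re-initialize.
    print('EE not initialized; authenticating...')
    ee.Authenticate()
    ee.Initialize()
    print('EE initialized after authentication')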

# --- Load and validate Area of Interest (AOI) locally ---

# Confirm that the AOI GeoJSON file exists before proceeding
assert AOI_PATH.exists(), f"Missing AOI at {AOI_PATH}"

# Read AOI into a GeoDataFrame (local vector representation)
aoi_gdf = gpd.read_file(AOI_PATH)
assert not aoi_gdf.empty, "AOI GeoDataFrame is empty!"

# Ensure coordinate reference system (CRS) is defined; default to WGS84 if missing
if aoi_gdf.crs is None:
    aoi_gdf.set_crs(epsg=4326, inplace=True)

# Convert to standard geographic coordinates (latitude/longitude)
aoi_gdf = aoi_gdf.to_crs('EPSG:4326')

# Convert GeoDataFrame geometry to an Earth Engine feature collection
AOI_EE = geemap.geopandas_to_ee(aoi_gdf[['geometry']])

# --- Display and visualize AOI ---

# Print bounding box for context (minx, miny, maxx, maxy)
print('AOI bounds:', aoi_gdf.total_bounds)

# Create an interactive map object using geemap
Map = geemap.Map()

# Add AOI layer to the map for visualization
Map.add_gdf(aoi_gdf, layer_name='AOI')

# Center the map on the AOI (zoom level 10 is a moderate regional scale)
Map.centerObject(AOI_EE, 10)

# Display the map widget
Map

**Outcome:**
This cell verifies your Earth Engine connection and loads your AOI into both your local and cloud environments.

### 🌟 The More You Know! — Earth Engine & GeoPandas Integration

🔭 Google Earth Engine (EE)
A cloud-based planetary data platform that hosts petabytes of satellite imagery and geospatial data. Instead of downloading imagery, you send queries that Earth Engine executes server-side.
You can authenticate once (ee.Authenticate()) and then access imagery by dataset name (e.g., "COPERNICUS/S2_SR_HARMONIZED" for Sentinel-2).

🪶 GeoPandas ↔ Earth Engine Bridge (geemap)
geemap is a Python package that acts as the bridge between GeoPandas (local vector data) and Earth Engine (cloud vector/raster data).
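Under the hood, the handoff between GeoPandas and Earth Engine goes through GeoJSON. The sketch below reproduces the notebook's CRS normalization on a hypothetical AOI rectangle and serializes it to a GeoJSON dict. It does not call Earth Engine, and we are assuming (not quoting geemap's source) that `geopandas_to_ee` performs a conversion along these lines; `geemap.geojson_to_ee` is shown only as a commented-out final step.

```python
import json

import geopandas as gpd
from shapely.geometry import box

# Hypothetical AOI rectangle near the notebook's points, created without a CRS
gdf = gpd.GeoDataFrame({"name": ["aoi"]}, geometry=[box(-73.07, -41.15, -73.00, -41.11)])

# Same normalization as the notebook: assume WGS84 if the CRS is missing, then reproject
if gdf.crs is None:
    gdf = gdf.set_crs(epsg=4326)
gdf = gdf.to_crs("EPSG:4326")

geojson = json.loads(gdf.to_json())  # plain-Python GeoJSON FeatureCollection
print(geojson["type"], geojson["features"][0]["geometry"]["type"])  # FeatureCollection Polygon

# With Earth Engine initialized, the dict could be sent to the server side:
# fc = geemap.geojson_to_ee(geojson)
```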

# 3. Creating a composite RGB from Sentinel-2 data

Objective:
Retrieve and visualize a cloud-free median composite of Sentinel-2 surface reflectance imagery over your Area of Interest (AOI).

Key Components:
	•	Dataset: COPERNICUS/S2_SR_HARMONIZED — harmonized Sentinel-2 Level-2A (surface reflectance) imagery from both Sentinel-2A and 2B.
	•	Filters:
	•	Spatial: restricted to your AOI (filterBounds).
	•	Temporal: January 2020 – December 2023 (filterDate).
	•	Quality: exclude images with > 20 % cloud cover (CLOUDY_PIXEL_PERCENTAGE).
	•	Cloud Mask: Uses the QA60 band, which encodes cloud and cirrus information as bit flags.
	•	Composite: Takes the median value for each pixel across the filtered image collection, producing a representative, nearly cloud-free image.
	•	Visualization: Generates a 512-pixel thumbnail URL for quick viewing and adds the RGB layer to your interactive geemap map.

In [None]:
# 3) Sentinel-2 median RGB composite (harmonized)

# Load Sentinel-2 Surface Reflectance (harmonized) collection from Earth Engine
s2 = (
    ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')  # Level-2A SR data from Sentinel-2A/B
      .filterBounds(AOI_EE)                            # Spatial filter: only scenes intersecting AOI
      .filterDate('2020-01-01', '2023-12-31')          # Temporal filter: Jan 2020 – Dec 2023
      .filter(ee.Filter.lte('CLOUDY_PIXEL_PERCENTAGE', 20))  # Metadata filter: ≤ 20 % cloud cover
)

# --- Define a simple cloud mask function using QA60 bit flags ---
def mask_s2_sr(img):
    qa60 = img.select('QA60')          # QA60 band contains cloud/cirrus bits
    clouds = qa60.bitwiseAnd(1 << 10).eq(0)  # Bit 10 = clouds (1 = cloudy, 0 = clear)
    cirrus = qa60.bitwiseAnd(1 << 11).eq(0)  # Bit 11 = cirrus (high thin clouds)
    # Keep only pixels where both cloud and cirrus bits are clear
    return img.updateMask(clouds.And(cirrus))

# Apply the mask function to every image in the collection
s2_clean = s2.map(mask_s2_sr)

# Build a median composite of the cleaned images and select RGB bands
rgb = (
    s2_clean.median()                   # Pixel-wise median across time
            .select(['B4','B3','B2'])   # True-color bands: Red, Green, Blue
            .clip(AOI_EE)               # Clip result to AOI boundary
)

# --- Diagnostics and visualization ---

# Print number of Sentinel-2 scenes included after filtering
print('S2 count:', s2.size().getInfo())

# Generate a thumbnail URL (512 px wide) for quick visual inspection
thumb = rgb.getThumbURL({
    'min': 0, 'max': 3000, 'dimensions': 512, 'region': AOI_EE.geometry()
})
print('Thumb URL:', thumb)

# Add the RGB composite to the interactive map
Map.addLayer(rgb, {'min':0,'max':3000}, 'S2 RGB (median)')

# Optionally display the map widget in compatible environments
Map

**Outcome:**
You’ve now created a median composite image representing typical surface conditions in your AOI from 2020–2023.
	•	The console output S2 count: tells you how many individual Sentinel-2 scenes contributed to this composite.
	•	The Thumb URL provides a quick-load image you can open in a browser to confirm the color balance and cloud removal.
	•	The interactive map now displays your AOI with a largely cloud-free true-color background.

Why median?
Taking the median of each pixel stack removes outliers — mostly clouds and shadows — while preserving consistent surface features. It’s a statistical way to produce a “most typical” image over time.
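A one-pixel NumPy example shows why the median resists clouds while the mean does not, and what masking adds on top. The reflectance values are invented for illustration, and `np.nanmedian` stands in for Earth Engine's median over unmasked pixels.

```python
import numpy as np

# One pixel's red-band values over seven dates (hypothetical, 0-10000 SR scale);
# dates 4 and 6 are cloudy, so they are much brighter
stack = np.array([820, 790, 805, 6500, 810, 7200, 800], dtype=float)
print("mean:  ", round(stack.mean(), 1))  # 2532.1, dragged up by the clouds
print("median:", np.median(stack))        # 810.0, close to the clear-sky values

# Masking first (like updateMask) and then taking the median ignores flagged dates entirely
cloudy = np.array([False, False, False, True, False, True, False])
masked = np.where(cloudy, np.nan, stack)
print("masked median:", np.nanmedian(masked))  # 805.0
```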

## 🌟 The More You Know! — Sentinel‑2, QA60, and Harmonized Composites

**🛰️ Sentinel-2 & the “Harmonized” Collection**

- **Sentinel-2** is a twin-satellite system (S2A + S2B) providing 10 m–60 m resolution multispectral imagery.
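- **Harmonized SR (Surface Reflectance)**: From 2022-01-25, scenes produced with ESA processing baseline 04.00 or later have their pixel values shifted by an offset of 1000. The COPERNICUS/S2_SR_HARMONIZED collection in Earth Engine removes that shift from newer scenes, so images from before and after the change share one consistent value range. This matters for a 2020–2023 composite like this one.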

### ☁️ Cloud Masking with QA60

The QA60 band encodes quality information using **bit flags**:

| **Bit** | **Meaning** | **Mask logic** |
| --- | --- | --- |
| 10 | Cloud | keep if bit 10 = 0 |
| 11 | Cirrus | keep if bit 11 = 0 |

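Combining both conditions, `img.updateMask(clouds.And(cirrus))` masks out pixels flagged as cloud or cirrus. QA60 is a coarse 60 m mask with no cloud-shadow flag, so some contamination can remain; the median composite helps suppress it.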
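The bit tests in `mask_s2_sr` can be checked locally with NumPy on a few hypothetical QA60 values. `1 << 10` is 1024 and `1 << 11` is 2048, so a pixel is kept only when neither bit is set.

```python
import numpy as np

CLOUD_BIT = 1 << 10   # 1024: opaque clouds
CIRRUS_BIT = 1 << 11  # 2048: cirrus

# Four hypothetical QA60 values: clear, cloud, cirrus, both flags set
qa60 = np.array([0, 1024, 2048, 3072], dtype=np.uint16)
cloud_free = (qa60 & CLOUD_BIT) == 0
cirrus_free = (qa60 & CIRRUS_BIT) == 0
keep = cloud_free & cirrus_free  # same logic as clouds.And(cirrus) in mask_s2_sr
print(keep)  # [ True False False False]
```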

### 🎨 Median Composites — “Typical Earth” View

Each pixel in the composite is the **median reflectance** across all images in which that pixel was not masked.

It’s not one real date — it’s a *synthetic, representative scene*.

This technique is standard in remote sensing when producing annual or multi-year base layers.

**Analogy:**

Imagine stacking dozens of photos of the same landscape taken throughout the year.

If you take the **median color** of each pixel stack, you get a single clean image with clouds statistically “erased.”

# 4. Point Extraction

Objective:
Define and visualize a small set of representative land-cover points (Forest, Water, Urban, Agriculture) that you’ll use to extract Sentinel-2 image patches for classification and interpretability (via ResNet and Grad-CAM later on).

Key Components:
	•	PTS_COORDS: Dictionary of labeled longitude/latitude pairs representing known land-cover types.
	•	pts: Reformatted dictionary for easier access in later loops.
	•	Visualization: Each point is added to the interactive Earth Engine map for spatial verification — ensuring they fall within your AOI and represent diverse surface conditions.

Why this matters:
These labeled points are the anchors for your analysis — they link semantic meaning (“forest”, “water”) to actual imagery pixels. You’ll use them to probe how a pretrained image model (ResNet-50) interprets satellite textures across land-cover types.
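The visual map check can also be made programmatic with GeoPandas, so a mistyped coordinate fails loudly instead of silently producing an off-target patch. A minimal sketch; the AOI rectangle below is hypothetical, and in the notebook you would test against the geometry in `aoi_gdf` instead.

```python
import geopandas as gpd
from shapely.geometry import Point, box

PTS_COORDS = {
    'Forest': (-73.028383, -41.124933),
    'Water': (-73.016458, -41.137219),
    'Urban': (-73.054658, -41.124836),
    'Agriculture': (-73.046583, -41.139317),
}

# Hypothetical AOI polygon; in the notebook use aoi_gdf.geometry.iloc[0] (or the union of all rows)
aoi_polygon = box(-73.07, -41.15, -73.00, -41.11)

pts_gdf = gpd.GeoDataFrame(
    {'label': list(PTS_COORDS)},
    geometry=[Point(lon, lat) for lon, lat in PTS_COORDS.values()],  # shapely order is (x=lon, y=lat)
    crs='EPSG:4326',
)
pts_gdf['inside_aoi'] = pts_gdf.within(aoi_polygon)
print(pts_gdf[['label', 'inside_aoi']])
```

Note the `(lon, lat)` order: swapping it is the most common reason a point lands in the wrong hemisphere.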

In [None]:
# 4) Use specified Week 1 points (Forest, Water, Urban, Agriculture)

# Define dictionary of labeled coordinates (longitude, latitude)
PTS_COORDS = {
    'Forest': (-73.028383, -41.124933),
    'Water': (-73.016458, -41.137219),
    'Urban': (-73.054658, -41.124836),
    'Agriculture': (-73.046583, -41.139317),
}

# Restructure dictionary to make later access easier (adds 'lon' and 'lat' keys explicitly)
pts = {k: {'lon': lon, 'lat': lat} for k, (lon, lat) in PTS_COORDS.items()}

# Display the points for verification
print('Using fixed points (lon, lat):')
for k, v in pts.items():
    print(k, v['lon'], v['lat'])

# --- Add points to interactive map for visual check ---

for k, v in pts.items():
    # Create a point geometry for each coordinate
    point = ee.Geometry.Point([v['lon'], v['lat']])
    # Add point as a red dot layer with the land-cover label
    Map.addLayer(point, {'color':'red'}, f'{k} (fixed)')

# Display the updated map
Map

**Outcome:**
This cell defines four labeled geographic points representing distinct land-cover types.
After running it:
	•	You’ll see a printed list of coordinates for each class.
	•	The interactive map will display red dots labeled Forest, Water, Urban, and Agriculture.

These fixed reference points serve two purposes:
	1.	They ensure your extracted image chips are spatially and semantically distinct.
	2.	They form the foundation for comparing model predictions — observing how the pretrained ResNet interprets different landscapes.

In essence: you’ve now created your “ground truth seeds” for visual AI interpretation.

## 🌟 The More You Know! — Ground Truth Points and Spatial Sampling

🗺️ Why Fixed Points Matter

When testing pretrained vision models on satellite imagery, it’s critical to use consistent, representative locations.
By fixing coordinates:
	•	You remove randomness, ensuring reproducibility (each run uses the same locations).
	•	You can compare outputs across sessions, models, or preprocessing changes.

🧭 How This Ties Into Machine Learning

In your workflow, these points will act like mini labeled samples:
	•	Each coordinate → one image patch (128×128 pixels).
	•	Each patch → passed through ResNet-50 → predicted label(s).

You’ll then compare what the model “thinks” (e.g., tennis court) to what you know (e.g., urban area).
This contrast demonstrates domain shift — a central concept in Week 2.

# 4b. Fetch Patches
Objective:
Extract small, square Sentinel-2 RGB image patches around your labeled points (Forest, Water, Urban, Agriculture) using Earth Engine’s thumbnail service and store them locally in memory as PIL images.

Key Components:
	•	Patch parameters:
	•	PATCH_SIZE = 128 → pixel width and height of each extracted chip.
	•	SCALE = 20 → spatial resolution in meters per pixel (Sentinel-2 RGB bands are at 10 m; 20 m gives manageable file sizes).
	•	MAX_SAMPLES = 3 → limits how many labeled sites to fetch (useful for debugging). With four points defined, the default of 3 skips Agriculture; set it to 4 to include every point.
	•	get_patch() function:
	•	Builds a buffered bounding box around each point.
	•	Uses getThumbURL() to request a JPEG thumbnail (not full raster) from Earth Engine.
	•	Retrieves the image using Python’s requests library and loads it into a PIL.Image object for later processing.
	•	Loop over sample points: Fetches patches for the first MAX_SAMPLES labeled locations in dictionary order (Forest, Water, Urban with the default).

Why this matters:
This cell forms the bridge between remote sensing and machine learning: it turns Earth Engine imagery into local PIL images, which the next steps convert into tensors for PyTorch inference.

In [None]:
# 4b) Fetch patches

# --- Patch extraction parameters ---
PATCH_SIZE = 128        # Desired patch size (pixels per side)
SCALE = 20              # Spatial scale (meters per pixel)
MAX_SAMPLES = 3         # Max number of labeled samples to retrieve
REQUEST_TIMEOUT = 20    # Timeout for HTTP requests (seconds)

# --- Helper function: fetch one patch centered on (lon, lat) ---

def get_patch(img, lon, lat, size_m=PATCH_SIZE*SCALE, scale=SCALE):
    # Compute half-width of patch in meters (for buffer radius)
    half = size_m / 2
    
    # Reproject image to a consistent coordinate system and resolution
    proj = img.reproject(crs='EPSG:4326', scale=scale)
    
    # Define export/thumbnail parameters
    params = {
        'region': ee.Geometry.Point([lon, lat]).buffer(half).bounds(),  # square region
        'dimensions': f'{PATCH_SIZE}x{PATCH_SIZE}',                     # output size in pixels
        'format': 'jpg', 'min': 0, 'max': 3000                          # stretch values for RGB
    }
    
    # Generate a temporary download URL from Earth Engine
    url = proj.getThumbURL(params)
    
    # Retrieve the image bytes via HTTP request
    r = requests.get(url, timeout=REQUEST_TIMEOUT)
    r.raise_for_status()   # Raise error if download failed
    
    # Load bytes into a PIL Image and convert to RGB
    return Image.open(io.BytesIO(r.content)).convert('RGB')

# --- Loop through labeled points and fetch patches ---

patches = {}
for idx, (name, info) in enumerate(pts.items()):
    if idx >= MAX_SAMPLES:  # Optional limit for testing
        break
    try:
        patches[name] = get_patch(rgb, info['lon'], info['lat'])
    except Exception as e:
        print('Patch fetch failed for', name, e)

# Confirm which patches were successfully retrieved
print('Patch keys:', list(patches.keys()))

**Outcome:**

This cell creates a dictionary `patches` containing small RGB image crops centered on up to three labeled points (e.g., `{'Forest': <PIL.Image>, 'Water': <PIL.Image>, 'Urban': <PIL.Image>}`).

After running it, you should see a printout such as:

```
Patch keys: ['Forest', 'Water', 'Urban']
```

Each entry represents a 128×128 pixel Sentinel-2 thumbnail, roughly covering a 2.56 km × 2.56 km area (128 × 20 m per pixel).

These image patches are now ready for:

- Visualization (sanity checks)
- Input to your ResNet-50 model to generate top-5 ImageNet predictions and Grad-CAM explanations

If a request fails (e.g., network hiccup or invalid coordinates), the try/except block reports it without stopping execution.

## **🌟 The More You Know! — Thumbnails, Scale, and Sampling**

### **🛰️ Earth Engine Thumbnails**

getThumbURL() is Earth Engine’s efficient way to export small visualization-ready images.

Rather than downloading raw GeoTIFFs, it renders lightweight RGB JPEGs (or PNGs) using your specified visualization parameters (min, max, dimensions, region).

This makes it well suited to quick model prototyping and visualization tasks.
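The `min`/`max` visualization parameters amount to a linear stretch into 8-bit color. The function below is our own approximation of that clamp-and-scale step, for intuition only; it does not reproduce Earth Engine's exact internal rounding.

```python
import numpy as np


def linear_stretch(sr, vmin=0, vmax=3000):
    """Map surface-reflectance values to 0-255 bytes, clamping anything outside [vmin, vmax]."""
    scaled = (np.asarray(sr, dtype=float) - vmin) / (vmax - vmin)  # 0.0 at vmin, 1.0 at vmax
    return np.clip(np.round(scaled * 255), 0, 255).astype(np.uint8)


print(linear_stretch([0, 1500, 3000, 4500]))  # [  0 128 255 255]
```

With `max: 3000` on the 0–10000 reflectance scale, anything brighter than 30% reflectance (bright roofs, snow, residual cloud) saturates to white.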

### **⚖️ Scale and Patch Size**

- **Scale (m/pixel)** controls the ground resolution.
    - At 20 m/pixel, a 128×128 patch covers about **2.56 km × 2.56 km**.
    - A finer scale (e.g., 10 m) doubles the linear resolution; covering the same area then takes 256×256 pixels, four times as many, with correspondingly larger downloads.
- **Patch size (pixels)** controls how much context the model sees — a balance between local detail and landscape context.
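The footprint arithmetic is simple enough to keep as two small helpers (hypothetical names) when you experiment with other scales or patch sizes.

```python
def patch_footprint_km(patch_px: int, scale_m: float) -> float:
    """Side length (km) of a square patch of patch_px pixels at scale_m metres per pixel."""
    return patch_px * scale_m / 1000


def pixels_for_extent(extent_km: float, scale_m: float) -> int:
    """Pixels per side needed to cover extent_km at scale_m metres per pixel."""
    return round(extent_km * 1000 / scale_m)


print(patch_footprint_km(128, 20))  # 2.56
print(pixels_for_extent(2.56, 10))  # 256, i.e. 256*256 = 4x the pixels of a 128*128 patch
```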

### **💡 Why Use JPEG Thumbnails?**

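For exploratory ML and explainability work, speed matters more than spectral precision.

JPEGs load fast, use little memory, and match the 3-band RGB input that ImageNet models like ResNet-50 expect. The trade-off is lossy compression and an 8-bit stretch (here 0–3000 mapped to 0–255) that discards the original reflectance values.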

# 5. Load ResNet-50
**Objective:**
Load a pretrained ResNet-50 model from torchvision — one of the most widely used architectures for image classification — along with its associated preprocessing pipeline and label set from ImageNet-1K (1,000 everyday object categories).

**Key Components:**
	•	ResNet50_Weights.IMAGENET1K_V2 — provides pretrained model weights and the correct transforms for image normalization.
	•	Device selection — automatically uses a GPU (cuda) if available, else defaults to CPU.
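	•	model.eval() — switches the model to inference (evaluation) mode: BatchNorm layers use their stored running statistics, and dropout (if present) is disabled. It does not turn off gradient tracking; torch.no_grad() does that during inference.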
	•	weights.transforms() — returns the preprocessing pipeline that matches the ImageNet training setup (resize, crop, normalize).
	•	weights.meta['categories'] — contains the 1,000 human-readable ImageNet class names used for model predictions.
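The difference between `eval()` and `torch.no_grad()` is easy to see on a single BatchNorm layer: `eval()` changes how the layer normalizes, while only `no_grad()` stops autograd from recording operations. A small sketch on random input (it uses a local generator so it does not disturb the notebook's global seed).

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
gen = torch.Generator().manual_seed(0)      # local RNG, leaves the global seed alone
x = torch.randn(4, 3, 8, 8, generator=gen)

bn.train()
y_train = bn(x)  # normalizes with this batch's statistics (and updates running stats)
bn.eval()
y_eval = bn(x)   # normalizes with the stored running_mean / running_var

print(torch.allclose(y_train, y_eval))  # False: the two modes behave differently
print(y_eval.requires_grad)             # True: eval() alone does not stop autograd
with torch.no_grad():
    y_nograd = bn(x)
print(y_nograd.requires_grad)           # False: no_grad() is what turns tracking off
```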

**Why this matters:**
This step equips your notebook with a general-purpose visual recognition model trained on everyday imagery — perfect for exploring domain shift between photographic and satellite data.

In [None]:
# 5) Load ResNet-50 (ImageNet) + labels

# Import pretrained weight enumerator (provides weights + metadata)
from torchvision.models import ResNet50_Weights

# --- Hardware configuration ---
# Use GPU (cuda) if available, else fallback to CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Torch device:', device)

# --- Model weights and initialization ---
# Select the improved ImageNet pretrained weights (Version 2)
weights = ResNet50_Weights.IMAGENET1K_V2

# Instantiate the ResNet-50 architecture with pretrained weights
# Move model to the selected device and set to evaluation mode
model = resnet50(weights=weights).to(device).eval()

# --- Preprocessing pipeline ---
# The weight object includes the exact preprocessing transforms used during training
# (resize → center-crop → normalize to ImageNet mean/std)
preprocess = weights.transforms()

# --- Class labels ---
# Retrieve human-readable class names from ImageNet metadata (1,000 categories)
labels = weights.meta.get('categories', None)

# Diagnostic printout
print('Model loaded with', len(labels) if labels else 'unknown', 'classes')

**Outcome:**

This cell successfully loads a pretrained **ResNet-50 model** with **1,000 ImageNet categories**, ready to classify your Sentinel-2 patches.

You’ll see output similar to:

```
Torch device: cpu (or cuda)
Model loaded with 1000 classes
```

At this point:

- The variable **model** holds a fully initialized neural network on your chosen device.
- **preprocess** is a transformation function that standardizes your patches for inference.
- **labels** is a list of the 1,000 possible prediction classes (e.g., “airliner,” “broccoli,” “volcano”).

In the next step, you’ll apply preprocess to each patch, feed them through model, and interpret the predicted top-k classes.

## **🌟 The More You Know! — Transfer Learning, ImageNet, and Preprocessing**

### **🧠 What is Transfer Learning?**

Instead of training from scratch, you “transfer” knowledge from a model trained on a large, diverse dataset (ImageNet) to your own task.

ResNet-50 has already learned to detect **edges, textures, shapes, and patterns** that are general to many kinds of imagery — these features transfer surprisingly well to new visual domains like remote sensing.

> In short: we borrow a model’s *visual literacy*.

### **🏋️ ImageNet and IMAGENET1K_V2**

- **ImageNet-1K** contains **1.2 million** labeled photos in **1,000 categories** (cats, planes, churches, etc.).
- **IMAGENET1K_V2** are refined weights released with torchvision’s multi-weight API, trained with an improved recipe that gives higher ImageNet accuracy than the original V1 weights.
- Using ResNet50_Weights.IMAGENET1K_V2 ensures consistency between preprocessing, model weights, and label mapping.

### **🧩 Why weights.transforms() Matters**

Every pretrained model expects inputs scaled and normalized exactly as during training.

Calling weights.transforms() automatically returns the correct preprocessing steps:

1. Resize → 232 px shorter side (for IMAGENET1K_V2; the V1 weights use 256 px)
2. Center-crop → 224×224
3. Convert to tensor
4. Normalize with ImageNet mean and std

This alignment prevents subtle prediction errors caused by mismatched input distributions.

# 6. Inference: top‑5 predictions per sample

**Objective:**

Run **inference** with your pretrained **ResNet-50 (ImageNet)** model on each of your Sentinel-2 image patches and display the **top-5 predicted labels** (and their probabilities).

**Key Components:**

- **top5() helper function**: Converts raw model outputs (logits) into probabilities using softmax and extracts the 5 most likely classes.
- **Inference loop**:
    - Applies the correct preprocessing pipeline to each patch.
    - Performs a forward pass through the network (model(inp)).
    - Retrieves top-5 predictions and human-readable labels.
- **results list**: Stores structured prediction data (class name, labels, probabilities) for later use — like generating Grad-CAMs or saving reports.

**Why this matters:**

This is your first hands-on view of how **general-purpose image recognition** models behave when faced with **remote-sensing data** — usually revealing mismatches between what the model “sees” (e.g., “sports field”) and the real surface type (“agriculture”).

In [None]:
# 6) Inference: top-5 predictions per sample

# --- Helper function to extract top-5 predictions from logits ---

def top5(logits):
    # Apply softmax to convert logits → probabilities
    probs = torch.softmax(logits, dim=1)
    # Extract top-5 probabilities and their indices
    p, i = probs.topk(5, dim=1)
    # Remove batch dimension for convenience
    return p.squeeze(0).tolist(), i.squeeze(0).tolist()

# --- Run inference on each patch ---

results = []

for name, img in patches.items():
    # Apply ImageNet preprocessing: resize, crop, normalize
    inp = preprocess(img).unsqueeze(0).to(device)

    # Disable gradient tracking for efficiency (inference mode)
    with torch.no_grad():
        logits = model(inp)

    # Get top-5 probabilities and class indices
    p, idx = top5(logits)

    # Convert indices to human-readable labels
    labs = [labels[j] if labels else str(j) for j in idx]

    # Print the sample name and its top-5 predictions (label + probability)
    print(name, list(zip(labs, [round(x,4) for x in p])))

    # Store structured results for later analysis
    results.append({
        'class': name,
        'top5_labels': labs,
        'top5_probs': [float(x) for x in p]
    })

# Display aggregated results for all patches
results

**Outcome:**

You now have a list of top-5 model predictions for each Sentinel-2 patch.

Each patch produces:

- `top5_labels`: the most likely ImageNet classes
- `top5_probs`: corresponding probabilities (sum ≤ 1 because only top-5 shown)

This demonstrates domain shift — how pretrained models interpret out-of-distribution data using their existing visual vocabulary. The `results` list feeds forward into CSV summaries, visualization panels, and Grad-CAM analyses.

## **🌟 The More You Know! — Logits, Softmax, and Top-k Predictions**

### **🔢 Logits → Probabilities**

A model’s raw output (logits) are **unnormalized scores** — higher means more confident, but they don’t sum to 1.

torch.softmax(logits, dim=1) transforms them into **probabilities** that sum to 1 across all 1,000 classes.

$$P(y_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$

where $z_i$ is the logit for class $i$ and the sum runs over all 1,000 classes.

### **🥇 Top-k Logic**

topk(5) returns the indices and probabilities of the 5 most likely predictions.

The “top-5” metric is standard in ImageNet evaluation because it allows for near-misses (the true class being among the top-5 guesses).
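A toy example with six hypothetical classes checks the softmax formula against `torch.softmax` and shows that `topk` returns indices sorted by probability.

```python
import torch

# One sample with six hypothetical classes (ImageNet would have 1,000)
toy_logits = torch.tensor([[2.0, 1.0, 0.1, -1.0, 3.0, 0.5]])

toy_probs = torch.softmax(toy_logits, dim=1)
manual = toy_logits.exp() / toy_logits.exp().sum(dim=1, keepdim=True)  # P(y_i) written out
print(torch.allclose(toy_probs, manual))  # True

top_p, top_i = toy_probs.topk(3, dim=1)  # three most likely classes, highest first
print(top_i.tolist())                    # [[4, 0, 1]]
print(round(toy_probs.sum().item(), 4))  # 1.0 across all classes
print(round(top_p.sum().item(), 4))      # about 0.906: below 1, only the top-3 share
```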

### **🧠 Why Use torch.no_grad()**

When you’re not training, you disable gradient computation to save memory and speed up inference.

Without it, PyTorch tracks all operations for backpropagation, which is unnecessary here.

### **🌍 Domain Shift in Action**

- **ResNet-50** learned from human-eye-level photos (ImageNet).
- **Sentinel-2** captures 10–20 m ground pixels from orbit.

The visual features overlap (edges, shapes, colors), but the *context* differs.

This mismatch is what your analysis explores: how well general visual features transfer to environmental imagery.

# 7. Visualization: multi-panel patches + predictions

**Objective:**

Create a clean, multi-panel **visual summary** showing each Sentinel-2 patch alongside its top-3 ImageNet predictions and probabilities.

This is both a diagnostic tool and a storytelling artifact — allowing you to visually compare what the model perceives with what you know each scene represents.

**Key Components:**

- **Matplotlib grid layout** dynamically sized to the number of patches.
- **Titles** summarize the model’s top-3 predictions (labels + confidence values).
- **Saved figure** stored in your repository’s /figures directory for reproducibility and report inclusion.

**Why this matters:**

Visualizing results is crucial for interpretability. You’ll quickly spot patterns — for example, the model confusing “agriculture” with “golf course” or “runway.” These qualitative insights complement the quantitative outputs.

In [None]:
# 7) Visualization: multi-panel patches + predictions
import math

# --- Layout setup ---
n = len(patches)           # Number of image patches to display
cols = 3                   # Number of columns in the grid
rows = math.ceil(n / cols) if n else 1  # Rows needed to display all patches

# Create a Matplotlib figure with dynamic sizing
fig, axes = plt.subplots(rows, cols, figsize=(12, 4 * rows))

# Ensure axes are iterable even if only one subplot is created
axes = axes.flatten() if isinstance(axes, np.ndarray) else [axes]

# Turn off ticks and frames on every cell; cells left without a patch stay blank
for ax in axes:
    ax.axis('off')

# --- Populate grid with image patches and predictions ---
for i, (name, img) in enumerate(patches.items()):
    ax = axes[i]
    ax.imshow(img)  # Show the Sentinel-2 RGB patch

    # Retrieve model results corresponding to this patch
    row = next(r for r in results if r['class'] == name)

    # Compose title string with top-3 predicted labels and probabilities
    title = (
        f"{name}: " +
        ", ".join([f"{l} ({p:.2f})" for l, p in zip(row['top5_labels'][:3], row['top5_probs'][:3])])
    )

    # Apply title to subplot
    ax.set_title(title, fontsize=9)

# Adjust layout to prevent overlap
plt.tight_layout()

# --- Save figure to /figures directory ---
out_fig = FIGS / 'week2_imagenet_predictions.png'
fig.savefig(out_fig, dpi=200, bbox_inches='tight')  # 'tight' keeps long titles from being clipped
print('Saved:', out_fig)

**Outcome:**

This cell produces a multi-panel figure — one image per sample patch — each annotated with its top-3 predicted ImageNet classes and corresponding confidence scores. The figure is saved to `figures/week2_imagenet_predictions.png`.

You can now qualitatively assess how the pretrained ResNet model “reads” Earth’s surface textures. This provides a visual foundation for your Grad-CAM exploration — where you examine which parts of the image the model attends to.

## **🌟 The More You Know! — Visualizing AI Predictions**

### **🎨 Why Multi-Panel Layouts Matter**

Humans excel at **side-by-side comparison**.

By juxtaposing images with their predictions, you can quickly detect patterns, misclassifications, or biases, for example, every agricultural patch receiving the same unrelated label.

### **🧠 Reading the Titles**

Each subplot title includes:

- The **ground truth label** (Forest, Urban, etc.)
- The **model’s top-3 predictions** (ImageNet categories)
- The **confidence values** (softmax probabilities)

Confidence values can be misleadingly high even for wrong classes — a reminder that model confidence ≠ correctness.
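Why can a wrong class still receive a high probability? Softmax always distributes 100% of the probability across the classes it knows, with no "none of the above" option. A minimal NumPy sketch with made-up logits (not real ResNet-50 outputs):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax: subtract the max before exponentiating."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits for 3 classes on an out-of-distribution input
logits = np.array([6.0, 2.0, 1.0])
probs = softmax(logits)

print(probs.round(3))                 # [0.976 0.018 0.007]
print(np.isclose(probs.sum(), 1.0))   # True: probabilities always sum to 1
```

A margin of a few logit units is enough to produce ~98% "confidence", whether or not the top class is meaningful for a satellite patch.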

### **💾 Reproducibility & Reporting**

Saving the figure to `figures/week2_imagenet_predictions.png` gives you a file you can commit to version control and embed in future reports, hackathon summaries, or reflective logs.

# 8. Grad‑CAM Heatmap

**Objective:**
Implement and apply Grad-CAM (Gradient-weighted Class Activation Mapping) to visualize where ResNet-50 is “looking” when it makes a classification decision on one Sentinel-2 patch.

**Key Components:**
- GradCAM class: Captures forward activations and backward gradients from a chosen layer (here, layer4, the last convolutional stage of ResNet-50, just before global average pooling).
- Hook functions: Attach to the target layer so you can capture intermediate data during the forward and backward passes.
- Weighted activation map computation: Combines gradients and activations to form a heatmap highlighting the most influential spatial regions.
- Visualization: Overlays the heatmap (red = high importance) on the original image to interpret the model’s spatial attention.

**Why this matters:**
Grad-CAM translates numerical deep-learning processes into human-interpretable visuals, letting you see which features or textures influenced a model’s decision — crucial for building trust, identifying biases, and diagnosing domain shift.

In [None]:
# 8) Grad-CAM on one sample (ResNet-50 layer4)
import torch.nn.functional as F

# --- Define a lightweight Grad-CAM utility class ---
class GradCAM:
    def __init__(self, model: torch.nn.Module, target_layer: torch.nn.Module):
        self.model = model
        self.target_layer = target_layer
        self.gradients = None     # Gradients w.r.t. the target layer's output (backward pass)
        self.activations = None   # Target layer's output feature maps (forward pass)

        # Register hooks on the target layer to capture activations and gradients
        self.h1 = target_layer.register_forward_hook(self._fwd)
        # Use the full backward hook (register_backward_hook is deprecated)
        self.h2 = target_layer.register_full_backward_hook(self._bwd)

    def _fwd(self, module, inputs, output):
        # Save forward activations (feature maps), shape [N, C, h, w]
        self.activations = output.detach()

    def _bwd(self, module, grad_input, grad_output):
        # Save the gradient of the class score w.r.t. the layer output
        self.gradients = grad_output[0].detach()

    def remove(self):
        # Unregister both hooks so they stop firing on later passes through the model
        self.h1.remove()
        self.h2.remove()

    def __call__(self, inp: torch.Tensor, class_idx: int | None = None):
        # Clear stale parameter gradients, then run a forward pass with autograd enabled
        self.model.zero_grad(set_to_none=True)
        # Making the input require grad guarantees a graph even if the weights are frozen
        inp = inp.clone().requires_grad_(True)
        logits = self.model(inp)

        # If no class index provided, use the model's top-predicted class
        if class_idx is None:
            class_idx = int(logits.argmax(dim=1).item())

        # Backpropagate the target class score (summed over the batch to get a scalar)
        logits[:, class_idx].sum().backward()

        grads, acts = self.gradients, self.activations

        # Per-channel weights: mean gradient over the spatial dimensions
        w = grads.mean(dim=(2, 3), keepdim=True)

        # Weighted sum of activations over channels, then ReLU -> CAM for the first sample
        cam = torch.relu((w * acts).sum(dim=1))[0]

        # Normalize CAM to [0, 1] for visualization (a constant map stays all zeros)
        cam = cam - cam.min()
        if cam.max() > 0:
            cam = cam / cam.max()
        return cam.cpu().numpy()

# --- Select first sample for Grad-CAM visualization ---
first_name, first_img = next(iter(patches.items()))
first_inp = preprocess(first_img).unsqueeze(0).to(device)

# Run a gradient-free forward pass to obtain the predicted class index
with torch.no_grad():
    logits = model(first_inp)
cls_idx = int(logits.argmax(dim=1).item())

# --- Generate Grad-CAM heatmap for layer4 (7x7 for a 224x224 input) ---
gradcam = GradCAM(model, model.layer4)
try:
    cam = gradcam(first_inp, cls_idx)
finally:
    gradcam.remove()  # always detach the hooks, even if the call fails

# Upsample the coarse CAM to the image size so the overlay covers the whole patch.
# If `preprocess` resizes/center-crops, the CAM describes the cropped view,
# so its alignment with the full patch is approximate.
img_arr = np.asarray(first_img)
H, W = img_arr.shape[:2]
cam_up = F.interpolate(
    torch.from_numpy(cam)[None, None], size=(H, W), mode='bilinear', align_corners=False
)[0, 0].numpy()

# --- Plot original image and Grad-CAM overlay ---
fig, ax = plt.subplots(1, 2, figsize=(8, 4))
ax[0].imshow(img_arr)
ax[0].set_title(f'{first_name} input')
ax[0].axis('off')

# Overlay the upsampled CAM with transparency (jet colormap: red = high, blue = low)
ax[1].imshow(img_arr)
ax[1].imshow(cam_up, cmap='jet', alpha=0.4)
ax[1].set_title('Grad-CAM layer4')
ax[1].axis('off')

plt.tight_layout()

# --- Save figure for reproducibility ---
cam_path = FIGS / 'week2_gradcam_layer4.png'
fig.savefig(cam_path, dpi=200, bbox_inches='tight')
print('Saved:', cam_path)

**Outcome:**

This cell generates and saves a two-panel figure showing:

1. The original Sentinel-2 patch (left)
2. The Grad-CAM overlay highlighting the areas that most influenced ResNet-50’s top prediction (right)

Saved to `figures/week2_gradcam_layer4.png`.

Interpretation:

- Red/yellow regions = strongest positive influence on the predicted class score
- Blue regions = little or no influence

This indicates which visual cues drove the model’s decision — useful for evaluating whether it’s reasoning sensibly or relying on artifacts.

## **🌟 The More You Know! — How Grad-CAM Sees Inside CNNs**

### **🔍 The Core Idea**

Grad-CAM computes a weighted sum of a convolutional layer’s activations, where each channel’s weight reflects **how important that feature map was** for the chosen class.

Formally:

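$$
\text{Grad-CAM}^c(x,y) = \mathrm{ReLU}\left( \sum_k \alpha_k^c \, A^k_{x,y} \right), \qquad
\alpha_k^c = \frac{1}{Z}\sum_i\sum_j \frac{\partial y^c}{\partial A^k_{ij}}
$$

Here $A^k$ is the $k$-th feature map of the chosen layer, $y^c$ is the score (logit) for class $c$, and $Z$ is the number of spatial positions $(i, j)$ in the map.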

This produces a coarse spatial heatmap showing where the model “looked.”
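The formula can be checked by hand on a toy example. This NumPy sketch uses a made-up layer with K = 2 channels on a 2×2 grid (real ResNet-50 layer4 maps are 2048×7×7); the arrays are illustrative, not taken from the model. The spatial averaging for `alpha` is the same operation as `grads.mean(dim=(2,3))` in the cell above.

```python
import numpy as np

# A[k, i, j]: activations of channel k at spatial position (i, j)
A = np.array([
    [[1.0, 0.0],
     [0.0, 1.0]],      # channel 0
    [[0.0, 2.0],
     [2.0, 0.0]],      # channel 1
])

# dY[k, i, j]: gradient of the class score y^c w.r.t. A[k, i, j]
dY = np.array([
    [[0.5, 0.5],
     [0.5, 0.5]],      # channel 0 pushes the score up
    [[-0.25, -0.25],
     [-0.25, -0.25]],  # channel 1 pushes it down
])

Z = A.shape[1] * A.shape[2]                 # number of spatial positions
alpha = dY.sum(axis=(1, 2)) / Z             # alpha_k: spatially averaged gradient
weighted = np.tensordot(alpha, A, axes=1)   # sum_k alpha_k * A^k, shape (2, 2)
cam = np.maximum(weighted, 0.0)             # ReLU keeps only positive evidence

print(alpha)
print(cam)
```

Channel 0 gets weight 0.5 and channel 1 gets -0.25, so the weighted sum is 0.5 on the diagonal and -0.5 off it; after the ReLU only the diagonal cells (0.5 each) survive.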

### **🧠 Why Layer 4?**

- Early ResNet layers capture low-level features (edges, textures).
- Deeper layers like **layer4** encode high-level semantic patterns (objects, shapes).

Using layer4, the last convolutional stage before pooling, gives a more class-specific attention map than earlier layers, at the cost of coarse spatial resolution.
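The trade-off is spatial resolution. In torchvision's ResNet-50, the stem (a stride-2 convolution plus a stride-2 max-pool) downsamples by 4, `layer1` keeps that size, and `layer2`, `layer3`, and `layer4` each halve it. A small sketch of the arithmetic, assuming 10 m Sentinel-2 pixels for the ground-distance column:

```python
INPUT_PX = 224      # ResNet-50 input size
PIXEL_M = 10        # assumed Sentinel-2 ground sampling distance (m)

# Cumulative downsampling factor at the output of each torchvision ResNet-50 stage
strides = {"layer1": 4, "layer2": 8, "layer3": 16, "layer4": 32}

for name, s in strides.items():
    grid = INPUT_PX // s    # feature-map height/width
    cell_m = s * PIXEL_M    # ground distance between neighbouring cells (receptive fields are larger)
    print(f"{name}: {grid}x{grid} grid, ~{cell_m} m per cell")
```

Under this assumption, a layer4 Grad-CAM can only localize evidence to roughly 320 m blocks of a patch, which is why layer4 heatmaps look coarse.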
    

### **🎨 Reading the Heatmap**

- **Warm colors (red/yellow)** = strong influence on classification.
- **Cool colors (blue)** = minimal effect.
- The overlay’s transparency (alpha=0.4) allows you to see both the image and attention simultaneously.
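Conceptually, drawing the heatmap with `alpha=0.4` over an opaque image blends colors pixel by pixel as `out = (1 - alpha) * image + alpha * heatmap_rgb` (standard "over" compositing; Matplotlib performs the equivalent internally). A minimal NumPy sketch with made-up RGB values in [0, 1]:

```python
import numpy as np

alpha = 0.4
image_px = np.array([0.2, 0.5, 0.2])     # illustrative greenish vegetation pixel (RGB, 0-1)
heat_px = np.array([1.0, 0.0, 0.0])      # illustrative pure-red heatmap color

# "Over" compositing: a weighted average of the two colors
blended = (1 - alpha) * image_px + alpha * heat_px
print(blended.round(2))
```

With `alpha=0.4`, 60% of each displayed pixel still comes from the satellite image, which is why surface features remain visible under the heatmap.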

### **🪄 Conceptual Analogy**

Imagine highlighting the parts of a photograph that most convinced an ecologist that an image shows “forest.”

Grad-CAM does that automatically for your CNN — it’s the model’s *eye-tracking heatmap*.

### **⚖️ Limitations**

- Grad-CAM provides coarse spatial insights, not pixel-level accuracy.
- Interpretations depend on the chosen layer — earlier layers give finer but less semantically clear results.
- Still, it’s one of the most intuitive and widely used **explainability tools** in deep learning.

# 9. Export predictions to CSV

**Objective:**
Export all inference results (predicted labels and probabilities for each image patch) into a CSV summary file for transparent documentation and later analysis.

**Key Components:**
	•	CSV output file: /reports/week2_prediction_summary.csv
	•	Fields included:
	•	class — your true label (Forest, Water, etc.)
	•	top1_label / top1_prob — model’s most confident prediction and probability
	•	top5_labels / top5_probs — semicolon-separated lists of all top-5 predictions and probabilities
	•	csv.writer() — Python’s standard library writer for creating structured tabular files without extra dependencies.
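A minimal sketch of what one row looks like and how the semicolon-joined fields can be split back into lists, using only the standard library. The `result` dict is made up for illustration (hypothetical labels and probabilities, not model output); it mirrors the keys used in the notebook's `results` list.

```python
import csv
import io

# Hypothetical inference result for one patch (same keys as the notebook's `results`)
result = {
    "class": "Forest",
    "top5_labels": ["label_a", "label_b", "label_c", "label_d", "label_e"],
    "top5_probs": [0.41, 0.22, 0.10, 0.05, 0.03],
}

# Write header + one row into an in-memory buffer instead of a file
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["class", "top1_label", "top1_prob", "top5_labels", "top5_probs"])
writer.writerow([
    result["class"],
    result["top5_labels"][0], f"{result['top5_probs'][0]:.4f}",
    ";".join(result["top5_labels"]),
    ";".join(f"{p:.4f}" for p in result["top5_probs"]),
])

# Read it back and split the semicolon-joined columns into lists again
buf.seek(0)
rows = list(csv.DictReader(buf))
labels = rows[0]["top5_labels"].split(";")
probs = [float(p) for p in rows[0]["top5_probs"].split(";")]
print(rows[0]["top1_label"], labels, probs)
```

The round trip only works if no label itself contains a semicolon; the `csv` module handles commas inside fields by quoting them.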

**Why this matters:**
Saving structured outputs like this makes your workflow auditable, shareable, and reproducible — core principles of applied machine learning and scientific computing.

In [None]:
# 9) Save prediction summary CSV
import csv

# Define output file path inside the reports directory
csv_path = REPORTS / 'week2_prediction_summary.csv'

# Open file for writing (overwrite if exists)
with open(csv_path, 'w', newline='', encoding='utf-8') as f:
    w = csv.writer(f)

    # Write header row
    w.writerow(['class', 'top1_label', 'top1_prob', 'top5_labels', 'top5_probs'])

    # Write one row per sample
    for r in results:
        top1_label = r['top5_labels'][0]  # Most likely predicted class
        top1_prob = r['top5_probs'][0]    # Corresponding probability

        # Join lists with semicolons for compact storage
        w.writerow([
            r['class'],                                   # Ground truth label
            top1_label, f'{top1_prob:.4f}',               # Top-1 label and probability
            ';'.join(r['top5_labels']),                   # All top-5 labels
            ';'.join([f'{p:.4f}' for p in r['top5_probs']])  # All top-5 probabilities
        ])

print('Saved CSV:', csv_path)

**Outcome:**

This cell generates a CSV summarizing model predictions for each sampled patch and confirms the path: `reports/week2_prediction_summary.csv`.

This structured summary enables:

- Quantitative review of model predictions
- Integration with other analytical tools (e.g., pandas, Excel, Tableau)
- Version-controlled record of inference results — useful for tracking model behavior over time

You now have numerical (CSV), visual (multi-panel figure), and interpretability (Grad-CAM) artifacts, forming a complete analytical narrative for Week 2.

## **🌟 The More You Know! — Why Saving Results Matters**

### **📁 Reproducibility & Transparency**

Saving results to CSV formalizes your analysis: anyone (including future-you) can re-open the file to inspect, compare, or extend your results without rerunning the notebook.

It’s the digital equivalent of **archiving lab notes**.

### **🧩 Structured Data Pipelines**

In real-world ML workflows, this step transitions you from experimentation to production-like structure.

Downstream tools (e.g., dashboards, metrics scripts, model comparisons) all expect tidy, structured outputs like this.
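Many downstream tools prefer a "long" (tidy) layout with one row per patch and rank rather than semicolon-joined lists. A minimal standard-library sketch of that reshaping, assuming rows shaped like the summary CSV; `to_long` and the sample values are hypothetical:

```python
import csv
import io

def to_long(wide_rows):
    """Expand each wide row into one (class, rank, label, prob) row per top-k entry."""
    long_rows = []
    for r in wide_rows:
        labels = r["top5_labels"].split(";")
        probs = [float(p) for p in r["top5_probs"].split(";")]
        for rank, (label, prob) in enumerate(zip(labels, probs), start=1):
            long_rows.append({"class": r["class"], "rank": rank, "label": label, "prob": prob})
    return long_rows

# Hypothetical contents of the summary CSV (two patches, top-2 only for brevity)
wide_csv = (
    "class,top1_label,top1_prob,top5_labels,top5_probs\n"
    "Forest,label_a,0.6000,label_a;label_b,0.6000;0.2000\n"
    "Water,label_c,0.5000,label_c;label_d,0.5000;0.3000\n"
)

long_rows = to_long(csv.DictReader(io.StringIO(wide_csv)))
for row in long_rows:
    print(row)
```

In long form, filtering (e.g. "all rank-1 predictions") or grouping by class becomes a one-line operation in most analysis tools.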


## **🏁 Week 2 Notebook Summary — What You’ve Built**

| **Step** | **Focus** | **Key Concept** |
| --- | --- | --- |
| 1–2 | Setup & AOI Load | Geospatial data handling, reproducibility |
| 3 | Sentinel-2 Composite | Cloud masking & median compositing |
| 4–5 | Sampling & Patching | Bridging Earth Engine and PyTorch |
| 6 | Inference | Transfer learning, domain shift |
| 7 | Visualization | Qualitative interpretability |
| 8 | Grad-CAM | Model explainability |
| 9 | CSV Export | Reproducible results & reporting |

You’ve essentially built a complete mini-pipeline for geospatial AI model evaluation — end-to-end, explainable, and research-grade.

**Next steps:**
- Fine-tune ResNet on a small labeled Sentinel-2 set; compare pre/post Grad-CAM.
- Swap in a remote-sensing model (e.g., ResNet pretrained on BigEarthNet) and re-run.