# MTH 4326 / 5326: Explainable AI
Homework 1: Differential Explanations and Curvature

## Purpose

The goal of this assignment is to develop a rigorous understanding of differentiable explanations from three angles:

- Theory: what questions explanations answer and what assumptions they rely on  
- Mathematics: the precise objects being computed and their structural properties  
- Practice: how these objects behave when computed in real neural networks  

Computation is used as an experimental tool to probe and validate theoretical claims, not as an end in itself.

## Deadline

Feb 16, 2026

## Points

50 points

## Academic Honesty

All submitted work must be your own. You may consult textbooks, documentation, and online resources, and you may discuss ideas with others at a high level. However, you must not copy or paste solution blocks or derivations. Your submission should reflect your own reasoning, derivations, and code structure.

## Submission Instructions

Submit a single Jupyter notebook (.ipynb) and, if needed, images of mathematical work (.png or .jpg).

Your submission must include:

- Mathematical work, clearly presented (typed in Markdown cells or photos of handwritten work)

- Written explanations interpreting results and connecting them to theory

- Well-commented code

- Outputs such as tables, printed values, and plots, with clear labeling

Do not submit compressed files (zip, rar, etc.). Submit only the notebook and, if needed, the image files of your mathematical work.

## I. First-Order vs. Second-Order Differential Explanations

### Problem 1. Differential Explanations as First-Order Approximations (10 points)

Let $f : \mathbb{R}^d \to \mathbb{R}$ be twice differentiable at a point $x_0$.

a. Write the second-order Taylor expansion of $f(x_0 + \delta x)$ about $x_0$, clearly identifying each term.

b. State precisely what class of functions would be *perfectly* explained by a gradient explanation, in the sense that the first-order approximation is exact. Prove your claim.


### Problem 2. Curvature, Stability, and Scalarization (10 points)

Define the directional second derivative

$$
\kappa(v) = v^\top H_f(x_0)\, v,
\qquad \|v\| = 1.
$$

a. Show that if $\kappa(v) = 0$, then the gradient explanation along direction $v$ is locally stable to second order.

b. Show that large $|\kappa(v)|$ implies that the linear explanation changes rapidly under finite perturbations along $v$.

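Now suppose instead that $f : \mathbb{R}^d \to \mathbb{R}^k$ is vector-valued (for example, a vector of class scores), let $g : \mathbb{R}^k \to \mathbb{R}$ be a twice-differentiable scalarization, and define

$$
h(x) = g(f(x)).
$$

c. Write an expression for $H_h(x)$ in terms of the Jacobian $\nabla f(x)$, the Hessians $H_{f_i}(x)$ of the components of $f$, and the first and second derivatives of $g$.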
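
As a numerical sanity check on the definition of $\kappa(v)$, the sketch below estimates the directional second derivative with a central second difference. The quadratic test function, the point `x0`, the step size `eps`, and the helper name `directional_curvature` are illustrative assumptions and are not part of the assignment; for a network, `f` would be the scalar model output.

```python
import numpy as np

def directional_curvature(f, x0, v, eps=1e-3):
    """Estimate kappa(v) = v^T H_f(x0) v with a central second difference."""
    x0 = np.asarray(x0, dtype=float)
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)  # enforce ||v|| = 1
    return (f(x0 + eps * v) - 2.0 * f(x0) + f(x0 - eps * v)) / eps**2

# Hypothetical test function with known Hessian diag(2, -6).
def f(x):
    return x[0] ** 2 - 3.0 * x[1] ** 2

x0 = np.array([0.5, -1.0])

k_first = directional_curvature(f, x0, [1.0, 0.0])           # close to 2
k_second = directional_curvature(f, x0, [0.0, 1.0])          # close to -6
k_flat = directional_curvature(f, x0, [np.sqrt(3.0), 1.0])   # close to 0

print(k_first, k_second, k_flat)
```

The third direction shows that an indefinite Hessian can have directions with $\kappa(v) = 0$ even though the Hessian itself is far from zero.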


### Problem 3. Second-Order Explanations on MNIST (15 points)

In this problem, you will compute and interpret second-order explanations using the Hessian of the model output with respect to the input.

Use the MNIST dataset and train a neural network classifier. Take the scalar output $f(x)$ to be the maximum class score, that is, the max scalarization of the class-score vector.
Assume a batch size of 1 for all Hessian computations.

Use softplus activations so that second derivatives are well-defined.

a. Train or load a neural network on MNIST with softplus nonlinearities, and fix a correctly classified input image
   $x_0 \in \mathbb{R}^{28 \times 28}$ with scalar output $f(x)$.

b. Using PyTorch automatic differentiation, compute the full Hessian
   
   $$H_f(x_0) = \nabla_x^2 f(x_0)$$
   
   with respect to the input $x$.

c. Plot the spectrum of eigenvalues of $H_f(x_0)$ and interpret:
   - the sign of the eigenvalues,
   - the magnitude of the eigenvalues,
   - and whether the local geometry near $x_0$ is closer to locally flat, locally convex/concave, or strongly anisotropic.

d. Repeat this analysis using the same network architecture but replacing softplus activations with ReLU.
   Contrast the source of explanation instability in each case.
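
One possible way to set up parts b and c is sketched below using `torch.autograd.functional.hessian`. The small untrained network, the random stand-in for $x_0$, and the helper name `scalar_output` are assumptions made for illustration; an actual solution would use a trained classifier and a correctly classified MNIST image.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical small softplus MLP standing in for a trained MNIST classifier.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 32),
    nn.Softplus(),
    nn.Linear(32, 10),
).double()
model.eval()

# Placeholder for a real, correctly classified MNIST image (batch size 1).
x0 = torch.rand(1, 28, 28, dtype=torch.float64)

def scalar_output(x_flat):
    """Maximum class score as a function of the flattened input of shape (784,)."""
    logits = model(x_flat.reshape(1, 28, 28))
    return logits.max()

# Full input Hessian, shape (784, 784).
H = torch.autograd.functional.hessian(scalar_output, x0.flatten())
H = 0.5 * (H + H.T)  # remove tiny asymmetries from floating-point error

# Real eigenvalues of the symmetric Hessian, in ascending order.
eigvals = torch.linalg.eigvalsh(H)
print(H.shape, eigvals.min().item(), eigvals.max().item())
```

The `eigvals` tensor can then be passed to a plotting routine (for example, a sorted scatter plot or a histogram) for the spectrum in part c.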

## II. Practical Differential Explanations on Dogs vs. Cats

In this part, you will study differential explanations in a real image classification setting using the Dogs vs. Cats dataset:
https://github.com/rtwhite1546/Fall-2024-Deep-Learning/tree/main/data/dogs-vs-cats

You will use the following explanation methods throughout:
raw input gradients, SmoothGrad, and VarGrad.

Unless otherwise stated, explanations should be computed with respect to a scalar output (e.g., a class logit or logit margin) and visualized alongside the original image. Any postprocessing (normalization, absolute value, masking) must be stated clearly and applied consistently.
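
For reference, SmoothGrad averages the input gradient over Gaussian-perturbed copies of the input, and VarGrad takes the elementwise variance of those same gradients. The sketch below is a minimal version under stated assumptions: the helper names, the noise scale `sigma`, the sample count `n_samples`, and the toy linear score are all hypothetical, and tensors are assumed to live on the CPU.

```python
import torch

def input_gradient(score_fn, x):
    """Raw gradient of a scalar score with respect to the input tensor x."""
    x = x.clone().detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(score_fn(x), x)
    return grad

def smoothgrad_vargrad(score_fn, x, sigma=0.1, n_samples=32, generator=None):
    """Return (SmoothGrad, VarGrad): mean and variance of noisy input gradients."""
    grads = []
    for _ in range(n_samples):
        noise = sigma * torch.randn(x.shape, generator=generator, dtype=x.dtype)
        grads.append(input_gradient(score_fn, x + noise))
    grads = torch.stack(grads)  # shape (n_samples, *x.shape)
    return grads.mean(dim=0), grads.var(dim=0, unbiased=False)

# Toy check with a linear score: the gradient is constant, so SmoothGrad
# equals the raw gradient and VarGrad is zero everywhere.
w = torch.tensor([1.0, -2.0, 0.5])

def linear_score(x):
    return (w * x).sum()

x = torch.zeros(3)
gen = torch.Generator().manual_seed(0)
raw = input_gradient(linear_score, x)
sg, vg = smoothgrad_vargrad(linear_score, x, generator=gen)
print(raw, sg, vg)
```

For an image classifier, `score_fn` would wrap the model and return the chosen class logit or logit margin for a single image.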

### Problem 1. Raw Gradients, SmoothGrad, and VarGrad (5 points)

Train a dogs-vs-cats classifier.

a. Select at least five test images and compute raw gradient, SmoothGrad, and VarGrad explanations for the predicted class score.

b. Visualize the three explanations for each image alongside the original image.

c. Briefly interpret how the explanations differ, commenting on noise, localization, and what additional structure (if any) SmoothGrad and VarGrad reveal relative to raw gradients.

### Problem 2. Incorrect Predictions and Competing Explanations (5 points)

Select at least five test images that your model classifies incorrectly.

a. For each image, compute explanations for both the predicted-class score and the ground-truth-class score using raw gradients, SmoothGrad, and VarGrad.

b. Visualize and compare the explanations for the two scalar outputs on the same image.

c. Analyze how the explanations differ between the predicted and true class scores, and what this comparison suggests about why the model failed.

### Problem 3. Multi-Target Explanations in Multi-Animal Scenes (5 points)

Find three images online with the following properties:

1. One image that contains both a dog and a cat.
2. One image that contains multiple dogs and/or multiple cats.
3. One image of your choice that you believe will be challenging or interesting for explanation (e.g., occlusion or an unusual pose).

Record the source URL for each image.

a. For each image, compute SmoothGrad explanations for both the dog-class score and the cat-class score, and visualize them separately.

b. For each image, produce a combined visualization in which both explanations are overlaid on the same image using different colormaps. You may suppress low-magnitude regions if desired, but must state the rule used.

c. Compare the three images. Discuss how explanation localization and overlap differ across the cases, and what this reveals about the limitations of differential explanations in multi-object scenes.
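
The combined visualization in part b could be prototyped along the following lines. This sketch makes a simplifying substitution: it uses two single-hue tints (red for the dog map, blue for the cat map) rather than full colormaps, and it suppresses pixels below a percentile threshold. The function name, the tint colors, the `percentile` rule, and the `alpha` value are assumptions chosen for illustration.

```python
import numpy as np

def overlay_two_maps(image, dog_map, cat_map, percentile=90.0, alpha=0.6):
    """Blend two saliency maps onto an RGB image.

    image: (H, W, 3) floats in [0, 1]; dog_map, cat_map: (H, W) arrays.
    Pixels below the given percentile of each normalized map are suppressed.
    """
    out = image.astype(float).copy()
    tints = ((dog_map, (1.0, 0.0, 0.0)), (cat_map, (0.0, 0.0, 1.0)))
    for sal, color in tints:
        sal = np.abs(sal)
        sal = sal / (sal.max() + 1e-12)                 # normalize to [0, 1]
        mask = sal >= np.percentile(sal, percentile)    # stated suppression rule
        weight = (alpha * sal * mask)[..., None]        # per-pixel blend weight
        out = (1.0 - weight) * out + weight * np.asarray(color)
    return np.clip(out, 0.0, 1.0)

# Synthetic example: gray image with two Gaussian bumps as stand-in saliency maps.
ii, jj = np.meshgrid(np.arange(10), np.arange(10), indexing="ij")
dog_map = np.exp(-((ii - 2) ** 2 + (jj - 2) ** 2) / 2.0)
cat_map = np.exp(-((ii - 7) ** 2 + (jj - 7) ** 2) / 2.0)
image = np.full((10, 10, 3), 0.5)

combined = overlay_two_maps(image, dog_map, cat_map)
print(combined[2, 2], combined[7, 7], combined[0, 9])
```

Whatever rule is used in place of the percentile threshold, it should be reported and applied identically to both maps.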


## III. Advanced Analysis of Differential Explanations (10 points)

This part is required for students enrolled in ORP 5050 and optional for students enrolled in MTH 4326, who may earn up to 5 bonus points.

### Problem 1. Path-Based Differential Explanations Between Classes

Select one image of a cat and one image of a dog from the Dogs vs. Cats dataset.

a. Define a straight-line path in input space from the cat image to the dog image,

$$x(t) = (1 - t)\,x_{\text{cat}} + t\,x_{\text{dog}}, \quad t \in [0,1],$$

and construct a sequence of interpolated images along the path using at least 25 evenly spaced values of $t$.

b. For each value of $t$, compute differential explanations of the predicted class score using raw input gradients, SmoothGrad, and VarGrad.

c. Create a video of approximately 5 seconds duration at a frame rate of at least 5 frames per second that, as $t$ increases, shows the interpolated image together with the corresponding raw gradient, SmoothGrad, and VarGrad explanations.

d. Interpret how the predictions and explanations evolve along the path, commenting on smooth versus abrupt changes and what this reveals about the limitations of local (differential) explanations for finite interventions.
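
The straight-line path in part a can be built directly from the formula for $x(t)$. The sketch below assumes both images have already been resized to the same shape and scaled to a common range; the function name and the random placeholder arrays are hypothetical.

```python
import numpy as np

def interpolation_path(x_cat, x_dog, n_steps=25):
    """Return t values and images x(t) = (1 - t) * x_cat + t * x_dog."""
    ts = np.linspace(0.0, 1.0, n_steps)  # evenly spaced, endpoints included
    frames = np.stack([(1.0 - t) * x_cat + t * x_dog for t in ts])
    return ts, frames

# Placeholder arrays standing in for preprocessed cat and dog images.
rng = np.random.default_rng(0)
x_cat = rng.random((3, 64, 64))
x_dog = rng.random((3, 64, 64))

ts, frames = interpolation_path(x_cat, x_dog, n_steps=25)
print(ts[0], ts[-1], frames.shape)
```

With 25 frames, a frame rate of 5 frames per second gives a 5-second video; more interpolation steps allow a higher frame rate at the same duration.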