diff --git a/20_CAM.ipynb b/20_CAM.ipynb
index cc7df84d2..9d0c8246c 100644
--- a/20_CAM.ipynb
+++ b/20_CAM.ipynb
@@ -435,7 +435,7 @@
    "source": [
     "The method we just saw only lets us compute a heatmap with the last activations, since once we have our features, we have to multiply them by the last weight matrix. This won't work for inner layers in the network. A variant introduced in the paper [Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization](https://arxiv.org/abs/1611.07450) in 2016 uses the gradients of the final activation for the desired class: if you remember a little bit about the backward pass, the gradients of the output of the last layer with respect to the input of that layer is equal to the layer weights, since it is a linear layer.\n",
     "\n",
-    "With deeper layers, we still want the gradients, but they won't just be equal to the weights any more. We have to calculate them. The gradients of every are calculated for us by PyTorch during the backward pass, but they're not stored (except for tensors where `requires_grad` is `True`). We can, however, register a hook on the *backward* pass, which PyTorch will give the gradients to as a parameter, so we can store them there. We'll use a `HookBwd` class that will work like `Hook`, but intercepts and stores gradients, instead of activations:"
+    "With deeper layers, we still want the gradients, but they won't just be equal to the weights any more. We have to calculate them. The gradients of every layer are calculated for us by PyTorch during the backward pass, but they're not stored (except for tensors where `requires_grad` is `True`). We can, however, register a hook on the *backward* pass, which PyTorch will give the gradients to as a parameter, so we can store them there. We'll use a `HookBwd` class that will work like `Hook`, but intercepts and stores gradients, instead of activations:"
    ]
  },
  {
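
For context, here is a minimal sketch of the kind of backward-hook class the edited cell describes, assuming PyTorch's `register_full_backward_hook` API. The notebook's actual `HookBwd` code cell is outside this hunk, so the class body and the toy model used to exercise it are illustrative only, not the book's exact implementation:

```python
import torch
import torch.nn as nn

class HookBwd():
    "Store the gradients flowing out of module `m` during the backward pass (illustrative sketch)."
    def __init__(self, m):
        # The hook is called with (module, grad_input, grad_output); we keep grad_output.
        self.hook = m.register_full_backward_hook(self.hook_func)
    def hook_func(self, m, grad_input, grad_output):
        self.stored = grad_output[0].detach().clone()
    def __enter__(self, *args): return self
    def __exit__(self, *args): self.hook.remove()

# Hypothetical usage on a toy model: capture gradients at an inner (conv) layer.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(8*4*4, 2))
x = torch.randn(1, 3, 4, 4)
with HookBwd(model[0]) as hook:
    out = model(x)
    out[0, 1].backward()          # backpropagate the score of class 1
print(hook.stored.shape)          # gradients w.r.t. the conv output: torch.Size([1, 8, 4, 4])
```

The context-manager methods mirror the `Hook` class referenced in the cell, so the hook is removed as soon as the gradients have been stored.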