resnets & cam

ethen8181 · Sep 5, 2018 · commit 445b85f
1 parent b49db4b

Showing 33 changed files with 16,502 additions and 72 deletions.
diff --git a/README.md b/README.md
@@ -124,11 +124,13 @@ Curated notes on deep learning.
 
 #### keras : 2016.06.29
 
-Note that this is mostly a API walkthrough, not a tutorial on the details of deep learning. For those interested there's also a [keras cheatsheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Keras_Cheat_Sheet_Python.pdf) that may come in handy.
+For those interested, there's also a [keras cheatsheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Keras_Cheat_Sheet_Python.pdf) that may come in handy.
 
 - Multi-layers Neural Network (keras basics). [[nbviewer](http://nbviewer.jupyter.org/github/ethen8181/machine-learning/blob/master/keras/nn_keras_basics.ipynb)][[html](http://ethen8181.github.io/machine-learning/keras/nn_keras_basics.html)]
 - Multi-layers Neural Network hyperparameter tuning via scikit-learn like API. [[nbviewer](http://nbviewer.jupyter.org/github/ethen8181/machine-learning/blob/master/keras/nn_keras_hyperparameter_tuning.ipynb)][[html](http://ethen8181.github.io/machine-learning/keras/nn_keras_hyperparameter_tuning.html)]
-- Convolutional Neural Network (CNN) - image classification basics. [[nbviewer](http://nbviewer.jupyter.org/github/ethen8181/machine-learning/blob/master/keras/cnn_image_keras.ipynb)][[html](http://ethen8181.github.io/machine-learning/keras/cnn_image_keras.html)]
+- Convolutional Neural Network (CNN)
+    - Image classification basics. [[nbviewer](http://nbviewer.jupyter.org/github/ethen8181/machine-learning/blob/master/keras/cnn_image_keras.ipynb)][[html](http://ethen8181.github.io/machine-learning/keras/cnn_image_keras.html)]
+    - Introduction to Residual Networks (ResNets) and Class Activation Maps (CAM). [[nbviewer](http://nbviewer.jupyter.org/github/ethen8181/machine-learning/blob/master/keras/resnet_cam/resnet_cam.ipynb)][[html](http://ethen8181.github.io/machine-learning/keras/resnet_cam/resnet_cam.html)]
 - Recurrent Neural Network (RNN) - language modeling basics. [[nbviewer](http://nbviewer.jupyter.org/github/ethen8181/machine-learning/blob/master/keras/rnn_language_model_basic_keras.ipynb)][[html](http://ethen8181.github.io/machine-learning/keras/rnn_language_model_basic_keras.html)]
 
 #### text_classification : 2016.06.15

diff --git a/changelog.md b/changelog.md
@@ -2,6 +2,16 @@
 
 The changelog will record what content was **changed** (e.g. changed an existing paragraph to a better-explained version, re-ran the notebook using an updated version of the package), **added** (e.g. a completely new jupyter notebook).
 
+## [2018-09]
+
+### Added
+
+Introduction to Residual Networks (ResNets) and Class Activation Maps (CAM). [[nbviewer](http://nbviewer.jupyter.org/github/ethen8181/machine-learning/blob/master/keras/resnet_cam/resnet_cam.ipynb)][[html](http://ethen8181.github.io/machine-learning/keras/resnet_cam/resnet_cam.html)]
+
+### Changed
+
+Hosted HTML versions of all Jupyter notebooks on GitHub Pages.
+
 ## [2018-08]
 
 ### Added

diff --git a/deep_learning/cnn_image_tensorflow.html b/deep_learning/cnn_image_tensorflow.html
diff --git a/deep_learning/cnn_image_tensorflow.ipynb b/deep_learning/cnn_image_tensorflow.ipynb
diff --git a/deep_learning/rnn/2_tensorflow_lstm.html b/deep_learning/rnn/2_tensorflow_lstm.html
@@ -12289,7 +12289,8 @@ <h2 id="LSTM-Step-by-Step-Walk-Through">LSTM Step by Step Walk Through<a class="
 \end{align}</p>
 <p>Looking at this formula more carefully, we can see that the information carried by the previous cell state, $C_{t-1}$ will not be lost if its weight, i.e. the forget gate $f_t$ is on (close to 1), making LSTM better at learning long-term dependencies compared to vanilla RNN.</p>
 <p>In the case of the language model, this is where we'd actually drop the information about the old subject's gender and add the new information, as we decided in the previous steps.</p>
-<p>Finally, we need to decide what we're going to output. This output will be a filtered version of our cell state. First, we run it through a sigmoid layer which decides what parts of the cell state we're going to output. Then, we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.</p>
+<p>Finally, we need to decide what we're going to output. This output will be a filtered version of our cell state. First, we run it through a sigmoid layer which decides what parts of the cell state we're going to output. This is essentially our output gate. Then, we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.</p>
+<p><img src="img/output_gate.png" width="40%" height="40%"></p>
 <p>\begin{align}
 o_t &amp;= \sigma(W_o \cdot [ h_{t - 1}, x_t ] + b_o) \\
 h_t &amp;= o_t * tanh(C_t)

diff --git a/deep_learning/rnn/2_tensorflow_lstm.ipynb b/deep_learning/rnn/2_tensorflow_lstm.ipynb
@@ -559,7 +559,10 @@
     "\n",
     "In the case of the language model, this is where we'd actually drop the information about the old subject's gender and add the new information, as we decided in the previous steps.\n",
     "\n",
-    "Finally, we need to decide what we're going to output. This output will be a filtered version of our cell state. First, we run it through a sigmoid layer which decides what parts of the cell state we're going to output. Then, we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.\n",
+    "Finally, we need to decide what we're going to output. This output will be a filtered version of our cell state. First, we run it through a sigmoid layer which decides what parts of the cell state we're going to output. This is essentially our output gate. Then, we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.\n",
+    "\n",
+    "\n",
+    "<img src=\"img/output_gate.png\" width=\"40%\" height=\"40%\">\n",
     "\n",
     "\\begin{align}\n",
     "o_t &= \\sigma(W_o \\cdot [ h_{t - 1}, x_t ] + b_o) \\\\\n",

diff --git a/keras/cnn_image_keras.html b/keras/cnn_image_keras.html
@@ -12213,7 +12213,7 @@ <h1 id="Convolutional-Network">Convolutional Network<a class="anchor-link" href=
 <div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># one-hot encode the class (target) vectors</span>
 <span class="n">n_class</span> <span class="o">=</span> <span class="mi">10</span>
 <span class="n">y_train</span> <span class="o">=</span> <span class="n">np_utils</span><span class="o">.</span><span class="n">to_categorical</span><span class="p">(</span><span class="n">y_train</span><span class="p">,</span> <span class="n">n_class</span><span class="p">)</span>
-<span class="n">y_test</span> <span class="o">=</span> <span class="n">np_utils</span><span class="o">.</span><span class="n">to_categorical</span><span class="p">(</span><span class="n">y_test</span> <span class="p">,</span> <span class="n">n_class</span><span class="p">)</span>
+<span class="n">y_test</span> <span class="o">=</span> <span class="n">np_utils</span><span class="o">.</span><span class="n">to_categorical</span><span class="p">(</span><span class="n">y_test</span><span class="p">,</span> <span class="n">n_class</span><span class="p">)</span>
 <span class="nb">print</span><span class="p">(</span><span class="s1">&#39;y_train shape:&#39;</span><span class="p">,</span> <span class="n">y_train</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span>
 </pre></div>
 

diff --git a/keras/cnn_image_keras.ipynb b/keras/cnn_image_keras.ipynb
@@ -395,7 +395,7 @@
     "# one-hot encode the class (target) vectors\n",
     "n_class = 10\n",
     "y_train = np_utils.to_categorical(y_train, n_class)\n",
-    "y_test = np_utils.to_categorical(y_test , n_class)\n",
+    "y_test = np_utils.to_categorical(y_test, n_class)\n",
     "print('y_train shape:', y_train.shape)"
    ]
   },

diff --git a/keras/resnet_cam/__pycache__/explainer.cpython-36.pyc b/keras/resnet_cam/__pycache__/explainer.cpython-36.pyc
diff --git a/keras/resnet_cam/explainer.py b/keras/resnet_cam/explainer.py
@@ -0,0 +1,88 @@
+import numpy as np
+from scipy.ndimage import zoom
+from keras.models import Model
+from keras.applications.resnet50 import preprocess_input
+from keras.preprocessing.image import load_img, img_to_array
+
+
+class CAMExplainer:
+    """CAM (Class Activation Map) Explainer"""
+
+    def __init__(self, model, target_size=(224, 224)):
+        self.model = model
+        self.target_size = target_size
+
+        self.class_weights_ = None
+        self.resnet50_cam_layers_ = None
+
+    def fit(self, img_path):
+        if self.class_weights_ is None:
+            self.resnet50_cam_layers_, self.class_weights_ = self._get_resnet50_cam_info()
+
+        self.img_, keras_img = self._read_and_process_img(img_path)
+        self.cam_, self.predicted_class_ = self._create_cam(keras_img)
+        return self
+
+    def plot(self):
+        import matplotlib.pyplot as plt
+        fig, ax = plt.subplots()
+
+        ax.imshow(self.img_, alpha=0.5)
+        ax.imshow(self.cam_, cmap='jet', alpha=0.5)
+        plt.show()
+
+    def _create_cam(self, img):
+        # before_gap_output will be of shape [1, 7, 7, 2048] for resnet50
+        before_gap_output, prediction = self.resnet50_cam_layers_.predict(img)
+        cam_height = before_gap_output.shape[1]
+        cam_width = before_gap_output.shape[2]
+        n_activation = before_gap_output.shape[3]
+
+        predicted_class = np.argmax(prediction)
+        dominant_class_weight = self.class_weights_[:, predicted_class]
+
+        # flatten the spatial dimensions of the activation so we can take a dot
+        # product with the weights of the predicted (dominant) class
+        before_gap_output = np.squeeze(before_gap_output).reshape((-1, n_activation))
+        cam = np.dot(before_gap_output, dominant_class_weight).reshape((cam_height, cam_width))
+
+        # upsample the map to the target image size so we can overlay the
+        # class activation map on top of our image later; order=1 is
+        # (bi)linear interpolation, which was fast enough for this use-case
+        height_scale_ratio = self.target_size[0] // cam_height
+        width_scale_ratio = self.target_size[1] // cam_width
+        cam = zoom(cam, (height_scale_ratio, width_scale_ratio), order=1)
+        return cam, predicted_class
+
+    def _read_and_process_img(self, img_path):
+        """
+        Reads in a single image, resizes it to the specified target size
+        and applies the same preprocessing to the image as the original
+        pre-trained model.
+        """
+        img = load_img(img_path, target_size=self.target_size)
+        img_arr = img_to_array(img)
+
+        # keras works with batches of images, since we only have 1 image
+        # here, we need to add an additional dimension to turn it into
+        # shape [samples, size1, size2, channels]
+        keras_img = np.expand_dims(img_arr, axis=0)
+
+        # different pre-trained models preprocess their images differently,
+        # so we preprocess our image the same way to be consistent
+        keras_img = preprocess_input(keras_img)
+        return img, keras_img
+
+    def _get_resnet50_cam_info(self):
+        # we need the output of the activation layer right before the
+        # global average pooling (gap) layer and the last dense/softmax
+        # layer that generates the class prediction
+        before_gap_layer = self.model.layers[-4]
+        class_pred_layer = self.model.layers[-1]
+
+        outputs = before_gap_layer.output, class_pred_layer.output
+        resnet50_cam_layers = Model(inputs=self.model.input, outputs=outputs)
+
+        # only access the first element of the weights; we won't need the bias term here
+        class_weights = class_pred_layer.get_weights()[0]
+        return resnet50_cam_layers, class_weights
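
The core arithmetic of `_create_cam` above can be tried without loading ResNet50. The following is a minimal NumPy-only sketch, not the code in this commit. It assumes random stand-ins for the `[7, 7, 2048]` activation and the `[2048, 1000]` dense-layer weights. The helper names `compute_cam` and `upsample_nearest` are hypothetical, and nearest-neighbour upsampling via `np.kron` is swapped in for `scipy.ndimage.zoom` to keep the example dependency-free.

```python
import numpy as np


def compute_cam(feature_maps, class_weights, class_idx):
    """Weighted sum of the feature maps using one class's dense-layer weights.

    feature_maps : array of shape [height, width, n_activation]
    class_weights : array of shape [n_activation, n_class]
    """
    height, width, n_activation = feature_maps.shape
    flat = feature_maps.reshape((-1, n_activation))
    cam = flat.dot(class_weights[:, class_idx])
    return cam.reshape((height, width))


def upsample_nearest(cam, scale):
    """Nearest-neighbour upsampling; a stand-in for zoom(..., order=1)."""
    return np.kron(cam, np.ones((scale, scale)))


# random stand-ins for the activation before global average pooling
# and for the weights of the final dense layer (assumed shapes)
rng = np.random.RandomState(0)
feature_maps = rng.rand(7, 7, 2048)
class_weights = rng.randn(2048, 1000)

# global average pooling followed by the dense layer (bias omitted)
logits = feature_maps.mean(axis=(0, 1)).dot(class_weights)
class_idx = int(np.argmax(logits))

cam = compute_cam(feature_maps, class_weights, class_idx)
cam_big = upsample_nearest(cam, 224 // 7)
print(cam.shape, cam_big.shape)
```

Because global average pooling is linear, the spatial mean of the low-resolution map equals the bias-free logit of the chosen class, which is a convenient sanity check when adapting this to a real model.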
diff --git a/keras/resnet_cam/images/bmw.png b/keras/resnet_cam/images/bmw.png
diff --git a/keras/resnet_cam/images/boat.png b/keras/resnet_cam/images/boat.png
diff --git a/keras/resnet_cam/images/clint_eastwood.jpg b/keras/resnet_cam/images/clint_eastwood.jpg
diff --git a/keras/resnet_cam/images/dog.png b/keras/resnet_cam/images/dog.png
diff --git a/keras/resnet_cam/images/jemma.png b/keras/resnet_cam/images/jemma.png
diff --git a/keras/resnet_cam/images/office.png b/keras/resnet_cam/images/office.png
diff --git a/keras/resnet_cam/images/scotch.png b/keras/resnet_cam/images/scotch.png
diff --git a/keras/resnet_cam/images/soccer_ball.jpg b/keras/resnet_cam/images/soccer_ball.jpg
diff --git a/keras/resnet_cam/images/tv.png b/keras/resnet_cam/images/tv.png
diff --git a/keras/resnet_cam/img/cam_example.png b/keras/resnet_cam/img/cam_example.png
diff --git a/keras/resnet_cam/img/cam_process.png b/keras/resnet_cam/img/cam_process.png
diff --git a/keras/resnet_cam/img/convblock.png b/keras/resnet_cam/img/convblock.png
diff --git a/keras/resnet_cam/img/deep_network_error.png b/keras/resnet_cam/img/deep_network_error.png
diff --git a/keras/resnet_cam/img/deep_resnet_error.png b/keras/resnet_cam/img/deep_resnet_error.png
diff --git a/keras/resnet_cam/img/gap.png b/keras/resnet_cam/img/gap.png
diff --git a/keras/resnet_cam/img/idblock3.png b/keras/resnet_cam/img/idblock3.png
diff --git a/keras/resnet_cam/img/residual.png b/keras/resnet_cam/img/residual.png
diff --git a/keras/resnet_cam/img/residual_block.png b/keras/resnet_cam/img/residual_block.png