## 1. Essay: Are Neural Nets Destined to Fail?
  
In light of the downfall of the once-popular perceptrons and expert systems, it makes sense to keep a tempered view of the current hype around deep learning and neural networks. That said, these newer techniques are making remarkable strides, letting AI solve problems that were too challenging for perceptrons and expert systems.


Amid all the hype, it's hard to pinpoint a weakness of neural nets as fundamental as the insurmountable brittleness of expert systems. If I had to guess which problems are most likely to be exacerbated as neural nets grow, though, I would pick two. The first is the opacity of our black-box approach. With growing concern about the validity of some of these algorithms and their systemic impact on our lives (see *Weapons of Math Destruction* and the like), being able to peek inside the "mind of the AI" will prove invaluable, and it might not be possible with our current approach. The second is data collection. For some problems it may prove too hard to find sufficient data, or the biases in the available data may prove insurmountable. Finding data to create a "wise" AI seems unlikely, since we can hardly figure out how to become wiser ourselves.

On the other hand, deep learning seems particularly well suited to a wide variety of tasks. With more data and more GPUs to process it, neural networks have done quite well. I suspect that, while we may find better ways to enhance or augment these approaches, deep learning is likely to be around for the long haul in some capacity.

## 2. Backpropagation Computation Example: XOR [1, 1] = 0



1. Fill in random weights (same as in class).

    $\begin{aligned}
    &\begin{bmatrix}
    w_{i_1,h_1} & w_{i_1,h_2} \\
    w_{i_2,h_1} & w_{i_2,h_2}
    \end{bmatrix}
    \leftarrow
    \begin{bmatrix}
    0.11 & 0.12 \\
    0.21 & 0.08
    \end{bmatrix} \\
    &\begin{bmatrix}
    w_{h_1, o_1} \\ 
    w_{h_2, o_1} 
    \end{bmatrix}
    \leftarrow
    \begin{bmatrix}
    0.14 \\
    0.15
    \end{bmatrix}
    \end{aligned}$
    
2. Compute the output for one sample (XOR: `[1, 1]` &rarr; `0`).

    $\begin{aligned}
    o_1 &= 
    \begin{bmatrix}
    1 & 1
    \end{bmatrix}
    \cdot
    \begin{bmatrix}
    0.11 & 0.12 \\
    0.21 & 0.08
    \end{bmatrix}
    \cdot
    \begin{bmatrix}
    0.14 \\
    0.15
    \end{bmatrix}
    \\ &=
    \begin{bmatrix}
    1 * 0.11 + 1 * 0.21 & 1 * 0.12 + 1 * 0.08
    \end{bmatrix}
    \cdot
    \begin{bmatrix}
    0.14 \\ 
    0.15
    \end{bmatrix}
    \\ &=
    \begin{bmatrix}
    0.32 & 0.20
    \end{bmatrix}
    \cdot
    \begin{bmatrix}
    0.14 \\ 
    0.15 
    \end{bmatrix}
    \\ &=
    \begin{bmatrix}
    0.32 * 0.14 + 0.20 * 0.15
    \end{bmatrix}
    \\ &= 0.0748
    \end{aligned}$
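    As a quick check of the forward pass, the same computation can be written in NumPy. This is a minimal sketch assuming NumPy is available; the variable names are our own. With identity activations, each layer reduces to a single dot product.

    ```python
    import numpy as np

    # Sample and weights from steps 1 and 2 (XOR input [1, 1]).
    x = np.array([1.0, 1.0])
    W_ih = np.array([[0.11, 0.12],   # row i = input node, column j = hidden node
                     [0.21, 0.08]])
    w_ho = np.array([0.14, 0.15])    # hidden-to-output weights

    # Identity activations, so each layer is just a dot product.
    hidden = x @ W_ih                # [1*0.11 + 1*0.21, 1*0.12 + 1*0.08], about [0.32, 0.20]
    output = hidden @ w_ho           # 0.32*0.14 + 0.20*0.15, about 0.0748

    print(np.round(hidden, 4), round(float(output), 4))
    ```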

3. Compute the error (and, more importantly, the delta), using the XOR target $y = 0$ for this sample.

    $\begin{aligned}
    L_2\ \text{error} &= (0 - 0.0748)^2 \\
    &= 0.0056 \\
    \Delta_{o_1} &= (0 - 0.0748) \\
    &= -0.0748 \\
    \end{aligned}$

4. Backpropagate the updates through the network, assuming a learning rate of $0.05$ and identity activation functions for all nodes ($f(x) = x$ and $f'(x) = 1$).
     
    $\begin{aligned}
    \begin{bmatrix}
    w_{h_1, o_1} \\ 
    w_{h_2, o_1}
    \end{bmatrix} &\leftarrow 
    \begin{bmatrix}
    0.14 \\ 
    0.15 
    \end{bmatrix} + 0.05 \cdot 
    \begin{bmatrix}
    0.32 \\ 
    0.20 
    \end{bmatrix} \cdot 1.0 \cdot (-0.0748) \\
    &= 
    \begin{bmatrix}
    0.14 \\ 
    0.15 
    \end{bmatrix} + 
    \begin{bmatrix}
    0.05 * 0.32 * 1.0 * (-0.0748) \\
    0.05 * 0.20 * 1.0 * (-0.0748) 
    \end{bmatrix} \\
    &= 
    \begin{bmatrix}
    0.14 \\ 
    0.15 
    \end{bmatrix} +
    \begin{bmatrix}
    -0.0011968 \\
    -0.000748 
    \end{bmatrix} \\
    &=
    \begin{bmatrix}
    0.1388032 \\ 
    0.149252
    \end{bmatrix}
    \end{aligned}$

    The input-to-hidden weights receive the output delta passed back through the pre-update hidden-to-output weights ($0.14$ and $0.15$), scaled by the inputs ($1$ and $1$).

    $\begin{aligned}
    \begin{bmatrix}
    w_{i_1,h_1} & w_{i_1,h_2} \\ 
    w_{i_2,h_1} & w_{i_2,h_2}
    \end{bmatrix} &\leftarrow 
    \begin{bmatrix}
    0.11 & 0.12 \\
    0.21 & 0.08
    \end{bmatrix} + 0.05 \cdot
    \begin{bmatrix}
    1 & 1 \\ 
    1 & 1
    \end{bmatrix} \cdot 1.0 \odot
    \begin{bmatrix}
    0.14 & 0.15 \\ 
    0.14 & 0.15
    \end{bmatrix} \cdot (-0.0748) \\ &=
    \begin{bmatrix}
    0.11 & 0.12 \\
    0.21 & 0.08
    \end{bmatrix} + 
    \begin{bmatrix}
    0.05 * 1 * 1.0 & 0.05 * 1 * 1.0 \\ 
    0.05 * 1 * 1.0 & 0.05 * 1 * 1.0
    \end{bmatrix} \odot 
    \begin{bmatrix}
    0.14 * (-0.0748) & 0.15 * (-0.0748) \\ 
    0.14 * (-0.0748) & 0.15 * (-0.0748) 
    \end{bmatrix} \\ &=
    \begin{bmatrix}
    0.11 & 0.12 \\
    0.21 & 0.08
    \end{bmatrix} + 
    \begin{bmatrix}
    0.05 & 0.05 \\ 
    0.05 & 0.05 
    \end{bmatrix} \odot 
    \begin{bmatrix}
    -0.010472 & -0.01122 \\
    -0.010472 & -0.01122
    \end{bmatrix} \\ &=
    \begin{bmatrix}
    0.11 & 0.12 \\
    0.21 & 0.08
    \end{bmatrix} + 
    \begin{bmatrix}
    0.05 * (-0.010472) & 0.05 * (-0.01122) \\
    0.05 * (-0.010472) & 0.05 * (-0.01122)
    \end{bmatrix} \\ &=
    \begin{bmatrix}
    0.11 & 0.12 \\
    0.21 & 0.08
    \end{bmatrix} +     
    \begin{bmatrix}
    -0.0005236 & -0.000561 \\
    -0.0005236 & -0.000561
    \end{bmatrix} \\ &= 
    \begin{bmatrix}
    0.1094764 & 0.119439 \\
    0.2094764 & 0.079439
    \end{bmatrix}  
    \end{aligned}$
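    To double-check the arithmetic of a full update, here is a minimal NumPy sketch of one backpropagation step for this 2-2-1 network. It assumes the conventions of steps 2 to 4: identity activations, a learning rate of 0.05, a delta of (target minus output) with the XOR target 0 for the input [1, 1], and hidden deltas computed from the pre-update hidden-to-output weights. It is an illustration of a single step, not a general training loop.

    ```python
    import numpy as np

    # Assumed conventions: identity activations (f'(x) = 1), learning rate 0.05,
    # delta = (target - output), and "w += lr * incoming activation * f' * delta".
    x = np.array([1.0, 1.0])
    target = 0.0                     # XOR(1, 1) = 0
    lr = 0.05

    W_ih = np.array([[0.11, 0.12],   # input i (row) -> hidden j (column)
                     [0.21, 0.08]])
    w_ho = np.array([0.14, 0.15])    # hidden j -> output

    # Forward pass.
    hidden = x @ W_ih
    output = hidden @ w_ho

    # Error and output delta.
    l2_error = (target - output) ** 2
    delta_out = (target - output) * 1.0       # times f'(output) = 1

    # Hidden deltas use the pre-update hidden-to-output weights.
    delta_hidden = w_ho * delta_out * 1.0     # times f'(hidden) = 1

    # Weight updates; np.outer(x, delta_hidden)[i, j] = x[i] * delta_hidden[j].
    w_ho_new = w_ho + lr * hidden * delta_out
    W_ih_new = W_ih + lr * np.outer(x, delta_hidden)

    print("L2 error:", round(float(l2_error), 6))
    print("output delta:", round(float(delta_out), 4))
    print("new hidden->output weights:", np.round(w_ho_new, 7))
    print("new input->hidden weights:\n", np.round(W_ih_new, 7))
    ```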

## 3. Keras-based CNN

Herein begins the most fashionable CNN we've done yet!

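First, we load the images from *fashion_mnist*, reshape them to add a single channel dimension, scale the pixel values to the range [0, 1], and one-hot encode the labels.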

```python
from keras.datasets import fashion_mnist
from keras.utils import to_categorical

# 60,000 training and 10,000 test grayscale images, 28x28 pixels each.
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# Add a channel dimension and scale pixel values from [0, 255] to [0, 1].
train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

# One-hot encode the integer class labels (0-9).
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
```
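The preprocessing above can be illustrated on tiny made-up arrays with plain NumPy. This is a sketch under our own assumptions: the labels and pixel values are hypothetical, and `np.eye(10)[labels]` is offered as an equivalent of what `to_categorical` should return for integer labels 0 through 9 (a float32 array with one column per class), not as Keras's own implementation.

```python
import numpy as np

# Hypothetical labels for illustration; Fashion-MNIST classes are integers 0-9.
labels = np.array([9, 0, 3])

# One-hot encode: row k has a 1 in column labels[k] and 0 elsewhere.
one_hot = np.eye(10, dtype="float32")[labels]

# Pixel scaling: uint8 values in [0, 255] become float32 values in [0, 1].
pixels = np.array([0, 128, 255], dtype="uint8")
scaled = pixels.astype("float32") / 255

# Adding a channel dimension: (n, 28, 28) -> (n, 28, 28, 1), as Conv2D expects.
dummy_images = np.zeros((5, 28, 28), dtype="uint8")
with_channel = dummy_images.reshape((5, 28, 28, 1))

print(one_hot)
print(scaled)
print(with_channel.shape)
```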



Next, we create a model with convolutional layers. It has the same structure as the example on the regular *mnist* dataset.

```python
from keras import layers
from keras import models

model = models.Sequential()

# Configure a convnet with 3 convolution layers, with max pooling after the first two.
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

# Flatten the 3x3x64 feature maps into a vector, then do a 10-way classification.
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

model.summary()
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten_1 (Flatten)          (None, 576)               0         
_________________________________________________________________
dens
```
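The shapes and parameter counts in the summary follow from simple formulas, which this pure-Python sketch reproduces by hand. It assumes the Keras defaults used above ('valid' padding and stride 1 for `Conv2D`, and 2x2 pooling that drops any leftover row or column); the helper functions `conv2d`, `max_pool`, and `dense` are hypothetical names of our own, and the two Dense rows are computed from the model code rather than read off the printed summary.

```python
# Hypothetical helpers for a hand calculation of layer shapes and parameter counts.

def conv2d(size, channels_in, filters, kernel=3):
    """Return (output size, parameter count) for a square 'valid' Conv2D layer."""
    out = size - kernel + 1                                  # no padding, stride 1
    params = (kernel * kernel * channels_in + 1) * filters   # weights + one bias per filter
    return out, params

def max_pool(size, pool=2):
    """Return the output size of square max pooling (leftover rows are dropped)."""
    return size // pool

def dense(units_in, units_out):
    """Return the parameter count of a fully connected layer (weights + biases)."""
    return (units_in + 1) * units_out

size, channels = 28, 1
summary = []

for name, filters, pool_after in [("conv2d_1", 32, True),
                                  ("conv2d_2", 64, True),
                                  ("conv2d_3", 64, False)]:
    size, params = conv2d(size, channels, filters)
    channels = filters
    summary.append((name, (size, size, channels), params))
    if pool_after:
        size = max_pool(size)
        summary.append(("max_pooling", (size, size, channels), 0))

flat = size * size * channels                                # Flatten
summary.append(("flatten", (flat,), 0))
summary.append(("dense (64)", (64,), dense(flat, 64)))
summary.append(("dense (10)", (10,), dense(64, 10)))

for name, shape, params in summary:
    print(f"{name:<14} {str(shape):<16} {params}")
print("total params:", sum(p for _, _, p in summary))
```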

Finally, we compile and train the model, then see how well it works on the test set.

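```python
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train for 5 passes over the training set, 64 images per batch.
model.fit(train_images, train_labels, epochs=5, batch_size=64)

# Returns [test loss, test accuracy].
model.evaluate(test_images, test_labels)
```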

```
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

[0.27244248328208925, 0.9053]
```

The model does a decent job, reaching about 90.5% accuracy on the test set. Unsurprisingly, it does a little worse on the test data than on the training data, but a drop from 0.915 to 0.905 is pretty tolerable.
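`model.evaluate` returns the loss followed by the compiled metrics, so the two numbers above are the test loss and the test accuracy. This NumPy sketch shows what each measures on a tiny made-up batch (3 samples and 3 classes instead of 10; all values are hypothetical). It is a plain textbook version of categorical cross-entropy; Keras also guards against taking the log of zero internally, a detail ignored here.

```python
import numpy as np

# Hypothetical batch: one-hot labels and softmax outputs (each row sums to 1).
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]], dtype=float)
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.5, 0.3, 0.2]])

# Categorical cross-entropy: mean over samples of -log(probability of the true class).
loss = np.mean(-np.sum(y_true * np.log(y_pred), axis=1))

# Accuracy: fraction of samples whose highest-probability class is the true class.
accuracy = np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1))

print(loss, accuracy)
```

The third sample is misclassified with low confidence in the true class, so it dominates the loss while costing only one third of the accuracy.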
