# Neural Networks

## Introduction

NNs are a popular family of ML models built from layers of simple units (neurons)

### Setup
neurons w/ weights $w$ (+ bias $b$) and nonlinearity/activation $\phi$

$\phi(\sum_i x_i w_i + b)$

In layers w/ weights $W_l \in \mathbb{R}^{n_{l+1} \times n_l}$ and biases $b_l \in \mathbb{R}^{n_{l+1}}$ w/ $n_l$ neurons in layer $l$:
$$x_{l+1} = \phi(W_{l}x_l + b_l)$$ 
(abuse of notation w/ $\phi$: it is applied elementwise)

(if input points are $x \in \mathbb{R}^d$, then $n_l = d$ for the first (input) layer)

Do this for all layers to get some output values in your final layer (*forward pass*)
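
A minimal sketch of the forward pass in plain Python, to make the layer formula concrete. It assumes ReLU for $\phi$ and a tiny 2-3-1 network; the weights and the helper names `relu`, `dense` and `forward` are made up for illustration and come from no library.

```python
def relu(z):
    # elementwise nonlinearity phi (ReLU is an assumption; any activation would do)
    return [max(0.0, v) for v in z]

def dense(W, b, x):
    # one layer: phi(W x + b), with W stored as a list of rows (one row per output neuron)
    z = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(W, b)]
    return relu(z)

def forward(layers, x):
    # feed the output of each layer into the next one
    for W, b in layers:
        x = dense(W, b, x)
    return x

# hypothetical weights: 2 inputs -> 3 hidden neurons -> 1 output
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 1.0, 0.0]),
    ([[1.0, 1.0, 1.0]], [-0.5]),
]
output = forward(layers, [2.0, 1.0])
print(output)  # [3.0]
```

The hidden layer gives `[1.0, 2.5, 0.0]` (the third neuron is clipped by ReLU), and the output neuron sums these and adds its bias.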

set initial weights $W_{l}$ randomly

*Tons* of different shapes/types of NNs

split data into  train and test (80/20ish is good)

### Backpropagation
Loss $L(y)$ is a function of the output $y$ and the target $t$, e.g.:
$$L(y) = (t-y)^2$$

Calculate derivative wrt each weight $D_n = \frac{\partial L(y)}{\partial w_n}$ and use gradient descent to update weights:
$$w_n \leftarrow w_n - \eta D_n$$
for learning rate $\eta > 0$
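
A worked sketch of the update rule for the simplest possible case: one weight, one training point, model $y = wx$ and squared-error loss $(t-y)^2$, so that $\frac{\partial L}{\partial w} = -2x(t - wx)$. The values of `x`, `t` and `eta` are arbitrary choices for illustration.

```python
def loss(w, x, t):
    y = w * x                      # model output
    return (t - y) ** 2            # squared error

def grad(w, x, t):
    # dL/dw = -2 * x * (t - w*x), by the chain rule
    return -2.0 * x * (t - w * x)

w, eta = 0.0, 0.1                  # initial weight and learning rate (arbitrary)
x, t = 1.0, 2.0                    # one training point: input and target
history = []
for step in range(20):
    history.append(loss(w, x, t))
    w = w - eta * grad(w, x, t)    # gradient descent update

print(w, history[0], history[-1])
```

With these numbers the error $t - wx$ shrinks by a factor of 0.8 per step, so $w$ approaches 2 and the loss falls every iteration. Backpropagation is what computes `grad` for every weight in a multi-layer network.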


In [113]:
from sklearn.datasets import load_iris
X, y = load_iris(as_frame = True, return_X_y=True)

In [114]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [115]:
import tensorflow as tf
train = tf.data.Dataset.from_tensor_slices((X_train, y_train))
test = tf.data.Dataset.from_tensor_slices((X_test, y_test))

In [116]:
train = train.repeat(20).shuffle(1000).batch(32)
test = test.batch(1)

In [117]:
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation=tf.nn.relu),   # hidden layer
    # tf.keras.layers.Dense(10, activation=tf.nn.relu),   # hidden layer
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(3, activation=tf.nn.softmax)  # output layer
])

model.compile(
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

model.fit(
    train,
    validation_data=test,
    epochs=10,
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f00181a6740>

In [118]:
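# NOTE: `predictions` is never defined in these notes. It is assumed to be the output of
# model.predict(...) on three flower measurements, one per species, in the order listed below.
target_names = load_iris().target_names  # look up the class names once, not per iteration
for pred_dict, expected in zip(predictions, ["setosa", "versicolor", "virginica"]):
    predicted_index = pred_dict.argmax()        # index of the most probable class
    predicted = target_names[predicted_index]
    probability = pred_dict.max()               # softmax probability of that class
    tick_cross = "✓" if predicted == expected else "✗"
    print(f"{tick_cross} Prediction is '{predicted}' ({100 * probability:.1f}%), expected '{expected}'")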


✓ Prediction is 'setosa' (100.0%), expected 'setosa'
✓ Prediction is 'versicolor' (99.8%), expected 'versicolor'
✓ Prediction is 'virginica' (91.9%), expected 'virginica'


## Convolutional Neural Networks (CNNs)

### Image Kernel Convolutions
images are matrices of pixel values
use a kernel to convolve over the image to get a new image. Padding at the edges keeps the output the same size as the input, e.g. zero padding (add a border of zeros) or mirror padding (add a border of pixels identical to the edge pixels).
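
A small sketch of the two padding schemes on a 3x3 image, in plain Python. The function name `pad` and the mode strings are hypothetical; "mirror" here means what the note above describes for a 1-pixel border, i.e. copying the nearest edge pixel.

```python
def pad(image, mode="zero"):
    # add a 1-pixel border around a 2D list-of-lists image
    h, w = len(image), len(image[0])

    def pixel(r, c):
        if 0 <= r < h and 0 <= c < w:
            return image[r][c]          # inside the original image
        if mode == "zero":
            return 0                    # zero padding
        # "mirror": copy the nearest edge pixel (clamp the coordinates)
        return image[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    return [[pixel(r, c) for c in range(-1, w + 1)] for r in range(-1, h + 1)]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
zero_padded = pad(image, "zero")
mirror_padded = pad(image, "mirror")
for row in mirror_padded:
    print(row)
```

Both results are 5x5, so a 3x3 kernel slid over them produces a 3x3 output, the same size as the input.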

e.g. for kernel $w$ of size $(2a+1) \times (2b+1)$ and image w/ pixel values $f(x,y)$ we get pixel value $g(x,y)$ where:
$$g(x,y) = (w \ast f)(x,y) = \sum_{dx=-a}^a \sum_{dy=-b}^b w(dx,dy) f(x-dx, y-dy)$$

e.g. for a 3x3 kernel:
$$ w = \begin{bmatrix} 1 & 0 & -1 \\ 1 & 0 & -1 \\ 1 & 0 & -1 \end{bmatrix}$$

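(^^ this is a Prewitt kernel: it responds to left-right changes in brightness, i.e. it is a directional edge detector that picks out vertical edges)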
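
A direct, unoptimised sketch of the convolution sum applied to this kernel, in plain Python. Assumptions: zero padding outside the image, the image is stored as `image[y][x]`, and the kernel entry in row `dy + b`, column `dx + a` is taken to be $w(dx,dy)$. The name `convolve` is hypothetical.

```python
def convolve(image, kernel):
    h, w = len(image), len(image[0])
    b, a = len(kernel) // 2, len(kernel[0]) // 2   # kernel half-heights/half-widths

    def f(x, y):
        # pixel value, with zero padding outside the image
        return image[y][x] if 0 <= x < w and 0 <= y < h else 0

    def g(x, y):
        # g(x,y) = sum_{dx,dy} w(dx,dy) * f(x-dx, y-dy)
        return sum(
            kernel[dy + b][dx + a] * f(x - dx, y - dy)
            for dy in range(-b, b + 1)
            for dx in range(-a, a + 1)
        )

    return [[g(x, y) for x in range(w)] for y in range(h)]

kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]
# dark left half, bright right half: one vertical edge down the middle
image = [[0, 0, 1, 1]] * 4
edges = convolve(image, kernel)
for row in edges:
    print(row)
```

The response is largest in the two columns either side of the brightness step. The negative values in the last column come from the zero padding, which looks like a second edge at the image border.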

### CNNs

#### Convolutional Layers
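We'll make an NN learn the convolution kernels for us! (i.e. learn the weights $w(dx,dy)$. The same kernel weights are shared across every pixel position $(x,y)$; only the pixels they get multiplied with change as the kernel slides over the image.)
And we can stack these layers so that later layers detect more complex features built from the outputs of earlier kernels.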

#### Pooling Layers
We can also use pooling layers to reduce the size of the image (downsampling). These take a window of pixels and output a single value, e.g. the max (max pooling) or the average (average pooling), so the image shrinks without losing too much information.
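
A sketch of 2x2 max pooling with stride 2 in plain Python. The helper name `max_pool` is hypothetical; windows that run off the edge simply use the pixels that exist.

```python
def max_pool(image, size=2):
    # slide a size x size window in steps of `size` and keep the max of each window
    h, w = len(image), len(image[0])
    return [
        [
            max(
                image[r + i][c + j]
                for i in range(size)
                for j in range(size)
                if r + i < h and c + j < w   # ignore positions past the border
            )
            for c in range(0, w, size)
        ]
        for r in range(0, h, size)
    ]

image = [[1, 3, 2, 0],
         [4, 2, 1, 1],
         [0, 0, 5, 6],
         [1, 2, 7, 8]]
pooled = max_pool(image)
print(pooled)  # [[4, 2], [2, 8]]
```

The 4x4 image becomes 2x2: each output pixel keeps only the strongest response in its window.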

#### Fully Connected Layers (FC/Dense Layers)
Fully connected layers are just like the ones we've seen before (i.e. in non-convolution-land), but we flatten the image first (i.e. we take the image and turn it into a vector of pixel values).

In [141]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(
        filters=16,
        kernel_size=5,
        padding="same",
        activation=tf.nn.relu
    ),
    tf.keras.layers.MaxPool2D((2, 2), (2, 2), padding="same"),
    tf.keras.layers.Conv2D(
        filters=32,
        kernel_size=5,
        padding="same",
        activation=tf.nn.relu
    ),
    tf.keras.layers.MaxPool2D((2, 2), (2, 2), padding="same"),
    tf.keras.layers.Conv2D(
        filters=64,
        kernel_size=5,
        padding="same",
        activation=tf.nn.relu
    ),
    tf.keras.layers.MaxPool2D((2, 2), (2, 2), padding="same"),
    tf.keras.layers.Conv2D(
        filters=128,
        kernel_size=5,
        padding="same",
        activation=tf.nn.relu
    ),
    tf.keras.layers.MaxPool2D((2, 2), (2, 2), padding="same"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Dense(10, activation="softmax")
])

model.compile(
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
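
A quick sketch of how the image size changes through the model above, assuming 28x28 single-channel inputs (as in MNIST). With `padding="same"` the convolutions keep the spatial size, and each stride-2 pooling layer outputs $\lceil n/2 \rceil$ pixels per side. The helper name `same_pool` is hypothetical.

```python
import math

def same_pool(n, stride=2):
    # output size of a pooling layer with padding="same": ceil(n / stride)
    return math.ceil(n / stride)

size = 28                 # assumed input height/width
sizes = [size]
for _ in range(4):        # four MaxPool2D layers; the Conv2D layers keep the size
    size = same_pool(size)
    sizes.append(size)

flattened = size * size * 128   # 128 filters in the last Conv2D layer
print(sizes, flattened)         # [28, 14, 7, 4, 2] 512
```

So `Flatten` hands a vector of length 512 to the first dense layer.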

In [142]:
import tensorflow_datasets as tfds

ds_train, ds_test = tfds.load(
    "mnist",
    split=["train", "test"],
    as_supervised=True,
)

In [143]:
ds_train.element_spec

(TensorSpec(shape=(28, 28, 1), dtype=tf.uint8, name=None),
 TensorSpec(shape=(), dtype=tf.int64, name=None))

In [144]:
def normalize_img(image, label):
    return tf.cast(image, tf.float32) / 255., label

ds_train = ds_train.map(normalize_img)

ds_train = ds_train.shuffle(1000)
ds_train = ds_train.batch(128)

ds_test = ds_test.map(normalize_img)
ds_test = ds_test.batch(128)

In [160]:
model.fit(
    ds_train,
    validation_data=ds_test,
    epochs=20,
)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7eff9b45e920>

In [161]:
from urllib.request import urlretrieve

for i in list(range(1,10)) + ["dog"]:
    urlretrieve(f"https://github.com/milliams/intro_deep_learning/raw/master/{i}.png", f"{i}.png")

In [162]:
import numpy as np
from skimage.io import imread

images = []
for i in list(range(1,10)) + ["dog"]:
    images.append(np.array(imread(f"{i}.png")/255.0, dtype="float32"))
images = np.array(images)[:,:,:,np.newaxis]
images.shape

(10, 28, 28, 1)

In [163]:
probabilities = model.predict(images)



In [164]:
truths = list(range(1, 10)) + ["dog"]

table = []
for truth, probs in zip(truths, probabilities):
    prediction = probs.argmax()
    if truth == 'dog':
        print(f"{truth}. CNN thinks it's a {prediction} ({probs[prediction]*100:.1f}%)")
    else:
        print(f"{truth} at {probs[truth]*100:4.1f}%. CNN thinks it's a {prediction} ({probs[prediction]*100:4.1f}%)")
    table.append((truth, probs))

1 at 51.3%. CNN thinks it's a 1 (51.3%)
2 at 84.8%. CNN thinks it's a 2 (84.8%)
3 at 91.2%. CNN thinks it's a 3 (91.2%)
4 at  0.1%. CNN thinks it's a 5 (41.7%)
5 at 100.0%. CNN thinks it's a 5 (100.0%)
6 at  0.0%. CNN thinks it's a 3 (99.9%)
7 at 99.7%. CNN thinks it's a 7 (99.7%)
8 at  4.7%. CNN thinks it's a 1 (21.8%)
9 at 15.2%. CNN thinks it's a 8 (64.3%)
dog. CNN thinks it's a 8 (17.0%)


### Data Augmentation
add inverted images to the training data to make the NN more robust to different images
(could also do rotated images, &c.)
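
A tiny sketch of both augmentations on a 2x2 "image" with pixel values in $[0, 1]$, in plain Python. The helper names are hypothetical. A 90° rotation is used only because it needs no interpolation; for digits, small rotations are the more usual choice since large ones can change the label (a 6 turns into a 9).

```python
def invert(image):
    # swap dark and bright: p -> 1 - p for pixel values in [0, 1]
    return [[1.0 - p for p in row] for row in image]

def rotate90(image):
    # rotate clockwise by 90 degrees: reverse the rows, then transpose
    return [list(row) for row in zip(*image[::-1])]

image = [[0.0, 1.0],
         [0.0, 0.0]]
augmented = [image, invert(image), rotate90(image)]
for img in augmented:
    print(img)
```

Each augmented copy keeps the label of the original image, so the training set grows without any new labelling.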

In [171]:
ds_train, ds_test = tfds.load(
    "mnist",
    split=["train", "test"],
    as_supervised=True,
)

def invert_img(image, label):
    return 1.-image, label

ds_train = ds_train.map(normalize_img)
ds_train = ds_train.concatenate(ds_train.map(invert_img))  # new line
ds_train = ds_train.shuffle(1000)
ds_train = ds_train.batch(128)

ds_test = ds_test.map(normalize_img)
ds_test = ds_test.concatenate(ds_test.map(invert_img))  # new line
ds_test = ds_test.batch(128)

model.fit(
    ds_train,
    validation_data=ds_test,
    epochs=5,
)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7eff982947c0>

In [172]:
probabilities = model.predict(images)



In [173]:
truths = list(range(1, 10)) + ["dog"]

table = []
for truth, probs in zip(truths, probabilities):
    prediction = probs.argmax()
    if truth == 'dog':
        print(f"{truth}. CNN thinks it's a {prediction} ({probs[prediction]*100:.1f}%)")
    else:
        print(f"{truth} at {probs[truth]*100:4.1f}%. CNN thinks it's a {prediction} ({probs[prediction]*100:4.1f}%)")
    table.append((truth, probs))

1 at 67.6%. CNN thinks it's a 1 (67.6%)
2 at 100.0%. CNN thinks it's a 2 (100.0%)
3 at 99.9%. CNN thinks it's a 3 (99.9%)
4 at 99.9%. CNN thinks it's a 4 (99.9%)
5 at 100.0%. CNN thinks it's a 5 (100.0%)
6 at 100.0%. CNN thinks it's a 6 (100.0%)
7 at 100.0%. CNN thinks it's a 7 (100.0%)
8 at 100.0%. CNN thinks it's a 8 (100.0%)
9 at  1.8%. CNN thinks it's a 8 (84.6%)
dog. CNN thinks it's a 8 (42.9%)
