<a href="https://colab.research.google.com/github/lyradsouza/BreakthroughAI_Practice/blob/main/module7(Lyra).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 7 - Deep Learning 2

The goal of this week's lab is to learn to use a widely used family of
neural network modules: convolutional neural networks (CNNs). We can use
them to learn features from images and even text.

![image](https://upload.wikimedia.org/wikipedia/commons/6/63/Typical_cnn.png)

Images and text are common data modalities we encounter in
classification tasks. While we can directly apply the
linear or multi-layer modules we learned in the past few weeks to
those modalities, there are neural network modules specifically
designed for processing them, namely CNNs and RNNs.

This week we will walk through the basics of CNNs.

* **Review**: Training and Multi-Layer Models (NNs)
* **Unit A**: Image Processing and Convolutions
* **Unit B**: Convolutional Neural Networks (CNNs)

## Review

Last time we took a look at what's happening inside training when we
call `model.fit`. We did this by implementing `model.fit` ourselves.

In [None]:
# For Tables
import pandas as pd
# For Visualization
import altair as alt
# For Scikit-Learn
import sklearn
from keras.wrappers.scikit_learn import KerasClassifier
# For Neural Networks
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.layers import Dense

We will also turn off warnings.

In [None]:
import warnings
warnings.filterwarnings('ignore')

As we have seen in past weeks we load our structured data in
Pandas format.

In [None]:
df = pd.read_csv("https://srush.github.io/BT-AI/notebooks/circle.csv")
all_df = pd.read_csv("https://srush.github.io/BT-AI/notebooks/all_points.csv")

Next, we need to define a function that creates our model. This will
determine the range of different feature shapes the model can learn.

[TensorFlow Playground](https://playground.tensorflow.org/)

Depending on the complexity of the data we may select a model that
is linear or one with multiple layers.

Here is what a linear model looks like.

In [None]:
def create_linear_model(learning_rate=1.0):
    # Makes it the same for everyone in class
    tf.random.set_seed(2)

    # Create model
    model = Sequential()
    model.add(Dense(1, activation="sigmoid"))

    # Compile model
    optimizer = tf.keras.optimizers.SGD(
        learning_rate=learning_rate
    )
    model.compile(loss="binary_crossentropy",
                  optimizer=optimizer,
                  metrics=["accuracy"])
    return model

Here is what a more complex multi-layer model looks like.

In [None]:
def create_model(learning_rate=0.05):
    tf.random.set_seed(2)
    # create model
    model = Sequential()
    model.add(Dense(8, activation="relu"))
    model.add(Dense(8, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    # Compile model
    optimizer = tf.keras.optimizers.SGD(
        learning_rate=learning_rate
    )

    model.compile(loss="binary_crossentropy",
                  optimizer=optimizer,
                  metrics=["accuracy"])
    return model

Generally, we will use "ReLU" for the inner layers and "sigmoid" for
the final layer. The reasons for this are beyond the scope of this class, and mostly
have to do with computational simplicity and standard practice.

Once we have described the shape of our model we can turn it into a classifier
and train it on data.

In [None]:
model = KerasClassifier(build_fn=create_model,
                        epochs=50,
                        batch_size=20,
                        verbose=0)

The neural network is used just like the classifiers from Weeks 4 and 5. The only
difference is that we get to design its internal shape.

In [None]:
model.fit(x=df[["feature1", "feature2"]],
          y=(df["class"] == "red"))
df["predict"] = model.predict(df[["feature1", "feature2"]])

The output of the `model.fit` command tells us a lot of information about
how the approach is doing.

In particular if it is working the `loss` should go down and the `accuracy` should
go up. This implies that the model is learning to fit to the data that we provided it.

We can view how the model determines the separation of the data.

In [None]:
chart = (alt.Chart(df)
    .mark_point()
    .encode(
        x="feature1",
        y="feature2",
        color="class",
        fill="predict",
    ))
chart

In [None]:
all_df["predict"] = model.predict(all_df[["feature1", "feature2"]])
chart = (alt.Chart(all_df)
    .mark_point()
    .encode(
        x="feature1",
        y="feature2",
        color="predict",
        fill="predict",
    ))
chart

### Review Exercise

Change the model above to have three inner layers with the first having size 10, the second size 5, and the third having size 5. How well does this do on our problem?

In [None]:
#📝📝📝📝 FILLME
def create_model(learning_rate=0.5):
    tf.random.set_seed(2)
    # create model
    model = Sequential()

    model.add(Dense(10, activation="relu"))
    model.add(Dense(5, activation="relu"))
    model.add(Dense(5, activation="relu"))

    model.add(Dense(1, activation="sigmoid"))
    # Compile model
    optimizer = tf.keras.optimizers.SGD(
        learning_rate=learning_rate
    )

    model.compile(loss="binary_crossentropy",
                  optimizer=optimizer,
                  metrics=["accuracy"])
    return model

In [None]:
model = KerasClassifier(build_fn=create_model,
                        epochs=100,
                        batch_size=20,
                        verbose=1)

out = model.fit(x=df[["feature1", "feature2"]],
          y=(df["class"] == "red"))
df["predict"] = model.predict(df[["feature1", "feature2"]])


In [None]:
history = pd.DataFrame(out.history)

In [None]:
history = history.reset_index().rename(columns={"index": "epoch"})

In [None]:
chart = (alt.Chart(history)
    .mark_line()
    .encode(
        x="epoch",
        y="loss"
    ))
chart

In [None]:
chart = (alt.Chart(df)
    .mark_point()
    .encode(
        x="feature1",
        y="feature2",
        color="class",
        fill="predict",
    ))
chart

## Unit A

### Image Classification

Today's focus will be the problem of image classification. The goal is to take an
image and predict the image class. So instead of predicting whether a point is
red or blue we will now be predicting whether an image is a cat or a dog, or a
house or a plane.

We are going to start with a famous, simple image classification task known as
[MNIST](http://yann.lecun.com/exdb/mnist/). This dataset consists of pictures of handwritten digits. The goal is to
look at the handwriting and determine what the digit is.

In [None]:
df_train = pd.read_csv('https://srush.github.io/BT-AI/notebooks/mnist_train.csv.gz', compression='gzip')
df_test = pd.read_csv('https://srush.github.io/BT-AI/notebooks/mnist_test.csv.gz', compression='gzip')
df_train[:100]

In [None]:
df_train.columns

Index(['class', '1x1', '1x2', '1x3', '1x4', '1x5', '1x6', '1x7', '1x8', '1x9',
       ...
       '28x19', '28x20', '28x21', '28x22', '28x23', '28x24', '28x25', '28x26',
       '28x27', '28x28'],
      dtype='object', length=785)

This data is in the same format that we have been using so far.

The column `class` stores the class of each image, which is a number between 0 and 9.

In [None]:
df_train[:100]["class"].unique()

array([5, 0, 4, 1, 9, 2, 3, 6, 7, 8])

The rest of the columns store the features. However, there are many more features than before!

In particular the images are 28x28 pixels which means we have 784 features.
To make later processing easier, we store the names of pixel value columns in a list `features`.

In [None]:
features = []
for i in range(1, 29):
    for j in range(1, 29):
        features.append(str(i) + "x" + str(j))
len(features)

784

These features are the intensity at each pixel: for instance, the column "3x4" stores the pixel value at the 3rd row and the 4th column. Since the size of each image is 28x28, there are 28 rows and 28 columns.
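To make the naming scheme concrete, here is a minimal sketch of how a flat row of 784 values maps back onto a 28x28 grid. The array `flat_pixels` and the helper `pixel_value` are hypothetical names made up for this illustration, and the values are placeholders rather than real MNIST data.

```python
import numpy as np

# Hypothetical stand-in for one row of the table: 784 values stored in the
# column order "1x1", "1x2", ..., "28x28" (placeholder numbers, not real MNIST).
flat_pixels = np.arange(784)

# The rows of the image are stored one after another, so a reshape recovers the grid.
image = flat_pixels.reshape(28, 28)

def pixel_value(image, column_name):
    """Look up a pixel by its column name, e.g. "3x4" means row 3, column 4."""
    row, col = column_name.split("x")
    # Column names count from 1, while NumPy indices count from 0.
    return image[int(row) - 1, int(col) - 1]

print(pixel_value(image, "3x4"))  # 59, i.e. position 2 * 28 + 3 in the flat row
```

Because the placeholder values are just 0 to 783, the printed number is also the position of column "3x4" within the flat list of pixel columns.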

We can use the pandas `apply` method to graph these values for one image.

First, convert each feature name to an x position, a y position, and a value.

In [None]:
def position(row):
    y, x = row["index"].split("x")
    return {"x":int(x),
            "y":int(y),
            "val":row["val"]}

Draw a heat map showing the image.

In [None]:
def draw_image(i, shuffle=False):
    t = df_train[i:i+1].T.reset_index().rename(columns={i: "val"})
    out = t.loc[t["index"] != "class"].apply(position, axis=1, result_type="expand")

    label = df_train.loc[i]["class"]
    title = "Image of a " + str(label)
    if shuffle:
        out["val"] = sklearn.utils.shuffle(out["val"], random_state=1234).reset_index()["val"]
        title = "Shuffled Image of a " + str(label)
        
    return (alt.Chart(out)
            .mark_rect()
            .properties(title=title)
            .encode(
                x="x:O",
                y="y:O",
                fill="val:Q",
                color="val:Q",
                tooltip=("x", "y", "val")
            ))

Here are some example images.

In [None]:
im = draw_image(0)
im

In [None]:
im = draw_image(4)
im

In [None]:
im = draw_image(15)
im

How can we solve this task? The challenge is that there are many different
aspects that can make a digit look unique.

👩🎓**Student question: What are some features that you use to tell apart digits?**

We can use the NN classifier we learned
last week, with a few modifications to change from binary
classification to multi-class (10-way in this case) classification.

First, the final layer needs to output 10 different values.
Also we need to switch `sigmoid` to a  `softmax`.
Lastly, we need to change the loss function to
`sparse_categorical_crossentropy`.

The practical change is quite small; it should look very similar to
what we have seen already.

In [None]:
def create_model():
    # Makes it the same for everyone in class
    tf.random.set_seed(2)

    # Create model
    model = Sequential()
    model.add(Dense(64, activation="relu"))
    model.add(Dense(10, activation="softmax")) # size 10 w/ softmax

    # Compile model
    model.compile(loss="sparse_categorical_crossentropy", # new loss fn
                  optimizer="adam",
                  metrics=["accuracy"])
    return model

While these terms are a bit technical, the main thing to know
is that they team up to change the `loss` function from last
week to score 10 different values instead of 2.
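For readers who want to see what these two pieces compute, here is a rough NumPy sketch for a single example. It is a simplified illustration, not the Keras implementation, and the `scores` values are made up.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into positive values that sum to 1."""
    exps = np.exp(scores - np.max(scores))  # subtracting the max avoids overflow
    return exps / exps.sum()

def sparse_categorical_crossentropy(probs, label):
    """Loss for one example: negative log of the probability given to the true class."""
    return -np.log(probs[label])

# Hypothetical raw scores for the 10 digit classes (made up for illustration).
scores = np.array([0.1, 0.2, 3.0, 0.1, 0.0, 0.3, 0.1, 0.2, 0.1, 0.0])
probs = softmax(scores)

loss_if_true_class_is_2 = sparse_categorical_crossentropy(probs, 2)
loss_if_true_class_is_7 = sparse_categorical_crossentropy(probs, 7)
print(probs.sum())
print(loss_if_true_class_is_2, loss_if_true_class_is_7)
```

The loss is small when the true class received the highest score (class 2 here) and larger when it received a low score (class 7), which is what pushes training toward the correct digit.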

In [None]:
# Create model
model = KerasClassifier(build_fn=create_model,
                        epochs=2,
                        batch_size=20,
                        verbose=False)
# Fit model
out = model.fit(x=df_train[features].astype(float),
          y=df_train["class"])

In [None]:
history = pd.DataFrame(out.history)
history

| epoch | loss | accuracy |
|-------|----------|----------|
| 0 | 2.225875 | 0.79065 |
| 1 | 0.469122 | 0.871183 |


Now that the model is fit, we can print a summary.

In [None]:
print (model.model.summary())

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_7 (Dense)              (20, 64)                  50240     
_________________________________________________________________
dense_8 (Dense)              (20, 10)                  650       
Total params: 50,890
Trainable params: 50,890
Non-trainable params: 0
_________________________________________________________________
None


And predict on the test set.

In [None]:
df_test["predict"] = model.predict(df_test[features])
correct = (df_test["predict"] == df_test["class"])
accuracy = correct.sum() / correct.size
print ("accuracy: ", accuracy)

accuracy:  0.8979


This simple MLP classifier was able to get about 90% accuracy!

👩🎓**Student question: What is the size of the input to the model?**

In [None]:
#📝📝📝📝 the size of the feature list, which is 784

It is hard to visualize the behavior of the classifier under such a
high dimensionality. Instead we can look at some examples that the
classifier gets wrong.

In [None]:
wrong = (df_test["predict"] != df_test["class"])
examples = df_test.loc[wrong]
num = 0
charts = alt.vconcat()
for idx, example in examples.iterrows():
    label = example["class"]
    predicted_label = example["predict"]
    charts &= draw_image(idx)
    num += 1
    if num > 10:
        break

In [None]:
charts

### What does a NN see?

While NN classifiers reach a decent accuracy on this task, they do not
take into account where each pixel is located in the image. To see this, let's
shuffle each image in MNIST using the **same** shuffling order.

In [None]:
im = draw_image(0, shuffle=True)
im

In [None]:
im = draw_image(4, shuffle=True)
im

In [None]:
im = draw_image(15, shuffle=True)
im

Can you recognize what those images are? Let's train an MLP
classifier on the shuffled images and report the test
accuracy. 

In [None]:
# Create model
model = KerasClassifier(build_fn=create_model,
                        epochs=2,
                        batch_size=20,
                        verbose=0)

In [None]:
shuffled_features = sklearn.utils.shuffle(features, random_state=1234)

In [None]:
# Fit model
model.fit(x=df_train[shuffled_features].astype(float),
          y=df_train["class"])

<keras.callbacks.History at 0x7f98fc06c050>

In [None]:
# Predict on test set
df_test["predict"] = model.predict(df_test[shuffled_features])
correct = (df_test["predict"] == df_test["class"])
accuracy = correct.sum() / correct.size
output = "accuracy: ", accuracy
output

('accuracy: ', 0.8878)

If you implemented this correctly, you should see that the test accuracy
on the shuffled images is similar to that on the original
images. Think for a moment about why shuffling doesn't change the accuracy
much.

NNs do not inherently take into account locations in the input as humans
do. For instance, the model is not aware that the feature at position
(4, 4) is closer to the feature at position (5, 4) compared to
position (14, 4).

So we can see that an NN classifier does not take into account
spatial information: it simply treats each pixel as a separate input feature, and
there is no sense of "closeness" between pixels that are spatially
close to each other.
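One way to convince yourself of this is the following sketch. It uses random numbers as a stand-in for an image and for the weights of a `Dense(64, activation="relu")` layer (all values are hypothetical). If we shuffle the input features and shuffle the rows of the weight matrix in the same way, the layer produces the same output, so no ordering of the pixels is special to the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: one flattened image and the weights of a dense layer.
x = rng.random(784)               # 784 pixel values
W = rng.normal(size=(784, 64))    # one row of weights per input pixel
b = rng.normal(size=64)           # one bias per output unit

# A fixed shuffling order, applied to the pixels and to the matching weight rows.
perm = rng.permutation(784)

original = np.maximum(0, x @ W + b)              # dense layer followed by ReLU
shuffled = np.maximum(0, x[perm] @ W[perm] + b)  # same layer on shuffled inputs

print(np.allclose(original, shuffled))  # True
```

Training on shuffled images can therefore end up with an equivalent set of weights, just with the rows in a different order, which is consistent with the similar accuracies reported above.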

### Convolutions

Instead of standard NNs, we are going to use Convolutional
Neural Networks (CNNs). CNNs are commonly used to learn
features from images and time series. They take into account
spatial information, allowing for locality.

Let's first define a CNN layer for images. For now let's assume that
the input of the CNN layer is an image, and the output of the
CNN layer is also an image, usually of a smaller size compared
to the input (without input padding). The `parameters` of a CNN layer
consist of a little NN that is trying to draw a separator for a region
of the image. This is called the `filter`.

To get the output of the CNN layer, we overlay the filter on
top of the input such that it covers part of the input without
crossing the image boundary. We start from the upper-left corner.

![image](https://srush.github.io/BT-AI/notebooks/imgs/1.png)

At each overlay position, we apply the filter with the corresponding
portion of the input. The filter is converting the region it looks
at into a new feature, the same way that we saw last class.

In this illustration, the input shown in blue is of size 4x4, and
the filter shown in pink is of size 2x2. The output size 3x3 is
determined by the input size and the filter size.

Now we shift the filter to the right. 

![image](https://srush.github.io/BT-AI/notebooks/imgs/2.png)

We shift the filter to the right again. 

![image](https://srush.github.io/BT-AI/notebooks/imgs/3.png)

We can't shift the filter to the right any more since doing
so would cross the boundary. Therefore, we move down and start again from the first
column of the second row.

![image](https://srush.github.io/BT-AI/notebooks/imgs/4.png)

Moving right again.

![image](https://srush.github.io/BT-AI/notebooks/imgs/5.png)
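The sliding procedure illustrated above can be sketched in a few lines of NumPy. This is a simplified version written for this explanation: the function name `conv2d_valid` is made up, there is no bias or activation, and the input and filter values are borrowed from the Keras check later in this lab.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the filter over every position where it fits inside the image."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1   # number of vertical positions
    out_w = image.shape[1] - kw + 1   # number of horizontal positions
    output = np.zeros((out_h, out_w))
    for i in range(out_h):            # move down one row at a time
        for j in range(out_w):        # move right one column at a time
            patch = image[i:i + kh, j:j + kw]
            # Multiply the patch by the filter elementwise and add everything up.
            output[i, j] = np.sum(patch * kernel)
    return output

image = [[1, 0, 1, 0],
         [0, 1, 1, 1],
         [0, 1, 0, 0],
         [1, 0, 0, 0]]
kernel = [[1, 0],
          [1, 1]]

result = conv2d_valid(image, kernel)
print(result)
# [[2. 2. 3.]
#  [1. 2. 1.]
#  [1. 1. 0.]]
```

Note that each output entry depends only on its own patch of the input, so the loop order is a convenience rather than a requirement.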

### Pattern Matching

Last week we saw how neural networks were able to learn to match
patterns in the training data. For instance they were able to spot
certain features in order to distinguish between red and blue points.

CNNs can be used to do the same thing. However, instead of looking at
all the features at once, they look at small groups of features.

CNNs can match patterns with filter weights. For instance, if
we set the filter to be $\begin{bmatrix}-0.5 & 0.5 \\ -0.5 & 0.5\end{bmatrix}$,
it can detect vertical edges.

In the example below, the first column of the output takes the value 1,
corresponding to a line in the original image.

![image](https://srush.github.io/BT-AI/notebooks/imgs/edge1.png)

What if we shift the edge in the original input to the right by 1
pixel? We can see that the 1's in the output also shift to the right
by 1.

![image](https://srush.github.io/BT-AI/notebooks/imgs/edge2.png)

Similarly, if we shift the edge to the right by 2 pixels, the 1's in
the output shift to the right by 2.

![image](https://srush.github.io/BT-AI/notebooks/imgs/edge3.png)
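The same behavior can be reproduced with a small NumPy sketch. The test images here are hypothetical (0 to the left of the edge, 1 from the edge onward) and may not match the pictures above exactly; `apply_filter` and `image_with_edge` are helper names made up for this illustration.

```python
import numpy as np

def apply_filter(image, kernel):
    """Slide the filter over every position where it fits inside the image."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    return np.array([[np.sum(image[i:i + kh, j:j + kw] * kernel)
                      for j in range(out_w)]
                     for i in range(out_h)])

# The vertical-edge filter from the text.
vertical_edge = np.array([[-0.5, 0.5],
                          [-0.5, 0.5]])

def image_with_edge(edge_col, size=4):
    """Hypothetical test image: 0 left of edge_col, 1 from edge_col onward."""
    image = np.zeros((size, size))
    image[:, edge_col:] = 1.0
    return image

# Shift the edge right by 0, 1, and 2 pixels and apply the same filter each time.
outputs = [apply_filter(image_with_edge(1 + shift), vertical_edge)
           for shift in range(3)]
for shift, output in enumerate(outputs):
    print("edge shifted right by", shift)
    print(output)
```

In each output the column of 1's sits at the position of the edge, and everything else is 0, because the filter gives 0 wherever the two columns it covers are equal.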

# Group Exercise A

## Question 0

Icebreakers

Who are other members of your group today?

📝📝📝📝 Connie and Minjung

* What's their favorite flower or plant?

📝📝📝📝 Connie likes lavender, Minjung likes daisies

* Do they prefer to work from home or in the office?

📝📝📝📝 Both prefer working from the office

## Question 1

Brainstorm ways that you might write a program to tell apart the digits 5 and 6.

In [None]:
#📝📝📝📝 FILLME
# Create a filter to identify a horizontal bar; 
# this identifies the five from a six which has only curves.

Do these methods need to look at the whole image? Which parts do they need to look at?

In [None]:
#📝📝📝📝 FILLME
# Our method only needs to look at the top of each number, not the whole thing.

Would these approaches still work if the digits were bigger or smaller? What if they moved around on the page?

In [None]:
#📝📝📝📝 FILLME
# Size would not have much effect, but it might be hard to find the 
# top of an image if it's been moved on the page.

## Question 2

Now it is your turn to finish the result of the CNN computations. Write
down the full result of applying this CNN layer to the given
input.

In [None]:
#📝📝📝📝 FILLME
# Applying the filter at every position of the input gives:
# 2 2 3 
# 1 2 1
# 1 1 0

If we increase the height of the input by 1, how would the size of the output change?
If we increase the width of the filter by 1, how would the size of the output change?

In [None]:
#📝📝📝📝 FILLME
# If we increase the height of the input by 1,
# the output would be one row taller, so 4 x 3.
# If we increase the width of the filter by 1,
# the output would be one column narrower, so 3 x 2.

In the above illustrations, it seems that a CNN layer processes the input image in a sequential order. Are computations at different positions dependent upon each other? Can we use a different order such as starting from the bottom right corner? 

In [None]:
#📝📝📝📝 FILLME
# Yes we could start in a different order.
# Each computation is independent of the order of computations,
# since it only takes into account the 4 squares of the input it is 
# currently filtering (as long as the results of the filter application 
# still correspond to the correct output square).

## Question 3

Design a filter to detect edges in the horizontal direction.

In [None]:
#📝📝📝📝 FILLME
# A filter with either
#         0.5 0.5
#        -0.5 -0.5
# or
#         -0.5 -0.5
#          0.5 0.5
# would detect edges in the horizontal direction.

Design a filter to detect edges along a diagonal direction.

In [None]:
#📝📝📝📝 FILLME
# A filter with 
#         0.5 -0.5
#        -0.5 0.5
# or 
#        -0.5 0.5
#        0.5 -0.5
# would detect edges along diagonal direction.

Is it possible to design a filter to detect other edges such as curves?

In [None]:
#📝📝📝📝 FILLME
# Perhaps with a filter bigger than 2 x 2, it would be possible to
# identify curves, since there are more pixels per row available to 
# create smoother arcs. The bigger the filter, the smoother the 
# curve would be, so with a 5 x 5 filter, there would be a sequence 
# of pixels through the grid to design a curve with.

## Unit B

### Multiple Filters

In practice, we want to go beyond only being able to detect a single
type of edge. Therefore, instead of only using a single
filter, we use multiple "output channels", each with a
different filter, such that each output channel can detect a
different kind of pattern.

The output shape is thereby augmented by
an output channel dimension, forming a 3-D shape of size (output
height, output width, output channels).

Similarly, the input can also have multiple channels: for example,
the input may be a color image split into red, green, and blue channels.
Or it can be the output of a
previous CNN layer with multiple output channels.

Instead of using a single filter, we use one filter per input
channel, apply the convolution operation to each input channel
independently, and then add the per-channel results together to form
the output channel. This process is illustrated below, where for
simplicity we only use a single output channel.

![image](https://srush.github.io/BT-AI/notebooks/imgs/multi_input_channels.png)

One of the tricky parts of CNNs is keeping track of all of the
elements grouped together.

1. input: (num images, input height, input width, input channels).
2. output: (num images, output height, output width, output channels).
3. filter: (filter height, filter width, input channels, output channels).
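To help with the bookkeeping, here is a rough NumPy sketch of a multi-channel convolution that follows the three shapes listed above. It is an illustration only (stride 1, no padding, no bias), and `conv2d_multi` is a made-up name, not a Keras function.

```python
import numpy as np

def conv2d_multi(images, filters):
    """images: (num images, height, width, input channels)
    filters: (filter height, filter width, input channels, output channels)"""
    n, h, w, c_in = images.shape
    fh, fw, f_in, c_out = filters.shape
    assert c_in == f_in, "filter and input must agree on input channels"
    out = np.zeros((n, h - fh + 1, w - fw + 1, c_out))
    for i in range(h - fh + 1):
        for j in range(w - fw + 1):
            patch = images[:, i:i + fh, j:j + fw, :]      # (n, fh, fw, c_in)
            # Multiply and sum over filter height, filter width, and input
            # channels, leaving one value per image and per output channel.
            out[:, i, j, :] = np.tensordot(patch, filters,
                                           axes=([1, 2, 3], [0, 1, 2]))
    return out

# Hypothetical sizes: 2 images of 4x4 pixels with 3 channels, 5 output channels.
images = np.ones((2, 4, 4, 3))
filters = np.ones((2, 2, 3, 5))
out = conv2d_multi(images, filters)
print(out.shape)       # (2, 3, 3, 5)
print(out[0, 0, 0, 0])  # 12.0, since 2 * 2 * 3 ones are added together
```

With all-ones inputs, every output value counts how many numbers were summed, which makes it easy to check that the input channels are being added together.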

### CNN Visualization 

In the above discussions, we can see that a CNN layer can detect
edges in the input image. By stacking multiple layers of CNNs
together, it can learn to build up more and more sophisticated
pattern matchers, detecting not only edges, but also mid-level and
high-level patterns.

To get a sense of how a CNN works in practice
and the types of patterns it can match, let us look at the CNN
Explainer. For now let's just play with the demo on the top of the
website.

[CNN Explainer](https://poloclub.github.io/cnn-explainer/)

There are several things to look at in the tool.

1. What's the height and width of the input image?
2. What's the number of the output channels of conv_1_1?
3. What kind of patterns does the model use for each image?

### CNN in Keras

Let's take a look at how to create a CNN layer in Keras.
```
Conv2D(
   filters,
   kernel_size,
   strides=(1, 1),
   padding="valid",
)
```

The argument `filters` specifies the number of output channels. The
argument `kernel_size` is a tuple (height, width) specifying the
size of the filter.

Now let's discuss what `strides` does.  In the above convolution
layer examples, when we shift the filter to the right, we move by 1
pixel; similarly, when we move the filter down, we move down 1
pixel.

We can generalize the step size of movements using strides. For
example, we can use a stride of 2 along the width dimension, so we
move by two pixels each time we move right (note that we still move
down by 1 pixel since the stride along the height dimension is 1),
resulting in an output of size 3 x 2.

![image](https://srush.github.io/BT-AI/notebooks/imgs/stride1.png)

![image](https://srush.github.io/BT-AI/notebooks/imgs/stride2.png)

![image](https://srush.github.io/BT-AI/notebooks/imgs/stride3.png)

![image](https://srush.github.io/BT-AI/notebooks/imgs/stride4.png)

![image](https://srush.github.io/BT-AI/notebooks/imgs/stride5.png)

![image](https://srush.github.io/BT-AI/notebooks/imgs/stride6.png)
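The output size in the stride example follows a simple rule, sketched below for the case without padding. The helper name `conv_output_size` is made up for this illustration.

```python
def conv_output_size(input_size, filter_size, stride=1):
    """Number of positions a filter can take along one dimension (no padding)."""
    return (input_size - filter_size) // stride + 1

# The example above: 4x4 input, 2x2 filter, stride 1 down and stride 2 across.
out_height = conv_output_size(4, 2, stride=1)
out_width = conv_output_size(4, 2, stride=2)
print(out_height, out_width)  # 3 2
```

The rule is applied separately to the height and the width, each with its own stride.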

👩🎓**Student question: What would the output be if we use a stride of 2 both along the width dimension and the height dimension?**

In [None]:
#📝📝📝📝 FILLME
# it would be 2 x 2:
#    2 3
#    1 0

### Convolution Layers in Keras

Now we are ready to implement a CNN layer in Keras! First, we need to import `Conv2D` from `keras.layers`.

In [None]:
from keras.layers import Conv2D

Now let's use Keras to verify the convolution results we calculated before.

![image](https://srush.github.io/BT-AI/notebooks/imgs/9.png)

Note that we need to do lots of reshapes to add the sample
dimension, or the input/output channel dimension. To recap, the
relevant shapes are:

1. input: (num samples, input height, input width, input channels).
2. output: (num samples, output height, output width, output channels).
3. filter: (filter height, filter width, input channels, output channels).

In [None]:
input = [[1, 0, 1, 0],
         [0, 1, 1, 1],
         [0, 1, 0, 0],
         [1, 0, 0, 0]]
filter = [[1, 0.],
          [1, 1.]] 

In [None]:
def cnn(input, filter):
    # Reshape the filter and input into the shapes Keras expects
    filter = tf.convert_to_tensor(filter, dtype=tf.float32)
    filter = tf.reshape(filter, (2, 2, 1, 1))  # (height, width, in channels, out channels)
    input = tf.convert_to_tensor(input, dtype=tf.float32)
    input = tf.reshape(input, (1, 4, 4, 1))  # (samples, height, width, channels)

    # Call Keras
    cnn_layer = Conv2D(filters=1, kernel_size=(2, 2))
    cnn_layer(input)  # call once so the layer creates its weights
    cnn_layer.set_weights((filter, tf.convert_to_tensor([0.])))  # our filter, zero bias
    output = cnn_layer(input)

    # Output
    return tf.reshape(output, (3, 3))

In [None]:
print(cnn(input, filter))

tf.Tensor(
[[2. 2. 3.]
 [1. 2. 1.]
 [1. 1. 0.]], shape=(3, 3), dtype=float32)


Nice, our calculations were correct!

### Pooling

In a convolution layer, we took the convolution between the filter
and a portion of the input to calculate the output. However sometimes
these areas are very small. What if we want a feature over a larger
area?

If we simply take the max value of that portion of input instead of
using the convolution, we get a max pooling layer, as illustrated
below.

![image](https://srush.github.io/BT-AI/notebooks/imgs/max1.png)

![image](https://srush.github.io/BT-AI/notebooks/imgs/max2.png)

![image](https://srush.github.io/BT-AI/notebooks/imgs/max3.png)

![image](https://srush.github.io/BT-AI/notebooks/imgs/max4.png)

![image](https://srush.github.io/BT-AI/notebooks/imgs/max5.png)

![image](https://srush.github.io/BT-AI/notebooks/imgs/max6.png)

![image](https://srush.github.io/BT-AI/notebooks/imgs/max7.png)

![image](https://srush.github.io/BT-AI/notebooks/imgs/max8.png)

![image](https://srush.github.io/BT-AI/notebooks/imgs/max9.png)
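Max pooling can be sketched in NumPy in much the same way as the sliding filter, with the multiply-and-sum step replaced by a maximum. This is a simplified illustration rather than the Keras implementation; `max_pool2d` is a made-up name, and the input is assumed to be the same 4x4 example used in the Keras check later in this section.

```python
import numpy as np

def max_pool2d(image, pool_size=(2, 2), strides=(1, 1)):
    """Take the max over each window instead of a weighted sum."""
    image = np.asarray(image, dtype=float)
    ph, pw = pool_size
    sh, sw = strides
    out_h = (image.shape[0] - ph) // sh + 1
    out_w = (image.shape[1] - pw) // sw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i * sh:i * sh + ph, j * sw:j * sw + pw]
            output[i, j] = window.max()
    return output

image = [[1, 0, 1, 0],
         [0, 1, 1, 1],
         [0, 1, 0, 0],
         [1, 0, 0, 0]]

pooled = max_pool2d(image, pool_size=(2, 2), strides=(1, 1))
print(pooled)
# [[1. 1. 1.]
#  [1. 1. 1.]
#  [1. 1. 0.]]

# With a stride of 2 in both directions the windows no longer overlap.
pooled_stride2 = max_pool2d(image, pool_size=(2, 2), strides=(2, 2))
print(pooled_stride2)
# [[1. 1.]
#  [1. 0.]]
```

Unlike a convolution layer, a pooling layer has no weights to learn; it only summarizes each window.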

Let's take a look at how to create a max pooling layer in Keras.
```
MaxPool2D(
    pool_size=(2, 2),
    strides=None,
    **kwargs
)
```
Notice how similar it is to convolution layers? Can you infer what `strides` does here?

Now let's use Keras to verify the max pooling results above.

In [None]:
from keras.layers import MaxPool2D

In [None]:
input = [
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 0, 0, 0]
]

In [None]:
input_shape = (1, 4, 4, 1)
input = tf.convert_to_tensor(input, dtype=tf.float32)
input = tf.reshape(input, input_shape)
pooling_layer = MaxPool2D(pool_size=(2, 2),
                          strides=(1, 1))
output = pooling_layer(input)
print (tf.reshape(output, (3, 3)))

tf.Tensor(
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 0.]], shape=(3, 3), dtype=float32)


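In the above example we used `strides=(1,1)`. If `strides` is left as `None`, Keras uses the pool size as the stride, so the pooling windows do not overlap.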

### Putting Everything Together

![image](https://upload.wikimedia.org/wikipedia/commons/6/63/Typical_cnn.png)

Now we can put everything together to build a full CNN classifier
for MNIST classification. Below is an example model, where we
need to use a `Reshape` layer to reshape the input into a
single-channel 2-D image, as well as a `Flatten` layer to flatten
the feature map back to a vector.

In [None]:
from keras.layers import Flatten, Reshape

In [None]:
def create_cnn_model():
    # create model
    input_shape = (28, 28, 1)
    model = Sequential()
    model.add(Reshape(input_shape))
    model.add(Conv2D(32, kernel_size=(3, 3), activation="relu"))
    model.add(MaxPool2D(pool_size=(2, 2)))
    model.add(Conv2D(64, kernel_size=(3, 3), activation="relu"))
    model.add(MaxPool2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(10, activation="softmax")) # output a vector of size 10
    # Compile model
    model.compile(loss="sparse_categorical_crossentropy",
                   optimizer="adam",
                   metrics=["accuracy"])
    return model
# create model
model = KerasClassifier(build_fn=create_cnn_model,
                         epochs=2,
                         batch_size=20,
                         verbose=0)
# fit model
out=model.fit(x=df_train[features].astype(float),
          y=df_train["class"])
df_test["predict"] = model.predict(df_test[features])
correct = (df_test["predict"] == df_test["class"])
accuracy = correct.sum() / correct.size
print ("accuracy: ", accuracy)

accuracy:  0.9831


We are able to get much better accuracy than using basic NNs!

# Group Exercise B

## Question 1

Apply the CNN model to the shuffled MNIST dataset. What accuracy do
you get? Is that what you expected?

In [None]:
#📝📝📝📝 FILLME
model.fit(x=df_train[shuffled_features].astype(float),
          y=df_train["class"])

df_test["predict"] = model.predict(df_test[shuffled_features])
correct = (df_test["predict"] == df_test["class"])
accuracy = correct.sum() / correct.size
output = "accuracy: ", accuracy
output

('accuracy: ', 0.9294)

📝📝📝📝 We got a lower accuracy than on the unshuffled images (about 93% versus about 98%), but not by as much as I expected. It is still fairly high.

## Question 2

For this question we will use the CNN Explainer website.

[CNN Explainer](https://poloclub.github.io/cnn-explainer/)

* Use the tool under the section "Understanding Hyperparameters" to figure out the output shape of each layer in the above CNN model.

In [None]:
#📝📝📝📝 FILLME
# initial input: 28 x 28
#  model.add(Conv2D(32, kernel_size=(3, 3), activation="relu")) output shape: 26 x 26
#  model.add(MaxPool2D(pool_size=(2, 2))) output shape: 13 x 13
#  model.add(Conv2D(64, kernel_size=(3, 3), activation="relu")) output shape: 11 x 11
#  model.add(MaxPool2D(pool_size=(2, 2))) output shape: 5 x 5

* Use `print (model.model.summary())` to print the output shape of each layer. Did you get the same results as above?

In [None]:
#📝📝📝📝 FILLME
print (model.model.summary())

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
reshape_1 (Reshape)          (20, 28, 28, 1)           0         
_________________________________________________________________
conv2d_3 (Conv2D)            (20, 26, 26, 32)          320       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (20, 13, 13, 32)          0         
_________________________________________________________________
conv2d_4 (Conv2D)            (20, 11, 11, 64)          18496     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (20, 5, 5, 64)            0         
_________________________________________________________________
flatten_1 (Flatten)          (20, 1600)                0         
_________________________________________________________________
dense_12 (Dense)             (20, 10)                 

We did get the same results for the shape of the output. For each Conv2D layer, the output was smaller than the input by 2 in both height and width, while each MaxPooling2D layer essentially halved the height and width of its input.
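These shapes can also be worked out by hand. The sketch below traces the height and width through the model defined earlier, assuming no padding, stride 1 for the convolutions, and non-overlapping 2x2 pooling. The helper names are made up for this illustration.

```python
def conv_shape(size, kernel=3):
    """A 3x3 filter with stride 1 and no padding removes 2 from each dimension."""
    return size - kernel + 1

def pool_shape(size, pool=2):
    """Non-overlapping 2x2 pooling halves each dimension, rounding down."""
    return (size - pool) // pool + 1

def conv_params(kernel, channels_in, channels_out):
    """One weight per filter entry per channel pair, plus one bias per output channel."""
    return kernel * kernel * channels_in * channels_out + channels_out

size = 28
trace = [size]
for step in (conv_shape, pool_shape, conv_shape, pool_shape):
    size = step(size)
    trace.append(size)

flattened = size * size * 64  # the second Conv2D has 64 output channels
print(trace)      # [28, 26, 13, 11, 5]
print(flattened)  # 1600
print(conv_params(3, 1, 32), conv_params(3, 32, 64))  # 320 18496
```

The traced sizes and the two parameter counts agree with the `Conv2D`, pooling, and `Flatten` rows of the summary printed above.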

## Question 3

Let's apply our model to a different dataset, [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist), where the goal is to classify an image into one of the below 10 classes:
```
Label 	Description
0 	T-shirt/top
1 	Trouser
2 	Pullover
3 	Dress
4 	Coat
5 	Sandal
6 	Shirt
7 	Sneaker
8 	Bag
9 	Ankle boot
```

Some examples from the dataset are shown below, where each class takes three rows.

![image](https://github.com/zalandoresearch/fashion-mnist/raw/master/doc/img/fashion-mnist-sprite.png)

We have processed the dataset into the same format as MNIST:

In [None]:
df_train = pd.read_csv('https://srush.github.io/BT-AI/notebooks/fashion_mnist_train.csv.gz', compression='gzip')
df_test = pd.read_csv('https://srush.github.io/BT-AI/notebooks/fashion_mnist_test.csv.gz', compression='gzip')
df_train

Unnamed: 0,class,1x1,1x2,1x3,1x4,1x5,1x6,1x7,1x8,1x9,1x10,1x11,1x12,1x13,1x14,1x15,1x16,1x17,1x18,1x19,1x20,1x21,1x22,1x23,1x24,1x25,1x26,1x27,1x28,2x1,2x2,2x3,2x4,2x5,2x6,2x7,2x8,2x9,2x10,2x11,...,27x17,27x18,27x19,27x20,27x21,27x22,27x23,27x24,27x25,27x26,27x27,27x28,28x1,28x2,28x3,28x4,28x5,28x6,28x7,28x8,28x9,28x10,28x11,28x12,28x13,28x14,28x15,28x16,28x17,28x18,28x19,28x20,28x21,28x22,28x23,28x24,28x25,28x26,28x27,28x28
0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,6,0,0,0,0,0,0,0,5,0,0,0,105,92,101,107,100,132,0,0,2,4,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,150,...,211,220,214,74,0,255,222,128,0,0,0,0,0,0,0,0,0,44,12,0,0,40,134,162,191,214,163,146,165,79,0,0,0,30,43,0,0,0,0,0
3,0,0,0,0,1,2,0,0,0,0,0,114,183,112,55,23,72,102,165,160,28,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,24,188,163,93,...,171,249,207,197,202,45,0,3,0,0,0,0,0,0,0,0,0,0,1,0,0,0,22,21,25,69,52,45,74,39,3,0,0,0,0,1,0,0,0,0
4,3,0,0,0,0,0,0,0,0,0,0,0,0,46,0,21,68,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,25,187,189,...,230,237,229,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,68,116,112,136,147,144,121,102,63,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
59995,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
59996,1,0,0,0,0,0,0,0,0,0,0,83,155,136,116,148,110,118,67,32,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,118,...,199,165,108,108,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,32,159,6,91,0,0,85,159,0,73,0,0,0,0,0,0,0,0,0
59997,8,0,0,0,0,0,0,0,0,0,0,1,0,0,87,114,77,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,...,228,229,231,231,218,230,255,0,0,0,0,0,0,0,0,0,0,0,116,140,147,166,176,174,173,173,174,173,177,164,160,162,163,135,94,0,0,0,0,0
59998,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,3,...,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Let's visualize some examples first

In [None]:
draw_image(1)

In [None]:
draw_image(4)

In [None]:
draw_image(10)

Apply the CNN model to this dataset and print out the accuracy.

In [None]:
#📝📝📝📝 FILLME
features = []
for i in range(1, 29):
    for j in range(1, 29):
        features.append(str(i) + "x" + str(j))
                      
# Fit model
out = model.fit(x=df_train[features].astype(float),
          y=df_train["class"])

df_test["predict"] = model.predict(df_test[features])
correct = (df_test["predict"] == df_test["class"])
accuracy = correct.sum() / correct.size
print ("accuracy: ", accuracy)



accuracy:  0.8784
