# Bird or plane?

According to the literature, square loss in conjunction with a logistic (sigmoid) output layer leads to small gradients and thus to slow learning, and cross-entropy loss remedies that. I could not, however, verify this myself. So now it's your turn!
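To see where the claim comes from, consider a single sigmoid output $a=\sigma(z)$ with target $y$. For square loss $(a-y)^2$ the derivative with respect to $z$ is $2(a-y)\,a(1-a)$, while for cross-entropy loss it is simply $a-y$. The factor $a(1-a)$ vanishes when the sigmoid saturates. The following is a minimal numerical sketch of this; the function names are made up for illustration, and it looks only at the output neuron, not at a whole network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_square(z, y):
    """Derivative of (sigmoid(z) - y)**2 with respect to z."""
    a = sigmoid(z)
    return 2.0 * (a - y) * a * (1.0 - a)

def grad_cross_entropy(z, y):
    """Derivative of -y*log(a) - (1-y)*log(1-a) with respect to z, where a = sigmoid(z)."""
    return sigmoid(z) - y

# pre-activations from "confidently wrong" (z=-8) to "confidently right" (z=8) for target y=1
z = np.array([-8.0, -4.0, 0.0, 4.0, 8.0])
y = 1.0
g_sq = grad_square(z, y)
g_ce = grad_cross_entropy(z, y)
for zi, s, c in zip(z, g_sq, g_ce):
    print(f"z={zi:+.0f}  square: {s:+.2e}  cross-entropy: {c:+.2e}")
```

At $z=-8$ the prediction is badly wrong, yet the square loss gradient is tiny, whereas the cross-entropy gradient is close to $-1$.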

The idea is to take a more demanding dataset, the [CIFAR10](https://www.cs.toronto.edu/%7Ekriz/cifar.html) dataset.

First, we do a couple of necessary imports. 

In [None]:
import tensorflow as tf
import tensorflow.keras as keras
from sklearn.utils import shuffle
import numpy as np
import time
import matplotlib.pyplot as plt
plt.style.use("seaborn-v0_8")

We load the dataset with a convenient built-in method of TensorFlow. We merge the predefined training and test portions, putting all images in <code>X</code> and all class labels in <code>y</code>, and then scale the pixel values so that they lie in $[0,1]$.

In [None]:
data = tf.keras.datasets.cifar10
(x1, y1), (x2, y2) = data.load_data()
X=np.vstack([x1,x2])
y=np.vstack([y1,y2]).reshape(-1,)
X=X/255. # normalise so that all values in [0,1]
X.shape,y.shape

We restrict to a binary classification problem: is it a bird or an airplane? In CIFAR10, class 0 is airplane and class 2 is bird.

In [None]:
X_bin=X[(y==0) | (y==2)]
y_bin=y[(y==0) | (y==2)]/2 # divide by 2, in order to get classes 0,1

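We further simplify by turning the colour images into grayscale images, using the standard luminance weights for the red, green and blue channels. The result is inverted, so that it displays correctly with the <code>binary</code> colour map. Let's visualise some samples.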

In [None]:
def to_grayscale(img):
    return 1-(0.299*img[:,:,0]+0.587*img[:,:,1]+0.114*img[:,:,2])
X_bin=np.array([to_grayscale(img) for img in X_bin])

def show_samples(X,N=10):
    rows=[]
    for j in range(N):
        data=[X[i] for i in range(j*N,(j+1)*N)]
        rows.append(np.hstack(data))
    block=np.vstack(rows)
    
    # now we plot the array
    fig,ax=plt.subplots(figsize=(9,9))
    ax.imshow(block, cmap = "binary")
    ax.axis("off")
    plt.show()
show_samples(X_bin)
# let's see whether data is in the form we think it should be in
X_bin.shape,y_bin.shape
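As an aside, the per-image loop around <code>to_grayscale</code> can also be written as a single matrix product over the colour axis. The sketch below is one possible variant, not part of the original notebook: the name <code>to_grayscale_batch</code> is made up, and random toy images stand in for the CIFAR10 data so that the snippet runs on its own.

```python
import numpy as np

# luminance weights for the red, green and blue channels (same as in to_grayscale)
WEIGHTS = np.array([0.299, 0.587, 0.114])

def to_grayscale_batch(images):
    """Convert an array of shape (n, height, width, 3) to inverted grayscale."""
    # the matrix product contracts the last (colour) axis
    return 1.0 - images @ WEIGHTS

# toy stand-in for the CIFAR10 images: random values in [0,1)
rng = np.random.default_rng(0)
toy_images = rng.random((4, 32, 32, 3))
toy_gray = to_grayscale_batch(toy_images)
print(toy_gray.shape)  # (4, 32, 32)
```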

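Next, let's write a method that draws a random training and test set from the data. The two classes together comprise 12,000 images, so a training set of 7,000 leaves 5,000 for testing. Each call reshuffles the data and thus yields a different split.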

In [None]:
train_size=7000
def get_datasets():
    XX,yy=shuffle(X_bin,y_bin)
    x_bin_train,x_bin_test=XX[:train_size],XX[train_size:]
    y_bin_train,y_bin_test=yy[:train_size],yy[train_size:]
    return x_bin_train,y_bin_train,x_bin_test,y_bin_test

We also write a method to set up a very simple neural network. (Not the most appropriate for this task!)

In [None]:
def get_model():
    model = tf.keras.models.Sequential([
      tf.keras.layers.Flatten(input_shape=(32, 32)),
      tf.keras.layers.Dense(80, activation='relu'),
      tf.keras.layers.Dense(40, activation='relu'),
      tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    return model
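The two networks to compare differ only in the loss passed to <code>compile</code>. Here is a minimal sketch of how that might look. The helper name <code>build_and_compile</code>, the choice of plain SGD as optimiser and the learning rate of 0.01 are assumptions made for illustration and not prescribed by this notebook; the architecture is repeated from <code>get_model</code> only so that the snippet is self-contained.

```python
import tensorflow as tf

def build_and_compile(loss_name, learning_rate):
    # same architecture as in get_model
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(80, activation="relu"),
        tf.keras.layers.Dense(40, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        # assumption: plain SGD; any other Keras optimiser could be used instead
        optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate),
        loss=loss_name,
        # explicit binary accuracy: predictions are thresholded at 0.5
        metrics=[tf.keras.metrics.BinaryAccuracy(name="accuracy")],
    )
    return model

# the learning rate 0.01 is a placeholder, to be replaced by the fine-tuned value
square_model = build_and_compile("mean_squared_error", learning_rate=0.01)
ce_model = build_and_compile("binary_crossentropy", learning_rate=0.01)
```

Calling <code>fit</code> on either model with <code>validation_data</code> set returns a history object whose <code>history</code> dictionary contains the per-epoch values under the keys <code>accuracy</code> and <code>val_accuracy</code>.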

Now it's up to you!

### Task: Compare square loss and cross-entropy loss
* Train a neural network as defined in <code>get_model</code> with square loss, and fine-tune the learning rate. That is, try out different learning rates (perhaps in a halfway systematic manner) and pick the best one. (How do you measure what's best?)
* Do the same for a neural network with cross-entropy loss.
* With the fine-tuned learning rates, train a square loss network and a cross-entropy network for a number of epochs that seems appropriate.
* Repeat the previous step five (or ten, or twenty, depending on how much time you're willing to invest) times.
* Take the median of training and test accuracy and plot the result. (Don't bother with the gradients.)
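Two of these steps lend themselves to a short sketch: choosing candidate learning rates and taking the median over repeated runs. The code below is only one possible approach. The grid from $10^{-4}$ to $1$, the name <code>median_curve</code> and the toy accuracy values are made up for illustration; the toy values are not results of any experiment.

```python
import numpy as np

# a logarithmic grid is a common starting point when searching for a learning rate
learning_rates = np.logspace(-4, 0, num=5)  # 1e-4, 1e-3, 1e-2, 1e-1, 1

def median_curve(runs):
    """Epoch-wise median over several runs.

    runs: list of per-epoch accuracy lists, one list per repetition,
          all of the same length.
    """
    return np.median(np.asarray(runs, dtype=float), axis=0)

# made-up accuracies of three repetitions over three epochs, just to show the shapes
toy_runs = [[0.5, 0.6, 0.7],
            [0.5, 0.7, 0.9],
            [0.6, 0.8, 0.8]]
print(median_curve(toy_runs))  # [0.5 0.7 0.8]
```

The resulting array has one entry per epoch and can be passed directly to <code>plt.plot</code>, once for the training accuracy and once for the test accuracy of each loss.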

You may find it useful to borrow code from [loss_compare.ipynb](https://colab.research.google.com/github/henningbruhn/math_of_ml_course/blob/main/neural_networks/loss_compare.ipynb).