<center><img src='https://drive.google.com/uc?id=1_utx_ZGclmCwNttSe40kYA6VHzNocdET' height="60"></center>

AI TECH - Akademia Innowacyjnych Zastosowań Technologii Cyfrowych (Academy of Innovative Applications of Digital Technologies). Operational Programme Digital Poland 2014-2020
<hr>

<center><img src='https://drive.google.com/uc?id=1BXZ0u3562N_MqCLcekI-Ens77Kk4LpPm'></center>

<center>
Project co-financed by the European Union from the European Regional Development Fund
Operational Programme Digital Poland 2014-2020,
Priority Axis 3 "Digital competences of society", Measure 3.2 "Innovative solutions for digital activation"
Project title: "Akademia Innowacyjnych Zastosowań Technologii Cyfrowych (AI Tech)"
    </center>

# Softmax regression

In this exercise you will train a softmax regression model to recognize handwritten digits.
  
The general setup is as follows:
* we are given a set of pairs $(x, y)$, where $x \in \mathbb{R}^D$ is a vector of real numbers representing the features, and $y \in \{1,\dots,c\}$ is the target (in our case we have ten classes, so $c=10$),
* for a given $x$ we model the probability of $y=j$ by $$h(x)_j=p_j = \frac{e^{w_j^Tx}}{\sum_{i=1}^c e^{w_i^Tx}},$$
* to find the right $w$ we will minimize the so-called multiclass log loss $J(w)$, the negative mean of the per-example log-likelihood $L$:
$$L(y,p) = \log{p_y},$$
$$J(w) = -\frac{1}{n}\sum_{i=1}^n L(y_i,h(x_i)),$$
* with the loss function in hand we can improve our guesses iteratively:
    * $w_{ij}^{t+1} = w_{ij}^t - \text{step_size} \cdot \frac{\partial J(w)}{\partial w_{ij}}$,
* we can end the process after some predefined number of epochs (or when the changes are no longer meaningful).
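
For reference, the partial derivatives in the update rule above have a compact closed form. The following is a standard derivation sketch for the unregularized objective; the shorthand $p_{ij} = h(x_i)_j$ and the indicator $\mathbb{1}[y_i = j]$ are notation introduced here for illustration:
$$\frac{\partial \log p_{i y_i}}{\partial w_j} = \left(\mathbb{1}[y_i = j] - p_{ij}\right) x_i,$$
$$\frac{\partial J(w)}{\partial w_j} = \frac{1}{n}\sum_{i=1}^n \left(p_{ij} - \mathbb{1}[y_i = j]\right) x_i.$$
With the examples stacked into an $n \times D$ matrix $X$, the probabilities into an $n \times c$ matrix $P$, and one-hot targets into an $n \times c$ matrix $Y$, this reads $\nabla_w J = \frac{1}{n} X^T (P - Y)$, which has the same $D \times c$ shape as the weights.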

Let's start with importing the MNIST dataset.

In [None]:
!wget -O mnist.npz https://s3.amazonaws.com/img-datasets/mnist.npz
!pip install plotly==5.3.1

In [None]:
import numpy as np

def load_mnist(path='mnist.npz'):
    with np.load(path) as f:
        x_train, _y_train = f['x_train'], f['y_train']
        x_test, _y_test = f['x_test'], f['y_test']

    x_train = x_train.reshape(-1, 28 * 28) / 255.
    x_test = x_test.reshape(-1, 28 * 28) / 255.

    y_train = np.zeros((_y_train.shape[0], 10))
    y_train[np.arange(_y_train.shape[0]), _y_train] = 1

    y_test = np.zeros((_y_test.shape[0], 10))
    y_test[np.arange(_y_test.shape[0]), _y_test] = 1

    return (x_train, y_train), (x_test, y_test)

(x_train, y_train), (x_test, y_test) = load_mnist()

Let's take a look at the data. In the "x" arrays you'll find the images (encoded as pixel intensities) and in the "y" ones you'll find the labels (one-hot encoded).

In [None]:
print(x_train.shape)
print(y_train.shape)

print(x_train[:10])
print(y_train[:10])
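
To make the one-hot encoding concrete, here is a minimal sketch on toy labels (the values below are made up for illustration and are not taken from MNIST). It mirrors the indexing trick used in `load_mnist` and shows that `argmax` along the class axis recovers the integer labels.

```python
import numpy as np

# Hypothetical toy labels, not taken from MNIST
labels = np.array([3, 0, 2])
num_classes = 4

# One-hot encode: row i gets a single 1 in column labels[i]
one_hot = np.zeros((labels.shape[0], num_classes))
one_hot[np.arange(labels.shape[0]), labels] = 1

# argmax along the class axis recovers the integer labels
recovered = one_hot.argmax(axis=1)

print(one_hot)
print(recovered)
```

The same `argmax(axis=1)` call can be applied to predicted probabilities to obtain predicted classes.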

Now let's view the data in a more human-readable form, as images.

In [None]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import plotly.express as px

num_samples = 10
plots = make_subplots(rows=1, cols=num_samples)

for i in range(num_samples):
  a = x_train[i, :].reshape(28,28)
  img = go.Heatmap(z=a, colorscale='gray')
  plots.add_trace(img, row=1, col=i+1)

plots.update_yaxes(autorange='reversed', scaleanchor='x', constrain='domain')
plots.update_xaxes(constrain='domain')
plots.update_traces(showscale=False)
plots.show()


Next, we prepare the $X$ and $y$ variables, using only the first 4000 training examples for now.

In [None]:
X = x_train[:4000]
y = y_train[:4000]

print(X.shape)
print(y.shape)

To train the model we will (obviously) use gradient descent. Inside the loop we need a method to compute the gradients. Let's start with implementing it, together with some helper functions.

In [None]:
# We will store the weights in a D x c matrix, where D is the number of features, and c is the number of classes
#weights = (...) # TODO: Fill in, be sure to have the right shape!
weights = np.zeros([X.shape[1], 10])


def softmax(z):
    ########################################
    # TODO: implement the softmax function #
    ########################################
    raise NotImplementedError  # placeholder so the cell parses; replace with your code


def predict(weights, X):
    ###################################
    # TODO: compute the probabilities #
    ###################################
    raise NotImplementedError  # placeholder so the cell parses; replace with your code

def compute_loss_and_gradients(weights, X, y, l2_reg):
    #############################################################################
    # TODO: compute loss and gradients, don't forget to include regularization! #
    #############################################################################
    raise NotImplementedError  # placeholder so the cell parses; replace with your code
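
If you get stuck, the following is one possible way to fill in the three functions. It is a sketch under stated assumptions, not the reference solution. In particular, the form of the regularization term, $\frac{\text{l2\_reg}}{2}\lVert w \rVert^2$, is an assumption, since the exercise does not fix a convention. The `*_demo` names are hypothetical and exist only to exercise the code.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise maximum before exponentiating; this leaves the
    # result unchanged but avoids overflow for large scores.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def predict(weights, X):
    # X: (n, D), weights: (D, c) -> class probabilities of shape (n, c)
    return softmax(X @ weights)

def compute_loss_and_gradients(weights, X, y, l2_reg):
    # y is one-hot encoded with shape (n, c)
    n = X.shape[0]
    probs = predict(weights, X)
    # Mean negative log-probability of the true class; the small constant
    # guards against log(0)
    data_loss = -np.sum(y * np.log(probs + 1e-12)) / n
    # ASSUMPTION: penalty of the form (l2_reg / 2) * ||w||^2
    loss = data_loss + 0.5 * l2_reg * np.sum(weights ** 2)
    grad = X.T @ (probs - y) / n + l2_reg * weights
    return loss, grad

# Tiny synthetic check (hypothetical data): 5 examples, 3 features, 4 classes
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 3))
y_demo = np.eye(4)[rng.integers(0, 4, size=5)]
demo_loss, demo_grad = compute_loss_and_gradients(
    np.zeros((3, 4)), X_demo, y_demo, l2_reg=0.5
)
print(demo_loss, demo_grad.shape)
```

With all-zero weights every class gets probability $1/c$, so the loss should be close to $\log c$, which is a quick sanity check before training.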

We are now in a position to complete the training pipeline.

If you have problems with convergence, be sure to check the gradients numerically.
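
A numerical gradient check can be sketched with central differences, as below. The helper `numerical_gradient` and the `*_demo` names are hypothetical, introduced here for illustration. The demo uses a simple quadratic whose gradient is known exactly, so the block runs on its own.

```python
import numpy as np

def numerical_gradient(f, w, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at w.

    w must be a float array; it is perturbed in place and restored.
    """
    grad = np.zeros_like(w, dtype=float)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        f_plus = f(w)
        w[idx] = old - eps
        f_minus = f(w)
        w[idx] = old  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

# Demo on f(w) = sum(w^2), whose analytic gradient is 2 * w
w_demo = np.array([[1.0, -2.0], [0.5, 3.0]])
num_grad = numerical_gradient(lambda w: np.sum(w ** 2), w_demo)
ana_grad = 2 * w_demo
max_err = np.max(np.abs(num_grad - ana_grad))
print(max_err)
```

To check your own implementation, you could pass `lambda w: compute_loss_and_gradients(w, X[:20], y[:20], l2_reg)[0]` as `f` and compare the result with the analytic gradient on the same subset. A small subset keeps the check fast, because every weight needs two loss evaluations.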

In [None]:
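l2_reg = 0.5    # L2 regularization strength
n_epochs = 250  # number of full passes over X
lr = 0.05       # initial learning rate (step size)
t = 0.99        # multiplicative learning-rate decay applied after each epoch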

losses = []
for i in range(n_epochs):
    loss, grad = compute_loss_and_gradients(weights, X, y, l2_reg)
    losses.append(loss)

    weights -= lr * grad
    lr *= t

fig = px.line(x=range(1,n_epochs+1), y=losses)
layout = go.Layout(xaxis_title="Epoch", yaxis_title='Loss')
fig.update_layout(layout)

fig.show()

Now compute your accuracy on the training and test sets.

In [None]:
acc = np.mean(predict(weights, x_train).argmax(axis=1) == y_train.argmax(axis=1))
print("Train accuracy: ", acc)
acc = np.mean(predict(weights, x_test).argmax(axis=1) == y_test.argmax(axis=1))
print("Test accuracy: ", acc)

We can also visualize the weights learned by our algorithm. Try to anticipate the result before executing the cell below.

In [None]:
num_samples = 10
plots = make_subplots(rows=1, cols=num_samples)

for i in range(num_samples):
  a = weights[:, i].reshape(28,28)
  img = go.Heatmap(z=a, colorscale='gray')
  plots.add_trace(img, row=1, col=i+1)

plots.update_yaxes(autorange='reversed', scaleanchor='x', constrain='domain')
plots.update_xaxes(constrain='domain')
plots.update_traces(showscale=False)
plots.show()

Note that we only used a small portion of the data to develop the model. Now, implement the training on the full data. Also, validate your model properly, for example on a held-out validation set, and find a good value for the `l2_reg` hyperparameter. Try experimenting with `batch_size` as well.
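
One way to structure such a pipeline is sketched below: mini-batch gradient descent plus a simple search over `l2_reg` on a held-out validation split. This is an illustration under assumptions, not the expected solution. The helpers `train_minibatch` and `accuracy`, the synthetic data, the candidate values of `l2_reg`, and the penalty form $\frac{\text{l2\_reg}}{2}\lVert W \rVert^2$ are all hypothetical choices.

```python
import numpy as np

def _softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_minibatch(X, Y, l2_reg, batch_size=64, n_epochs=20, lr=0.1, seed=0):
    """Mini-batch gradient descent for softmax regression (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    for _ in range(n_epochs):
        order = rng.permutation(n)  # reshuffle the examples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            P = _softmax(X[idx] @ W)
            # ASSUMPTION: penalty of the form (l2_reg / 2) * ||W||^2
            grad = X[idx].T @ (P - Y[idx]) / len(idx) + l2_reg * W
            W -= lr * grad
    return W

def accuracy(W, X, Y):
    # argmax of the scores equals argmax of the softmax probabilities
    return np.mean((X @ W).argmax(axis=1) == Y.argmax(axis=1))

# Synthetic stand-in data so the sketch runs on its own
rng = np.random.default_rng(0)
centers = rng.normal(scale=3.0, size=(3, 5))
labels = rng.integers(0, 3, size=600)
X_all = centers[labels] + rng.normal(size=(600, 5))
Y_all = np.eye(3)[labels]

# Hold out the last 20% as a validation set
split = int(0.8 * len(X_all))
X_tr, Y_tr = X_all[:split], Y_all[:split]
X_val, Y_val = X_all[split:], Y_all[split:]

# Evaluate each candidate regularization strength on the validation set
results = {}
for reg in [0.0, 0.01, 0.1, 1.0]:
    W = train_minibatch(X_tr, Y_tr, l2_reg=reg)
    results[reg] = accuracy(W, X_val, Y_val)

best_reg = max(results, key=results.get)
print(results, best_reg)
```

For MNIST you would replace the synthetic arrays with a split of `x_train` and `y_train`, and report test accuracy only once, after `l2_reg` and `batch_size` have been chosen on the validation set.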


In [None]:
################################################
# TODO: implement the proper training pipeline #
################################################
