
In [0]:
%tensorflow_version 2.x

In [0]:
import numpy as np
from sklearn.datasets import load_iris
import tensorflow as tf
import matplotlib.pyplot as pl

# Classification of flowers

## The Dataset: Iris classification

In [0]:
iris = load_iris()

The dataset has 150 samples. Each sample holds the measurements of one flower from one of three species of _iris_.

Each sample consists of four measurements: the length and the width of the sepal and of the petal, in cm.

The features are stored in a tensor `X` with shape (150, 4).

Each sample also has an integer label (0, 1, 2) that indicates its species (0: setosa, 1: versicolor, 2: virginica). We represent the label with a one-hot encoding:

$$\begin{aligned}
\mathrm{enc}(0) &= [1, 0, 0] \\
\mathrm{enc}(1) &= [0, 1, 0] \\
\mathrm{enc}(2) &= [0, 0, 1]
\end{aligned}$$
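In other words, $\mathrm{enc}(i)$ is row $i$ of the $3\times 3$ identity matrix, and `argmax` inverts the encoding. A minimal NumPy sketch of this equivalence (the labels below are made-up examples, not taken from the dataset):

```python
import numpy as np

labels = np.array([0, 2, 1, 1])      # example integer species labels
one_hot = np.eye(3)[labels]          # row i of the identity matrix is enc(i)
print(one_hot)

# Decoding: the position of the single 1 in each row recovers the label
decoded = np.argmax(one_hot, axis=1)
print(decoded)                       # [0 2 1 1]
```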

In [0]:
# ---------------------
# Load the data
# ---------------------
n_samples = len(iris.data)
n_species = len(iris.target_names)
X = tf.constant(iris.data, dtype=tf.float64)

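# Integer species labels, shape (150,)
I = iris.target
# One-hot encode: row j gets a 1 in column I[j], zeros elsewhere
Y = np.zeros((n_samples, n_species))
Y[np.arange(n_samples), I] = 1
Y = tf.constant(Y, dtype=tf.float64)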

In [0]:
# --------------------------
# Inspect the data
# --------------------------
print("Features of first ten samples.")
print(X[:10, :])
print()
print("Species as one-hot vectors")
print(Y[:10, :])

## Multi-class logistic regression

Consider the feature vector $x\in\mathbf{R}^4$ of a single sample.  You need to build a model that predicts the species as a probability distribution $p = [p_0, p_1, p_2] \in\mathbf{R}^3$.

In order to build the model, we will need two _transformations_ applied to $x$.

### Linear transformation
---

$$ z = W\cdot x + b $$
where $W\sim (3, 4)$ and $b\sim(1, 3)$; here $A\sim(m, n)$ means that $A$ has shape $(m, n)$.

The linear transformation produces a vector $z\in\mathbf{R}^3$.  This is known as the _logit_ vector.

#### Batch processing
When we have a batch of samples $X\sim(150, 4)$, we can perform
the linear transformation as:

$$ Z = X\cdot W^T + b $$
where $Z\sim (150, 3)$.
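Each row of $Z$ is the logit vector of the corresponding sample, so the batched formula is just the single-sample formula applied row by row, with the $(1, 3)$ bias $b$ broadcast over all rows. A NumPy sketch checking this, with random numbers standing in for the iris features and the parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))   # stand-in for the (150, 4) feature batch
W = rng.normal(size=(3, 4))
b = rng.normal(size=(1, 3))

# Batched linear transformation: (150, 4) @ (4, 3) + (1, 3) -> (150, 3)
Z = X @ W.T + b

# Single-sample version for the first sample: (3, 4) @ (4,) + (3,) -> (3,)
z0 = W @ X[0] + b[0]
print(Z.shape)                  # (150, 3)
print(np.allclose(Z[0], z0))    # True
```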

### Softmax transformation
---

Next, we map the logit vector to a proper probability distribution by exponentiating each $z_i$ and normalizing by the sum of the exponentials:

$$ p_i = \frac{\exp(z_i)}{\sum_{k=0}^2 \exp(z_k)} $$

This is known as the _softmax_ transformation:

$$ p = \mathrm{softmax}(z) $$
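A minimal NumPy sketch of the softmax, applied to each row of a batch of logits. Subtracting the row maximum before exponentiating is a standard numerical-stability trick rather than part of the definition: the shift cancels between numerator and denominator, so the result is unchanged, but `exp` no longer overflows for large logits.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax of an (n, k) array of logits."""
    Z = np.asarray(Z, dtype=float)
    # exp(z_i - c) / sum_k exp(z_k - c) == exp(z_i) / sum_k exp(z_k) for any c,
    # so shift each row to make its largest entry 0.
    shifted = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(shifted)
    return E / E.sum(axis=1, keepdims=True)

P = softmax([[1.0, 2.0, 3.0],
             [1000.0, 1000.0, 1000.0]])   # a naive exp(1000) would overflow
print(P)
print(P.sum(axis=1))                      # each row sums to 1
```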

### The final model
---

Together, we have the following model:

$$ \mathrm{model}(X|\theta) = \mathrm{softmax}\left(X\cdot W^T + b\right) $$

The model parameters are $\theta = (W, b)$.
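For intuition, the same composition can be written in plain NumPy. This is only a sketch: `model_np` is a hypothetical name, random data stands in for the iris features, and the assignment itself asks for a TensorFlow version. Every row of the output is a probability distribution over the three species.

```python
import numpy as np

def model_np(X, theta):
    """NumPy sketch of softmax(X . W^T + b) for a batch X of shape (n, 4)."""
    W, b = theta                            # W: (3, 4), b: (1, 3)
    Z = X @ W.T + b                         # logits, shape (n, 3)
    Z = Z - Z.max(axis=1, keepdims=True)    # stability shift; softmax is unchanged
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 4))               # random stand-in for the features
P = model_np(X, [rng.normal(size=(3, 4)), rng.normal(size=(1, 3))])
print(P.shape)                              # (150, 3)
print(np.allclose(P.sum(axis=1), 1.0))      # True
```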

_Hint_

TensorFlow comes with a built-in softmax function, which by default is applied along the last axis (here, to each row of `Z`):

```python
P = tf.nn.softmax(Z)
```

**Implement the `model` function, which accepts a batch of feature vectors `X`
and the model parameters `theta = [W, b]`, and
returns the batch of probabilities.**

In [0]:
def model(X, theta):
  # complete
  pass

In [0]:
# -------------------------------------
# Testing your model
# -------------------------------------
W0 = tf.Variable(np.random.randn(3, 4))
b0 = tf.Variable(np.random.randn(1, 3))

P = model(X, [W0, b0])

assert(P.shape == (150,3))
assert(np.all(0 <= P))
assert(np.all(P <= 1))

## The loss function

Suppose the model predicts the probability distribution $p = [p_0, p_1, p_2]$ for a sample whose true output is the one-hot vector $y = [y_0, y_1, y_2]$, in which exactly one $y_i = 1$ and the others are 0.

We measure how far the prediction is from the truth using the _cross entropy_ between $y$ and $p$, defined as:

$$\mathrm{crossentropy}(y, p) = - \sum_{i=0}^2 y_i\cdot\log(p_i)$$
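Because $y$ is one-hot, every term except the one for the true class $c$ vanishes, so the cross entropy reduces to $-\log p_c$. A small worked example with made-up numbers: for $y = [1, 0, 0]$ and $p = [0.7, 0.2, 0.1]$, the cross entropy is $-\log 0.7 \approx 0.357$. In NumPy:

```python
import numpy as np

y = np.array([1.0, 0.0, 0.0])   # true class 0 (setosa), one-hot
p = np.array([0.7, 0.2, 0.1])   # hypothetical model prediction

ce = -np.sum(y * np.log(p))     # full definition
print(round(ce, 3))             # 0.357
print(np.isclose(ce, -np.log(p[0])))  # True: only the true-class term survives
```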

Given a batch of $n$ prediction probabilities $Y$ and the corresponding true categories $Y^\mathrm{true}$, the loss is the mean cross entropy over the batch:

$$ \mathrm{loss}(Y^\mathrm{true}, Y)=\frac{1}{n}\sum_{i=1}^n
\mathrm{crossentropy}(y^\mathrm{true}_i, y_i)$$
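A NumPy sketch of this batched loss, for illustration only (`mean_cross_entropy` is a hypothetical name). Clipping the probabilities away from 0 is an added assumption that avoids evaluating $\log 0$; it does not change the loss for probabilities above the clip value.

```python
import numpy as np

def mean_cross_entropy(Y_true, Y_pred, eps=1e-7):
    """Mean over the batch of -sum_i y_i * log(p_i); both inputs are (n, 3)."""
    Y_pred = np.clip(Y_pred, eps, 1.0)                      # guard against log(0)
    per_sample = -np.sum(Y_true * np.log(Y_pred), axis=1)   # shape (n,)
    return per_sample.mean()                                # scalar

Y_true = np.array([[1, 0, 0],
                   [0, 1, 0]], dtype=float)
Y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])
print(mean_cross_entropy(Y_true, Y_pred))   # (-log 0.7 - log 0.8) / 2
print(mean_cross_entropy(Y_true, Y_true))   # perfect predictions give 0
```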

**Implement the loss function that accepts as input:**

1. A batch of the true categories in one-hot encoding.
2. A batch of prediction probabilities.

The loss function should return a **single** scalar tensor.

*Hint*:

TensorFlow comes with a built-in cross-entropy loss function:

```python
tf.losses.categorical_crossentropy(Y_true, Y_pred)
```

which returns a tensor of per-sample losses with shape `(150,)`. You will need to reduce this further to a scalar tensor using:

```python
tf.reduce_mean(...)
```

In [0]:
def loss(Y_true, Y_pred):
  # complete
  pass

In [0]:
# --------------------------------
# Test the loss function
# --------------------------------

assert(loss(Y, Y).shape == ())
assert(loss(Y, Y).numpy() < 1E-5)

assert(loss(Y, model(X, [W0, b0])) > 1E-5)

## Training by optimization

Recall that the objective is to estimate the model parameters $\theta$ that minimize the loss. We do this with gradient descent.
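In symbols, gradient descent repeatedly moves $\theta$ against the gradient of the loss, scaled by the learning rate $\alpha$. For this softmax model the gradients have a standard closed form, given here for reference (TensorFlow's `tf.GradientTape` can compute them automatically). With $P = \mathrm{model}(X|\theta)$ for a batch of $n$ samples, and $G_i$ denoting the $i$-th row of $G$:

$$\begin{aligned}
\theta &\leftarrow \theta - \alpha\,\nabla_\theta\,\mathrm{loss}\left(Y^\mathrm{true}, P\right) \\
G = \frac{\partial\,\mathrm{loss}}{\partial Z} &= \frac{1}{n}\left(P - Y^\mathrm{true}\right), \quad G\sim(n, 3) \\
\nabla_W\,\mathrm{loss} &= G^T\cdot X, \quad \nabla_W\,\mathrm{loss}\sim(3, 4) \\
\nabla_b\,\mathrm{loss} &= \sum_{i=1}^n G_i, \quad \nabla_b\,\mathrm{loss}\sim(1, 3)
\end{aligned}$$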

**Implement a generic _train_ function as follows.**

**Input arguments**

1. model: a model function
2. theta: a list of tensorflow variables which will be the model parameters
3. loss: a loss function
4. X: a batch of inputs
5. Y: a batch of true outputs
6. epochs: the _total_ number of epochs to run
7. alpha: the learning rate

**Output**

- A _NumPy array_ of shape `(epochs,)` holding the loss _after_ each epoch.
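The following NumPy sketch shows the shape of such a training loop under two assumptions: each epoch is a single full-batch gradient-descent step over all of `X`, and the gradients use the closed form for softmax regression, $\partial\,\mathrm{loss}/\partial Z = (P - Y)/n$, instead of automatic differentiation. `train_np` and the synthetic three-cluster data are hypothetical; a TensorFlow version could instead obtain the gradients with `tf.GradientTape` and update the variables in `theta`.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)   # numerical-stability shift
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def train_np(X, Y, epochs, alpha, seed=0):
    """Full-batch gradient descent for P = softmax(X W^T + b).
    Returns (W, b, losses), where losses has shape (epochs,)."""
    n, d = X.shape
    k = Y.shape[1]
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(k, d))
    b = rng.normal(size=(1, k))
    losses = np.zeros(epochs)
    for epoch in range(epochs):
        P = softmax(X @ W.T + b)
        G = (P - Y) / n                             # d loss / d Z, shape (n, k)
        W -= alpha * (G.T @ X)                      # d loss / d W, shape (k, d)
        b -= alpha * G.sum(axis=0, keepdims=True)   # d loss / d b, shape (1, k)
        # Record the mean cross entropy *after* this epoch's update
        P = np.clip(softmax(X @ W.T + b), 1e-7, 1.0)
        losses[epoch] = -np.mean(np.sum(Y * np.log(P), axis=1))
    return W, b, losses

# Synthetic stand-in data: three well-separated clusters in 4 dimensions
rng = np.random.default_rng(42)
centers = np.array([[0, 0, 0, 0], [3, 3, 0, 0], [0, 3, 3, 3]], dtype=float)
labels = rng.integers(0, 3, size=150)
X = centers[labels] + rng.normal(scale=0.5, size=(150, 4))
Y = np.eye(3)[labels]

W, b, losses = train_np(X, Y, epochs=500, alpha=0.05)
print(losses.shape)              # (500,)
print(losses[0], losses[-1])
```

A small learning rate is used here because, for inputs of this scale, much larger steps can make full-batch gradient descent oscillate instead of decrease the loss.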


In [0]:
def train(model, theta, loss, X, Y, epochs, alpha):
  # complete
  pass

**Choose the number of epochs and the learning rate so that the final training loss falls below 0.5.**


In [0]:
epochs = ...  # complete
alpha = ...  # complete

In [0]:
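# ---------------------------
# Testing training loss
# ---------------------------

# Train fresh parameters; W and b are reused by the accuracy test below.
W = tf.Variable(np.random.randn(3, 4))
b = tf.Variable(np.random.randn(1, 3))
training_losses = train(model, [W, b], loss, X, Y, epochs, alpha)

assert(training_losses.shape == (epochs,))

# Make sure the final training loss is low enough
assert(training_losses[-1] < 0.5)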

## Accuracy

To see how well the model predicts the species, we evaluate its accuracy: the fraction of samples whose species is predicted _correctly_.

Suppose that we have a prediction `Y_pred` as probabilities of shape `(150,3)`.
We can use NumPy's `argmax` to find, in each row, the index with the greatest probability.

```python
I_pred = np.argmax(Y_pred, axis=1)
```

where `I_pred` has shape `(150,)`. Comparing `I_pred` element-wise with `I_true` gives a boolean array; `np.sum(...)` counts its `True` entries, i.e. the number of _correct_ guesses, and dividing by the number of samples gives the accuracy.

```python
total_correct = np.sum(I_true == I_pred)
```
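A small worked example with made-up predictions for four samples, where three of the four argmax guesses match the true labels:

```python
import numpy as np

Y_pred = np.array([[0.8, 0.1, 0.1],     # argmax 0
                   [0.2, 0.5, 0.3],     # argmax 1
                   [0.1, 0.3, 0.6],     # argmax 2
                   [0.4, 0.35, 0.25]])  # argmax 0
I_true = np.array([0, 1, 2, 1])

I_pred = np.argmax(Y_pred, axis=1)          # [0 1 2 0]
total_correct = np.sum(I_true == I_pred)    # each True counts as 1
accuracy = total_correct / len(I_true)
print(I_pred, total_correct, accuracy)      # [0 1 2 0] 3 0.75
```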

**Implement a function `evaluate` that evaluates the accuracy of a model with a given model parameter `theta`.**

**Input parameters**

1. model: a model to be evaluated
2. theta: the model parameter to use
3. X: a batch of input
4. I_true: the true categories (as integers) for the batch of input

**Output**

- A float between 0 and 1.0: the fraction of _correct_ guesses by the model.

In [0]:
def evaluate(model, theta, X, I_true):
  # complete
  pass

In [0]:
# ----------------------------------
# Test the accuracy
# ----------------------------------

accuracy = evaluate(model, [W, b], X, I)
print("Accuracy: %.2f %%" % (accuracy * 100))

assert(np.ndim(accuracy) == 0)
assert(0.90 < accuracy <= 1.0)