### TensorFlow Intro

In [None]:
%pylab inline
import tensorflow as tf
import numpy as np
import random

Populating the interactive namespace from numpy and matplotlib


#### Task 1 - secret function
For this exercise, we'll work with a simple linear function:

\begin{equation}
y = x \cdot 10 - 7
\end{equation}

We won't tell anyone what the function is, but we will compute many sample values from it. Implement the above function in Python.

Create a one-dimensional array of 100 random real numbers and save it as `data_X`. Then use these values to compute the corresponding values of the above function and store them in the variable `data_y`.

The goal of the entire exercise is to recreate this function from the sample data we generated. For this we will use a "model" that can represent the function:
\begin{equation}
y = a \cdot x + b
\end{equation}
So our task is to find the parameters `a` and `b` in the above formula that minimize the error on the sample data.

In [None]:
def secret_function(x):
    return x*10-7

secret_function(2)

13

In [None]:
data_X = [random.random() for _ in range(100)]
print(data_X[:5])
print(len(data_X))

[0.18959615400962226, 0.5737476473072114, 0.38706755629790135, 0.5479210011088037, 0.3908151890246858]
100


In [None]:
data_y = [secret_function(x) for x in data_X]
print(data_y[:5])
print(len(data_y))

[-5.104038459903777, -1.262523526927886, -3.1293244370209865, -1.5207899889119627, -3.091848109753142]
100


**First calculations in TF**

If we want to compute the same function as above using TensorFlow, we need to wrap the values in TensorFlow objects: `tf.constant` creates an immutable `tf.Tensor`, while `tf.Variable` creates a mutable value that can be updated during training. For our exercises, the data generated above will be treated as constants, and the parameters that we want to find (i.e. the values `a` and `b` of the secret formula) will be treated as variables.
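
The difference between the two kinds of objects can be seen in a minimal sketch. The toy values below are illustrative assumptions, not the exercise data; only `tf.constant`, `tf.Variable`, `assign_sub` and `.numpy()` are used.

```python
import tensorflow as tf

# A constant wraps fixed data (toy values chosen for illustration).
X = tf.constant([1.0, 2.0, 3.0])

# Variables hold values that can be updated in place.
a = tf.Variable(2.0)
b = tf.Variable(0.5)

# The scalar variables broadcast over the constant vector.
y = a * X + b
print(y.numpy())

# assign_sub subtracts its argument from the variable in place.
a.assign_sub(0.5)
print(a.numpy())
```

A variable keeps its identity across updates, which is what lets a training loop modify `a` and `b` step by step.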

In [None]:
X = tf.constant(data_X)
a = tf.Variable(random.random())
b = tf.Variable(random.random())

y = a*X+b
print(y.numpy())

[0.2625681  0.5949797  0.43344283 0.5726316  0.4366857  0.6756811
 0.56136215 0.22880295 0.62075955 0.42304033 0.8743818  0.6698616
 0.13008392 0.282427   0.64867926 0.93044966 0.92213756 0.6211486
 0.94888145 0.8978431  0.12581812 0.30665818 0.4610039  0.67896944
 0.5444479  0.23899212 0.5159611  0.09888064 0.2804938  0.42077947
 0.49380583 0.15012571 0.6560946  0.89138997 0.79809594 0.84857136
 0.7439016  0.26249546 0.9497034  0.65600944 0.42802373 0.47483072
 0.7985245  0.47811154 0.12221623 0.84198964 0.757128   0.5420726
 0.3844432  0.73500794 0.49203324 0.42180052 0.26893508 0.26385188
 0.3100156  0.509469   0.099258   0.17322232 0.5160661  0.7596749
 0.11239491 0.54184306 0.27329782 0.36647502 0.31828266 0.3931451
 0.73497075 0.45783016 0.9467328  0.21319373 0.86251915 0.53358394
 0.37009996 0.7094109  0.7465915  0.65925646 0.7769643  0.8069711
 0.5985586  0.56070316 0.16441506 0.2699795  0.8475094  0.46097845
 0.58759373 0.5781816  0.23081604 0.34044996 0.19528046 0.68137884
 0

**Error calculation**

Calculate the mean squared error (`MSE`).
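
For the model $y = a \cdot x + b$ and $N$ samples $(x_i, y_i)$, the mean squared error can be written as follows (the index $i$ and the symbol $N$ are notation introduced here for illustration):

\begin{equation}
\mathrm{MSE}(a, b) = \frac{1}{N} \sum_{i=1}^{N} \left( a \cdot x_i + b - y_i \right)^2
\end{equation}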

In [None]:
differences = y - data_y
differences.numpy()

array([ 5.366606  ,  1.8575032 ,  3.5627673 ,  2.0934215 ,  3.528534  ,
        1.0055796 ,  2.2123868 ,  5.723048  ,  1.5853592 ,  3.672581  ,
       -1.0920037 ,  1.0670128 ,  6.7651753 ,  5.1569653 ,  1.2906246 ,
       -1.6838844 , -1.5961375 ,  1.5812521 , -1.8784595 , -1.339673  ,
        6.8102074 ,  4.9011693 ,  3.271819  ,  0.97086626,  2.3909426 ,
        5.615486  ,  2.6916633 ,  7.0945725 ,  5.1773734 ,  3.696448  ,
        2.9255455 ,  6.553604  ,  1.2123442 , -1.2715505 , -0.28669214,
       -0.81953627,  0.28540972,  5.367373  , -1.887136  ,  1.2132437 ,
        3.619974  ,  3.125856  , -0.29121637,  3.0912223 ,  6.8482304 ,
       -0.7500564 ,  0.14578599,  2.416018  ,  4.080032  ,  0.37929615,
        2.9442575 ,  3.6856694 ,  5.299393  ,  5.353054  ,  4.865727  ,
        2.7601976 ,  7.090589  ,  6.309785  ,  2.690555  ,  0.11889881,
        6.951909  ,  2.4184406 ,  5.253338  ,  4.269713  ,  4.7784553 ,
        3.9881701 ,  0.37968868,  3.3053224 , -1.8557773 ,  5.88

In [None]:
y_true = tf.constant(data_y)
MSE = tf.reduce_mean((y-y_true)**2)
MSE_v2 = tf.reduce_mean(tf.math.squared_difference(y,y_true))

print('MSE: ', MSE.numpy(), '=', MSE_v2.numpy())

MSE:  13.555616 = 13.555616


**Gradient**

We assume that the learning rate `alpha` is 0.1.

Inside a new `tf.GradientTape` context:
1. compute `y` from its formula, as above;
2. compute the `MSE` error, also as above;
3. print the value of the error;
4. compute the gradient of the error with respect to the variables `a` and `b` (you can pass these variables together in a list, and you will get back a list of gradient values in the same order);
5. subtract from each variable its gradient value multiplied by the learning rate `alpha`.

For the last step, use the `assign_sub` method, for example: `a.assign_sub(grad[0] * alpha)`.

Print the values of `a` and `b` before and after the gradient update, then recalculate the error (for this you have to recompute both `y` and the error itself). Has it decreased?
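
The values returned in step 4 can be checked by hand. Assuming the error is the mean squared error of $y = a \cdot x + b$ over $N$ samples $(x_i, y_i)$ (the symbols $N$ and $i$ are notation introduced here), its partial derivatives are:

\begin{equation}
\frac{\partial\,\mathrm{MSE}}{\partial a} = \frac{2}{N} \sum_{i=1}^{N} \left( a \cdot x_i + b - y_i \right) \cdot x_i,
\qquad
\frac{\partial\,\mathrm{MSE}}{\partial b} = \frac{2}{N} \sum_{i=1}^{N} \left( a \cdot x_i + b - y_i \right)
\end{equation}

Step 5 then corresponds to the update:

\begin{equation}
a \leftarrow a - \alpha \cdot \frac{\partial\,\mathrm{MSE}}{\partial a},
\qquad
b \leftarrow b - \alpha \cdot \frac{\partial\,\mathrm{MSE}}{\partial b}
\end{equation}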

In [None]:
X = tf.constant(data_X)
y_true = tf.constant(data_y)
a = tf.Variable(random.random())
b = tf.Variable(random.random())

alpha = tf.constant(0.1)

with tf.GradientTape() as g:
    y = a*X + b
    MSE = tf.reduce_mean(tf.math.squared_difference(y, y_true))
    print('Before the gradient update')
    print('MSE: ', MSE.numpy())
    print('y = {} * X + {}'.format(a.numpy(), b.numpy()))
    print('a = ', a.numpy())
    print('b = ', b.numpy())
    
    grad = g.gradient(MSE, [a,b])
    a.assign_sub(grad[0]*alpha)
    b.assign_sub(grad[1]*alpha)
    
    print('\n')
    print('After the gradient update')
    y = a*X + b
    MSE = tf.reduce_mean(tf.math.squared_difference(y, y_true))
    print('MSE: ', MSE.numpy())
    print('y = {} * X + {}'.format(a.numpy(), b.numpy()))
    print('a = ', a.numpy())
    print('b = ', b.numpy())
    

Before the gradient update
MSE:  15.056806
y = 0.44411394000053406 * X + 0.4655159115791321
a =  0.44411394
b =  0.4655159


After the gradient update
MSE:  12.111424
y = 0.3187028765678406 * X + -0.09838008880615234
a =  0.31870288
b =  -0.09838009


**Training loop**

Now copy all of this code into the block below, but remove the error recalculation and the printing of `a` and `b`. Instead, put the entire gradient context in a `for` loop that counts epochs from 0 to 1000. In each iteration, print the epoch number, the current values of `a` and `b`, and the value of the error.

In [None]:
for i in range(2):
    print(i)

0
1


In [None]:
X = tf.constant(data_X)
y_true = tf.constant(data_y)
a = tf.Variable(random.random())
b = tf.Variable(random.random())

alpha = tf.constant(0.1)
epochs = 1000

for i in range(epochs+1):
    with tf.GradientTape() as g:
        y = a*X + b
        MSE = tf.reduce_mean(tf.math.squared_difference(y,y_true))
        grad = g.gradient(MSE,[a,b])
        a.assign_sub(grad[0]*alpha)
        b.assign_sub(grad[1]*alpha)
    print('Epoch ',i, ':  a = ',a.numpy(), ' b = ',b.numpy(), ' MSE = ',MSE.numpy())

Epoch  0 :  a =  0.11100329  b =  -0.189084  MSE =  14.029425
Epoch  1 :  a =  0.070179045  b =  -0.58966863  MSE =  11.623186
Epoch  2 :  a =  0.07087291  b =  -0.90616655  MSE =  10.178733
Epoch  3 :  a =  0.1022992  b =  -1.1594324  MSE =  9.276977
Epoch  4 :  a =  0.15637791  b =  -1.365101  MSE =  8.682379
Epoch  5 :  a =  0.22705716  b =  -1.5348943  MSE =  8.262542
Epoch  6 :  a =  0.3098052  b =  -1.6776018  MSE =  7.943019
Epoch  7 :  a =  0.40122983  b =  -1.7998142  MSE =  7.6819263
Epoch  8 :  a =  0.4987928  b =  -1.9064742  MSE =  7.4556804
Epoch  9 :  a =  0.6005961  b =  -2.0012891  MSE =  7.2509794
Epoch  10 :  a =  0.7052213  b =  -2.0870404  MSE =  7.0603023
Epoch  11 :  a =  0.81160986  b =  -2.165815  MSE =  6.8793736
Epoch  12 :  a =  0.91897255  b =  -2.2391798  MSE =  6.7057467
Epoch  13 :  a =  1.0267221  b =  -2.3083117  MSE =  6.538001
Epoch  14 :  a =  1.1344225  b =  -2.3740945  MSE =  6.375293
Epoch  15 :  a =  1.2417513  b =  -2.4371934  MSE =  6.2171082


As training proceeds, the error drops toward 0, and `a` and `b` converge to the coefficients of the secret function from the very beginning of this exercise (10 and -7).
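
The same loop can be reproduced without TensorFlow as a cross-check. The sketch below is plain Python rather than the notebook's code: it assumes hand-written MSE gradients in place of `tf.GradientTape`, and the fixed seed and the names `xs`, `ys`, `grad_a` and `grad_b` are illustrative choices.

```python
import random

random.seed(0)  # fixed seed, only so the sketch is reproducible

xs = [random.random() for _ in range(100)]
ys = [10 * x - 7 for x in xs]            # samples of the secret function

a, b = random.random(), random.random()  # random starting parameters
alpha = 0.1
n = len(xs)

for epoch in range(1001):
    residuals = [a * x + b - y for x, y in zip(xs, ys)]
    # d(MSE)/da = 2/n * sum(residual * x), d(MSE)/db = 2/n * sum(residual)
    grad_a = 2 / n * sum(r * x for r, x in zip(residuals, xs))
    grad_b = 2 / n * sum(residuals)
    a -= alpha * grad_a
    b -= alpha * grad_b

mse = sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / n
print('a =', a, ' b =', b, ' MSE =', mse)
```

After the loop, `a` and `b` should end up close to 10 and -7, with an error near zero.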

#### Task 2 - multi-class classification

Use the `load_wine()` function from the `sklearn.datasets` module and save its result to the variable `data`. Also import the `label_binarize` and `scale` functions from the `sklearn.preprocessing` module.

In [None]:
import sklearn.datasets
from sklearn.preprocessing import label_binarize, scale, StandardScaler
import pandas as pd

##data = pd.DataFrame(data=load_wine().data, columns=load_wine().feature_names)
data = sklearn.datasets.load_wine()

Load the `data['data']` matrix into the `data_X` variable and convert it to `float32` using the `.astype()` method. Standardize it by passing it through the `scale()` function.
Copy the array `data['target']` to the `data_y_lab` variable. In the `data_y` variable, store the same array processed with `label_binarize`, passing `classes=[0,1,2]` as the second argument.
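
To make the two preprocessing steps concrete, here is a small NumPy sketch of roughly what they compute: per-column standardization and one-hot encoding of the labels. It is not the scikit-learn implementation, and the toy arrays `toy_X` and `toy_labels` are made up for illustration.

```python
import numpy as np

# Hypothetical toy data: 4 samples, 2 features, labels from classes 0..2.
toy_X = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0],
                  [4.0, 40.0]], dtype='float32')
toy_labels = np.array([0, 2, 1, 2])

# Standardization: subtract each column's mean, divide by its standard deviation.
toy_X_scaled = (toy_X - toy_X.mean(axis=0)) / toy_X.std(axis=0)

# One-hot encoding: row i has a 1 in column toy_labels[i] and 0 elsewhere.
toy_y = np.eye(3, dtype='int64')[toy_labels]

print(toy_X_scaled.mean(axis=0), toy_X_scaled.std(axis=0))
print(toy_y)
```

After standardization every column has mean 0 and standard deviation 1, so features measured on very different scales contribute comparably to the model.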

In [None]:
##data_X = scale(data['data'].astype('float32')) <- raises a warning
data_X = data['data'].astype('float32')
data_X = StandardScaler().fit_transform(data_X)

data_y_lab = data['target']
data_y = label_binarize(data['target'], classes=[0,1,2])

Copy the training loop above and make the following changes to it:

- Instead of the `a` parameter, create a two-dimensional matrix `W` of random values whose size equals the number of features in `data_X` (i.e. `data_X.shape[1]`) on one axis and 3 (i.e. the number of output classes) on the other axis. Remember to convert this random matrix to `float32` using the `.astype()` method.

- The product of `X` (size $178\times13$) and `W` (size $13\times3$) will result in a matrix of size $178\times3$, which holds the score of each output class for each sample.

- Remove the computation of `y` and compute `logits = tf.matmul(X,W) + b` instead.

- Compute the decision function `y_pred` of the classifier. To do this, pass `logits` through `tf.nn.softmax()` and then through `tf.argmax` (in a single line), giving `axis=1` as the second argument of `tf.argmax`.

- To calculate **accuracy**, first compare `y_pred.numpy()` and `data_y_lab` with the `==` operator, then sum the resulting array of `True` / `False` values (with `sum`) and divide the sum by the number of samples (i.e. `data_y_lab.size`). Print the accuracy in each epoch.

- To calculate the **error** function, we will use something more suitable than MSE (which is used more often for regression than for classification): **cross entropy**. Use the `tf.nn.softmax_cross_entropy_with_logits` function, passing it the constant `y_true` and the computed `logits`. Then sum the result of this function using `tf.reduce_sum`.

- The **gradient** computation is the same; just remember to change the parameter `a` to `W`, both when computing the gradient and when updating the parameter afterwards.
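
The forward pass described in the list above can be sketched in NumPy to see the shapes and formulas involved. This is an illustration under assumed toy sizes (5 samples, 4 features, 3 classes), not the TensorFlow solution; softmax and cross entropy are written out by hand here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: 5 samples, 4 features, 3 classes.
X = rng.normal(size=(5, 4))
W = rng.random(size=(4, 3))
b = 0.5
labels = np.array([0, 1, 2, 1, 0])
y_true = np.eye(3)[labels]                  # one-hot targets

logits = X @ W + b                          # shape (5, 3)

# Softmax, shifted by the row maximum for numerical stability.
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

y_pred = probs.argmax(axis=1)               # predicted class per sample
accuracy = (y_pred == labels).sum() / labels.size

# Cross entropy per sample, summed over all samples.
error = -(y_true * np.log(probs)).sum()

print(logits.shape, accuracy, error)
```

Because softmax preserves the ordering of the scores within a row, taking `argmax` of the probabilities selects the same class as taking `argmax` of the raw `logits`.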

In [None]:
X = tf.constant(data_X)
y_true = tf.constant(data_y)
W = tf.Variable(np.random.rand(data_X.shape[1],3).astype('float32'))
b = tf.Variable(random.random())

alpha = tf.constant(0.1)
epochs = 40

for i in range(epochs+1):
    with tf.GradientTape() as g:
        logits = tf.matmul(X,W)+b
        
        y_pred = tf.argmax(tf.nn.softmax(logits),axis=1)
        accuracy = (y_pred.numpy() == data_y_lab).sum() / data_y_lab.size
        error = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=logits))
        
        grad = g.gradient(error,[W,b])
        W.assign_sub(grad[0]*alpha)
        b.assign_sub(grad[1]*alpha)
    print('Epoch ',i, ':  accuracy = ',accuracy, ',  error = ',error.numpy())

Epoch  0 :  accuracy =  0.0449438202247191 ,  error =  408.0134
Epoch  1 :  accuracy =  0.9213483146067416 ,  error =  285.81607
Epoch  2 :  accuracy =  0.9382022471910112 ,  error =  132.3474
Epoch  3 :  accuracy =  0.9719101123595506 ,  error =  62.362972
Epoch  4 :  accuracy =  0.9775280898876404 ,  error =  41.027676
Epoch  5 :  accuracy =  0.9831460674157303 ,  error =  22.524153
Epoch  6 :  accuracy =  0.9943820224719101 ,  error =  11.679804
Epoch  7 :  accuracy =  0.9943820224719101 ,  error =  8.858484
Epoch  8 :  accuracy =  0.9943820224719101 ,  error =  6.72938
Epoch  9 :  accuracy =  0.9943820224719101 ,  error =  4.749464
Epoch  10 :  accuracy =  0.9943820224719101 ,  error =  2.8998759
Epoch  11 :  accuracy =  0.9943820224719101 ,  error =  1.4729161
Epoch  12 :  accuracy =  1.0 ,  error =  0.812167
Epoch  13 :  accuracy =  1.0 ,  error =  0.56509995
Epoch  14 :  accuracy =  1.0 ,  error =  0.44149107
Epoch  15 :  accuracy =  1.0 ,  error =  0.36706188
Epoch  16 :  accur

#### Task 3 - MLP

Extend the above model with an additional hidden layer, turning it into a multilayer perceptron. To do this, add another variable (similar to `W`) named `H` (as in hidden) with size $13\times10$ (where 13 is the number of input features and 10 is the number of hidden units, which can be changed to any value) and a hidden-layer bias named `hb`. Also, change the size of the variable `W` to $10\times3$ to match the size of the new hidden layer.

Before computing `logits`, first compute the product of `X` and `H` (plus the bias `hb`), pass it through an activation function, e.g. `tf.sigmoid`, and store the result in the variable `hidact`. Then use `hidact` instead of `X` when computing `logits`.

Remember to extend the gradient with new variables.
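
The shapes in the extended model can be traced with a NumPy sketch of the forward pass. It assumes `H` has one row per input feature (13 rows), which is what makes the product of `X` and `H` well defined; the random stand-in data and the helper `sigmoid` are illustrative and not part of the exercise code.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_features, n_hidden, n_classes = 178, 13, 10, 3

X = rng.normal(size=(n_samples, n_features))   # stand-in for the scaled wine data
H = rng.random(size=(n_features, n_hidden))    # hidden-layer weights
hb = 0.1                                       # hidden-layer bias
W = rng.random(size=(n_hidden, n_classes))     # output-layer weights
b = 0.1                                        # output-layer bias

def sigmoid(z):
    """Logistic activation, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

hidact = sigmoid(X @ H + hb)   # (178, 13) @ (13, 10) -> (178, 10)
logits = hidact @ W + b        # (178, 10) @ (10, 3)  -> (178, 3)

print(hidact.shape, logits.shape)
```

Each sample keeps its own row throughout, so `logits` still has one row of three class scores per sample and can feed the softmax, accuracy and cross-entropy steps unchanged.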

In [None]:
X = tf.constant(data_X)
y_true = tf.constant(data_y)
W = tf.Variable(np.random.rand(10,3).astype('float32'))
b = tf.Variable(random.random())
H = tf.Variable(np.random.rand(data_X.shape[1],10).astype('float32')) # hidden weights: one row per input feature
hb = tf.Variable(random.random())

alpha = tf.constant(0.1)
epochs = 40

for i in range(epochs+1):
    with tf.GradientTape() as g:
        # hidden layer: (178 x 13) @ (13 x 10) -> (178 x 10)
        hidact = tf.sigmoid(tf.matmul(X,H)+hb)
        # output layer: (178 x 10) @ (10 x 3) -> (178 x 3)
        logits = tf.matmul(hidact,W)+b
        
        y_pred = tf.argmax(tf.nn.softmax(logits), axis=1)
        accuracy = (y_pred.numpy() == data_y_lab).sum() / data_y_lab.size
        error = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(y_true, logits))
        
        grad = g.gradient(error,[W,b,H,hb])
        W.assign_sub(grad[0]*alpha)
        b.assign_sub(grad[1]*alpha)
        H.assign_sub(grad[2]*alpha)
        hb.assign_sub(grad[3]*alpha)
        
    print('Epoch ',i, ':  accuracy = ',accuracy, ',  error = ',error.numpy())

Epoch  0 :  accuracy =  0.7134831460674157 ,  error =  127.464355
Epoch  1 :  accuracy =  0.21348314606741572 ,  error =  11663.475
Epoch  2 :  accuracy =  0.6348314606741573 ,  error =  47944.094
Epoch  3 :  accuracy =  0.6348314606741573 ,  error =  19913.041
Epoch  4 :  accuracy =  0.6348314606741573 ,  error =  15346.343
Epoch  5 :  accuracy =  0.7471910112359551 ,  error =  19113.012
Epoch  6 :  accuracy =  0.6348314606741573 ,  error =  17062.914
Epoch  7 :  accuracy =  0.6404494382022472 ,  error =  15380.766
Epoch  8 :  accuracy =  0.7752808988764045 ,  error =  12883.766
Epoch  9 :  accuracy =  0.7752808988764045 ,  error =  10822.113
Epoch  10 :  accuracy =  0.7752808988764045 ,  error =  8601.948
Epoch  11 :  accuracy =  0.8089887640449438 ,  error =  8451.815
Epoch  12 :  accuracy =  0.8089887640449438 ,  error =  7046.2925
Epoch  13 :  accuracy =  0.8089887640449438 ,  error =  5664.157
Epoch  14 :  accuracy =  0.8089887640449438 ,  error =  4289.2417
Epoch  15 :  accuracy