# <center> Dealing With Data </center>
## <center> Linear Separatrix -   Making it online  </center>
### <center> Prof. Laxmidhar Behera </center>
### <center> IIT Kanpur </center>



- Given data and a linear model, a single data point can be represented by the following expression: <br>
$Aw = y$ &nbsp; $A \in R^{1\times n}$ &nbsp; $w \in R^{n\times m}$ &nbsp; $y \in R^{1\times m}$ <br><br>
- Methods such as the least-squares solution and the minimum-norm solution are generally offline techniques: they need the whole dataset at once.
- To learn the unknown parameters online, i.e., to update $w$ as each new data point arrives, we define a cost (loss) function and minimize it with an algorithm called gradient descent.
- **Cost (or loss) function**:
It is a function of the difference between the estimated and the true values for one instance of data: <br>
<center>$E = \frac{1}{2} \sum_{i=1}^m (y_i^d - y_i)^2$<br></center>
  where $y_i^d$ is the desired $i^{th}$ output and $y_i$ is the calculated $i^{th}$ output for a single data point.


- To minimize this loss function, we use a method called <b>Gradient Descent</b>.
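As a quick worked check of the cost function above, the sketch below evaluates $E$ for one data point with $m = 3$ outputs. The target and model outputs are made-up illustrative values, not taken from the iris data.

```python
import numpy as np

# One data point with m = 3 outputs (illustrative values only)
y_desired = np.array([0.0, 1.0, 0.0])  # y^d, e.g. a one-hot target
y_model = np.array([0.2, 0.7, 0.1])    # y = A w, the model's output

# E = 1/2 * sum_i (y_i^d - y_i)^2
E = 0.5 * np.sum((y_desired - y_model) ** 2)
print(E)
```

The squared differences are 0.04, 0.09 and 0.01, so $E \approx 0.5 \times 0.14 = 0.07$ (up to floating-point rounding).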


## <center>  Gradient Descent  </center>
- Gradient descent finds a minimum of a function by updating the unknown parameters iteratively.
- At each iteration, the algorithm first computes the slope (gradient) of the function at the current point.
- It then takes a step in the direction opposite to the gradient, since that is the direction in which the function decreases fastest locally.
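To make the two steps above concrete, here is a minimal sketch of gradient descent on a one-dimensional function. The function $f(w) = (w-3)^2$, the starting point and the learning rate are assumptions chosen purely for illustration; the minimum is at $w = 3$.

```python
def f(w):
    # Simple convex function with its minimum at w = 3
    return (w - 3.0) ** 2

def grad_f(w):
    # Derivative (slope) of f at w
    return 2.0 * (w - 3.0)

w = 0.0      # starting point
eta = 0.1    # learning rate (step size)
history = [f(w)]
for step in range(100):
    w = w - eta * grad_f(w)   # step opposite to the gradient
    history.append(f(w))

print(w)
```

With this choice each update maps $w - 3$ to $0.8\,(w - 3)$, so $w$ approaches 3 geometrically and $f$ decreases at every step. A learning rate that is too large (here, $\eta > 1$) would make the iterates diverge instead.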


<img src="https://drive.google.com/uc?export=view&id=1z3HXUPpzx8fTjkYXf9i9sxTQidE-C6Wx" alt="Drawing" style="width: 200px;"/>


### <center>Loading Data in Python</center>
`sklearn.datasets` is a module providing many standard datasets, including the iris data. <br>
NumPy is a numerical processing library for Python. <br>
Keras is a Python library for building deep neural networks; here it is used only for its `to_categorical` utility.

```python
from sklearn.datasets import load_iris
import numpy as np

np.random.seed(10) # For random weight initialization

iris = load_iris() # iris is a dict-like Bunch object with key-value pairs
```

#### Pre-process Data
Convert the integer class labels to categorical (one-hot) vectors.

```python
from keras.utils import to_categorical

X = iris['data']   # 150 x 4 feature matrix
Y = iris['target'] # integer class labels 0, 1, 2

Ny = len(np.unique(Y)) # Ny is the number of categories/classes

Y = to_categorical(Y, num_classes = Ny) # each label becomes a length-Ny one-hot row
```
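If Keras is not installed, the same one-hot encoding can be sketched with NumPy alone by indexing rows of an identity matrix. This assumes the labels are integers `0 … Ny-1`, as they are in `iris['target']`; the small label array below is illustrative. One difference: `to_categorical` returns `float32` by default, while `np.eye` gives `float64`.

```python
import numpy as np

labels = np.array([0, 2, 1, 0])       # integer class labels (illustrative)
num_classes = len(np.unique(labels))  # 3 classes here

# Row k of the identity matrix is the one-hot vector for class k
one_hot = np.eye(num_classes)[labels]
print(one_hot)
```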


#### Train - Test split
#### Data Normalization/Scaling

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.20, random_state = 1)

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train) # Computes the mean and standard deviation of the training data

X_train = scaler.transform(X_train) # Perform transformation: x = (x-mean)/std
X_test = scaler.transform(X_test)   # The test set reuses the training mean and std
```
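The transformation `x = (x-mean)/std` can also be written out by hand, which makes explicit that the statistics come from the training set only. The sketch below uses randomly generated stand-in data (an assumption, not the iris features) and the population standard deviation (`ddof=0`), which is the estimator `StandardScaler` uses.

```python
import numpy as np

rng = np.random.default_rng(0)
X_tr = rng.normal(loc=5.0, scale=2.0, size=(120, 4))  # stand-in training data
X_te = rng.normal(loc=5.0, scale=2.0, size=(30, 4))   # stand-in test data

mean = X_tr.mean(axis=0)  # per-feature mean of the training data
std = X_tr.std(axis=0)    # per-feature population std (ddof=0)

X_tr_s = (X_tr - mean) / std
X_te_s = (X_te - mean) / std  # test data is scaled with training statistics

print(X_tr_s.mean(axis=0).round(6), X_tr_s.std(axis=0).round(6))
```

The scaled training features have mean 0 and standard deviation 1; the scaled test features will only be close to that, since they were not used to compute the statistics.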

```python
# Append a column of 1's so that the last row of w acts as a bias term
addlcol = lambda x: np.concatenate((x, np.ones((x.shape[0], 1))), axis = 1)
A = addlcol(X_train)
```


```python
A.shape
```

```
(120, 5)
```
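Why the column of 1's: with it, $Aw$ equals $XW + b$, where $W$ is the first four rows of $w$ and the bias $b$ is its last row. The sketch below checks this identity on random stand-in matrices; the names `W` and `b` are introduced here only for illustration.

```python
import numpy as np

addlcol = lambda x: np.concatenate((x, np.ones((x.shape[0], 1))), axis=1)

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))   # 6 samples, 4 features
W = rng.normal(size=(4, 3))   # weights on the 4 features
b = rng.normal(size=(1, 3))   # one bias per output

w = np.vstack((W, b))  # 5 x 3: last row is the bias
A = addlcol(X)         # 6 x 5: last column is all ones

print(np.allclose(A.dot(w), X.dot(W) + b))
```

This is why the weight matrix in the first model is $5\times3$ rather than $4\times3$.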

# Weight update using gradient descent



### $w_{new} = w_{old} - \eta \ \frac{\partial E}{\partial w} $

### $E = \frac{1}{2} \sum_{j=1}^3 (y_j^d - y_j)^2, \quad y_j = \sum_{k=1}^5 a_k w_{kj}$

### $\frac{\partial E}{\partial w_{kj}} = - (y_j^d - y_j)\, a_k \ \Rightarrow \ {\frac{\partial E}{\partial w}}_{5\times3} = - A^T_{5\times1} e_{1\times3}, \quad e = y^d - y$

### $w_{new} = w_{old} + \eta A^T e \ \ \ \ \ \ \ \ (1)$
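A common way to verify a derived gradient such as $\frac{\partial E}{\partial w} = -A^T e$ is to compare it with central finite differences. The sketch below does this for one random augmented input row; the random values and the step size `h` are assumptions made for the check.

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.normal(size=(1, 5))               # one augmented input row A (1 x 5)
w = rng.uniform(-0.5, 0.5, size=(5, 3))   # weights (5 x 3)
yd = np.array([[0.0, 1.0, 0.0]])          # desired one-hot output (1 x 3)

def loss(w):
    # E = 1/2 * sum_j (y_j^d - y_j)^2 with y = A w
    e = yd - a.dot(w)
    return 0.5 * np.sum(e ** 2)

# Analytic gradient from the derivation: dE/dw = -A^T e
e = yd - a.dot(w)
grad_analytic = -a.T.dot(e)

# Central finite differences, one weight at a time
h = 1e-6
grad_numeric = np.zeros_like(w)
for k in range(w.shape[0]):
    for j in range(w.shape[1]):
        w_plus = w.copy()
        w_plus[k, j] += h
        w_minus = w.copy()
        w_minus[k, j] -= h
        grad_numeric[k, j] = (loss(w_plus) - loss(w_minus)) / (2 * h)

print(np.max(np.abs(grad_analytic - grad_numeric)))
```

Because $E$ is quadratic in $w$, the central difference is exact apart from floating-point rounding, so the printed maximum difference should be tiny.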

### Algorithm

- Initialize the weights randomly in the range (-0.5, 0.5).

For each sample:
- Compute the output $y = Aw$.
- Compute the error $e = y^d - y$.
- Update the weights using (1).



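```python
w = (2*np.random.rand(5,3) - 1)/2   # 5 x 3 weights, uniform in [-0.5, 0.5)

eta = 0.06   # learning rate
# One pass over the training set, updating w after every sample
for i in range(A.shape[0]):
  y = (A[i].reshape(1,5)).dot(w)    # 1 x 3 model output
  yd = Y_train[i]                   # desired one-hot output
  e = yd-y                          # 1 x 3 error
  w = w + eta*(A[i].reshape(5,1)).dot(e.reshape(1,3))   # update rule (1)

print('weights are \n', w)
```

```
weights are 
 [[ 0.11000672 -0.09270634  0.07104678]
 [ 0.10879821 -0.19875927  0.04753876]
 [-0.27045262  0.20677142 -0.23675619]
 [-0.20867913 -0.14909211  0.55711492]
 [ 0.30148813  0.34708777  0.35471917]]
```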
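The loop above makes a single pass over the data. A common extension, shown here as a sketch rather than the method used in these notes, repeats the pass for several epochs with the sample order shuffled. The helper `train_online`, the synthetic data, the number of epochs and the learning rate are all hypothetical choices. On data that a linear model fits well, the online weights end up close to the offline least-squares solution from `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples, n_in, n_out = 200, 5, 3
# Synthetic augmented inputs: 4 random features plus a column of ones
A = np.concatenate((rng.normal(size=(n_samples, n_in - 1)),
                    np.ones((n_samples, 1))), axis=1)
w_true = rng.normal(size=(n_in, n_out))
Y = A.dot(w_true) + 0.01 * rng.normal(size=(n_samples, n_out))  # small noise

def train_online(A, Y, eta=0.01, epochs=50, seed=0):
    """Hypothetical helper: update rule (1) applied sample by sample."""
    order_rng = np.random.default_rng(seed)
    w = order_rng.uniform(-0.5, 0.5, size=(A.shape[1], Y.shape[1]))
    for _ in range(epochs):
        for i in order_rng.permutation(A.shape[0]):  # shuffled order
            a = A[i].reshape(1, -1)            # 1 x n input row
            e = Y[i].reshape(1, -1) - a.dot(w)  # 1 x m error
            w = w + eta * a.T.dot(e)           # w_new = w_old + eta A^T e
    return w

w_online = train_online(A, Y)
w_ls, *_ = np.linalg.lstsq(A, Y, rcond=None)  # offline least-squares solution
print(np.max(np.abs(w_online - w_ls)))
```

With a constant learning rate the online weights keep fluctuating slightly around the least-squares solution rather than landing on it exactly; a smaller `eta` reduces the fluctuation at the cost of slower convergence.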


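```python
def evaluate(X, W, Yd, transform_X_a):
  """Print the confusion matrix of the linear model W on data X.

  transform_X_a maps the raw features X to the augmented matrix A."""
  a = transform_X_a(X)
  yd = np.argmax(Yd, axis = 1)        # true class index
  y = np.argmax(a.dot(W), axis = 1)   # predicted class = largest output
  print('Confusion Matrix:')
  print(confusion_matrix(yd, y))


evaluate(X_train, w, Y_train, addlcol)
evaluate(X_test, w, Y_test, addlcol)
```

```
Confusion Matrix:
[[38  1  0]
 [ 0 27 10]
 [ 0  5 39]]
Confusion Matrix:
[[11  0  0]
 [ 0  7  6]
 [ 0  0  6]]
```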


# Second Model: Adding Squared Features


<table>
  <tr>
    <th colspan="4">X</th>
      <th colspan="3">Y</th>
      </tr> 
  <tr>
      <td>0.31</td><td>-0.04</td><td>0.45</td><td>0.23</td><td>0</td><td>1</td><td>0</td>
    </tr>
    <tr>
      <td>2.24</td><td>-0.46</td><td>1.30</td><td>1.40</td><td>0</td><td>0</td><td>1</td>
      </tr>
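</table>

Each input row now holds the four features, their squares, and a constant 1, where $l_s, b_s$ are the sepal length and breadth and $l_p, b_p$ the petal length and breadth; $w$ is therefore $9\times3$: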
\begin{align}
[l_s\: b_s\: l_p\: b_p\: l_s^2\: b_s^2\: l_p^2\: b_p^2\: 1] \begin{bmatrix}
          w_{00} &w_{01} &w_{02}\\
          w_{10} &w_{11} &w_{12}\\
          \vdots &\vdots &\vdots \\
          w_{80} &w_{81} &w_{82}
         \end{bmatrix} = [y_0\: y_1\: y_2]
\end{align}
<br><br>
<center>Aw = y</center>
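To see the feature ordering $[l_s\ b_s\ l_p\ b_p\ l_s^2\ b_s^2\ l_p^2\ b_p^2\ 1]$ in practice, the sketch below applies the same augmentation used in the next cell to the first X row of the table above.

```python
import numpy as np

# Same augmentation as the notebook: [x, x**2, 1]
addSqlcol = lambda x: np.concatenate((x, x**2, np.ones((x.shape[0], 1))), axis=1)

x = np.array([[0.31, -0.04, 0.45, 0.23]])  # first X row of the table
row = addSqlcol(x)
print(row.shape)
print(np.round(row, 4))
```

The result is a $1\times9$ row: the four original values, then 0.0961, 0.0016, 0.2025, 0.0529, then 1.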

```python
# Augment each row with its squared features and a constant 1: [x, x**2, 1]
addSqlcol = lambda x: np.concatenate((x, x**2, np.ones((x.shape[0], 1))), axis = 1)

A = addSqlcol(X_train)
Y = Y_train

A.shape
```

```
(120, 9)
```

```python
eta = 0.05
w = (2*np.random.rand(9,3) - 1)/2   # 9 x 3 weights, uniform in [-0.5, 0.5)

# One pass of update rule (1) with the 9-dimensional inputs
for i in range(A.shape[0]):
  y = (A[i].reshape(1,9)).dot(w)
  yd = Y_train[i]
  e = yd-y
  w = w + eta*(A[i].reshape(9,1)).dot(e.reshape(1,3))


evaluate(X_train, w, Y_train, addSqlcol)
evaluate(X_test, w, Y_test, addSqlcol)
```

```
Confusion Matrix:
[[39  0  0]
 [ 0 34  3]
 [ 0  4 40]]
Confusion Matrix:
[[11  0  0]
 [ 0  8  5]
 [ 0  1  5]]
```
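The two models can be compared through their accuracy, i.e. the trace of the confusion matrix divided by the number of samples (rows are true classes and columns predicted classes, as returned by `confusion_matrix(yd, y)`). The sketch below simply recomputes accuracy from the four matrices printed above; the helper name `accuracy` is introduced here for illustration.

```python
import numpy as np

# Confusion matrices copied from the outputs above
results = {
    'linear, train': [[38, 1, 0], [0, 27, 10], [0, 5, 39]],
    'linear, test': [[11, 0, 0], [0, 7, 6], [0, 0, 6]],
    'squared, train': [[39, 0, 0], [0, 34, 3], [0, 4, 40]],
    'squared, test': [[11, 0, 0], [0, 8, 5], [0, 1, 5]],
}

def accuracy(cm):
    # Correct predictions lie on the diagonal
    cm = np.asarray(cm)
    return np.trace(cm) / cm.sum()

for name, cm in results.items():
    print(f'{name}: {accuracy(cm):.3f}')
```

On these runs, the squared features raise training accuracy from 104/120 (0.867) to 113/120 (0.942), while test accuracy is 24/30 (0.800) for both models. With only 30 test samples and a single pass of training, this comparison should be read as indicative rather than conclusive.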
