Exercise 1 Simple neural network
Implement a simple neural network that predicts the class of an iris plant based on its parameters.
Use 2 hidden layers. The layers should be fully connected with the ReLU activation function, except for the output layer, which should use the softmax function.
No regularisation or optimisation is needed. No batching is needed.


In [218]:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
iris = datasets.load_iris()


In [219]:
print(iris.DESCR)


.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric, predictive attributes and the class
:Attribute Information:
    - sepal length in cm
    - sepal width in cm
    - petal length in cm
    - petal width in cm
    - class:
            - Iris-Setosa
            - Iris-Versicolour
            - Iris-Virginica

:Summary Statistics:

                Min  Max   Mean    SD   Class Correlation
sepal length:   4.3  7.9   5.84   0.83    0.7826
sepal width:    2.0  4.4   3.05   0.43   -0.4194
petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

:Missing Attribute Values: None
:Class Distribution: 33.3% for each of 3 classes.
:Creator: R.A. Fisher
:Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
:Date: July, 1988

The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken
from Fis

In [220]:
X = iris.data
y = iris.target

In [221]:
X.shape

(150, 4)

In [227]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, shuffle=True)
train_labels = np.eye(3)[y_train]  # one-hot encoding of the 3 flower classes, full training set
test_labels = np.eye(3)[y_test]
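
The one-hot trick above relies on indexing the identity matrix with an array of integer labels. A minimal sketch of how that works, using a small made-up label array (`y_demo` is hypothetical, not part of the dataset):

```python
import numpy as np

# Hypothetical integer labels standing in for y_train (classes 0, 1, 2).
y_demo = np.array([0, 2, 1, 2])

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing np.eye(3) with the label array encodes all labels at once.
one_hot = np.eye(3)[y_demo]
print(one_hot)

# argmax along each row recovers the original integer labels.
recovered = np.argmax(one_hot, axis=1)
print(recovered)
```

The same `argmax` step is what turns softmax outputs back into class indices at prediction time.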


In [223]:
class Neuron():
    # Despite the name, this holds the weights and biases of a whole fully connected layer.
    def __init__(self, in_features, out_features):
        # uniform initialisation in [-1/sqrt(in_features), 1/sqrt(in_features)], as in PyTorch
        bound = (1 / in_features) ** 0.5
        self.weights = np.random.uniform(low=-bound, high=bound, size=(in_features, out_features))
        self.bias = np.random.uniform(low=-bound, high=bound, size=(1, out_features))

In [None]:
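def activation_Relu(X):
    return np.maximum(X, 0)

def Relu_derivative(X):
    # basically kill the gradient where X <= 0
    # returns a new array, so the caller's X is not modified
    return (X > 0).astype(X.dtype)

def Softmax_forward(X):
    # subtract the row maximum before exponentiating, for numerical stability
    X_exp = np.exp(X - np.max(X, axis=1, keepdims=True))
    row_sum = np.sum(X_exp, axis=1, keepdims=True)
    return X_exp / row_sum

def Logloss(X, labels):
    # X is the matrix after softmax, labels are one-hot
    # np.mean averages over every entry (samples * classes), so the value is
    # the per-sample cross-entropy divided by the number of classes
    return -np.mean(labels * np.log(X))

def CrossEntropyBackward(Softmax_output, labels):
    # combining log loss and softmax gives a simple derivative with respect to the softmax input
    # Softmax_output is the output of softmax and labels are the true one-hot labels
    return Softmax_output - labels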
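
The claim that softmax combined with log loss yields the derivative `Softmax_output - labels` can be checked numerically. The sketch below is an illustration only: it uses the cross-entropy summed over samples (not the mean taken by `Logloss`), and `softmax`, `summed_cross_entropy`, `logits` and `labels` are hypothetical names with random stand-in data.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax, row by row.
    z_exp = np.exp(z - np.max(z, axis=1, keepdims=True))
    return z_exp / np.sum(z_exp, axis=1, keepdims=True)

def summed_cross_entropy(z, labels):
    # Cross-entropy summed over samples (assumption: sum, not mean).
    return -np.sum(labels * np.log(softmax(z)))

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 3))                   # hypothetical pre-softmax scores
labels = np.eye(3)[rng.integers(0, 3, size=5)]     # hypothetical one-hot targets

# Analytic gradient with respect to the logits.
analytic = softmax(logits) - labels

# Central finite differences, one logit at a time.
eps = 1e-6
numeric = np.zeros_like(logits)
for i in range(logits.shape[0]):
    for j in range(logits.shape[1]):
        plus, minus = logits.copy(), logits.copy()
        plus[i, j] += eps
        minus[i, j] -= eps
        numeric[i, j] = (summed_cross_entropy(plus, labels)
                         - summed_cross_entropy(minus, labels)) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)
```

If the two gradients agree, `max_diff` should be tiny compared with the gradient entries themselves. With a mean instead of a sum, the analytic gradient would only differ by a constant scaling factor.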




http://neuralnetworksanddeeplearning.com/chap2.html


In [229]:
# following the process described in the article

# initializing neurons for layers
N_1 = Neuron(4,10)
N_2 = Neuron(10,10)
N_3 = Neuron(10,3)
lr = 0.001
for step in range(1000):

    # forward pass
    X_1 = np.dot(X_train,N_1.weights) + N_1.bias
    A_1 = activation_Relu(X_1)

    X_2 = np.dot(A_1,N_2.weights) + N_2.bias
    A_2 = activation_Relu(X_2)

    X_3 = np.dot(A_2,N_3.weights) + N_3.bias
    A_3 = Softmax_forward(X_3)
    loss = Logloss(A_3, train_labels)

    # backward pass
    # note: weight gradients are summed over the samples, bias gradients are averaged

    # output layer: error with respect to X_3, which is what goes into the activation function
    N_3_error = CrossEntropyBackward(A_3, train_labels)
    N_3_weight_grad = np.dot(A_2.T,N_3_error)
    N_3_bias_grad = np.mean(N_3_error,axis=0,keepdims=True)

    # second hidden layer
    N_2_error = np.multiply(np.dot(N_3_error,N_3.weights.T),Relu_derivative(X_2))   # Hadamard product
    N_2_weight_grad = np.dot(A_1.T,N_2_error)
    N_2_bias_grad = np.mean(N_2_error,axis=0,keepdims=True)

    # first hidden layer
    N_1_error = np.multiply(np.dot(N_2_error,N_2.weights.T),Relu_derivative(X_1))
    N_1_weight_grad = np.dot(X_train.T,N_1_error)
    N_1_bias_grad = np.mean(N_1_error,axis=0,keepdims=True)

    # update
    N_1.weights -= lr * N_1_weight_grad
    N_1.bias -= lr * N_1_bias_grad
    N_2.weights -= lr * N_2_weight_grad
    N_2.bias -= lr * N_2_bias_grad
    N_3.weights -= lr * N_3_weight_grad
    N_3.bias -= lr * N_3_bias_grad

    print(f"Step number : {step} and value of loss function = {loss}")






Step number : 0 and value of loss function = 0.5013695615303843
Step number : 1 and value of loss function = 0.36690055921872516
Step number : 2 and value of loss function = 0.3553402446399771
Step number : 3 and value of loss function = 0.35070012067602746
Step number : 4 and value of loss function = 0.3451583558632433
Step number : 5 and value of loss function = 0.34192258201796594
Step number : 6 and value of loss function = 0.3378079659666734
Step number : 7 and value of loss function = 0.33132456796741067
Step number : 8 and value of loss function = 0.3248417698517937
Step number : 9 and value of loss function = 0.31787667299964456
Step number : 10 and value of loss function = 0.3104036845039264
Step number : 11 and value of loss function = 0.30210613869628455
Step number : 12 and value of loss function = 0.29307956732956586
Step number : 13 and value of loss function = 0.2831921313150416
Step number : 14 and value of loss function = 0.272999084980271
Step number : 15 and value of
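
In the training loop the weight gradients are computed with a dot product over all samples, while the bias gradients use `np.mean`. The sketch below checks, on a single softmax layer with random stand-in data, which loss each form matches. All names here (`summed_loss`, `numeric_grad`, `A`, `W`, `b`, `Y`) are hypothetical, and the summed cross-entropy is an assumption made for the check, not the notebook's `Logloss`.

```python
import numpy as np

def softmax(z):
    z_exp = np.exp(z - np.max(z, axis=1, keepdims=True))
    return z_exp / np.sum(z_exp, axis=1, keepdims=True)

def summed_loss(A, W, b, Y):
    # Cross-entropy of one softmax layer, summed over samples.
    return -np.sum(Y * np.log(softmax(A @ W + b)))

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 4))                    # hypothetical layer input
W = rng.normal(size=(4, 3))
b = rng.normal(size=(1, 3))
Y = np.eye(3)[rng.integers(0, 3, size=6)]      # hypothetical one-hot targets

error = softmax(A @ W + b) - Y
grad_W = A.T @ error                                   # same form as N_3_weight_grad
grad_b_sum = np.sum(error, axis=0, keepdims=True)
grad_b_mean = np.mean(error, axis=0, keepdims=True)    # same form as N_3_bias_grad

def numeric_grad(f, param, eps=1e-6):
    # Central differences; param is perturbed in place and then restored.
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + eps
        up = f()
        param[idx] = old - eps
        down = f()
        param[idx] = old
        grad[idx] = (up - down) / (2 * eps)
    return grad

num_W = numeric_grad(lambda: summed_loss(A, W, b, Y), W)
num_b = numeric_grad(lambda: summed_loss(A, W, b, Y), b)

print(np.allclose(grad_W, num_W, atol=1e-5))
print(np.allclose(grad_b_sum, num_b, atol=1e-5))
print(np.allclose(grad_b_mean * len(A), num_b, atol=1e-5))
```

Under these assumptions the dot-product form matches the gradient of the summed loss, and the averaged bias gradient is the same quantity divided by the number of samples. The bias therefore takes effectively smaller steps than the weights for the same `lr`. That is a scaling choice rather than a sign error, but it is worth keeping in mind when tuning the learning rate.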

In [251]:
# now let's go to prediction

def accuracy(y_pred,y_true):
    # both arguments are integer class labels, not one-hot vectors

    return np.sum(np.equal(y_pred,y_true)) / len(y_pred)


In [253]:
def predict(X_test,y_test,N_1,N_2,N_3,accuracy_f = accuracy):
    X_1 = np.dot(X_test,N_1.weights) + N_1.bias
    A_1 = activation_Relu(X_1)

    X_2 = np.dot(A_1,N_2.weights) + N_2.bias
    A_2 = activation_Relu(X_2)

    X_3 = np.dot(A_2,N_3.weights) + N_3.bias
    A_3 = Softmax_forward(X_3)
    preds = np.argmax(A_3,axis = 1)
    acc = accuracy_f(y_pred = preds, y_true = y_test)
    return preds , acc


In [257]:
y_pred, acc  = predict(X_test=X_test,y_test=y_test,N_1=N_1,N_2=N_2,N_3=N_3)

In [258]:
acc

0.9473684210526315
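
A single accuracy figure does not show which classes get confused with each other. One way to look further is a confusion matrix built with plain NumPy. This is a minimal sketch: the `confusion_matrix` helper and the demo arrays are hypothetical and are not the notebook's actual predictions.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    # Rows are true classes, columns are predicted classes.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Hypothetical labels for illustration only.
y_true_demo = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred_demo = np.array([0, 0, 1, 2, 2, 2, 1])

cm = confusion_matrix(y_true_demo, y_pred_demo, n_classes=3)
print(cm)

# The diagonal holds the correct predictions, so accuracy is trace / total.
accuracy_from_cm = np.trace(cm) / cm.sum()
print(accuracy_from_cm)
```

Applied to the notebook's own arrays, the call would presumably be `confusion_matrix(y_test, y_pred, 3)`, with `y_pred` being the first value returned by `predict`.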