# Introduction
In this notebook I implement Batch Gradient Descent with early stopping for a Softmax Regression model <b>without Scikit-Learn</b> (or at least try to :D).

### Dataset
I will use the Iris dataset from `sklearn.datasets`, keeping only the petal length and petal width features.

In [41]:
from sklearn import datasets

iris = datasets.load_iris()
X = iris["data"][:, 2:]
y = iris["target"]


The samples in this dataset are sorted by class label. If we were using SGD we would need to shuffle them first, but since BGD computes each gradient over the whole training set, the order does not matter and the data can stay as it is.
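For completeness, here is a minimal sketch of how the shuffling could be done if it were needed. `X_demo` and `y_demo` are small made-up stand-ins (not the Iris arrays), and the seed is arbitrary. The key point is that one permutation of the indices is applied to both arrays so every row keeps its label.

```python
import numpy as np

# Hypothetical stand-in data: 6 samples, 2 features, labels sorted by class
X_demo = np.arange(12).reshape(6, 2)
y_demo = np.array([0, 0, 1, 1, 2, 2])

# Draw one random ordering of the row indices (seed chosen arbitrarily)
rng = np.random.default_rng(42)
shuffled_idx = rng.permutation(len(X_demo))

# Apply the same ordering to features and labels so the pairs stay aligned
X_shuffled = X_demo[shuffled_idx]
y_shuffled = y_demo[shuffled_idx]
```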

### Softmax regression using BGD
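
For reference, these are the standard Softmax Regression formulas that the code in this section is meant to follow. The notation is my own assumption: $\theta^{(k)}$ is the parameter vector of class $k$, $\mathbf{x}^{(i)}$ is the $i$-th sample including the bias term, $y_k^{(i)}$ is 1 if sample $i$ belongs to class $k$ and 0 otherwise, $m$ is the number of samples and $\eta$ is the learning rate (`eta`).

$$
s_k(\mathbf{x}) = \mathbf{x}^T \theta^{(k)}
\qquad
\hat{p}_k = \frac{\exp\left(s_k(\mathbf{x})\right)}{\sum_{j=1}^{3} \exp\left(s_j(\mathbf{x})\right)}
$$

$$
\nabla_{\theta^{(k)}} J = \frac{1}{m} \sum_{i=1}^{m} \left(\hat{p}_k^{(i)} - y_k^{(i)}\right) \mathbf{x}^{(i)}
\qquad
\theta^{(k)} \leftarrow \theta^{(k)} - \eta \, \nabla_{\theta^{(k)}} J
$$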

In [53]:
import numpy as np

X_with_bias = np.c_[np.ones([len(X), 1]), X]
np.random.seed(2042)

In [None]:
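# One parameter vector per class, each with 3 entries (bias + 2 features)
theta1 = np.random.randn(3, 1)
theta2 = np.random.randn(3, 1)
theta3 = np.random.randn(3, 1)

m = X_with_bias.shape[0]
epochs = 100
eta = 0.1

# One-hot targets as (m, 1) columns: yk is 1 where the sample belongs to class k
yk1 = (y == 0).astype(float).reshape(-1, 1)
yk2 = (y == 1).astype(float).reshape(-1, 1)
yk3 = (y == 2).astype(float).reshape(-1, 1)

for epoch in range(epochs):
    # Class scores for all samples at once, each of shape (m, 1)
    s1 = X_with_bias.dot(theta1)
    s2 = X_with_bias.dot(theta2)
    s3 = X_with_bias.dot(theta3)

    s = np.c_[s1, s2, s3]  # shape (m, 3)

    # Softmax: normalise the exponentiated scores row by row
    exp_sum = np.exp(s).sum(axis=1, keepdims=True)
    p1 = np.exp(s1) / exp_sum
    p2 = np.exp(s2) / exp_sum
    p3 = np.exp(s3) / exp_sum

    # Predicted class per sample (not needed for the update itself)
    y_up = np.argmax(s, axis=1)

    # Batch gradients: average of (p_k - y_k) * x over the whole training set
    grad1 = 1/m * X_with_bias.T.dot(p1 - yk1)
    grad2 = 1/m * X_with_bias.T.dot(p2 - yk2)
    grad3 = 1/m * X_with_bias.T.dot(p3 - yk3)

    # One update per epoch, which is what makes this batch gradient descent
    theta1 = theta1 - eta*grad1
    theta2 = theta2 - eta*grad2
    theta3 = theta3 - eta*grad3

print(eta*grad1)
print(theta2)
print(theta3)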
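
The loop above runs for a fixed number of updates. The early stopping mentioned in the introduction could be added roughly as sketched below. This is only one possible version under my own assumptions: the helper names (`softmax`, `cross_entropy`, `fit_softmax_bgd`), the `patience` rule and the synthetic three-blob data are all hypothetical and are not taken from the cells above. The idea is to hold out a validation set, track its cross-entropy loss after every epoch, keep the best parameters seen so far, and stop once the loss has not improved for `patience` consecutive epochs.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

def cross_entropy(probs, one_hot, eps=1e-7):
    """Mean cross-entropy; eps guards against log(0)."""
    return -np.mean(np.sum(one_hot * np.log(probs + eps), axis=1))

def fit_softmax_bgd(X_train, y_train, X_val, y_val, n_classes,
                    eta=0.1, max_epochs=2000, patience=20, seed=2042):
    """Batch gradient descent with patience-based early stopping (a sketch)."""
    rng = np.random.default_rng(seed)
    Y_train = np.eye(n_classes)[y_train]   # one-hot targets, shape (m, K)
    Y_val = np.eye(n_classes)[y_val]
    theta = rng.standard_normal((X_train.shape[1], n_classes))
    m = len(X_train)

    best_loss, best_theta, epochs_without_gain = np.inf, theta.copy(), 0
    for epoch in range(max_epochs):
        # Full-batch gradient of the cross-entropy loss
        probs = softmax(X_train @ theta)
        theta = theta - eta * (X_train.T @ (probs - Y_train)) / m

        # Monitor the loss on data the model is not trained on
        val_loss = cross_entropy(softmax(X_val @ theta), Y_val)
        if val_loss < best_loss:
            best_loss, best_theta, epochs_without_gain = val_loss, theta.copy(), 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                break  # no improvement for `patience` epochs in a row
    return best_theta, best_loss, epoch + 1

# Synthetic stand-in data: three 2-D Gaussian blobs, 50 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 0.0]])
labels = np.repeat(np.arange(3), 50)
features = centers[labels] + 0.5 * rng.standard_normal((150, 2))
data = np.c_[np.ones(len(features)), features]  # prepend the bias column

# Shuffle before splitting, because the labels are sorted by class
order = rng.permutation(len(data))
train_idx, val_idx = order[:120], order[120:]

theta, val_loss, epochs_run = fit_softmax_bgd(
    data[train_idx], labels[train_idx],
    data[val_idx], labels[val_idx], n_classes=3)

val_accuracy = np.mean(np.argmax(data[val_idx] @ theta, axis=1) == labels[val_idx])
print(epochs_run, val_loss, val_accuracy)
```

Note that the split has to be shuffled even though the gradient steps themselves are full-batch: with sorted labels, an unshuffled split would leave the validation set with a single class. The same function could be called with `X_with_bias` and `y` once they are split into training and validation parts.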
    
    
    