## Exploring Gradient Descent, Softmax Regression and Regularization Techniques using IRIS Data Set

The Iris dataset is a small dataset containing the `sepal` and `petal` dimensions (length and width, in cm) of 150 flowers from 3 species of iris.

I'll be using this dataset to test out two algorithms for **classification**, i.e. `Logistic Regression` and `Softmax Regression`.

I'll be exploring 3 versions of **Gradient Descent**, i.e. `Batch`, `Stochastic` and `Mini Batch`.

I'll also be experimenting with a few techniques for **regularization**, i.e. `Ridge`, `Lasso` and `Elastic Net`. In addition, I'll implement `Early Stopping`, which halts the training process as soon as the generalization error (estimated on the validation set) starts increasing.
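
As a quick reference, the three penalty terms differ only in which norm of the weights they add to the loss. The following is a minimal sketch, assuming the bias is stored in `theta[0]` and left unpenalized. The names `alpha` and `l1_ratio` are illustrative, and the Elastic Net mix shown is one common convention, not necessarily the one used later in this notebook.

```python
import numpy as np

def ridge_penalty(theta, alpha):
    """L2 penalty: alpha times the sum of squared weights (bias excluded)."""
    w = theta[1:]
    return alpha * np.sum(w ** 2)

def lasso_penalty(theta, alpha):
    """L1 penalty: alpha times the sum of absolute weights (bias excluded)."""
    w = theta[1:]
    return alpha * np.sum(np.abs(w))

def elastic_net_penalty(theta, alpha, l1_ratio):
    """Weighted mix of the L1 and L2 penalties; l1_ratio=1 is pure Lasso."""
    return (l1_ratio * lasso_penalty(theta, alpha)
            + (1 - l1_ratio) * ridge_penalty(theta, alpha))

# Toy parameter vector: bias 0.5, weights 1.0 and -2.0
theta = np.array([0.5, 1.0, -2.0])
print(ridge_penalty(theta, alpha=0.5))
print(lasso_penalty(theta, alpha=0.5))
print(elastic_net_penalty(theta, alpha=0.5, l1_ratio=0.5))
```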

<br>

Given how much is on the agenda, the table of contents below should be helpful.



**Table of Contents**
1. Exploring IRIS Data
2. Preparing the Data for Logistic *(k=2)* and Multiclass *(k=3)* use cases
3. (2 Class) Logistic Regression with three variants of Gradient Descent - Batch, Stochastic and Mini-Batch


<br>

## Exploring Data



In [5]:
from sklearn import datasets
iris = datasets.load_iris()

list(iris.keys())

['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename']

In [7]:
print(iris.DESCR)

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

                    Min  Max   Mean    SD   Class Correlation
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :

In [18]:
X = iris["data"]
type(X)

print(X.shape,"\n")

print(X[0:5,:])

(150, 4) 

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]


In [31]:
y = iris["target"]
type(y)

print(y.shape,"\n")

print(y[0:5])

y = y.reshape((150,1))

print(y.shape)

(150,) 

[0 0 0 0 0]
(150, 1)


## Preparing Data for k=2 and k=3 use cases

In [36]:
import numpy as np
iris = np.hstack([X,y])

print(iris[48:52,:])

# splitting data into training and validation sets
from sklearn.model_selection import train_test_split

train, val = train_test_split(iris,test_size = 0.2, random_state = 42)

print("training set: ",train.shape)
print('\n')
print("validation set: ",val.shape)

[[5.3 3.7 1.5 0.2 0. ]
 [5.  3.3 1.4 0.2 0. ]
 [7.  3.2 4.7 1.4 1. ]
 [6.4 3.2 4.5 1.5 1. ]]
training set:  (120, 5)


validation set:  (30, 5)


In [44]:
# quickly checking if the training set is representative across the three classes
import pandas as pd
train_df = pd.DataFrame(train, columns = ['a','b','c','d','class'])

pd.pivot_table(train_df, index = ['class'],aggfunc=len, margins = True )  #Looks good, lets move on.

| class | a | b | c | d |
|-------|-------|-------|-------|-------|
| 0.0 | 40.0 | 40.0 | 40.0 | 40.0 |
| 1.0 | 41.0 | 41.0 | 41.0 | 41.0 |
| 2.0 | 39.0 | 39.0 | 39.0 | 39.0 |
| All | 120.0 | 120.0 | 120.0 | 120.0 |


In [60]:
X_train = train[:,0:4]
y_train_3cls = np.array(train[:,4],dtype = int)

X_val = val[:,0:4]
y_val_3cls = np.array(val[:,4],dtype = int)


print("sample data: ",y_train_3cls[48:52])
print("sample data: ",X_train[48:52,:])


print("X_train Shape: ",X_train.shape)
print("Y_train Shape: ",y_train_3cls.shape)


y_train_2cls = (y_train_3cls == 2)
y_val_2cls = (y_val_3cls == 2)


sample data:  [0 1 2 0]
sample data:  [[5.4 3.9 1.7 0.4]
 [5.  2.3 3.3 1. ]
 [6.4 2.7 5.3 1.9]
 [5.  3.3 1.4 0.2]]
X_train Shape:  (120, 4)
Y_train Shape:  (120,)


In [78]:
sample = np.ones((5,1),dtype = int)
sample

array([[1],
       [1],
       [1],
       [1],
       [1]])

#### Quick Summary

We have created the training and validation datasets: `X_train` and `X_val`. There are 2 variants of the labels, one for the 2-class implementation of IRIS (`y_train_2cls`, `y_val_2cls`) and the other for the 3-class implementation (`y_train_3cls`, `y_val_3cls`).

## (2 Class) Logistic Regression with three variants of Gradient Descent - Batch, Stochastic and Mini-Batch
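
The three variants share the same update rule and differ only in how many training rows feed each gradient estimate. Here is a rough sketch of that difference, using a hypothetical helper `batch_indices` (not part of this notebook) and a made-up mini-batch size of 16. Shuffling once per epoch is one common convention; stochastic gradient descent is also often implemented by sampling rows at random with replacement.

```python
import numpy as np

def batch_indices(m, batch_size, rng):
    """Yield index arrays covering m shuffled rows in chunks of batch_size."""
    order = rng.permutation(m)
    for start in range(0, m, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(42)
m = 120  # number of rows in the training set

# Number of parameter updates per pass (epoch) over the training data
n_batch = len(list(batch_indices(m, m, rng)))   # Batch: all rows at once
n_mini = len(list(batch_indices(m, 16, rng)))   # Mini-Batch: 16 rows at a time
n_sgd = len(list(batch_indices(m, 1, rng)))     # Stochastic: one row at a time

print(n_batch, n_mini, n_sgd)  # 1 8 120
```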

<br>

#### Creating Pipeline for Automating Data Prep

In [81]:

# a simple function to prepend a column of ones, so that the first parameter in theta acts as the bias (intercept) term
def one_adder(an_array):
    m = an_array.shape[0]
    pad_one = np.ones((m,1),dtype = int)
    an_array = np.hstack([pad_one,an_array])
    return an_array

#pipeline to automate data prep
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import FunctionTransformer


dataprep_2cls = Pipeline([
        ("std_scaler", StandardScaler()),
        ("one_adder", FunctionTransformer(one_adder)),
    ])



#### Implementing and Testing Pipeline

In [82]:
print(X_train.shape)
print(X_train[0:5,:])

print("\n")

X_train_t = dataprep_2cls.fit_transform(X_train) 

print(X_train_t.shape)
print(X_train_t[0:5,:])


(120, 4)
[[4.6 3.6 1.  0.2]
 [5.7 4.4 1.5 0.4]
 [6.7 3.1 4.4 1.4]
 [4.8 3.4 1.6 0.2]
 [4.4 3.2 1.3 0.2]]


(120, 5)
[[ 1.         -1.47393679  1.20365799 -1.56253475 -1.31260282]
 [ 1.         -0.13307079  2.99237573 -1.27600637 -1.04563275]
 [ 1.          1.08589829  0.08570939  0.38585821  0.28921757]
 [ 1.         -1.23014297  0.75647855 -1.2187007  -1.31260282]
 [ 1.         -1.7177306   0.30929911 -1.39061772 -1.31260282]]
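
When the validation set is prepared later, the fitted pipeline should be reused with `transform` rather than `fit_transform`, so that the scaling statistics come from the training data only. The sketch below rebuilds the split and the pipeline from the cells above so that it runs on its own. The name `X_val_t` is an assumption.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

def one_adder(an_array):
    """Prepend a column of ones (the bias input) to a 2-D array."""
    m = an_array.shape[0]
    return np.hstack([np.ones((m, 1)), an_array])

X, y = datasets.load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

dataprep_2cls = Pipeline([
    ("std_scaler", StandardScaler()),
    ("one_adder", FunctionTransformer(one_adder)),
])

# Fit the scaler on the training data only ...
X_train_t = dataprep_2cls.fit_transform(X_train)
# ... and reuse the learned mean and standard deviation on the validation data
X_val_t = dataprep_2cls.transform(X_val)

print(X_train_t.shape, X_val_t.shape)
```

After scaling, the training columns have zero mean by construction. The validation columns will generally be close to zero mean but not exactly so.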




#### Implement Batch Gradient Descent
Measure accuracy via `MSE` on the training and validation sets

In [None]:
theta_BGD = np.random.randn(5,1)
eta = 0.00001
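
The cell above stops after initializing `theta_BGD` and `eta`. One way the training loop could continue is sketched below: plain batch gradient descent on the log-loss gradient of logistic regression, with `MSE` between predicted probabilities and labels reported on both sets. The learning rate `eta = 0.1`, the iteration count, and the helper names `sigmoid` and `add_ones` are assumptions chosen so the sketch converges quickly. They are not values taken from this notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def sigmoid(z):
    """Logistic function mapping scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def add_ones(a):
    """Prepend a column of ones so theta[0] acts as the bias."""
    return np.hstack([np.ones((a.shape[0], 1)), a])

# Rebuild the 2-class data (class 2 vs. the rest) so the sketch is self-contained
X, y = datasets.load_iris(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_tr)
X_tr_t = add_ones(scaler.transform(X_tr))
X_va_t = add_ones(scaler.transform(X_va))
y_tr_2 = (y_tr == 2).astype(float).reshape(-1, 1)
y_va_2 = (y_va == 2).astype(float).reshape(-1, 1)

rng = np.random.default_rng(42)
theta_BGD = rng.standard_normal((5, 1))  # random initialization
eta = 0.1             # assumed learning rate
n_iterations = 2000   # assumed number of full-batch updates
m = X_tr_t.shape[0]

for _ in range(n_iterations):
    p = sigmoid(X_tr_t @ theta_BGD)           # predicted probabilities, shape (m, 1)
    gradient = X_tr_t.T @ (p - y_tr_2) / m    # log-loss gradient over the full batch
    theta_BGD -= eta * gradient

mse_train = np.mean((sigmoid(X_tr_t @ theta_BGD) - y_tr_2) ** 2)
mse_val = np.mean((sigmoid(X_va_t @ theta_BGD) - y_va_2) ** 2)
acc_val = np.mean((sigmoid(X_va_t @ theta_BGD) >= 0.5) == (y_va_2 == 1))

print("train MSE:", mse_train)
print("validation MSE:", mse_val)
print("validation accuracy:", acc_val)
```

Every update here uses all 120 training rows, which is what makes this the `Batch` variant.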