<h1><center>Lab 8: ML Overview: Supervised Learning Algorithms</center></h1>
    
<img src="https://files.realpython.com/media/NLP-for-Beginners-Pythons-Natural-Language-Toolkit-NLTK_Watermarked.16a787c1e9c6.jpg" width="400">


```Created by Jinnie Shin (jinnie.shin@coe.ufl.edu)```\
```Date: ```

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQmNf86oJnfhpkPA9LnrFnAbfwF2VywPYpB_w&usqp=CAU" align="left" width="70" height="70" align="left"> 

 ### Required Packages or Dependencies

In [294]:
#!pip install { }  # uncomment in case you run into the `package not available` error
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt


## **REVIEW**: Dataset

> We will use the coh-metrix indices introduced in Week 6, `features.xlsx`

In [295]:
data= pd.read_excel('./data/features.xlsx')

############################### MINI TASKS ####################################
# Q1. The total number of rows?

# Q2. How many coh-metrix features? 
# (excluding, `TextID`, `domain1_score`, `domain2_score`, `essay_id`, and `essay_set`)

###############################################################################

X = data.drop(columns=['TextID','domain1_score', 'domain2_score', 'essay_id', 'essay_set'])
y = data.domain1_score
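
> Both mini tasks can be read off the DataFrame's shape. A minimal sketch, using a small made-up DataFrame in place of `features.xlsx`; the feature columns `feat_a` and `feat_b` are hypothetical placeholders, not real coh-metrix index names:

```python
import pandas as pd

# Toy stand-in for features.xlsx; all values and feature names are invented for illustration.
toy = pd.DataFrame({
    'TextID': ['a', 'b', 'c'],
    'essay_id': [1, 2, 3],
    'essay_set': [1, 1, 1],
    'domain1_score': [8, 9, 7],
    'domain2_score': [3, 4, 2],
    'feat_a': [0.1, 0.2, 0.3],
    'feat_b': [1.5, 2.5, 3.5],
})

id_and_score_cols = ['TextID', 'domain1_score', 'domain2_score', 'essay_id', 'essay_set']

num_rows = toy.shape[0]  # Q1: total number of rows
num_features = toy.drop(columns=id_and_score_cols).shape[1]  # Q2: number of feature columns

print(num_rows, num_features)
```

> The same two expressions should work on `data` once the real file is loaded.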


## 1. Regression and Classification Problems

> Our task is to predict the `domain1_score` column using the given coh-metrix features. 
> We will use two algorithms, linear regression and logistic regression, as our main prediction and classification models. Before we construct them next week, we will look at how the model weights are learned with **gradient descent**. 

### 1.1 Gradient Descent 
<img src="https://miro.medium.com/proxy/1*fBxEzbzP1KkqR7PTexJZdw.png" width="250">

> The objective of the learning algorithm is to find the parameter values (`w` and `b`) that minimize the model's overall loss (here, the squared error loss). \
> Let's solve this regression problem: `y = 4.0+(3.0𝑥0)+(1.0𝑥1)+(3.0𝑥2)+(0.5𝑥3)+(1.5𝑥4)`
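
> To make "overall loss" concrete, here is a minimal sketch of the squared error loss for a linear model, assuming the loss is averaged over the samples. The helper name `mean_squared_error_loss` and the demo arrays are illustrative and not part of the lab code:

```python
import numpy as np

def mean_squared_error_loss(X, Y, b, w):
    """Mean squared error of the linear model b + X @ w on the data (X, Y)."""
    residuals = Y - (b + X @ w)
    return np.mean(residuals ** 2)

# Tiny example: two samples, two features (values chosen for illustration).
X_demo = np.array([[1.0, 2.0], [3.0, 4.0]])
Y_demo = np.array([5.0, 11.0])

loss_perfect = mean_squared_error_loss(X_demo, Y_demo, b=0.0, w=np.array([1.0, 2.0]))
loss_zero_weights = mean_squared_error_loss(X_demo, Y_demo, b=0.0, w=np.zeros(2))
print(loss_perfect, loss_zero_weights)  # 0.0 73.0
```

> Gradient descent repeatedly nudges `b` and `w` in the direction that lowers this number.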

In [387]:
num_samples = 1000  # assumed sample size; the value used for the outputs below is not shown
x0 = 3.0 + np.random.standard_normal(num_samples)
x1 = 1.0 + np.random.standard_normal(num_samples)
x2 = -8.0 + np.random.standard_normal(num_samples)
x3 = -2.0 + np.random.standard_normal(num_samples)
x4 = 0.5 + np.random.standard_normal(num_samples)
y = 4.0 + 3.0 * x0 + 1.0 * x1 + 3.0 * x2 + 0.5 * x3 + 1.5 * x4 + np.random.standard_normal(num_samples)

X = np.column_stack((x0, x1, x2, x3, x4))
Y = y 

#### 1.1.1 Batch Gradient Descent (BGD)
> The partial derivatives of the squared loss with respect to `b` and `w` in linear regression are: 
<img src="https://eli.thegreenplace.net/images/math/aef02f077919896478d0456619f934dcc5809142.png" width="250">
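
> Written out, assuming the loss is the mean squared error over $N$ samples (which matches the `2/num_sample` factor used in the `BGD` function), with $x_{ij}$ denoting feature $j$ of sample $i$:

$$
L(b, w) = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - (b + w \cdot x_i)\bigr)^2
$$

$$
\frac{\partial L}{\partial b} = -\frac{2}{N}\sum_{i=1}^{N}\bigl(y_i - (b + w \cdot x_i)\bigr),
\qquad
\frac{\partial L}{\partial w_j} = -\frac{2}{N}\sum_{i=1}^{N} x_{ij}\,\bigl(y_i - (b + w \cdot x_i)\bigr)
$$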


In [388]:
def BGD(X, Y, b, w, alpha=0.005): # alpha is a learning rate, we will set it as 0.005 for now
   
    num_feat = X.shape[1]
    
    num_sample = X.shape[0] # This indicates the total number of data points (rows)

    b_grad = 0 # gradient with respect to the intercept b
    
    w_grad = np.zeros(num_feat) # gradient with respect to the weight vector w
    
    for i in range(num_sample): # BGD accumulates `b_grad` and `w_grad`
                                # over all N samples before updating b and w
        y = Y[i] # one sample, y
        x = X[i] # one sample, x 
        b_grad += -(2./float(num_sample)) * (y - (b + w.dot(x)))

        for j in range(num_feat):
            x_ij = x[j]
            w_grad[j] += -(2./float(num_sample)) * x_ij * (y - (b + w.dot(x)))

    b_new = b - alpha * b_grad
    w_new = np.array([w[i] - alpha * w_grad[i] for i in range(num_feat)])
    return b_new, w_new
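
> The explicit loops make each term of the gradient visible, but the same update can be written with NumPy array operations. A minimal sketch, assuming the same mean squared loss; `bgd_step_vectorized` and the demo arrays are illustrative names, not part of the lab code:

```python
import numpy as np

def bgd_step_vectorized(X, Y, b, w, alpha=0.005):
    """One batch gradient descent update for linear regression with squared loss."""
    num_sample = X.shape[0]
    residuals = Y - (b + X @ w)                        # shape (N,)
    b_grad = -(2.0 / num_sample) * residuals.sum()
    w_grad = -(2.0 / num_sample) * (X.T @ residuals)   # shape (num_feat,)
    return b - alpha * b_grad, w - alpha * w_grad

# Small check on made-up data, starting from b = 0 and w = 0.
X_demo = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Y_demo = np.array([1.0, 2.0, 3.0])
b_demo, w_demo = bgd_step_vectorized(X_demo, Y_demo, b=0.0, w=np.zeros(2))
print(b_demo, w_demo)
```

> On the same inputs this should produce the same `b_new` and `w_new` as the loop version, up to floating-point rounding, and it is usually much faster on large arrays.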

In [389]:
def BGD_train(X, Y, alpha=0.005):
    b = 0
    w = np.zeros(X.shape[1])
    print('===== Start Training ====')
    for i in range(10000):
        b_new, w_new = BGD(X, Y, b, w, alpha=alpha)
        b = b_new
        w = w_new
        if i % 1000 == 0:
            print('{}: b = {}, w = {}'.format(i, np.round(b_new, 2), np.round(w_new, 2)))

    print('final: b = {}, w = {}'.format(np.round(b, 2), np.round(w, 2)))
    return b, w

> *Let's explore!*

In [390]:
BGD_train(X, Y)

===== Start Training ====
0: b = -0.1, w = [-0.27 -0.11  0.87  0.22 -0.04]
1000: b = 0.29, w = [3.21 1.12 2.67 0.33 1.5 ]
2000: b = 0.69, w = [3.19 1.11 2.71 0.34 1.5 ]
3000: b = 1.05, w = [3.18 1.1  2.75 0.36 1.51]
4000: b = 1.37, w = [3.17 1.09 2.78 0.37 1.51]
5000: b = 1.67, w = [3.16 1.08 2.81 0.38 1.52]
6000: b = 1.93, w = [3.15 1.08 2.84 0.39 1.52]
7000: b = 2.18, w = [3.14 1.07 2.86 0.4  1.52]
8000: b = 2.39, w = [3.14 1.06 2.88 0.4  1.52]
9000: b = 2.59, w = [3.13 1.06 2.9  0.41 1.53]
final: b = 2.77, w = [3.13 1.05 2.92 0.42 1.53]


(2.766278150852741,
 array([3.12549199, 1.05189554, 2.91675839, 0.41842552, 1.52997421]))

#### 1.1.2 Stochastic Gradient Descent (SGD)
> SGD shuffles the data and then updates `b` and `w` using the gradient from one data point at a time, instead of the whole sample.

In [391]:
def SGD(x, y, b, w, num_feat, num_sample, alpha=0.005):
    
    b_grad = -(2./float(num_sample)) * (y - (b + w.dot(x)))
    w_grad = np.zeros(num_feat)
    
    for i in range(num_feat):
        w_grad[i] += -(2./float(num_sample)) * x[i] * (y - (b + w.dot(x)))

    b_new = b - alpha * b_grad
    w_new = np.array([w[i] - alpha * w_grad[i] for i in range(num_feat)])
    return b_new, w_new
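
> One full pass over the shuffled data is often called an epoch. A minimal sketch of a single epoch, keeping the same `2/num_sample` scaling as the `SGD` function; the name `sgd_epoch`, the use of `np.random.default_rng`, and the synthetic data are assumptions made for illustration:

```python
import numpy as np

def sgd_epoch(X, Y, b, w, alpha=0.005, rng=None):
    """One pass over the data in shuffled order, updating b and w after every sample."""
    rng = np.random.default_rng() if rng is None else rng
    num_sample = X.shape[0]
    for j in rng.permutation(num_sample):
        residual = Y[j] - (b + w.dot(X[j]))  # error on this one sample, using current b and w
        # Step against the single-sample gradient (same 2/num_sample scaling as SGD).
        b = b + alpha * (2.0 / num_sample) * residual
        w = w + alpha * (2.0 / num_sample) * residual * X[j]
    return b, w

# Illustrative run on small noise-free synthetic data (sizes and seed are arbitrary).
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((50, 2))
Y_demo = 1.0 + X_demo @ np.array([2.0, -1.0])

b_demo, w_demo = 0.0, np.zeros(2)
loss_before = np.mean((Y_demo - (b_demo + X_demo @ w_demo)) ** 2)
for _ in range(200):
    b_demo, w_demo = sgd_epoch(X_demo, Y_demo, b_demo, w_demo, alpha=0.05, rng=rng)
loss_after = np.mean((Y_demo - (b_demo + X_demo @ w_demo)) ** 2)
print(loss_before, loss_after)
```

> Because each update uses only one sample, the path to the minimum is noisier than with BGD, but every individual step is much cheaper.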

In [392]:
def SGD_train(X, Y, alpha =0.005):
    
    import random 

    b = 0
    w = np.zeros(X.shape[1])

    num_sample = X.shape[0] 
    num_feat = X.shape[1]

    for i in range(5000): # each iteration is one pass (epoch) over the data
        indices = list(range(num_sample))
        random.shuffle(indices) # visit the samples in a new random order

        for j in indices: # update b and w after every single sample
            b_new, w_new = SGD(X[j], Y[j], b, w, num_feat, num_sample,  alpha=alpha)
            b = b_new
            w = w_new

        if i % 1000 == 0:
            print('{}: b = {}, w = {}'.format(i, np.round(b_new, 2), np.round(w_new, 2)))

    print('final: b = {}, w = {}'.format(np.round(b,2), np.round(w, 2)))
    return b, w
    

> *Let's explore!*

<img src="https://i.pinimg.com/736x/2e/aa/7d/2eaa7d5021ca7c3c98bc93b98b9646fe.jpg" align="left" width="70" height="70" align="left">

 ## Task 1: Training & Testing data
>  Q1. To analyze large datasets efficiently, we will use the package `scikit-learn` to implement regression models. 
>> **Step 1**: Install the package with `!pip install scikit-learn` \
>> **Step 2**: Import the model class: `from sklearn.linear_model import LinearRegression`\
>> **Step 3**: Create a model instance: `lr = LinearRegression()` \
>> **Step 4**: Fit the dataset using `lr.fit({input}, {output})` and check the intercept and the coefficients using `lr.intercept_` and `lr.coef_`

> More information about the package is available at: https://scikit-learn.org/stable/modules/linear_model.html#ordinary-least-squares

> Q2. Compare the intercept and coefficients with the values we learned with gradient descent above. 

In [393]:
################################### YOUR CODE HERE #############################
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X, y)
print(lr.intercept_)
print(lr.coef_)

###############################################################################

4.434067716690402
[3.07146115 1.00311739 3.08024028 0.47881489 1.55235906]
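
> `LinearRegression` fits ordinary least squares, so it solves for the minimizing weights directly instead of iterating. The BGD run above was still moving toward that solution when it stopped (its intercept was still rising at iteration 9000), which helps explain the gap between the two intercepts. As a rough illustration, the least-squares solution can also be computed with `np.linalg.lstsq`; the synthetic data and variable names here are assumptions for the sketch:

```python
import numpy as np

# Small noise-free synthetic problem so the exact answer is known (values are illustrative).
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((100, 3))
true_b, true_w = 4.0, np.array([3.0, 1.0, 0.5])
Y_demo = true_b + X_demo @ true_w

# Prepend a column of ones so the intercept is estimated alongside the weights.
X_design = np.column_stack((np.ones(X_demo.shape[0]), X_demo))
solution, *_ = np.linalg.lstsq(X_design, Y_demo, rcond=None)
b_hat, w_hat = solution[0], solution[1:]
print(np.round(b_hat, 2), np.round(w_hat, 2))
```

> With noisy data, as in this lab, the estimates land near the true parameters rather than exactly on them.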
