<h1><center>Lab 3: ML Overview: Supervised Learning Algorithms</center></h1>
    
<img src="https://i.pinimg.com/736x/02/66/bd/0266bd22348df67f4cf04ab83bf38e5a.jpg" width="600">


```Created by Jinnie Shin (jinnie.shin@coe.ufl.edu)```\
```Date: May 19th 2022```

```Image source: https://i.pinimg.com/736x/02/66/bd/0266bd22348df67f4cf04ab83bf38e5a.jpg```

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQmNf86oJnfhpkPA9LnrFnAbfwF2VywPYpB_w&usqp=CAU" align="left" width="70" height="70" align="left"> 

 ### Required Packages or Dependencies
 We learned how to install packages in Python in our last lecture, e.g., `pip install numpy`

In [12]:
#!pip install { }  # in case you run into the `package not available` error
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt ### ----- pip install matplotlib

## Data

The data consist of labeled chats from an online collaborative problem-solving study (Hao et al., 2015). Each turn of the chats has been coded into **one of four categories** of collaborative problem-solving skills (Link: https://repository.isls.org/bitstream/1/462/1/297.pdf)
- *Macro-level evidence* - infer: Collaborative problem-solving skills

#### Goal
- To create an automated classifier using machine learning methods to label the chats automatically



In [62]:
real_X=pd.read_csv('./week3_data/chat_bigram_feature.csv.gz',compression='gzip').values
real_y=pd.read_csv('./week3_data/chat_label.csv').values.ravel()

In [63]:
real_X.shape # Let's take a look at the shape of the input data together 

(11113, 3222)

- Each turn of the chats has already been converted into a numerical vector using an NLP vectorization approach (we will cover this in our NLP class :)) 
- So each variable (or feature in CP) now represents a uni-gram or a bi-gram token (e.g., "I", "I-HAVE")
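
As a rough illustration of what such a vectorization might look like, the sketch below counts uni-grams and bi-grams in two made-up chat turns. The example sentences, the `ngram_tokens` helper, and the whitespace tokenizer are assumptions for illustration only; the features in `chat_bigram_feature.csv.gz` may have been built differently.

```python
import numpy as np

def ngram_tokens(text):
    """Return uni-gram and bi-gram tokens of a whitespace-tokenized text (hypothetical helper)."""
    words = text.upper().split()
    bigrams = ["-".join(pair) for pair in zip(words, words[1:])]
    return words + bigrams

# Two made-up chat turns (not taken from the real data)
chats = ["I have the answer", "I think so"]

# Build the vocabulary: one column per distinct token
vocab = sorted({tok for chat in chats for tok in ngram_tokens(chat)})
col = {tok: j for j, tok in enumerate(vocab)}

# Count how often each token occurs in each chat turn
toy_X = np.zeros((len(chats), len(vocab)), dtype=int)
for i, chat in enumerate(chats):
    for tok in ngram_tokens(chat):
        toy_X[i, col[tok]] += 1

print(vocab)
print(toy_X.shape)  # (number of chat turns, number of distinct tokens)
```

Each row is one chat turn and each column is one token, which is the same layout as `real_X`, only much smaller.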

In [65]:
real_y # Let's take a look at the output data together 

array([4, 4, 4, ..., 3, 3, 4])

- As explained earlier, the output variable is the hand-coded category of each chat turn: one of the four collaborative problem-solving skills

## 1. Regression OR Classification Problems?

> Our task is to predict the `output categories --y`  using the given set of features `X`. 

### Problem setting 1
> Let's assume that the complexity of the skill dimension is the **lowest in category 1** and the **highest in category 4**

$\rightarrow$ Investigate whether the chat can be used to automatically predict the complexity of the problem-solving skills

> We will implement and use a linear regression model as our main prediction model. We will take a look at how the model weights are learned using **the gradient descent algorithm**. 

### 1.1 Gradient Descent 
<img src="https://miro.medium.com/proxy/1*fBxEzbzP1KkqR7PTexJZdw.png" width="250">

> The objective of the learning algorithm is to determine the best possible values for the parameters (`w` and `b`), such that the overall loss (squared error loss) of the model is minimized as much as possible. \
> Let's solve this regression problem: $y = 4.0 + 3.0x_0 + 1.0x_1 + 3.0x_2 + 0.5x_3 + 1.5x_4$

In [35]:
num_samples = 200

x0 = 3.0 + np.random.standard_normal(num_samples)
x1 = 1.0 + np.random.standard_normal(num_samples)
x2 = -8.0 + np.random.standard_normal(num_samples)
x3 = -2.0 + np.random.standard_normal(num_samples)
x4 = 0.5 + np.random.standard_normal(num_samples)
y = 4.0 + 3.0 * x0 + 1.0 * x1 + 3.0 * x2 + 0.5 * x3 + 1.5 * x4 + np.random.standard_normal(num_samples)

X = np.column_stack((x0, x1, x2, x3, x4))
Y = y 

#### 1.1.1 Batch Gradient Descent (BGD)
> The partial derivatives of the squared loss with respect to `b` and `w` in linear regression are: 
<img src="https://eli.thegreenplace.net/images/math/aef02f077919896478d0456619f934dcc5809142.png" width="250">
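
Written out, and assuming the loss is the squared error averaged over the $N$ samples (which is what the `2/N` factor in the `BGD` code suggests), the quantities involved are:

$$L(b, w) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - (b + w^\top x_i)\right)^2$$

$$\frac{\partial L}{\partial b} = -\frac{2}{N}\sum_{i=1}^{N}\left(y_i - (b + w^\top x_i)\right)$$

$$\frac{\partial L}{\partial w_j} = -\frac{2}{N}\sum_{i=1}^{N} x_{ij}\left(y_i - (b + w^\top x_i)\right)$$

Each gradient descent step then moves the parameters against the gradient with learning rate $\alpha$: $b \leftarrow b - \alpha \, \frac{\partial L}{\partial b}$ and $w_j \leftarrow w_j - \alpha \, \frac{\partial L}{\partial w_j}$.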


In [36]:
def BGD(X, Y, b, w, alpha=0.005): # alpha is a learning rate, we will set it as 0.005 for now
   
    num_feat = X.shape[1]
    
    num_sample = X.shape[0] # This indicates the total number of data points (rows)

    b_grad = 0 # gradient of the loss with respect to the intercept b
    
    w_grad = np.zeros(num_feat) # gradient of the loss with respect to each weight
    
    for i in range(num_sample): # BGD accumulates `b_grad` and `w_grad`
                                # over the total sample N before updating
        y = Y[i] # one sample, y
        x = X[i] # one sample, x 
        b_grad += -(2./float(num_sample)) * (y - (b + w.dot(x)))

        for j in range(num_feat):
            x_ij = x[j]
            w_grad[j] += -(2./float(num_sample)) * x_ij * (y - (b + w.dot(x)))

    b_new = b - alpha * b_grad
    w_new = np.array([w[i] - alpha * w_grad[i] for i in range(num_feat)])
    return b_new, w_new
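
The double loop in `BGD` can also be written with NumPy matrix operations. The sketch below is one possible vectorized version; the name `bgd_step_vectorized` and the tiny example data are made up for illustration. It should produce the same update as the loop version up to floating-point rounding.

```python
import numpy as np

def bgd_step_vectorized(X, Y, b, w, alpha=0.005):
    """One batch gradient descent step for linear regression with mean squared error."""
    n = X.shape[0]
    residual = Y - (b + X @ w)               # y_i - (b + w.x_i) for every sample, shape (n,)
    b_grad = -(2.0 / n) * residual.sum()     # gradient with respect to the intercept
    w_grad = -(2.0 / n) * (X.T @ residual)   # gradient with respect to each weight
    return b - alpha * b_grad, w - alpha * w_grad

# Tiny made-up example: 3 samples, 2 features
X_toy = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Y_toy = np.array([1.0, 2.0, 3.0])
b1, w1 = bgd_step_vectorized(X_toy, Y_toy, b=0.0, w=np.zeros(2), alpha=0.1)
print(b1, w1)
```

Avoiding the Python-level loops usually matters once the number of samples or features grows, as it does for `real_X`.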

In [58]:
def BGD_train(X, Y, alpha=0.005):
    b = 0
    w = np.zeros(X.shape[1])
    print('===== Start Training ====')
    for i in range(100000):
        b_new, w_new = BGD(X, Y, b, w, alpha=alpha)
        b = b_new
        w = w_new
        if i % 5000 == 0:
            print('{}: b = {}, w = {}'.format(i, np.round(b_new, 2), np.round(w_new, 2)))

    print('final: b = {}, w = {}'.format(np.round(b, 2), np.round(w, 2)))
    return b, w

> *Let's explore!*

In [59]:
BGD_train(X, Y)

===== Start Training ====
0: b = -0.1, w = [-0.28 -0.09  0.85  0.2  -0.04]
5000: b = 1.57, w = [3.14 0.93 2.78 0.42 1.51]
10000: b = 2.57, w = [3.1  0.92 2.88 0.44 1.49]
15000: b = 3.17, w = [3.08 0.9  2.94 0.46 1.48]
20000: b = 3.52, w = [3.07 0.9  2.97 0.47 1.48]
25000: b = 3.72, w = [3.07 0.89 2.99 0.47 1.48]
30000: b = 3.85, w = [3.06 0.89 3.   0.47 1.47]
35000: b = 3.92, w = [3.06 0.89 3.01 0.48 1.47]
40000: b = 3.96, w = [3.06 0.89 3.02 0.48 1.47]
45000: b = 3.99, w = [3.06 0.89 3.02 0.48 1.47]
50000: b = 4.0, w = [3.06 0.89 3.02 0.48 1.47]
55000: b = 4.01, w = [3.06 0.89 3.02 0.48 1.47]
60000: b = 4.02, w = [3.06 0.89 3.02 0.48 1.47]
65000: b = 4.02, w = [3.06 0.89 3.02 0.48 1.47]
70000: b = 4.02, w = [3.06 0.89 3.02 0.48 1.47]
75000: b = 4.02, w = [3.06 0.89 3.02 0.48 1.47]
80000: b = 4.03, w = [3.06 0.89 3.02 0.48 1.47]
85000: b = 4.03, w = [3.06 0.89 3.02 0.48 1.47]
90000: b = 4.03, w = [3.06 0.89 3.02 0.48 1.47]
95000: b = 4.03, w = [3.06 0.89 3.02 0.48 1.47]
final: b = 4.03, w = [3.06 0.89 3.02 0.48 1.47]

(4.026482762169681,
 array([3.05604783, 0.88702139, 3.02165789, 0.47876374, 1.47200593]))
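
One way to sanity-check a gradient descent result is to compare it with the closed-form least-squares solution. The sketch below regenerates data from the same equation with a fixed seed (an assumption, since the cell above is unseeded) and solves it with `np.linalg.lstsq`, so the numbers will be close to, but not identical with, the run above.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen here only for reproducibility
num_samples = 200
means = np.array([3.0, 1.0, -8.0, -2.0, 0.5])
true_w = np.array([3.0, 1.0, 3.0, 0.5, 1.5])
true_b = 4.0

# Same data-generating process as above: shifted standard normal features plus noise
X_sim = means + rng.standard_normal((num_samples, 5))
Y_sim = true_b + X_sim @ true_w + rng.standard_normal(num_samples)

# Prepend a column of ones so the first coefficient plays the role of the intercept b
A = np.column_stack((np.ones(num_samples), X_sim))
coef, *_ = np.linalg.lstsq(A, Y_sim, rcond=None)
b_hat, w_hat = coef[0], coef[1:]
print(np.round(b_hat, 2), np.round(w_hat, 2))
```

With only 200 noisy samples, the estimates recover the true parameters approximately rather than exactly, and the intercept is typically the least precise of them.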

<img src="https://i.pinimg.com/736x/2e/aa/7d/2eaa7d5021ca7c3c98bc93b98b9646fe.jpg" align="left" width="70" height="70" align="left">

 ## Task 1: Training & Testing data
>  **Q1.** In order to analyze large datasets efficiently, we will use the package `scikit-learn` to implement regression models. 
>> **Step 1**: Install the package with `!pip install scikit-learn` \
>> **Step 2**: Import the model class `from sklearn.linear_model import LinearRegression`\
>> **Step 3**: Create a model instance `lr = LinearRegression()` \
>> **Step 4**: Fit the dataset using `lr.fit({input}, {output})` and check the intercept and the coefficients using `lr.intercept_` and `lr.coef_`

> More information about the package is available at: https://scikit-learn.org/stable/modules/linear_model.html#ordinary-least-squares

> **Q2.** Compare the results with our findings from batch gradient descent. 
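
As a minimal sketch of the `scikit-learn` workflow described in the steps, the example below fits `LinearRegression` to a small, made-up, noise-free dataset (not the lab data). The variable names `X_toy`, `y_toy`, and `lr_toy` are placeholders.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up, noise-free data following y = 2 + 3*x0 - 1*x1
rng = np.random.default_rng(0)
X_toy = rng.standard_normal((50, 2))
y_toy = 2.0 + 3.0 * X_toy[:, 0] - 1.0 * X_toy[:, 1]

lr_toy = LinearRegression()
lr_toy.fit(X_toy, y_toy)

print(lr_toy.intercept_)  # estimated intercept b
print(lr_toy.coef_)       # estimated weights w, one per feature
```

Because the toy data contain no noise, the fitted intercept and coefficients should match the generating equation up to floating-point error.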

In [60]:
################################### YOUR CODE HERE #############################









###############################################################################

<img src="https://i.pinimg.com/736x/2e/aa/7d/2eaa7d5021ca7c3c98bc93b98b9646fe.jpg" align="left" width="70" height="70" align="left">

 ## Task 2: Training & Testing data - using our real data
>  Q3. Let's use our `real_X` and `real_y` to investigate whether the chat can sufficiently predict the complexity of the CPS skills. As before, we will use the package `scikit-learn` to implement the regression model. 
>
>> **Step 1**: Import the model class `from sklearn.linear_model import LinearRegression`\
>> **Step 2**: Create a new model instance `lr = LinearRegression()` \
>> **Step 3**: Fit the dataset using `lr.fit({input}, {output})` and check the intercept and the coefficients using `lr.intercept_` and `lr.coef_`\
>> **Step 4**: Using the model you fit in Step 3, predict the outcome and round it to the nearest integer using `pred = np.round(lr.predict(real_X))`\
>> **Step 5**: Let's evaluate the prediction accuracy. First import `from sklearn.metrics import accuracy_score, r2_score`, then, back in your script, call `r2_score({output}, pred)` and `accuracy_score({output}, pred)`. Both functions expect the true values first and the predictions second.
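
To make the rounding step concrete, the sketch below turns made-up continuous predictions into category labels and computes the share of exact matches by hand. The numbers are invented, and clipping into the range 1 to 4 is an extra assumption that is not part of the steps above.

```python
import numpy as np

# Made-up continuous predictions and true labels (not from the real data)
pred_toy = np.array([0.6, 1.4, 2.4, 3.2, 4.7, 3.6])
y_true_toy = np.array([1, 1, 3, 3, 4, 3])

# Round to the nearest integer, then clip into the valid label range 1..4
labels_toy = np.clip(np.round(pred_toy), 1, 4).astype(int)

# Accuracy = share of predictions that match the true label exactly
accuracy_toy = np.mean(labels_toy == y_true_toy)
print(labels_toy, accuracy_toy)
```

A regression model can produce values outside the label range, so without clipping some rounded predictions may not correspond to any category at all.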

> More information about the package is available at: https://scikit-learn.org/stable/modules/linear_model.html#ordinary-least-squares

> Q4. Any problems with this evaluation scheme?

In [72]:
################################### YOUR CODE HERE #############################










###############################################################################

<img src="https://scikit-learn.org/stable/_images/grid_search_cross_validation.png" width="600">
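
The figure above illustrates splitting data into training and test parts, with cross-validation folds inside the training data. A minimal sketch of both ideas on made-up regression data (a stand-in for a real feature matrix) could look as follows; the split size, the number of folds, and all variable names are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score

# Made-up regression data with a small amount of noise
rng = np.random.default_rng(0)
X_toy = rng.standard_normal((300, 5))
y_toy = X_toy @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(300)

# Hold out 20% of the rows; the model never sees them during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
print('train R^2:', model.score(X_train, y_train))
print('test  R^2:', model.score(X_test, y_test))

# 5-fold cross-validation: each fold serves once as the held-out part
cv_scores = cross_val_score(LinearRegression(), X_toy, y_toy, cv=5)
print('CV R^2 per fold:', np.round(cv_scores, 3))
```

Scores computed on held-out rows estimate how the model behaves on unseen data, which a score computed on the training rows alone cannot tell us.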

### Problem setting 2
> Let's classify the labels using the chat 

$\rightarrow$ Classify the labels using the chat. 

> We will implement and use the support vector classifier and neural network models (what we covered in the previous lecture :)) as our main prediction models. We will take a look at how the model weights are learned using **the gradient descent algorithm**. 

### We will first define a few models we learned so far... 

In [77]:
from sklearn.svm import LinearSVC # for Support Vector Machine
model_SVM = LinearSVC()

from sklearn.neural_network import MLPClassifier #neural network 
nn = MLPClassifier()

from sklearn.linear_model import LogisticRegression #logistic regression
lr = LogisticRegression()
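
All three estimators share the same `fit` / `predict` / `score` interface. The sketch below shows this on made-up, well-separated clusters with four labels; the data, the `max_iter` and `random_state` settings, and the names `toy_models` and `toy_scores` are illustrative assumptions, not the settings used in this lab.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Made-up data: four well-separated clusters labeled 1..4
rng = np.random.default_rng(0)
centers = 6.0 * np.eye(4, 5)                      # one cluster center per class
y_toy = np.repeat(np.arange(1, 5), 100)           # 100 samples per class
X_toy = centers[y_toy - 1] + rng.standard_normal((400, 5))

X_train, X_test, y_train, y_test = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=0, stratify=y_toy
)

toy_models = {
    'LinearSVC': LinearSVC(),
    'MLPClassifier': MLPClassifier(max_iter=500, random_state=0),
    'LogisticRegression': LogisticRegression(max_iter=1000),
}
toy_scores = {}
for name, clf in toy_models.items():
    clf.fit(X_train, y_train)                     # same call for all three models
    toy_scores[name] = clf.score(X_test, y_test)  # mean accuracy on held-out rows
    print(name, round(toy_scores[name], 3))
```

On data this easy all three models should score close to 1.0; real chat features are far noisier, so differences between the models are more likely to show up there.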

> *Let's explore!*