<a href="https://colab.research.google.com/github/latifahnl/modulesection/blob/kmmodule/Module4_Section2_Lab2_Logistic_Regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Module 4
## Section: Logistic Regression

## Lab 2: Logistic Regression and Evaluation Metrics

<br><br><br><br>
## Objective
***
- Approach
- Logistic Regression
    - Cost Function
    - Gradient Descent
    - Evaluation Metrics for Logistic Regression
        - Confusion Matrix
        - Precision and Recall
        - F-1 score
        - Area under ROC curve
        - Logarithmic Loss

<br><br><br><br>
## Approach
***
- After learning the basics, let's see how the same techniques apply to the Loan Prediction data set
- We want further insight into what this data set looks like and how we would go about implementing this
- It would be smart to split this data set into a *Training Set* and a *Test Set*

    - I'll leave it to you to figure out why this would be appropriate

<br><br><br><br>
## Logistic Regression - Cost Function and Gradient Descent
***
- So far, we have studied the intuition behind the Sigmoid Function

- We also studied how Logistic Regression works to get outputs in the range of [0,1]

- We discussed the interpretation of the output too! 

## Cost Function 
***
 - Fit the θ parameters
 - Define the optimization objective, i.e. the cost function we use to fit the parameters
     - Training set consists of **"m"** training examples
         - Each example is a column vector of length **n+1**
***      
 
<center>Training set:  $\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\cdots,(x^{(m)},y^{(m)})\}$<br/><br/></center>
<center>m examples, $x = \begin{bmatrix}x_{0}\\x_{1}\\\vdots\\x_{n}\end{bmatrix} \in \mathbb{R}^{n+1}$</center><center>$x_{0}=1,\; y \in \{0,1\}$</center> 

 $$ h_{\theta}(x) = \frac{1}{1+e^{-\theta^T x}} $$
 
 ***
* This is the situation: 
  - Set of m training examples
  - Each example is a feature vector which is n+1 dimensional
  - $x_0$ = 1
  - y ∈ {0,1}
  - Hypothesis is based on parameters (θ)
      - **Given the training set, how do we choose/fit θ?**



Our cost function for "m" training examples is: 
***
  $$ J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right] $$
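
The hypothesis and cost function above can be sketched in NumPy. This is a minimal illustration, not the notebook's own implementation: the function names `sigmoid` and `cost`, the clipping constant `eps`, and the toy arrays `X_toy` and `y_toy` are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y, eps=1e-12):
    """Cross-entropy cost J(theta); X has shape (m, n+1) with x_0 = 1 in column 0."""
    h = sigmoid(X @ theta)
    h = np.clip(h, eps, 1 - eps)  # assumption: clip to avoid log(0)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Hypothetical toy data: 4 examples, one feature plus the intercept column
X_toy = np.array([[1.0, 0.5], [1.0, 2.0], [1.0, -1.0], [1.0, 3.0]])
y_toy = np.array([0, 1, 0, 1])

theta0 = np.zeros(2)
print(cost(theta0, X_toy, y_toy))  # log(2) ≈ 0.693, since h = 0.5 for every example
```

With θ = 0 every prediction is 0.5, so the cost equals log 2 regardless of the labels. Any useful fit should end up below that value.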

## Gradient Descent
***
 <center>Repeat for all $\theta_j$ simultaneously { <br> $\theta_j := \theta_j - \frac{\alpha}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})- y^{(i)})x_{j}^{(i)}$ <br> }</center>
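
A vectorized version of this update rule might look like the sketch below. It averages the gradient over the m examples (the 1/m factor that comes from differentiating J(θ)). The learning rate `alpha`, the iteration count `n_iters`, and the toy data are assumptions chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Batch gradient descent for logistic regression.

    X has shape (m, n+1) with a leading column of ones; y holds 0/1 labels.
    """
    m, n_features = X.shape
    theta = np.zeros(n_features)
    for _ in range(n_iters):
        error = sigmoid(X @ theta) - y    # h_theta(x^(i)) - y^(i) for every i
        gradient = X.T @ error / m        # one entry per theta_j
        theta = theta - alpha * gradient  # simultaneous update of all theta_j
    return theta

# Hypothetical toy data (not linearly separable, so theta stays finite)
X_toy = np.array([[1.0, 0.5], [1.0, 2.0], [1.0, -1.0],
                  [1.0, 3.0], [1.0, 1.0], [1.0, 1.5]])
y_toy = np.array([0, 1, 0, 1, 1, 0])

theta_hat = gradient_descent(X_toy, y_toy)
print(theta_hat)
```

Computing the whole gradient before touching `theta` is what makes the update simultaneous: no θ_j is changed until every partial derivative has been evaluated at the old parameters.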

In [None]:
# Importing Libraries
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline

In [None]:
#Splitting Data
# download dataset from - 
# https://raw.githubusercontent.com/bluedataconsulting/AIMasteryProgram/main/Lab_Exercises/Module4/loan_prediction.csv
dataframe = pd.read_csv('https://raw.githubusercontent.com/rasyidev/well-known-datasets/main/loan_prediction.csv')

In [None]:
dataframe.head()

Unnamed: 0,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Loan_Status
0,5849,0.0,0.0,360.0,1.0,1
1,4583,1508.0,128.0,360.0,1.0,0
2,3000,0.0,66.0,360.0,1.0,1
3,2583,2358.0,120.0,360.0,1.0,1
4,6000,0.0,141.0,360.0,1.0,1


In [None]:
dataframe.shape

(614, 6)

In [None]:
dataframe.iloc[:, -1]

0      1
1      0
2      1
3      1
4      1
      ..
609    1
610    1
611    1
612    1
613    0
Name: Loan_Status, Length: 614, dtype: int64

In [None]:
dataframe.iloc[:, :-1]

Unnamed: 0,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History
0,5849,0.0,0.0,360.0,1.0
1,4583,1508.0,128.0,360.0,1.0
2,3000,0.0,66.0,360.0,1.0
3,2583,2358.0,120.0,360.0,1.0
4,6000,0.0,141.0,360.0,1.0
...,...,...,...,...,...
609,2900,0.0,71.0,360.0,1.0
610,4106,0.0,40.0,180.0,1.0
611,8072,240.0,253.0,360.0,1.0
612,7583,0.0,187.0,360.0,1.0


In [None]:
X = dataframe.iloc[:,:-1]
y = dataframe.iloc[:,-1]


In [None]:
# Define an elastic-net logistic regression model and chain it after polynomial feature expansion
logistic_regressor = LogisticRegression(penalty='elasticnet',max_iter=1000,solver='saga',l1_ratio=0.5)
pipeline = Pipeline(steps=[('add_poly_features', PolynomialFeatures()),
                           ('logistic_regression', logistic_regressor)])

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size = 0.3) 
X_train, X_val, y_train, y_val = train_test_split(X_train,y_train, test_size = 0.07)

In [None]:
print(X_train.shape)
print(X_val.shape)
print(X_test.shape)

(398, 5)
(31, 5)
(185, 5)


In [None]:
pipeline.fit(X_train, y_train)



Pipeline(steps=[('add_poly_features', PolynomialFeatures()),
                ('logistic_regression',
                 LogisticRegression(l1_ratio=0.5, max_iter=1000,
                                    penalty='elasticnet', solver='saga'))])

In [None]:
y_pred = pipeline.predict(X_test)
y_pred

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1])

In [None]:
y_test

160    1
226    0
348    1
565    1
127    1
      ..
140    0
144    1
452    0
366    0
222    1
Name: Loan_Status, Length: 185, dtype: int64

*******
## Evaluation Metrics for Logistic Regression
***

* As we already know, we use different metrics for regression and classification
* We know that we can use `MSE` for regression problems and `Accuracy` for classification problems
* However, these might not be the best metrics in every situation<br><br>

## Evaluation Metrics for Logistic Regression
***

* The following are common Classification Metrics:
    * Confusion Matrix
    * Classification Accuracy
    * Precision and Recall
    * F1 Score
    * Area under ROC curve
    * Classification Report
    * Logarithmic Loss

<br><br><br><br>

### Confusion Matrix
***
- The confusion matrix is a handy presentation of the accuracy of a model with two or more classes. Below is an example of a Confusion Matrix 
<br><br>


| | Actual Fraud | Actual Not Fraud |
|---|---|---|
| Predicted Fraud | 1 | 1 |
| Predicted Not Fraud | 2 | 996 |


    True Positives (TP): We predicted yes, and it is actually yes. (Top Left)
    True Negatives (TN): We predicted no, and it is actually no. (Bottom Right)
    False Positives (FP): We predicted yes, but it is actually no. (AKA "Type I error.") (Top Right)
    False Negatives (FN): We predicted no, but it is actually yes. (AKA "Type II error.") (Bottom Left)


### Classification Accuracy
***
* Classification accuracy is the number of correct predictions **(TN + TP)** as a ratio of all predictions made **(TN + TP + FN + FP)**.<br><br>
It is suitable when:
* There are an equal number of observations in each class
* All predictions and prediction errors are equally important, which is often not the case.
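
The fraud table shown earlier is a good illustration of why accuracy can mislead on imbalanced classes. The short calculation below uses only the four counts from that table; the variable names are chosen for this example.

```python
# Counts read off the fraud example table
tp, fp = 1, 1      # predicted fraud: 1 actual fraud, 1 actual not fraud
fn, tn = 2, 996    # predicted not fraud: 2 actual fraud, 996 actual not fraud

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"accuracy  = {accuracy:.3f}")   # 0.997
print(f"precision = {precision:.3f}")  # 0.500
print(f"recall    = {recall:.3f}")     # 0.333
```

The model is right 99.7% of the time, yet it catches only one of the three actual fraud cases.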

In [None]:
# Confusion matrix for the test-set predictions
# (scikit-learn layout: rows = actual class, columns = predicted class)
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test,y_pred)

array([[  2,  58],
       [  1, 124]])

### Precision
***

$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$



In [None]:
from sklearn.metrics import confusion_matrix, precision_score, recall_score
precision_score(y_test,y_pred)

0.6813186813186813

### Recall 
***

$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$

In [None]:
recall_score(y_test,y_pred)

0.992
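
The precision and recall values printed above can be reproduced by hand from the confusion matrix output shown earlier. The sketch below copies that printed matrix into an array rather than recomputing it. It relies on scikit-learn's convention for binary labels 0 and 1, where `ravel()` yields the counts in the order TN, FP, FN, TP.

```python
import numpy as np

# Matrix copied from the confusion_matrix(y_test, y_pred) output above:
# rows are actual classes (0, 1), columns are predicted classes (0, 1)
cm = np.array([[2, 58],
               [1, 124]])

tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)  # 124 / 182
recall = tp / (tp + fn)     # 124 / 125

print(precision)
print(recall)
```

Note that this layout is transposed relative to the fraud example table, which puts predictions in rows and actual classes in columns.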

## Trade-off: Precision vs. Recall
***
- This is more of an in-class activity!
- Think about this: What happens if we increase Precision? Do you think that would lower Recall? And vice versa?

- Think of an example! And use easy numerical calculations too. You can just use a pencil and paper, no need for code! 

- [**Hint**: There is a trade-off!] 

### F1 Score
***
 - To deal with this trade-off we calculate the F-1 Score, the harmonic mean of Precision (P) and Recall (R). It is high only when both are high, so it avoids a bias towards either the Precision or the Recall

 $$\text{F1 Score} = \frac{2PR}{P + R}$$

***
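In terms of the confusion-matrix counts, the F1 Score is

<center>$2\cdot\frac{\text{precision}\cdot\text{recall}}{\text{precision}+\text{recall}} = \frac{2\,tp}{2\,tp+fp+fn}$</center>

* tp = true positive
* tn = true negative (does not appear in the F1 Score)
* fp = false positive
* fn = false negative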
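
To see how the F1 Score penalizes an imbalance between precision and recall, compare it with a plain average. The helper `f1` and the example values below are hypothetical and only illustrate the formula.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

balanced = f1(0.5, 0.5)   # 0.5
lopsided = f1(1.0, 0.1)   # ≈ 0.182, whereas the arithmetic mean would be 0.55

print(balanced, lopsided)
```

A model with perfect precision but very low recall scores poorly on F1, even though its arithmetic mean looks better than the balanced model's.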


***
- Using this intuition, we want to calculate the F-1 Score to better understand the evaluation of our model

- Let's see how to implement this in Python! 

In [None]:
## code for f-1 score

from sklearn.metrics import f1_score
f1_score(y_test,y_pred)

0.781144781144781

## Area under ROC Curve
***
The ROC (Receiver Operating Characteristic) curve tells us how well the model can distinguish between two classes (e.g. whether a patient has a disease or not). Better models distinguish between the two accurately, whereas a poor model has difficulty telling them apart.
Area under the ROC Curve (or AUC for short) is a performance metric for binary classification problems.
- The AUC represents a model’s ability to discriminate between positive and negative classes.
 - An area of 1.0 represents a model that made all predictions perfectly. An area of 0.5 represents a model as good as random.<br>
**Brain Teaser**: What does area < 0.5 signify?


***
ROC can be broken down into sensitivity and specificity. Let's understand these concepts.


## Sensitivity and Specificity
Let us take an example of patients having a disease.

In simple terms, Sensitivity (or Recall) is the proportion of patients who actually have the disease that were correctly identified as having it: TP / (TP + FN).


Similarly, Specificity is the proportion of patients who do not have the disease that were correctly identified as not having it: TN / (TN + FP).


### Trade-off between Sensitivity and Specificity
When we decrease the threshold, more examples are predicted positive, which increases the sensitivity. Meanwhile, this decreases the specificity.

Similarly, when we increase the threshold, more examples are predicted negative, which increases the specificity and decreases the sensitivity.
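
The effect of moving the threshold can be checked on a small example. The labels and scores below are made up for illustration, and `sensitivity_specificity` is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np

# Hypothetical true labels and predicted probabilities of the positive class
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.1, 0.3, 0.45, 0.6, 0.4, 0.55, 0.7, 0.9])

def sensitivity_specificity(y_true, scores, threshold):
    """Predict positive when score >= threshold, then compute both rates."""
    y_hat = (scores >= threshold).astype(int)
    tp = np.sum((y_hat == 1) & (y_true == 1))
    fn = np.sum((y_hat == 0) & (y_true == 1))
    tn = np.sum((y_hat == 0) & (y_true == 0))
    fp = np.sum((y_hat == 1) & (y_true == 0))
    return tp / (tp + fn), tn / (tn + fp)

for t in (0.3, 0.5, 0.7):
    sens, spec = sensitivity_specificity(y_true, scores, t)
    print(f"threshold={t}: sensitivity={sens:.2f}, specificity={spec:.2f}")
# threshold=0.3: sensitivity=1.00, specificity=0.25
# threshold=0.5: sensitivity=0.75, specificity=0.75
# threshold=0.7: sensitivity=0.50, specificity=1.00
```

As the threshold rises from 0.3 to 0.7, sensitivity falls from 1.00 to 0.50 while specificity rises from 0.25 to 1.00.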

## ROC 
The ROC curve is a plot of sensitivity (the True Positive Rate) against 1 - specificity (the False Positive Rate) for different values of the threshold.

In [None]:
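from sklearn import metrics
# y_pred holds hard 0/1 labels, so this curve has a single operating point;
# scores such as pipeline.predict_proba(X_test)[:, 1] would trace the full curve
fpr, tpr, thresholds = metrics.roc_curve(y_test, y_pred, pos_label=1)
metrics.auc(fpr, tpr)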

0.5126666666666666
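
Because a ROC curve is traced by sweeping the threshold, AUC is normally computed from continuous scores rather than hard labels. The sketch below contrasts the two on made-up data; in this notebook the scores would presumably come from `pipeline.predict_proba(X_test)[:, 1]`, but the arrays here are hypothetical.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Hypothetical true labels and predicted probabilities of the positive class
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.1, 0.3, 0.45, 0.6, 0.4, 0.55, 0.7, 0.9])

# AUC from the continuous scores: every distinct score acts as a threshold
auc_from_scores = roc_auc_score(y_true, scores)

# AUC from hard labels at a 0.5 cut-off: only one operating point survives
hard_labels = (scores >= 0.5).astype(int)
fpr, tpr, _ = roc_curve(y_true, hard_labels)
auc_from_labels = auc(fpr, tpr)

print(auc_from_scores)  # 0.8125
print(auc_from_labels)  # 0.75
```

The score-based value equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, here 13 out of 16.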

## Logarithmic Loss
***
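Logarithmic loss (or logloss) is a performance metric for evaluating the predictions of probabilities of membership to a given class.

 $$ \text{Logloss}=- \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M}y_{ij}\log p_{ij} $$

Where,
* N is the number of samples or instances,
* M is the number of possible labels,
* y<sub>ij</sub> is a binary indicator of whether or not label j is the correct classification for instance i,
* p<sub>ij</sub> is the model probability of assigning label j to instance i.

For binary classification (M = 2), writing y<sub>i</sub> for the true label of instance i and p<sub>i</sub> for its predicted probability of class 1, this reduces to:

 $$ \text{Logloss}=- \frac{1}{N}\sum_{i=1}^{N}[y_i\log p_{i}+(1-y_{i})\log(1-p_{i})] $$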



***
* The scalar probability between 0 and 1 can be seen as a measure of confidence for a prediction by an algorithm.<br>
* Predictions that are correct or incorrect are rewarded or punished proportionally to the confidence of the prediction.<br>
* A logloss nearer to 0 is better, with 0 representing perfect predictions.

In [None]:
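from sklearn.metrics import log_loss
# y_pred holds hard 0/1 labels, which log_loss treats as fully confident probabilities;
# pipeline.predict_proba(X_test) would supply the predicted probabilities instead
log_loss(y_test,y_pred)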

12.135496444973763
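
Log loss is designed for predicted probabilities, so passing hard 0/1 labels makes every mistake count as a maximally confident error. The sketch below shows the difference on made-up data; all arrays are hypothetical.

```python
import numpy as np
from sklearn.metrics import log_loss

# Hypothetical true labels and predicted probabilities of class 1
y_true_toy = np.array([1, 0, 1, 0])
p_toy = np.array([0.9, 0.1, 0.8, 0.3])

# Log loss from probabilities, checked against the binary formula
loss_from_probs = log_loss(y_true_toy, p_toy)
manual = -np.mean(y_true_toy * np.log(p_toy)
                  + (1 - y_true_toy) * np.log(1 - p_toy))

# Hard labels with a single mistake on the last example
hard_toy = np.array([1.0, 0.0, 1.0, 1.0])
loss_from_labels = log_loss(y_true_toy, hard_toy)

print(loss_from_probs, manual)  # both ≈ 0.198
print(loss_from_labels)         # far larger: one fully confident wrong prediction
```

With hard labels the loss is driven almost entirely by how scikit-learn clips probabilities of exactly 0 or 1, so it mostly reflects the error count rather than the quality of the probability estimates.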

### Thank you!

In [None]:
from PIL import Image, ImageOps

# Raw string so the backslashes in the Windows path are not read as escape sequences
open_image = Image.open(r"C:\Users\ASUS\Pictures\Arsip Sindoro\IMG_20210626_063606.jpg")