<b><font size="6">Logistic Regression</font><a class="anchor"></a><a id='toc'></a></b><br>

__`Step 1`__ - Import the data and pandas

In [None]:
import pandas as pd
tugas = pd.read_csv('final_tugas.csv')
tugas

__`Step 2`__ - Data partition
- Assign all the variables excluding the DepVar to the object `data`
- Assign the dependent variable to the object `target`
- Import the needed library to make the partition of the dataset
- Split the data and the target into `X_train`, `X_test`, `y_train`, `y_test`, with `test_size` equal to 0.2, `random_state` equal to 5 and `stratify` equal to `target`

In [None]:
data = tugas.drop(['DepVar'], axis=1)
target = tugas['DepVar']

In [None]:
# import the function used to split the data
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    data, target, test_size=0.2, random_state=5, stratify=target
)
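
To see what `stratify` does, the sketch below repeats the same split on a small made-up dataset and compares the class proportions of each part. The DataFrame `toy` and every name derived from it are hypothetical stand-ins for `tugas`, which is not reproduced here.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for `tugas`: 100 rows, 20% positives.
toy = pd.DataFrame({
    "feature": range(100),
    "DepVar": [1] * 20 + [0] * 80,
})
toy_data = toy.drop(["DepVar"], axis=1)
toy_target = toy["DepVar"]

Xtr, Xte, ytr, yte = train_test_split(
    toy_data, toy_target, test_size=0.2, random_state=5, stratify=toy_target
)

# Share of each class in the full target, the train part and the test part.
proportions = pd.DataFrame({
    "full": toy_target.value_counts(normalize=True),
    "train": ytr.value_counts(normalize=True),
    "test": yte.value_counts(normalize=True),
})
print(proportions)
```

With stratification, the three columns should be (almost) identical, so the test set keeps the same class balance as the original data.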

__`Step 3`__ - Import the model and create an instance

<div class="alert alert-block alert-info">
<a href = 'https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html'>sklearn.linear_model.LogisticRegression(fit_intercept=True,...)</a>

__Definition:__ <br>
Applies Logistic Regression classifier.

__Parameters:__ <br>
*fit_intercept*: whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations; <br>
...
</div>

In [None]:
from sklearn.linear_model import LogisticRegression
log_model = LogisticRegression()

__`Step 4`__ - Fit the model to the train data

<div class="alert alert-block alert-info">
<a href = 'https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html'>sklearn.linear_model.LogisticRegression().fit(X,y,...)</a>

__Definition:__ <br>
Fit the logistic regression model to the training data.

__Parameters:__ <br>
X : The features (regressors) of the training dataset; <br>
y : The target of the training dataset; <br>
...
</div>

In [None]:
log_model.fit(X_train, y_train)

__`Step 5`__ - Use the model to predict the labels of the test data. Assign them to **y_pred**.

<div class="alert alert-block alert-info">
<a href = 'https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html'>sklearn.linear_model.LogisticRegression().predict(X)</a>

__Definition:__ <br>
Predict class labels for samples in X.

__Parameters:__ <br>
X : Samples to predict; <br>
...

</div>

In [None]:
y_pred = log_model.predict(X_test)
y_pred

***Note:*** Instead of the assigned class, you can get the predicted probability of each class for every sample with the method `predict_proba()`.

In [None]:
pred_prob = log_model.predict_proba(X_test)
pred_prob
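
The relation between `predict()` and `predict_proba()` can be sketched as follows: for each sample, the predicted label is the class with the highest probability. The data here is synthetic (generated with `make_classification`), and `X_demo`, `y_demo` and `demo_model` are illustrative names, not part of the exercise.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data (an assumption; it stands in for the real dataset).
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=5)
demo_model = LogisticRegression().fit(X_demo, y_demo)

proba = demo_model.predict_proba(X_demo)  # one column per class, shape (200, 2)
labels = demo_model.predict(X_demo)

# Columns of `proba` follow the order of `classes_`; each row sums to 1.
from_proba = demo_model.classes_[np.argmax(proba, axis=1)]
print(np.array_equal(labels, from_proba))
```

For a binary target this is the same as assigning class 1 whenever its probability is above 0.5.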

***Note:*** As with linear regression, the fitted coefficients and the intercept are available through the attributes `coef_` and `intercept_`.

In [None]:
log_model.coef_
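
As a rough illustration of what these attributes mean, the sketch below rebuilds the probability of the positive class by hand: the linear score `X @ coef_ + intercept_` is passed through the sigmoid function. It assumes a binary target and uses synthetic data; `X_demo`, `y_demo` and `demo_model` are hypothetical names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data (an assumption; it stands in for the real dataset).
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=5)
demo_model = LogisticRegression().fit(X_demo, y_demo)

print(demo_model.coef_.shape)       # (1, 4): one row of coefficients for a binary target
print(demo_model.intercept_.shape)  # (1,)

# Linear score (log-odds) passed through the sigmoid.
z = X_demo @ demo_model.coef_.ravel() + demo_model.intercept_[0]
manual_proba = 1.0 / (1.0 + np.exp(-z))

# Should match the second column of predict_proba (probability of class 1).
print(np.allclose(manual_proba, demo_model.predict_proba(X_demo)[:, 1]))
```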

__`Step 6`__ - Evaluate the model

***Note:*** Because the target is categorical (a classification problem), the model is evaluated with different metrics than a regression model. In particular, R-squared cannot be obtained for logistic regression in the same way as in the linear case.

### The confusion matrix

<div class="alert alert-block alert-info">
<a href = 'https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html#sklearn.metrics.confusion_matrix'>sklearn.metrics.confusion_matrix(y_true, y_pred, ...)</a>

__Definition:__ <br>
Compute confusion matrix to evaluate the accuracy of a classification.

__Parameters:__ <br>
_y_true_: Ground truth (correct) target values; <br>
_y_pred_: Estimated targets as returned by a classifier; <br>
...
</div>

In [None]:
from sklearn.metrics import confusion_matrix

In [None]:
cm = confusion_matrix(y_test, y_pred)
cm

For a binary target, sklearn presents the confusion matrix in the following format, where rows are the true classes and columns are the predicted classes: <br>
[ [ TN  FP  ] <br>
    [ FN  TP ] ]
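
A small sketch of how the four cells can be read out of the matrix. The labels `y_true_demo` and `y_pred_demo` are made up for illustration; `ravel()` flattens the 2x2 matrix row by row, which gives the order TN, FP, FN, TP.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration (not the output of the model above).
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred_demo = [0, 1, 0, 0, 1, 0, 1, 1, 0, 0]

# Flatten the 2x2 matrix row by row: [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()
print(tn, fp, fn, tp)  # 4 1 2 3
```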

### The accuracy score
<img src="img/accuracy.png" alt="Drawing" style="width: 300px;"/>

<div class="alert alert-block alert-info">
<a href = 'https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html#sklearn.metrics.accuracy_score'>sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True,...)</a>

__Definition:__ <br>
Accuracy classification score.

__Interpretation:__ <br>
If normalize is True, then the best performance is 1. When normalize = False, then the best performance is the number of samples.

__Parameters:__ <br>
_y_true_: Ground truth (correct) target values; <br>
_y_pred_: Estimated targets as returned by a classifier; <br>
_normalize_: If False, return the number of correctly classified samples. Otherwise, return the fraction of correctly classified samples. <br>
...
</div>

In [None]:
from sklearn.metrics import accuracy_score

In [None]:
accuracy = accuracy_score(y_test, y_pred)
accuracy
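
The accuracy can also be derived by hand from the confusion matrix as (TP + TN) divided by the total number of samples. The sketch below checks this on made-up labels (`y_true_demo` and `y_pred_demo` are hypothetical) and shows the effect of `normalize=False`.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration (not the output of the model above).
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred_demo = [0, 1, 0, 0, 1, 0, 1, 1, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

# Fraction of correct predictions: 7 out of 10 samples.
manual_accuracy = (tp + tn) / (tp + tn + fp + fn)
sk_accuracy = accuracy_score(y_true_demo, y_pred_demo)

# With normalize=False the result is the number of correct predictions.
n_correct = accuracy_score(y_true_demo, y_pred_demo, normalize=False)

print(manual_accuracy, sk_accuracy, n_correct)
```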

### The precision
<img src="img/precision.png" alt="Drawing" style="width: 200px;"/>

<div class="alert alert-block alert-info">
<a href = 'https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html#sklearn.metrics.precision_score'>sklearn.metrics.precision_score(y_true, y_pred, ...)</a>

__Definition:__ <br>
Compute the precision.

__Interpretation:__ <br>
The best value is 1, and the worst value is 0.

__Parameters:__ <br>
_y_true_: Ground truth (correct) target values; <br>
_y_pred_: Estimated targets as returned by a classifier; <br>
...
</div>

In [None]:
from sklearn.metrics import precision_score

In [None]:
precision = precision_score(y_test, y_pred)
precision
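
Precision answers the question "of all samples predicted as positive, how many really are positive?", that is TP / (TP + FP). A minimal check on made-up labels (`y_true_demo` and `y_pred_demo` are hypothetical):

```python
from sklearn.metrics import confusion_matrix, precision_score

# Made-up labels for illustration (not the output of the model above).
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred_demo = [0, 1, 0, 0, 1, 0, 1, 1, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

# 3 true positives out of 4 positive predictions.
manual_precision = tp / (tp + fp)
sk_precision = precision_score(y_true_demo, y_pred_demo)

print(manual_precision, sk_precision)
```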

### The recall
<img src="img/recall.png" alt="Drawing" style="width: 180px;"/>

<div class="alert alert-block alert-info">
<a href = 'https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html#sklearn.metrics.recall_score'>sklearn.metrics.recall_score(y_true, y_pred, ...)</a>

__Definition:__ <br>
Compute the recall.

__Interpretation:__ <br>
The best value is 1 and the worst value is 0.

__Parameters:__ <br>
_y_true_: Ground truth (correct) target values; <br>
_y_pred_: Estimated targets as returned by a classifier; <br>
...
</div>

In [None]:
from sklearn.metrics import recall_score

In [None]:
recall = recall_score(y_test, y_pred)
recall
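
Recall answers the question "of all samples that really are positive, how many did the model find?", that is TP / (TP + FN). A minimal check on made-up labels (`y_true_demo` and `y_pred_demo` are hypothetical):

```python
from sklearn.metrics import confusion_matrix, recall_score

# Made-up labels for illustration (not the output of the model above).
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred_demo = [0, 1, 0, 0, 1, 0, 1, 1, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

# 3 true positives out of 5 actual positives.
manual_recall = tp / (tp + fn)
sk_recall = recall_score(y_true_demo, y_pred_demo)

print(manual_recall, sk_recall)
```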

### The F1 Score
<img src="img/f1.png" alt="Drawing" style="width: 270px;"/>

<div class="alert alert-block alert-info">
<a href = 'https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score'>sklearn.metrics.f1_score(y_true, y_pred, ...)</a>

__Definition:__ <br>
Compute the F1 score, also known as balanced F-score or F-measure.

__Interpretation:__ <br>
F1 score reaches its best value at 1 and worst score at 0.

__Parameters:__ <br>
_y_true_: Ground truth (correct) target values; <br>
_y_pred_: Estimated targets as returned by a classifier; <br>
...
</div>

In [None]:
from sklearn.metrics import f1_score

In [None]:
f1 = f1_score(y_test, y_pred)
f1
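
The F1 score is the harmonic mean of precision and recall, 2 * precision * recall / (precision + recall). The sketch below verifies this on made-up labels (`y_true_demo` and `y_pred_demo` are hypothetical):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up labels for illustration (not the output of the model above).
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred_demo = [0, 1, 0, 0, 1, 0, 1, 1, 0, 0]

p = precision_score(y_true_demo, y_pred_demo)  # 3 / 4
r = recall_score(y_true_demo, y_pred_demo)     # 3 / 5

# Harmonic mean of precision and recall.
manual_f1 = 2 * p * r / (p + r)
sk_f1 = f1_score(y_true_demo, y_pred_demo)

print(manual_f1, sk_f1)
```

Because it is a harmonic mean, the F1 score is pulled towards the lower of the two values, so a model cannot score well by doing well on only one of them.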