# Logistic Regression

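Logistic regression is a supervised machine learning algorithm for classification, used when the target variable is categorical.
To judge how well a logistic regression performs, we look at its accuracy on held-out data, together with the precision and recall derived from the confusion matrix.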
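
As a minimal sketch of the idea, a binary logistic regression passes a linear score through the sigmoid function to get a probability, then assigns a class by thresholding. The weights, intercept and observation below are hypothetical values chosen only for illustration; they are not taken from the bank dataset.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(features, weights, intercept, threshold=0.5):
    """Return (probability of class 1, predicted label) for one observation."""
    score = intercept + sum(w * x for w, x in zip(weights, features))
    p = sigmoid(score)
    return p, int(p >= threshold)

# Hypothetical weights and observation, for illustration only
weights = [0.8, -0.5]
intercept = -1.0
p, label = predict_label([3.0, 1.0], weights, intercept)
print(round(p, 3), label)
```

A score of 0 maps to a probability of exactly 0.5, which is why 0.5 is the usual default threshold.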

For this notebook, I use the UniversalBank.csv dataset, available in the dataset folder.

### Scikit-learn Perspective

Importing libraries and loading data

In [12]:
from sklearn.linear_model import LogisticRegression
import pandas as pd

In [13]:
df = pd.read_csv("UniversalBank.csv")

Dummifying Variables

In [14]:
df = pd.get_dummies(df, columns = ['Education'], drop_first=True)
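
To see what this step does, here is a small sketch on hypothetical toy data (only the column name `Education` mirrors the notebook). With `drop_first=True`, the first level is dropped and acts as the baseline, so a three-level variable becomes two dummy columns.

```python
import pandas as pd

# Hypothetical toy data; only the column name 'Education' mirrors the notebook
toy = pd.DataFrame({"Income": [40, 85, 120, 60], "Education": [1, 2, 3, 2]})

encoded = pd.get_dummies(toy, columns=["Education"], drop_first=True)

# Level 1 is the baseline: both dummy columns are 0/False for it
print(list(encoded.columns))
print(encoded)
```

Dropping one level avoids perfectly collinear dummy columns, since the dropped level is implied when all the others are zero.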

Setting Target and Predictors

In [15]:
X = df.iloc[:,2:14]
y = df["Personal Loan"]
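
Positional slicing with `iloc` depends on the column order of the file. One alternative, sketched here on a hypothetical toy frame (the columns `ID`, `Income` and `Education_2` are illustrative assumptions), is to select predictors by name so that the target and any identifier column are excluded explicitly.

```python
import pandas as pd

# Hypothetical toy frame; column names other than "Personal Loan" are illustrative
toy = pd.DataFrame({
    "ID": [1, 2, 3],
    "Income": [49, 34, 180],
    "Personal Loan": [0, 0, 1],
    "Education_2": [0, 1, 0],
})

target = "Personal Loan"
y_toy = toy[target]
# Drop the target and the identifier by name, whatever their position
X_toy = toy.drop(columns=[target, "ID"])
print(list(X_toy.columns))
```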

Splitting the data into test set and train set

In [16]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.33, random_state = 5)
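
The sketch below illustrates the two arguments used here: `test_size` sets the share of rows held out, and `random_state` makes the shuffle reproducible. The 5000 stand-in rows are an assumed row count for illustration, not a value read from the file.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 5000 stand-in row indices; the row count is an assumption for illustration
rows = np.arange(5000)

train_a, test_a = train_test_split(rows, test_size=0.33, random_state=5)
train_b, test_b = train_test_split(rows, test_size=0.33, random_state=5)

print(len(train_a), len(test_a))
# The same seed yields the same split on every run
print(np.array_equal(test_a, test_b))
```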

Running the regression

In [17]:
lr = LogisticRegression(max_iter=5000) 
model = lr.fit(X_train,y_train) 

Results

In [18]:
# A notebook cell displays only its last expression, so the output below is coef_
model.intercept_
model.coef_

array([[-1.59501286e-02,  1.93645203e-02,  5.35063664e-02,
         6.01109865e-01,  1.47261071e-01,  2.17532229e-04,
        -4.02879830e-01,  3.11724630e+00, -5.08154080e-01,
        -8.92305788e-01,  3.22653589e+00,  3.32614996e+00]])
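
To make the link between these coefficients and a prediction concrete, here is a sketch on hypothetical toy data (not the bank dataset). It rebuilds the class-1 probability by hand as the sigmoid of `intercept_` plus the weighted sum of the features, and compares it with `predict_proba`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: two features, binary target
X_toy = np.array([[1.0, 0.2], [2.0, 0.1], [3.0, 0.9],
                  [4.0, 0.8], [5.0, 0.3], [6.0, 0.7]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

toy_model = LogisticRegression(max_iter=5000).fit(X_toy, y_toy)

# Linear score: intercept + sum of coefficient * feature
score = toy_model.intercept_ + X_toy @ toy_model.coef_.ravel()
# The sigmoid turns the score into the probability of class 1
prob_manual = 1.0 / (1.0 + np.exp(-score))

prob_sklearn = toy_model.predict_proba(X_toy)[:, 1]
print(np.allclose(prob_manual, prob_sklearn))
```

A positive coefficient raises the score, and therefore the predicted probability of class 1, as the feature increases. The size of a coefficient depends on the scale of its feature, so raw coefficients are not directly comparable across unscaled columns.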

Prediction based on the test set

In [19]:
y_test_pred = model.predict(X_test)
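
For a binary target, `predict` amounts to thresholding the class-1 probability at 0.5. The sketch below checks this on hypothetical toy data and shows the effect of a lower threshold; the 0.3 value is an arbitrary assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data, not the bank dataset
X_toy = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y_toy = np.array([0, 0, 0, 1, 0, 1, 1, 1])

toy_model = LogisticRegression(max_iter=5000).fit(X_toy, y_toy)

prob_class1 = toy_model.predict_proba(X_toy)[:, 1]
labels_from_threshold = (prob_class1 > 0.5).astype(int)

print(np.array_equal(labels_from_threshold, toy_model.predict(X_toy)))

# A lower, hypothetical threshold flags at least as many observations as class 1
labels_lenient = (prob_class1 > 0.3).astype(int)
print(labels_from_threshold.sum(), labels_lenient.sum())
```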


Accuracy Score

In [21]:
from sklearn import metrics
metrics.accuracy_score(y_test, y_test_pred)

0.9545454545454546

Confusion matrix

In [23]:
cm = metrics.confusion_matrix(y_test, y_test_pred, labels=[0, 1])
print(pd.DataFrame(cm, index=['true:0', 'true:1'], columns=['pred:0', 'pred:1']))

        pred:0  pred:1
true:0    1475      10
true:1      65     100
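
The accuracy score can be recovered from these four counts. The sketch below copies them from the matrix above and also computes, for comparison, the accuracy of a trivial rule that always predicts class 0.

```python
# Counts copied from the confusion matrix above
# (rows: true class, columns: predicted class)
cm = [[1475, 10],
      [65, 100]]

(tn, fp), (fn, tp) = cm

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

# Accuracy of always predicting 0: every true 0 is right, every true 1 is wrong
baseline = (tn + fp) / total
print(total, accuracy, baseline)
```

Because class 1 is rare in the test set, the always-0 rule already reaches 90% accuracy, which is one reason to look beyond accuracy alone.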


A confusion matrix is used to evaluate a classification task. We use it to calculate recall and precision.

In [27]:
print('Precision',metrics.precision_score(y_test, y_test_pred))

Precision 0.9090909090909091


In [28]:
print('Recall',metrics.recall_score(y_test, y_test_pred))

Recall 0.6060606060606061


In [29]:
print('F1 score', metrics.f1_score(y_test, y_test_pred))

F1 score 0.7272727272727273
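
As a cross-check, the three scores can be reproduced by hand from the confusion matrix counts reported earlier in this notebook. This is a plain-Python sketch of the standard definitions, with class 1 treated as the positive class.

```python
# Counts from the notebook's confusion matrix (class 1 is the positive class)
tn, fp, fn, tp = 1475, 10, 65, 100

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(precision, recall, f1)
```

The model is rarely wrong when it predicts a loan (high precision) but misses about 39% of the actual loans (lower recall), and the F1 score summarizes that trade-off in a single number.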
