##**Machine Learning 🖥**
---
#**Logistic Regression**
**𝑩𝒚 ⟹ 𝑷𝑹𝑰𝑵𝑪𝑬💗**

---

###**Formula For Accuracy in Classification**


* **Accuracy = (TP + TN) / (TP + TN + FP + FN)**
* **Precision = TP / (TP + FP)**
* **Recall = TP / (TP + FN)**

where,

TP = True Positive

TN = True Negative

FP = False Positive

FN = False Negative

⟶ F1 Score is the **Harmonic Mean** of **Precision** and **Recall.**
* **F1 Score = (2 * Precision * Recall) / (Precision + Recall)** 
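
As a minimal sketch of the formulas above, the four metrics can be computed directly from the confusion-matrix counts. The function name `classification_metrics` and the counts passed to it are hypothetical, chosen only for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = (2 * precision * recall) / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts, for illustration only
accuracy, precision, recall, f1 = classification_metrics(tp=8, tn=6, fp=2, fn=4)
print(accuracy, precision, recall, f1)
```

Note that this sketch does not guard against division by zero, which occurs when the model predicts no positives at all or the data contains no positives.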


####**Mounting Google Drive**

In [2]:
# from google.colab import drive
# drive.mount('/content/drive')

**Normally distributed data ⟶ mean and median are equal or almost equal.**
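
The statement above can be checked on simulated data. The sketch below is only an illustration: the sample sizes, seed and distribution parameters are assumptions, not values from the bank dataset. It compares the gap between mean and median for a symmetric normal sample and for a right-skewed sample.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable

# Symmetric, normally distributed sample: mean and median nearly coincide
normal_sample = rng.normal(loc=50, scale=10, size=10_000)
normal_gap = abs(normal_sample.mean() - np.median(normal_sample))

# Right-skewed sample: the mean is pulled above the median
skewed_sample = rng.exponential(scale=10, size=10_000)
skewed_gap = abs(skewed_sample.mean() - np.median(skewed_sample))

print(normal_gap, skewed_gap)
```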

###***Python Implementation For Logistic Regression ⟹***

**Importing Some Important Libraries**

In [3]:
# Importing required Libraries
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

In [6]:
# Upload and read the data.
# The file must exist at this path; in this run it did not, and the call raised:
# FileNotFoundError: [Errno 2] No such file or directory: '/Bank.csv'
df = pd.read_csv('/Bank.csv')

In [None]:
# First Five Rows
df.head()

In [None]:
# Last Five Rows
df.tail()

In [None]:
# Info
df.info()

In [None]:
# Shape => (Rows, Columns)
df.shape

In [None]:
# Checking For Null Values
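# Count False/True (not null / null) per column; Series.value_counts avoids the deprecated pd.value_counts
df.isna().apply(lambda col: col.value_counts()).T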

In [None]:
# Describe
df.describe().T

###**Visualization**

In [None]:
from matplotlib import pyplot as plt
import seaborn as sns

####**Distplot**

In [None]:
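# distplot is deprecated in seaborn; histplot with kde = True draws the same histogram plus density curve
sns.histplot(df['Age'], kde = True, stat = 'density', color = 'red')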

In [None]:
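# distplot is deprecated in seaborn; histplot with kde = True draws the same histogram plus density curve
sns.histplot(df['Experience'], kde = True, stat = 'density', color = 'g')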

**'Age' and 'Experience' have roughly symmetric distributions (approximately normal graph).**

###**'ID' and 'ZIP Code' are not needed because they carry no useful information for training.** 

In [None]:
# 'ID' and 'ZIP Code' are not needed because they carry no useful information for training.
df.drop(['ID', 'ZIP Code'], axis = 1, inplace = True)
df.head()

In [None]:
df.shape

###**Pairplot**

In [None]:
sns.pairplot(df)

In [None]:
sns.pairplot(df, hue = 'Personal Loan')

###**Boxplot**

In [None]:
sns.boxplot(x = df['Age'])     # Pass the column as x explicitly for a horizontal box

In [None]:
sns.boxplot(x = df['Experience'])     # Pass the column as x explicitly for a horizontal box

###**Displot**

In [None]:
sns.displot(x = 'Age', data = df, color = 'maroon')

In [None]:
sns.displot(x = 'Experience', data = df, color = 'darkgreen')
plt.show()

In [None]:
sns.displot(x = 'Experience', data = df, hue = 'Personal Loan', color = 'red')
plt.show()

In [None]:
sns.displot(x = 'Experience', data = df, hue = 'Education')
plt.show()

###**Correlation**

In [None]:
corr = df.corr()
corr

In [None]:
corr[['Personal Loan']]

####**Plot the Correlation Matrix**

In [None]:
plt.figure(figsize = (10,6))
plt.title('Correlation Matrix of Personal Loan')
sns.heatmap(corr[['Personal Loan']],vmax = 1.0, vmin = -1.0, annot = True, cmap = 'inferno', fmt = 'g')
plt.show()

###**Define Independent and Dependent Variable**
**To define the independent variables, drop the dependent variable.**

In [None]:
# Define the independent variables
# To define the independent variables, drop the dependent variable.

In [None]:
x = df.drop('Personal Loan', axis  = 1)

In [None]:
# Define the dependent variable as a 1-D Series, the shape scikit-learn expects for y
y = df['Personal Loan']

###**Train Test Split**

In [None]:
# Train Test Split at ratio 70:30
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 1 )
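
To make the 70:30 ratio concrete, here is a small sketch on toy data. The arrays `X_toy` and `y_toy` are hypothetical and stand in for the real features and target; with 10 rows and `test_size = 0.3`, 7 rows go to training and 3 to testing.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 rows, 2 features (hypothetical, for illustration only)
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# Same arguments as the split above: 30% test, fixed random_state for repeatability
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.3, random_state=1)
print(X_tr.shape, X_te.shape)
```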

###**Third Machine Learning Model**

In [None]:
from sklearn.linear_model import LogisticRegression
# Our Third ML Model
model = LogisticRegression()

###**Train**

In [None]:
# Train
model.fit(x_train, y_train)

###**Test or Prediction**

In [None]:
# Test or predict
y_pred = model.predict(x_test)
y_pred

###**Accuracy**
**Test accuracy and training accuracy should be close to each other. The test accuracy is the one to report, not the training accuracy.**

In [None]:
# Test accuracy and training accuracy should be close to each other;
# the test accuracy is the one to report, not the training accuracy.
model.score(x_train, y_train)     # Training Accuracy

In [None]:
model.score(x_test, y_test)     #Accuracy => Test Accuracy

In [None]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)

###**Confusion Matrix** 

In [None]:
from sklearn.metrics import confusion_matrix, classification_report
cf = confusion_matrix(y_test, y_pred)
cf

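**With 'Personal Loan' = 1 as the positive class, scikit-learn arranges the matrix as [[TN, FP], [FN, TP]]:**

**TN = 1334  |  FP = 17**

**FN = 70    |   TP = 79**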
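
For a 0/1 target, scikit-learn's `confusion_matrix` puts true labels on rows and predicted labels on columns, so `.ravel()` returns the counts in the order TN, FP, FN, TP (taking 1 as the positive class). The sketch below shows this on toy labels; `y_true_toy` and `y_pred_toy` are hypothetical and not taken from the bank data.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels, for illustration only
y_true_toy = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
y_pred_toy = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0]

# Rows are true classes, columns are predicted classes
tn, fp, fn, tp = confusion_matrix(y_true_toy, y_pred_toy).ravel()
print(tn, fp, fn, tp)
```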

####**Plotting the Confusion Matrix**


In [None]:
# Plotting the confusion matrix
plt.figure(figsize = (10,5))
plt.title('Confusion Matrix of Logistic Regression', fontsize = 16)
sns.heatmap(cf, annot = True, cmap = 'spring', fmt = 'g')
plt.show()


**Verify Accuracy from the Confusion Matrix**

In [None]:
(1334+79)/(1334+79+70+17)    # Should be equal to 0.942
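
The same check can be written without typing the counts by hand: correct predictions sit on the diagonal of the confusion matrix, so accuracy is the diagonal sum divided by the total. This is a sketch; `cf_values` is a hypothetical name holding the four counts listed above.

```python
import numpy as np

# Counts as laid out in the matrix above
cf_values = np.array([[1334, 17],
                      [70, 79]])

# Diagonal = correct predictions; sum of all cells = number of test rows
accuracy_from_cf = np.trace(cf_values) / cf_values.sum()
print(accuracy_from_cf)
```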

###**Classification Report**

In [None]:
# Classification Report: use print() to display it in a properly formatted layout.
print(classification_report(y_test, y_pred))

##**How KNN Fails?**

###***Drawbacks ⟶***

* **KNN mistreats or misclassifies outliers.**
* **The misclassification will reduce our accuracy.**
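
A toy sketch of the first drawback: KNN labels a point by majority vote of its nearest neighbors, so a point whose true class differs from the classes around it is voted into the wrong class. The one-feature data below is hypothetical and built only to show the effect.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated clusters on a single feature (hypothetical data)
X_knn_toy = np.array([[1.0], [2.0], [3.0], [4.0], [10.0], [11.0], [12.0]])
y_knn_toy = np.array([0, 0, 0, 0, 1, 1, 1])

knn_toy = KNeighborsClassifier(n_neighbors=3).fit(X_knn_toy, y_knn_toy)

# An outlier: its true class is 1, but it sits inside the class-0 cluster
outlier_x = np.array([[2.5]])
outlier_true_label = 1
outlier_pred = knn_toy.predict(outlier_x)[0]
print(outlier_pred, outlier_true_label)
```

All three nearest neighbors of the outlier belong to class 0, so the vote cannot recover its true label.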

In [None]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 3)
knn.fit(x_train,y_train)

In [None]:
# Prediction
knn_pred = knn.predict(x_test)
knn_pred

In [None]:
# Test Accuracy
accuracy_score(y_test, knn_pred)

In [None]:
knn.score(x_train, y_train)     # Training Accuracy

**In KNN, training accuracy = 0.9551 and test accuracy = 0.9033.**

**The difference between training accuracy and test accuracy should be minimal, at most about 0.8 percentage points, but for KNN it is about 5 percentage points.**
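
The gap between the two KNN accuracies quoted above can be worked out in percentage points. This is plain arithmetic on the reported values; the variable names are hypothetical.

```python
train_acc = 0.9551   # KNN training accuracy reported above
test_acc = 0.9033    # KNN test accuracy reported above

# Convert the difference of the two fractions into percentage points
gap_points = (train_acc - test_acc) * 100
print(round(gap_points, 2))
```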

####**So, K-Nearest Neighbors Model fails here.**


####**Confusion Matrix of KNN Model**

In [None]:
# Confusion Matrix of KNN
cf_knn = confusion_matrix(y_test, knn_pred)
cf_knn

In [None]:
plt.figure(figsize = (10,5))
plt.title('Confusion Matrix of KNN Model', fontsize = 14)
sns.heatmap(cf_knn, annot = True, cmap = 'cividis', fmt = 'g')
plt.show()

####**Classification Report of KNN Model**

In [None]:
# Classification Report of KNN Model
print(classification_report(y_test, knn_pred))