**Title: Bank Churn Project**

**Objective:** Predict bank customer churn using machine learning, and understand data encoding, feature scaling, handling imbalanced data, the support vector machine classifier, and grid search for hyperparameter tuning.

Data Source: YBI Foundation

**Import Library**

In [117]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

**Data Preprocessing Start**

**Import Data**

In [118]:
df = pd.read_csv('https://github.com/YBI-Foundation/Dataset/raw/main/Bank%20Churn%20Modelling.csv')

**Describe Data**

In [206]:
df.head()

In [205]:
df.info()

In [207]:
df.duplicated('CustomerId').sum()

In [122]:
df = df.set_index('CustomerId')

In [123]:
df.info()

**Data Encoding**

In [124]:
df['Geography'].value_counts()

In [125]:
df.replace({'Geography': {'France': 2, 'Germany': 1, 'Spain': 0}}, inplace = True)
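
The integer mapping above gives the countries an order (Spain < Germany < France) that the data does not really have. One-hot encoding is a common alternative. The sketch below is only an illustration on a toy frame (the `toy` and `encoded` names are assumptions, not part of the notebook):

```python
import pandas as pd

# Toy frame standing in for the real data; only the column name matches the notebook.
toy = pd.DataFrame({'Geography': ['France', 'Germany', 'Spain', 'France']})

# One indicator column per country; drop_first removes the redundant first category (France).
encoded = pd.get_dummies(toy, columns=['Geography'], drop_first=True, dtype=int)
print(encoded)
```

With `drop_first=True`, a row of all zeros stands for the dropped category.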

In [126]:
df['Gender'].value_counts()

In [127]:
df.replace({'Gender': {'Male': 0, 'Female': 1}}, inplace = True)

In [128]:
df['Num Of Products'].value_counts()

In [129]:
df.replace({'Num Of Products': {1:0, 2:1, 3:1, 4:1}}, inplace = True)

In [130]:
df['Has Credit Card'].value_counts()

In [131]:
df['Is Active Member'].value_counts()

In [132]:
df.loc[(df['Balance']==0), 'Churn'].value_counts()

In [133]:
# Note: despite its name, this flag is 1 when Balance > 0 and 0 when Balance is zero
df['Zero Balance'] = np.where(df['Balance']>0, 1, 0)

**Data Visualization**

In [134]:
df['Zero Balance'].hist()

In [135]:
df.groupby(['Churn', 'Geography']).count()
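
The grouped count above returns one count per column. If the goal is to compare churn across countries, the mean of the 0/1 `Churn` column per group gives the churn rate directly. A minimal sketch on made-up rows (country names are kept as strings here for readability; in the notebook they are already integer-encoded at this point):

```python
import pandas as pd

# Hypothetical rows, not taken from the dataset.
toy = pd.DataFrame({
    'Geography': ['France', 'France', 'France', 'France', 'Germany', 'Germany'],
    'Churn':     [0, 0, 0, 1, 1, 0],
})

# The mean of a 0/1 column is the share of 1s, i.e. the churn rate per group.
churn_rate = toy.groupby('Geography')['Churn'].mean()
print(churn_rate)
```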

Define Target Variable and Feature Variables

In [136]:
df.columns

In [137]:
x = df.drop(['Surname', 'Churn'], axis = 1)

In [138]:
y = df['Churn']

In [139]:
x.shape, y.shape

In [140]:
df['Churn'].value_counts()

In [141]:
sns.countplot(x = 'Churn', data = df)

In [142]:
x.shape, y.shape

**Data Preprocessing Ends**

Random Under Sampling

In [143]:
from imblearn.under_sampling import RandomUnderSampler

In [144]:
rus = RandomUnderSampler(random_state = 2529)

In [145]:
x_rus, y_rus = rus.fit_resample(x, y)

In [146]:
x_rus.shape, y_rus.shape, x.shape, y.shape

In [147]:
y.value_counts()

In [148]:
y_rus.value_counts()

In [149]:
y_rus.plot(kind = 'hist')
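
To make the idea behind random under sampling concrete, here is a hand-rolled sketch in plain pandas. It is not imblearn's implementation, and the toy data and variable names are assumptions: each class is cut down to the size of the smallest class by random selection.

```python
import pandas as pd

# Toy data: 8 retained customers (0) and 2 churned customers (1).
toy_y = pd.Series([0] * 8 + [1] * 2, name='Churn')
toy_x = pd.DataFrame({'Age': range(20, 30)})

minority_size = int(toy_y.value_counts().min())

# Keep a random subset of each class, equal in size to the smallest class.
kept_index = (
    toy_y.groupby(toy_y)
    .sample(n=minority_size, random_state=2529)
    .index
)
x_small, y_small = toy_x.loc[kept_index], toy_y.loc[kept_index]
print(y_small.value_counts())
```

The price of the balance is discarded majority rows: six of the eight class-0 rows are thrown away here.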

Random Over Sampling

In [150]:
from imblearn.over_sampling import RandomOverSampler

In [151]:
ros = RandomOverSampler(random_state = 2529)

In [152]:
x_ros, y_ros = ros.fit_resample(x,y)

In [153]:
x_ros.shape, y_ros.shape, x.shape, y.shape

In [154]:
y.value_counts()

In [155]:
y_ros.value_counts()

In [156]:
y_ros.plot(kind = 'hist')
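
Random over sampling works the other way around: minority rows are drawn again, with replacement, until the classes match. The sketch below imitates that with plain pandas on toy data (an illustration under assumed names, not imblearn's code) and shows that the result necessarily contains repeated rows.

```python
import pandas as pd

# Toy data: 8 retained customers (0) and 2 churned customers (1).
toy_y = pd.Series([0] * 8 + [1] * 2, name='Churn')
toy_x = pd.DataFrame({'Age': range(20, 30)})

counts = toy_y.value_counts()
minority_label = counts.idxmin()
extra_needed = int(counts.max() - counts.min())

# Draw extra minority rows with replacement and append them to the original data.
extra = toy_y[toy_y == minority_label].sample(
    n=extra_needed, replace=True, random_state=2529
)
y_big = pd.concat([toy_y, extra])
x_big = toy_x.loc[y_big.index]

print(y_big.value_counts())
print('repeated rows:', int(y_big.index.duplicated().sum()))
```

Because the added rows are exact copies, a later train/test split can place copies of the same customer on both sides.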

Train Test Split

In [157]:
from sklearn.model_selection import train_test_split

In [158]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 2539)

In [159]:
x_train_rus, x_test_rus, y_train_rus, y_test_rus = train_test_split(x_rus, y_rus, test_size = 0.3, random_state = 2539)

In [160]:
x_train_ros, x_test_ros, y_train_ros, y_test_ros = train_test_split(x_ros, y_ros, test_size = 0.3, random_state = 2539)
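
One possible variation, sketched here on synthetic data and under assumed names, is to split first with `stratify` and then resample only the training partition. This keeps the test partition at the original class ratio and free of duplicated rows. It uses `sklearn.utils.resample` in place of imblearn purely to keep the example short.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic stand-in for the features and an 80/20 imbalanced target.
rng = np.random.default_rng(2529)
X_toy = rng.normal(size=(100, 3))
y_toy = np.array([0] * 80 + [1] * 20)

# stratify keeps the 80/20 class ratio in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=2529, stratify=y_toy
)

# Oversample the minority class in the training partition only.
n_extra = int((y_tr == 0).sum() - (y_tr == 1).sum())
extra = resample(X_tr[y_tr == 1], replace=True, n_samples=n_extra, random_state=2529)
X_tr_bal = np.vstack([X_tr, extra])
y_tr_bal = np.concatenate([y_tr, np.ones(n_extra, dtype=int)])

print(np.bincount(y_tr_bal), np.bincount(y_te))
```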

Standardize Features

In [161]:
from sklearn.preprocessing import StandardScaler

In [162]:
sc = StandardScaler()

In [163]:
x_train[['CreditScore', 'Age', 'Tenure', 'Balance', 'Estimated Salary']] = sc.fit_transform(x_train[['CreditScore', 'Age', 'Tenure', 'Balance', 'Estimated Salary' ]])

In [164]:
x_test[['CreditScore', 'Age', 'Tenure', 'Balance', 'Estimated Salary']] = sc.transform(x_test[['CreditScore', 'Age', 'Tenure', 'Balance', 'Estimated Salary' ]])

In [165]:
x_train_rus[['CreditScore', 'Age', 'Tenure', 'Balance', 'Estimated Salary']] = sc.fit_transform(x_train_rus[['CreditScore', 'Age', 'Tenure', 'Balance', 'Estimated Salary' ]])

In [166]:
x_test_rus[['CreditScore', 'Age', 'Tenure', 'Balance', 'Estimated Salary']] = sc.transform(x_test_rus[['CreditScore', 'Age', 'Tenure', 'Balance', 'Estimated Salary' ]])

In [167]:
x_train_ros[['CreditScore', 'Age', 'Tenure', 'Balance', 'Estimated Salary']] = sc.fit_transform(x_train_ros[['CreditScore', 'Age', 'Tenure', 'Balance', 'Estimated Salary' ]])

In [168]:
x_test_ros[['CreditScore', 'Age', 'Tenure', 'Balance', 'Estimated Salary']] = sc.transform(x_test_ros[['CreditScore', 'Age', 'Tenure', 'Balance', 'Estimated Salary' ]])
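
Scaling statistics are normally learned from the training partition only and then reused on the test partition, so that both are scaled with the same mean and standard deviation. A minimal sketch on toy arrays (the values and names are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train_vals = np.array([[10.0], [20.0], [30.0]])
test_vals = np.array([[20.0], [40.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train_vals)  # learns mean and std from the training values
test_scaled = scaler.transform(test_vals)        # reuses the training mean and std

print(scaler.mean_, scaler.scale_)
print(test_scaled.ravel())
```

A test value equal to the training mean (20) maps to 0, which would not be guaranteed if the scaler were refitted on the test values.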

Support Vector Machine Classifier

In [169]:
from sklearn.svm import SVC

In [170]:
svc = SVC()

In [171]:
svc.fit(x_train, y_train)

In [172]:
y_pred = svc.predict(x_test)

**Model Evaluation & Accuracy**

In [173]:
from sklearn.metrics import confusion_matrix, classification_report

In [174]:
confusion_matrix(y_test, y_pred)

In [175]:
print(classification_report(y_test, y_pred))
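
To connect the confusion matrix with the numbers in the classification report, the sketch below derives precision, recall and accuracy for the churn class by hand. The labels are made up for illustration and are not results from this notebook.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 6 retained (0) and 4 churned (1) customers.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_hat = np.array([0, 0, 0, 0, 0, 1, 1, 0, 0, 0])

# For binary labels the matrix is [[tn, fp], [fn, tp]].
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)   # share of predicted churners who really churned
recall = tp / (tp + fn)      # share of real churners that were found
accuracy = (tp + tn) / len(y_true)

print(precision, recall, accuracy)
```

On imbalanced data, accuracy can look acceptable while recall for the churn class stays low, which is why the report is worth reading per class.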

Hyperparameter Tuning

In [176]:
from sklearn.model_selection import GridSearchCV

In [177]:
param_grid = {'C': [0.1,1,10], 'gamma':[1,0.1,0.01], 'kernel': ['rbf'], 'class_weight': ['balanced']}

In [178]:
grid = GridSearchCV(SVC(),param_grid, refit = True, verbose = 2, cv = 2 )
grid.fit(x_train, y_train)

In [179]:
print(grid.best_estimator_)
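
A possible variation on the grid search, shown here on synthetic data rather than the bank data, wraps the scaler and the classifier in a `Pipeline` so that scaling is refitted inside every cross-validation fold, and selects parameters by recall instead of the default accuracy. The data set, the names and the choice of 5 stratified folds are assumptions made for this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the churn data (about 80% class 0).
X_syn, y_syn = make_classification(
    n_samples=300, n_features=5, weights=[0.8, 0.2], random_state=2529
)

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('svc', SVC(kernel='rbf', class_weight='balanced')),
])

# Parameters of a pipeline step are addressed as <step name>__<parameter>.
param_grid_pipe = {'svc__C': [0.1, 1, 10], 'svc__gamma': [1, 0.1, 0.01]}

folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=2529)
search = GridSearchCV(pipe, param_grid_pipe, scoring='recall', cv=folds)
search.fit(X_syn, y_syn)

print(search.best_params_)
print(round(search.best_score_, 3))
```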

In [180]:
grid_predictions = grid.predict(x_test)

In [181]:
confusion_matrix(y_test, grid_predictions)

**Prediction**

In [182]:
print(classification_report(y_test, grid_predictions))

Model with Random Under Sampling

In [183]:
svc_rus = SVC()

In [184]:
svc_rus.fit(x_train_rus, y_train_rus)

In [185]:
y_pred_rus = svc_rus.predict(x_test_rus)

In [186]:
confusion_matrix(y_test_rus, y_pred_rus)

In [187]:
print(classification_report(y_test_rus, y_pred_rus))

In [188]:
param_grid = {'C': [0.1,1,10], 'gamma':[1,0.1,0.01], 'kernel': ['rbf'], 'class_weight': ['balanced']}

In [189]:
grid_rus = GridSearchCV(SVC(),param_grid, refit = True, verbose = 2, cv = 2 )
grid_rus.fit(x_train_rus, y_train_rus)

In [190]:
print(grid_rus.best_estimator_)

In [191]:
grid_predictions_rus = grid_rus.predict(x_test_rus)

In [192]:
confusion_matrix(y_test_rus, grid_predictions_rus)

**Prediction**

In [193]:
print(classification_report(y_test_rus, grid_predictions_rus))

Model with Random Over Sampling

In [194]:
svc_ros = SVC()

In [195]:
svc_ros.fit(x_train_ros, y_train_ros)

In [196]:
y_pred_ros = svc_ros.predict(x_test_ros)

In [197]:
confusion_matrix(y_test_ros, y_pred_ros)

In [198]:
print(classification_report(y_test_ros, y_pred_ros))

In [199]:
param_grid = {'C': [0.1,1,10], 'gamma':[1,0.1,0.01], 'kernel': ['rbf'], 'class_weight': ['balanced']}

In [200]:
grid_ros = GridSearchCV(SVC(),param_grid, refit = True, verbose = 2, cv = 2 )
grid_ros.fit(x_train_ros, y_train_ros)

In [201]:
print(grid_ros.best_estimator_)

In [202]:
grid_predictions_ros = grid_ros.predict(x_test_ros)

In [203]:
confusion_matrix(y_test_ros, grid_predictions_ros)

**Prediction**

In [204]:
print(classification_report(y_test_ros, grid_predictions_ros))

**Explanation**

In this project we apply hyperparameter tuning with grid search because the baseline SVC model predicts few churn cases. The tuned models, which also set `class_weight` to `'balanced'`, give better churn predictions than the baseline.
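
One way to put such a comparison on a single number is the recall of the churn class for each model. The sketch below uses made-up prediction vectors, not results from this notebook; in practice the lists would be replaced by arrays such as `y_pred` and `grid_predictions`.

```python
from sklearn.metrics import recall_score

# Hypothetical true labels and predictions from two models.
y_true_toy = [0, 0, 0, 1, 1, 1, 1, 0]
predictions = {
    'baseline': [0, 0, 0, 0, 0, 1, 0, 0],
    'tuned':    [0, 1, 0, 1, 1, 1, 0, 0],
}

# Recall of class 1: the share of real churners each model finds.
churn_recall = {
    name: recall_score(y_true_toy, y_hat, pos_label=1)
    for name, y_hat in predictions.items()
}
print(churn_recall)
```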