# Bank Customer Churn Prediction 

Churn is one of the biggest problems in the banking industry. This solution uses a trainable ML model to identify bank customers who are likely to close their accounts and leave the bank, based on factors such as credit score, tenure, and salary.

## Contents

1. Prerequisites
2. Data Dictionary
3. Import Libraries
4. Load Input Data
5. Create Model
6. Predict Test Datapoints
7. Saving Prediction

## Prerequisites

To run this notebook, you need to have the following packages installed:

- `pandas` to read and save CSV files.
- `sklearn` (installed as `scikit-learn`) to train the model and generate predictions.
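
As a quick sanity check before running the notebook, the snippet below reports which of the two packages cannot be imported in the current environment. It is a minimal sketch: the helper name `missing_packages` is hypothetical and not part of the original notebook.

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the import names that cannot be found in the current environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in import_names if find_spec(name) is None]


# "sklearn" is the import name; the package is installed as "scikit-learn".
missing = missing_packages(["pandas", "sklearn"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```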

## Data Dictionary

- The input must be a `.csv` file with UTF-8 encoding.
- Note: if the input `.csv` file is not UTF-8 encoded, the model will not perform as expected.
- Required Features: `Exited`, `CreditScore`, `Age`, `Tenure`, `Balance`, `NumOfProducts`, `EstimatedSalary`, `BalanceSalaryRatio`, `TenureByAge`, `CreditScoreGivenAge`, `HasCrCard`, `IsActiveMember`, `Geography_Spain`, `Geography_France`, `Geography_Germany`, `Gender_Female`, `Gender_Male`.
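
The two requirements above (UTF-8 encoding and the required columns) can be checked before the file is handed to `pandas`. The following is a minimal sketch using only the standard library; the helper name `check_input_csv` is hypothetical and not part of the original notebook, and the sample payload is built in memory purely for illustration.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "Exited", "CreditScore", "Age", "Tenure", "Balance", "NumOfProducts",
    "EstimatedSalary", "BalanceSalaryRatio", "TenureByAge",
    "CreditScoreGivenAge", "HasCrCard", "IsActiveMember", "Geography_Spain",
    "Geography_France", "Geography_Germany", "Gender_Female", "Gender_Male",
]


def check_input_csv(raw_bytes):
    """Return the required columns that are missing from a CSV payload.

    Raises UnicodeDecodeError if the payload is not valid UTF-8.
    """
    # "utf-8-sig" accepts UTF-8 with or without a byte-order mark.
    text = raw_bytes.decode("utf-8-sig")
    header = next(csv.reader(io.StringIO(text)), [])
    return [col for col in REQUIRED_COLUMNS if col not in header]


# Illustrative in-memory payload containing only a header row.
sample = (",".join(REQUIRED_COLUMNS) + "\n").encode("utf-8")
print(check_input_csv(sample))  # → []
```

For a real file, the bytes could be read with `open('train.csv', 'rb').read()` and passed to the same helper.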

## Import Libraries

In [3]:
import pandas as pd
from sklearn.svm import SVC 

## Load Input Data

In [4]:
# Read the data frame
train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')
train_df.head()

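|   | Exited | CreditScore | Age | Tenure | Balance | NumOfProducts | EstimatedSalary | BalanceSalaryRatio | TenureByAge | CreditScoreGivenAge | HasCrCard | IsActiveMember | Geography_Spain | Geography_France | Geography_Germany | Gender_Female | Gender_Male |
|---|--------|-------------|-----|--------|---------|---------------|-----------------|--------------------|-------------|---------------------|-----------|----------------|-----------------|------------------|-------------------|---------------|-------------|
| 0 | 0 | 0.222 | 0.094595 | 0.6 | 0.0 | 0.333333 | 0.076118 | 0.0 | 0.432 | 0.323157 | 1 | 1 | 1 | -1 | -1 | 1 | -1 |
| 1 | 0 | 0.538 | 0.22973 | 0.4 | 0.360358 | 0.0 | 0.102376 | 0.003317 | 0.205714 | 0.305211 | 1 | 1 | -1 | 1 | -1 | 1 | -1 |
| 2 | 0 | 0.698 | 0.297297 | 0.8 | 0.486406 | 0.0 | 0.510225 | 0.000901 | 0.36 | 0.300198 | 1 | -1 | -1 | 1 | -1 | 1 | -1 |
| 3 | 0 | 0.416 | 0.310811 | 0.2 | 0.49513 | 0.0 | 0.555744 | 0.000843 | 0.087805 | 0.208238 | 1 | 1 | -1 | -1 | 1 | -1 | 1 |
| 4 | 0 | 0.576 | 0.216216 | 0.5 | 0.532094 | 0.0 | 0.778145 | 0.000647 | 0.264706 | 0.330882 | -1 | 1 | -1 | 1 | -1 | -1 | 1 |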


The DataFrame has 1000 rows and 17 columns: the target `Exited` plus 16 features. We review it further to identify which attributes are necessary and what data manipulation needs to be carried out before exploratory analysis and predictive modelling.

In [5]:
train_df.columns

Index(['Exited', 'CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts',
       'EstimatedSalary', 'BalanceSalaryRatio', 'TenureByAge',
       'CreditScoreGivenAge', 'HasCrCard', 'IsActiveMember', 'Geography_Spain',
       'Geography_France', 'Geography_Germany', 'Gender_Female',
       'Gender_Male'],
      dtype='object')

In [6]:
# Count missing values in each column
train_df.isnull().sum()

Exited                 0
CreditScore            0
Age                    0
Tenure                 0
Balance                0
NumOfProducts          0
EstimatedSalary        0
BalanceSalaryRatio     0
TenureByAge            0
CreditScoreGivenAge    0
HasCrCard              0
IsActiveMember         0
Geography_Spain        0
Geography_France       0
Geography_Germany      0
Gender_Female          0
Gender_Male            0
dtype: int64

## Create Model

In [7]:
X_train = train_df.drop(['Exited'], axis=1)
y_train = train_df['Exited']

# Fit an SVM with an RBF kernel
SVM_RBF = SVC(
    C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='rbf',
    max_iter=-1, probability=True, random_state=None, shrinking=True,
    tol=0.001, verbose=False,
)
              
SVM_RBF.fit(X_train, y_train)

SVC(C=100, gamma=0.1, probability=True)

## Predict Test Datapoints

In [8]:
predictions = SVM_RBF.predict(test_df)
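
The call above assumes that `test_df` contains exactly the training features. If the test file also carries an `Exited` column or orders its columns differently, selecting the training columns explicitly avoids a feature mismatch. Because the model is created with `probability=True`, `predict_proba` is also available for a churn probability per customer. The sketch below illustrates both points on a small synthetic dataset; the data, the reduced three-feature set, and the variable names `model`, `X_test`, and `churn_probability` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC

# Synthetic stand-in for train.csv / test.csv (illustrative values only).
rng = np.random.default_rng(0)
train_df = pd.DataFrame({
    "CreditScore": rng.random(40),
    "Age": np.linspace(0.0, 1.0, 40),
    "Tenure": rng.random(40),
})
train_df["Exited"] = (train_df["Age"] > 0.5).astype(int)  # 20 rows per class

test_df = pd.DataFrame({
    "Tenure": rng.random(5),        # deliberately in a different column order
    "Age": rng.random(5),
    "CreditScore": rng.random(5),
    "Exited": 0,                    # placeholder label a test file might carry
})

X_train = train_df.drop(columns=["Exited"])
y_train = train_df["Exited"]
model = SVC(C=100, gamma=0.1, kernel="rbf", probability=True)
model.fit(X_train, y_train)

# Keep exactly the training features, in the training order.
X_test = test_df[X_train.columns]
predictions = model.predict(X_test)

# Columns of predict_proba follow model.classes_, so column 1 is class 1 here.
churn_probability = model.predict_proba(X_test)[:, 1]
print(predictions, churn_probability.round(3))
```

Thresholding `churn_probability` at a value other than the default decision boundary is one way to trade precision against recall, should that be needed.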

## Saving Prediction

In [9]:
test_df['Exited_predictions'] = predictions

In [10]:
# Write the predictions to disk without the DataFrame index
test_df.to_csv('output.csv', index=False)