Beta Bank customers are leaving: little by little, chipping away every month. The bankers figured out it’s cheaper to save the existing customers rather than to attract new ones.
We need to predict whether a customer will leave the bank soon. We have the data on clients’ past behavior and termination of contracts with the bank.
We will build a model with the maximum possible F1 score. We need an F1 score of at least 0.59. Check the F1 for the test set.
Additionally, we will measure the AUC-ROC metric and compare it with the F1.

Features

    RowNumber — data string index
    CustomerId — unique customer identifier
    Surname — surname
    CreditScore — credit score
    Geography — country of residence
    Gender — gender
    Age — age
    Tenure — period of maturation for a customer’s fixed deposit (years)
    Balance — account balance
    NumOfProducts — number of banking products used by the customer
    HasCrCard — customer has a credit card
    IsActiveMember — customer’s activeness
    EstimatedSalary — estimated salary
    
Target

    Exited — сustomer has left


# Download and prepare the data, and explain the procedure


In [114]:
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split 
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score
from sklearn.linear_model import LogisticRegression
from joblib import dump
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.utils import shuffle
from sklearn.metrics import roc_auc_score

In [115]:
df_bank = pd.read_csv('/datasets/Churn.csv')

In [116]:
df_bank.info()
df_bank.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           9091 non-null   float64
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(3), int64(8), object(3)
memory usage: 1.1+ MB


Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2.0,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1.0,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8.0,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1.0,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2.0,125510.82,1,1,1,79084.1,0


There are a few columns which we will need to disregard (RowNumber, CustomerId, Surname) as they will have no effect on the target (Exited). There are almost a thousand null values in Tenure, which we will need to handle in order to later encode some of the data (Geography, Gender)

In [117]:
df_bank['Tenure'].fillna(df_bank['Tenure'].median(), axis = 0, inplace=True)
df_bank['Tenure'].value_counts()

5.0     1836
1.0      952
2.0      950
8.0      933
3.0      928
7.0      925
4.0      885
9.0      882
6.0      881
10.0     446
0.0      382
Name: Tenure, dtype: int64

Encoding the Geography and Gender columns

In [118]:
# encoder = LabelEncoder()
# df_bank['Geography'] = encoder.fit_transform(df_bank['Geography'])
# df_bank['Gender'] = encoder.fit_transform(df_bank['Gender'])
# df_bank.head()

df_bank = pd.get_dummies(df_bank, columns=['Geography', 'Gender'], drop_first = True)
df_bank.head()

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited,Geography_Germany,Geography_Spain,Gender_Male
0,1,15634602,Hargrave,619,42,2.0,0.0,1,1,1,101348.88,1,0,0,0
1,2,15647311,Hill,608,41,1.0,83807.86,1,0,1,112542.58,0,0,1,0
2,3,15619304,Onio,502,42,8.0,159660.8,3,1,0,113931.57,1,0,0,0
3,4,15701354,Boni,699,39,1.0,0.0,2,0,0,93826.63,0,0,0,0
4,5,15737888,Mitchell,850,43,2.0,125510.82,1,1,1,79084.1,0,0,1,0


# Examine the balance of classes. Train the model without taking into account the imbalance, and describe our findings

We will examine the balnce of classes and the effectiveness of different models using the loops from the previous project

In [119]:
train, validate, test = np.split(df_bank.sample(frac=1, random_state=12345), [int(.6*len(df_bank)), int(.8*len(df_bank))])

In [120]:
features_train = train.drop(['Exited', 'RowNumber', 'CustomerId', 'Surname'], axis=1)
target_train = train['Exited']
features_valid = validate.drop(['Exited', 'RowNumber', 'CustomerId', 'Surname'], axis=1)
target_valid = validate['Exited']
features_test = test.drop(['Exited', 'RowNumber', 'CustomerId', 'Surname'], axis=1)
target_test = test['Exited']

In [121]:
best_score = 0
for depth in range(1, 20):
    model = DecisionTreeClassifier(random_state=54321, max_depth=depth) # create a model with the given depth
    model.fit(features_train, target_train) # train the model
    predicted_valid = model.predict(features_valid) # get the model's predictions
    score = f1_score(target_valid, predicted_valid) # calculate accuracy score on validation set        
    if score > best_score:
        best_score = score
        best_depth = depth

        
print("F1 Score of the best model:", best_score)
print("Best depth:", best_depth)

F1 Score of the best model: 0.5371775417298937
Best depth: 7


The model yielded unsatisfactory result using decision tree on imbalanced data.

In [122]:
model = LogisticRegression(random_state=54321, solver='lbfgs') 
model.fit(features_train, target_train) 
predicted_valid = model.predict(features_valid)
score = f1_score(target_valid, predicted_valid)
print(score)

0.08067940552016985


Terrible results from the model using logistic regression, doubt we will use it again here

In [123]:
best_score = 0
best_est = 0
for est in range(1, 15): # choose hyperparameter range
    for depth in range(1, 20):
        model = RandomForestClassifier(random_state=54321, n_estimators=est, max_depth = depth) # set number of trees
        model.fit(features_train, target_train) # train model on training set
        predicted_valid = model.predict(features_valid) # get the model's predictions
        score = f1_score(target_valid, predicted_valid) # calculate accuracy score on validation set        
        if score > best_score:
            best_score = score
            best_est = est
            best_depth = depth

print("F1 Score of the best model on the validation set (n_estimators = {}): {}".format(best_est, best_score))
print("Best depth:", best_depth)

F1 Score of the best model on the validation set (n_estimators = 13): 0.532724505327245
Best depth: 18


Results from the model trained with random forest yielded slightly worse results than the decision tree

Since we couldn't train a model to bring us the needed result with the imbalanced data, we will jump straight to balancing the data, then repeat the training

# We will improve the quality of the model. We will train Upsampling and Downsampling to try and fix class imbalance. We will use the training set to pick the best parameters. We will train different models on training and validation sets. We will find the best one, briefly describe our findings

First we define an upsampling function

In [124]:
def upsample(features, target, repeat):
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    features_upsampled = pd.concat([features_zeros] + [features_ones] * repeat)
    target_upsampled = pd.concat([target_zeros] + [target_ones] * repeat)

    features_upsampled, target_upsampled = shuffle(
        features_upsampled, target_upsampled, random_state=54321
    )

    return features_upsampled, target_upsampled

Now we train the model using upsampled data

In [125]:
best_score = 0
for repeat in range(1,5):
    features_upsampled, target_upsampled = upsample(features_train, target_train, repeat)
    for depth in range(1, 20):
        model = DecisionTreeClassifier(random_state=54321, max_depth=depth) 
        model.fit(features_upsampled, target_upsampled) 
        predicted_valid = model.predict(features_valid) 
        score = f1_score(target_valid, predicted_valid) 
        if score > best_score:
            best_score = score
            best_depth = depth
            best_repeat = repeat

        
print("F1 Score of the best model:", best_score)
print("Best depth:", best_depth)
print("Best repeat:", best_repeat)


F1 Score of the best model: 0.5551425030978934
Best depth: 9
Best repeat: 2


On to try logistic regression again

In [126]:
best_score = 0
best_repeat = 0
for repeat in range(1,5):
    features_upsampled, target_upsampled = upsample(features_train, target_train, repeat)
    model = LogisticRegression(random_state=54321, solver='lbfgs') 
    model.fit(features_upsampled, target_upsampled) 
    predicted_valid = model.predict(features_valid)
    score = f1_score(target_valid, predicted_valid)
    if score > best_score:
        best_score = score
        best_repeat = repeat

print("F1 Score of the best model:", best_score)
print("Best repeat:", best_repeat)

F1 Score of the best model: 0.4338537387017256
Best repeat: 4


On to random forest

In [127]:
best_score = 0
best_est = 0
best_repeat = 0
for repeat in range(1,5):
    features_upsampled, target_upsampled = upsample(features_train, target_train, repeat)
    for est in range(1, 15): # choose hyperparameter range
        for depth in range(1, 20):
            model = RandomForestClassifier(random_state=54321, n_estimators=est, max_depth = depth)
            model.fit(features_upsampled, target_upsampled)
            predicted_valid = model.predict(features_valid) 
            score = f1_score(target_valid, predicted_valid)  
            if score > best_score:
                best_score = score
                best_est = est
                best_depth = depth
                best_repeat = repeat

print("F1 Score of the best model on the validation set (n_estimators = {}): {}".format(best_est, best_score))
print("Best depth:", best_depth)
print("Best repeat:", best_repeat)

F1 Score of the best model on the validation set (n_estimators = 11): 0.599758162031439
Best depth: 10
Best repeat: 3


We have a very good result using the random forest with upsampled data

Now we define a downsampling function

In [128]:
def downsample(features, target, fraction):
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    features_downsampled = pd.concat(
        [features_zeros.sample(frac=fraction, random_state=12345)]
        + [features_ones]
    )
    target_downsampled = pd.concat(
        [target_zeros.sample(frac=fraction, random_state=12345)]
        + [target_ones]
    )

    features_downsampled, target_downsampled = shuffle(
        features_downsampled, target_downsampled, random_state=12345
    )

    return features_downsampled, target_downsampled

In [129]:
best_score = 0
best_frac = 0
for frac in np.arange(0.1, 0.7, 0.05):
    features_downsampled, target_downsampled = downsample(features_train, target_train, frac)
    for depth in range(1, 20):
        model = DecisionTreeClassifier(random_state=54321, max_depth=depth) # create a model with the given depth
        model.fit(features_downsampled, target_downsampled) # train the model
        predicted_valid = model.predict(features_valid) # get the model's predictions
        score = f1_score(target_valid, predicted_valid) # calculate accuracy score on validation set        
        if score > best_score:
            best_score = score
            best_depth = depth
            best_frac = frac

        
print("F1 Score of the best model:", best_score)
print("Best depth:", best_depth)
print("Best fraction:", best_frac)

F1 Score of the best model: 0.5773955773955773
Best depth: 6
Best fraction: 0.5500000000000002


In [130]:
best_frac = 0
best_score = 0
for frac in np.arange(0.1, 0.7, 0.05):
    features_downsampled, target_downsampled = downsample(features_train, target_train, frac)
    model = LogisticRegression(random_state=54321, solver='lbfgs') 
    model.fit(features_downsampled, target_downsampled) 
    predicted_valid = model.predict(features_valid)
    score = f1_score(target_valid, predicted_valid)
    if score > best_score:
        best_score = score
        best_frac = frac
print("F1 Score of the best model:", best_score)
print("Best fraction:", best_frac)

F1 Score of the best model: 0.4282157676348548
Best fraction: 0.25000000000000006


In [131]:
best_score = 0
best_est = 0
best_frac = 0
for frac in np.arange(0.1, 0.7, 0.05):
    features_downsampled, target_downsampled = downsample(features_train, target_train, frac)
    for est in range(1, 15): # choose hyperparameter range
        for depth in range(1, 20):
            model = RandomForestClassifier(random_state=54321, n_estimators=est, max_depth = depth) # set number of trees
            model.fit(features_downsampled, target_downsampled) # train model on training set
            predicted_valid = model.predict(features_valid) # get the model's predictions
            score = f1_score(target_valid, predicted_valid) # calculate accuracy score on validation set        
            if score > best_score:
                best_score = score
                best_est = est
                best_depth = depth
                best_frac = frac

print("F1 Score of the best model on the validation set (n_estimators = {}): {}".format(best_est, best_score))
print("Best depth:", best_depth)
print("Best fraction:", best_frac)

F1 Score of the best model on the validation set (n_estimators = 14): 0.5927710843373494
Best depth: 18
Best fraction: 0.45000000000000007


Our best model was with random forest with 11 estimators and depth of 10 using upsampled data

# Final Test

We will now use our best model and use it on our test data

In [132]:
model = RandomForestClassifier(random_state=54321, n_estimators=11, max_depth = 10) 
model.fit(features_upsampled, target_upsampled)
predicted_test = model.predict(features_test)
result = f1_score(target_test, predicted_test) 
result

0.6546610169491526

Very satisfactory result! Onwards to the AUC-ROC test

In [133]:
probabilities_test = model.predict_proba(features_test)
probabilities_one_test = probabilities_test[:, 1]
auc_roc = roc_auc_score(target_test, probabilities_one_test)
print(auc_roc)

0.8667102957259627


The AUC-ROC test yielded way higher result than the F1 result using our best model

# Final Conclusion

We made a pretty reliable churn-prediction model by Beta Bank's strandards (65% rate compared to 59% goal).
Additionally the same model produced an excellent AUC-ROC score of 85%. It is safe to say management will be pleased