Hello Elizabeth!

I’m happy to review your project today.
I will mark your mistakes and give you some hints how it is possible to fix them. We are getting ready for real job, where your team leader/senior colleague will do exactly the same. Don't worry and study with pleasure! 

Below you will find my comments - **please do not move, modify or delete them**.

You can find my comments in green, yellow or red boxes like this:

<div class="alert alert-block alert-success">
<b>Reviewer's comment</b> <a class="tocSkip"></a>

Success. Everything is done succesfully.
</div>

<div class="alert alert-block alert-warning">
<b>Reviewer's comment</b> <a class="tocSkip"></a>

Remarks. Some recommendations.
</div>

<div class="alert alert-block alert-danger">

<b>Reviewer's comment</b> <a class="tocSkip"></a>

Needs fixing. The block requires some corrections. Work can't be accepted with the red comments.
</div>

You can answer me by using this:

<div class="alert alert-block alert-info">
<b>Student answer.</b> <a class="tocSkip"></a>

Thank you so much for the feedback, I appreacaite it! I should have double checked before submitting. Thanks! 
</div>



# Beta Bank Customer Saving

The purpose of this project is to develop a model based on data from Beta Bank's clients' past behavior and terminatior of contracts with the bank, to help predict if a client will terminate their contract. Identifying these customers early can help the bank try to implement measures to save their existing business as that is cheaper than attracting new ones. 

The data that we will be using in this analysis is as follows: 

Features
- RowNumber — data string index
- CustomerId — unique customer identifier
- Surname — surname
- CreditScore — credit score
- Geography — country of residence
- Gender — gender
- Age — age
- Tenure — period of maturation for a customer’s fixed deposit (years)
- Balance — account balance
- NumOfProducts — number of banking products used by the customer
- HasCrCard — customer has a credit card
- IsActiveMember — customer’s activeness
- EstimatedSalary — estimated salary

Target
- Exited — сustomer has left

## Importing and Preparing the Data

First we will import our data, review it and prepare it so that it is useable for our analysis and to build our model.

In [1]:
#Import Libraries that will be Needed
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import OrdinalEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.metrics import f1_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import shuffle
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
import numpy as np

In [2]:
#Download Dataset

data = pd.read_csv('/datasets/Churn.csv')

In [3]:
#Briefly Review the Data

data.info()
print()
print(data.sample(5))

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           9091 non-null   float64
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(3), int64(8), object(3)
memory usage: 1.1+ MB

      RowNumber  CustomerId    Surname  CreditScore Geography  Gender  Age  \
4881       4882

The Tenure feature seemes to be the only one with missing values, but it is a relatively low percentage.  We can still review some of these values to determine if it worth dropping these missing values, filling them, or leaving them alone.

In [4]:
print(data['Tenure'].value_counts())
print()
print(data['Tenure'].isna().sum())
print()
print(data[data['Tenure'].isna()])
print()
print((data['Tenure'].isna().sum())/10000) #seeing what percentage of the data is NA

1.0     952
2.0     950
8.0     933
3.0     928
5.0     927
7.0     925
4.0     885
9.0     882
6.0     881
10.0    446
0.0     382
Name: Tenure, dtype: int64

909

      RowNumber  CustomerId    Surname  CreditScore Geography  Gender  Age  \
30           31    15589475    Azikiwe          591     Spain  Female   39   
48           49    15766205        Yin          550   Germany    Male   38   
51           52    15768193  Trevisani          585   Germany    Male   36   
53           54    15702298   Parkhill          655   Germany    Male   41   
60           61    15651280     Hunter          742   Germany    Male   35   
...         ...         ...        ...          ...       ...     ...  ...   
9944       9945    15703923    Cameron          744   Germany    Male   41   
9956       9957    15707861      Nucci          520    France  Female   46   
9964       9965    15642785    Douglas          479    France    Male   34   
9985       9986    15586914     Nepean          659    

About 9% of our data is missing a value for Tenure. There doesn't seem to be a clear reason as to why these values are missing and there is still valuable data about these individuals behaviors other than how long they were with the Bank. For this reason we will fill these values with the mean or median value. 

In [5]:
print(data['Tenure'].describe())

count    9091.000000
mean        4.997690
std         2.894723
min         0.000000
25%         2.000000
50%         5.000000
75%         7.000000
max        10.000000
Name: Tenure, dtype: float64


The mean and the median are relatively close, so we will fill the missing values with the mean.

In [6]:
data['Tenure'].fillna(data['Tenure'].mean(), inplace=True)
print(data['Tenure'].isna().sum())

0


<div class="alert alert-block alert-success">
<b>Reviewer's comment V1</b> <a class="tocSkip"></a>

Correct

</div>

Next we will want to prepare the data for our model. There are 3 features that are object type data: Surname, Geography and Gender. We will want to use OHE as we will be doing a linear regression model and need all values to be numerical.  To do this we will use the pd.get_dummies fxn to encode our whole data set.

In [7]:
#Getting an idea of the unique values in each of these features so we can get a sense of how many more features will be added
print(data['Surname'].value_counts())
print(data['Geography'].value_counts())
print(data['Gender'].value_counts())

Smith        32
Scott        29
Martin       29
Walker       28
Brown        26
             ..
McMasters     1
Kincaid       1
Vessels       1
Gidney        1
Huie          1
Name: Surname, Length: 2932, dtype: int64
France     5014
Germany    2509
Spain      2477
Name: Geography, dtype: int64
Male      5457
Female    4543
Name: Gender, dtype: int64


<div class="alert alert-block alert-warning">
<b>Reviewer's comment V1</b> <a class="tocSkip"></a>

Do 'Surname' column is really important feature for our task? Can 'Surname' affect on our predictions? Is it worth to create 2931 columns due to this feature? Of course, no. It's not necessary to use all the features from the given dataset. You can always remove all excess features. It can not only save some computational resourses but also it can siginficantly improve model quality.

</div>

In [8]:
data_ohe = pd.get_dummies(data, drop_first=True)
target = data_ohe['Exited']
features = data_ohe.drop('Exited', axis=1)

print(data_ohe.sample(5))
data_ohe.info()

      RowNumber  CustomerId  CreditScore  Age  Tenure    Balance  \
3382       3383    15570629          655   72     5.0  138089.97   
669         670    15662397          640   42     5.0  176099.13   
7218       7219    15767231          757   36     7.0  144852.06   
334         335    15742668          626   37     6.0  108269.37   
2746       2747    15655794          620   36     8.0       0.00   

      NumOfProducts  HasCrCard  IsActiveMember  EstimatedSalary  ...  \
3382              2          1               1         99920.41  ...   
669               1          1               1          8404.73  ...   
7218              1          0               0        130861.95  ...   
334               1          1               0          5597.94  ...   
2746              2          1               1        145937.99  ...   

      Surname_Zotova  Surname_Zox  Surname_Zubarev  Surname_Zubareva  \
3382               0            0                0                 0   
669           

<div class="alert alert-block alert-success">
<b>Reviewer's comment V1</b> <a class="tocSkip"></a>

Correct

</div>

Now our data frame has many more columns, especially as there is a very large amount of Surnames. We also dropped the original column to help mitigate the risk of us falling into the dummytrap.

Next we will want to split our data into a training set, a validation set and a test set.  Then we will want to look at our numerical features that aren't 0,1 like our new encoded features (RowNumber, CustomerID, CreditScore, Age, Tenure, Balance, NumOfProducts, and EstimatedSalary) and standardize them so all the features are considered equally important before the algorithm execution.

In [9]:
#Creating our training, validation and test sets and identifying our numeric features
# 1- split into training (60%) and a temp set (40%)
data_train, data_temp = train_test_split(data_ohe, test_size=0.40, random_state=12345)

# 2- split the temp set into validation (20%) and test (20%)
data_valid, data_test = train_test_split(data_temp, test_size=0.50, random_state=12345)

data_train.info()
data_valid.info()
data_test.info()

# 3- create Training set features and target

features_train= data_train.drop(['Exited'], axis = 1)
target_train = data_train['Exited']

features_valid= data_valid.drop(['Exited'], axis = 1)
target_valid = data_valid['Exited']

features_test= data_test.drop(['Exited'], axis = 1)
target_test = data_test['Exited']

numeric = ['RowNumber', 'CustomerId', 'CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'EstimatedSalary']

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6000 entries, 7479 to 4578
Columns: 2945 entries, RowNumber to Gender_Male
dtypes: float64(3), int64(8), uint8(2934)
memory usage: 17.3 MB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2000 entries, 8532 to 6895
Columns: 2945 entries, RowNumber to Gender_Male
dtypes: float64(3), int64(8), uint8(2934)
memory usage: 5.8 MB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2000 entries, 7041 to 3366
Columns: 2945 entries, RowNumber to Gender_Male
dtypes: float64(3), int64(8), uint8(2934)
memory usage: 5.8 MB


<div class="alert alert-block alert-success">
<b>Reviewer's comment V1</b> <a class="tocSkip"></a>

Correct

</div>

In [10]:
#Standardized Numeric Features
scaler = StandardScaler()
scaler.fit(features_train[numeric])

features_train[numeric] = scaler.transform(features_train[numeric])
features_valid[numeric] = scaler.transform(features_valid[numeric])
features_test[numeric] = scaler.transform(features_test[numeric])

print(features_train.shape)

(6000, 2944)


<div class="alert alert-block alert-success">
<b>Reviewer's comment V1</b> <a class="tocSkip"></a>

Good job!

</div>

## Examine the Balance of Classes

Now that our data is prepared for model training we can look at the balance of our target (Exited) classes which are 0 and 1, representing if clients stayed or left. Once we determine the balance or imbalance of these classes we will train the model without taking it into account to see how things look

In [11]:
#print value counts for our target to get an idea of the distribution of our classes

print(data_ohe['Exited'].value_counts())

0    7963
1    2037
Name: Exited, dtype: int64


<div class="alert alert-block alert-success">
<b>Reviewer's comment V1</b> <a class="tocSkip"></a>

Correct

</div>

Here we can see that there is a very significant imbalance in our classes. Based on this data it seems that significantly more clients stay than leave.  Now lets train our model without addressing this and assess it's quality based on it's F1 score and the Area Under the ROC curve.

In [12]:
model = LogisticRegression(random_state=12345, solver= 'liblinear')
model.fit(features_train, target_train)
predicted_valid = model.predict(features_valid)

f1 = f1_score(target_valid,predicted_valid)
print(f1)

probabilities_valid = model.predict_proba(features_valid)
probabilities_one_valid = probabilities_valid[:, 1]
auc_roc = roc_auc_score(target_valid, probabilities_one_valid)

print(auc_roc)

0.35024549918166936
0.7424948130583902


Based on this model our F1 score is very low, meaning that our current model is fairly poor quality and won't be able to accurately predict if customers will plan on leaving soon. However, our area under the curve is fairly good as a random model would get a value of 0.5

<div class="alert alert-block alert-success">
<b>Reviewer's comment V1</b> <a class="tocSkip"></a>

Correct

</div>

## Improve Model Quality

In this step we will work on fixing the class imbalance with at least 2 different approaches and run through different iterations of our model to pick the best parameters. 

First we will start by balancing the class weights to see if this improves the quality of our model.

In [13]:
model = LogisticRegression(random_state=12345, solver='liblinear', class_weight = 'balanced')
model.fit(features_train, target_train)
predicted_valid = model.predict(features_valid)

print('F1:', f1_score(target_valid, predicted_valid))

probabilities_valid = model.predict_proba(features_valid)
probabilities_one_valid = probabilities_valid[:, 1]
auc_roc = roc_auc_score(target_valid, probabilities_one_valid)

print('AUC-ROC:', auc_roc)

F1: 0.469406392694064
AUC-ROC: 0.736893521010894


Balancing the class weights did improve our F1 score by over 10%.  Our AUC-ROC did drop, but only slightly. This is a good sign that the overall quality of our model is improving.  Next we will try upsampling, as we are trying to find the patterns of customers who are going to leave, it will likely be benefit for our model to get more exposure to that class as it is less frequent in our current data.

In [14]:
def upsample(features, target, repeat):
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    features_upsampled = pd.concat([features_zeros] + [features_ones] * repeat)
    target_upsampled = pd.concat([target_zeros] + [target_ones] * repeat)

    features_upsampled, target_upsampled = shuffle(
        features_upsampled, target_upsampled, random_state=12345
    )

    return features_upsampled, target_upsampled


features_upsampled, target_upsampled = upsample(
    features_train, target_train, 10
)

model = LogisticRegression (random_state = 12345, solver = 'liblinear', class_weight = 'balanced')
model.fit(features_upsampled,target_upsampled)
predicted_valid = model.predict(features_valid)

print('F1:', f1_score(target_valid, predicted_valid))

probabilities_valid = model.predict_proba(features_valid)
probabilities_one_valid = probabilities_valid[:, 1]
auc_roc = roc_auc_score(target_valid, probabilities_one_valid)


print('AUC-ROC:', auc_roc)

F1: 0.4519774011299435
AUC-ROC: 0.7148618731059345


It seems that with upsampling both our F1 score and AUC-ROC go down. I even ran through multiple different iterations of the repeat number and it was highest at 1 (meaning it was not repeated. I left it at 10 for an example, but my assumption prior to running this model was incorrect and we likely will not want to incorporate upsampling into our final model. Next we will try downsampling to see if removing some of the more frequent values of clients who stayed will improve our model more.

In [15]:
def downsample(features, target, fraction):
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    features_downsampled = pd.concat(
        [features_zeros.sample(frac=fraction, random_state=12345)]
        + [features_ones]
    )
    target_downsampled = pd.concat(
        [target_zeros.sample(frac=fraction, random_state=12345)]
        + [target_ones]
    )

    features_downsampled, target_downsampled = shuffle(
        features_downsampled, target_downsampled, random_state=12345
    )

    return features_downsampled, target_downsampled

features_downsampled, target_downsampled = downsample(
    features_train, target_train, 0.1
)

model = LogisticRegression(random_state = 12345, solver = 'liblinear', class_weight= 'balanced')
model.fit(features_downsampled, target_downsampled)
predicted_valid = model.predict(features_valid)

print('F1:', f1_score(target_valid, predicted_valid))

probabilities_valid = model.predict_proba(features_valid)
probabilities_one_valid = probabilities_valid[:, 1]
auc_roc = roc_auc_score(target_valid, probabilities_one_valid)


print('AUC-ROC:', auc_roc)

F1: 0.47421093148575827
AUC-ROC: 0.7464054343420902


It seems that downsampling did improve our F1 Score and our AUC-ROC slightly, so we will keep that in our next model. Next we will try adjusting our threshold values.

In [16]:
model = LogisticRegression(random_state=12345, solver='liblinear', class_weight= 'balanced')
model.fit(features_downsampled, target_downsampled)
probabilities_valid = model.predict_proba(features_valid)
probabilities_one_valid = probabilities_valid[:, 1]

for threshold in np.arange(0.5, 0.8, 0.02):
    predicted_valid = probabilities_one_valid > threshold
    precision = precision_score(target_valid,predicted_valid)
    recall = recall_score(target_valid,predicted_valid)
    f1 = f1_score(target_valid, predicted_valid)

    print(
            'Threshold = {:.2f} | Precision = {:.3f}, Recall = {:.3f}, F1 = {:.3f}'.format(
                threshold, precision, recall, f1
            )
        )

Threshold = 0.50 | Precision = 0.350, Recall = 0.737, F1 = 0.474
Threshold = 0.52 | Precision = 0.355, Recall = 0.713, F1 = 0.474
Threshold = 0.54 | Precision = 0.363, Recall = 0.684, F1 = 0.474
Threshold = 0.56 | Precision = 0.373, Recall = 0.658, F1 = 0.476
Threshold = 0.58 | Precision = 0.392, Recall = 0.648, F1 = 0.489
Threshold = 0.60 | Precision = 0.390, Recall = 0.617, F1 = 0.478
Threshold = 0.62 | Precision = 0.395, Recall = 0.593, F1 = 0.474
Threshold = 0.64 | Precision = 0.404, Recall = 0.572, F1 = 0.474
Threshold = 0.66 | Precision = 0.412, Recall = 0.545, F1 = 0.470
Threshold = 0.68 | Precision = 0.424, Recall = 0.507, F1 = 0.462
Threshold = 0.70 | Precision = 0.434, Recall = 0.481, F1 = 0.456
Threshold = 0.72 | Precision = 0.444, Recall = 0.457, F1 = 0.450
Threshold = 0.74 | Precision = 0.454, Recall = 0.435, F1 = 0.444
Threshold = 0.76 | Precision = 0.472, Recall = 0.416, F1 = 0.442
Threshold = 0.78 | Precision = 0.478, Recall = 0.390, F1 = 0.430
Threshold = 0.80 | Precis

<div class="alert alert-block alert-success">
<b>Reviewer's comment V1</b> <a class="tocSkip"></a>

Good job!

</div>

Here we can see that our model has the best f1 score at a threshold of 0.6. But we are still not achieving our minimum F1 score of 0.59. This tells us that logistic regression is likely not the best algorithm for our model. We will switch to RandomForestClassifier and see if we can get better results. 

In [17]:
model = RandomForestClassifier(random_state=12345)
model.fit(features_train, target_train)
predicted_valid = model.predict(features_valid)

print('F1:', f1_score(target_valid, predicted_valid))

probabilities_valid = model.predict_proba(features_valid)
probabilities_one_valid = probabilities_valid[:, 1]
auc_roc = roc_auc_score(target_valid, probabilities_one_valid)


print('AUC-ROC:', auc_roc)

F1: 0.4844290657439447
AUC-ROC: 0.8240999219690417


Just switching to Random Forest model improved our F1 score and our AUC-ROC. Now lets try setting our class_weight parameter to balanced.

In [18]:
model = RandomForestClassifier(random_state=12345, class_weight='balanced')
model.fit(features_train, target_train)
predicted_valid = model.predict(features_valid)

print('F1:', f1_score(target_valid, predicted_valid))

probabilities_valid = model.predict_proba(features_valid)
probabilities_one_valid = probabilities_valid[:, 1]
auc_roc = roc_auc_score(target_valid, probabilities_one_valid)


print('AUC-ROC:', auc_roc)

F1: 0.44563279857397503
AUC-ROC: 0.8314659839461889


In our Random Forest model this seems to lower our F1 score so we will not carry it through. Let's try to upsample with a Random Forest model. Since we created upsampled data sets we can just input those intout our training model.

In [19]:
model = RandomForestClassifier(random_state=12345)
model.fit(features_upsampled,target_upsampled)
predicted_valid = model.predict(features_valid)

print('F1:', f1_score(target_valid, predicted_valid))

probabilities_valid = model.predict_proba(features_valid)
probabilities_one_valid = probabilities_valid[:, 1]
auc_roc = roc_auc_score(target_valid, probabilities_one_valid)


print('AUC-ROC:', auc_roc)

F1: 0.5568181818181818
AUC-ROC: 0.8312081490935705


This did a great job increasing our F1 score to over 55%. Now we can see if changing our threshold can continue to improve our model.

In [20]:
model = RandomForestClassifier(random_state=12345)
model.fit(features_upsampled,target_upsampled)
probabilities_valid = model.predict_proba(features_valid)
probabilities_one_valid = probabilities_valid[:, 1]

for threshold in np.arange(0.3, 0.6, 0.02):
    predicted_valid = probabilities_one_valid > threshold
    precision = precision_score(target_valid,predicted_valid)
    recall = recall_score(target_valid,predicted_valid)
    f1 = f1_score(target_valid, predicted_valid)

    print(
            'Threshold = {:.2f} | Precision = {:.3f}, Recall = {:.3f}, F1 = {:.3f}'.format(
                threshold, precision, recall, f1
            )
        )

Threshold = 0.30 | Precision = 0.479, Recall = 0.732, F1 = 0.579
Threshold = 0.32 | Precision = 0.492, Recall = 0.703, F1 = 0.579
Threshold = 0.34 | Precision = 0.512, Recall = 0.691, F1 = 0.589
Threshold = 0.36 | Precision = 0.534, Recall = 0.658, F1 = 0.589
Threshold = 0.38 | Precision = 0.561, Recall = 0.639, F1 = 0.597
Threshold = 0.40 | Precision = 0.570, Recall = 0.612, F1 = 0.591
Threshold = 0.42 | Precision = 0.580, Recall = 0.581, F1 = 0.581
Threshold = 0.44 | Precision = 0.598, Recall = 0.545, F1 = 0.571
Threshold = 0.46 | Precision = 0.632, Recall = 0.526, F1 = 0.574
Threshold = 0.48 | Precision = 0.661, Recall = 0.500, F1 = 0.569
Threshold = 0.50 | Precision = 0.685, Recall = 0.469, F1 = 0.557
Threshold = 0.52 | Precision = 0.708, Recall = 0.440, F1 = 0.543
Threshold = 0.54 | Precision = 0.739, Recall = 0.426, F1 = 0.540
Threshold = 0.56 | Precision = 0.769, Recall = 0.407, F1 = 0.532
Threshold = 0.58 | Precision = 0.773, Recall = 0.376, F1 = 0.506


<div class="alert alert-block alert-success">
<b>Reviewer's comment V1</b> <a class="tocSkip"></a>

Correct

</div>

Here we can see that setting our threshold to 0.38 will give us our highest F1 score of 0.597 (enough to pass). Now we can tune our hyperparameters to see if we can improve our model quality even more before our final test. 

In [21]:
best_score = 0
best_est = 0
for est in range(10, 100, 10): # choose hyperparameter range
    model = RandomForestClassifier(random_state=12345, n_estimators= est) # set number of trees
    model.fit(features_upsampled,target_upsampled) # train model on training set
    predicted_valid = model.predict(features_valid)
    score = f1_score(target_valid, predicted_valid) # calculate F1 score on validation set
    if score > best_score:
        best_score = score# save best F1 score on validation set
        best_est = est# save number of estimators corresponding to best F1 score
        
print("F1 score of the best model on the validation set (n_estimators = {}): {}".format(best_est, best_score))

F1 score of the best model on the validation set (n_estimators = 90): 0.5633001422475106


In [22]:
best_score = 0
best_depth = 0
for depth in range(10, 100, 10): # choose hyperparameter range
    model = RandomForestClassifier(random_state=12345, max_depth=depth) # set number of trees
    model.fit(features_upsampled,target_upsampled) # train model on training set
    predicted_valid = model.predict(features_valid)
    score = f1_score(target_valid, predicted_valid) # calculate F1 score on validation set
    if score > best_score:
        best_score = score# save best F1 score on validation set
        best_depth = depth# save number of estimators corresponding to best F1 score
        
print("F1 score of the best model on the validation set (max_depth = {}): {}".format(best_depth, best_score))

F1 score of the best model on the validation set (max_depth = 90): 0.58952496954933


In [23]:
model = RandomForestClassifier(random_state=12345, n_estimators= 90, max_depth=90, class_weight = 'balanced')
model.fit(features_upsampled,target_upsampled)
predicted_valid = model.predict(features_valid)


probabilities_valid = model.predict_proba(features_valid)
probabilities_one_valid = probabilities_valid[:, 1]

best_threshold = 0.38
predicted_valid = probabilities_one_valid > best_threshold

auc_roc = roc_auc_score(target_valid, probabilities_one_valid)

print('F1:', f1_score(target_valid, predicted_valid))

print('AUC-ROC:', auc_roc)

F1: 0.5957918050941308
AUC-ROC: 0.8326659367646791


In [24]:
model = RandomForestClassifier(random_state=12345, n_estimators= 90, max_depth=90, class_weight = 'balanced')
model.fit(features_upsampled,target_upsampled)
probabilities_valid = model.predict_proba(features_valid)
probabilities_one_valid = probabilities_valid[:, 1]

for threshold in np.arange(0.3, 0.6, 0.02):
    predicted_valid = probabilities_one_valid > threshold
    precision = precision_score(target_valid,predicted_valid)
    recall = recall_score(target_valid,predicted_valid)
    f1 = f1_score(target_valid, predicted_valid)

    print(
            'Threshold = {:.2f} | Precision = {:.3f}, Recall = {:.3f}, F1 = {:.3f}'.format(
                threshold, precision, recall, f1
            )
        )

Threshold = 0.30 | Precision = 0.459, Recall = 0.746, F1 = 0.569
Threshold = 0.32 | Precision = 0.490, Recall = 0.727, F1 = 0.585
Threshold = 0.34 | Precision = 0.512, Recall = 0.701, F1 = 0.592
Threshold = 0.36 | Precision = 0.532, Recall = 0.675, F1 = 0.595
Threshold = 0.38 | Precision = 0.555, Recall = 0.644, F1 = 0.596
Threshold = 0.40 | Precision = 0.583, Recall = 0.629, F1 = 0.605
Threshold = 0.42 | Precision = 0.601, Recall = 0.598, F1 = 0.600
Threshold = 0.44 | Precision = 0.617, Recall = 0.567, F1 = 0.591
Threshold = 0.46 | Precision = 0.640, Recall = 0.548, F1 = 0.590
Threshold = 0.48 | Precision = 0.660, Recall = 0.514, F1 = 0.578
Threshold = 0.50 | Precision = 0.685, Recall = 0.478, F1 = 0.563
Threshold = 0.52 | Precision = 0.709, Recall = 0.455, F1 = 0.554
Threshold = 0.54 | Precision = 0.730, Recall = 0.414, F1 = 0.528
Threshold = 0.56 | Precision = 0.784, Recall = 0.400, F1 = 0.529
Threshold = 0.58 | Precision = 0.805, Recall = 0.385, F1 = 0.521


In [26]:
model = RandomForestClassifier(random_state=12345, n_estimators= 90, max_depth=90, class_weight = 'balanced')
model.fit(features_upsampled,target_upsampled)
predicted_valid = model.predict(features_valid)


probabilities_valid = model.predict_proba(features_valid)
probabilities_one_valid = probabilities_valid[:, 1]

best_threshold = 0.40
predicted_valid = probabilities_one_valid > best_threshold

auc_roc = roc_auc_score(target_valid, probabilities_one_valid)

print('F1:', f1_score(target_valid, predicted_valid))

print('AUC-ROC:', auc_roc)

F1: 0.6052934407364787
AUC-ROC: 0.8326659367646791


<div class="alert alert-block alert-success">
<b>Reviewer's comment V1</b> <a class="tocSkip"></a>

Well done!

</div>

This final model gives us our best F1 score demonstrating acceptable quality with a good AUC-ROC that is well above a random model. I re-ran the threshold levels after tuning my hyperparameters, because I still wasn't getting a good enough result on the test set once we got there.  Now our model is of higher quality for our test. 

## Final Testing

In [27]:
model = RandomForestClassifier(random_state=12345, n_estimators= 90, max_depth=90, class_weight = 'balanced')
model.fit(features_upsampled,target_upsampled)
final_test = model.predict(features_test)


probabilities_test = model.predict_proba(features_test)
probabilities_one_test = probabilities_test[:, 1]

best_threshold = 0.40
predicted_test = probabilities_one_test > best_threshold

auc_roc = roc_auc_score(target_test, probabilities_one_test)

print('F1:', f1_score(target_test, predicted_test))

print('AUC-ROC:', auc_roc)

F1: 0.5905420991926182
AUC-ROC: 0.8324166393082595


<div class="alert alert-block alert-success">
<b>Reviewer's comment V1</b> <a class="tocSkip"></a>

Great work!

</div>

In conclusion, we now have a model that based on our F1 score can predict if customers are going to leave Beta Bank with relatively good recall and precision.  Additionally, with an AUC-ROC of well above 0.5 we can say that this model is better than a random model and we wouldn't just be guessing that customers may leave. 