Hello Edgardo!

I’m happy to review your project today.
I will mark your mistakes and give you some hints on how to fix them. We are getting ready for a real job, where your team leader or senior colleague will do exactly the same. Don't worry, and enjoy studying!

Below you will find my comments - **please do not move, modify or delete them**.

You can find my comments in green, yellow or red boxes like this:

<div class="alert alert-block alert-success">
<b>Reviewer's comment</b> <a class="tocSkip"></a>

Success. Everything is done successfully.
</div>

<div class="alert alert-block alert-warning">
<b>Reviewer's comment</b> <a class="tocSkip"></a>

Remarks. Some recommendations.
</div>

<div class="alert alert-block alert-danger">

<b>Reviewer's comment</b> <a class="tocSkip"></a>

Needs fixing. The block requires some corrections. Work can't be accepted with the red comments.
</div>

You can answer me by using this:

<div class="alert alert-block alert-info">
<b>Student answer.</b> <a class="tocSkip"></a>

Text here.
</div>

In today’s highly competitive banking environment, customer retention has become a crucial focus for Beta Bank. Acquiring new customers is far more expensive than retaining existing ones, making it vital to identify customers who may soon leave. This project aims to build a predictive model to forecast customer churn based on clients’ historical behavior and contract terminations. By developing an accurate model, Beta Bank can take proactive measures to retain valuable customers before they decide to leave. The main objective is to maximize the F1 score of the model, ensuring it reaches at least 0.59, while also measuring the AUC-ROC metric to evaluate the model's ability to distinguish between customers who will leave and those who will stay. Throughout this project, we will explore various methods to handle class imbalance, evaluate model performance, and fine-tune our approach to achieve optimal results.
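As a rough illustration of the two metrics named above, the sketch below computes F1 and AUC-ROC on a tiny made-up example. The labels and scores are hypothetical and are not taken from the Beta Bank data. F1 is computed from hard class labels, so it depends on the decision threshold (0.5 here, an assumption), while AUC-ROC depends only on how the scores rank the two classes.

```python
from sklearn.metrics import f1_score, roc_auc_score

# Hypothetical true labels and predicted churn probabilities (not project data)
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# F1 needs hard class labels, so apply an assumed 0.5 threshold
y_pred = [int(score >= 0.5) for score in y_score]

f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)

print('F1 at threshold 0.5:', f1)
print('AUC-ROC:', auc)
```

The two numbers can disagree: a model may rank customers well (high AUC-ROC) and still get a modest F1 if the threshold does not suit the class balance.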

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score, confusion_matrix
from sklearn.utils import resample
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV

In [2]:
# Load the data
data = pd.read_csv('/datasets/Churn.csv')

# Drop irrelevant columns
data = data.drop(['RowNumber', 'CustomerId', 'Surname'], axis=1)

# Display the first few rows to confirm
data.head()


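|   | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 619 | France | Female | 42 | 2.0 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 608 | Spain | Female | 41 | 1.0 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 502 | France | Female | 42 | 8.0 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 699 | France | Female | 39 | 1.0 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 850 | Spain | Female | 43 | 2.0 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |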


In [3]:
# One-Hot Encoding for categorical variables
data_ohe = pd.get_dummies(data, drop_first=True)

# Split the features and target
target = data_ohe['Exited']
features = data_ohe.drop('Exited', axis=1)


In [4]:
# Investigate the class balance
print(target.value_counts(normalize=True))

# Print a short conclusion about the class balance
if target.value_counts(normalize=True)[1] < 0.5:
    print("The classes are imbalanced. We need to take this into account when training the model.")
else:
    print("The classes are fairly balanced.")


0    0.7963
1    0.2037
Name: Exited, dtype: float64
The classes are imbalanced. We need to take this into account when training the model.


In [5]:
# Split the data into training (60%), validation (20%), and test (20%) sets
features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=0.2, random_state=12345)
features_train, features_valid, target_train, target_valid = train_test_split(
    features_train, target_train, test_size=0.25, random_state=12345)
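
Because the classes are imbalanced, one option that this notebook does not use is to pass `stratify` to `train_test_split`, so that each subset keeps roughly the same share of churned customers. The sketch below shows the idea on synthetic data; the `demo_*` names and the 1000-row frame are hypothetical stand-ins, not the project data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project data: 1000 rows, 20% positives
rng = np.random.RandomState(12345)
demo_features = pd.DataFrame({'x': rng.normal(size=1000)})
demo_target = pd.Series([1] * 200 + [0] * 800)

# First split off 20% for the test set, preserving the class ratio
f_rest, f_test, t_rest, t_test = train_test_split(
    demo_features, demo_target, test_size=0.2,
    stratify=demo_target, random_state=12345)

# Then take 25% of the remainder (20% of the total) for validation
f_train, f_valid, t_train, t_valid = train_test_split(
    f_rest, t_rest, test_size=0.25,
    stratify=t_rest, random_state=12345)

# Report the size and positive-class share of each subset
for name, part in [('train', t_train), ('valid', t_valid), ('test', t_test)]:
    print(name, len(part), round(part.mean(), 3))
```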


In [6]:
# Standardizing numeric features
scaler = StandardScaler()
numeric_columns = ['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'EstimatedSalary']

# Create deep copies to avoid SettingWithCopyWarning
features_train = features_train.copy()
features_valid = features_valid.copy()
features_test = features_test.copy()

# Use .loc[] to apply scaling to numeric columns
features_train.loc[:, numeric_columns] = scaler.fit_transform(features_train[numeric_columns])
features_valid.loc[:, numeric_columns] = scaler.transform(features_valid[numeric_columns])
features_test.loc[:, numeric_columns] = scaler.transform(features_test[numeric_columns])


In [7]:
# Check for missing values
print("Missing values in training set:")
print(features_train.isnull().sum())
print("Missing values in validation set:")
print(features_valid.isnull().sum())
print("Missing values in test set:")
print(features_test.isnull().sum())

# Fill missing values (only Tenure has any) with the column mean.
# Note: each set is filled with its own mean, not with the training-set mean.
features_train = features_train.fillna(features_train.mean())
features_valid = features_valid.fillna(features_valid.mean())
features_test = features_test.fillna(features_test.mean())


Missing values in training set:
CreditScore            0
Age                    0
Tenure               570
Balance                0
NumOfProducts          0
HasCrCard              0
IsActiveMember         0
EstimatedSalary        0
Geography_Germany      0
Geography_Spain        0
Gender_Male            0
dtype: int64
Missing values in validation set:
CreditScore            0
Age                    0
Tenure               173
Balance                0
NumOfProducts          0
HasCrCard              0
IsActiveMember         0
EstimatedSalary        0
Geography_Germany      0
Geography_Spain        0
Gender_Male            0
dtype: int64
Missing values in test set:
CreditScore            0
Age                    0
Tenure               166
Balance                0
NumOfProducts          0
HasCrCard              0
IsActiveMember         0
EstimatedSalary        0
Geography_Germany      0
Geography_Spain        0
Gender_Male            0
dtype: int64


In [8]:
# Train initial model
model = DecisionTreeClassifier(random_state=12345)
model.fit(features_train, target_train)
predictions_valid = model.predict(features_valid)

# Evaluate the model
f1_initial = f1_score(target_valid, predictions_valid)
print('Initial F1 score:', f1_initial)
print('Confusion Matrix:\n', confusion_matrix(target_valid, predictions_valid))


Initial F1 score: 0.48762376237623756
Confusion Matrix:
 [[1389  220]
 [ 194  197]]


Data preprocessing consisted of dropping the irrelevant columns, applying One-Hot Encoding to the categorical variables, and separating the target from the features. The class balance check shows that the data is imbalanced: about 80% of customers stayed and about 20% exited, so the imbalance has to be taken into account during model training. The data was split into training, validation, and test sets (60/20/20), and the numeric features were standardized with a scaler fitted on the training set. Missing values, which occur only in the "Tenure" column, were filled with the column mean. An initial Decision Tree model, trained without any imbalance handling, reached an F1 score of approximately 0.488 on the validation set. Its confusion matrix shows 197 true positives, 1389 true negatives, 220 false positives, and 194 false negatives. The next step is to address the class imbalance with techniques such as class weights, upsampling the minority class, or downsampling the majority class.
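
The notebook fills each subset with its own column mean. A common alternative, sketched here on made-up data, is to compute the mean on the training set only and reuse it for the validation and test sets, so that no statistics from held-out data enter the preprocessing. The `demo_*` frames are hypothetical and only illustrate the pattern.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature feature frames with gaps in 'Tenure' (not project data)
demo_train = pd.DataFrame({'Tenure': [1.0, 3.0, np.nan, 5.0]})
demo_valid = pd.DataFrame({'Tenure': [np.nan, 2.0]})

# Learn the fill value from the training set only
train_means = demo_train.mean()

# Reuse the same training means for every subset
demo_train_filled = demo_train.fillna(train_means)
demo_valid_filled = demo_valid.fillna(train_means)

print(demo_train_filled['Tenure'].tolist())
print(demo_valid_filled['Tenure'].tolist())
```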

<div class="alert alert-block alert-danger">
<b>Reviewer's comment V1</b> <a class="tocSkip"></a>

Everything is correct. But:
1. Could you split the code from this cell into separate cells according to the task each part solves? We use Jupyter notebooks to split code in such a way that one cell solves only one problem. It's a common practice.
2. Before training the model, you need to investigate the class balance in the target and write a corresponding conclusion about it.

</div>

<div class="alert alert-block alert-info">
<b>Student answer.</b> <a class="tocSkip"></a>

Divided the code and wrote a conclusion on the class balance
</div>

<div class="alert alert-block alert-success">
<b>Reviewer's comment V2</b> <a class="tocSkip"></a>

Everything is correct. Good job!

</div>

<div class="alert alert-block alert-danger">
<b>Reviewer's comment V1</b> <a class="tocSkip"></a>

Correct. But this looks like duplicate code. You have the same code in your huge cell above. You need to clean up your notebook and remove all duplicate code.

</div>

<div class="alert alert-block alert-info">
<b>Student answer.</b> <a class="tocSkip"></a>

Eliminated duplicate code.
</div>

<div class="alert alert-block alert-success">
<b>Reviewer's comment V2</b> <a class="tocSkip"></a>

Thank you!

</div>

In this step, an initial Decision Tree model is trained without addressing the class imbalance. It serves as a baseline for the imbalance-handling techniques that follow. The class distribution is also analyzed to quantify the imbalance between customers who stayed and those who left the bank. The baseline is evaluated with the F1 score, the harmonic mean of precision and recall, which shows how the imbalance affects the model's ability to predict churn.
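
To make the link between the confusion matrix and the F1 score concrete, the short calculation below recomputes the baseline F1 from the four counts printed for the initial Decision Tree. It assumes scikit-learn's layout, in which rows are true classes and columns are predicted classes.

```python
# Counts from the baseline confusion matrix [[1389, 220], [194, 197]]:
# rows are true classes, columns are predicted classes
tn, fp = 1389, 220
fn, tp = 194, 197

precision = tp / (tp + fp)   # share of predicted churners who really left
recall = tp / (tp + fn)      # share of actual churners the model caught
f1 = 2 * precision * recall / (precision + recall)

print('Precision:', round(precision, 4))
print('Recall:', round(recall, 4))
print('F1:', round(f1, 4))
```

The result agrees with the F1 score reported by `f1_score` for the baseline model.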

In [9]:
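# Upsample the minority class (sampling with replacement) until it has
# len(majority) * repeat rows; repeat=1 yields a 50/50 class balance
def upsample(features, target, repeat):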
    features_majority = features[target == 0]
    features_minority = features[target == 1]
    target_majority = target[target == 0]
    target_minority = target[target == 1]

    features_minority_upsampled = resample(features_minority, 
                                           replace=True, 
                                           n_samples=len(features_majority) * repeat, 
                                           random_state=12345)
    target_minority_upsampled = resample(target_minority, 
                                         replace=True, 
                                         n_samples=len(target_majority) * repeat, 
                                         random_state=12345)
    
    features_upsampled = pd.concat([features_majority, features_minority_upsampled])
    target_upsampled = pd.concat([target_majority, target_minority_upsampled])
    
    return features_upsampled, target_upsampled

# Apply upsampling
features_train_upsampled, target_train_upsampled = upsample(features_train, target_train, 1)

# Refit the Decision Tree on the upsampled data
# (this overwrites the baseline fit stored in `model`)
model.fit(features_train_upsampled, target_train_upsampled)
predictions_valid_upsampled = model.predict(features_valid)

# Evaluate the model
f1_upsampled = f1_score(target_valid, predictions_valid_upsampled)
print('F1 score after upsampling:', f1_upsampled)


F1 score after upsampling: 0.45322793148880103


In [10]:
# Train model with class weights
model_weighted = DecisionTreeClassifier(random_state=12345, class_weight='balanced')
model_weighted.fit(features_train, target_train)
predictions_valid_weighted = model_weighted.predict(features_valid)

# Evaluate the model
f1_weighted = f1_score(target_valid, predictions_valid_weighted)
print('F1 score with class weights:', f1_weighted)

F1 score with class weights: 0.4776500638569604


<div class="alert alert-block alert-success">
<b>Reviewer's comment V1</b> <a class="tocSkip"></a>

Good job!

</div>

To address the class imbalance observed in the data, two techniques are implemented: upsampling and class weighting. Upsampling increases the number of minority-class instances by sampling existing rows with replacement until the two classes are the same size. Class weighting leaves the data unchanged and instead gives errors on the minority class a higher weight during training. Both methods are compared by F1 score on the validation set. For the Decision Tree, neither improved on the baseline F1 of about 0.488: upsampling gave about 0.453 and class weighting about 0.478.
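
As a rough sketch of what `class_weight='balanced'` does, the snippet below computes the weights scikit-learn would assign for a target with the class shares seen in the balance check (79.63% and 20.37%). The synthetic target and its size of 10,000 rows are assumptions for illustration; the weights actually used by the model are derived from the training subset.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Synthetic target with the class shares from the balance check (assumed size)
demo_target = np.array([0] * 7963 + [1] * 2037)

# 'balanced' assigns n_samples / (n_classes * class_count) to each class
weights = compute_class_weight(
    class_weight='balanced', classes=np.array([0, 1]), y=demo_target)

print(dict(zip([0, 1], weights.round(4))))
```

The minority class receives a weight roughly four times that of the majority class, which mirrors the roughly 4:1 ratio between the classes.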

In [11]:
# Train a Random Forest model
rf_model = RandomForestClassifier(n_estimators=100, random_state=12345)
rf_model.fit(features_train_upsampled, target_train_upsampled)
rf_predictions_valid = rf_model.predict(features_valid)

# Evaluate Random Forest
f1_rf = f1_score(target_valid, rf_predictions_valid)
print('Random Forest F1 score:', f1_rf)


Random Forest F1 score: 0.5928057553956834


This step moves from a single Decision Tree to a Random Forest and tunes its hyperparameters. A Random Forest with 100 trees is first trained on the upsampled data. A grid search with 3-fold cross-validation then varies the number of estimators, the maximum tree depth, and the minimum number of samples per split, with balanced class weights, on the original training data. Each model is checked with the F1 score on the validation set to identify the best configuration for predicting customer churn.

In [12]:
# Define the hyperparameter grid for Random Forest
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'class_weight': ['balanced']  # keep the class weight balanced
}


In [13]:
# Initialize the Random Forest classifier
rf = RandomForestClassifier(random_state=12345)

# Initialize GridSearchCV
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, scoring='f1', cv=3)

# Fit GridSearchCV on training data
grid_search.fit(features_train, target_train)

# Get the best model after tuning
best_rf_model = grid_search.best_estimator_

# Print the best hyperparameters
print('Best Hyperparameters:', grid_search.best_params_)


Best Hyperparameters: {'class_weight': 'balanced', 'max_depth': 20, 'min_samples_split': 10, 'n_estimators': 100}


In [14]:
# Predict and evaluate using the best model
predictions_valid_tuned = best_rf_model.predict(features_valid)
f1_tuned = f1_score(target_valid, predictions_valid_tuned)
print('F1 score after hyperparameter tuning:', f1_tuned)


F1 score after hyperparameter tuning: 0.5975443383356072


In [15]:
# Final test: evaluate rf_model, the Random Forest trained on upsampled data in In [11]
# (note: this is not the tuned best_rf_model from the grid search)
final_predictions_test = rf_model.predict(features_test)
final_f1_test = f1_score(target_test, final_predictions_test)

# Get AUC-ROC score
final_probabilities_test = rf_model.predict_proba(features_test)[:, 1]
roc_auc = roc_auc_score(target_test, final_probabilities_test)

print('Final F1 score on test set:', final_f1_test)
print('AUC-ROC on test set:', roc_auc)


Final F1 score on test set: 0.5986394557823129
AUC-ROC on test set: 0.856061970816069


<div class="alert alert-block alert-danger">
<b>Reviewer's comment V1</b> <a class="tocSkip"></a>

Well done! The last thing you should do is tune hyperparameters for at least one model while handling the imbalance.

</div>

<div class="alert alert-block alert-info">
<b>Student answer.</b> <a class="tocSkip"></a>

Tuned hyperparameters while handling the imbalance and modified the conclusion.
</div>

<div class="alert alert-block alert-success">
<b>Reviewer's comment V2</b> <a class="tocSkip"></a>

Well done!

</div>

This project developed a predictive model for customer churn at Beta Bank. We prepared the dataset by dropping irrelevant columns, encoding categorical variables, scaling numeric features, and filling missing values. A baseline Decision Tree reached an F1 score of only about 0.488 on the validation set, and neither upsampling nor class weighting improved it, so we moved to Random Forest models. A Random Forest with balanced class weights, tuned with GridSearchCV, reached an F1 score of about 0.598 on the validation set. The test-set evaluation was run on the Random Forest trained on upsampled data, which scored an F1 of about 0.599, above the 0.59 target, and an AUC-ROC of about 0.856, indicating that it separates churned customers from retained ones well. The insights gained from this model can help Beta Bank implement targeted strategies to reduce customer churn, ultimately improving customer retention and reducing costs.

<div class="alert alert-block alert-info">
<b>Student answer.</b> <a class="tocSkip"></a>

Thank you for the feedback!
</div>