<div style="border:solid blue 2px; padding: 20px">

**Overall Summary of the Project**

Hi Michael! You’ve demonstrated a strong understanding of the full modeling workflow in this project. From preprocessing to feature engineering, model training, and class imbalance correction, your notebook is very well thought-out and clearly documented. Excellent work!

---

📊 **Assessment Summary**

| Criteria                              | Rating     | Comments                                                                 |
|--------------------------------------|------------|--------------------------------------------------------------------------|
| Data Preprocessing                   | ✅ Solid    | Clear and effective. Dropped unneeded columns, handled nulls properly.   |
| Class Balance Analysis               | ✅ Clear     | Identified the imbalance, visualized it, and explained the impact.       |
| Baseline Modeling                    | ✅ Complete  | Tested Logistic Regression, Decision Tree, and Random Forest.            |
| Imbalance Handling (2+ methods)      | ✅ Thorough  | Used `class_weight`, `upsampling`, and `downsampling`.                   |
| Metric Usage (F1 + ROC-AUC)          | ✅ Well done | F1 was used as the main metric. ROC-AUC supported the evaluation.        |
| Final Evaluation on Test Set         | ✅ Present   | Final models were tested and evaluated fairly on unseen test data.       |
| Performance Threshold Met (F1 ≥ 0.59)| ✅ Achieved  | Achieved F1 scores above 0.61 on validation and ~0.608 on test.          |

---

✨ **Strengths**

- ✅ **Notebook structure is excellent** – clear sections, headings, and thoughtful use of comments and markdown.
- ✅ **Exploratory plots** and boxplots across features were useful for spotting patterns and distributions.
- ✅ **Reusable helper functions** for modeling and metrics – a great sign of code maturity.
- ✅ **Experiment tracking (`results` DataFrame)** made comparison clear and insightful.
- ✅ **Balanced and upsampled models both exceeded the target F1** with excellent ROC-AUCs.

---

🛠️ **Suggestions for Improvement (Optional)**

These are minor polish ideas and not required for project approval:

1. **Stratify train-test-validation splits**: While your splits are fine, stratifying ensures class proportions stay consistent across all sets (`stratify=target`).

2. **Avoid `SettingWithCopyWarning`**: When scaling numeric features, make sure to `.copy()` the DataFrame or use `.loc` to avoid pandas copy warnings.

3. **Hyperparameter tuning**: You used basic grid-style loops. For large-scale models like Random Forest, consider `GridSearchCV` or `RandomizedSearchCV` for efficiency.

4. **Trim repeated output**: A few cells are repeated (e.g. `df.head(0)`), as are the markdown feature lists; these could be trimmed for clarity.

---

Michael, this is a well-structured and well-documented notebook that successfully applies a variety of modeling techniques to solve a business problem. Your thoughtful approach to class imbalance and use of different model evaluation strategies reflects great understanding. Nicely done!

Congrats again – excellent job! 🎉👏
</div>

# Title
Beta Bank Customer Churn Supervised Learning Analysis

# Description
Beta Bank customers are leaving: little by little, chipping away every month. The bankers figured out it’s cheaper to save the existing customers rather than to attract new ones.

We need to predict whether a customer will leave the bank soon. You have the data on clients’ past behavior and termination of contracts with the bank.

# Procedure
- Download and prepare the data. Explain the procedure.
-- Read data
-- Check for issues
-- Resolve any issues found
- Examine the balance of classes. Train the model without taking into account the imbalance. Briefly describe your findings.
- Improve the quality of the model. Make sure you use at least two approaches to fixing class imbalance. Use the training set to pick the best parameters. Train different models on training and validation sets. Find the best one. Briefly describe your findings.
-- Build a model with the maximum possible F1 score. To pass the project, you need an F1 score of at least 0.59. Check the F1 for the test set.
-- Additionally, measure the AUC-ROC metric and compare it with the F1.
-- Perform the final testing.
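As a reminder of what the 0.59 target measures, F1 is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts; the function name `f1_from_counts` and the counts are hypothetical and are not taken from this project's results.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined)
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only
example_f1 = f1_from_counts(tp=60, fp=40, fn=40)
print(round(example_f1, 3))
```

Because F1 depends on a single decision threshold while AUC-ROC summarizes ranking quality over all thresholds, the two metrics can disagree, which is why both are reported.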

In [None]:
# Import libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split

from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score, classification_report, confusion_matrix, precision_score, recall_score
from sklearn.utils import shuffle

In [None]:
# Declarations

# Target F1 to reach
target_f1 = 0.59 # Minimum F1 required to pass the project

# Track results
results = pd.DataFrame(columns=['model', 'f1', 'roc'])

# Import Data

In [None]:
# Import data

df = pd.read_csv('/datasets/Churn.csv')

# Check and resolve any issues

In [None]:
print(df.info())

Features

- RowNumber — data string index
- CustomerId — unique customer identifier
- Surname — surname
- CreditScore — credit score
- Geography — country of residence
- Gender — gender
- Age — age
- Tenure — period of maturation for a customer’s fixed deposit (years)
- Balance — account balance
- NumOfProducts — number of banking products used by the customer
- HasCrCard — customer has a credit card
- IsActiveMember — customer’s activeness
- EstimatedSalary — estimated salary

Target

- Exited — customer has left


In [None]:
print(df.head(0))

Column names contain uppercase letters; this could be corrected.

RowNumber, CustomerId, Surname are not needed for the analysis.

In [None]:
# Drop unnecessary columns:

df = df[['CreditScore', 'Geography', 'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary', 'Exited']]

In [None]:
print(df.describe())

In [None]:
print('Number of duplicates: ', df.duplicated().sum())

In [None]:
print('Number of null entries:')
print(df.isna().sum())

Tenure contains 909 null entries. 909/10000 = 9.09%

In [None]:
print(df[df['Tenure'].isna()].head(5))

In [None]:
sns.histplot(data=df, x="Tenure", kde=True)
plt.tight_layout()  # Adjust layout to prevent overlap
plt.show()


At 9.09% of the data, the missing values are too numerous to fill without influencing the analysis, so the affected rows will be dropped.

In [None]:
# dropping null values due to quantity

df = df.dropna()
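One way to sanity-check this decision is to compare the churn rate of rows with and without a `Tenure` value; if the rates are similar, dropping the rows is less likely to bias the target. The following is a minimal sketch on a made-up toy frame, not on the real data, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the churn data (values are made up)
toy = pd.DataFrame({
    "Tenure": [1.0, np.nan, 3.0, np.nan, 5.0, 2.0],
    "Exited": [0, 1, 0, 0, 1, 0],
})

# Label each row by whether Tenure is missing, then compare churn rates
group = np.where(toy["Tenure"].isna(), "missing", "present")
churn_rate = toy.groupby(group)["Exited"].mean()
print(churn_rate)
```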

Gender and geography are categorical. Using OHE to convert to numerical features:

In [None]:
# 'Gender' and 'Geography' are categorical:

print('Gender: ', df['Gender'].unique())
print('Geography: ', df['Geography'].unique())

# drop_first=True drops one level per feature to avoid redundant dummy columns
df = pd.get_dummies(df, columns=['Geography', 'Gender'], drop_first=True)

print(df.head(5))

The churn data provided by the customer has been downloaded and prepared. After importing the data it was examined for issues.
- While column names contained uppercase letters this did not present an issue and was not corrected.
- RowNumber, CustomerId, Surname are not needed for the analysis and were dropped.
- No duplicates found.
- Tenure contained 909 null entries, 9.09% of the total. Because filling that many values could influence the analysis, the rows were dropped.
- Gender and geography are categorical. OHE was used to convert to numerical features.
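To illustrate the encoding step, here is a small sketch of `pd.get_dummies` with `drop_first=True` on a toy frame (the rows are made up). The first level of each feature, in sorted order, is dropped, so `France` and `Female` become the implicit baseline.

```python
import pandas as pd

# Toy rows with the same two categorical columns as the churn data
toy = pd.DataFrame({
    "Geography": ["France", "Germany", "Spain", "France"],
    "Gender": ["Female", "Male", "Male", "Female"],
})

encoded = pd.get_dummies(toy, columns=["Geography", "Gender"], drop_first=True)
print(list(encoded.columns))
# ['Geography_Germany', 'Geography_Spain', 'Gender_Male']
```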

# Graphs

In [None]:
print(df.head(0))

In [None]:
# categorical

fig, axes = plt.subplots(6, 2, figsize=(12, 10))  # 6x2 grid

sns.histplot(data=df, x="NumOfProducts", kde=True, ax=axes[0, 0])
sns.boxplot(x='Exited', y='NumOfProducts', data=df, ax=axes[0, 1])

sns.histplot(data=df, x="HasCrCard", kde=True, ax=axes[1, 0])
sns.boxplot(x='Exited', y='HasCrCard', data=df, ax=axes[1, 1])

sns.histplot(data=df, x="IsActiveMember", kde=True, ax=axes[2, 0])
sns.boxplot(x='Exited', y='IsActiveMember', data=df, ax=axes[2, 1])

sns.histplot(data=df, x="Geography_Germany", kde=True, ax=axes[3, 0])
sns.boxplot(x='Exited', y='Geography_Germany', data=df, ax=axes[3, 1])

sns.histplot(data=df, x="Geography_Spain", kde=True, ax=axes[4, 0])
sns.boxplot(x='Exited', y='Geography_Spain', data=df, ax=axes[4, 1])

sns.histplot(data=df, x="Gender_Male", kde=True, ax=axes[5, 0])
sns.boxplot(x='Exited', y='Gender_Male', data=df, ax=axes[5, 1])

# Set titles for each subplot
axes[0, 0].set_title('NumOfProducts')
axes[0, 1].set_title('NumOfProducts')
axes[1, 0].set_title('HasCrCard')
axes[1, 1].set_title('HasCrCard')
axes[2, 0].set_title('IsActiveMember')
axes[2, 1].set_title('IsActiveMember')
axes[3, 0].set_title('Geography_Germany')
axes[3, 1].set_title('Geography_Germany')
axes[4, 0].set_title('Geography_Spain')
axes[4, 1].set_title('Geography_Spain')
axes[5, 0].set_title('Gender_Male')
axes[5, 1].set_title('Gender_Male')

plt.tight_layout()  # Adjust layout to prevent overlap
plt.show()

In [None]:
# numeric = ['CreditScore', 'Tenure', 'Balance', 'EstimatedSalary', 'Age']

fig, axes = plt.subplots(5, 2, figsize=(12, 10))  # 5x2 grid

sns.histplot(data=df, x="CreditScore", kde=True, ax=axes[0, 0])
sns.boxplot(x='Exited', y='CreditScore', data=df, ax=axes[0, 1])

sns.histplot(data=df, x="Age", kde=True, ax=axes[1, 0])
sns.boxplot(x='Exited', y='Age', data=df, ax=axes[1, 1])

sns.histplot(data=df, x="Tenure", kde=True, ax=axes[2, 0])
sns.boxplot(x='Exited', y='Tenure', data=df, ax=axes[2, 1])

sns.histplot(data=df, x="Balance", kde=True, ax=axes[3, 0])
sns.boxplot(x='Exited', y='Balance', data=df, ax=axes[3, 1])

sns.histplot(data=df, x="EstimatedSalary", kde=True, ax=axes[4, 0])
sns.boxplot(x='Exited', y='EstimatedSalary', data=df, ax=axes[4, 1])

# Set titles for each subplot
axes[0, 0].set_title('CreditScore')
axes[0, 1].set_title('CreditScore')
axes[1, 0].set_title('Age')
axes[1, 1].set_title('Age')
axes[2, 0].set_title('Tenure')
axes[2, 1].set_title('Tenure')
axes[3, 0].set_title('Balance')
axes[3, 1].set_title('Balance')
axes[4, 0].set_title('EstimatedSalary')
axes[4, 1].set_title('EstimatedSalary')

plt.tight_layout()  # Adjust layout to prevent overlap
plt.show()

In [None]:
# Target

# Distribution

sns.histplot(data=df, x="Exited", kde=True)

plt.tight_layout()  # Adjust layout to prevent overlap
plt.show()

In [None]:
plt.figure(figsize=(12, 10))
sns.heatmap(df.corr(), annot=True, fmt=".2f", cmap='coolwarm')
plt.show()

The distribution plots show that the target, `Exited`, is imbalanced: customers who left are the minority class. This imbalance will need to be addressed in order to fit a model that predicts churn well.
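The imbalance can also be quantified directly rather than read off a plot. The sketch below uses a made-up target series; on the real data the same call would be applied to `df['Exited']`.

```python
import pandas as pd

# Made-up target with a minority positive class (not the real proportions)
target = pd.Series([0, 0, 0, 0, 1, 0, 0, 1, 0, 0], name="Exited")

# Share of each class in the target
class_share = target.value_counts(normalize=True)
print(class_share)

# A model that always predicts the majority class would reach this accuracy,
# which is why accuracy alone is misleading on imbalanced data
majority_baseline_accuracy = class_share.max()
print(majority_baseline_accuracy)
```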

# Split the source data into a training set, a validation set, and a test set.

In [None]:
# Split the source data into a training, validation, and test set
df_train_valid, df_test = train_test_split(df, test_size=0.2, random_state=12345)
df_train, df_valid = train_test_split(df_train_valid, test_size=0.2, random_state=12345)

# Features and targets for each dataset
features_train = df_train.drop(['Exited'], axis=1)
target_train = df_train['Exited']

features_valid = df_valid.drop(['Exited'], axis=1)
target_valid = df_valid['Exited']

features_test = df_test.drop(['Exited'], axis=1)
target_test = df_test['Exited']

In [None]:
# Checking Shapes
print('Training set:')
print(features_train.shape)
print(target_train.shape)
print()
print('Validation set:')
print(features_valid.shape)
print(target_valid.shape)
print()
print('Test set:')
print(features_test.shape)
print(target_test.shape)
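Because the target is imbalanced, a stratified split can keep the class proportions consistent across the three sets. This is a sketch on synthetic data, not a change to the split used above; the `toy` frame and its 20% positive rate are assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic frame: 20 positives and 80 negatives (illustrative only)
toy = pd.DataFrame({"x": range(100), "Exited": [1] * 20 + [0] * 80})

# Same two-step split as above, with stratify added at each step
train_valid, test = train_test_split(
    toy, test_size=0.2, random_state=12345, stratify=toy["Exited"]
)
train, valid = train_test_split(
    train_valid, test_size=0.2, random_state=12345, stratify=train_valid["Exited"]
)

for name, part in [("train", train), ("valid", valid), ("test", test)]:
    print(name, len(part), round(part["Exited"].mean(), 3))
```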

In [None]:
# Define Functions

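# Get AUC-ROC on the validation set
def get_roc(model, features_valid):
    # Note: relies on the notebook-level `target_valid` for the true labels
    probabilities_valid = model.predict_proba(features_valid)

    # Probability of the positive class (Exited == 1)
    probabilities_one_valid = probabilities_valid[:, 1]
    auc_roc = roc_auc_score(target_valid, probabilities_one_valid)
    return auc_roc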

def plot_f1(results):
    sns.barplot(data=results, x='model', y='f1')
    plt.xticks(rotation=45, ha='right')
    plt.ylim(0, 0.7)
    plt.title('Model Validation F1')
    plt.show()

def plot_roc(results):
    sns.barplot(data=results, x='model', y='roc')
    plt.xticks(rotation=45, ha='right')
    plt.ylim(0.6, 1)
    plt.title('Model Validation ROC')
    plt.show()
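For intuition about what `get_roc` reports, AUC-ROC can be read as the share of (positive, negative) pairs in which the positive example receives the higher predicted probability. The sketch below computes it that way in plain Python; `pairwise_auc` is a hypothetical helper for illustration, not a replacement for `roc_auc_score`.

```python
def pairwise_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Small made-up example: three of the four pairs are ranked correctly
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(labels, scores))  # 0.75
```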

In [None]:
# Logistic Regression

def get_logisticregression(features_train, target_train, features_valid, target_valid, classweight):
    model = LogisticRegression(max_iter = 1000, random_state=12345, class_weight=classweight)
    model.fit(features_train, target_train)
    predicted_valid = model.predict(features_valid)
    best_f1 = f1_score(target_valid, predicted_valid)
    print(f"Best F1: {best_f1:.3f}")
    
    auc_roc = get_roc(model, features_valid)
    
    print(f"ROC: {auc_roc:.3f}")

    result = confusion_matrix(target_valid, predicted_valid)
    print(f'Confusion Matrix:\n {result}')

    return best_f1, auc_roc

In [None]:
# Decision Tree Classifier
def get_decisiontree(features_train, target_train, features_valid, target_valid, classweight):
    # Get best hyperparameters
    best_depth = 0
    best_f1 = 0
    for depth in range(1,100):
        model = DecisionTreeClassifier(random_state=12345, max_depth=depth, class_weight=classweight)
        model.fit(features_train, target_train)
        predicted_valid = model.predict(features_valid)
        if f1_score(target_valid, predicted_valid) > best_f1:
            best_f1 = f1_score(target_valid, predicted_valid)
            best_depth = depth
            best_model = model
            
    print(f"Best f1 score {best_f1:.3f}")
    
    auc_roc = get_roc(best_model, features_valid)
    
    print(f"ROC: {auc_roc:.3f}")

    # Confusion matrix for the best model, not for the last depth tried
    result = confusion_matrix(target_valid, best_model.predict(features_valid))
    print(f'Confusion Matrix:\n {result}')
    
    return best_f1, auc_roc

In [None]:
# Random Forest Classifier

def get_randomforest(features_train, target_train, features_valid, target_valid, features_test, target_test, classweight):
    best_f1 = 0
    best_est = 0
    best_depth = 0
    for depth in range(1, 10):
        for est in range(1, 20): # choose hyperparameter range
            model = RandomForestClassifier(random_state=12345, max_depth=depth, n_estimators=est, class_weight=classweight) # set depth and number of trees
            model.fit(features_train, target_train) # train model on training set
            predicted_valid = model.predict(features_valid)
            f1 = f1_score(target_valid, predicted_valid) # F1 score on validation set
            if f1 > best_f1:
                best_f1 = f1
                best_est = est # number of estimators for the best F1 so far
                best_depth = depth
                best_model = model
    print(f"Best f1 score {best_f1:.3f}")
    
    auc_roc = get_roc(best_model, features_valid)
    
    print(f"ROC: {auc_roc:.3f}")

    # Confusion matrix for the best model, not for the last combination tried
    result = confusion_matrix(target_valid, best_model.predict(features_valid))
    print(f'Confusion Matrix:\n {result}')
    
    if best_f1 >= target_f1:
        # Refit the best configuration on train + validation, then score it on the test set.
        # Note: final_model is the same object as best_model, so the returned model is the refit one.
        final_model = best_model
        final_model.fit(pd.concat([features_train, features_valid]), pd.concat([target_train, target_valid]))
        predicted_test = final_model.predict(features_test)
        print(f"Test F1 Score: {f1_score(target_test, predicted_test):.3f}")

    return best_f1, auc_roc, best_model
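As an alternative to the nested loops in `get_randomforest`, scikit-learn's `GridSearchCV` can run the same kind of search with F1 as the scoring metric. The sketch below is run on synthetic data from `make_classification`, not on the churn data, and it uses cross-validation instead of the fixed validation set, so its scores would not be directly comparable to those in this notebook. The grid values are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in for the churn features (not the real data)
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=12345
)

# Small illustrative grid; the ranges are assumptions, not tuned values
param_grid = {"max_depth": [2, 4, 6], "n_estimators": [10, 20]}

search = GridSearchCV(
    RandomForestClassifier(random_state=12345, class_weight="balanced"),
    param_grid,
    scoring="f1",  # optimize the same metric as the loops
    cv=3,          # 3-fold cross-validation on the training data
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

`RandomizedSearchCV` has a similar interface and samples a fixed number of combinations, which may be preferable when the grid is large.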

# Train Models Without Correction

In [None]:
print('Logistic Regression')
best_f1, auc_roc = get_logisticregression(features_train, target_train, features_valid, target_valid, None)

results.loc[len(results)] = ['Logistic Regression', best_f1, auc_roc]

In [None]:
print('Decision Tree Classifier')
best_f1, auc_roc = get_decisiontree(features_train, target_train, features_valid, target_valid, None)

results.loc[len(results)] = ['Decision Tree Classifier', best_f1, auc_roc]

In [None]:
print('Random Forest Classifier')
best_f1, auc_roc, best_model_rfc = get_randomforest(features_train, target_train, features_valid, target_valid, features_test, target_test, None)

results.loc[len(results)] = ['Random Forest Classifier', best_f1, auc_roc]

In [None]:
print(results)

In [None]:
print(results[['model','f1']])

In [None]:
plot_f1(results)

In [None]:
plot_roc(results)

The target is an F1 score of at least 0.59, and none of the uncorrected models reaches it. The closest was the Decision Tree Classifier with an F1 score of 0.56. To close the gap, the imbalance noted earlier must be addressed. The following adjustments will be tested:

- Scaling numeric features
- Balancing class weight of models
- Upsampling
- Downsampling
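For the class-weight adjustment, scikit-learn's `class_weight='balanced'` option weights each class by `n_samples / (n_classes * class_count)`, so the rarer class receives the larger weight. The sketch below reproduces that formula in plain Python on made-up labels; `balanced_class_weights` is a hypothetical helper for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weights per class using n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up labels: 8 negatives and 2 positives
labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(labels)
print(weights)  # {0: 0.625, 1: 2.5}
```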

# Adjusting for Imbalances

In [None]:
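# Scaling numeric features

numeric = ['CreditScore', 'NumOfProducts', 'Tenure', 'Balance', 'EstimatedSalary', 'Age']

# Work on copies to avoid pandas' SettingWithCopyWarning
features_train = features_train.copy()
features_valid = features_valid.copy()
features_test = features_test.copy()

# Fit the scaler on the training set only, then apply it to every set
scaler = StandardScaler()
scaler.fit(features_train[numeric])

features_train[numeric] = scaler.transform(features_train[numeric])
features_valid[numeric] = scaler.transform(features_valid[numeric])
features_test[numeric] = scaler.transform(features_test[numeric])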

In [None]:
print('Logistic Regression - scaled and balanced')
best_f1, auc_roc = get_logisticregression(features_train, target_train, features_valid, target_valid, 'balanced')

results.loc[len(results)] = ['Logistic Regression - balanced', best_f1, auc_roc]

In [None]:
print('Decision Tree Classifier - scaled and balanced')
best_f1, auc_roc = get_decisiontree(features_train, target_train, features_valid, target_valid, 'balanced')

results.loc[len(results)] = ['Decision Tree Classifier - balanced', best_f1, auc_roc]

In [None]:
print('Random Forest Classifier - scaled and balanced')
best_f1, auc_roc, best_model_rfc_balanced = get_randomforest(features_train, target_train, features_valid, target_valid, features_test, target_test, 'balanced')

results.loc[len(results)] = ['Random Forest Classifier - balanced', best_f1, auc_roc]

# Upsampling

In [None]:
def upsample(features, target, repeat):
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    features_upsampled = pd.concat([features_zeros] + [features_ones] * repeat)
    target_upsampled = pd.concat([target_zeros] + [target_ones] * repeat)

    features_upsampled, target_upsampled = shuffle(
        features_upsampled, target_upsampled, random_state=12345
    )

    return features_upsampled, target_upsampled


features_upsampled, target_upsampled = upsample(
    features_train, target_train, 10
)

In [None]:
print('Logistic Regression - upsampled')
best_f1, auc_roc = get_logisticregression(features_upsampled, target_upsampled, features_valid, target_valid, 'balanced')

results.loc[len(results)] = ['Logistic Regression - upsampled', best_f1, auc_roc]

In [None]:
print('Decision Tree Classifier - upsampled')
best_f1, auc_roc = get_decisiontree(features_upsampled, target_upsampled, features_valid, target_valid, 'balanced')

results.loc[len(results)] = ['Decision Tree Classifier - upsampled', best_f1, auc_roc]

In [None]:
print('Random Forest Classifier - upsampled')
best_f1, auc_roc, best_model_rfc_upsampled = get_randomforest(features_upsampled, target_upsampled, features_valid, target_valid, features_test, target_test, 'balanced')

results.loc[len(results)] = ['Random Forest Classifier - upsampled', best_f1, auc_roc]

# Downsampling

In [None]:
def downsample(features, target, fraction):
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    features_downsampled = pd.concat(
        [features_zeros.sample(frac=fraction, random_state=12345)]
        + [features_ones]
    )
    target_downsampled = pd.concat(
        [target_zeros.sample(frac=fraction, random_state=12345)]
        + [target_ones]
    )

    features_downsampled, target_downsampled = shuffle(
        features_downsampled, target_downsampled, random_state=12345
    )

    return features_downsampled, target_downsampled


features_downsampled, target_downsampled = downsample(
    features_train, target_train, 0.1
)
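The `repeat` and `fraction` arguments above are fixed at 10 and 0.1. As a point of comparison, the sketch below shows one way to derive values that would bring the two classes to roughly equal size from the class counts. The helper `balancing_factors` and the counts are hypothetical; this is not the method used in this notebook.

```python
def balancing_factors(n_majority: int, n_minority: int):
    """Return (repeat, fraction) that make the two classes roughly equal in size."""
    # Upsampling: how many times to repeat the minority class
    repeat = max(1, round(n_majority / n_minority))
    # Downsampling: what share of the majority class to keep
    fraction = n_minority / n_majority
    return repeat, fraction


# Made-up class counts for illustration
repeat, fraction = balancing_factors(n_majority=800, n_minority=200)
print(repeat, fraction)  # 4 0.25
```

Values far from these balance points shift the class ratio past parity, which is worth keeping in mind when `class_weight='balanced'` is applied on top of resampled data.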


In [None]:
print('Logistic Regression - downsampled')
best_f1, auc_roc = get_logisticregression(features_downsampled, target_downsampled, features_valid, target_valid, 'balanced')

results.loc[len(results)] = ['Logistic Regression - downsampled', best_f1, auc_roc]

In [None]:
print('Decision Tree Classifier - downsampled')
best_f1, auc_roc = get_decisiontree(features_downsampled, target_downsampled, features_valid, target_valid, 'balanced')

results.loc[len(results)] = ['Decision Tree Classifier - downsampled', best_f1, auc_roc]

In [None]:
print('Random Forest Classifier - downsampled')
best_f1, auc_roc, best_model_rfc_downsampled = get_randomforest(features_downsampled, target_downsampled, features_valid, target_valid,  features_test, target_test, 'balanced')

results.loc[len(results)] = ['Random Forest Classifier - downsampled', best_f1, auc_roc]

The methods above substantially improved the F1 scores. Two models reached the target F1 score.

# Results

After the provided data was imported, examined, and cleaned, several models were trained without correcting the known class imbalance. The maximum F1 score among these models was 0.56, short of the 0.59 target. Several methods of correcting the imbalance were then tested, and two of the resulting models reached the target F1 score.

In [None]:
print(results)

In [None]:
plot_f1(results)

In [None]:
plot_roc(results)

Two models were found that achieved the target F1 score:

In [None]:
print(results[results['f1']>=0.59])

In [None]:
plot_f1(results[results['f1']>=0.59])

In [None]:
plot_roc(results[results['f1']>=0.59])

# Conclusion

Two models achieved an F1 score greater than or equal to the target, 0.59:

- Random Forest Classifier - balanced
-- F1: 0.616822
-- ROC: 0.863454
- Random Forest Classifier - upsampled
-- F1: 0.620272
-- ROC: 0.856722

When the two models were checked against the test data, both again achieved the target F1 score, which supports their effectiveness on unseen data:
- Random Forest Classifier - balanced
-- Test F1 Score: 0.609
- Random Forest Classifier - upsampled
-- Test F1 Score: 0.608
