# **Experiment Notebook**



```python
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
```

<hr>

## A. Project


```python
student_name = 'Yi Xiao'
```

```python
student_id = '14356721'
```

```python
experiment_id = '3'
```

<hr>

## B. Experiment Description


```python
experiment_hypothesis = 'Training an XGBClassifier model will result in a well-performing model that can predict customer churn with reasonably high accuracy.'
```

```python
experiment_expectations = 'The model should achieve an accuracy score of at least 80% on the validation dataset.'
```

<hr>

## C. Data Understanding


### C.0 Import Packages

```python
# Pandas for data handling
import pandas as pd

# scikit-learn for ML training
import sklearn

# Altair for plotting
import altair as alt
```

<hr>

### C.1 Load Datasets

```python
# Load training set
# Do not change this code

X_train = pd.read_csv('X_train.csv')
y_train = pd.read_csv('y_train.csv')
```

```python
# Load validation set
# Do not change this code

X_val = pd.read_csv('X_val.csv')
y_val = pd.read_csv('y_val.csv')
```

```python
# Load testing set
# Do not change this code

X_test = pd.read_csv('X_test.csv')
y_test = pd.read_csv('y_test.csv')
```

<hr>

## D. Feature Selection


```python
feature_selection_executive_summary = 'Use the same feature list as in experiment 0.'
```

```python
features_list = [
    'AccountAge', 'MonthlyCharges', 'TotalCharges', 'ViewingHoursPerWeek',
    'AverageViewingDuration', 'ContentDownloadsPerMonth',
]
```
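
The fitting cells below pass the full `X_train` to the model, so `features_list` is not actually applied. If the intent is to train on only these columns, a minimal sketch of the subsetting step could look like this; `select_features` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

def select_features(df: pd.DataFrame, features: list) -> pd.DataFrame:
    """Return a copy of `df` restricted to `features`, in that order.

    Hypothetical helper: raises KeyError listing any missing columns instead
    of silently training on a different feature set.
    """
    missing = [col for col in features if col not in df.columns]
    if missing:
        raise KeyError(f"Columns not found: {missing}")
    return df.loc[:, features].copy()

# Toy frame with one extra column that should be dropped
toy = pd.DataFrame({"AccountAge": [1, 2], "MonthlyCharges": [10.0, 20.0], "Extra": ["a", "b"]})
subset = select_features(toy, ["AccountAge", "MonthlyCharges"])
print(list(subset.columns))  # ['AccountAge', 'MonthlyCharges']
```

In this notebook it would be applied to every split before the grid search, for example `X_train = select_features(X_train, features_list)`, so that training, validation and test data share the same columns.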

<hr>

## E. Data Preparation

```python
data_preparation_executive_summary = 'No major data issues were found that could impact training.'
```

> Rationale: No major data issues were found that could impact training.

> Results: No major data issues were found that could impact training.

<hr>

## F. Feature Engineering

```python
data_preparation_executive_summary_2 = 'No additional feature engineering was performed at this stage.'
```

> Rationale: No additional feature engineering was performed at this stage.

> Results: No additional feature engineering was performed at this stage.

<hr>

## G. Train Machine Learning Model

```python
train_model_executive_summary = 'The XGBClassifier model was tuned with a 3-fold cross-validated grid search that maximizes F1 score. The best model was then evaluated on the training, validation, and test datasets to check how well it generalizes.'
```

### G.1 Import Algorithm

> Rationale: XGBClassifier was chosen for its strong performance on binary classification tasks and its ability to capture complex feature interactions.

```python
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
```

<hr>

### G.2 Set Hyperparameters

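> Rationale: A grid search was used to tune n_estimators, max_depth, learning_rate, and subsample. These parameters set model capacity (number and depth of trees), the step size of each boosting round, and the fraction of rows sampled per tree, so together they govern the trade-off between underfitting and overfitting.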

```python
xgb_model = XGBClassifier(eval_metric='logloss')

param_grid = {
    'n_estimators': [50, 100, 150],
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.3],
    'subsample': [0.7, 0.8, 1.0]
}
```
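
The grid above has 3 values for each of 4 hyperparameters, so it defines 3^4 = 81 candidate settings; with `cv=3` that gives the 243 fits reported when the search runs. A quick way to confirm the size before launching an expensive search is scikit-learn's `ParameterGrid`:

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the cell above
param_grid = {
    "n_estimators": [50, 100, 150],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1, 0.3],
    "subsample": [0.7, 0.8, 1.0],
}
n_folds = 3  # matches cv=3 in GridSearchCV

n_candidates = len(ParameterGrid(param_grid))
n_fits = n_candidates * n_folds
print(f"{n_candidates} candidates x {n_folds} folds = {n_fits} fits")  # 81 candidates x 3 folds = 243 fits
```

With its default `refit=True`, `GridSearchCV` then performs one extra fit of the best candidate on all of `X_train`, and that refitted model is what `best_estimator_` holds.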

<hr>

### G.3 Fit Model

```python
grid_search = GridSearchCV(estimator=xgb_model, param_grid=param_grid, scoring='f1', cv=3, verbose=1)

grid_search.fit(X_train, y_train)
```

```text
Fitting 3 folds for each of 81 candidates, totalling 243 fits
```
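
Conceptually, `GridSearchCV` scores every candidate with k-fold cross-validation and keeps the one with the highest mean score. The sketch below reproduces that loop by hand with `cross_val_score`. To stay self-contained it assumes a synthetic imbalanced dataset from `make_classification` and a `DecisionTreeClassifier` stand-in, not XGBoost or the churn data.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, ParameterGrid, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary data (about 80% class 0) standing in for the churn set
X, y = make_classification(n_samples=300, n_features=6, weights=[0.8, 0.2], random_state=0)
estimator = DecisionTreeClassifier(random_state=0)
grid = {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5]}

# Manual search: mean F1 over 3 folds for each candidate, keep the best
best_score, best_params = float("-inf"), None
for params in ParameterGrid(grid):
    model = clone(estimator).set_params(**params)
    mean_f1 = cross_val_score(model, X, y, cv=3, scoring="f1").mean()
    if mean_f1 > best_score:
        best_score, best_params = mean_f1, params

# GridSearchCV with the same settings should reach the same best mean score
search = GridSearchCV(estimator, grid, scoring="f1", cv=3).fit(X, y)
print("manual:", best_params, round(best_score, 4))
print("GridSearchCV:", search.best_params_, round(search.best_score_, 4))
```

Both use stratified 3-fold splits for a classifier, which is why the mean scores match; the library version adds parallelism, result bookkeeping (`cv_results_`) and the final refit.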


<hr>

### G.4 Model Technical Performance

```python
from sklearn.metrics import accuracy_score, f1_score
```

```python
best_params = grid_search.best_params_
print("Best Parameters: ", best_params)

best_model = grid_search.best_estimator_

y_train_pred = best_model.predict(X_train)
y_val_pred = best_model.predict(X_val)
y_test_pred = best_model.predict(X_test)

train_accuracy = accuracy_score(y_train, y_train_pred)
val_accuracy = accuracy_score(y_val, y_val_pred)
test_accuracy = accuracy_score(y_test, y_test_pred)

train_f1 = f1_score(y_train, y_train_pred)
val_f1 = f1_score(y_val, y_val_pred)
test_f1 = f1_score(y_test, y_test_pred)

print(f"Training Accuracy: {train_accuracy:.4f}, Training F1 Score: {train_f1:.4f}")
print(f"Validation Accuracy: {val_accuracy:.4f}, Validation F1 Score: {val_f1:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}, Test F1 Score: {test_f1:.4f}")
```

```text
Best Parameters:  {'learning_rate': 0.3, 'max_depth': 7, 'n_estimators': 100, 'subsample': 0.7}
Training Accuracy: 0.9193, Training F1 Score: 0.7166
Validation Accuracy: 0.8024, Validation F1 Score: 0.2237
Test Accuracy: 0.8053, Test F1 Score: 0.2191
```


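> Results: The tuned model reached about 0.80 accuracy on both the validation (0.8024) and test (0.8053) sets, meeting the 80% target, but the F1 score for the churn class dropped from 0.7166 on training data to 0.2237 on validation and 0.2191 on test. The model therefore struggles to identify churners, and the gap between training and held-out scores suggests overfitting.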
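
Accuracy can look acceptable on imbalanced data even when churners are handled poorly, because a model that always predicts "no churn" is right on every non-churner. A majority-class baseline makes this visible. The sketch below uses hypothetical labels with 20% churners; F1, the harmonic mean of precision and recall for the churn class, is 0 for such a baseline because it never predicts a churner.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical labels: 80 non-churners (0) and 20 churners (1)
y_true = np.array([0] * 80 + [1] * 20)
X_dummy = np.zeros((len(y_true), 1))  # features are ignored by DummyClassifier

# Always predict the most frequent class seen during fit (here: 0, "no churn")
baseline = DummyClassifier(strategy="most_frequent").fit(X_dummy, y_true)
y_base = baseline.predict(X_dummy)

baseline_accuracy = accuracy_score(y_true, y_base)
baseline_f1 = f1_score(y_true, y_base, zero_division=0)
print(f"Majority-class accuracy: {baseline_accuracy:.2f}, F1: {baseline_f1:.2f}")  # 0.80, 0.00
```

On this notebook's data the equivalent check would be `DummyClassifier(strategy="most_frequent").fit(X_train, y_train.squeeze())` scored on `X_val` and `y_val`. If the churn rate there is close to 20%, that baseline alone would reach roughly the same 0.80 accuracy as the tuned model, which is why F1 is the more informative metric here.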

<hr>

### G.5 Business Impact from Current Model Performance

```python
avg_subscription_fee = X_test['MonthlyCharges'].mean()
discount_per_month = (X_test['MonthlyCharges'].mean())/2
discount_duration_months = 3
retention_offer_cost_per_customer = discount_per_month * discount_duration_months
lost_revenue_per_churn = X_test['MonthlyCharges'].mean()

predicted_churners = sum(y_val_pred == 1)
actual_churners = sum((y_val == 1).values)
false_negatives = actual_churners - predicted_churners

# Business impact calculation
total_intervention_cost = predicted_churners * retention_offer_cost_per_customer
total_lost_revenue_from_false_negatives = false_negatives * lost_revenue_per_churn

print(f"Total predicted churners (Model 2): {predicted_churners}")
print(f"Total cost of retention interventions: ${total_intervention_cost}")
print(f"Total lost revenue from undetected churners (false negatives): ${total_lost_revenue_from_false_negatives}")
```

```text
Total predicted churners (Model 2): 533
Total cost of retention interventions: $79950
Total lost revenue from undetected churners (false negatives): $[8686.3682826]
```


> Results: On the validation set, the model flagged 533 customers as likely churners, giving a total retention intervention cost of $79,950.
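
Two details in the cell above may distort the false-negative figure. First, `actual_churners - predicted_churners` equals the number of missed churners only when there are no false positives; in general, false negatives are the cases with true label 1 and prediction 0. Second, `(y_val == 1).values` is a 2-D array, so Python's `sum` returns a one-element array, which is why the lost revenue prints as `$[8686.3682826]`. A minimal sketch that takes the counts from a confusion matrix instead, keeping the cell's cost assumptions (a 50% discount on the average monthly charge for 3 months per flagged customer, one month of average revenue lost per missed churner); `churn_business_impact` is a hypothetical helper:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def churn_business_impact(y_true, y_pred, avg_monthly_charge, discount_months=3):
    """Retention-offer cost vs. revenue lost to undetected churners (hypothetical helper)."""
    # Flatten one-column DataFrames/arrays to 1-D so counts are plain scalars
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    offer_cost = (avg_monthly_charge / 2) * discount_months  # 50% off for N months
    return {
        "predicted_churners": int(tp + fp),
        "false_negatives": int(fn),
        "intervention_cost": float((tp + fp) * offer_cost),
        "lost_revenue_from_fn": float(fn * avg_monthly_charge),
    }

# Toy example: 3 actual and 3 predicted churners, so the naive difference is 0,
# but only 1 churner is caught and 2 are missed
impact = churn_business_impact([1, 1, 1, 0, 0, 0], [1, 0, 0, 1, 1, 0], avg_monthly_charge=100.0)
print(impact)
```

Applied here it would be called as `churn_business_impact(y_val, y_val_pred, X_val['MonthlyCharges'].mean())`; taking the average charge from `X_val` rather than `X_test` keeps the whole calculation on the same split as the predictions.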

<hr>

## H. Experiment Outcomes

```python
final_experiment_outcome = 'Hypothesis Partially Confirmed'
```

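> Key Learnings: Validation accuracy only just met the 80% target, and the low F1 score (about 0.22 on validation and test data) shows that the model identifies churners poorly. More work is needed to address class imbalance or to tune the model so that it captures churn cases effectively.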

> Recommendations for Next Experiment: Consider oversampling or undersampling techniques to improve the F1 score for the minority class (churn).
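
Random oversampling is one of the simplest of these techniques: duplicate minority-class rows, sampling with replacement, until both classes have the same count. A minimal sketch with `sklearn.utils.resample`, assuming binary labels held in a pandas Series whose index is aligned with the feature DataFrame; `random_oversample` is a hypothetical helper:

```python
import pandas as pd
from sklearn.utils import resample

def random_oversample(X: pd.DataFrame, y: pd.Series, random_state: int = 42):
    """Upsample the minority class with replacement to match the majority count."""
    counts = y.value_counts()
    if len(counts) != 2 or counts.iloc[0] == counts.iloc[1]:
        return X.copy(), y.copy()  # not binary, or already balanced
    majority, minority = counts.idxmax(), counts.idxmin()
    # Draw minority rows with replacement until they match the majority count
    X_up, y_up = resample(
        X[y == minority], y[y == minority],
        replace=True, n_samples=int(counts[majority]), random_state=random_state,
    )
    X_bal = pd.concat([X[y == majority], X_up], ignore_index=True)
    y_bal = pd.concat([y[y == majority], y_up], ignore_index=True)
    return X_bal, y_bal

# Toy data: 8 non-churners, 2 churners
X_toy = pd.DataFrame({"AccountAge": range(10)})
y_toy = pd.Series([0] * 8 + [1] * 2, name="Churn")
X_bal, y_bal = random_oversample(X_toy, y_toy)
print(y_bal.value_counts().sort_index().to_dict())
```

Resampling should be applied to the training split only, for example `random_oversample(X_train, y_train.squeeze())` if `y_train.csv` holds a single label column, so that validation and test metrics still reflect the real class balance. XGBoost's `scale_pos_weight` parameter is an alternative that reweights the positive class instead of duplicating rows.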

<hr>
