# **Experiment Notebook**



In [1]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

<hr>

## A. Project


In [None]:
student_name = 'Yi Xiao'

In [None]:
student_id = '14356721'

In [None]:
experiment_id = '4'

<hr>

## B. Experiment Description


In [None]:
experiment_hypothesis = 'Training an SVC model (RBF kernel) on customer data will predict customer churn more effectively than the baseline models'

In [None]:
experiment_expectations = 'Show an improvement in F1 score: the model should be better at identifying churners, thereby increasing the F1 score compared to the baseline models.'

<hr>

## C. Data Understanding


### C.0 Import Packages

In [2]:
# Pandas for data handling
import pandas as pd

# Scikit Learn for ML training
import sklearn

# Altair for plotting
import altair as alt


<hr>

### C.1   Load Datasets

In [3]:
# Load training set
# Do not change this code

X_train = pd.read_csv('X_train.csv')
y_train = pd.read_csv('y_train.csv')

In [4]:
# Load validation set
# Do not change this code

X_val = pd.read_csv('X_val.csv')
y_val = pd.read_csv('y_val.csv')

In [5]:
# Load testing set
# Do not change this code

X_test = pd.read_csv('X_test.csv')
y_test = pd.read_csv('y_test.csv')

<hr>


## D. Feature Selection


In [None]:
feature_selection_executive_summary = 'Use the same list of features from experiment 0.'

In [None]:
features_list = ['AccountAge', 'MonthlyCharges', 'TotalCharges', 'ViewingHoursPerWeek',
       'AverageViewingDuration', 'ContentDownloadsPerMonth']

<hr>

## E. Data Preparation

In [None]:
data_preparation_executive_summary = 'No major data issues were found that could impact training.'

> Rationale: No major data issues were found that could impact training.

> Results: No major data issues were found that could impact training.

<hr>

## F. Feature Engineering

In [None]:
data_preparation_executive_summary_2 = 'No additional feature engineering was performed at this stage.'

> Rationale: No additional feature engineering was performed at this stage.

> Results: No additional feature engineering was performed at this stage.

<hr>

## G. Train Machine Learning Model

In [None]:
train_model_executive_summary = 'A 5-fold cross-validated grid search, scored on accuracy, was used to tune the hyperparameters C and gamma of an RBF-kernel SVC. Features were standardized with StandardScaler inside the pipeline, and the best model was evaluated on the validation set.'

### G.1 Import Algorithm

> Rationale: SVC with an RBF kernel was selected because it can model non-linear decision boundaries. Class weighting, which would make the model pay more attention to the minority (churn) class, was not applied in this experiment: the SVC uses the default `class_weight=None`.

In [11]:
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
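
Because the scaler and the SVC are chained with `make_pipeline`, `GridSearchCV` refits the scaler on the training folds only within each cross-validation split, and each step's hyperparameters are addressed by its step name. A minimal sketch of how those names are derived (a standalone illustration, not part of the experiment run):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# make_pipeline names each step after its class, lowercased
svc_pipeline = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
print(list(svc_pipeline.named_steps))  # ['standardscaler', 'svc']

# Step parameters are addressed as <step_name>__<param_name>,
# which is why the grid in G.2 uses 'svc__C' and 'svc__gamma'
params = svc_pipeline.get_params()
print('svc__C' in params, 'svc__gamma' in params)  # True True
```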

<hr>

### G.2 Set Hyperparameters

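> Rationale: C trades off correct classification of training points against a wider margin (larger C means weaker regularization), while gamma controls how far the influence of a single training example reaches in the RBF kernel (larger gamma means more local, more flexible boundaries). Tuning both balances model flexibility against overfitting.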

In [12]:
param_grid = {
    'svc__C': [0.1, 1, 10, 100],
    'svc__gamma': ['scale', 'auto', 0.001, 0.01, 0.1, 1]
}
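
A sketch of what the gamma options mean, on synthetic standardized data (an assumption standing in for the six real features). In scikit-learn, `'scale'` sets gamma to 1 / (n_features × X.var()) and `'auto'` to 1 / n_features. Because the pipeline standardizes the features first, the overall variance is about 1, so `'scale'` and `'auto'` resolve to essentially the same value (1/6 for six features) and the grid effectively tests that gamma twice per C.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import ParameterGrid
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 6 selected features (not the real data)
rng = np.random.default_rng(0)
X_scaled = StandardScaler().fit_transform(rng.normal(loc=50, scale=20, size=(200, 6)))

# gamma='scale' uses 1 / (n_features * X.var()); gamma='auto' uses 1 / n_features
gamma_scale = 1.0 / (X_scaled.shape[1] * X_scaled.var())
gamma_auto = 1.0 / X_scaled.shape[1]
print(round(gamma_scale, 4), round(gamma_auto, 4))  # both about 0.1667 after standardization

# RBF kernel: K(x, x') = exp(-gamma * ||x - x'||^2)
manual = np.exp(-gamma_scale * np.sum((X_scaled[0] - X_scaled[1]) ** 2))
library = rbf_kernel(X_scaled[:1], X_scaled[1:2], gamma=gamma_scale)[0, 0]
print(np.isclose(manual, library))  # True

# The grid from G.2 expands to 4 x 6 = 24 candidates
param_grid = {
    'svc__C': [0.1, 1, 10, 100],
    'svc__gamma': ['scale', 'auto', 0.001, 0.01, 0.1, 1],
}
print(len(ParameterGrid(param_grid)))  # 24
```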

<hr>

### G.3 Fit Model

In [13]:
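# Standardize features, then fit an RBF-kernel SVC (the scaler is refit on each CV training fold)
svc_pipeline = make_pipeline(StandardScaler(), SVC(kernel='rbf', random_state=42))

# 5-fold grid search over the 24 (C, gamma) combinations, selected on accuracy
grid_search = GridSearchCV(svc_pipeline, param_grid, cv=5, scoring='accuracy', verbose=1, n_jobs=-1)

# ravel() turns the single-column y_train DataFrame into the 1-D array sklearn expects
grid_search.fit(X_train, y_train.values.ravel())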


Fitting 5 folds for each of 24 candidates, totalling 120 fits
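
With an integer `cv` and a classifier, `GridSearchCV` splits the data with `StratifiedKFold`, so every fold keeps roughly the overall churn rate. A small sketch with synthetic labels; the 17% churn rate is an illustrative assumption, not the dataset's rate:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic labels with a hypothetical 17% churn rate
y = np.array([1] * 170 + [0] * 830)
X = np.zeros((len(y), 1))  # feature values do not affect how folds are assigned

fold_churn_rates = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    fold_churn_rates.append(y[test_idx].mean())

print(fold_churn_rates)  # each test fold holds 34 churners among its 200 customers
```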


<hr>

### G.4 Model Technical Performance

In [15]:
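from sklearn.metrics import f1_score, accuracy_score

# Best (C, gamma) combination and its mean 5-fold CV score (accuracy, since scoring='accuracy')
best_params = grid_search.best_params_
best_score = grid_search.best_score_

# Predict on the held-out validation set with the refitted best pipeline
y_pred_val_best = grid_search.best_estimator_.predict(X_val)

# Accuracy counts both classes; F1 is computed for the churn class (label 1)
val_accuracy_best = accuracy_score(y_val, y_pred_val_best)
f1_val_best = f1_score(y_val, y_pred_val_best)

print("Best Hyperparameters: ", best_params)
print("Best Cross-Validated Score: ", best_score)
print("Validation Accuracy: ", val_accuracy_best)
print("Validation F1 Score: ", f1_val_best)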

Best Hyperparameters:  {'svc__C': 1, 'svc__gamma': 0.1}
Best Cross-Validated Score:  0.8251168741455824
Validation Accuracy:  0.8256469567731676
Validation F1 Score:  0.06366459627329193


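> Results: The tuned model reached a validation accuracy of about 0.826 (mean cross-validated accuracy about 0.825), but its validation F1 score for the churn class was only about 0.064, meaning it correctly identifies very few churners. Because the classes are imbalanced, accuracy is dominated by the majority (non-churn) class and masks this weakness.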
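
Why high accuracy and low F1 can coexist: F1 for the churn class is 2TP / (2TP + FP + FN), so it ignores true negatives, while accuracy rewards them. A worked sketch on hypothetical counts (1,000 customers, 170 churners, a model that flags only 10 customers); these numbers are illustrative, not this experiment's confusion matrix:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical validation set: 1000 customers, 170 churners (label 1)
y_true = np.array([1] * 170 + [0] * 830)

# Hypothetical model that flags only 10 customers, 6 of them real churners
y_pred = np.zeros_like(y_true)
y_pred[:6] = 1       # 6 true positives
y_pred[170:174] = 1  # 4 false positives

print(accuracy_score(y_true, y_pred))  # 0.832
print(f1_score(y_true, y_pred))        # 12 / (12 + 4 + 164) = 12 / 180, about 0.067

# Always predicting "no churn" already reaches 0.83 accuracy
print(accuracy_score(y_true, np.zeros_like(y_true)))  # 0.83
```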

<hr>

### G.5 Business Impact from Current Model Performance

In [16]:
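# Average monthly fee, estimated from the test set's MonthlyCharges
avg_subscription_fee = X_test['MonthlyCharges'].mean()
# Retention offer: 50% discount on the average fee for 3 months per flagged customer
discount_per_month = avg_subscription_fee / 2
discount_duration_months = 3
retention_offer_cost_per_customer = discount_per_month * discount_duration_months
# Each undetected churner is costed at one month of average revenue
lost_revenue_per_churn = avg_subscription_fee

predicted_churners = sum(y_pred_val_best == 1)
# sum() over the (n, 1) array returns a one-element array, hence the brackets in the output below
actual_churners = sum((y_val == 1).values)
# Note: actual - predicted equals FN - FP, so this matches the true false-negative
# count only when the model makes no false positives
false_negatives = actual_churners - predicted_churners

total_intervention_cost = predicted_churners * retention_offer_cost_per_customer
total_lost_revenue_from_false_negatives = false_negatives * lost_revenue_per_churn

print(f"Total predicted churners (Model 1): {predicted_churners}")
print(f"Total cost of retention interventions: ${total_intervention_cost}")
print(f"Total lost revenue from undetected churners (false negatives): ${total_lost_revenue_from_false_negatives}")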

Total predicted churners (Model 1): 60
Total cost of retention interventions: $9000
Total lost revenue from undetected churners (false negatives): $[116800]


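> Results: The model flagged 60 customers as churners, for a retention intervention cost of $9,000. The estimated revenue lost from undetected churners, $116,800 (actual minus predicted churners, times the average monthly fee), is far larger than the intervention cost, so the model's failure to detect churners dominates its business impact.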
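
The cell above estimates false negatives as actual minus predicted churners, which equals FN − FP and therefore undercounts missed churners whenever some flagged customers are not churners. A sketch that reads the counts directly from the confusion matrix instead; `churn_costs` is a hypothetical helper, and the 50% / 3-month offer and one-month lost revenue mirror the assumptions in the cell above:

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def churn_costs(y_true, y_pred, monthly_fee, discount_fraction=0.5, discount_months=3):
    """Hypothetical helper: false negatives, retention cost and lost revenue."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    # Every customer flagged as a churner (TP + FP) receives the discount offer
    intervention_cost = (tp + fp) * monthly_fee * discount_fraction * discount_months
    # Each missed churner (FN) is costed at one month's fee
    lost_revenue = fn * monthly_fee
    return int(fn), float(intervention_cost), float(lost_revenue)


y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 0, 0, 0])
print(churn_costs(y_true, y_pred, monthly_fee=100.0))  # (3, 300.0, 300.0)
```

In this toy example actual minus predicted churners gives 4 − 2 = 2, while the confusion matrix shows 3 missed churners.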

<hr>

## H. Experiment Outcomes

In [1]:
final_experiment_outcome = 'Hypothesis Rejected'

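> Key Learnings: Grid search selected C = 1 and gamma = 0.1 with a cross-validated accuracy of about 0.825, but because the search optimized accuracy on imbalanced data, the resulting model rarely identifies churners (validation F1 about 0.064). More work is needed to handle the class imbalance.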

> Recommendations for Next Experiment: Apply class weighting (e.g. `class_weight='balanced'` in `SVC`) and select hyperparameters on F1 rather than accuracy to improve churn detection.
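
A minimal sketch of the recommended setup, combining `class_weight='balanced'` in `SVC` with `scoring='f1'` in `GridSearchCV`. It runs on synthetic imbalanced data from `make_classification` (an assumption standing in for the real churn features), with a reduced grid for speed. With `'balanced'`, each class is weighted by n_samples / (n_classes × class count), so errors on the rarer churn class cost more during training.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.utils.class_weight import compute_class_weight

# Synthetic, imbalanced stand-in for the churn data (about 80% / 20%)
X, y = make_classification(n_samples=500, n_features=6, weights=[0.8, 0.2], random_state=42)

# 'balanced' weights are n_samples / (n_classes * np.bincount(y))
weights = compute_class_weight(class_weight='balanced', classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights.round(3))))

pipeline = make_pipeline(StandardScaler(), SVC(kernel='rbf', class_weight='balanced'))
param_grid = {'svc__C': [0.1, 1, 10], 'svc__gamma': ['scale', 0.1]}

# Select hyperparameters on F1 for the churn class instead of accuracy
search = GridSearchCV(pipeline, param_grid, cv=5, scoring='f1')
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```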

<hr>
