### Importing Libraries
This section imports the libraries needed for model development.

In [53]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

### Loading the Dataset
The dataset used to build the model was cleaned and preprocessed in Milestone 1, so it is ready for model development.

In [54]:
# Load the dataset
file_path = 'cleaned_customer_churn_dataset.csv'
data = pd.read_csv(file_path)

# Print the shape of the dataset
print(f"The dataset has {data.shape[0]} Rows and {data.shape[1]} columns")

# Display the first few rows of the dataset
data_head = data.head()

data_head



The dataset has 100000 Rows and 20 columns


| | CustomerID | Age | SubscriptionPlan | MonthlySpend | AccountAgeMonths | WatchTimeHours | CustomerSupportCalls | Churn | Gender_Male | Gender_Other | Region_Dhaka | Region_Kolkata | Region_Others | Region_Siliguri | DeviceType_PC | DeviceType_Smart TV | DeviceType_Tablet | PreferredLanguage_English | PreferredLanguage_Hindi | PreferredLanguage_Others |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 92874 | 0.117647 | 3 | 0.127063 | 5 | 0.050418 | 4 | 0 | False | False | False | False | False | False | True | False | False | False | False | True |
| 1 | 52015 | 0.196078 | 2 | 0.076124 | 40 | 0.045347 | 6 | 0 | True | False | False | False | False | False | False | False | True | False | True | False |
| 2 | 45860 | 0.27451 | 2 | 0.074501 | 13 | 0.045347 | 1 | 1 | True | False | False | False | False | False | False | True | False | True | False | False |
| 3 | 24080 | 0.117647 | 3 | 0.065101 | 12 | 0.036415 | 8 | 1 | True | False | False | False | True | False | False | False | True | False | True | False |
| 4 | 41229 | 0.019608 | 1 | 0.065293 | 43 | 0.022776 | 8 | 0 | False | False | True | False | False | False | False | True | False | False | False | True |
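Before training, it can be useful to check how the Churn labels are distributed, because the share of the largest class is the accuracy a model gets by always predicting that class. The sketch below uses a hypothetical ten-row stand-in (`toy`) rather than the real file, so the 70/30 split it prints is an assumption for illustration only.

```python
import pandas as pd

# Hypothetical miniature stand-in for the churn data: 7 retained, 3 churned.
toy = pd.DataFrame({"Churn": [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]})

# Share of each class in the target column.
class_share = toy["Churn"].value_counts(normalize=True)

# Always predicting the largest class yields this accuracy.
majority_baseline = class_share.max()

print(class_share)
print(f"Majority-class baseline accuracy: {majority_baseline:.2f}")
```

On the real data, the same two lines could be run as `data["Churn"].value_counts(normalize=True)`.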


### Splitting the Dataset
I will split the dataset into training and testing subsets: 80% of the rows will be used to train the models, and the remaining 20% will be held out for testing. The split is stratified on the Churn column so that both subsets keep the same class proportions.

In [55]:
# Define features (X) and target (y)
X = data.drop(columns=['CustomerID', 'Churn']) # Exclude ID and target column
y = data['Churn']

# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Check the shapes of the resulting splits
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((80000, 18), (20000, 18), (80000,), (20000,))
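The effect of the stratify argument can be checked by comparing the positive rate before and after splitting. This is a minimal sketch on synthetic data; `X_demo`, `y_demo`, and the 30% positive rate are illustrative assumptions, not values taken from the churn dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 1,000 rows with a 30% positive rate (assumed for illustration).
rng = np.random.default_rng(42)
X_demo = pd.DataFrame({"feature": rng.random(1000)})
y_demo = pd.Series([1] * 300 + [0] * 700, name="Churn")

# Same call pattern as the notebook: 80/20 split, stratified on the target.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With stratification, all three rates should be (nearly) identical.
print(f"Positive rate, full:  {y_demo.mean():.3f}")
print(f"Positive rate, train: {y_tr.mean():.3f}")
print(f"Positive rate, test:  {y_te.mean():.3f}")
```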

### Training and Evaluating the Logistic Regression Model
I start with the Logistic Regression model because I identified it as my first choice. I evaluate the model on accuracy, precision, recall, F1 score, and AUC-ROC.

In [56]:
# Initialize and train the Logistic Regression model
log_model = LogisticRegression(max_iter=1000, random_state=42)
log_model.fit(X_train, y_train)

# Predictions on test set
y_pred_log = log_model.predict(X_test)

# Logistic Regression Evaluation
print("Logistic Regression Accuracy:", accuracy_score(y_test, y_pred_log))
print("Logistic Regression Precision:", precision_score(y_test, y_pred_log))
print("Logistic Regression Recall:", recall_score(y_test, y_pred_log))
print("Logistic Regression F1 Score:", f1_score(y_test, y_pred_log))
print("Logistic Regression AUC-ROC:", roc_auc_score(y_test, log_model.predict_proba(X_test)[:, 1]))


Logistic Regression Accuracy: 0.70025
Logistic Regression Precision: 0.0
Logistic Regression Recall: 0.0
Logistic Regression F1 Score: 0.0
Logistic Regression AUC-ROC: 0.4993363206694619


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
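A precision and recall of 0.0 alongside an accuracy near 0.70 is the pattern one would expect when a classifier predicts the negative class for every row. The sketch below reproduces that pattern on synthetic labels with scikit-learn's `DummyClassifier`; the 70/30 label split is an assumption chosen to resemble the scores above, and `zero_division=0` is passed so that precision is reported as 0 without a warning.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Synthetic labels with a 30% positive rate (illustrative, not the churn data).
y_true = np.array([0] * 70 + [1] * 30)
X_dummy = np.zeros((100, 1))  # DummyClassifier ignores the feature values

# A baseline that always predicts the most frequent class (here: 0).
baseline = DummyClassifier(strategy="most_frequent").fit(X_dummy, y_true)
y_hat = baseline.predict(X_dummy)

acc = accuracy_score(y_true, y_hat)
prec = precision_score(y_true, y_hat, zero_division=0)  # no predicted positives
rec = recall_score(y_true, y_hat)
cm = confusion_matrix(y_true, y_hat)  # rows: actual, columns: predicted

print(cm)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

Printing `confusion_matrix(y_test, y_pred_log)` in the notebook would show whether the second column (predicted churn) is empty for the trained model as well.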


### Training and Evaluating the Decision Tree Model
The Decision Tree model is an alternative approach, trained so that the two models can be compared for churn prediction. I evaluate it on the same metrics used for the Logistic Regression model.

In [57]:
# Initialize and train the Decision Tree model
dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train, y_train)

# Predictions on test set
y_pred_dt = dt_model.predict(X_test)

# Decision Tree Evaluation
print("Decision Tree Accuracy:", accuracy_score(y_test, y_pred_dt))
print("Decision Tree Precision:", precision_score(y_test, y_pred_dt))
print("Decision Tree Recall:", recall_score(y_test, y_pred_dt))
print("Decision Tree F1 Score:", f1_score(y_test, y_pred_dt))
print("Decision Tree AUC-ROC:", roc_auc_score(y_test, dt_model.predict_proba(X_test)[:, 1]))


Decision Tree Accuracy: 0.56985
Decision Tree Precision: 0.29425686336383716
Decision Tree Recall: 0.311092577147623
Decision Tree F1 Score: 0.30244060650287846
Decision Tree AUC-ROC: 0.49585332177623925
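Because both models are scored on the same five metrics, the comparison can be collected into one table. The following is a sketch under stated assumptions: `evaluate` is a hypothetical helper, and the data come from `make_classification` rather than the churn file, so the printed numbers will not match the results above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate(model, X_test, y_test):
    """Hypothetical helper: return the five metrics used in this section."""
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # probability of class 1
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred, zero_division=0),
        "auc_roc": roc_auc_score(y_test, y_score),
    }


# Synthetic, imbalanced stand-in data (roughly 70% negatives, 30% positives).
X_syn, y_syn = make_classification(
    n_samples=500, n_features=8, weights=[0.7, 0.3], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42, stratify=y_syn
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000, random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

# One row per model, one column per metric.
results = pd.DataFrame(
    {name: evaluate(m.fit(X_tr, y_tr), X_te, y_te) for name, m in models.items()}
).T
print(results.round(3))
```

In the notebook, the already fitted `log_model` and `dt_model` could be passed to the same helper together with `X_test` and `y_test`.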
