<a href="https://colab.research.google.com/github/linhb03/Ai118Project/blob/dev/Intity'swifi_customers_churn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Because real industry data is difficult to obtain for this project, I created a fictional dataset (with approval from Professor Myra Haubrich).

The dataset represents a telecommunications company called Intity, which provides both Wi-Fi and mobile line services. The goal of the analysis is to evaluate customer churn risk, estimate the churn probability for a new customer, and examine how customer characteristics influence that outcome.

**Features:** age, monthly_usage, purchase_amount, service_calls, gender, region.

**Goal:** Predict the probability that a new customer churns, and analyze the model coefficients to find the key drivers of churn.
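
As a rough illustration of what the model computes: logistic regression turns a weighted sum of the features into a probability by passing it through the sigmoid function. The sketch below is only a toy example. The weights, intercept, and feature values are made-up placeholders, not values fitted to the Intity data.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights and intercept, chosen only for illustration.
weights = {"service_calls": 0.5, "purchase_amount": -0.3}
intercept = -1.0

# Hypothetical, already-standardized feature values for one customer.
customer = {"service_calls": 2.0, "purchase_amount": 1.0}

# Linear score: intercept plus the weighted sum of the features.
score = intercept + sum(weights[name] * customer[name] for name in weights)
probability = sigmoid(score)

print(f"score = {score:.2f}, churn probability = {probability:.2%}")
```

A positive weight pushes the probability of churn up as the feature grows, and a negative weight pushes it down, which is why the signs of the coefficients are worth inspecting.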

In [4]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from google.colab import files

print("upload file.")
uploaded = files.upload()
file_name = next(iter(uploaded))
df = pd.read_csv(file_name)

df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')

numerical_features = ['age', 'monthly_usage', 'purchase_amount', 'service_calls']
categorical_features = ['gender', 'region']

X = df[numerical_features + categorical_features]
y = df['churn']

preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numerical_features),
        ('cat', OneHotEncoder(sparse_output=False, handle_unknown='ignore'), categorical_features)
    ])

model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression(random_state=42))
])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)

new_customer = pd.DataFrame({
    'age': [40],
    'monthly_usage': [350],
    'purchase_amount': [90],
    'service_calls': [3],
    'gender': ['F'],
    'region': ['West']
})

# Probability of the positive class (churn = 1) for the new customer
churn_probability = model.predict_proba(new_customer)[0][1]
# Classify as churn when the probability exceeds the 0.5 threshold
churn_prediction = 1 if churn_probability > 0.5 else 0

print(f"Churn probability for the new customer: {churn_probability:.2%}")
print(f"Churn prediction (1 = Churn, 0 = No Churn): {churn_prediction}")

# Recover the one-hot column names so each coefficient can be matched to a feature
cat_feature_names = model.named_steps['preprocessor'].named_transformers_['cat'].get_feature_names_out(categorical_features).tolist()
all_feature_names = numerical_features + cat_feature_names
coefficients = model.named_steps['classifier'].coef_[0]

print("\nModel Coefficients:")
for feature, coef in zip(all_feature_names, coefficients):
    print(f"- {feature}: {coef:.2f}")

upload file.


Saving customer_churn of Intity's wifi - Sheet1 (1).csv to customer_churn of Intity's wifi - Sheet1 (1) (3).csv
Churn probability for the new customer: 25.44%
Churn prediction (1 = Churn, 0 = No Churn): 0

Model Coefficients:
- age: 0.01
- monthly_usage: 0.04
- purchase_amount: -0.21
- service_calls: 0.20
- gender_F: -0.74
- gender_M: 0.74
- region_East: 0.17
- region_North: 0.32
- region_South: -0.64
- region_West: 0.15
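
One way to read these coefficients is as odds ratios: exponentiating a logistic regression coefficient gives the factor by which the odds of churn are multiplied. The sketch below uses the rounded coefficients copied from the output above. Because the numerical features pass through `StandardScaler`, their ratios apply to a one-standard-deviation increase rather than a one-unit increase. For the one-hot columns, the ratio compares a customer in that category with one who is not.

```python
import math

# Coefficients copied from the printed model output (rounded to 2 decimals).
coefficients = {
    "age": 0.01,
    "monthly_usage": 0.04,
    "purchase_amount": -0.21,
    "service_calls": 0.20,
    "gender_F": -0.74,
    "gender_M": 0.74,
    "region_East": 0.17,
    "region_North": 0.32,
    "region_South": -0.64,
    "region_West": 0.15,
}

# exp(coef) > 1 means higher odds of churn; exp(coef) < 1 means lower odds.
odds_ratios = {name: math.exp(coef) for name, coef in coefficients.items()}

# List the features with the largest absolute coefficients first.
for name in sorted(coefficients, key=lambda n: abs(coefficients[n]), reverse=True):
    print(f"{name:>16}: coef = {coefficients[name]:+.2f}, odds ratio = {odds_ratios[name]:.2f}")
```

Read this way, gender and the South region have the largest effects in this fictional dataset, while age and monthly usage have almost none.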
