**Build & Deploy a Customer Churn Prediction Model**

Features:

● gender: Whether the customer is male or female

● SeniorCitizen: 1 if the customer is a senior citizen, 0 otherwise

● Partner: Does the customer have a partner? (Yes/No)

● Dependents: Does the customer have any dependents? (Yes/No)

● tenure: Number of months the customer has stayed with the company

● PhoneService: Does the customer have a phone service? (Yes/No)

● MultipleLines: Does the customer have multiple phone lines? (Yes/No/No phone service)

● InternetService: Type of internet service the customer has (DSL/Fiber optic/No)

● OnlineSecurity: Does the customer have online security add-on? (Yes/No/No internet service)

● OnlineBackup: Does the customer have online backup add-on? (Yes/No/No internet service)

● DeviceProtection: Does the customer have device protection add-on? (Yes/No/No internet service)

● TechSupport: Does the customer have tech support? (Yes/No/No internet service)

● StreamingTV: Does the customer stream TV using the company’s service? (Yes/No/No internet service)

● StreamingMovies: Does the customer stream movies using the company’s service? (Yes/No/No internet service)

● Contract: Contract type – month-to-month, one year, or two year

● PaperlessBilling: Is billing done without paper? (Yes/No)

● PaymentMethod: How the customer pays (Credit card, bank transfer, etc.)

● MonthlyCharges: The amount charged to the customer monthly

● TotalCharges: Total amount charged over the entire tenure

● Churn: The target column – whether the customer left the service (Yes/No)

In [7]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

1. Data Loading and Exploration

In [8]:
df = pd.read_csv("customerChurn.csv")

In [9]:
df.shape

(7043, 21)

In [10]:
df.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


In [12]:
df['Churn'].value_counts()

Churn
No     5174
Yes    1869
Name: count, dtype: int64
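The counts above show that the classes are imbalanced, which matters when reading the accuracy reported later. A small sketch of the arithmetic, using only the two counts printed above (the variable names are illustrative):

```python
# Class counts copied from the value_counts() output above.
counts = {"No": 5174, "Yes": 1869}

total = sum(counts.values())
churn_rate = counts["Yes"] / total

# Accuracy a model would get by always predicting the majority class ("No").
majority_baseline = max(counts.values()) / total

print(f"Churn rate: {churn_rate:.3f}")
print(f"Majority-class baseline accuracy: {majority_baseline:.3f}")
```

Any accuracy figure is best compared against this majority-class baseline rather than against 50%.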

2. Preprocessing

In [13]:
df['Churn'] = df['Churn'].map({'Yes':1 , 'No':0})

In [14]:
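# Treat the literal string 'nan' as missing, then drop any row with a missing value.
# Blank strings (for example in TotalCharges) are not matched by this replace;
# that column is coerced to numeric and imputed in later cells.
df.replace('nan', np.nan, inplace=True)
df.dropna(inplace=True)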

In [15]:
from sklearn.preprocessing import LabelEncoder 

In [16]:
# Categorical columns to integer-encode (SeniorCitizen is already 0/1, so encoding leaves it unchanged).
col = ['gender', 'SeniorCitizen', 'InternetService', 'OnlineSecurity', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'OnlineBackup', 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod',
       'Partner', 'Dependents', 'PhoneService', 'MultipleLines']

In [17]:
label_encoders = {}

for i in col:
    le = LabelEncoder()
    df[i] = le.fit_transform(df[i])
    label_encoders[i] = le
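LabelEncoder assigns integer codes in sorted order of the class labels, which is why 'Yes' shows up as 2 in the three-valued columns of the encoded table further down. A minimal sketch on a made-up list of labels (not the real column) that makes the mapping visible:

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative labels only; the real columns are encoded in the loop above.
le = LabelEncoder()
le.fit(["Yes", "No", "No internet service"])

# classes_ holds the labels in sorted order; the position is the integer code.
mapping = {label: code for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform turns codes back into the original labels.
decoded = le.inverse_transform([2])[0]
print(decoded)
```

Keeping the fitted encoders in `label_encoders`, as the loop above does, is what allows the same mapping to be reapplied to new data later.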

In [18]:
from sklearn.preprocessing import StandardScaler

In [19]:
df['TotalCharges'] = df['TotalCharges'].astype(str).str.strip()
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
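The strip-then-coerce step above turns any value that cannot be parsed as a number into NaN instead of raising an error. A small sketch on made-up values, assuming the column may contain blank strings (the notebook does not show which rows are affected):

```python
import pandas as pd

# Made-up sample: two valid amounts and one blank entry.
sample = pd.Series(["29.85", " ", "1889.5"])

# Strip whitespace first, then coerce unparseable entries to NaN.
cleaned = pd.to_numeric(sample.str.strip(), errors="coerce")

n_missing = int(cleaned.isna().sum())
print(cleaned)
print("Missing after coercion:", n_missing)
```

Counting the NaNs right after coercion is a quick way to see how many rows the imputer will later fill.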

In [36]:
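from sklearn.impute import SimpleImputer

imputer = SimpleImputer(strategy='mean')

# Drop the target, the ID, and every encoded column except gender.
# Given the 21 columns shown above, X keeps only gender, tenure, MonthlyCharges, and TotalCharges.
X = df.drop(['Churn', 'customerID', 'SeniorCitizen', 'InternetService', 'OnlineSecurity', 'DeviceProtection',
             'TechSupport', 'StreamingTV', 'OnlineBackup', 'StreamingMovies', 'Contract', 'PaperlessBilling',
             'PaymentMethod', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines'], axis=1)
# Fill any NaNs left in TotalCharges by the numeric coercion with the column mean.
X = imputer.fit_transform(X)

y = df['Churn']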

In [37]:
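# Note: X was already built in the previous cell, so scaling df here does not change the model's inputs.
scaler = StandardScaler()
df['MonthlyCharges'] = scaler.fit_transform(df[['MonthlyCharges']])

In [38]:
# Refitting the same scaler overwrites the MonthlyCharges statistics; use one scaler per column if both are needed later.
df['TotalCharges'] = scaler.fit_transform(df[['TotalCharges']])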
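One way to make sure imputation and scaling actually reach the model, and are fit on the training split only, is to chain them in a scikit-learn Pipeline. The sketch below is not the notebook's method: it runs on synthetic data, and `X_demo`, `y_demo`, and the step names are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with columns on very different scales.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3)) * [1.0, 30.0, 2000.0]
y_demo = (X_demo[:, 0] + X_demo[:, 1] / 30 > 0).astype(int)
X_demo[::25, 2] = np.nan  # inject a few missing values

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Each step is fit on the training split only, then reused on the test split.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])
pipe.fit(X_tr, y_tr)

test_predictions = pipe.predict(X_te)
print("Test accuracy on synthetic data:", pipe.score(X_te, y_te))
```

Pickling the whole pipeline instead of the bare model would also carry the fitted imputer and scaler along to wherever predictions are made.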

In [39]:
df.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,0,0,1,0,1,0,1,0,0,...,0,0,0,0,0,1,2,-1.160323,-0.994194,0
1,5575-GNVDE,1,0,0,0,34,1,0,0,2,...,2,0,0,0,1,0,3,-0.259629,-0.17374,0
2,3668-QPYBK,1,0,0,0,2,1,0,0,2,...,0,0,0,0,0,1,3,-0.36266,-0.959649,1
3,7795-CFOCW,1,0,0,0,45,0,1,0,2,...,2,2,0,0,1,0,0,-0.746535,-0.195248,0
4,9237-HQITU,0,0,0,0,2,1,0,1,0,...,0,0,0,0,0,1,2,0.197365,-0.940457,1


In [40]:
from sklearn.model_selection import train_test_split

In [41]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

3. Model Training

In [42]:
from sklearn.linear_model import LogisticRegression

In [43]:
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

In [44]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

In [45]:
print("Accuracy:", accuracy_score(y_test, y_pred))

Accuracy: 0.7970191625266146


In [46]:
print("Precision:", precision_score(y_test, y_pred))

Precision: 0.6617100371747212


In [47]:
print("Recall:", recall_score(y_test, y_pred))

Recall: 0.4772117962466488


In [48]:
print("F1 Score:", f1_score(y_test, y_pred))

F1 Score: 0.5545171339563862


In [49]:
print("Confusion matrix:", confusion_matrix(y_test, y_pred))

Confusion matrix: [[945  91]
 [195 178]]
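The four metrics printed above all follow from this confusion matrix. A short sketch that recomputes them from the matrix values alone, using scikit-learn's layout of true labels in rows and predicted labels in columns:

```python
import numpy as np

# Confusion matrix copied from the output above: [[TN, FP], [FN, TP]].
cm = np.array([[945, 91],
               [195, 178]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)   # of predicted churners, how many really churned
recall = tp / (tp + fn)      # of real churners, how many were caught
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
print(f"F1 Score:  {f1:.4f}")
```

Read this way, the model misses 195 of the 373 churners in the test set, which the accuracy figure alone does not reveal.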


In [50]:
import pickle
with open('customer_churn.pkl', 'wb') as file:
    pickle.dump(model, file)

In [51]:
with open('customer_churn.pkl', 'rb') as file:
    loaded_model = pickle.load(file)
loaded_model.predict(X_test)

array([0, 0, 0, ..., 0, 0, 0], dtype=int64)
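Once the model is reloaded, scoring a single customer means passing the features in the same column order used for training, which from the cells above appears to be gender, tenure, MonthlyCharges, TotalCharges. The sketch below is only an illustration: `churn_probability` is a hypothetical helper, and the model is a stand-in trained on synthetic data so the block runs on its own.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in model on synthetic data (NOT the notebook's trained model).
rng = np.random.default_rng(0)
X_fake = np.column_stack([
    rng.integers(0, 2, 300),      # gender code (0 or 1)
    rng.integers(0, 73, 300),     # tenure in months
    rng.uniform(20, 120, 300),    # MonthlyCharges
    rng.uniform(20, 8000, 300),   # TotalCharges
])
y_fake = (X_fake[:, 1] < 12).astype(int)
stand_in = LogisticRegression(max_iter=1000).fit(X_fake, y_fake)

# Round-trip through pickle, mirroring the dump/load cells above.
restored = pickle.loads(pickle.dumps(stand_in))


def churn_probability(model, gender, tenure, monthly_charges, total_charges):
    """Hypothetical helper: probability of the positive (churn) class for one customer."""
    row = np.array([[gender, tenure, monthly_charges, total_charges]], dtype=float)
    return float(model.predict_proba(row)[0, 1])


p = churn_probability(restored, gender=0, tenure=2,
                      monthly_charges=70.7, total_charges=151.65)
print("Churn probability (stand-in model):", p)
```

Returning a probability rather than a hard 0/1 label leaves the decision threshold to the caller, which can help when recall on churners matters more than overall accuracy.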