# Lab 06: Comprehensive Review

**You are a data analyst for a telecommunications company. The company wants to predict whether a customer is likely to churn (cancel the service) within the next six months, based on various factors.**

You are provided with a CSV file containing the following columns:

* CustomerID: Unique identifier for the customer.

* tenure: Number of months the customer has used the service.

* TotalCharges: Total amount paid by the customer.

* MonthlyCharges: Monthly fee the customer pays.

* NumCalls: Number of monthly calls made by the customer.

* Churn: Label 1 if the customer churned, 0 if they stayed.

### Your Tasks

**T1**: Calculate basic statistics (mean, median, standard deviation) for numeric columns.
Identify and handle any missing values in the dataset by filling them with the median of the respective column.

**T2**: Build a Customer class

+ The Customer class should have the attributes customer_id, tenure, total_charges, monthly_charges, and churn. customer_id must be private.

+ Define a method that calculates the customer's churn probability:

  `churn_prob = 1 - (monthly_charges / total_charges) * (tenure / 12)`

  The method should return a value between 0 and 1 (Hint: `max(0, min(churn_prob, 1))`).

+ Define a method that describes the customer's current status by outputting customer_id, the churn probability, and the current churn label.

+ Create two instances of the class and call the describe method on each.


**T3**: Data Transformation and Preparation:

Use a functional programming style to prepare the data for modeling. (Apply `map` to one-hot encode the string values.)

**T4**: Build a Prediction Model with PyTorch:

+ Use the given model class to train and test on the data.

+ Split the dataset into training and testing sets in an 80:20 ratio.

+ Train the model and evaluate its accuracy on the test set.

+ Report the Results: Print the model's accuracy on the test set and output the top 5 customers with the highest churn probability based on the trained model.


In [None]:
import torch.nn as nn
import torch
import torch.optim as optim

class LinearRegressionModel(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 64)
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, 1)

        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        return self.sigmoid(x)

criterion = nn.BCELoss()
model = LinearRegressionModel(5) # 5 features
optimizer = optim.SGD(model.parameters(), lr=1e-6)

# Your assignment

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
cd /content/drive/MyDrive/Lec 6

/content/drive/MyDrive/Lec 6


In [None]:
import pandas as pd
import numpy as np
data = pd.read_csv("Churn.csv")
data.head()

   customerID  tenure PhoneService  InternetService  MonthlyCharges  TotalCharges Churn
0  7590-VHVEG       1           No             True           29.85         29.85    No
1  5575-GNVDE      34          Yes             True           56.95        1889.5    No
2  3668-QPYBK       2          Yes             True           53.85        108.15   Yes
3  7795-CFOCW      45           No             True            42.3       1840.75    No
4  9237-HQITU       2          Yes             True            70.7        151.65   Yes


# Task 1

In [None]:
statistics = data.describe()
print("\nBasic statistics (mean, median, std) for numeric columns:")
print(statistics)


Basic statistics (mean, median, std) for numeric columns:
            tenure  MonthlyCharges  TotalCharges
count  7043.000000     6903.000000   6892.000000
mean     32.371149       64.719564   2286.627307
std      24.559481       30.084355   2267.279808
min       0.000000       18.250000     18.800000
25%       9.000000       35.500000    401.762500
50%      29.000000       70.300000   1401.000000
75%      55.000000       89.850000   3805.137500
max      72.000000      118.750000   8684.800000
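
`describe()` reports the median only implicitly, as the `50%` row. A minimal sketch of requesting exactly the three statistics named in T1 with `DataFrame.agg`; the small frame `toy` is made-up illustration data, not the contents of `Churn.csv`.

```python
import pandas as pd

# Hypothetical toy data standing in for the numeric columns of Churn.csv
toy = pd.DataFrame({
    "tenure": [1, 34, 2, 45],
    "MonthlyCharges": [30.0, 50.0, 70.0, 90.0],
})

# One row per statistic, one column per numeric feature
stats = toy.agg(["mean", "median", "std"])
print(stats)
```

Note that `std` here is the sample standard deviation (divisor n - 1), which is also what `describe()` shows.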


In [None]:
# Count missing values per column before imputation
missing_values = data.isnull().sum()
print("\nMissing values in each column:")
print(missing_values)

# Fill each numeric column that has gaps with its own median;
# tenure has no missing values, so it is left untouched
for column in ["MonthlyCharges", "TotalCharges"]:
    data[column] = data[column].fillna(data[column].median())

# Check again for missing values to confirm they have been handled
missing_values_after = data.isnull().sum()
print("\nMissing values after handling:")
print(missing_values_after)


Missing values in each column:
customerID           0
tenure               0
PhoneService         0
InternetService      0
MonthlyCharges     140
TotalCharges       151
Churn                0
dtype: int64

Missing values after handling:
customerID         0
tenure             0
PhoneService       0
InternetService    0
MonthlyCharges     0
TotalCharges       0
Churn              0
dtype: int64
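
The cell above names the affected columns by hand. A minimal sketch, assuming every numeric column should be imputed the same way, that selects the numeric columns automatically. The function name `fill_numeric_with_median` and the toy frame are hypothetical.

```python
import numpy as np
import pandas as pd

def fill_numeric_with_median(df):
    """Return a copy in which NaNs in numeric columns are replaced by the column median."""
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    # DataFrame.median skips NaN by default, so the gaps do not distort the fill value
    medians = out[numeric_cols].median()
    out[numeric_cols] = out[numeric_cols].fillna(medians)
    return out

# Hypothetical toy data with one gap in each charge column
toy = pd.DataFrame({
    "customerID": ["a", "b", "c"],
    "MonthlyCharges": [10.0, np.nan, 30.0],
    "TotalCharges": [100.0, 300.0, np.nan],
})
filled = fill_numeric_with_median(toy)
print(filled)
```

Working on a copy keeps the original frame available for comparing values before and after imputation.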


# Task 2

In [None]:
class Customer:
    def __init__(self, customer_id, tenure, total_charges, monthly_charges, churn):
        self.__customer_id = customer_id
        self.tenure = tenure
        self.total_charges = total_charges
        self.monthly_charges = monthly_charges
        self.churn = churn

    def churn_probability(self):
        if self.total_charges == 0:
            return 0.0

        churn_prob = 1 - (self.monthly_charges / self.total_charges) * (self.tenure / 12)
        return max(0, min(churn_prob, 1))

    def describe_status(self):
        return (f"Customer ID: {self.__customer_id}, "
                f"Churn Probability: {self.churn_probability():.2f}, "
                f"Churn Status: {'Churned' if self.churn else 'Active'}")

customer1 = Customer(customer_id=1, tenure=12, total_charges=840.00, monthly_charges=70.00, churn=0)
customer2 = Customer(customer_id=2, tenure=24, total_charges=132.00, monthly_charges=55.00, churn=1)

print(customer1.describe_status())
print(customer2.describe_status())

Customer ID: 1, Churn Probability: 0.92, Churn Status: Active
Customer ID: 2, Churn Probability: 0.17, Churn Status: Churned
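
The two printed probabilities can be checked by hand against the T2 formula. A minimal sketch that mirrors the method as a standalone function (the name `churn_probability` outside the class is an assumption for illustration) and adds one hypothetical customer whose raw score falls below 0, to show the clamp at work.

```python
def churn_probability(monthly_charges, total_charges, tenure):
    """Clamped churn score from T2; returns 0.0 when total_charges is 0."""
    if total_charges == 0:
        return 0.0
    raw = 1 - (monthly_charges / total_charges) * (tenure / 12)
    return max(0.0, min(raw, 1.0))

# customer1 from above: 1 - (70/840) * (12/12) = 1 - 0.0833...
p1 = churn_probability(70.0, 840.0, 12)
# customer2 from above: 1 - (55/132) * (24/12) = 1 - 0.8333...
p2 = churn_probability(55.0, 132.0, 24)
# Hypothetical customer with a negative raw score: 1 - (50/100) * (36/12) = -0.5
p3 = churn_probability(50.0, 100.0, 36)

print(f"{p1:.2f} {p2:.2f} {p3:.2f}")  # prints "0.92 0.17 0.00"
```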


# Task 3
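
A minimal sketch of the functional style T3 asks for: small pure functions applied with `map`, with no mutation of the input. The helper names `yes_no_to_float` and `one_hot`, and the `services` category list, are assumptions for illustration; the dataset as shown has only binary columns.

```python
def yes_no_to_float(value):
    """Encode the string 'Yes' as 1.0 and any other value as 0.0."""
    return 1.0 if value == "Yes" else 0.0

def one_hot(value, categories):
    """Return a list with 1.0 at the position of value and 0.0 elsewhere."""
    return [1.0 if value == c else 0.0 for c in categories]

# Binary column: a single 0/1 indicator is enough
phone = ["No", "Yes", "Yes"]
encoded_phone = list(map(yes_no_to_float, phone))

# Hypothetical multi-category column: one indicator per category
services = ["DSL", "Fiber", "None"]
encoded_services = list(map(lambda v: one_hot(v, services), ["Fiber", "DSL"]))

print(encoded_phone)     # [0.0, 1.0, 1.0]
print(encoded_services)  # [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
```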

In [None]:
# PhoneService and Churn hold "Yes"/"No" strings, so they are compared with "Yes";
# InternetService holds booleans
data["InternetService"] = list(map(lambda x: 1. if x == True else 0., data["InternetService"].values))
data["PhoneService"] = list(map(lambda x: 1. if x == "Yes" else 0., data["PhoneService"].values))
data["Churn"] = list(map(lambda x: 1. if x == "Yes" else 0., data["Churn"].values))
data.head()

   customerID  tenure  PhoneService  InternetService  MonthlyCharges  TotalCharges  Churn
0  7590-VHVEG       1           0.0              1.0           29.85         29.85    0.0
1  5575-GNVDE      34           1.0              1.0           56.95        1889.5    0.0
2  3668-QPYBK       2           1.0              1.0           53.85        108.15    1.0
3  7795-CFOCW      45           0.0              1.0            42.3       1840.75    0.0
4  9237-HQITU       2           1.0              1.0            70.7        151.65    1.0


# Task 4

In [None]:
data.head()

   customerID  tenure  PhoneService  InternetService  MonthlyCharges  TotalCharges  Churn
0  7590-VHVEG       1           0.0              1.0           29.85         29.85    0.0
1  5575-GNVDE      34           1.0              1.0           56.95        1889.5    0.0
2  3668-QPYBK       2           1.0              1.0           53.85        108.15    1.0
3  7795-CFOCW      45           0.0              1.0            42.3       1840.75    0.0
4  9237-HQITU       2           1.0              1.0            70.7        151.65    1.0


In [None]:
columns = data.columns
customers = data[columns[0]].values
X = data[columns[1:-1]].values
y = data[columns[-1]].values

In [None]:
y.shape

(7043,)

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
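
The raw features sit on very different scales (`TotalCharges` reaches 8684.8 in the statistics above, while the service flags are 0 or 1), and inputs of that size can push a sigmoid output toward saturation. A minimal sketch, assuming standardization is wanted, that computes the mean and standard deviation on the training split only and reuses them for the test split. The function name `standardize` and the toy arrays are hypothetical; the notebook itself trains on unscaled features.

```python
import numpy as np

def standardize(train, test, eps=1e-8):
    """Scale both splits with statistics computed on the training split only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    # eps guards against division by zero for constant columns
    return (train - mean) / (std + eps), (test - mean) / (std + eps)

# Hypothetical toy rows: [tenure, TotalCharges]
toy_train = np.array([[1.0, 30.0], [34.0, 1890.0], [45.0, 1840.0]])
toy_test = np.array([[2.0, 150.0]])

train_scaled, test_scaled = standardize(toy_train, toy_test)
print(train_scaled.mean(axis=0))  # close to 0 for every column
print(test_scaled)
```

Fitting the statistics on the training rows alone keeps information about the test set out of the preprocessing step.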

In [None]:
import torch.nn as nn
import torch
import torch.optim as optim

class LinearRegressionModel(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 64)
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, 1)

        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        return self.sigmoid(x)

criterion = nn.BCELoss()
model = LinearRegressionModel(5) # 5 features
optimizer = optim.SGD(model.parameters(), lr=1e-6)

In [None]:
from tqdm import tqdm

In [None]:
X_train_tensor = torch.FloatTensor(X_train)
y_train_tensor = torch.FloatTensor(y_train).unsqueeze(1)

In [None]:
num_epochs = 100
for epoch in tqdm(range(num_epochs)):
  optimizer.zero_grad()
  pred = model(X_train_tensor)
  loss = criterion(pred, y_train_tensor)
  loss.backward()
  optimizer.step()
  print(f"Epoch: {epoch+1}/{num_epochs}\n Loss: {loss.item()}")

 58%|█████▊    | 58/100 [00:00<00:00, 282.88it/s]

Epoch: 1/100
 Loss: 71.54832458496094
Epoch: 2/100
 Loss: 71.50116729736328
Epoch: 3/100
 Loss: 71.49891662597656
Epoch: 4/100
 Loss: 71.49663543701172
Epoch: 5/100
 Loss: 71.47894287109375
Epoch: 6/100
 Loss: 71.47671508789062
Epoch: 7/100
 Loss: 71.42967987060547
Epoch: 8/100
 Loss: 71.4127197265625
Epoch: 9/100
 Loss: 71.36544036865234
Epoch: 10/100
 Loss: 71.34822082519531
Epoch: 11/100
 Loss: 71.34569549560547
Epoch: 12/100
 Loss: 71.34335327148438
Epoch: 13/100
 Loss: 71.31092834472656
Epoch: 14/100
 Loss: 71.30841064453125
Epoch: 15/100
 Loss: 71.2611312866211
Epoch: 16/100
 Loss: 71.21411895751953
Epoch: 17/100
 Loss: 71.19702911376953
Epoch: 18/100
 Loss: 71.14971923828125
Epoch: 19/100
 Loss: 71.13214874267578
Epoch: 20/100
 Loss: 71.11490631103516
Epoch: 21/100
 Loss: 71.0975570678711
Epoch: 22/100
 Loss: 71.08007049560547
Epoch: 23/100
 Loss: 71.04792022705078
Epoch: 24/100
 Loss: 71.04554748535156
Epoch: 25/100
 Loss: 71.02804565429688
Epoch: 26/100
 Loss: 70.9506454467773

100%|██████████| 100/100 [00:00<00:00, 273.90it/s]

Epoch: 59/100
 Loss: 69.93392944335938
Epoch: 60/100
 Loss: 69.9310302734375
Epoch: 61/100
 Loss: 69.9131088256836
Epoch: 62/100
 Loss: 69.89493560791016
Epoch: 63/100
 Loss: 69.8468017578125
Epoch: 64/100
 Loss: 69.84362030029297
Epoch: 65/100
 Loss: 69.79571533203125
Epoch: 66/100
 Loss: 69.70313262939453
Epoch: 67/100
 Loss: 69.65502166748047
Epoch: 68/100
 Loss: 69.60713195800781
Epoch: 69/100
 Loss: 69.5739517211914
Epoch: 70/100
 Loss: 69.57054901123047
Epoch: 71/100
 Loss: 69.52277374267578
Epoch: 72/100
 Loss: 69.48896026611328
Epoch: 73/100
 Loss: 69.4700698852539
Epoch: 74/100
 Loss: 69.40714263916016
Epoch: 75/100
 Loss: 69.3891372680664
Epoch: 76/100
 Loss: 69.37069702148438
Epoch: 77/100
 Loss: 69.36763763427734
Epoch: 78/100
 Loss: 69.31962585449219
Epoch: 79/100
 Loss: 69.25650787353516
Epoch: 80/100
 Loss: 69.22333526611328
Epoch: 81/100
 Loss: 69.2047119140625
Epoch: 82/100
 Loss: 69.12664794921875
Epoch: 83/100
 Loss: 69.078369140625
Epoch: 84/100
 Loss: 69.0296554565




In [None]:
X_test_tensor = torch.FloatTensor(X_test)
y_test_tensor = torch.FloatTensor(y_test).unsqueeze(1)

In [None]:
preds = model(X_test_tensor)
preds.shape

torch.Size([1409, 1])

In [None]:
preds = preds.squeeze(-1)

In [None]:
preds_label = list(map(lambda x: int(x>=0.6), preds))
preds_label

[0,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 1,
 1,
 1,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,


In [None]:
preds

tensor([0.5630, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 0.5903],
       grad_fn=<SqueezeBackward1>)
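
T4 also asks for the test accuracy and the five customers with the highest predicted churn probability. A minimal sketch of both steps on NumPy arrays, assuming the probabilities have been detached from the graph first (for example with `preds.detach().numpy()`) and that an ID array aligned row by row with the test predictions is available (for example by passing `customers` to `train_test_split` alongside `X` and `y`). The function names and the toy values are hypothetical.

```python
import numpy as np

def accuracy(probs, labels, threshold=0.5):
    """Fraction of rows whose thresholded probability matches the label."""
    predicted = (probs >= threshold).astype(int)
    return float((predicted == labels).mean())

def top_k_customers(probs, customer_ids, k=5):
    """Return (id, probability) pairs for the k highest probabilities."""
    order = np.argsort(-probs)[:k]  # indices sorted by descending probability
    return [(str(customer_ids[i]), float(probs[i])) for i in order]

# Hypothetical predictions, labels, and IDs for six test customers
probs = np.array([0.10, 0.80, 0.65, 0.30, 0.95, 0.55])
labels = np.array([0, 1, 0, 0, 1, 1])
ids = np.array(["A", "B", "C", "D", "E", "F"])

# The notebook's own cut-off of 0.6 is passed explicitly here
acc = accuracy(probs, labels, threshold=0.6)
top3 = top_k_customers(probs, ids, k=3)
print(f"accuracy: {acc:.3f}")
print(top3)
```

The threshold is a parameter because the choice between 0.5 and a stricter value such as 0.6 changes which customers are flagged, and therefore the reported accuracy.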