# Final Project Work Plan

## Interconnect Telecom Customer Churn Prediction

### Introduction 
**In this project, we build a machine learning prototype that forecasts customer churn for the telecom provider Interconnect. In addition to internet and phone services, the company offers cloud storage, streaming, and antivirus software.**
**By identifying customers who are likely to leave and offering them tailored promotions and retention offers, Interconnect Telecom can reduce churn and increase long-term revenue.**

**Below is the work plan for the Interconnect Telecom customer churn prediction project.**


### 1. Data Preparation

#### 1.1 Load Data
- Import the libraries needed for the project.
- Load `contract.csv`, `personal.csv`, `internet.csv`, and `phone.csv`.

- Inspect each dataset: dimensions, missing values, column types.

```python
# Importing libraries
import pandas as pd

# Loading the data
df_contract = pd.read_csv('/datasets/final_provider/contract.csv')
df_personal = pd.read_csv('/datasets/final_provider/personal.csv')
df_internet = pd.read_csv('/datasets/final_provider/internet.csv')
df_phone = pd.read_csv('/datasets/final_provider/phone.csv')
```

#### 1.2 Inspect Basic Info, Missing Values, and Duplicates



```python
# Check structure and missing values
for name, df in zip(['Contract', 'Personal', 'Internet', 'Phone'],
                    [df_contract, df_personal, df_internet, df_phone]):
    print(f"--- {name} Dataset ---")
    df.info()  # info() prints directly and returns None, so it is not wrapped in print()
    print(df.isna().sum(), "\n")
```

```
--- Contract Dataset ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 8 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   BeginDate         7043 non-null   object 
 2   EndDate           7043 non-null   object 
 3   Type              7043 non-null   object 
 4   PaperlessBilling  7043 non-null   object 
 5   PaymentMethod     7043 non-null   object 
 6   MonthlyCharges    7043 non-null   float64
 7   TotalCharges      7043 non-null   object 
dtypes: float64(1), object(7)
memory usage: 440.3+ KB
customerID          0
BeginDate           0
EndDate             0
Type                0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
dtype: int64 

--- Personal Dataset ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 5 columns):
 #   Column       
```

```python
# Check for duplicate rows in each dataset
for name, df in zip(['Contract', 'Personal', 'Internet', 'Phone'],
                    [df_contract, df_personal, df_internet, df_phone]):
    duplicate_count = df.duplicated().sum()
    print(f"--- {name} Dataset ---")
    print(f"Total rows: {len(df)}")
    print(f"Duplicate rows: {duplicate_count}\n")
```

```
--- Contract Dataset ---
Total rows: 7043
Duplicate rows: 0

--- Personal Dataset ---
Total rows: 7043
Duplicate rows: 0

--- Internet Dataset ---
Total rows: 5517
Duplicate rows: 0

--- Phone Dataset ---
Total rows: 6361
Duplicate rows: 0
```
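The `info()` output lists `TotalCharges` as `object` even though the column should hold monetary amounts, which suggests that at least some values cannot be parsed as numbers. A common cause in exports like this is blank strings, but that is an assumption to verify on the real data. The sketch below uses a small hypothetical sample to show one way to convert the column; the fill value of 0 is likewise an assumed rule, not a confirmed one.

```python
import pandas as pd

# Hypothetical sample mimicking the Contract table: TotalCharges is read as text,
# and a blank string (assumed here) prevents a direct numeric conversion.
sample = pd.DataFrame({
    "customerID": ["0001-A", "0002-B", "0003-C"],
    "MonthlyCharges": [29.85, 56.95, 53.85],
    "TotalCharges": ["29.85", " ", "108.15"],
})

# errors="coerce" turns unparseable strings into NaN instead of raising an error
sample["TotalCharges"] = pd.to_numeric(sample["TotalCharges"], errors="coerce")
print("Unparseable TotalCharges values:", sample["TotalCharges"].isna().sum())

# Assumed fill rule: a missing total means the customer has not been billed yet
sample["TotalCharges"] = sample["TotalCharges"].fillna(0)
print(sample.dtypes)
```

Inspecting the rows that become NaN before choosing a fill rule is worthwhile, since they may share a pattern (for example, a very recent `BeginDate`).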



#### 1.3 Merge the Datasets
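We will now merge the four datasets on the `customerID` column. Left joins keep every customer from the contract table, so customers without internet or phone service remain in the data with missing values in those columns.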

```python
# Step-by-step left merge on 'customerID' (keeps every row of df_contract)
df_merged = df_contract.merge(df_personal, on='customerID', how='left')
df_merged = df_merged.merge(df_internet, on='customerID', how='left')
df_merged = df_merged.merge(df_phone, on='customerID', how='left')

# Display basic info of the merged dataset
print("Merged dataset:")
df_merged.info()  # info() prints directly; wrapping it in print() would also print None
```

```
Merged dataset:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7043 entries, 0 to 7042
Data columns (total 20 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   BeginDate         7043 non-null   object 
 2   EndDate           7043 non-null   object 
 3   Type              7043 non-null   object 
 4   PaperlessBilling  7043 non-null   object 
 5   PaymentMethod     7043 non-null   object 
 6   MonthlyCharges    7043 non-null   float64
 7   TotalCharges      7043 non-null   object 
 8   gender            7043 non-null   object 
 9   SeniorCitizen     7043 non-null   int64  
 10  Partner           7043 non-null   object 
 11  Dependents        7043 non-null   object 
 12  InternetService   5517 non-null   object 
 13  OnlineSecurity    5517 non-null   object 
 14  OnlineBackup      5517 non-null   object 
 15  DeviceProtection  5517 non-null   object 
 16  TechSupport       5517 non
```
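Because a left join silently keeps unmatched rows, it can help to make the merge assumptions explicit. The sketch below uses two tiny hypothetical tables in place of `df_contract` and `df_internet`: pandas' `validate="one_to_one"` raises an error if `customerID` is duplicated on either side, and `indicator=True` records whether each row found a match. The `"No internet"` label is a hypothetical choice for filling the gaps, not something the plan prescribes.

```python
import pandas as pd

# Tiny hypothetical tables standing in for df_contract and df_internet
contract = pd.DataFrame({"customerID": ["A", "B", "C"],
                         "Type": ["Month-to-month", "One year", "Two year"]})
internet = pd.DataFrame({"customerID": ["A", "C"],
                         "InternetService": ["DSL", "Fiber optic"]})

# validate="one_to_one" raises MergeError on duplicated keys;
# indicator=True adds a _merge column ("both", "left_only", "right_only")
merged = contract.merge(internet, on="customerID", how="left",
                        validate="one_to_one", indicator=True)
print(merged["_merge"].value_counts())

# Customers without an internet row get NaN; an explicit (hypothetical) label keeps that information
merged["InternetService"] = merged["InternetService"].fillna("No internet")
merged = merged.drop(columns="_merge")
print(merged)
```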

### 2. Exploratory Data Analysis (EDA)


- I will analyze the churn distribution.

- I will analyze numerical and categorical features by churn status.

- I will identify potential correlations and strong predictors.

- I will visualize key trends and relationships.
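Churn analysis needs a binary target first. The plan does not state how churn is encoded; a minimal sketch, assuming that `EndDate == "No"` marks a customer who is still active (to be confirmed against the real `EndDate` values), could derive the target and produce the first EDA summaries like this. The rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows; assumption: EndDate == "No" means the customer is still active
df = pd.DataFrame({
    "EndDate": ["No", "2019-11-01 00:00:00", "No", "2019-12-01 00:00:00"],
    "Type": ["Month-to-month", "Month-to-month", "Two year", "One year"],
    "MonthlyCharges": [70.0, 95.5, 20.1, 60.0],
})

# Binary target: 1 = customer has left, 0 = still active
df["churn"] = (df["EndDate"] != "No").astype(int)

# Overall churn distribution (class balance)
print(df["churn"].value_counts(normalize=True))

# Churn rate per category of a categorical feature
print(df.groupby("Type")["churn"].mean())

# Mean of a numerical feature per churn group
print(df.groupby("churn")["MonthlyCharges"].mean())
```

The class balance printed first also informs later steps, such as stratified splitting and the choice of AUC-ROC over plain accuracy.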

### 3. Feature Engineering
- Encode categorical variables with one-hot encoding.
- Engineer relevant features (e.g., contract length, total number of services used, tenure).
- Scale numerical features (needed for scale-sensitive models such as Logistic Regression or SVM).
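The feature-engineering steps above can be sketched with pandas and scikit-learn's `ColumnTransformer`. Several details are assumptions rather than facts from the plan: `EndDate == "No"` is taken to mean an active customer, `SNAPSHOT_DATE` is a hypothetical cut-off date used to measure tenure for active customers, and the service columns and rows are a small hypothetical subset of the merged data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical merged rows; column names follow the merged dataset
df = pd.DataFrame({
    "BeginDate": ["2019-01-01", "2018-06-01", "2017-03-01"],
    "EndDate": ["No", "2019-12-01 00:00:00", "No"],
    "Type": ["Month-to-month", "One year", "Two year"],
    "MonthlyCharges": [70.0, 55.0, 20.0],
    "OnlineSecurity": ["Yes", "No", None],
    "TechSupport": ["No", "Yes", None],
    "MultipleLines": ["Yes", None, "No"],
})

# Hypothetical snapshot date used as the end of tenure for active customers
SNAPSHOT_DATE = pd.Timestamp("2020-02-01")

begin = pd.to_datetime(df["BeginDate"])
# "No" (assumed: still active) becomes NaT, then is replaced by the snapshot date
end = pd.to_datetime(df["EndDate"].where(df["EndDate"] != "No")).fillna(SNAPSHOT_DATE)
df["tenure_days"] = (end - begin).dt.days

# Total services used: count "Yes" entries; missing values count as not subscribed
service_cols = ["OnlineSecurity", "TechSupport", "MultipleLines"]
df["n_services"] = (df[service_cols] == "Yes").sum(axis=1)

# Scale numerical features and one-hot encode categorical ones in one transformer
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["MonthlyCharges", "tenure_days", "n_services"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Type"]),
])
X = preprocess.fit_transform(df)
print(X.shape)
```

Fitting the transformer only on training data (for example, inside a `Pipeline`) avoids leaking test-set statistics into the scaler and encoder. If `EndDate` is also used to build the churn target, tenure derived from it should be reviewed for target leakage.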

### 4. Modeling 

#### 4.1. Split Data

- Use a train/test split (e.g., 80/20).

- Consider stratifying by churn so that both sets keep the same class ratio.
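A stratified 80/20 split can be done with scikit-learn's `train_test_split` and its `stratify` argument. The sketch uses synthetic, imbalanced stand-in data instead of the real features; the class ratio and random seeds are arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data in place of the real feature matrix and churn target
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.26).astype(int)

# stratify=y keeps the churn ratio (nearly) identical in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=12345
)
print(len(X_train), len(X_test))
print(round(y_train.mean(), 3), round(y_test.mean(), 3))
```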

#### 4.2. Model Training

Train and compare several models:

- Logistic Regression (baseline)

- Random Forest

- Gradient Boosting

     - XGBoost
     - CatBoost
     - LightGBM

#### 4.3. Cross-validation

- Use stratified K-fold CV (e.g., 5-fold) to evaluate AUC-ROC and accuracy.

- Compare training and validation scores to detect overfitting or underfitting.
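Stratified 5-fold cross-validation with both metrics can be run through scikit-learn's `cross_validate`. Setting `return_train_score=True` exposes the train/validation gap used to spot overfitting. The model and synthetic data are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic, imbalanced stand-in data
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.75], random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = cross_validate(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y, cv=cv,
    scoring=["roc_auc", "accuracy"],
    return_train_score=True,
)

for metric in ["roc_auc", "accuracy"]:
    train = results[f"train_{metric}"].mean()
    valid = results[f"test_{metric}"].mean()
    # A large train-validation gap hints at overfitting; low scores on both hint at underfitting
    print(f"{metric}: train={train:.3f} valid={valid:.3f} gap={train - valid:.3f}")
```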

### 5. Model Evaluation

- Evaluate all models on the test data using:
    - AUC-ROC (main metric)
    - Accuracy, precision, recall, and F1-score
    - Confusion matrix
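The evaluation metrics listed above map directly onto `sklearn.metrics`. One design point worth noting: AUC-ROC is computed from predicted probabilities, while accuracy, precision, recall, F1, and the confusion matrix need hard labels, here obtained with the default 0.5 threshold (an arbitrary choice that could be tuned). Model and data are synthetic placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data
X, y = make_classification(n_samples=1000, weights=[0.75], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]  # scores used for AUC-ROC
pred = (proba >= 0.5).astype(int)          # hard labels at a 0.5 threshold

report = {
    "auc_roc": roc_auc_score(y_test, proba),
    "accuracy": accuracy_score(y_test, pred),
    "precision": precision_score(y_test, pred),
    "recall": recall_score(y_test, pred),
    "f1": f1_score(y_test, pred),
}
for metric, value in report.items():
    print(f"{metric}: {value:.3f}")

# Rows: true class (0, 1); columns: predicted class (0, 1)
print(confusion_matrix(y_test, pred))
```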



### 6. Final Model Selection & Tuning

- Choose the best-performing model based on AUC-ROC.

- Perform hyperparameter tuning using GridSearchCV or RandomizedSearchCV
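Hyperparameter tuning with `RandomizedSearchCV` can reuse AUC-ROC as the selection metric and the same stratified folds. The random forest, the parameter ranges, and `n_iter=5` below are illustrative assumptions chosen to keep the run short, not recommended values.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in data
X, y = make_classification(n_samples=1000, weights=[0.75], random_state=0)

# Illustrative search space: integer distributions and an explicit list can be mixed
param_distributions = {
    "n_estimators": randint(50, 200),
    "max_depth": [None, 5, 10, 20],
    "min_samples_leaf": randint(1, 10),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,                      # number of sampled parameter combinations
    scoring="roc_auc",             # same main metric as the rest of the plan
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
print(f"Best CV AUC-ROC: {search.best_score_:.3f}")
```

`GridSearchCV` accepts the same `scoring` and `cv` arguments but expects explicit lists of values and tries every combination, which becomes expensive quickly.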


### 7. Conclusion & Reporting

- Summarize model performance (metric scores).

- Justify final model selection.

- Discuss important features driving churn.

- Provide business insights, e.g., which customer groups are most at risk of churning.
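To discuss which features drive churn, permutation importance is one model-agnostic option (a swapped-in technique; the plan does not name a method). It measures how much test AUC-ROC drops when a single feature is shuffled. The feature names below are hypothetical placeholders for the real engineered columns, and the data is synthetic.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with 3 informative features out of 6
X, y = make_classification(n_samples=1000, n_features=6, n_informative=3, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical placeholder names
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Mean drop in test AUC-ROC over 10 shuffles per feature; a bigger drop means a more important feature
result = permutation_importance(model, X_test, y_test, scoring="roc_auc",
                                n_repeats=10, random_state=0)
importance = pd.Series(result.importances_mean, index=feature_names).sort_values(ascending=False)
print(importance)
```

With one-hot encoded inputs, importances are reported per encoded column, so summing them back to the original feature can make the business discussion easier.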



<div class="alert alert-block alert-success">
<b>Reviewer's comment</b> <a class="tocSkip"></a>
This looks like a good plan, nice job with sectioning it out! Build your project out to fill in these sections, best of luck!
</div>