# Final Project - Churn Rate Analysis

The telecom operator Interconnect would like to forecast client churn. If a user is found likely to leave, they will be offered promotional codes and special plan options. Interconnect's marketing team has collected some of their clientele's personal data, including information about their plans and contracts.

**Interconnect mainly provides two types of services:**

1. Landline communication. The telephone can be connected to several lines simultaneously.
2. Internet. The network can be set up via a telephone line (DSL, digital subscriber line) or through a fiber optic cable.

**Some other services the company provides include:**

- Internet security: antivirus software (DeviceProtection) and a malicious website blocker (OnlineSecurity)
- A dedicated technical support line (TechSupport)
- Cloud file storage and data backup (OnlineBackup)
- TV streaming (StreamingTV) and a movie directory (StreamingMovies)

Clients can either pay monthly or sign a 1- or 2-year contract. They can use various payment methods and receive an electronic invoice after a transaction.

## Data Description
The data consists of files obtained from different sources:

- contract.csv — contract information
- personal.csv — the client's personal data
- internet.csv — information about Internet services
- phone.csv — information about telephone services

In each file, the column customerID contains a unique code assigned to each client.

The contract information is valid as of February 1, 2020.
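One way to turn the snapshot date into a target and a duration feature is sketched below. It assumes that an `EndDate` of `No` (as seen in the contract preview further down) marks a client who is still active. The names `contract_demo`, `SNAPSHOT_DATE`, `churn`, and `tenure_days` are hypothetical and used only for illustration; the rows are a toy sample copied from the preview, not the full dataset.

```python
import pandas as pd

# Toy rows modeled on the contract preview; not the real dataset.
contract_demo = pd.DataFrame({
    "customerID": ["7590-VHVEG", "3668-QPYBK", "9237-HQITU"],
    "BeginDate": ["2020-01-01", "2019-10-01", "2019-09-01"],
    "EndDate": ["No", "2019-12-01 00:00:00", "2019-11-01 00:00:00"],
})

# Date for which the contract information is valid.
SNAPSHOT_DATE = pd.Timestamp("2020-02-01")

# Assumption: EndDate == "No" means the client has not left.
has_left = contract_demo["EndDate"] != "No"
contract_demo["churn"] = has_left.astype(int)

# Parse the dates; active clients get the snapshot date as their end point.
begin = pd.to_datetime(contract_demo["BeginDate"], format="%Y-%m-%d")
end = pd.to_datetime(
    contract_demo["EndDate"].where(has_left),  # "No" becomes NaN, then NaT
    format="%Y-%m-%d %H:%M:%S",
).fillna(SNAPSHOT_DATE)

# Hypothetical feature: contract length in days up to churn or the snapshot.
contract_demo["tenure_days"] = (end - begin).dt.days

print(contract_demo[["customerID", "churn", "tenure_days"]])
```

Measuring duration against the snapshot date instead of today's date keeps the feature reproducible regardless of when the notebook is run.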

## Project Goals
This project has three main objectives:

- Analyze the provided data to identify factors that can influence the churn rate of customers.
- Develop machine learning models that can accurately predict the likelihood of customers leaving Interconnect services.
- Generate relevant promotional recommendations to mitigate customer churn and retain their loyalty to the service.

## Clarifying Questions

1. Are there any missing values or inconsistencies in the dataset, and if so, how should they be handled?
2. Are there any plans to update the dataset in the future?
3. How will the effectiveness of the special plan options be measured?

## Rough Plan for Solving the Task

Step 1: Preprocessing 

Clean and preprocess the data by handling missing values, outliers, and inconsistencies.

Step 2: Exploratory Data Analysis

Explore the dataset to understand the distribution of variables, identify correlations, and determine the relevance of each feature to the target variable.

Step 3: Model Development, Evaluation and Tuning

Develop and train several machine learning models, such as logistic regression, decision trees, and random forests, to predict churn. Perform hyperparameter tuning to optimize the performance of the selected models.
Evaluate the performance of each model using AUC-ROC and accuracy metrics.
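Step 3 of the plan could look roughly like the sketch below. It runs on synthetic data from `make_classification` because the real features are not prepared at this point; the class balance, the hyperparameter grids, and the names `candidates` and `results` are illustrative assumptions, not choices made in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared features (arbitrary class imbalance).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    weights=[0.75, 0.25], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Each candidate pairs an estimator with a small, illustrative grid.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
    ),
    "decision_tree": (
        DecisionTreeClassifier(random_state=42),
        {"max_depth": [3, 5, 8]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=42),
        {"n_estimators": [50, 100], "max_depth": [5, 10]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # Tune with cross-validated AUC-ROC on the training split only.
    search = GridSearchCV(model, grid, scoring="roc_auc", cv=3)
    search.fit(X_train, y_train)

    # Score the refitted best estimator on the held-out test split.
    proba = search.predict_proba(X_test)[:, 1]
    pred = search.predict(X_test)
    results[name] = {
        "auc_roc": roc_auc_score(y_test, proba),
        "accuracy": accuracy_score(y_test, pred),
        "best_params": search.best_params_,
    }
    print(f"{name}: AUC-ROC={results[name]['auc_roc']:.3f}, "
          f"accuracy={results[name]['accuracy']:.3f}")
```

AUC-ROC is computed from predicted probabilities and accuracy from hard class labels, so the two metrics can rank the models differently, especially when the classes are imbalanced.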

## Data Pre-processing

### Import Libraries and Load Data

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

We have four datasets; each is stored in its own variable.

In [2]:
try:
    contract = pd.read_csv('./final_provider/contract.csv')
    internet = pd.read_csv('./final_provider/internet.csv')
    personal = pd.read_csv('./final_provider/personal.csv')
    phone = pd.read_csv('./final_provider/phone.csv')
except FileNotFoundError:  # local files missing, fall back to the platform path
    contract = pd.read_csv('/datasets/final_provider/contract.csv')
    internet = pd.read_csv('/datasets/final_provider/internet.csv')
    personal = pd.read_csv('/datasets/final_provider/personal.csv')
    phone = pd.read_csv('/datasets/final_provider/phone.csv')

### Checking Datasets

In [None]:
datasets = [contract, internet, personal, phone]
names = ['Contract', 'Internet', 'Personal', 'Phone']
for df, name in zip(datasets, names):
    print(name, 'dataset head:')
    print(df.head(10))
    print('_' * 100)
    print(name, 'dataset info:')
    df.info()  # info() prints its report itself and returns None
    print('_' * 100)
    print()



Contract dataset head:
   customerID   BeginDate              EndDate            Type  \
0  7590-VHVEG  2020-01-01                   No  Month-to-month   
1  5575-GNVDE  2017-04-01                   No        One year   
2  3668-QPYBK  2019-10-01  2019-12-01 00:00:00  Month-to-month   
3  7795-CFOCW  2016-05-01                   No        One year   
4  9237-HQITU  2019-09-01  2019-11-01 00:00:00  Month-to-month   
5  9305-CDSKC  2019-03-01  2019-11-01 00:00:00  Month-to-month   
6  1452-KIOVK  2018-04-01                   No  Month-to-month   
7  6713-OKOMC  2019-04-01                   No  Month-to-month   
8  7892-POOKP  2017-07-01  2019-11-01 00:00:00  Month-to-month   
9  6388-TABGU  2014-12-01                   No        One year   

  PaperlessBilling              PaymentMethod  MonthlyCharges TotalCharges  
0              Yes           Electronic check           29.85        29.85  
1               No               Mailed check           56.95       1889.5  
2              Yes 

### Dataset shapes

In [23]:
for df,name in zip(datasets, names):
    print(f"{name} shape: {df.shape}")

Contract shape: (7043, 8)
Internet shape: (5517, 8)
Personal shape: (7043, 5)
Phone shape: (6361, 2)
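Contract and Personal both have 7043 rows, while Internet and Phone have fewer, which suggests that some clients do not appear in every service table (an assumption; it is not confirmed at this point in the notebook). A minimal sketch of combining such tables on `customerID` without losing clients follows. The toy frames, the `InternetService` and `MultipleLines` column names, and the `"Not subscribed"` label are hypothetical.

```python
import pandas as pd

# Toy tables; IDs and values are illustrative, not taken from the files.
contract_demo = pd.DataFrame({
    "customerID": ["A", "B", "C"],
    "Type": ["Month-to-month", "One year", "Month-to-month"],
})
internet_demo = pd.DataFrame({
    "customerID": ["A", "C"],
    "InternetService": ["DSL", "Fiber optic"],
})
phone_demo = pd.DataFrame({
    "customerID": ["A", "B"],
    "MultipleLines": ["No", "Yes"],
})

# Left-merge so that every client in the contract table is kept.
# validate="one_to_one" raises if customerID is duplicated on either side.
merged = contract_demo
for extra in (internet_demo, phone_demo):
    merged = merged.merge(
        extra, on="customerID", how="left", validate="one_to_one"
    )

# Clients absent from a service table end up with NaN; label them explicitly.
service_cols = ["InternetService", "MultipleLines"]
merged[service_cols] = merged[service_cols].fillna("Not subscribed")

print(merged)
```

A left join keeps the row count of the contract table, so comparing `merged.shape[0]` with the original count is a quick check that no clients were dropped or duplicated.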
