# Telecom Customer Churn Prediction 

Churn is one of the biggest problems in the telecom industry. Research has shown that the average monthly churn rate among the top 4 wireless carriers in the US is 1.9% to 2%. This solution predicts customer churn from factors such as monthly charges, tenure, and subscribed services, using a trainable ML model.

## Contents

1. Prerequisites
2. Data Dictionary
3. Import Libraries
4. Load Input Data
5. Create Model
6. Predict Test Datapoints
7. Saving Prediction

## Prerequisites

To run this notebook, you need to have the following packages installed:

- `pandas` to read and save CSV files.
- `sklearn` (installed as `scikit-learn`) to train the model and generate predictions.
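
As a quick sanity check before running the notebook, the sketch below reports which of the required packages cannot be imported. The helper name `missing_packages` is hypothetical and introduced only for illustration; it relies on the standard-library `importlib.util.find_spec`.

```python
import importlib.util


def missing_packages(import_names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


# Import names used by this notebook; scikit-learn is imported as `sklearn`.
missing = missing_packages(["pandas", "sklearn"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

If anything is reported as missing, installing `pandas` and `scikit-learn` with your package manager of choice should resolve it.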

## Data Dictionary

- The input must be a `.csv` file with UTF-8 encoding.
- Please note: if your input `.csv` file is not UTF-8 encoded, the model will not perform as expected.
- Required Features: `SeniorCitizen`, `tenure`, `MonthlyCharges`, `TotalCharges`, `gender_Female`, `gender_Male`, `Partner_No`, `Partner_Yes`, `Dependents_No`, `Dependents_Yes`, `PhoneService_No`, `PhoneService_Yes`, `MultipleLines_No`, `MultipleLines_No phone service`, `MultipleLines_Yes`, `InternetService_DSL`, `InternetService_Fiber optic`, `InternetService_No`, `OnlineSecurity_No`, `OnlineSecurity_No internet service`, `OnlineSecurity_Yes`, `OnlineBackup_No`, `OnlineBackup_No internet service`, `OnlineBackup_Yes`, `DeviceProtection_No`, `DeviceProtection_No internet service`, `DeviceProtection_Yes`, `TechSupport_No`, `TechSupport_No internet service`, `TechSupport_Yes`, `StreamingTV_No`, `StreamingTV_No internet service`, `StreamingTV_Yes`, `StreamingMovies_No`, `StreamingMovies_No internet service`, `StreamingMovies_Yes`, `Contract_Month-to-month`, `Contract_One year`, `Contract_Two year`, `PaperlessBilling_No`, `PaperlessBilling_Yes`, `PaymentMethod_Bank transfer (automatic)`, `PaymentMethod_Credit card (automatic)`, `PaymentMethod_Electronic check`, `PaymentMethod_Mailed check`, `Churn`
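
The two input requirements above (UTF-8 encoding and the presence of the required columns) can be checked before loading the data. The following is a minimal sketch using only the standard library; the helper names `is_utf8` and `missing_columns` are hypothetical, and the demonstration file contains only a few made-up values rather than real data.

```python
import csv
import os
import tempfile


def is_utf8(csv_path):
    """Return True if the whole file decodes cleanly as UTF-8."""
    with open(csv_path, "rb") as handle:
        try:
            handle.read().decode("utf-8")
        except UnicodeDecodeError:
            return False
    return True


def missing_columns(csv_path, required_columns):
    """Read the header row as UTF-8 and return the required columns that are absent."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    return [col for col in required_columns if col not in header]


# Demonstration on a throwaway file that has only a few of the real columns.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.csv")
    with open(demo_path, "w", newline="", encoding="utf-8") as handle:
        handle.write("SeniorCitizen,tenure,MonthlyCharges\n0.0,0.5,0.25\n")
    demo_is_utf8 = is_utf8(demo_path)
    demo_missing = missing_columns(demo_path, ["tenure", "TotalCharges", "Churn"])

print(demo_is_utf8)   # True
print(demo_missing)   # ['TotalCharges', 'Churn']
```

For the real input, the same helpers could be called with the path to your own file and the full list of required features.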

## Import Libraries

In [30]:
import pandas as pd 
from sklearn.linear_model import LogisticRegression 

## Load Input Data

In [31]:
train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')
train_df.head()

| Unnamed: 0 | SeniorCitizen | tenure | MonthlyCharges | TotalCharges | gender_Female | gender_Male | Partner_No | Partner_Yes | Dependents_No | Dependents_Yes | ... | Contract_Month-to-month | Contract_One year | Contract_Two year | PaperlessBilling_No | PaperlessBilling_Yes | PaymentMethod_Bank transfer (automatic) | PaymentMethod_Credit card (automatic) | PaymentMethod_Electronic check | PaymentMethod_Mailed check | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.690141 | 0.909453 | 0.638397 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1 |
| 1 | 0.0 | 0.15493 | 0.802488 | 0.127181 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1 |
| 2 | 1.0 | 0.71831 | 0.878109 | 0.646556 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0 |
| 3 | 0.0 | 0.112676 | 0.016418 | 0.016824 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0 |
| 4 | 0.0 | 0.859155 | 0.824876 | 0.763853 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0 |


In [33]:
train_df.columns

Index(['SeniorCitizen', 'tenure', 'MonthlyCharges', 'TotalCharges',
       'gender_Female', 'gender_Male', 'Partner_No', 'Partner_Yes',
       'Dependents_No', 'Dependents_Yes', 'PhoneService_No',
       'PhoneService_Yes', 'MultipleLines_No',
       'MultipleLines_No phone service', 'MultipleLines_Yes',
       'InternetService_DSL', 'InternetService_Fiber optic',
       'InternetService_No', 'OnlineSecurity_No',
       'OnlineSecurity_No internet service', 'OnlineSecurity_Yes',
       'OnlineBackup_No', 'OnlineBackup_No internet service',
       'OnlineBackup_Yes', 'DeviceProtection_No',
       'DeviceProtection_No internet service', 'DeviceProtection_Yes',
       'TechSupport_No', 'TechSupport_No internet service', 'TechSupport_Yes',
       'StreamingTV_No', 'StreamingTV_No internet service', 'StreamingTV_Yes',
       'StreamingMovies_No', 'StreamingMovies_No internet service',
       'StreamingMovies_Yes', 'Contract_Month-to-month', 'Contract_One year',
       'Contract_Two year', '

## Create Model

In [35]:
X_train = train_df.drop(['Churn'], axis=1)
y_train = train_df['Churn']

In [36]:
# Running logistic regression model

model = LogisticRegression()
result = model.fit(X_train, y_train)
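
The notebook fits the model on the full training set and does not report a quality metric. One common way to get a rough estimate is to hold out part of the training data for validation. The sketch below illustrates the idea on synthetic data generated with `make_classification`, which stands in for `X_train` and `y_train`; the split ratio, `max_iter` value, and variable names are assumptions for illustration, not part of the original solution.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for X_train / y_train.
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out 25% of the rows; stratify keeps the class ratio similar in both parts.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0
)

# Fit on the larger part and score on the held-out part.
demo_model = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)
val_accuracy = accuracy_score(y_val, demo_model.predict(X_val))
print(f"validation accuracy: {val_accuracy:.3f}")
```

Because churn data is usually imbalanced, accuracy alone can be misleading; metrics such as precision and recall from `sklearn.metrics` may be worth checking as well.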

## Predict Test Datapoints 

In [37]:
predictions = model.predict(test_df)

In [38]:
test_df['churn_predictions'] = predictions
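
`model.predict` expects the test frame to contain the same feature columns as the training features. As a hedged sketch, the example below aligns the test columns to the training columns and also stores a churn probability via `predict_proba`. The toy frames, their values, and the `churn_probability` column name are made up for illustration and are not part of the original notebook.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Tiny synthetic stand-ins for the real train/test frames (values are made up).
toy_train = pd.DataFrame({
    "tenure": [0.1, 0.2, 0.8, 0.9, 0.15, 0.85],
    "MonthlyCharges": [0.9, 0.8, 0.2, 0.1, 0.95, 0.3],
    "Churn": [1, 1, 0, 0, 1, 0],
})
toy_test = pd.DataFrame({
    "MonthlyCharges": [0.85, 0.15],  # columns deliberately in a different order
    "tenure": [0.1, 0.9],
})

toy_X = toy_train.drop(columns=["Churn"])
toy_model = LogisticRegression().fit(toy_X, toy_train["Churn"])

# Reorder the test columns to match training; a missing column raises KeyError.
toy_features = toy_test[toy_X.columns]

# predict_proba returns one column per class, ordered as in model.classes_.
churn_col = list(toy_model.classes_).index(1)
toy_test["churn_probability"] = toy_model.predict_proba(toy_features)[:, churn_col]
print(toy_test)
```

Selecting the features into a separate frame before adding the result column keeps the prediction input free of non-feature columns.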

## Saving Prediction

In [39]:
# Write the test rows with their predictions; index=False omits the row index.
test_df.to_csv('output.csv', index=False)