# Telco Churn Project

## Goals:
- Discover the main drivers of churn
- Build an ML model that predicts churn with at least 80% accuracy

In [6]:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import acquire
from env import get_connection
from prepare import train_val_test
import warnings
warnings.filterwarnings("ignore")
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.ensemble import RandomForestClassifier

## Acquire

- Data acquired from Telco database
- Each row represents a customer
- Each column holds information about the customer or the services included in their account

## Prepare

### Prepare Actions:

- Removed columns that did not contain useful information
- Removed null values
- Checked that column data types were appropriate
- Split data into train, validate and test (approx. 70/15/15), stratifying on 'churn'
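
The split itself is done by `train_val_test`, imported from a local `prepare` module whose source is not shown in this notebook. As a rough sketch of what a stratified 70/15/15 split could look like, assuming scikit-learn's `train_test_split` is used under the hood (the function name `split_70_15_15`, the seed, and the toy data below are hypothetical and only for illustration):

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_70_15_15(df, target, seed=42):
    """Hypothetical stand-in for prepare.train_val_test (assumed behavior)."""
    # First cut: 70% train, 30% held out, preserving the target's class ratio.
    train, holdout = train_test_split(
        df, train_size=0.70, stratify=df[target], random_state=seed
    )
    # Second cut: halve the holdout to get roughly 15% validate and 15% test.
    validate, test = train_test_split(
        holdout, train_size=0.50, stratify=holdout[target], random_state=seed
    )
    return train, validate, test


# Toy data for illustration only (not the Telco data): 200 rows, 25% churn.
toy = pd.DataFrame({
    "tenure": range(200),
    "churn": ["Yes"] * 50 + ["No"] * 150,
})
toy = toy.dropna()  # mirrors the "removed null values" step above
train, validate, test = split_70_15_15(toy, "churn")
print(len(train), len(validate), len(test))
```

Stratifying on the target keeps the churn rate about the same in all three subsets, so validation and test scores are measured against the same class balance the model was trained on.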


## Data Dictionary 

| Feature | Description |
| :--- | :--- |
| churn | Whether the customer canceled their contract or subscription with the company (target variable) |
| contract_type | The type of contract that the customer has with Telco |
| payment_type | The form in which the customer pays their monthly bill |
| dependents | Whether or not the customer has a dependent on their account |
| monthly_charges | How much a customer pays per month |
| tenure | How long a customer has been with the company |
| total_charges | How much a customer has paid over their entire tenure |
| payment_type_id | Numeric ID for each payment type, used for statistical testing |
| contract_type_id | Numeric ID for each contract type, used for statistical testing |

In [10]:
# acquiring and cleaning data
telco = acquire.get_telco_data(get_connection)
telco = telco.drop(columns=['Unnamed: 0', 'gender', 'senior_citizen', 'partner', 'phone_service', 'tech_support',
                            'streaming_tv', 'streaming_movies', 'paperless_billing', 'internet_service_type',
                            'online_security', 'online_backup', 'device_protection', 'internet_service_type_id',
                            'customer_id', 'contract_type', 'payment_type'])

# splitting data into train, validate, and test
train, validate, test = train_val_test(telco, 'churn')


In [11]:
train.head()

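|  | payment_type_id | contract_type_id | dependents | tenure | multiple_lines | monthly_charges | total_charges | churn |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 5609 | 1 | 1 | No | 14 | No | 76.45 | 1117.55 | No |
| 2209 | 2 | 2 | No | 5 | No | 70.0 | 347.4 | Yes |
| 6919 | 1 | 1 | No | 35 | Yes | 75.2 | 2576.2 | Yes |
| 2284 | 1 | 3 | No | 58 | Yes | 86.1 | 4890.5 | No |
| 845 | 2 | 1 | No | 2 | No | 49.6 | 114.7 | Yes |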
