# Project Goal
 
* Discover drivers of customer churn at Telco.
* Use those drivers to develop a machine learning model that accurately predicts churn.

## Imports

In [2]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
import os

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

import warnings
warnings.filterwarnings("ignore")

from scipy import stats

import env
import wrangle as w
import explore as e
import modeling as m

## Acquire

- Data acquired from Kaggle
- It contained 7,043 rows and 22 columns before cleaning
- Each row represents a telco customer
- Each column represents a feature of the customer

## Prepare
#### Prepare Actions:

- Removed redundant or unhelpful columns
- Checked for nulls in the data (there were none)
- Converted the total charges column to a numeric data type so its values can be used properly
- Used the 'churn' column as the target, indicating whether the customer has stopped doing business with Telco
- Split data into train, validate, and test subsets (approx. 60/25/15), stratified on the churn column
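
The type conversion step can be sketched as follows. This is a minimal illustration, not the code in `wrangle.py` (which is not shown here); the sample frame and its blank entry are hypothetical, and it assumes total charges arrive as text.

```python
import pandas as pd

# Hypothetical sample: total charges stored as text, with one blank entry
raw = pd.DataFrame({"total_charges": ["45.05", "420.2", " ", "4089.45"]})

# errors="coerce" turns values that cannot be parsed (such as " ") into NaN
raw["total_charges"] = pd.to_numeric(raw["total_charges"], errors="coerce")

print(raw["total_charges"].dtype)
print(raw["total_charges"].isna().sum())
```

Because coercion can introduce missing values, a null check is more informative after the conversion than before it.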

# Data Dictionary

| Feature | Definition |
|:--------|:-----------|
|Gender| The gender of the primary account holder|
|Senior Citizen| True or False, is the primary account holder 62 years of age or older|
|Partner| Yes or No, does the primary account holder have a partner in the household|
|Dependents| Yes or No, does the primary account holder have at least 1 dependent in the household|
|Tenure| The time in months the primary account holder has been a customer|
|Phone Service| Yes or No, does the primary account holder have phone service with Telco|
|Multiple Lines| Yes or No, does the primary account holder have multiple lines with Telco|
|Internet Service Type ID| **1** (DSL), **2** (Fiber Optic), **3** (None)|
|Tech Support| Yes or No, has the client contacted tech support|
|Contract Type ID|  **1** (Month-to-Month), **2** (One-Year), **3** (Two-Year)|
|Payment Type ID| **1** (Electronic Check), **2** (Mailed Check), **3** (Automatic Bank Transfer), **4** (Credit Card)|
|Monthly Charges| Monthly bill assessed to the customer|
|Total Charges| Total revenue paid by the customer|
|Churn| Yes or No, has the customer stopped doing business with us|
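
The ID columns above are easier to read in plots and tables once the codes are mapped to their labels. A minimal sketch, assuming the snake_case column names used in the data; the `add_labels` helper and the new label column names are hypothetical and not part of the project modules.

```python
import pandas as pd

# Code-to-label mappings copied from the data dictionary
INTERNET_SERVICE_TYPES = {1: "DSL", 2: "Fiber Optic", 3: "None"}
CONTRACT_TYPES = {1: "Month-to-Month", 2: "One-Year", 3: "Two-Year"}
PAYMENT_TYPES = {
    1: "Electronic Check",
    2: "Mailed Check",
    3: "Automatic Bank Transfer",
    4: "Credit Card",
}


def add_labels(df):
    """Return a copy of df with readable label columns (hypothetical names)."""
    out = df.copy()
    out["internet_service_type"] = out["internet_service_type_id"].map(INTERNET_SERVICE_TYPES)
    out["contract_type"] = out["contract_type_id"].map(CONTRACT_TYPES)
    out["payment_type"] = out["payment_type_id"].map(PAYMENT_TYPES)
    return out


# Two illustrative rows
sample = pd.DataFrame({
    "internet_service_type_id": [1, 2],
    "contract_type_id": [1, 3],
    "payment_type_id": [2, 4],
})
labeled = add_labels(sample)
print(labeled[["internet_service_type", "contract_type", "payment_type"]])
```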

In [3]:
# acquire & clean the data
df = w.wrangle_telco_data()

# split the data into train, validate, and test subsets
train, validate, test = w.split_my_data(df)
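
The body of `w.split_my_data` is not shown in this notebook. One way such a stratified three-way split could be written with `train_test_split` is sketched below; the function name `split_stratified`, the seed, the exact fractions, and the demo data are all assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_stratified(df, target="churn", seed=42):
    """Hypothetical stand-in for w.split_my_data: about 60/25/15, stratified on target."""
    n_test = round(0.15 * len(df))
    n_validate = round(0.25 * len(df))

    # Hold out the test set first, keeping the class balance of the target
    train_validate, test = train_test_split(
        df, test_size=n_test, random_state=seed, stratify=df[target]
    )
    # Split what remains into train and validate, again stratified
    train, validate = train_test_split(
        train_validate,
        test_size=n_validate,
        random_state=seed,
        stratify=train_validate[target],
    )
    return train, validate, test


# Synthetic demo data: 25% of rows are churners
demo = pd.DataFrame({"tenure": range(200), "churn": ["Yes"] * 50 + ["No"] * 150})
demo_train, demo_validate, demo_test = split_stratified(demo)
print(len(demo_train), len(demo_validate), len(demo_test))
```

Passing integer row counts rather than fractions to `test_size` avoids rounding surprises in the second split, and stratifying keeps the churn rate similar across the three subsets.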

### A brief look at our data:

In [4]:
train.head()

| |gender|senior_citizen|partner|dependents|tenure|phone_service|multiple_lines|internet_service_type_id|tech_support|contract_type_id|payment_type_id|monthly_charges|total_charges|churn|
|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|
|0|Male|0|No|No|1|Yes|No|1|No|1|2|45.05|45.05|Yes|
|1|Male|0|Yes|No|4|Yes|Yes|2|No|1|1|100.2|420.2|Yes|
|2|Male|0|No|No|19|Yes|No|2|No|1|4|73.85|1424.5|Yes|
|3|Male|0|Yes|No|51|Yes|No|1|Yes|3|4|83.25|4089.45|No|
|4|Male|0|No|No|59|Yes|Yes|1|No|3|4|54.15|3116.15|No|


### A summary of the charges:

In [8]:
train[['monthly_charges', 'total_charges']].describe().round(2)

| |monthly_charges|total_charges|
|:--|:--|:--|
|count|3943.0|3937.0|
|mean|64.88|2291.53|
|std|30.2|2279.27|
|min|18.55|18.8|
|25%|35.58|386.5|
|50%|70.35|1393.6|
|75%|89.9|3801.7|
|max|118.75|8684.8|
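
The count row differs between the two columns (3943.0 vs. 3937.0). Since `describe()` counts only non-null values, this suggests six rows in `train` have no `total_charges` value. A minimal sketch of how to check for that, using a hypothetical three-row sample rather than the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing total_charges value
sample = pd.DataFrame({
    "monthly_charges": [45.05, 100.2, 73.85],
    "total_charges": [45.05, np.nan, 1424.5],
})

# Number of missing values in each column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# The rows that would need to be dropped or imputed before modeling
rows_with_missing = sample[sample["total_charges"].isna()]
print(rows_with_missing)
```

Running the same two checks on `train` would show which customers are affected.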
