# SyriaTel Communications Project: Customer Churn (Binary Classification)

### Table of Contents

1. Introduction + Cleaning + EDA: Exploration of the data without advanced modeling techniques.
2. Modeling the Data
3. Question One
4. Question Two
5. Question Three
6. Conclusion: A concise summary of the findings.

### Strategic Approach

The stakeholder, SyriaTel, is a telecommunications company that wants to better understand which factors drive customer churn. Because the dataset is limited in size, the model is only robust enough for SyriaTel to analyze the specific data provided.

By exploring patterns of customer churn in the data and following the CRISP-DM framework, I will seek to answer the following questions:

1. Which features of the dataset are the primary determinants of customer churn, and to what extent?

2. How can these findings be interpreted, and how can SyriaTel implement cost-effective solutions?

3. What would the charge per minute be if every customer paid the average total charge as a flat fee?

### Importing Relevant Packages

In [2]:
import warnings
warnings.filterwarnings('ignore')  # suppress library warnings in the notebook output

# Wildcard import first, so the explicit imports below take precedence over any names it exports
from dtreeviz.trees import *

import itertools
import pickle

import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import pandas as pd
import plotly.express as px
import seaborn as sns
import shap
from networkx.drawing.nx_agraph import graphviz_layout
from numpy import loadtxt
from scipy.stats import randint
from sklearn import tree
from sklearn.datasets import make_blobs, make_moons
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.linear_model import (ElasticNet, Lasso, LinearRegression,
                                  LogisticRegression, Ridge)
from sklearn.metrics import (accuracy_score, auc, classification_report,
                             confusion_matrix, f1_score, mean_squared_error,
                             precision_score, recall_score, roc_auc_score, roc_curve)
from sklearn.model_selection import (GridSearchCV, RandomizedSearchCV,
                                     cross_val_score, train_test_split)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler, scale
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from statsmodels.stats.outliers_influence import variance_inflation_factor
from xgboost import XGBClassifier, plot_importance

%matplotlib inline



# Question Three

### What would the charge per minute be if every customer paid the average total charge as a flat fee?

In [6]:
df = pd.read_csv('data/Kaggle_Customer_Churn_Dataset.csv')
df.head()

| Unnamed: 0 | state | account length | area code | phone number | international plan | voice mail plan | number vmail messages | total day minutes | total day calls | total day charge | ... | total eve calls | total eve charge | total night minutes | total night calls | total night charge | total intl minutes | total intl calls | total intl charge | customer service calls | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KS | 128 | 415 | 382-4657 | no | yes | 25 | 265.1 | 110 | 45.07 | ... | 99 | 16.78 | 244.7 | 91 | 11.01 | 10.0 | 3 | 2.7 | 1 | False |
| 1 | OH | 107 | 415 | 371-7191 | no | yes | 26 | 161.6 | 123 | 27.47 | ... | 103 | 16.62 | 254.4 | 103 | 11.45 | 13.7 | 3 | 3.7 | 1 | False |
| 2 | NJ | 137 | 415 | 358-1921 | no | no | 0 | 243.4 | 114 | 41.38 | ... | 110 | 10.3 | 162.6 | 104 | 7.32 | 12.2 | 5 | 3.29 | 0 | False |
| 3 | OH | 84 | 408 | 375-9999 | yes | no | 0 | 299.4 | 71 | 50.9 | ... | 88 | 5.26 | 196.9 | 89 | 8.86 | 6.6 | 7 | 1.78 | 2 | False |
| 4 | OK | 75 | 415 | 330-6626 | yes | no | 0 | 166.7 | 113 | 28.34 | ... | 122 | 12.61 | 186.9 | 121 | 8.41 | 10.1 | 3 | 2.73 | 3 | False |


In [9]:
# Total minutes across the day, evening, night, and international periods
df['total_mins'] = df['total day minutes'] + df['total night minutes'] + df['total eve minutes'] + df['total intl minutes']

### Average Total Minutes Per Customer: About 592 (Nearly 10 Hours) Per Month

In [10]:
df.total_mins.describe()

count    3333.000000
mean      591.864776
std        89.954251
min       284.300000
25%       531.500000
50%       593.600000
75%       652.400000
max       885.000000
Name: total_mins, dtype: float64

### Average Charge Per Customer

In [12]:
# Total charge across the day, evening, night, and international periods
df['total_charge'] = df['total day charge'] + df['total night charge'] + df['total eve charge'] + df['total intl charge']

In [13]:
df.total_charge.describe()

count    3333.000000
mean       59.449754
std        10.502261
min        22.930000
25%        52.380000
50%        59.470000
75%        66.480000
max        96.150000
Name: total_charge, dtype: float64
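Both totals follow the same pattern: a row-wise sum over the day, evening, night, and international columns. A minimal sketch of a hypothetical helper (`add_period_totals` and `PERIODS` are illustrative names, not from the original notebook) that builds both columns from the column-name pattern, assuming the dataset's `total <period> minutes` / `total <period> charge` naming:

```python
import pandas as pd

# The four call periods as they appear in the dataset's column names
PERIODS = ["day", "eve", "night", "intl"]


def add_period_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with total_mins and total_charge.

    Assumes columns named 'total <period> minutes' and 'total <period> charge'
    for every period in PERIODS.
    """
    out = df.copy()
    minute_cols = [f"total {p} minutes" for p in PERIODS]
    charge_cols = [f"total {p} charge" for p in PERIODS]
    # Row-wise sums across the four periods
    out["total_mins"] = out[minute_cols].sum(axis=1)
    out["total_charge"] = out[charge_cols].sum(axis=1)
    return out


# Toy rows (made-up numbers, not from the dataset) using the same column names
toy = pd.DataFrame({
    "total day minutes": [100.0, 200.0],
    "total eve minutes": [50.0, 60.0],
    "total night minutes": [30.0, 40.0],
    "total intl minutes": [10.0, 0.0],
    "total day charge": [17.0, 34.0],
    "total eve charge": [4.25, 5.1],
    "total night charge": [1.35, 1.8],
    "total intl charge": [2.7, 0.0],
})
toy = add_period_totals(toy)
print(toy[["total_mins", "total_charge"]])
```

One design difference to be aware of: `DataFrame.sum(axis=1)` skips missing values by default (`skipna=True`), whereas chaining `+` returns NaN if any period is missing. The count of 3333 in both `describe()` outputs suggests there are no missing values here, so the two approaches should agree on this dataset.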

### At a Flat Fee of 59.45, the Average Charge Per Minute Is About 10 Cents

This rate sounds competitive and marketable, but further investigation is needed to determine whether it would be a competitive offer in the market.
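The per-minute rate here is the average total charge divided by the average total minutes. Writing $C_i$ and $M_i$ for customer $i$'s total charge and total minutes across $N = 3333$ customers (notation introduced here for illustration), the $1/N$ factors cancel, so the flat-fee rate equals total revenue divided by total minutes:

$$
r = \frac{\bar{C}}{\bar{M}}
  = \frac{\frac{1}{N}\sum_{i=1}^{N} C_i}{\frac{1}{N}\sum_{i=1}^{N} M_i}
  = \frac{\sum_{i=1}^{N} C_i}{\sum_{i=1}^{N} M_i}
  \approx \frac{59.45}{591.86}
  \approx 0.1004
$$

This is an aggregate rate; an individual customer's effective rate under the same flat fee, $\bar{C}/M_i$, depends on how many minutes that customer uses.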

In [14]:
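# Mean total charge divided by mean total minutes (rounded means from the describe() outputs above)
59.45 / 591.86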

0.10044605143108168
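The 10-cent figure is an aggregate rate. As a rough sketch of how a flat fee would land on individual customers, the hypothetical helper below (`flat_fee_rates` is not part of the original notebook) sets the fee to the mean total charge and returns both the aggregate rate and each customer's effective rate, `fee / total_mins`. It assumes the `total_charge` and `total_mins` columns created above.

```python
import pandas as pd


def flat_fee_rates(total_charge: pd.Series, total_mins: pd.Series):
    """Hypothetical helper: per-minute rates under a flat fee equal to the mean charge.

    Returns (fee, aggregate_rate, per_customer_rate).
    """
    fee = total_charge.mean()
    # Aggregate rate: mean charge / mean minutes (equivalently, total revenue / total minutes)
    aggregate_rate = fee / total_mins.mean()
    # Effective rate each customer would pay per minute if everyone paid the same fee
    per_customer_rate = fee / total_mins
    return fee, aggregate_rate, per_customer_rate


# Toy example (made-up numbers, not from the dataset)
charges = pd.Series([20.0, 40.0, 60.0])
minutes = pd.Series([200.0, 400.0, 600.0])
fee, aggregate_rate, per_customer_rate = flat_fee_rates(charges, minutes)
print(fee, aggregate_rate)                  # 40.0 0.1
print(per_customer_rate.round(3).tolist())  # [0.2, 0.1, 0.067]
```

On the notebook's data, `flat_fee_rates(df.total_charge, df.total_mins)` should reproduce the roughly 0.10 aggregate rate, and `per_customer_rate.describe()` would show the spread. Because 1/M is convex, the mean of the per-customer rates is always at least the aggregate rate (Jensen's inequality): light users effectively pay more per minute under a flat fee, which is worth weighing when judging whether the offer is competitive.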

# Conclusion

SyriaTel may be able to reduce its customer churn if it can find a way to make a subscription model profitable.

### Why Does SyriaTel Need a Subscription Model?

SyriaTel may be able to significantly improve its churn rate by implementing a flat monthly fee subscription model for its customers. Many customers prefer subscription models as a way to automate their lives, and subscription models benefit the companies that use them as well:
 - Customers are not surprised or upset by their bills.
 - Companies know their estimated subscription revenue ahead of time.
 - Companies can provide better solutions for the unique needs of different customer segments.
 - Companies can build penalties into particular plans and subscriptions.
 - When customers do incur penalties, they are more likely to take responsibility, since they agreed to those penalties in advance.

# Future Work

Further investigation into the probability of churn by state, the reasons behind customer service calls, and the company's profits and costs would shed more light on potential solutions to SyriaTel's churn problem.
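As a starting point for the first item, here is a minimal sketch of a hypothetical helper (`churn_rate_by` is not from the original notebook) that computes the churn rate and customer count per group with pandas `groupby`, assuming the dataset's `state` column and boolean `churn` column:

```python
import pandas as pd


def churn_rate_by(df: pd.DataFrame, by: str = "state", min_customers: int = 1) -> pd.DataFrame:
    """Hypothetical helper: churn rate and customer count per group, highest churn first."""
    churned = df["churn"].astype(float)  # True/False -> 1.0/0.0, so the mean is the share churned
    summary = (
        churned.groupby(df[by])
        .agg(churn_rate="mean", customers="count")
        .sort_values("churn_rate", ascending=False)
    )
    # Groups with few customers give noisy rates; drop those below the threshold
    return summary[summary["customers"] >= min_customers]


# Toy rows (made up, not from the dataset)
toy = pd.DataFrame({
    "state": ["KS", "KS", "OH", "OH", "OH", "NJ"],
    "churn": [True, False, False, False, True, True],
})
print(churn_rate_by(toy))
print(churn_rate_by(toy, min_customers=2))
```

On the notebook's `df`, a call such as `churn_rate_by(df, min_customers=30)` would rank states by churn while ignoring sparsely represented ones; the threshold of 30 is an arbitrary illustration. Grouping by `"customer service calls"` works the same way, although it shows how often customers called, not why.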