### Introduction
This notebook supports focused customer retention programs: it uses the Telco customer churn dataset to build a classifier that predicts customer churn, so that at-risk customers can be targeted before they leave. The dataset is available on Kaggle [here](https://www.kaggle.com/blastchar/telco-customer-churn). The goal is to predict whether a customer will churn (i.e. leave) given a set of predictors.

In [1]:
import pandas as pd
import numpy as np


In [2]:
# Load the data
data = pd.read_csv('churn.csv')
data.head()

| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |


#### Data cleanup keeping relevant features.

In [3]:
# Remove columns
data.drop(['customerID', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'StreamingMovies', 'PaperlessBilling', 'PaymentMethod'], axis = 1, inplace = True)

# Create dummy variables
data = pd.get_dummies(data = data, columns = ['gender', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines', 'InternetService', 'TechSupport', 'StreamingTV', 'Contract'], drop_first = True)
data.head()

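| | SeniorCitizen | tenure | MonthlyCharges | TotalCharges | Churn | gender_Male | Partner_Yes | Dependents_Yes | PhoneService_Yes | MultipleLines_No phone service | MultipleLines_Yes | InternetService_Fiber optic | InternetService_No | TechSupport_No internet service | TechSupport_Yes | StreamingTV_No internet service | StreamingTV_Yes | Contract_One year | Contract_Two year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 29.85 | 29.85 | No | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 34 | 56.95 | 1889.5 | No | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 0 | 2 | 53.85 | 108.15 | Yes | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 45 | 42.3 | 1840.75 | No | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 4 | 0 | 2 | 70.7 | 151.65 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |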
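To make the effect of `drop_first = True` concrete, here is a minimal sketch on a toy frame. The three rows are made up for illustration and are not taken from the dataset. With three contract types, only two indicator columns are kept; the dropped level (`Month-to-month`) becomes the reference category, encoded as zeros in both columns. The `dtype=int` argument is an addition here: recent pandas versions return boolean indicator columns by default, while the output above shows 0/1 integers.

```python
import pandas as pd

# Toy data (hypothetical rows, for illustration only)
toy = pd.DataFrame({
    "tenure": [1, 34, 2],
    "Contract": ["Month-to-month", "One year", "Two year"],
})

# drop_first=True removes the first level (alphabetically) of each encoded column,
# which avoids perfectly collinear indicator columns.
encoded = pd.get_dummies(toy, columns=["Contract"], drop_first=True, dtype=int)
print(encoded)
```

A row with zeros in both `Contract_One year` and `Contract_Two year` therefore corresponds to a month-to-month contract.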


In [4]:
X = data.drop('Churn', axis = 1)
y = data.Churn

Train an SVM model with a linear kernel and investigate the classification error on the test set.
1. First split the data 80%-20% into training and test sets.
2. Then apply appropriate preprocessing steps and tune the model.

In [5]:
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline

In [6]:
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state = 862)

In [7]:
# Classifier
from sklearn.svm import SVC 

linear_svm_clf = Pipeline([ ('scaler', StandardScaler()),
                           ('svm_clf', SVC(kernel="linear", random_state = 862)) ])

parameters = {'svm_clf__C' : [ 1,10,12,15,20,100]}

linear_clf = GridSearchCV(linear_svm_clf, parameters, cv=5, n_jobs=-1)
linear_clf.fit(X_train, y_train)
print(linear_clf.best_params_)
print(np.mean(linear_clf.predict(X_train) != y_train)) # Training error (misclassification rate)
print(np.mean(linear_clf.predict(X_test) != y_test)) # Test error (misclassification rate)

{'svm_clf__C': 12}
0.21137777777777778
0.22103766879886283


**Result: The classification error on the test set is 0.2210 (best C = 12).**
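The two numbers printed above are misclassification rates, not accuracies: `np.mean(pred != y)` is the fraction of wrong predictions, and accuracy is one minus that value. A minimal sketch with made-up labels (not from the dataset) shows the relationship:

```python
import numpy as np

# Hypothetical labels and predictions, for illustration only
y_true = np.array(["No", "No", "Yes", "No", "Yes"])
y_pred = np.array(["No", "Yes", "Yes", "No", "No"])

error_rate = np.mean(y_pred != y_true)  # fraction of mismatches
accuracy = np.mean(y_pred == y_true)    # fraction of matches

print(error_rate, accuracy)
```

Two of the five predictions differ from the true labels, so the error rate is 0.4 and the accuracy is 0.6.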


**Try polynomial and RBF kernels and check whether they improve the test error. Set a seed for reproducible results.**
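For reference, the sketch below writes out the three kernel functions in plain NumPy, following the kernel definitions in the scikit-learn documentation. The helper names are illustrative and not part of any library, and `gamma` is fixed at 1.0 for simplicity, whereas `SVC` chooses it from the data by default. The arguments `degree`, `coef0`, and `gamma` correspond to the grid parameters tuned in the cells that follow.

```python
import numpy as np

def linear_kernel(x, z):
    # K(x, z) = x . z
    return float(np.dot(x, z))

def poly_kernel(x, z, degree=3, gamma=1.0, coef0=0.0):
    # K(x, z) = (gamma * x . z + coef0) ** degree
    return float((gamma * np.dot(x, z) + coef0) ** degree)

def rbf_kernel(x, z, gamma=1.0):
    # K(x, z) = exp(-gamma * ||x - z||^2)
    return float(np.exp(-gamma * np.sum((x - z) ** 2)))

# Two arbitrary example points
x = np.array([1.0, 2.0])
z = np.array([0.5, -1.0])

print(linear_kernel(x, z))
print(poly_kernel(x, z, degree=3, coef0=2.0))
print(rbf_kernel(x, z, gamma=0.1))
```

With `degree=1` and `coef0=0`, the polynomial kernel reduces to a scaled linear kernel, so a polynomial grid that includes those values also covers an essentially linear model.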

In [8]:
# Polynomial
poly_svm_clf = Pipeline([
    ('scaler', StandardScaler()),
    ('svm_clf', SVC(kernel="poly", random_state = 862)) ])

parameters = {'svm_clf__C' : [ 0.1, 0.2,0.5,1,5,10],
             'svm_clf__degree': [1,2,3,4],
             'svm_clf__coef0': [0,1,2,3]}
poly_clf = GridSearchCV(poly_svm_clf, parameters, cv=5, n_jobs=-1)
poly_clf.fit(X_train, y_train)

print(poly_clf.best_params_)
print(np.mean(poly_clf.predict(X_train) != y_train))
print(np.mean(poly_clf.predict(X_test) != y_test))

{'svm_clf__C': 0.2, 'svm_clf__coef0': 2, 'svm_clf__degree': 3}
0.19786666666666666
0.21108742004264391


In [9]:
# RBF
rbf_svm_clf = Pipeline([
    ('scaler', StandardScaler()),
    ('svm_clf', SVC(kernel="rbf", random_state = 862))
])
parameters = {'svm_clf__C' : [ 0.1,0.5,1,5,10],
             'svm_clf__gamma': [0.1,0.5,1,5]}
rbf_clf = GridSearchCV(rbf_svm_clf, parameters, cv=5, n_jobs=-1)
rbf_clf.fit(X_train, y_train)

print(rbf_clf.best_params_)
print(np.mean(rbf_clf.predict(X_train) != y_train))
print(np.mean(rbf_clf.predict(X_test) != y_test))

{'svm_clf__C': 0.5, 'svm_clf__gamma': 0.1}
0.19644444444444445
0.22316986496090974


**Best result:** 

1. The polynomial kernel with degree 3 gives the lowest test error (0.2111), the best of the three.
2. The RBF kernel performs worst: it has the lowest training error (0.1964) but the highest test error (0.2232), which points to overfitting (high variance).
3. The linear kernel's test error (0.2210) is slightly lower than the RBF kernel's (0.2232).
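One rough way to check the overfitting reading is to compare the gap between test and training error for each kernel. The sketch below uses the errors printed above, rounded to four decimals; the dictionary layout is just an illustration.

```python
# (training error, test error), rounded from the outputs printed above
results = {
    "linear": (0.2114, 0.2210),
    "poly": (0.1979, 0.2111),
    "rbf": (0.1964, 0.2232),
}

# Generalization gap: how much worse the model does on unseen data
gaps = {name: round(test - train, 4) for name, (train, test) in results.items()}

for name, gap in gaps.items():
    print(f"{name}: gap = {gap}")

largest_gap = max(gaps, key=gaps.get)
lowest_test = min(results, key=lambda name: results[name][1])
print(largest_gap, lowest_test)
```

The RBF kernel shows the largest gap, consistent with it fitting the training data more closely than it generalizes, while the polynomial kernel has the lowest test error.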

#### Baseline Model 
Fit a regularized logistic regression on the data set to see if it can achieve a better test error.

In [10]:
# Logistic regression
from sklearn.linear_model import LogisticRegression

# Set up grid search for regularized logistic regression
estimator = Pipeline( [('scale', StandardScaler()),
                      ('clf', LogisticRegression(penalty = 'l1', solver = 'liblinear', random_state = 1000))] )
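# Note: np.linspace returns floats; recent scikit-learn versions may reject a
# non-integer max_iter, in which case pass dtype=int to the second np.linspace call.
grid = {'clf__C': np.linspace(0.01,50,100),
       'clf__max_iter': np.linspace(1000,10000,10)}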

clf = GridSearchCV(estimator, grid, cv = 5, scoring = 'accuracy', n_jobs = -1)
clf.fit(X_train, y_train)

print(clf.best_params_)
print(np.mean(clf.predict(X_train) != y_train))
print(np.mean(clf.predict(X_test) != y_test))

{'clf__C': 1.0198989898989899, 'clf__max_iter': 1000.0}
0.19893333333333332
0.2125088841506752


**Observation:**  
Overall, the polynomial kernel with degree 3 gives the lowest test error, 0.211 (an accuracy of 0.789), and is the best of the models tried. However, it is only marginally better than the regularized logistic regression, whose test error is 0.2125, a difference of roughly 0.1 percentage points. This could mean that the extra flexibility of the polynomial kernel adds little on this dataset, and that a regularized linear decision boundary already captures most of the available signal.
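To get a feel for how small this difference is, one can compare it with the sampling uncertainty of a single test error estimate. The sketch below uses the binomial standard error of a proportion. The test set size `n_test` is an assumption (about 20% of a dataset with roughly 7,000 rows), not a value printed in this notebook. The comparison also ignores that both models are scored on the same test rows, so treat it as a rough yardstick rather than a formal test.

```python
import math

p_poly = 0.2111   # test error, polynomial kernel (rounded from the output above)
p_logit = 0.2125  # test error, logistic regression (rounded from the output above)
n_test = 1400     # ASSUMPTION: approximate number of test rows

# Standard error of an error rate estimated from n_test independent rows
se = math.sqrt(p_poly * (1 - p_poly) / n_test)
diff = p_logit - p_poly

print(f"difference = {diff:.4f}, standard error = {se:.4f}")
```

Under this assumption the difference between the two models is well below one standard error, which supports reading the two results as practically equivalent.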