# Evaluating Customer Churn for a Telecommunications Company

This machine learning project predicts customer churn for a telecommunications company. It is a Kaggle project, available at: https://www.kaggle.com/blastchar/telco-customer-churn/home.

In [None]:
import pandas as pd
import numpy as np
from numpy import around
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
#  Metrics
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score

# Models
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in newer releases
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier

### Import data and convert churn yes/no values into binary

In [None]:
# Import Data
df = pd.read_csv("../input/telecommunications-churn/train-scaled.csv")
# Change Churn from string (yes/no) to binary.
y = pd.Series(np.where(df.Churn.values == 'Yes', 1, 0),
          df.index)
# Gather the ID
customer_id = df["customerID"]
# Training data
X = df.drop(["Unnamed: 0", "customerID", "Churn"], axis=1)

In [None]:
X.shape

### Feature selection and PCA

We have 7043 records and 61 features. Although 61 features may not seem like a lot, running k-fold cross-validation with the SVM algorithm on all of them takes a considerable amount of time.

Our goal is to reduce the feature set while retaining the most important information for the model.

The feature selection algorithm we will use is MRMR (minimum redundancy, maximum relevance). MRMR is a filter-based feature selection algorithm that tries to select the features most relevant to the target class labels while simultaneously minimizing the redundancy among the selected features. Thanks to [Hanchuan Peng](http://home.penglab.com/proj/mRMR/) for creating an [implementation](https://github.com/fbrundu/pymrmr) of MRMR for Python.

Note: We will be using the Mutual Information Difference (MID) feature evaluation method inside MRMR. Thanks to Gokhan Gulgezen, Zehra Cataltepe, and Lei Yu for their [analysis of the MRMR selection algorithms](http://web.itu.edu.tr/~cataltepe/pdf/2009_ECMLGulgezen.pdf). They found that MID was the most stable of all the MRMR selection algorithms.
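
To make the MID criterion concrete, here is a minimal sketch of greedy selection that scores each candidate as relevance minus redundancy. It is not the pymrmr implementation: `mid_select` is a hypothetical helper, the toy data is invented for illustration, and the features are assumed to be discrete already so that scikit-learn's `mutual_info_score` applies.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mutual_info_score


def mid_select(features, target, k):
    """Greedy mRMR with the MID criterion (hypothetical helper, not pymrmr).

    features: DataFrame of discrete columns; target: discrete 1-D labels.
    """
    # Relevance: mutual information between each feature and the target.
    relevance = {c: mutual_info_score(features[c], target) for c in features.columns}
    # Start with the single most relevant feature.
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(k, features.shape[1]):
        best_name, best_score = None, -np.inf
        for c in features.columns:
            if c in selected:
                continue
            # Redundancy: mean mutual information with the features chosen so far.
            redundancy = np.mean(
                [mutual_info_score(features[c], features[s]) for s in selected]
            )
            score = relevance[c] - redundancy  # MID = relevance minus redundancy
            if score > best_score:
                best_name, best_score = c, score
        selected.append(best_name)
    return selected


# Toy data (assumption): the target encodes two independent bits, a and b.
a = np.tile([0, 0, 1, 1], 50)
b = np.tile([0, 1, 0, 1], 50)
toy = pd.DataFrame({"a": a, "a_copy": a, "b": b})
target = 2 * a + b

selected = mid_select(toy, target, k=2)
print(selected)
```

In this toy case `a` and `a_copy` are equally relevant but fully redundant with each other, so the criterion should keep only one of them and pair it with `b`.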

In [None]:
# Feature selection algorithm
from pymrmr import mRMR
# Select the 10 most important features using the MID criterion
f_select = mRMR(X, 'MID', 10)
# Display the names of the selected features
f_select

We use PCA to transform and decorrelate the dataset defined by the selected features.

In [None]:
from sklearn.decomposition import PCA
pca = PCA()  
XX = pca.fit_transform(X[f_select])  # XX is our new training matrix; PCA() with no arguments keeps all components
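
Because `PCA()` with no arguments keeps every component, it can help to check how much variance each component explains before deciding how many to keep. The following is a minimal sketch on synthetic data. The array `X_demo` and the 95% threshold are assumptions for illustration, not values from this notebook.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in for 10 selected features: 2 latent factors plus small noise.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
X_demo = latent @ mixing + 0.05 * rng.normal(size=(500, 10))

# With no n_components argument, PCA keeps every component (10 here).
pca_full = PCA().fit(X_demo)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
print(cumulative.round(3))

# A float in (0, 1) keeps the fewest components that reach that share of variance.
pca_95 = PCA(n_components=0.95, svd_solver="full").fit(X_demo)
print(pca_95.n_components_)
```

On the real data, the same check would be run on `X[f_select]` to see whether a smaller number of components is justified.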

### Creating training and test sets

In [None]:
# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(XX, y, test_size = 0.3, random_state = 0)

In [None]:
# prepare configuration for cross validation test harness
seed = 0
# prepare models
models = []
models.append(('LR', LogisticRegression(random_state=seed)))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('RF', RandomForestClassifier(n_estimators=10, criterion='gini', random_state=seed)))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC(kernel='rbf', probability=True, random_state=seed)))
models.append(('ADA', AdaBoostClassifier(n_estimators=50, learning_rate=1,random_state=seed)))
models.append(('XGB', XGBClassifier(max_depth=3, n_estimators=100, learning_rate=0.05)))


# evaluate each model in turn
cross_val_results = list()
accuracy_results = list()
names = list()
scoring = 'accuracy'

for name, model in models:
    print("-" * 70)
    print("This is {name} model.".format(name=name))
    print("-" * 70)
    # Test each model on one run through    
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    
    # Making the Confusion Matrix
    cm = confusion_matrix(y_test, y_pred)
    
    # Model statistics on the held-out test set
    accuracy = accuracy_score(y_test, y_pred)
    accuracy_results.append(accuracy)
    print('This is the accuracy score: {}'.format(accuracy))
    print('Here is the classification report: ')
    print(classification_report(y_test, y_pred))
    print('Confusion Matrix')
    print(cm)
    print('\n')
    
    # 10-fold cross-validation score (computed on the test split; the model is refit on each fold)
    cv_results = cross_val_score(model, X_test, y_test, cv=10, scoring=scoring)
    cross_val_results.append(cv_results)
    names.append(name)
    msg = "Cross validation score of %s Model: %f (%f)" % (name, cv_results.mean(), cv_results.std())
    print(msg)
    print("-" * 70)
    print('\n\n\n')
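
The loop above prints results one model at a time. Here is a minimal sketch of one way to gather the scores into a single comparison table. It uses synthetic data from `make_classification` and only two of the models as stand-ins (assumptions, not this notebook's data). It also cross-validates on the training split, a common alternative to the test-split cross-validation used above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumption); the notebook would use XX and y instead.
X_demo, y_demo = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

demo_models = [("LR", LogisticRegression(max_iter=1000)), ("NB", GaussianNB())]

rows = []
for name, model in demo_models:
    # Cross-validate on the training split only, so the test split stays unseen.
    cv_scores = cross_val_score(model, X_tr, y_tr, cv=10, scoring="accuracy")
    # Fit once on the full training split and score on the held-out test split.
    test_acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    rows.append(
        {
            "model": name,
            "cv_mean": cv_scores.mean(),
            "cv_std": cv_scores.std(),
            "test_accuracy": test_acc,
        }
    )

summary = pd.DataFrame(rows).sort_values("cv_mean", ascending=False)
print(summary.to_string(index=False))
```

A table like this makes it easier to rank models by cross-validation mean and to see whether the test accuracy agrees with it.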

## Results

Judging by accuracy, cross-validation accuracy, and F1 scores, the top three algorithms are XGBoost, AdaBoost, and logistic regression. Overall, the models did well: their mean accuracy and mean cross-validation accuracy are each about 77%.

For a telecommunications company, it is important to identify which customers may leave its service, since telecommunications companies make their money through long-term contracts. If a company can keep its customers from leaving, its business will not decline.

Businesses want to minimize the risk of failing to identify a customer who is leaving the firm, which is called a false negative. However, it is also important that the firm does not overcorrect and flag customers who are staying as likely to leave (false positives), because retention efforts aimed at them are wasted marketing dollars.

There is a trade-off between correctly identifying which customers are staying or leaving and minimizing both false negatives and false positives. XGBoost, AdaBoost, and logistic regression have high accuracy scores but only average false negative and false positive rates.
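
The trade-off can be made concrete by computing both error rates from a confusion matrix at different decision thresholds. This is a minimal sketch. The labels and churn probabilities are invented for illustration, and `error_rates` is a hypothetical helper, not part of this notebook.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def error_rates(y_true, y_pred):
    """Return (false negative rate, false positive rate); assumes both classes occur."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    fnr = fn / (fn + tp)  # share of churners the model missed
    fpr = fp / (fp + tn)  # share of loyal customers flagged as churners
    return fnr, fpr


# Invented example: 4 churners (1) and 6 non-churners (0) with predicted churn probabilities.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
churn_prob = np.array([0.9, 0.6, 0.4, 0.2, 0.7, 0.45, 0.3, 0.2, 0.1, 0.05])

results = {}
for threshold in (0.5, 0.35):
    y_hat = (churn_prob >= threshold).astype(int)
    results[threshold] = error_rates(y_true, y_hat)
    fnr, fpr = results[threshold]
    print("threshold={:.2f}  FNR={:.2f}  FPR={:.2f}".format(threshold, fnr, fpr))
```

Lowering the threshold from 0.5 to 0.35 catches one more churner in this example, at the cost of flagging one more loyal customer.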

## Potential Improvements

One can potentially improve this analysis by implementing additional ensemble methods and comparing their results to XGBoost, AdaBoost, and logistic regression. Another potential improvement is to use grid search to optimize the parameters of feature selection, PCA, and the various machine learning algorithms used.
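
As a rough illustration of the grid search idea, the sketch below tunes the number of PCA components and the logistic regression regularization strength together in a scikit-learn `Pipeline`. The synthetic data and the parameter grid are assumptions chosen for brevity. The MRMR step is left out because pymrmr is not a scikit-learn transformer and would need a wrapper.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (assumption); the notebook would use its own features and y.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

# PCA is fit inside each cross-validation fold, which avoids leaking test folds.
pipe = Pipeline([("pca", PCA()), ("clf", LogisticRegression(max_iter=1000))])

# Illustrative grid (assumption): step name, double underscore, parameter name.
param_grid = {
    "pca__n_components": [2, 5, 10],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)

print(search.best_params_)
test_score = search.score(X_te, y_te)
print(test_score)
```

The same pattern extends to the other classifiers by swapping the `clf` step and its grid entries.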