# Telecom customer churn prediction
## Introduction
Customer churn, or customer turnover, occurs when a customer stops using a company's services. Churn prediction is a general problem that extends to many areas, such as employee attrition in a company or customers leaving a mobile subscription.
We are going to use the Telco customer dataset to predict churn. After loading the data, we will explore the attributes and the relationships between them before building our model.

## Loading Data

In [None]:
#importing necessary libraries

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import AdaBoostClassifier,GradientBoostingClassifier,RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
%matplotlib inline

data=pd.read_csv("/kaggle/input/telco-customer-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv")
print("This dataset has {0} rows and {1} columns".format(data.shape[0],data.shape[1]))
data.head(4)

## Understanding the dataset
The dataset has 21 attributes, defined below:
* customerID: Customer ID, unique identifier for each customer
* gender    : Whether the customer is a male or a female
* SeniorCitizen: Whether the customer is a senior citizen or not (1, 0)
* Partner : Whether the customer has a partner or not (Yes, No)
* Dependents: Whether the customer has dependents or not (Yes, No)
* tenure : Number of months the customer has stayed with the company
* PhoneService : Whether the customer has a phone service or not (Yes, No)
* MultipleLines : Whether the customer has multiple lines or not (Yes, No, No phone service)
* InternetService : Customer’s internet service provider (DSL, Fiber optic, No)
* OnlineSecurity : Whether the customer has online security or not (Yes, No, No internet service)
* OnlineBackup : Whether the customer has online backup or not (Yes, No, No internet service)
* DeviceProtection: Whether the customer has device protection or not (Yes, No, No internet service)
* TechSupport : Whether the customer has tech support or not (Yes, No, No internet service)
* StreamingTV : Whether the customer has streaming TV or not (Yes, No, No internet service)
* StreamingMovies : Whether the customer has streaming movies or not (Yes, No, No internet service)
* Contract : The contract term of the customer (Month-to-month, One year, Two year)
* PaperlessBilling: Whether the customer has paperless billing or not (Yes, No)
* PaymentMethod : The customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))
* MonthlyCharges: The amount charged to the customer monthly
* TotalCharges : The total amount charged to the customer
* Churn: Whether the customer churned or not (Yes or No)

## Exploratory data analysis
In this section, we explore most of the attributes and check how they relate to customer churn. We will follow the steps below:
* 1. Listing statistical properties
* 2. Finding the missing values
* 3. Correlation
* 4. Detecting Outliers

### 1. Listing statistical properties
Before computing summary statistics, we take a look at the data types.

In [None]:
data.dtypes

We are going to convert TotalCharges, which holds numeric values but is stored with the object data type, to float.

In [None]:
data['TotalCharges']=pd.to_numeric(data['TotalCharges'], errors='coerce')
cols=['MonthlyCharges', 'TotalCharges','tenure']
data[cols].describe()

As we can see, the average monthly charge is about 64.76, the average total charges per customer is about 2283, and customers stay with the company for about 32 months on average.

In [None]:
fig=plt.figure(figsize=(6,5))
p= sns.countplot(x='Churn', data=data)
ax=plt.gca()
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x() + p.get_width()/2.,height+2, '{:.2f}%'.format(100*(height/data.shape[0])),fontsize=14,ha='center',va='bottom')
sns.set(font_scale=1.5)
ax.set_xlabel("Labels for Customer Churn")
ax.set_ylabel("Numbers of records")
plt.title("Data distribution");


We see that about 73% of customers did not churn and about 27% churned. The data is somewhat imbalanced.
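The class shares can also be computed directly rather than read off the plot. The following is a minimal sketch on hypothetical toy labels (the `toy` frame is not the Telco data); it assumes only that the target column is named `Churn`, as in the dataset description.

```python
import pandas as pd

# Hypothetical toy labels standing in for the real 'Churn' column.
toy = pd.DataFrame({"Churn": ["No", "No", "No", "Yes"]})

# normalize=True returns the fraction of rows per class instead of raw counts.
class_share = toy["Churn"].value_counts(normalize=True)
print(class_share)
```

Applied to the real data, the same call on `data['Churn']` should reproduce the percentages shown on the bars.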

### 2. Finding the missing values

In [None]:
data.isnull().sum()

As we see above, the TotalCharges column has 11 missing values. We are going to replace the missing values with the column mean.


In [None]:
meanTotalCharge = data.TotalCharges.mean()
data['TotalCharges']=data['TotalCharges'].fillna(meanTotalCharge)
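The missing values most likely appear because `errors='coerce'` turns any entry that cannot be parsed as a number into NaN. The sketch below illustrates this, together with the mean imputation, on hypothetical values; the blank string is an assumed stand-in for an unparseable entry.

```python
import pandas as pd

# Hypothetical values: the blank string mimics an unparseable TotalCharges entry.
raw = pd.Series(["29.85", " ", "108.15"])

# errors='coerce' converts anything that cannot be parsed into NaN.
numeric = pd.to_numeric(raw, errors="coerce")

# Replace the NaN with the mean of the remaining values.
filled = numeric.fillna(numeric.mean())
print(filled.tolist())
```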

### Inspecting the mean attributes of customers who churn

In [None]:
# numeric_only=True restricts the mean to numeric columns; recent pandas versions raise an error otherwise
data.groupby('Churn').mean(numeric_only=True)

As we can see, customers who churn stay with the company for less time on average and have higher monthly charges than those who do not churn. Their total charges are lower than those of customers who do not churn.

In [None]:
f,axes = plt.subplots(ncols=3, figsize=(17,6))
# histplot replaces the deprecated distplot; kde=True overlays a density estimate on the counts
sns.histplot(data.tenure,kde=True,ax=axes[0], color='darkorange').set_title("Customer tenure")
axes[0].set_ylabel('No of Customers')

sns.histplot(data.MonthlyCharges,kde=True,ax=axes[1],color='maroon').set_title('Monthly Charges')
axes[1].set_ylabel('No of Customers')

sns.histplot(data.TotalCharges,kde=True,ax=axes[2]).set_title('Total Charges')
axes[2].set_ylabel('No of Customers');

From the plots above, it looks like:
* **tenure** seems to be bimodal. The first mode represents customers who have not been with the company for long; the second represents loyal customers who have been with the company for a very long time.
* For **MonthlyCharges**, it looks like newer customers are charged more than those who have stayed longer with the company. Most customers seem to pay between 70 and 90.
* **TotalCharges** has a right-skewed distribution: many customers have low total charges, and few have very large ones.

###  Inspecting Churn by gender

In [None]:
plt.figure(figsize=(8,4))
p=sns.countplot(x="gender", hue="Churn", data=data)
ax=plt.gca()
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,height+2, '{:.2f}%'.format(100*(height/data.shape[0])),fontsize=12,ha='center',va='bottom')
sns.set(font_scale=1.5)
plt.title('Churn Distribution by gender', fontweight="bold");

There are more male customers than female customers, but both sexes seem to churn at about the same rate.

###  Churn by contract type

In [None]:
plt.figure(figsize=(10,5))
p=sns.countplot(x='Contract',hue='Churn',data=data)
ax=plt.gca()
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,height+2,'{:.2f}%'.format(100*(height/data.shape[0])),fontsize=14,ha='center',va='bottom')
sns.set(font_scale=1.5)
plt.title('Churn Distribution by Contract', fontweight='bold');

Most customers are on month-to-month contracts, and they churn more than customers on one-year or two-year contracts.

###  Churn by Payment method

In [None]:
plt.figure(figsize=(15,4))
p=sns.countplot(x='PaymentMethod',hue='Churn', data=data)

plt.title('Churn by payment method', fontweight='bold');

Customers who pay by electronic check seem to churn more than customers who pay by mailed check, bank transfer or credit card. Customers using those three payment methods churn at about the same rate.

### Churn by monthly rate

In [None]:
plt.figure(figsize=(15,4))
# fill=True shades the area under each curve (the older shade= argument is deprecated)
ax = sns.kdeplot(data.loc[(data['Churn']=='No'),'MonthlyCharges'],fill=True,label='No Churn')
ax = sns.kdeplot(data.loc[(data['Churn']=='Yes'),'MonthlyCharges'],fill=True,label='Churn')
ax.set(xlabel='Customer Monthly Charges',ylabel='Density')
ax.legend()
plt.title('Customer Monthly Charges - Churn vs No Churn', fontweight='bold');



Customers who are charged less than 40 a month seem to churn less. As the monthly rate increases, they churn more. Customers who churn the most pay between 70 and 100 a month.

###  Churn by total charges

In [None]:
plt.figure(figsize=(15,4))
# fill=True shades the area under each curve (the older shade= argument is deprecated)
ax=sns.kdeplot(data.loc[(data['Churn']=='No'),'TotalCharges'], fill=True, label='No Churn')
ax=sns.kdeplot(data.loc[(data['Churn']=='Yes'),'TotalCharges'], fill=True, label='Churn')
ax.set(xlabel='Customer Total Charges',ylabel='Density')
ax.legend()
plt.title('Customer Total Charges - Churn vs No Churn', fontweight='bold');

Customers whose total charges are below 1500 seem to churn more than customers with higher total charges.

### Churn by senior citizen status

In [None]:
plt.figure(figsize=(8,4))
p=sns.countplot(x='SeniorCitizen',hue='Churn', data=data)

plt.title('Churn by SeniorCitizen', fontweight='bold');

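In absolute numbers, more non-senior citizens churn than senior citizens. Note that there are also many more non-senior customers overall, so the count plot alone does not show the churn rate within each group.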
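Counts and rates can tell different stories when group sizes differ. As a minimal sketch on hypothetical toy data (not the Telco dataset), the per-group churn rate can be obtained with a row-normalized cross-tabulation:

```python
import pandas as pd

# Hypothetical toy data: six non-senior and two senior customers.
toy = pd.DataFrame({
    "SeniorCitizen": [0, 0, 0, 0, 0, 0, 1, 1],
    "Churn": ["No", "No", "No", "No", "Yes", "Yes", "No", "Yes"],
})

# normalize='index' divides each row by its row total, giving per-group rates.
rates = pd.crosstab(toy["SeniorCitizen"], toy["Churn"], normalize="index")
print(rates)
```

In this toy example the non-senior group has more churners in absolute terms (2 versus 1) but a lower churn rate (one third versus one half).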

In [None]:
data.head()

## Detecting outliers
An outlier is a value that lies at an abnormally large distance from the other values in the dataset. It can be much smaller or much larger; basically, it does not follow the same pattern as the other values. We will use the interquartile range (IQR) to detect outliers. The IQR is the range between the first quartile (Q1) and the third quartile (Q3). With this approach, any value greater than Q3 + 1.5 × IQR or less than Q1 - 1.5 × IQR is considered an outlier. We will check for outliers in tenure, MonthlyCharges and TotalCharges.


In [None]:
def percent_outlier(data):
    Q1 = np.percentile(data,25)
    Q3 = np.percentile(data,75)
    IQR = Q3-Q1
    lower_bound = Q1-(IQR*1.5)
    upper_bound = Q3+(IQR*1.5)
    return (lower_bound,upper_bound)
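To see the 1.5 × IQR rule flag a value, here is a small worked sketch on a hypothetical sample with one extreme entry. The helper name `iqr_bounds` and the sample are illustrative only; the helper applies the same rule as the function above.

```python
import numpy as np

def iqr_bounds(values):
    """Return the (lower, upper) fences of the 1.5 * IQR rule."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical sample with one obviously extreme value.
sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
lower, upper = iqr_bounds(sample)

# Keep only the values that fall outside the fences.
outliers = sample[(sample < lower) | (sample > upper)]
print(lower, upper, outliers)
```

Here Q1 = 3.25 and Q3 = 7.75, so the fences are -3.5 and 14.5, and only the value 100 falls outside them.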

We are going to draw the boxplot for the tenure column and get the outlier list.

In [None]:
ax = sns.boxplot(y='tenure',x='Churn',data=data)
ax.set_title('Tenure box plot by Churn');

In [None]:
lowerbound,upperbound=percent_outlier(data.tenure)
tenureout=[x for x in data.tenure if (x<lowerbound) or (x>upperbound)]
tenureout
print ("All tenure value less than {0} and more than {1} are considered outliers".format(lowerbound,upperbound))
print("The min tenure is ",min(data.tenure))
print("The max tenure is ",max(data.tenure))

In [None]:
lowerbound,upperbound=percent_outlier(data.MonthlyCharges)
# compare MonthlyCharges (not tenure) against its own bounds
monthlyout=[x for x in data.MonthlyCharges if (x<lowerbound) or (x>upperbound)]
print("Number of monthly charges outliers:",len(monthlyout))
print ("All monthly charges less than {0} and more than {1} are considered outliers".format(lowerbound,upperbound))
print("The min monthly charge is ",min(data.MonthlyCharges))
print("The max monthly charge is ",max(data.MonthlyCharges))

In [None]:
lowerbound,upperbound=percent_outlier(data.TotalCharges)
# compare TotalCharges (not tenure) against its own bounds
totalout=[x for x in data.TotalCharges if (x<lowerbound) or (x>upperbound)]
print("Number of total charges outliers:",len(totalout))
print ("All total charges less than {0} and more than {1} are considered outliers".format(lowerbound,upperbound))
print("The min total charges is ",min(data.TotalCharges))
print("The max total charges is ",max(data.TotalCharges))

Based on the IQR method used here, all values seem to fall within the normal range. Therefore, our dataset does not have outliers.

## Feature engineering
In this section, we will find the features that are most predictive for our model. Before proceeding with feature engineering, we map all string booleans to numeric booleans (Yes=1, No=0).

In [None]:
# Converting string boolean to numeric boolean
data['PhoneService']=data['PhoneService'].map({'Yes':1,'No':0})
data['PaperlessBilling'] =data['PaperlessBilling'].map({'Yes':1,'No':0})
data['Churn'] =data['Churn'].map({'Yes':1,'No':0})
data['Partner'] = data['Partner'].map({'Yes':1,'No':0})
data['Dependents']=data['Dependents'].map({'Yes':1,'No':0})

For the other categorical features, we use one-hot encoding to transform them into binary columns. One-hot encoding creates a dummy feature for each unique value of a nominal feature and assigns 1 if the row has that value and 0 otherwise. For a variable with n categories, we keep n-1 dummy features, since the remaining category is implied when all the others are 0.
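As a minimal sketch of the n-1 encoding on a hypothetical toy column (three categories, four rows), `pd.get_dummies` with `drop_first=True` behaves as follows:

```python
import pandas as pd

# Hypothetical toy column with three categories.
toy = pd.DataFrame({"Contract": ["Month-to-month", "One year", "Two year", "One year"]})

# drop_first=True drops the first category in sorted order, leaving n-1 columns.
dummies = pd.get_dummies(toy["Contract"], prefix="Contract", drop_first=True)
print(dummies)
```

The dropped category (`Month-to-month` here) is represented by a row in which both remaining columns are 0.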

In [None]:
# One hot encoding for categorical features
contract = pd.get_dummies(data['Contract'], prefix='Contract',drop_first=True)
# combining to the original dataframe
data = pd.concat([data,contract],axis=1)

payement = pd.get_dummies(data['PaymentMethod'],prefix='PaymentMethod',drop_first=True)
data=pd.concat([data,payement],axis=1)

gender=pd.get_dummies(data['gender'],prefix='gender',drop_first=True)
data=pd.concat([data,gender],axis=1)

TelLines  = pd.get_dummies(data['MultipleLines'],prefix='MultiLines',drop_first=True)
data = pd.concat([data,TelLines], axis=1)


# dummy for'InternetService'
internet = pd.get_dummies(data['InternetService'],prefix='InternetService',drop_first=True)
data = pd.concat([data,internet],axis=1)


# dummy for 'OnlineSecurity'.
security = pd.get_dummies(data['OnlineSecurity'],prefix='OnlineSecurity')
security1= security.drop(['OnlineSecurity_No internet service'],axis=1)
data = pd.concat([data,security1],axis=1)

# dummy for 'OnlineBackup'.
backup =pd.get_dummies(data['OnlineBackup'],prefix='OnlineBackup')
backup1 =backup.drop(['OnlineBackup_No internet service'],axis=1)
data = pd.concat([data,backup1],axis=1)

# dummy for 'DeviceProtection'. 
device =pd.get_dummies(data['DeviceProtection'],prefix='DeviceProtection')
device1 = device.drop(['DeviceProtection_No internet service'],axis=1)
data = pd.concat([data,device1],axis=1)

# dummy for 'TechSupport'. 
support =pd.get_dummies(data['TechSupport'],prefix='TechSupport')
support1 = support.drop(['TechSupport_No internet service'],axis=1)
data = pd.concat([data,support1],axis=1)

# dummy for 'StreamingTV'.
TV =pd.get_dummies(data['StreamingTV'],prefix='StreamingTV')
TV1 = TV.drop(['StreamingTV_No internet service'],axis=1)
data = pd.concat([data,TV1],axis=1)

# dummy for 'StreamingMovies'. 
movies =pd.get_dummies(data['StreamingMovies'],prefix='StreamingMovies')
movies1 = movies.drop(['StreamingMovies_No internet service'],axis=1)
data = pd.concat([data,movies1],axis=1)


# Dropping the original variables
data = data.drop(['Contract','PaymentMethod','gender','MultipleLines','InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
       'TechSupport', 'StreamingTV', 'StreamingMovies'], axis=1)

print("After creating dummy variables, the new dataset has {0} rows and {1} columns".format(data.shape[0],data.shape[1]))

## Feature Standardisation
We are going to bring all our continuous features to the same scale by standardizing them.

In [None]:
cols=['tenure','MonthlyCharges','TotalCharges']
num_data = data.loc[:,cols]
norm_data = (num_data-num_data.mean())/num_data.std()
data.drop(cols,axis=1, inplace=True)
data=pd.concat([data,norm_data],axis=1)
data.head()
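The standardization above is a z-score: each value has the column mean subtracted and is divided by the column standard deviation. A minimal sketch on hypothetical toy values confirms that the result has mean 0 and standard deviation 1:

```python
import pandas as pd

# Hypothetical toy values, not taken from the dataset.
toy = pd.DataFrame({
    "tenure": [1.0, 12.0, 24.0, 60.0],
    "MonthlyCharges": [20.0, 50.0, 80.0, 110.0],
})

# z-score: subtract the column mean, divide by the column standard deviation.
# pandas' std() uses the sample standard deviation (ddof=1) by default.
standardized = (toy - toy.mean()) / toy.std()
print(standardized.mean().round(6).tolist(), standardized.std().round(6).tolist())
```

One design point worth keeping in mind: in a stricter setup, the mean and standard deviation would be computed on the training split only and then applied to the test split, so that no information from the test data influences the transformation.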

## Model Building
In this section, we are going to build our first model. We will train several machine learning algorithms as baseline models using all features, then select the ones that perform well and tune them for better accuracy.
### Selecting machine learning algorithms
This is a classification problem: we want to predict whether or not a customer will churn. Here are the classifiers that we will explore:
* K-Nearest Neighbor (KNN)
* Logistic Regression
* AdaBoost
* GradientBoosting
* RandomForest


#### Splitting the data in training set and test set
We are going to keep 70% of the data for training and 30% for testing. As seen above, about 73% of customers did not churn and about 27% did, which is somewhat imbalanced. We add the argument **stratify=y** so that both the training and test sets have the same class proportions as the original dataset.
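The effect of `stratify` can be checked on toy data. The sketch below uses hypothetical arrays (`X_toy`, `y_toy`) with a 30% positive class and verifies that both splits keep that share:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples, 30% of them in the positive class.
X_toy = np.arange(100).reshape(-1, 1)
y_toy = np.array([1] * 30 + [0] * 70)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=0, stratify=y_toy
)

# Share of positives in the training and test splits.
print(y_tr.mean(), y_te.mean())
```

Without `stratify`, the class share in each split would vary with the random shuffle, which matters more as the classes become more imbalanced.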

In [None]:
X = data.drop(['Churn','customerID'], axis=1)
y = data.Churn

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3,random_state=100, stratify=y)

In [None]:
# K-Nearest Neighbor (KNN)
knn = KNeighborsClassifier(n_neighbors=5,weights='uniform',algorithm='auto',leaf_size=30,p=2,metric='minkowski',metric_params=None)

# Logistic regression (L1 penalty with the liblinear solver; multi_class is omitted because the target is binary)
logreg = LogisticRegression(penalty='l1',dual=False,tol=0.0001,C=1.0,fit_intercept=True,intercept_scaling=1,class_weight=None,
                           random_state=None,solver='liblinear',max_iter=100,verbose=1)

# AdaBoost (the default base learner is used, so the base-estimator argument is omitted)
ada = AdaBoostClassifier(n_estimators=200,learning_rate=1.0)
# Gradient Boosting (the default loss is kept; its name differs across scikit-learn versions)
gboosting =GradientBoostingClassifier(learning_rate=0.1,n_estimators=200,subsample=1.0,min_samples_split=2,
                                     min_samples_leaf=1,min_weight_fraction_leaf=0.0,max_depth=3,init=None,random_state=None,
                                      max_features=None,verbose=0)
# RandomForest classifier (max_features='sqrt' is what 'auto' meant for classifiers)
rf = RandomForestClassifier(n_estimators=10,criterion='gini',max_depth=None,min_samples_split=2,min_samples_leaf=1,
                           min_weight_fraction_leaf=0.0,max_features='sqrt',max_leaf_nodes=None,bootstrap=True,oob_score=False,
                           n_jobs=1,random_state=None,verbose=0)


### Training the baseline model
Using the algorithms above, we will train our data using all features.

In [None]:
#Creating a dictionary of models to train

models ={}
models['K-NN']=knn
models['Logistic Regression']=logreg
models['adaBoost']=ada
models['gradient Boosting']=gboosting
models['random Forest']=rf

In [None]:
#Creating a function to train all our algorithms

def trainfunc(X_train,y_train,models):
    for label,model in models.items():
        print("Training our {0} model".format(label))
        model.fit(X_train,y_train)

In [None]:
#Training all base models
trainfunc(X_train=X_train,y_train=y_train,models=models)

### Baseline Model evaluation
We are going to see how our algorithms will perform on the testing set. We will use as evaluation metrics the mean accuracy score and the ROC-AUC score.

In [None]:
result=pd.DataFrame(columns=['Basecore','BaseAUC-ROC'],index=models.keys())

def testingfunc(X_test,y_test,models,lscore='Basecore',auc='BaseAUC-ROC'):
    for label,model in models.items():
        print("Testing our {0} model".format(label))
        score=str(round(model.score(X_test,y_test),2))
        # pass the DataFrame directly so the feature names match those seen during fit
        labels=model.predict_proba(X_test)[:,1]
        roc=str(round(roc_auc_score(y_test,labels,average='macro',sample_weight=None),2))
        result.loc[label,lscore]=score
        result.loc[label,auc]=roc
    return result

In [None]:
#Testing base model
testingfunc(X_test=X_test,y_test=y_test,models=models,lscore='Basecore',auc='BaseAUC-ROC')

The table above shows the mean accuracy score and the ROC-AUC score, which is the more informative of the two. The mean accuracy score considers only one threshold value, while the ROC-AUC score takes all possible threshold values into account.
We see that logistic regression, AdaBoost and gradient boosting give us the best ROC-AUC, and their scores are very close to each other. From now on, we are going to work with **logistic regression** and **gradient boosting** to see how we can improve our prediction.
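The difference between the two metrics can be illustrated with a small sketch. It uses hypothetical scores and a hand-written helper, `pairwise_auc` (an illustrative name, not a library function), which computes ROC-AUC as the share of (positive, negative) pairs that are ranked correctly:

```python
import numpy as np

def pairwise_auc(y_true, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive score minus every negative score
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Hypothetical predicted probabilities: every positive is scored above every
# negative, but all scores sit above the default 0.5 cut-off.
y_true = [0, 0, 1, 1]
scores = [0.6, 0.7, 0.8, 0.9]

predicted = (np.asarray(scores) >= 0.5).astype(int)
accuracy_at_half = float(np.mean(predicted == np.asarray(y_true)))
auc = pairwise_auc(y_true, scores)
print(accuracy_at_half, auc)
```

In this example the ranking is perfect (AUC of 1.0), yet accuracy at the 0.5 threshold is only 0.5 because every sample is predicted positive. Accuracy depends on the chosen cut-off; ROC-AUC does not.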

### Model optimization
In this section, we are going to try to improve the accuracy of our model. We will focus on three techniques:
* Features selection
* Cross validation
* Hyperparameter tuning

**Feature selection** consists of choosing the set of features that is most important to the model, which helps reduce overfitting.
**Cross validation** is a technique that consists of dividing the data into k folds and, at each iteration, using k-1 folds for training and the remaining fold for validation. This helps guard against **overfitting** (the model does not generalize properly to unseen data) and helps us choose the best model. The general term is k-fold cross validation, where k is the number of folds the training data is split into.

**Hyperparameter tuning** consists of trying a range of parameter values for our model and keeping the ones that give the best score.
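The k-fold splitting described above can be sketched in a few lines. The helper `kfold_indices` is a hypothetical illustration of the idea, not the scikit-learn implementation:

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    folds = np.array_split(shuffled, k)  # k folds of (nearly) equal size
    for i in range(k):
        val_idx = folds[i]  # one fold is held out for validation
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

splits = list(kfold_indices(n_samples=10, k=5))
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))
```

Each sample ends up in the validation fold exactly once, so every observation contributes to both training and evaluation across the k iterations.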

#### Features selection
We will assess feature importance with a random forest. We can measure feature importance as the average impurity decrease computed over all the decision trees in the forest, without making any assumption about whether our data is linearly separable.


In [None]:
mylabels = X_train.columns
forest = RandomForestClassifier(n_estimators=500,random_state=1)
forest.fit(X_train,y_train)
importances=forest.feature_importances_
indices = np.argsort(importances)[::-1]
for f in range(X_train.shape[1]):
    print("%2d) %-*s %f" %(f+1,30,mylabels[indices[f]],importances[indices[f]]))

In [None]:
plt.figure(figsize=(15,6))
plt.title("Feature Importances")
plt.bar(range(X_train.shape[1]), importances[indices], color="red", align="center")
# order the tick labels by importance so that they match the sorted bars
plt.xticks(range(X_train.shape[1]),mylabels[indices], rotation=75, fontsize=12)
plt.xlim([-1,X_train.shape[1]]);

Now let's set a threshold to keep the most important features. We will set our threshold to 0.0125.

In [None]:
from sklearn.feature_selection import SelectFromModel
sfm = SelectFromModel(forest, threshold=0.0125, prefit=True)
selected = sfm.transform(X_train)
print("{0} best features were selected".format(selected.shape[1]) )
selected_columns=[]
for feat in range(selected.shape[1]):
    selected_columns.append(mylabels[indices[feat]])
    print("%2d) %-*s %f"%(feat+1,30,mylabels[indices[feat]],importances[indices[feat]]))
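The thresholding step can be sketched independently of the forest. The feature names and importances below are hypothetical values chosen for illustration; only the threshold of 0.0125 comes from the `SelectFromModel` call above.

```python
import numpy as np

# Hypothetical feature names and importances (they sum to 1, as forest importances do).
names = np.array(["tenure", "TotalCharges", "MonthlyCharges", "gender_Male", "PhoneService"])
importances = np.array([0.40, 0.30, 0.28, 0.012, 0.008])

threshold = 0.0125
order = np.argsort(importances)[::-1]          # most important first
keep = order[importances[order] >= threshold]  # drop features below the threshold
print(names[keep].tolist())
```

Because the features are sorted by importance before filtering, the kept features are exactly the first few entries of the sorted list, which is why the loop above can read them off `indices`.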

#### Retraining our model after feature selection
We are now going to retrain our models after feature selection. The goal of this exercise is to make sure our accuracy does not drop drastically after removing some columns.

In [None]:
#selected_columns
X_train_selected=X_train[selected_columns]
X_test_selected=X_test[selected_columns]

#training with best features
trainfunc(X_train=X_train_selected,y_train=y_train,models=models)
#Testing with best features
result["Dimrecscore"]=""
result["DimrecAUC-ROC"]=""
testingfunc(X_test=X_test_selected,y_test=y_test,models=models,lscore='Dimrecscore',auc='DimrecAUC-ROC')

The table above adds the scores after dimensionality reduction in the new columns **Dimrecscore** and **DimrecAUC-ROC**. As we can see, our accuracy did not change much after we dropped some columns.

#### Cross validation
We are now going to implement cross validation using scikit-learn's `cross_val_score` function. The `cvbuild` helper below defaults to k=10 folds, and the call that follows keeps that default.

In [None]:
from sklearn.model_selection import cross_val_score

def cvbuild(cvmodel, scr,X_train,y_train,cv=10):
    
    for label,model in cvmodel.items():
        cvscore = cross_val_score(model,X_train,y_train,cv=cv,scoring=scr)
        result.loc[label,'cvAUC-ROC']=str(round(cvscore.mean(),2))
        result.loc[label,'cvscore_std']=str(round(cvscore.std(),4))
    return result


In [None]:
cvd=cvbuild(cvmodel=models,X_train=X_train_selected,y_train=y_train,scr='roc_auc')
cvd


After cross validation, we see an improvement in the KNN and random forest scores. Logistic regression, AdaBoost and gradient boosting all have the same score as after dimensionality reduction. However, AdaBoost and gradient boosting reach that score with a lower standard deviation, so they seem to be more consistent than the other algorithms.

#### Hyperparameter tuning
Given that hyperparameter tuning can be very time consuming, we will tune only two of our best models (AdaBoost and gradient boosting).

In [None]:
from sklearn.model_selection import RandomizedSearchCV
# param tuning for ada boost
adaParams ={'n_estimators':[10,50,200,420]}
gridSearchada = RandomizedSearchCV(estimator=ada,param_distributions=adaParams,n_iter=4, scoring='roc_auc',
                                  cv=5).fit(X_train_selected,y_train);

In [None]:
# best params
gridSearchada.best_estimator_,gridSearchada.best_score_

In [None]:
#param tuning for gradient boosting
from scipy.stats import randint
# 'log_loss' is the current name of the loss that older scikit-learn releases called 'deviance'
gbParams = {'loss' :['log_loss','exponential'],'n_estimators' : randint(10,500), 'max_depth':randint(1,10)}
gridSearchGB= RandomizedSearchCV(estimator=gboosting,param_distributions=gbParams,n_iter=10,
                                scoring='roc_auc',cv=5).fit(X_train_selected,y_train)

In [None]:
gridSearchGB.best_params_,gridSearchGB.best_score_

Above are the best parameters found for adaBoost and gradient Boosting.
#### Implementing and testing the better model.
In this section, we are going to re-implement our two models using the best parameters found.


In [None]:
#Training the best approach
bestada = gridSearchada.best_estimator_.fit(X_train_selected,y_train)
bestgboosting=gridSearchGB.best_estimator_.fit(X_train_selected,y_train)

# Getting score for the best approach
# ada
ada_labels = bestada.predict_proba(np.array(X_test_selected))[:,1]
bestadaroc=round(roc_auc_score(y_test,ada_labels, average='macro',sample_weight=None),2)
# gradient boosting
gb_labels = bestgboosting.predict_proba(np.array(X_test_selected))[:,1]
bestgbroc=round(roc_auc_score(y_test,gb_labels, average='macro',sample_weight=None),2)

result.loc['adaBoost','bestAUC-ROC']=bestadaroc
result.loc['gradient Boosting','bestAUC-ROC']=bestgbroc

result

### Performance evaluation


In [None]:
from sklearn.metrics import classification_report
ada_pred = bestada.predict(X_test_selected)
print(classification_report(y_test,ada_pred))

In [None]:
gb_pred=bestgboosting.predict(X_test_selected)
print(classification_report(y_test,gb_pred))

Above is the text summary of the precision, recall and F1 score.

* Precision is the ability of the classifier not to label a negative sample as positive.
* Recall is the ability of the classifier to find all the positive samples.
* F1 score is the harmonic mean of precision and recall.
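These three definitions can be made concrete with a short sketch that computes them from true and predicted labels. The helper `precision_recall_f1` and the labels are hypothetical and only illustrate the formulas:

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels: 1 = churned, 0 = stayed.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

With 2 true positives, 1 false positive and 2 false negatives, precision is 2/3, recall is 1/2 and F1 is 4/7. For churn prediction, recall on the churn class is often the quantity of interest, since a missed churner is a customer the company cannot try to retain.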

Comparatively, we see that gradient boosting has better accuracy and recall than AdaBoost.