## 1.IMPORTING LIBRARIES

    *NumPy for linear algebra and array manipulation.
    *Matplotlib for visualizing data.
    *pandas for loading and analyzing data.
    *scikit-learn (sklearn) for normalizing the data and creating the model.

In [1]:
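import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# A bare `import sklearn` may not load submodules, so import them explicitly.
import sklearn.linear_model
import sklearn.metrics
import sklearn.model_selection
import sklearn.preprocessing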

## 2.LOADING AND ANALYZING THE DATA

    *Loading the customer_data.csv dataset.
    *The dataset is composed of demographic features and churn data.
    *Analyzing the data using pandas.

In [2]:
df = pd.read_csv("customer_data.csv")

df.head()

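|   | tenure | age | address | income | ed | employ | equip | callcard | wireless | longmon | ... | pager | internet | callwait | confer | ebill | loglong | logtoll | lninc | custcat | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11.0 | 33.0 | 7.0 | 136.0 | 5.0 | 5.0 | 0.0 | 1.0 | 1.0 | 4.4 | ... | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.482 | 3.033 | 4.913 | 4.0 | 1.0 |
| 1 | 33.0 | 33.0 | 12.0 | 33.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9.45 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.246 | 3.24 | 3.497 | 1.0 | 1.0 |
| 2 | 23.0 | 30.0 | 9.0 | 30.0 | 1.0 | 2.0 | 0.0 | 0.0 | 0.0 | 6.3 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.841 | 3.24 | 3.401 | 3.0 | 0.0 |
| 3 | 38.0 | 35.0 | 5.0 | 76.0 | 2.0 | 10.0 | 1.0 | 1.0 | 1.0 | 6.05 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.8 | 3.807 | 4.331 | 4.0 | 0.0 |
| 4 | 7.0 | 35.0 | 14.0 | 80.0 | 2.0 | 15.0 | 0.0 | 1.0 | 0.0 | 7.1 | ... | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.96 | 3.091 | 4.382 | 3.0 | 0.0 |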
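
A few pandas calls cover most of a first-pass analysis like the one described above. The sketch below uses a small stand-in frame so it runs without the CSV file: the name `sample` is hypothetical, the first five rows mirror the `df.head()` output for four columns, and the last row with its missing income is invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the real CSV. Rows 0-4 mirror df.head();
# row 5 (including the missing income) is invented.
sample = pd.DataFrame({
    "tenure": [11.0, 33.0, 23.0, 38.0, 7.0, 68.0],
    "age": [33.0, 33.0, 30.0, 35.0, 35.0, 52.0],
    "income": [136.0, 33.0, 30.0, 76.0, 80.0, None],
    "churn": [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
})

# Column dtypes: churn is stored as float, so an integer cast is reasonable.
print(sample.dtypes)

# Number of missing values in each column.
missing = sample.isna().sum()
print(missing)

# Summary statistics (count, mean, std, quartiles) for the numeric columns.
print(sample.describe())

# Class balance of the target, expressed as fractions of all rows.
churn_share = sample["churn"].value_counts(normalize=True)
print(churn_share)
```

Checking the class balance early is useful for churn data, because a skewed target makes plain accuracy a weak summary of model quality.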


We keep a subset of the columns as candidate independent variables, together with the churn target.

In [6]:
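# .copy() gives an independent frame, so the assignment below does not trigger SettingWithCopyWarning.
df = df[['tenure', 'age', 'address', 'income', 'ed', 'employ', 'equip', 'callcard', 'wireless', 'churn']].copy()
df['churn'] = df['churn'].astype('int64')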

df.head()

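|   | tenure | age | address | income | ed | employ | equip | callcard | wireless | churn |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11.0 | 33.0 | 7.0 | 136.0 | 5.0 | 5.0 | 0.0 | 1.0 | 1.0 | 1 |
| 1 | 33.0 | 33.0 | 12.0 | 33.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| 2 | 23.0 | 30.0 | 9.0 | 30.0 | 1.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0 |
| 3 | 38.0 | 35.0 | 5.0 | 76.0 | 2.0 | 10.0 | 1.0 | 1.0 | 1.0 | 0 |
| 4 | 7.0 | 35.0 | 14.0 | 80.0 | 2.0 | 15.0 | 0.0 | 1.0 | 0.0 | 0 |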


In [7]:
print("There are {} customers and {} columns, including the churn target.".format(df.shape[0], df.shape[1]))

There are 200 customers and 10 columns, including the churn target.


## 3.PREPARING AND SPLITTING DATA

    *Converting the data to NumPy arrays.
    *Normalizing (standardizing) the features.
    *Splitting the data into training and test sets.

In [16]:
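# Seven of the selected columns are used as features; callcard and wireless are left out.
X = np.asanyarray(df[["tenure", "age", "address", "income", "ed", "employ", "equip"]])
y = np.asanyarray(df["churn"])

# Standardize each feature column to zero mean and unit variance.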
print("Before normalization:\n{}\n".format(X[:3]))

X = sklearn.preprocessing.StandardScaler().fit(X).transform(X)
print("After normalization:\n{}\n".format(X[:3]))

Before normalization:
[[ 11.  33.   7. 136.   5.   5.   0.]
 [ 33.  33.  12.  33.   2.   0.   0.]
 [ 23.  30.   9.  30.   1.   2.   0.]]

After normalization:
[[-1.13518441 -0.62595491 -0.4588971   0.4751423   1.6961288  -0.58477841
  -0.85972695]
 [-0.11604313 -0.62595491  0.03454064 -0.32886061 -0.6433592  -1.14437497
  -0.85972695]
 [-0.57928917 -0.85594447 -0.261522   -0.35227817 -1.42318853 -0.92053635
  -0.85972695]]
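
To make the normalization step concrete, the sketch below reproduces what `StandardScaler` computes: each column has its mean subtracted and is divided by its standard deviation. It uses only the first three rows of two columns (tenure and income) from the output above, so the statistics, and therefore the scaled values, differ from the notebook's, which were fit on all 200 rows. The names `X_small`, `X_manual`, and `X_sklearn` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Tenure and income for the first three customers shown above.
X_small = np.array([[11.0, 136.0],
                    [33.0, 33.0],
                    [23.0, 30.0]])

# Manual z-score: subtract the column mean, divide by the column std.
# StandardScaler uses the population std (ddof=0), which is NumPy's default.
mean = X_small.mean(axis=0)
std = X_small.std(axis=0)
X_manual = (X_small - mean) / std

# The same transformation through scikit-learn.
X_sklearn = StandardScaler().fit_transform(X_small)

# Both routes should agree, and each scaled column has mean 0 and std 1.
print(np.allclose(X_manual, X_sklearn))
print(X_manual.mean(axis=0).round(6), X_manual.std(axis=0).round(6))
```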



In [21]:
training_X, test_X, training_y, test_y = sklearn.model_selection.train_test_split(X, y, test_size=0.25)

print("Training set: {}\nTest set: {}".format((training_X.shape, training_y.shape), (test_X.shape, test_y.shape)))

Training set: ((150, 7), (150,))
Test set: ((50, 7), (50,))
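
In the cells above, the scaler is fit on all 200 rows before the split, so the test rows contribute to the mean and standard deviation, and the split itself changes on every run. A common alternative, sketched here on synthetic data, is to split first with a fixed `random_state` and fit the scaler on the training rows only. The arrays `X_raw` and `y_demo`, the seed values, and the use of `stratify` are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 200 x 7 feature matrix and the churn labels.
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=50.0, scale=10.0, size=(200, 7))
y_demo = np.array([0] * 150 + [1] * 50)

# Split first. random_state makes the split repeatable; stratify keeps the
# churn ratio roughly the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X_raw, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

# Fit the scaler on the training rows only, then reuse it for the test rows.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
print(y_train.mean(), y_test.mean())
```

With this ordering the test set plays the role of unseen data: its rows affect neither the scaling statistics nor the fitted model.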


## 4.CREATING AND EVALUATING THE MODEL

    *Creating the model using sklearn.
    *Logistic regression is used to predict whether a customer will churn.
    *Getting the binary and probabilistic predictions.
    *Analyzing the model with precision, recall, and F1 scores.

In [34]:
model = sklearn.linear_model.LogisticRegression(C=0.01, solver="liblinear").fit(training_X, training_y)

#Getting the binary predictions.
y_hat = model.predict(test_X)
print("Binary predictions\n{}".format(y_hat[:3]))

#Getting predictions as probabilities.
y_hat_prob = model.predict_proba(test_X)
print("Probability of predictions:\n{}".format(y_hat_prob[:3]))

Binary predictions
[0 1 0]
Probability of predictions:
[[0.56342024 0.43657976]
 [0.40829513 0.59170487]
 [0.52043767 0.47956233]]
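
The two outputs above are linked: each probability row holds P(class 0) and P(class 1), and the binary prediction is the class with the larger value. In the first row, 0.563 against 0.437 gives 0; in the second, 0.408 against 0.592 gives 1. The sketch below checks this relationship on synthetic data from `make_classification`, which stands in for the churn features. `X_demo`, `y_demo`, and `clf` are illustrative names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data standing in for the scaled churn features.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=7, n_informative=4, random_state=0
)
clf = LogisticRegression(C=0.01, solver="liblinear").fit(X_demo, y_demo)

proba = clf.predict_proba(X_demo)   # columns follow clf.classes_, here [0, 1]
labels = clf.predict(X_demo)

# Each row of probabilities sums to 1.
rows_sum_to_one = np.allclose(proba.sum(axis=1), 1.0)

# The predicted label is the class with the larger probability,
# which for two classes means P(class 1) > 0.5.
labels_from_proba = clf.classes_[np.argmax(proba, axis=1)]
same_labels = np.array_equal(labels, labels_from_proba)

# P(class 1) is the logistic (sigmoid) function of the linear score.
score = clf.decision_function(X_demo)
sigmoid = 1.0 / (1.0 + np.exp(-score))
same_proba = np.allclose(proba[:, 1], sigmoid)

print(rows_sum_to_one, same_labels, same_proba)
```

Working with the probabilities instead of the labels makes it possible to pick a threshold other than 0.5, for example to trade precision against recall for the churn class.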


In [36]:
print("Classification report of the model:\n{}".format(sklearn.metrics.classification_report(test_y, y_hat)))

Classification report of the model:
              precision    recall  f1-score   support

           0       0.94      0.78      0.85        40
           1       0.47      0.80      0.59        10

    accuracy                           0.78        50
   macro avg       0.70      0.79      0.72        50
weighted avg       0.85      0.78      0.80        50
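
The figures in the report can be traced back to four confusion-matrix counts. The notebook does not print the matrix, so the counts below are inferred: 31 true negatives, 9 false positives, 2 false negatives, and 8 true positives match the rounded precision, recall, and support values shown. The sketch rebuilds label vectors with those counts and recomputes the metrics both by hand and with scikit-learn.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Counts inferred from the report (an assumption, not printed by the notebook).
tn, fp, fn, tp = 31, 9, 2, 8

# Rebuild label vectors that produce exactly these counts.
y_true = np.array([0] * (tn + fp) + [1] * (fn + tp))
y_pred = np.array([0] * tn + [1] * fp + [0] * fn + [1] * tp)

# By hand, for the churn class (label 1).
precision_1 = tp / (tp + fp)          # 8 / 17
recall_1 = tp / (tp + fn)             # 8 / 10
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)
accuracy = (tp + tn) / len(y_true)    # 39 / 50

# The same quantities from scikit-learn, one entry per class.
precision, recall, f1, support = precision_recall_fscore_support(y_true, y_pred)

print(confusion_matrix(y_true, y_pred))
print(round(precision_1, 2), round(recall_1, 2), round(f1_1, 2), round(accuracy, 2))
print(precision.round(2), recall.round(2), f1.round(2), support)
```

Read this way, a precision of 0.47 for class 1 means that roughly half of the customers flagged as churners did not actually churn, even though 8 of the 10 real churners were found.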



## 5.CONCLUSION

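    *The overall accuracy on the 50-customer test set is 78%.
    *The weighted-average F1 score is 0.80; for the churn class (1), recall is 0.80 but precision is only 0.47.
    *The model could likely be improved by trying different solvers and regularization settings (such as C).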
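
One way to explore different solvers and regularization strengths is a cross-validated grid search. The sketch below is a minimal example on synthetic data, not a tuned result for the churn dataset: the candidate values in `param_grid`, the F1 scoring choice, and the five folds are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in for the churn data (roughly 75% / 25%).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=7, n_informative=4,
    weights=[0.75, 0.25], random_state=0,
)

# Candidate regularization strengths and solvers (illustrative values).
param_grid = {
    "C": [0.01, 0.1, 1.0, 10.0],
    "solver": ["liblinear", "lbfgs"],
}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="f1",   # F1 of the positive (churn) class
    cv=5,
)
search.fit(X_demo, y_demo)

# Best combination found and its mean cross-validated F1 score.
print(search.best_params_)
print(round(search.best_score_, 3))
```

Scoring on the F1 of the churn class, instead of accuracy, keeps the search focused on the minority class that the report shows to be the weak spot.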