# Discovering customer attrition patterns

We analyze customer attrition data to discover patterns. These patterns give us starting points for root cause analysis of why customers leave. We use association rule mining for this purpose.

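## How to use pip from the Jupyter Notebook
If you're using the Jupyter notebook and want to install a package with pip, you might be inclined to run pip directly in the shell. That can install the package into a different Python environment than the one backing the notebook kernel. Invoking pip through sys.executable instead targets the interpreter of the current kernel.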

In [22]:
# Install a pip package in the current Jupyter kernel
import sys
!{sys.executable} -m pip install apyori

Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5974 sha256=dba1e55b3a944199fc7f87ad42418753518e44abdd831e6d29e8c7db00a3318b
  Stored in directory: /Users/kevilkhadka/Library/Caches/pip/wheels/cb/f6/e1/57973c631d27efd1a2f375bd6a83b2a616c4021f24aab84080
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2
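
Outside a notebook, the same idea can be sketched with the standard library: build the pip command from sys.executable so that it targets the interpreter that is currently running. The helper name `pip_install_command` is hypothetical, and the sketch only builds the command; actually running it is left commented out.

```python
import sys


def pip_install_command(package):
    """Return a pip command bound to the interpreter running this code."""
    # sys.executable is the absolute path of the current Python interpreter
    return [sys.executable, "-m", "pip", "install", package]


command = pip_install_command("apyori")
print(command)

# Uncomment to actually run the install (requires network access):
# import subprocess
# subprocess.check_call(command)
```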


## Load the Dataset and Transform
We first load the data ("attrition.csv") and view it.

In [23]:
from pandas import Series, DataFrame
import pandas as pd
import numpy as np
import os
import matplotlib.pylab as plt
from apyori import apriori

#Load the attrition dataset
raw_data = pd.read_csv("attrition.csv")
raw_data.head()

Unnamed: 0,LIFETIME,TYPE,REASON,AGE_GROUP,EMP_STATUS,MARITAL_STATUS,RENEWALS,PROBLEMS,OFFERS
0,1 - 3 M,CANCEL,BETTER DEALS,< 20,STUDENT,SINGLE,0,0 to 5,0 to 2
1,1 - 3 M,CANCEL,BETTER DEALS,< 20,STUDENT,SINGLE,0,0 to 5,0 to 2
2,1Y - 2Y,CANCEL,NOT HAPPY,30 - 50,EMPLOYED,MARRIED,1,10 plus,0 to 2
3,1Y - 2Y,EXPIRY,BETTER DEALS,30 - 50,EMPLOYED,MARRIED,1,0 to 5,2 to 5
4,1Y - 2Y,CANCEL,NOT HAPPY,30 - 50,UNEMPLOYED,SINGLE,1,10 plus,0 to 2


The CSV contains one row for each customer who has left the business. Its attributes include the customer's LIFETIME, how the customer left (TYPE), the REASON, the number of PROBLEMS, and demographics such as AGE_GROUP, EMP_STATUS and MARITAL_STATUS.

For association rule mining, the data needs to be in a specific format: each line is a transaction holding a list of items. We take each row of the CSV and convert every column into an item of the form "name=value" to create this structure.
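
As a minimal sketch of this conversion, the snippet below turns two records into lists of "name=value" items using plain dictionaries. The helper name `row_to_items` is hypothetical, and the records are a subset of the columns from rows 0 and 3 of the preview above.

```python
def row_to_items(row):
    """Turn one record (a dict of column -> value) into "name=value" items."""
    return [f"{name}={value}" for name, value in row.items()]


# Two records copied from the preview above (a subset of the columns)
records = [
    {"LIFETIME": "1 - 3 M", "TYPE": "CANCEL", "AGE_GROUP": "< 20"},
    {"LIFETIME": "1Y - 2Y", "TYPE": "EXPIRY", "AGE_GROUP": "30 - 50"},
]

# Each transaction is the list of items for one customer
transactions = [row_to_items(record) for record in records]
for transaction in transactions:
    print(transaction)
```

Prefixing each value with its column name keeps identical values from different columns apart, for example "0 to 5" under PROBLEMS versus a similar range under OFFERS.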

In [26]:
# Build one transaction per row: the row id followed by quoted "name=value" items
basket_lines = []

for rowNum, row in raw_data.iterrows():

    #Add the rowid as the first column
    items = [str(rowNum)]

    #Add columns as quoted "name=value" items
    #(Series.items() replaces iteritems(), which was removed in pandas 2.0)
    for colName, col in row.items():
        items.append("\"" + colName + "=" + str(col) + "\"")

    basket_lines.append(",".join(items))

#Write the transactions to disk, one per line
with open("warranty_basket.csv", "w") as basket_file:
    basket_file.write("\n".join(basket_lines))

## Build Association Rules
We now use the apriori algorithm to build association rules. We then extract the results and populate a data frame for future use.

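For each frequent itemset, apriori returns every split of its items into a left-hand side (LHS) and a right-hand side (RHS). In this example we capture the number of items in the LHS (COUNT) along with the confidence and lift of each rule.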
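
Confidence and lift follow from simple counts. The sketch below computes them by hand on a few made-up transactions. The function names `support` and `rule_metrics` are hypothetical, the data is invented for illustration, and apyori's own implementation may differ in detail.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for transaction in transactions if items <= set(transaction))
    return hits / len(transactions)


def rule_metrics(transactions, lhs, rhs):
    """Return (confidence, lift) for the rule lhs -> rhs."""
    support_both = support(transactions, set(lhs) | set(rhs))
    # Confidence: how often the RHS appears when the LHS is present
    confidence = support_both / support(transactions, lhs)
    # Lift: confidence relative to how common the RHS is overall
    lift = confidence / support(transactions, rhs)
    return confidence, lift


# Toy transactions, invented for illustration only
transactions = [
    ["AGE_GROUP=20 - 30", "TYPE=CANCEL"],
    ["AGE_GROUP=20 - 30", "TYPE=CANCEL"],
    ["AGE_GROUP=30 - 50", "TYPE=CANCEL"],
    ["AGE_GROUP=30 - 50", "TYPE=EXPIRY"],
]

confidence, lift = rule_metrics(
    transactions, lhs=["AGE_GROUP=20 - 30"], rhs=["TYPE=CANCEL"]
)
print(confidence, lift)
```

A confidence of 1.0 means every transaction containing the LHS also contains the RHS. A lift above 1 means the RHS is more frequent among those transactions than in the data overall.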

In [27]:
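# Assumption: basket_data is the basket file written above, read back without a header row
basket_data = pd.read_csv("warranty_basket.csv", header=None)

# Drop the row id column and run apriori on the remaining items
# (to_numpy() replaces as_matrix(), which was removed in pandas 1.0)
filt_data = basket_data.drop(basket_data.columns[[0]], axis=1)
results = list(apriori(filt_data.to_numpy()))

rulesList = pd.DataFrame(columns = ('LHS', 'RHS', 'COUNT', 'CONFIDENCE','LIFT'))
rowCount = 0

#Convert results into a Data Frame
for row in results:
    # row[2] holds the ordered statistics, one entry per LHS/RHS split
    for affinity in row[2]:
        rulesList.loc[rowCount] = [ ', '.join(affinity.items_base) ,\
                                    affinity.items_add, \
                                    len(affinity.items_base) ,\
                                    affinity.confidence,\
                                    affinity.lift]
        rowCount += 1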

## Using the Rules
We can inspect the first few rules by calling head on the data frame.

In [28]:
rulesList.head()

We can also filter for rules whose LHS has at most one item and whose confidence is above 70%.

In [29]:
rulesList[(rulesList.COUNT <= 1) & (rulesList.CONFIDENCE > 0.7)].head(5)

Looking at the rules, we can easily see some patterns. Customers who have left the business between 3 months and 1 year are always in the age group 20-30. Similarly, customers in age group 20-30 always cancelled the service. These are interesting facts that can be analyzed further by the business.