# Discovering customer attrition patterns

We analyze customer attrition data to discover patterns. These patterns give us a starting point for digging deeper and for root cause analysis of why customers leave. We use association rule mining, specifically the Apriori algorithm, for this purpose.

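## How to use pip from a Jupyter notebook
If you are working in a Jupyter notebook and want to install a package with pip, you might be inclined to run pip directly in the shell. That can install the package into a different Python environment than the one the notebook kernel is using. Invoking pip through sys.executable, the interpreter that runs the current kernel, avoids this problem: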

In [40]:
# Install a pip package in the current Jupyter kernel
import sys
!{sys.executable} -m pip install apyori
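
The same interpreter-bound install can also be run from plain Python with the standard subprocess module. This is a small sketch; pip_install_command and pip_install are hypothetical helper names, not part of pip or Jupyter.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this process (hypothetical helper)."""
    # sys.executable is the Python binary of the current process, so
    # "-m pip" installs into this environment rather than into whichever
    # "pip" happens to come first on PATH.
    return [sys.executable, "-m", "pip", "install", package]


def pip_install(package):
    """Run the install; raises subprocess.CalledProcessError if pip fails."""
    subprocess.run(pip_install_command(package), check=True)


# Usage (performs a real install, so it is left commented out):
# pip_install("apyori")
```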



In [47]:
!{sys.executable} -m pip install -U pandas-profiling


## Load the Dataset and Transform
We first load the data ("attrition.csv") and view it.

In [48]:
import pandas as pd
from apyori import apriori

# Load the attrition dataset
raw_data = pd.read_csv("attrition.csv")
raw_data.head()

| Unnamed: 0 | LIFETIME | TYPE | REASON | AGE_GROUP | EMP_STATUS | MARITAL_STATUS | RENEWALS | PROBLEMS | OFFERS |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 - 3 M | CANCEL | BETTER DEALS | < 20 | STUDENT | SINGLE | 0 | 0 to 5 | 0 to 2 |
| 1 | 1 - 3 M | CANCEL | BETTER DEALS | < 20 | STUDENT | SINGLE | 0 | 0 to 5 | 0 to 2 |
| 2 | 1Y - 2Y | CANCEL | NOT HAPPY | 30 - 50 | EMPLOYED | MARRIED | 1 | 10 plus | 0 to 2 |
| 3 | 1Y - 2Y | EXPIRY | BETTER DEALS | 30 - 50 | EMPLOYED | MARRIED | 1 | 0 to 5 | 2 to 5 |
| 4 | 1Y - 2Y | CANCEL | NOT HAPPY | 30 - 50 | UNEMPLOYED | SINGLE | 1 | 10 plus | 0 to 2 |


The CSV contains information about each customer who has left the business. Its attributes include the customer's LIFETIME, how the customer left (TYPE), the REASON, the RENEWALS, PROBLEMS and OFFERS columns, and demographics (AGE_GROUP, EMP_STATUS, MARITAL_STATUS).

Association rule mining needs the data in a transaction format: each line is one transaction containing a list of items. We convert every row of the CSV into such a transaction by turning each column into an item of the form "name=value", and write the result to a basket file.
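A stdlib-only sketch of the same transformation, using two rows copied from the head() output above. It uses the csv module rather than manual string concatenation so that quoting is handled by the library; rows and to_transaction are illustrative names, not part of the notebook.

```python
import csv
import io

# Two sample rows copied from the raw_data.head() output above
rows = [
    {"LIFETIME": "1 - 3 M", "TYPE": "CANCEL", "REASON": "BETTER DEALS",
     "AGE_GROUP": "< 20", "EMP_STATUS": "STUDENT", "MARITAL_STATUS": "SINGLE",
     "RENEWALS": "0", "PROBLEMS": "0 to 5", "OFFERS": "0 to 2"},
    {"LIFETIME": "1Y - 2Y", "TYPE": "CANCEL", "REASON": "NOT HAPPY",
     "AGE_GROUP": "30 - 50", "EMP_STATUS": "EMPLOYED", "MARITAL_STATUS": "MARRIED",
     "RENEWALS": "1", "PROBLEMS": "10 plus", "OFFERS": "0 to 2"},
]


def to_transaction(row):
    """Turn one record into a list of "name=value" items."""
    return [f"{name}={value}" for name, value in row.items()]


transactions = [to_transaction(r) for r in rows]

# Write one transaction per line, prefixed by a row id; csv handles any quoting
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
for row_id, items in enumerate(transactions):
    writer.writerow([row_id] + items)

# Read the basket back and drop the row id column
buffer.seek(0)
read_back = [line[1:] for line in csv.reader(buffer)]
print(read_back[0])
```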

In [53]:
basket_str = ""

for rowNum, row in raw_data.iterrows():

    # Start a new line for every row after the first
    if rowNum != 0:
        basket_str = basket_str + "\n"

    # Add the row id as the first column
    basket_str = basket_str + str(rowNum)

    # Add each column as a quoted "name=value" item
    # (Series.iteritems() was removed in pandas 2.0; items() works in all versions)
    for colName, col in row.items():
        basket_str = basket_str + ",\"" + colName + "=" + str(col) + "\""

# Write the transactions to the basket file
with open("warranty_basket.csv", "w") as basket_file:
    basket_file.write(basket_str)

## Build Association Rules
We now use the apriori algorithm to build association rules. We then extract the results and populate a data frame for future use.
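To make the idea behind Apriori concrete, here is a minimal pure-Python sketch of its level-wise frequent-itemset search: count single items, then repeatedly join frequent itemsets into larger candidates, prune any candidate with an infrequent subset, and count the survivors. This is not apyori's implementation; apriori_itemsets is a hypothetical name, and the toy transactions are the TYPE and REASON columns of the five sample rows above.

```python
from itertools import combinations


def apriori_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    # Level 1: count every single item in one pass
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c / n for s, c in counts.items() if c / n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step: keep the candidate only if all of its
                # (k-1)-subsets are frequent (the Apriori property)
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count step: one pass over the data per level
        current = {}
        for cand in candidates:
            c = sum(1 for basket in baskets if cand <= basket)
            if c / n >= min_support:
                current[cand] = c / n
        frequent.update(current)
        k += 1
    return frequent


transactions = [
    ["TYPE=CANCEL", "REASON=BETTER DEALS"],
    ["TYPE=CANCEL", "REASON=BETTER DEALS"],
    ["TYPE=CANCEL", "REASON=NOT HAPPY"],
    ["TYPE=EXPIRY", "REASON=BETTER DEALS"],
    ["TYPE=CANCEL", "REASON=NOT HAPPY"],
]
itemsets = apriori_itemsets(transactions, min_support=0.4)
for itemset, sup in itemsets.items():
    print(sorted(itemset), sup)
```

The pruning step is what keeps Apriori tractable: a candidate is only counted against the data if every smaller subset already passed the support threshold.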

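For each frequent itemset, apyori reports one or more ordered statistics, each of which splits the itemset into a left-hand side (items_base) and a right-hand side (items_add). For every such rule we record the LHS, the RHS, the number of items on the LHS (COUNT), the confidence and the lift.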
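A sketch of how a frequent itemset yields rules, using the textbook definitions: confidence(LHS -> RHS) = support(LHS and RHS) / support(LHS), and lift = confidence / support(RHS). It enumerates every non-empty LHS/RHS split of one itemset. The helper names support and rules_from_itemset are hypothetical, this is not apyori's code, and the transactions are three columns of the five sample rows above.

```python
from itertools import combinations

# The five sample rows from raw_data.head(), reduced to three columns
transactions = [
    {"TYPE=CANCEL", "REASON=BETTER DEALS", "PROBLEMS=0 to 5"},
    {"TYPE=CANCEL", "REASON=BETTER DEALS", "PROBLEMS=0 to 5"},
    {"TYPE=CANCEL", "REASON=NOT HAPPY", "PROBLEMS=10 plus"},
    {"TYPE=EXPIRY", "REASON=BETTER DEALS", "PROBLEMS=0 to 5"},
    {"TYPE=CANCEL", "REASON=NOT HAPPY", "PROBLEMS=10 plus"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rules_from_itemset(itemset, transactions):
    """Yield (lhs, rhs, confidence, lift) for every non-empty LHS/RHS split."""
    itemset = frozenset(itemset)
    whole = support(itemset, transactions)
    for size in range(1, len(itemset)):
        for combo in combinations(sorted(itemset), size):
            lhs = frozenset(combo)
            rhs = itemset - lhs
            confidence = whole / support(lhs, transactions)
            lift = confidence / support(rhs, transactions)
            yield lhs, rhs, confidence, lift


rules = list(rules_from_itemset({"REASON=NOT HAPPY", "PROBLEMS=10 plus"}, transactions))
for lhs, rhs, conf, lift in rules:
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2), round(lift, 2))
```

On these five rows both directions of the NOT HAPPY / 10 plus rule have confidence 1.0 and lift 2.5, meaning the pair co-occurs 2.5 times as often as independence would predict. A lift below 1, such as REASON=BETTER DEALS -> TYPE=CANCEL here (about 0.83), means the items appear together less often than expected.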

In [57]:
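# Read the basket file back and drop the row id column
basket_data = pd.read_csv("warranty_basket.csv", header=None)
filt_data = basket_data.drop(basket_data.columns[[0]], axis=1)

# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it.
# Each row of the list is one transaction of "name=value" items.
results = list(apriori(filt_data.to_numpy().tolist()))

# Convert the results into a DataFrame, one row per rule
rules = []
for row in results:
    # ordered_statistics holds the LHS -> RHS splits for this itemset
    for affinity in row.ordered_statistics:
        rules.append({
            'LHS': ', '.join(affinity.items_base),
            'RHS': ', '.join(affinity.items_add),
            'COUNT': len(affinity.items_base),  # number of items on the LHS
            'CONFIDENCE': affinity.confidence,
            'LIFT': affinity.lift,
        })

rulesList = pd.DataFrame(rules, columns=['LHS', 'RHS', 'COUNT', 'CONFIDENCE', 'LIFT'])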


## Using the Rules
We can inspect the first few rules with head().

In [58]:
rulesList.head()


We can also filter for rules that have exactly one item on the left-hand side (COUNT equal to 1) and a confidence above 70%:

In [59]:
rulesList[(rulesList.COUNT == 1) & (rulesList.CONFIDENCE > 0.7)].head(5)


Looking at the rules, some patterns stand out. Customers who left the business between 3 months and 1 year are always in the 20-30 age group. Similarly, customers in the 20-30 age group always cancelled the service. These are interesting facts that the business can analyze further.