## Implementing Apriori Algorithm with Python

In this section we will use the Apriori algorithm to find rules that describe associations between attributes of census data with 30162 rowsand 12 attributes. 

Another interesting point is that we do not need to write the script to calculate support, confidence, and lift for all the possible combination of items. We will use an off-the-shelf library where all of the code has already been implemented.

The library apyori. Use the following command in your environment: pip install apyori

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as mlt
from apyori import apriori

In [2]:
census_data = pd.read_csv("D:\Data Science\Internship\Assignment\Problem\Problem1\census.csv", header = None)
num_records = len(census_data)
print(num_records)

30162


We can transform it into the right format i.e. a DF

In [4]:
records = []
for i in range(0, num_records):
    records.append([str(census_data.values[i,j]) for j in range(0, 12)])

In [5]:
records[0]

['age=Middle-aged',
 'sex=Male',
 'education=Bachelors',
 'native-country=United-States',
 'race=White',
 'marital-status=Never-married',
 'workclass=State-gov',
 'occupation=Adm-clerical',
 'hours-per-week=Full-time',
 'income=Small',
 'capital-gain=Low',
 'capital-loss=None']

In [12]:
association_rules = apriori(records, min_support = 0.3, min_confidence = 0.2, min_lift = 1, min_length = 2, max_length = 2)
association_results = list(association_rules)

In [13]:
print(len(association_results))

37


In [14]:
print(association_results[0])

RelationRecord(items=frozenset({'age=Middle-aged'}), support=0.5223459982759764, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'age=Middle-aged'}), confidence=0.5223459982759764, lift=1.0)])


In [10]:
results = []

for item in association_results:
    pair = item[0]
    items = [x for x in pair]
    
    value0 = str(items[0])
    value1 = str(items[1])
    
    value2 = str(item[1]) #to convert into object
    
    value3 = str(item[2][0][2])
    value4 = str(item[2][0][3])
    
    rows = (value0, value1, value2, value3, value4)
    results.append(rows)
    
labels = ['Title 1', 'Title 2', 'Support', 'Confidence', 'Lift']

census = pd.DataFrame.from_records(results, columns = labels)

IndexError: list index out of range

In [26]:
census

Unnamed: 0,Title 1,Title 2,Support,Confidence,Lift
0,hours-per-week=Full-time,age=Middle-aged,0.31005901465420066,0.5935893367185021,1.0112879334672087
1,sex=Male,age=Middle-aged,0.3638684437371527,0.6966042526182165,1.0309606215638196
2,workclass=Private,age=Middle-aged,0.3887673231218089,0.7442716597905427,1.007301525738237
3,education=HS-grad,capital-gain=None,0.30498640673695376,0.33300752968433245,1.0207492998311825
4,hours-per-week=Full-time,capital-gain=None,0.5445925336516146,0.5946278598320301,1.0130572474160469
5,income=Small,capital-gain=None,0.7198130097473643,0.7859470026064292,1.0464259509408986
6,marital-status=Never-married,capital-gain=None,0.30833499104833895,0.3366637706342311,1.0440522979508202
7,sex=Female,capital-gain=None,0.3057158013394337,0.33380393860411234,1.0292572476157469
8,workclass=Private,capital-gain=None,0.6824149592202109,0.745112945264987,1.0084401263161866
9,education=HS-grad,capital-loss=None,0.3136396790663749,0.3292152427353402,1.0091250153844848


<b>Sorting dataframe by Confidence

In [10]:
census.sort_values('Confidence', ascending = False, inplace = True)

In [11]:
census

Unnamed: 0,Title 1,Title 2,Support,Confidence,Lift
15,education=HS-grad,native-country=United-States,0.305317949738081,0.9358739837398374,1.0263173028490755
20,race=White,marital-status=Married-civ-spouse,0.4196008222266428,0.8998222538215428,1.046559935979847
21,sex=Male,marital-status=Married-civ-spouse,0.4172468669186393,0.8947742623533593,1.3242483464721306
22,race=White,native-country=United-States,0.8029308401299649,0.8805264688772542,1.024117508744678
16,income=Small,hours-per-week=Full-time,0.4690007293946024,0.7990284681427926,1.0638428823220143
5,income=Small,capital-gain=None,0.7198130097473643,0.7859470026064292,1.0464259509408986
19,workclass=Private,income=Small,0.5772163649625356,0.7685177010682439,1.0401162568258266
11,income=Small,capital-loss=None,0.7282010476758836,0.7643640160083522,1.0176899201396628
17,workclass=Private,hours-per-week=Full-time,0.4471520456203169,0.7618052417532761,1.0310315759563096
8,workclass=Private,capital-gain=None,0.6824149592202109,0.745112945264987,1.0084401263161866


<b>Resetting index

In [12]:
census.reset_index(drop = True, inplace = True)
census

Unnamed: 0,Title 1,Title 2,Support,Confidence,Lift
0,education=HS-grad,native-country=United-States,0.305317949738081,0.9358739837398374,1.0263173028490755
1,race=White,marital-status=Married-civ-spouse,0.4196008222266428,0.8998222538215428,1.046559935979847
2,sex=Male,marital-status=Married-civ-spouse,0.4172468669186393,0.8947742623533593,1.3242483464721306
3,race=White,native-country=United-States,0.8029308401299649,0.8805264688772542,1.024117508744678
4,income=Small,hours-per-week=Full-time,0.4690007293946024,0.7990284681427926,1.0638428823220143
5,income=Small,capital-gain=None,0.7198130097473643,0.7859470026064292,1.0464259509408986
6,workclass=Private,income=Small,0.5772163649625356,0.7685177010682439,1.0401162568258266
7,income=Small,capital-loss=None,0.7282010476758836,0.7643640160083522,1.0176899201396628
8,workclass=Private,hours-per-week=Full-time,0.4471520456203169,0.7618052417532761,1.0310315759563096
9,workclass=Private,capital-gain=None,0.6824149592202109,0.745112945264987,1.0084401263161866


In [18]:
def arrangingRules(rules):
    for i in rules.index:
        if (rules.Confidence[i] >= '0.7$$$$'):
            print(rules)

In [19]:
arrangingRules(census)

                         Title 1                            Title 2  \
0              education=HS-grad       native-country=United-States   
1                     race=White  marital-status=Married-civ-spouse   
2                       sex=Male  marital-status=Married-civ-spouse   
3                     race=White       native-country=United-States   
4                   income=Small           hours-per-week=Full-time   
5                   income=Small                  capital-gain=None   
6              workclass=Private                       income=Small   
7                   income=Small                  capital-loss=None   
8              workclass=Private           hours-per-week=Full-time   
9              workclass=Private                  capital-gain=None   
10             workclass=Private                    age=Middle-aged   
11             workclass=Private                  capital-loss=None   
12                      sex=Male                    age=Middle-aged   
13    

In [17]:
census['Confidence']

0      0.9358739837398374
1      0.8998222538215428
2      0.8947742623533593
3      0.8805264688772542
4      0.7990284681427926
5      0.7859470026064292
6      0.7685177010682439
7      0.7643640160083522
8      0.7618052417532761
9       0.745112945264987
10     0.7442716597905427
11      0.742161127544806
12     0.6966042526182165
13     0.5946278598320301
14     0.5935893367185021
15     0.5905689925178355
16     0.4085812660015891
17     0.3366637706342311
18    0.33380393860411234
19    0.33300752968433245
20     0.3292152427353402
21     0.3286932312510875
22     0.3285540281886202
Name: Confidence, dtype: object