# Import the Libraries

First, install the apyori package:

pip install apyori

Once the installation is done, we need to preprocess the bank data set, starting by loading it with pandas.


In [1]:
!pip install apyori



In [0]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame
from apyori import apriori

# Importing the Dataset

In [0]:
df = pd.read_csv('lab2.csv')

In [4]:
df.head(10)

,ACCOUNT,SERVICE,VISIT
0,500026,CKING,1
1,500026,SVG,2
2,500026,ATM,3
3,500026,ATM,4
4,500075,CKING,1
5,500075,MMDA,2
6,500075,SVG,3
7,500075,ATM,4
8,500075,TRUST,5
9,500075,TRUST,6


The lab2 data set has over 32,000 rows. Each row of the data set represents a customer-service combination. Therefore, a single customer can have multiple rows in the data set, and each row represents one of the products he or she owns. The median number of products per customer is three. 
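As a quick way to check the median figure, the per-customer product count can be computed with a groupby. The sketch below uses a small made-up frame with the same ACCOUNT and SERVICE column names (the values are illustrative, not taken from lab2.csv), and it assumes that "products" means distinct services, so a service listed twice for one account counts once.

```python
import pandas as pd

# Toy data with the same column names as lab2.csv (values are made up).
toy = pd.DataFrame({
    "ACCOUNT": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "SERVICE": ["CKING", "SVG", "ATM", "CKING", "CKING",
                "CKING", "SVG", "ATM", "CD"],
})

# Number of distinct services per account, then the median across accounts.
products_per_account = toy.groupby("ACCOUNT")["SERVICE"].nunique()
median_products = products_per_account.median()

print(products_per_account.to_dict())
print(median_products)
```

Applying the same two lines to df instead of toy lets you check the figure on the real data, under the same assumption about what counts as a product.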

In [0]:
# Number of distinct accounts and distinct services
user = len(df['ACCOUNT'].unique())
item = len(df['SERVICE'].unique())

In [6]:
print('Number of total records=' + str(len(df)))
print('Number of users=' + str(user) + ' | Number of products=' + str(item))

Number of total records=32367
Number of users=7991 | Number of products=13


# Data Preprocessing

The apyori library we are going to use requires our data set to be in the form of a list of lists, with one inner list per transaction.

As we are looking to generate association rules from the services purchased by each account holder, we need to group the rows by account and then generate a list of all services purchased by that account.

In [7]:
transactions = df.groupby(['ACCOUNT'])['SERVICE'].apply(list)
mylist = transactions.values.tolist()

#first 3
print(mylist[:3])

[['CKING', 'SVG', 'ATM', 'ATM'], ['CKING', 'MMDA', 'SVG', 'ATM', 'TRUST', 'TRUST'], ['CKING', 'SVG', 'IRA', 'ATM', 'ATM']]


In [8]:
# Count how often each service appears per account (one row per account, one column per service)
records = (
    df.groupby(['ACCOUNT', 'SERVICE'])
      .size()
      .unstack()
      .fillna(0)
)
records.iloc[0:10,:]

ACCOUNT,ATM,AUTO,CCRD,CD,CKCRD,CKING,HMEQLC,IRA,MMDA,MTG,PLOAN,SVG,TRUST
500026,2.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
500075,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,2.0
500129,2.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0
500256,0.0,0.0,0.0,0.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
500341,0.0,0.0,0.0,0.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
500350,0.0,0.0,0.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
500458,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
500595,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0
500743,0.0,0.0,1.0,0.0,2.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0
500744,0.0,0.0,0.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


# Applying Apriori

Now that the transactions table contains all services purchased by each account number, we are ready to build our association rules. The apriori function from apyori accepts a number of arguments, mainly:

* transactions: a list of lists of items, one inner list per transaction (e.g. [['A', 'B'], ['B', 'C']]).
* min_support: minimum support of the relations, as a fraction between 0 and 1. Default 0.1.
* min_confidence: minimum confidence of the relations, as a fraction between 0 and 1. Default 0.0.
* min_lift: minimum lift of the relations, as a ratio rather than a percentage. Default 0.0.
* max_length: maximum number of items in a relation. Default None.
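To make the roles of min_support and max_length concrete, here is a brute-force sketch on a toy list of transactions. It is not apyori's implementation (there is no Apriori pruning), and the function name frequent_itemsets is made up for this illustration; it only shows which itemsets survive the two thresholds.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support=0.1, max_length=None):
    """Return {itemset: support} for every itemset meeting min_support (brute force)."""
    baskets = [set(t) for t in transactions]   # repeated items in a basket collapse
    n = len(baskets)
    items = sorted(set().union(*baskets))
    longest = max_length or len(items)
    found = {}
    for size in range(1, longest + 1):
        for candidate in combinations(items, size):
            # Support = share of baskets that contain every item of the candidate.
            support = sum(set(candidate) <= b for b in baskets) / n
            if support >= min_support:
                found[candidate] = support
    return found

toy = [["A", "B"], ["B", "C"], ["A", "B", "C"], ["B"]]
result = frequent_itemsets(toy, min_support=0.5, max_length=2)
print(result)
```

With min_support=0.5, the pair (A, C) is dropped because it appears in only one of the four baskets, and max_length=2 means the triple (A, B, C) is never considered.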


We will run our apyori model with our transactions and **min_support of 0.05.**

In [0]:
association_rules = apriori(mylist,min_support=0.05)

In [0]:
association_results = list(association_rules)

# Viewing the Results

In [0]:
results = pd.DataFrame(association_results)

In [12]:
results.head()

,items,support,ordered_statistics
0,(ATM),0.384558,"[((), (ATM), 0.3845576273307471, 1.0)]"
1,(AUTO),0.092854,"[((), (AUTO), 0.09285446126892755, 1.0)]"
2,(CCRD),0.154799,"[((), (CCRD), 0.154799149042673, 1.0)]"
3,(CD),0.245276,"[((), (CD), 0.24527593542735576, 1.0)]"
4,(CKCRD),0.113002,"[((), (CKCRD), 0.11300212739331748, 1.0)]"


The table contains the support, confidence and lift statistics for each of the rules.

Consider the rule A => B. Recall the following:

* Support of A => B is the probability that a customer has both A and B.
* Confidence of A => B is the probability that a customer has B given that the customer has A.
* Expected confidence (not shown here) of A => B is the probability that a customer has B.
* Lift of A => B is a measure of the strength of the association. It is the confidence divided by the expected confidence. If lift = 2 for the rule A => B, then a customer having A is twice as likely to have B as a customer chosen at random.
* In a typical setting, you would like to view the rules by lift. Sort the rules using code.
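The four definitions can be worked through by hand on a small example. The sketch below computes them for a single rule on five made-up baskets; the helper name rule_metrics and the toy data are assumptions for illustration only.

```python
def rule_metrics(transactions, a, b):
    """Support, confidence, expected confidence and lift for the rule a => b."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    has_a = sum(a in t for t in baskets)
    has_b = sum(b in t for t in baskets)
    has_both = sum(a in t and b in t for t in baskets)

    support = has_both / n               # P(A and B)
    confidence = has_both / has_a        # P(B | A)
    expected_confidence = has_b / n      # P(B)
    lift = confidence / expected_confidence
    return support, confidence, expected_confidence, lift

toy = [
    ["CKING", "SVG"],
    ["CKING", "SVG", "ATM"],
    ["CKING"],
    ["SVG"],
    ["ATM"],
]
support, confidence, expected, lift = rule_metrics(toy, "CKING", "SVG")
print(support, confidence, expected, lift)
```

Here CKING and SVG each appear in three of five baskets and together in two, so the support is 0.4, the confidence is 2/3, the expected confidence is 0.6, and the lift is about 1.11: a mild positive association.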


In [20]:
# Each entry of ordered_statistics is a list of statistics for one itemset.
# As in the original cell, only the first statistic of each list is used.
def first_confidence(stats):
    return stats[0].confidence

def first_lift(stats):
    return stats[0].lift

# Build the output table directly; no loop over the DataFrame is needed.
output = pd.DataFrame({
    'Items': results['items'],
    'Support': results['support'],
    'Confidence': results['ordered_statistics'].apply(first_confidence),
    'Lift': results['ordered_statistics'].apply(first_lift),
})

output

,Items,Support,Confidence,Lift
0,(ATM),0.384558,0.384558,1.0
1,(AUTO),0.092854,0.092854,1.0
2,(CCRD),0.154799,0.154799,1.0
3,(CD),0.245276,0.245276,1.0
4,(CKCRD),0.113002,0.113002,1.0
5,(CKING),0.85784,0.85784,1.0
6,(HMEQLC),0.164685,0.164685,1.0
7,(IRA),0.108372,0.108372,1.0
8,(MMDA),0.174446,0.174446,1.0
9,(MTG),0.074334,0.074334,1.0
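The table above uses only the first entry of each ordered_statistics list. For the single-item rows shown, that entry has an empty antecedent, which is why confidence equals support and lift is 1.0. To view actual A => B rules by lift, one option is to flatten every entry into its own row and sort. The sketch below does this on stand-in records rather than real apyori output: the namedtuples are defined locally, the numbers are made up, and the field names items_base and items_add are my recollection of apyori's OrderedStatistic, so check them against your installed version.

```python
from collections import namedtuple

# Local stand-ins for apyori's result types (field names are an assumption).
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"])

def flatten_rules(records):
    """One row per rule with a non-empty antecedent, sorted by lift (descending)."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            if not stat.items_base:      # skip "() => items" entries
                continue
            rows.append({
                "antecedent": tuple(sorted(stat.items_base)),
                "consequent": tuple(sorted(stat.items_add)),
                "support": record.support,
                "confidence": stat.confidence,
                "lift": stat.lift,
            })
    return sorted(rows, key=lambda r: r["lift"], reverse=True)

# Made-up records in the same shape as association_results.
toy_records = [
    RelationRecord(frozenset({"ATM"}), 0.4, [
        OrderedStatistic(frozenset(), frozenset({"ATM"}), 0.4, 1.0)]),
    RelationRecord(frozenset({"ATM", "CKING"}), 0.3, [
        OrderedStatistic(frozenset(), frozenset({"ATM", "CKING"}), 0.3, 1.0),
        OrderedStatistic(frozenset({"ATM"}), frozenset({"CKING"}), 0.75, 1.25),
        OrderedStatistic(frozenset({"CKING"}), frozenset({"ATM"}), 0.5, 1.25)]),
    RelationRecord(frozenset({"CD", "SVG"}), 0.15, [
        OrderedStatistic(frozenset({"CD"}), frozenset({"SVG"}), 0.75, 1.5)]),
]

rules = flatten_rules(toy_records)
for r in rules:
    print(r["antecedent"], "=>", r["consequent"], r["lift"])
```

If the field names match, passing association_results to flatten_rules and wrapping the result in pd.DataFrame would give a rule table that can be written to CSV in the same way as output.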


In [0]:
output.to_csv("output.csv")
