# Association Rule Mining

Association rule mining is a technique for discovering frequently occurring patterns, correlations, or associations in datasets stored in relational databases, transactional databases, and other kinds of repositories.

An association rule has two parts:

- an antecedent (if) and
- a consequent (then)

An antecedent is an item found in the data, and a consequent is an item found in combination with the antecedent. Consider this rule, for instance:

- “If a customer buys bread, they are 70% likely to buy milk.”

Association rules are created by analyzing data for frequent if/then patterns. Two measures are then used to identify the most important relationships:

1. Support: how frequently the items in the if/then relationship appear together in the database, expressed as a fraction of all transactions.
2. Confidence: how often the relationship holds, that is, the fraction of transactions containing the antecedent that also contain the consequent.
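
To make the two measures concrete, here is a minimal sketch that computes them by hand. The transactions and the helper names `support` and `confidence` are made up for illustration; they are not part of PyCaret or of the France dataset used below.

```python
# Toy transactions (hypothetical data, not taken from the France dataset).
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by antecedent support."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


# Rule under test: "if bread, then milk".
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

Three of the five toy transactions contain both bread and milk, so the support of the rule is 3/5. Four transactions contain bread and three of those also contain milk, so the confidence is 3/4.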

In [1]:
from pycaret.datasets import get_data
data = get_data('france')

| | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536370 | 22728 | ALARM CLOCK BAKELIKE PINK | 24 | 12/1/2010 8:45 | 3.75 | 12583.0 | France |
| 1 | 536370 | 22727 | ALARM CLOCK BAKELIKE RED | 24 | 12/1/2010 8:45 | 3.75 | 12583.0 | France |
| 2 | 536370 | 22726 | ALARM CLOCK BAKELIKE GREEN | 12 | 12/1/2010 8:45 | 3.75 | 12583.0 | France |
| 3 | 536370 | 21724 | PANDA AND BUNNIES STICKER SHEET | 12 | 12/1/2010 8:45 | 0.85 | 12583.0 | France |
| 4 | 536370 | 21883 | STARS GIFT TAPE | 24 | 12/1/2010 8:45 | 0.65 | 12583.0 | France |


In [9]:
from pycaret.arules import setup, create_model, plot_model
exp_arul101 = setup(data=data,
                    transaction_id='InvoiceNo',
                    item_id='Description',
                    ignore_items='POSTAGE')

| Description | Value |
|---|---|
| session_id | 6447 |
| # Transactions | 461 |
| # Items | 1565 |
| Ignore Items | POSTAGE |


The setup() function initializes the PyCaret environment and transforms the transactional dataset into a shape that the Apriori algorithm accepts. It requires three parameters: data, a pandas DataFrame; transaction_id, the name of the column that identifies each transaction and is used to pivot the matrix; and item_id, the name of the column used to create the rules, which is normally the variable of interest. The optional ignore_items parameter excludes certain values from rule creation.
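
The pivot step can be pictured with a small standalone sketch. This is not PyCaret's implementation, only an illustration of the idea under simple assumptions: rows are grouped by transaction id, ignored items are dropped, and each transaction becomes one row of a true/false item matrix. The helper names `to_baskets` and `to_matrix`, the POSTAGE row, and the second invoice number are hypothetical.

```python
from collections import defaultdict

# (InvoiceNo, Description) pairs. The first three come from the sample rows
# shown above; the last two are hypothetical rows added for illustration.
rows = [
    (536370, "ALARM CLOCK BAKELIKE PINK"),
    (536370, "ALARM CLOCK BAKELIKE RED"),
    (536370, "STARS GIFT TAPE"),
    (536370, "POSTAGE"),          # hypothetical row, to show ignore_items
    (536371, "STARS GIFT TAPE"),  # hypothetical second invoice
]


def to_baskets(rows, ignore_items=()):
    """Group items by transaction id, skipping any ignored items."""
    ignored = set(ignore_items)
    baskets = defaultdict(set)
    for transaction_id, item in rows:
        if item not in ignored:
            baskets[transaction_id].add(item)
    return dict(baskets)


def to_matrix(baskets):
    """Return the sorted item list and one boolean row per transaction."""
    items = sorted({item for basket in baskets.values() for item in basket})
    matrix = {
        transaction_id: [item in basket for item in items]
        for transaction_id, basket in baskets.items()
    }
    return items, matrix


baskets = to_baskets(rows, ignore_items=["POSTAGE"])
items, matrix = to_matrix(baskets)
for transaction_id, flags in matrix.items():
    print(transaction_id, dict(zip(items, flags)))
```

Each row of the resulting matrix marks which items a transaction contains, which is the kind of input a frequent-itemset algorithm such as Apriori works on.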

## Create Model

Creating an association rule model is simple. create_model() has no mandatory parameters and four optional ones:

- metric: The metric used to decide whether a rule is of interest. The default is 'confidence'. Other available metrics are 'support', 'lift', 'leverage', and 'conviction'.

- threshold: Minimum value of the metric chosen via the metric parameter for a candidate rule to be considered of interest. The default is 0.5.

- min_support: A float between 0 and 1 giving the minimum support of the itemsets returned. Support is computed as the fraction transactions_where_item(s)_occur / total_transactions. The default is 0.05.

- round: Number of decimal places to which the metrics in the score grid are rounded.
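
The metrics named above can be derived from three support values. The sketch below uses the commonly cited textbook definitions of lift, leverage, and conviction; it is an assumption that PyCaret computes them in exactly this way, and the helper names `rule_metrics` and `is_interesting` are hypothetical. The inputs are the rounded supports from the PAPER PLATES to PAPER CUPS rule in the rule table printed later in this section, so the results only approximate the values shown there.

```python
def rule_metrics(support, antecedent_support, consequent_support):
    """Derive the rule metrics from the three support values."""
    confidence = support / antecedent_support
    lift = confidence / consequent_support
    leverage = support - antecedent_support * consequent_support
    # Conviction is unbounded when the rule always holds (confidence of 1).
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - consequent_support) / (1 - confidence)
    return {
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


def is_interesting(metrics, metric="confidence", threshold=0.5):
    """Keep a rule only if the chosen metric reaches the threshold."""
    return metrics[metric] >= threshold


# Rounded supports copied from the rule table (PAPER PLATES -> PAPER CUPS).
m = rule_metrics(support=0.1041, antecedent_support=0.1085, consequent_support=0.1171)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
print(is_interesting(m))                              # default: confidence >= 0.5
print(is_interesting(m, metric="lift", threshold=10))
```

Because the inputs are already rounded to four decimals, the computed conviction in particular drifts slightly from the tabulated value; the other metrics agree to about two decimal places.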

Let's create an association rule model with all default values.

In [12]:
model1 = create_model()  # create the model and store the resulting rules in model1
print(model1.shape)  # (number of rules, number of columns)

model1.head()  # preview the first five rules

(45, 9)


| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (SET/20 RED RETROSPOT PAPER NAPKINS , SET/6 RE... | (SET/6 RED SPOTTY PAPER CUPS) | 0.0868 | 0.1171 | 0.0846 | 0.975 | 8.3236 | 0.0744 | 35.3145 |
| 1 | (SET/20 RED RETROSPOT PAPER NAPKINS , SET/6 RE... | (SET/6 RED SPOTTY PAPER PLATES) | 0.0868 | 0.1085 | 0.0846 | 0.975 | 8.9895 | 0.0752 | 35.6616 |
| 2 | (SET/6 RED SPOTTY PAPER PLATES) | (SET/6 RED SPOTTY PAPER CUPS) | 0.1085 | 0.1171 | 0.1041 | 0.96 | 8.1956 | 0.0914 | 22.0716 |
| 3 | (CHILDRENS CUTLERY SPACEBOY ) | (CHILDRENS CUTLERY DOLLY GIRL ) | 0.0586 | 0.0629 | 0.0542 | 0.9259 | 14.719 | 0.0505 | 12.6508 |
| 4 | (SET/6 RED SPOTTY PAPER CUPS) | (SET/6 RED SPOTTY PAPER PLATES) | 0.1171 | 0.1085 | 0.1041 | 0.8889 | 8.1956 | 0.0914 | 8.0239 |


In [11]:
plot_model(model1)  # default 2D plot of the rules
plot_model(model1, plot='3d')  # 3D plot of the rules