# Exercise: Association Rule Mining with PyCaret

### 1. Getting the Data

In [1]:
# For this exercise we will use a small sample from the UCI Online Retail Dataset.
# This is a transactional dataset containing transactions that occurred between 01/12/2010 and 09/12/2011
# for a UK-based, registered, non-store online retailer.
# The company mainly sells unique all-occasion gifts. Many of its customers are wholesalers.

from pycaret.datasets import get_data
data = get_data('france')

,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536370,22728,ALARM CLOCK BAKELIKE PINK,24,12/1/2010 8:45,3.75,12583.0,France
1,536370,22727,ALARM CLOCK BAKELIKE RED,24,12/1/2010 8:45,3.75,12583.0,France
2,536370,22726,ALARM CLOCK BAKELIKE GREEN,12,12/1/2010 8:45,3.75,12583.0,France
3,536370,21724,PANDA AND BUNNIES STICKER SHEET,12,12/1/2010 8:45,0.85,12583.0,France
4,536370,21883,STARS GIFT TAPE,24,12/1/2010 8:45,0.65,12583.0,France


In [2]:
data.shape

(8557, 8)

### 2. Setting up Environment in PyCaret

The `setup()` function initializes the environment in PyCaret and transforms the transactional dataset into a shape that the Apriori algorithm accepts. It requires three mandatory parameters: `data`, a pandas DataFrame; `transaction_id`, the name of the column that identifies each transaction and is used to pivot the matrix; and `item_id`, the name of the column whose values are used to create the rules, which is normally the variable of interest. You can also pass the optional parameter `ignore_items` to exclude certain values from rule creation.
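To make the pivot step concrete, here is a minimal sketch in plain Python of how long-format rows can be turned into one basket per transaction and then into a boolean transaction-by-item matrix. It is an illustration of the target shape only, not PyCaret's actual implementation; the toy rows and the names `rows`, `baskets`, and `matrix` are made up for this example.

```python
from collections import defaultdict

# Toy rows in the same long format as the dataset: (transaction id, item).
# The values are made up for illustration.
rows = [
    ("536370", "ALARM CLOCK BAKELIKE PINK"),
    ("536370", "STARS GIFT TAPE"),
    ("536371", "STARS GIFT TAPE"),
    ("536372", "ALARM CLOCK BAKELIKE PINK"),
    ("536372", "POSTAGE"),
    ("536372", "STARS GIFT TAPE"),
]

# Group items by transaction: one basket (set of items) per transaction id.
baskets = defaultdict(set)
for transaction_id, item_id in rows:
    baskets[transaction_id].add(item_id)

# Pivot to a boolean matrix: one row per transaction, one column per item.
items = sorted({item for _, item in rows})
matrix = {
    tid: [item in basket for item in items]
    for tid, basket in baskets.items()
}

print(items)
for tid, flags in matrix.items():
    print(tid, flags)
```

Each row of `matrix` answers "does this transaction contain this item?", which is the kind of input a frequent-itemset algorithm such as Apriori works on.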

In [3]:
from pycaret.arules import *

In [4]:
exp_arm = setup(data = data, 
                    transaction_id = 'InvoiceNo',
                    item_id = 'Description') 

Description,Value
session_id,5405.0
# Transactions,461.0
# Items,1565.0
Ignore Items,


### 3. Create a Model

Creating an association rule model is simple. `create_model()` has no mandatory parameters and four optional ones:

- **metric:** Metric used to evaluate whether a rule is of interest. Default is `'confidence'`. Other available metrics are `'support'`, `'lift'`, `'leverage'`, and `'conviction'`.
- **threshold:** Minimum value of the evaluation metric, selected via the `metric` parameter, for a candidate rule to be considered of interest. Default is `0.5`.
- **min_support:** A float between 0 and 1 giving the minimum support of the itemsets returned. The support is computed as the fraction `transactions_where_item(s)_occur / total_transactions`. Default is `0.05`.
- **round:** Number of decimal places to which the metrics in the score grid are rounded. Default is `4`.
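The interplay of `min_support` and `threshold` can be sketched on toy data: first keep only itemsets whose support reaches `min_support`, then keep only rules whose confidence reaches `threshold`. The baskets, the cut-off values, and the helper `support()` below are made up for illustration, and the brute-force enumeration stands in for Apriori rather than reproducing it.

```python
from itertools import combinations

# Toy baskets, one set of items per transaction (made-up data).
baskets = [
    {"cups", "plates", "napkins"},
    {"cups", "plates"},
    {"cups", "napkins"},
    {"plates"},
    {"cups", "plates", "napkins"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)

min_support = 0.5  # hypothetical value chosen for the toy data
threshold = 0.8    # hypothetical minimum confidence

all_items = sorted(set().union(*baskets))

# Keep single items and pairs whose support reaches min_support.
frequent = [
    frozenset(combo)
    for size in (1, 2)
    for combo in combinations(all_items, size)
    if support(frozenset(combo)) >= min_support
]

# For each frequent pair, keep the rules whose confidence reaches the threshold.
rules = []
for itemset in frequent:
    if len(itemset) != 2:
        continue
    for antecedent in sorted(itemset):
        consequent = next(iter(itemset - {antecedent}))
        confidence = support(itemset) / support(frozenset([antecedent]))
        if confidence >= threshold:
            rules.append((antecedent, consequent, round(confidence, 4)))

print(rules)
```

Here the pair of napkins and plates is dropped by `min_support` before any rule is formed, and of the remaining candidate rules only napkins → cups clears the confidence threshold.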

In [5]:
model1 = create_model() 

In [6]:
print(model1.shape) 

(141, 9)


In [7]:
model1.head()

,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(JUMBO BAG WOODLAND ANIMALS),(POSTAGE),0.0651,0.6746,0.0651,1.0,1.4823,0.0212,inf
1,"(SET/6 RED SPOTTY PAPER CUPS, SET/20 RED RETRO...",(SET/6 RED SPOTTY PAPER PLATES),0.0868,0.1085,0.0846,0.975,8.9895,0.0752,35.6616
2,"(SET/20 RED RETROSPOT PAPER NAPKINS , SET/6 RE...",(SET/6 RED SPOTTY PAPER CUPS),0.0868,0.1171,0.0846,0.975,8.3236,0.0744,35.3145
3,"(SET/20 RED RETROSPOT PAPER NAPKINS , POSTAGE,...",(SET/6 RED SPOTTY PAPER CUPS),0.0716,0.1171,0.0694,0.9697,8.2783,0.061,29.1345
4,"(SET/6 RED SPOTTY PAPER CUPS, SET/20 RED RETRO...",(SET/6 RED SPOTTY PAPER PLATES),0.0716,0.1085,0.0694,0.9697,8.9406,0.0617,29.4208
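The metric columns are related to one another. As a rough check, the sketch below derives confidence, lift, leverage, and conviction from the three support columns using the standard textbook definitions, which are assumed here to match what PyCaret reports. The function name `rule_metrics` is hypothetical, and the inputs are the rounded values from row 0, so the results agree with the table only up to rounding.

```python
import math

def rule_metrics(antecedent_support, consequent_support, support):
    """Derive the remaining rule metrics from the three support values."""
    confidence = support / antecedent_support
    lift = confidence / consequent_support
    leverage = support - antecedent_support * consequent_support
    # Conviction divides by (1 - confidence); report infinity when confidence is 1.
    if confidence >= 1.0:
        conviction = math.inf
    else:
        conviction = (1 - consequent_support) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }

# Rounded values from row 0: (JUMBO BAG WOODLAND ANIMALS) -> (POSTAGE)
metrics = rule_metrics(
    antecedent_support=0.0651,
    consequent_support=0.6746,
    support=0.0651,
)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Row 0 is a useful case because every basket containing the antecedent also contains POSTAGE, so the confidence is 1 and the conviction is infinite, as shown in the table.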


#### Setup with `ignore_items`

In [8]:
exp_aru = setup(data = data, 
                    transaction_id = 'InvoiceNo',
                    item_id = 'Description',
                    ignore_items = ['POSTAGE'])

Description,Value
session_id,1696
# Transactions,461
# Items,1565
Ignore Items,['POSTAGE']


In [9]:
model2 = create_model()

In [10]:
print(model2.shape)

(45, 9)
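The rule count falls from 141 to 45 once POSTAGE is ignored, because every rule that had POSTAGE as antecedent or consequent disappears. The sketch below shows the idea on toy baskets by removing the ignored item before counting co-occurring pairs. That the removal happens before counting is an assumption made for illustration, and the data and the helper `pair_counts` are made up.

```python
# Toy baskets (made-up data); POSTAGE appears in most of them.
baskets = [
    {"POSTAGE", "cups", "plates"},
    {"POSTAGE", "cups"},
    {"POSTAGE", "plates"},
    {"cups", "plates"},
]
ignore_items = {"POSTAGE"}

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = {}
    for basket in baskets:
        items = sorted(basket)
        for i, first in enumerate(items):
            for second in items[i + 1:]:
                counts[(first, second)] = counts.get((first, second), 0) + 1
    return counts

# Pairs counted on the raw baskets, then with the ignored items removed.
before = pair_counts(baskets)
after = pair_counts([basket - ignore_items for basket in baskets])

print("before:", before)
print("after: ", after)
```

Pairs that do not involve the ignored item keep the same counts, so the remaining rules are unchanged; only the candidates built on POSTAGE are gone.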


In [11]:
model2.head()

,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,"(SET/20 RED RETROSPOT PAPER NAPKINS , SET/6 RE...",(SET/6 RED SPOTTY PAPER CUPS),0.0868,0.1171,0.0846,0.975,8.3236,0.0744,35.3145
1,"(SET/6 RED SPOTTY PAPER CUPS, SET/20 RED RETRO...",(SET/6 RED SPOTTY PAPER PLATES),0.0868,0.1085,0.0846,0.975,8.9895,0.0752,35.6616
2,(SET/6 RED SPOTTY PAPER PLATES),(SET/6 RED SPOTTY PAPER CUPS),0.1085,0.1171,0.1041,0.96,8.1956,0.0914,22.0716
3,(CHILDRENS CUTLERY SPACEBOY ),(CHILDRENS CUTLERY DOLLY GIRL ),0.0586,0.0629,0.0542,0.9259,14.719,0.0505,12.6508
4,(SET/6 RED SPOTTY PAPER CUPS),(SET/6 RED SPOTTY PAPER PLATES),0.1171,0.1085,0.1041,0.8889,8.1956,0.0914,8.0239


### 4. Plot Model

In [12]:
plot_model(model2)

In [13]:
plot_model(model2, plot = '3d')