# Association Rules

### Import Required Libraries
The libraries used are **pandas and mlxtend**. If you have not installed them yet, install them first with `pip install library-name`.

In [1]:
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

### Load Data
The data used here is entered manually. To use data from a CSV or Excel file instead, load it with `pd.read_csv()` or `pd.read_excel()`.

In [3]:
data = [
    'Broccoli, Green Peppers, Corn',
    'Asparagus, Squash, Corn',
    'Corn, Tomatoes, Beans, Squash',
    'Green Peppers, Corn, Tomatoes, Beans',
    'Beans, Asparagus, Broccoli',
    'Squash, Asparagus, Beans, Tomatoes',
    'Tomatoes, Corn',
    'Broccoli, Tomatoes, Green Peppers',
    'Squash, Asparagus, Beans',
    'Beans, Corn',
    'Green Peppers, Broccoli, Beans, Squash',
    'Asparagus, Beans, Squash',
    'Squash, Corn, Asparagus, Beans',
    'Corn, Green Peppers, Tomatoes, Beans, Broccoli'
]
data

['Broccoli, Green Peppers, Corn',
 'Asparagus, Squash, Corn',
 'Corn, Tomatoes, Beans, Squash',
 'Green Peppers, Corn, Tomatoes, Beans',
 'Beans, Asparagus, Broccoli',
 'Squash, Asparagus, Beans, Tomatoes',
 'Tomatoes, Corn',
 'Broccoli, Tomatoes, Green Peppers',
 'Squash, Asparagus, Beans',
 'Beans, Corn',
 'Green Peppers, Broccoli, Beans, Squash',
 'Asparagus, Beans, Squash',
 'Squash, Corn, Asparagus, Beans',
 'Corn, Green Peppers, Tomatoes, Beans, Broccoli']
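
If the transactions are stored in a CSV file rather than typed in by hand, they can be read into the same list-of-strings shape. The following is only a sketch: the column name `items` and the in-memory CSV text are assumptions made for illustration, not part of the original notebook.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk; the column name 'items' is an assumption.
# Each field is quoted because the item lists themselves contain commas.
csv_text = (
    'items\n'
    '"Broccoli, Green Peppers, Corn"\n'
    '"Asparagus, Squash, Corn"\n'
)

# With a real file, pass its path to pd.read_csv instead of the StringIO object.
df = pd.read_csv(io.StringIO(csv_text))
data_from_csv = df['items'].tolist()
print(data_from_csv)
```

The resulting list has the same format as `data`, so the later steps should work on it unchanged.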

### Process Data
Convert the data to one row per item.

In [4]:
# Build one [ID, Item, Quantity] row per item; transaction IDs start at 1
lst = []
for i, items in enumerate(data, start=1):
    for item in items.split(', '):
        lst.append([i, item, 1])

lst = pd.DataFrame(lst, columns=['ID', 'Item', 'Quantity'])
lst

Unnamed: 0,ID,Item,Quantity
0,1,Broccoli,1
1,1,Green Peppers,1
2,1,Corn,1
3,2,Asparagus,1
4,2,Squash,1
5,2,Corn,1
6,3,Corn,1
7,3,Tomatoes,1
8,3,Beans,1
9,3,Squash,1


In [5]:
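# Pivot to one row per transaction and one column per item,
# then flag each cell as 1 (item present) or 0 (item absent).
# `.gt(0).astype(int)` replaces `applymap`, which is deprecated in recent pandas.
bucket = (lst.groupby(['ID', 'Item'])['Quantity']
          .sum()
          .unstack(fill_value=0)
          .gt(0)
          .astype(int))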
bucket

ID,Asparagus,Beans,Broccoli,Corn,Green Peppers,Squash,Tomatoes
1,0,0,1,1,1,0,0
2,1,0,0,1,0,1,0
3,0,1,0,1,0,1,1
4,0,1,0,1,1,0,1
5,1,1,1,0,0,0,0
6,1,1,0,0,0,1,1
7,0,0,0,1,0,0,1
8,0,0,1,0,1,0,1
9,1,1,0,0,0,1,0
10,0,1,0,1,0,0,0


Before the data is passed to the apriori algorithm, it must already be in the form shown in the table above: one row per transaction, one column per item, and a 1 or 0 in each cell.
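
As an alternative to the groupby and unstack steps, the same one-hot layout can be produced directly from the comma-separated strings. This is a minimal sketch using pandas' `Series.str.get_dummies`; it is not the approach used in this notebook, and the two-transaction `sample` list (the first two entries of `data`) is only there to keep the example short.

```python
import pandas as pd

# First two transactions from `data`, in the same 'item, item' format.
sample = [
    'Broccoli, Green Peppers, Corn',
    'Asparagus, Squash, Corn',
]

# Split each string on ', ' and create one 0/1 column per distinct item.
one_hot = pd.Series(sample).str.get_dummies(sep=', ')

# Number the transactions from 1 so the index matches the ID column used above.
one_hot.index = one_hot.index + 1
one_hot.index.name = 'ID'
print(one_hot)
```

Columns come out in alphabetical order, which matches the column order of `bucket`.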

### Frequent Itemsets
For example, suppose we want to use a minimum support of 30%.

In [7]:
frequent_itemsets = apriori(bucket, min_support=0.30, use_colnames=True)
frequent_itemsets

Unnamed: 0,support,itemsets
0,0.428571,(Asparagus)
1,0.714286,(Beans)
2,0.357143,(Broccoli)
3,0.571429,(Corn)
4,0.357143,(Green Peppers)
5,0.5,(Squash)
6,0.428571,(Tomatoes)
7,0.357143,"(Asparagus, Beans)"
8,0.357143,"(Asparagus, Squash)"
9,0.357143,"(Beans, Corn)"
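
To make the support values in the table concrete, the sketch below recomputes a few of them by hand in plain Python. Support here is the fraction of the 14 transactions that contain every item in the itemset. The helper name `support` is hypothetical and is not part of mlxtend.

```python
# The same 14 transactions as in the Load Data step, converted to sets of items.
transactions = [set(row.split(', ')) for row in [
    'Broccoli, Green Peppers, Corn',
    'Asparagus, Squash, Corn',
    'Corn, Tomatoes, Beans, Squash',
    'Green Peppers, Corn, Tomatoes, Beans',
    'Beans, Asparagus, Broccoli',
    'Squash, Asparagus, Beans, Tomatoes',
    'Tomatoes, Corn',
    'Broccoli, Tomatoes, Green Peppers',
    'Squash, Asparagus, Beans',
    'Beans, Corn',
    'Green Peppers, Broccoli, Beans, Squash',
    'Asparagus, Beans, Squash',
    'Squash, Corn, Asparagus, Beans',
    'Corn, Green Peppers, Tomatoes, Beans, Broccoli',
]]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

print(round(support({'Beans'}), 6))               # 10 of 14 transactions
print(round(support({'Asparagus', 'Beans'}), 6))  # 5 of 14 transactions
print(support({'Broccoli', 'Corn'}) >= 0.30)      # 2 of 14, below the threshold
```

The first two values should agree with rows 1 and 7 of the table, while `{Broccoli, Corn}` falls below the 30% threshold and is therefore not reported as a frequent itemset.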


### Make Rules
For example, suppose we want to generate rules with a minimum confidence of 70%.

In [8]:
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.7)
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Asparagus),(Beans),0.428571,0.714286,0.357143,0.833333,1.166667,0.05102,1.714286
1,(Asparagus),(Squash),0.428571,0.5,0.357143,0.833333,1.666667,0.142857,3.0
2,(Squash),(Asparagus),0.5,0.428571,0.357143,0.714286,1.666667,0.142857,2.0
3,(Squash),(Beans),0.5,0.714286,0.428571,0.857143,1.2,0.071429,2.0
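
The metric columns can be checked by hand. The sketch below recomputes confidence, lift, leverage, and conviction for the rule (Asparagus) → (Beans) using the standard definitions of these metrics. The three counts were tallied from the 14 transactions in the Load Data step, and the variable names are chosen for illustration only.

```python
n = 14         # total number of transactions
count_a = 6    # transactions containing Asparagus (antecedent)
count_b = 10   # transactions containing Beans (consequent)
count_ab = 5   # transactions containing both

support_a = count_a / n
support_b = count_b / n
support_ab = count_ab / n

# confidence: how often the consequent appears when the antecedent does
confidence = support_ab / support_a
# lift: confidence relative to the consequent's baseline frequency
lift = confidence / support_b
# leverage: observed joint support minus the support expected under independence
leverage = support_ab - support_a * support_b
# conviction: how much more often the rule would be wrong if A and B were independent
conviction = (1 - support_b) / (1 - confidence)

print(round(confidence, 6), round(lift, 6), round(leverage, 6), round(conviction, 6))
```

These values should agree with row 0 of the rules table. A lift above 1 suggests that Asparagus and Beans appear together more often than they would if they were independent.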
