<a href="https://colab.research.google.com/github/eduardodacostasoares/Data_Science/blob/master/Association_rules.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**ASSOCIATION RULES**#

###Calculation of frequent *itemsets* using the *Apriori* algorithm with the *mlxtend* package.###

In [None]:
! pip install mlxtend
! pip install xlrd



##**Association rules created using frequent itemsets (Only for fun :D )**##

Frequent *itemsets* (support >= 0.6)


***TransactionEncoder*** - Encodes transaction data, given as a Python list of lists, into a boolean NumPy array with one column per distinct item.
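To make the encoding concrete, here is a minimal plain-Python sketch of the same idea. It is not mlxtend's implementation; the three-transaction list and the variable names are made up for illustration.

```python
# Hypothetical toy transactions, used only to illustrate the encoding.
transactions = [['Milk', 'Eggs'], ['Bean', 'Eggs'], ['Bean']]

# One column per distinct item (sorted here so the order is predictable).
columns = sorted({item for t in transactions for item in t})

# One row per transaction, one boolean per column:
# True if the transaction contains that item.
encoded = [[item in t for item in columns] for t in transactions]

print(columns)
for row in encoded:
    print(row)
```

Each row answers "does this transaction contain item X?" for every column, which is the table shape that `apriori` expects as input.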

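***apriori*** - Apriori is a popular algorithm for extracting frequent itemsets, i.e. sets of items whose support (the share of transactions that contain all of them) is at least a chosen minimum.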
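The level-wise idea behind Apriori can be sketched in a few lines of plain Python. This is a toy version for illustration only, not what mlxtend runs internally; the function names `support` and `apriori_sketch` are assumptions of this sketch. It relies on the fact that every subset of a frequent itemset must itself be frequent, so larger candidates are built only from itemsets that already passed the threshold.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori_sketch(transactions, min_support):
    """Toy level-wise search for frequent itemsets (illustrative only)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]  # 1-item candidates
    frequent = {}
    k = 1
    while level:
        # Keep only the candidates that meet the support threshold.
        kept = {c: support(c, transactions) for c in level}
        kept = {c: s for c, s in kept.items() if s >= min_support}
        frequent.update(kept)
        # Build (k+1)-item candidates from unions of frequent k-itemsets,
        # dropping any candidate that has an infrequent k-item subset.
        k += 1
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        level = [c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return frequent

# Same five baskets as the dataset used in this notebook.
baskets = [['Milk', 'Onion', 'Potato', 'Bean', 'Eggs', 'Yogurt'],
           ['Rice', 'Onion', 'Potato', 'Bean', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Bean', 'Eggs'],
           ['Milk', 'Corn', 'Bean', 'Yogurt'],
           ['Corn', 'Onion', 'Bean', 'Ice Cream', 'Eggs']]

result = apriori_sketch(baskets, min_support=0.6)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(round(s, 2), sorted(itemset))
```

With a minimum support of 0.6, the sketch finds five single items, five pairs, and one triple.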

******

In [None]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)
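# Use the full option name: the bare 'precision' pattern is ambiguous in recent pandas versions.
pd.set_option('display.precision', 2)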

dataset = [['Milk', 'Onion', 'Potato', 'Bean', 'Eggs', 'Yogurt'],
           ['Rice', 'Onion', 'Potato', 'Bean', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Bean', 'Eggs'],
           ['Milk', 'Corn', 'Bean', 'Yogurt'],
           ['Corn', 'Onion', 'Bean', 'Ice Cream', 'Eggs']]

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

frequent_itemsets = apriori(df, min_support = 0.6, use_colnames = True)

print(frequent_itemsets)

    support             itemsets
0       1.0               (Bean)
1       0.8               (Eggs)
2       0.6               (Milk)
3       0.6              (Onion)
4       0.6             (Yogurt)
5       0.8         (Bean, Eggs)
6       0.6         (Bean, Milk)
7       0.6        (Bean, Onion)
8       0.6       (Bean, Yogurt)
9       0.6        (Eggs, Onion)
10      0.6  (Bean, Eggs, Onion)


##**Association rule learning**##

Association rules with **minimum confidence** equal to **0.7**

In [7]:
association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)

     antecedents  consequents  antecedent support  consequent support  \
0         (Bean)       (Eggs)                 1.0                 0.8
1         (Eggs)       (Bean)                 0.8                 1.0
2         (Milk)       (Bean)                 0.6                 1.0
3        (Onion)       (Bean)                 0.6                 1.0
4       (Yogurt)       (Bean)                 0.6                 1.0
5         (Eggs)      (Onion)                 0.8                 0.6
6        (Onion)       (Eggs)                 0.6                 0.8
7   (Bean, Eggs)      (Onion)                 0.8                 0.6
8  (Bean, Onion)       (Eggs)                 0.6                 0.8
9  (Eggs, Onion)       (Bean)                 0.6                 1.0

   support  confidence  lift  leverage  conviction
0      0.8        0.80  1.00      0.00         1.0
1      0.8        1.00  1.00      0.00         inf
2      0.6        1.00  1.00      0.00         inf
3      0.6        1.00  1.00      0.00         inf
4      0.6        1.00  1.00      0.00         inf
5      0.6        0.75  1.25      0.12         1.6
6      0.6        1.00  1.25      0.12         inf
7      0.6        0.75  1.25      0.12         1.6
8      0.6        1.00  1.25      0.12         inf
9      0.6        1.00  1.00      0.00         inf
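The confidence and conviction columns can be checked by hand from the itemset supports. The sketch below uses the usual definitions, confidence(A -> C) = support(A and C) / support(A) and conviction(A -> C) = (1 - support(C)) / (1 - confidence), with conviction treated as infinite when confidence is 1. The dictionary and helper names are illustrative and not part of mlxtend.

```python
# Supports copied from the frequent-itemset output earlier in the notebook.
support = {
    frozenset({'Eggs'}): 0.8,
    frozenset({'Onion'}): 0.6,
    frozenset({'Eggs', 'Onion'}): 0.6,
}

def confidence(antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    return support[antecedent | consequent] / support[antecedent]

def conviction(antecedent, consequent):
    """(1 - support(consequent)) / (1 - confidence); inf when confidence is 1."""
    conf = confidence(antecedent, consequent)
    if conf == 1:
        return float('inf')
    return (1 - support[consequent]) / (1 - conf)

eggs, onion = frozenset({'Eggs'}), frozenset({'Onion'})
print(round(confidence(eggs, onion), 2), round(conviction(eggs, onion), 2))
print(round(confidence(onion, eggs), 2), conviction(onion, eggs))
```

The two printed lines correspond to the rules (Eggs) -> (Onion) and (Onion) -> (Eggs). The second rule never fails in this dataset, which is why its conviction is reported as `inf`.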


Association rules with **minimum lift** equal to **1.2**

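The lift of an association rule is the ratio of the rule's confidence to its expected confidence, which is the support of the consequent. A lift above 1 means the antecedent and consequent occur together more often than they would if they were independent.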
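As a small worked example, lift can be recomputed from three supports. The helper below is a hypothetical illustration, not an mlxtend function, and it takes the expected confidence to be the consequent's support.

```python
def lift(support_both, support_antecedent, support_consequent):
    """confidence(A -> C) divided by support(C)."""
    confidence = support_both / support_antecedent
    return confidence / support_consequent

# (Eggs) -> (Onion): supports 0.6 (both), 0.8 (Eggs), 0.6 (Onion)
print(round(lift(0.6, 0.8, 0.6), 2))

# (Bean) -> (Eggs): supports 0.8 (both), 1.0 (Bean), 0.8 (Eggs)
print(round(lift(0.8, 1.0, 0.8), 2))
```

The first rule has a lift of 1.25 and passes the 1.2 threshold. The second has a lift of 1.0, because Bean appears in every transaction and so carries no information about Eggs.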

In [8]:
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.2)
print(rules)

     antecedents    consequents  antecedent support  consequent support  \
0         (Eggs)        (Onion)                 0.8                 0.6   
1        (Onion)         (Eggs)                 0.6                 0.8   
2   (Bean, Eggs)        (Onion)                 0.8                 0.6   
3  (Bean, Onion)         (Eggs)                 0.6                 0.8   
4         (Eggs)  (Bean, Onion)                 0.8                 0.6   
5        (Onion)   (Bean, Eggs)                 0.6                 0.8   

   support  confidence  lift  leverage  conviction  
0      0.6        0.75  1.25      0.12         1.6  
1      0.6        1.00  1.25      0.12         inf  
2      0.6        0.75  1.25      0.12         1.6  
3      0.6        1.00  1.25      0.12         inf  
4      0.6        0.75  1.25      0.12         1.6  
5      0.6        1.00  1.25      0.12         inf  


In [10]:
rules["antecedent_len"] = rules["antecedents"].apply(len)  # number of items in each antecedent
print(rules)

     antecedents    consequents  antecedent support  consequent support  \
0         (Eggs)        (Onion)                 0.8                 0.6   
1        (Onion)         (Eggs)                 0.6                 0.8   
2   (Bean, Eggs)        (Onion)                 0.8                 0.6   
3  (Bean, Onion)         (Eggs)                 0.6                 0.8   
4         (Eggs)  (Bean, Onion)                 0.8                 0.6   
5        (Onion)   (Bean, Eggs)                 0.6                 0.8   

   support  confidence  lift  leverage  conviction  antecedent_len  
0      0.6        0.75  1.25      0.12         1.6               1  
1      0.6        1.00  1.25      0.12         inf               1  
2      0.6        0.75  1.25      0.12         1.6               2  
3      0.6        1.00  1.25      0.12         inf               2  
4      0.6        0.75  1.25      0.12         1.6               1  
5      0.6        1.00  1.25      0.12         inf               1

Slicing the DataFrame to show only the rules with an antecedent length of at least **2**, confidence greater than **0.75**, and lift of at least **1.2**

In [11]:
rules[(rules['antecedent_len'] >= 2) &
      (rules['confidence'] > 0.75) &
      (rules['lift'] >= 1.2)]  # the lift condition needs an explicit comparison

     antecedents  consequents  antecedent support  consequent support  \
3  (Bean, Onion)       (Eggs)                 0.6                 0.8

   support  confidence  lift  leverage  conviction  antecedent_len
3      0.6        1.00  1.25      0.12         inf               2


Slicing to show only the rules whose antecedents are exactly **Eggs** and **Bean**.

In [13]:
rules[rules['antecedents'] == {'Eggs', 'Bean'}]

    antecedents  consequents  antecedent support  consequent support  \
2  (Bean, Eggs)      (Onion)                 0.8                 0.6

   support  confidence  lift  leverage  conviction  antecedent_len
2      0.6        0.75  1.25      0.12         1.6               2
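The comparison against a plain `set` works because the antecedents are stored as `frozenset` objects, and Python compares sets by content rather than by order or by set type. A short standalone illustration (the variable name is made up for this sketch):

```python
antecedent = frozenset({'Bean', 'Eggs'})

print(antecedent == {'Eggs', 'Bean'})  # same items, different order and type
print(antecedent == {'Eggs'})          # a subset is not an exact match
print({'Eggs'} <= antecedent)          # subset test: "antecedent contains Eggs"
```

Equality selects only the rules whose antecedent is exactly that itemset. To select every rule whose antecedent merely contains an item, a subset test like the last line would be the comparison to apply row by row.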
