Association rule learning is a rule-based machine learning method for identifying associations between variables in a database.

The Apriori algorithm finds the frequent itemsets in a transaction database and derives association rules between the items, just like the above-mentioned example.



To construct association rules between items, the algorithm considers three important measures: support, confidence, and lift.

The support of an item I is the number of transactions containing I divided by the total number of transactions.

Confidence is the proportion of transactions containing item I1 in which item I2 also appears. The confidence of the rule I1 -> I2 is the number of transactions containing both I1 and I2 divided by the number of transactions containing I1.

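Lift is the ratio between the confidence of a rule and the support of its consequent: lift(I1 -> I2) = confidence(I1 -> I2) / support(I2). A lift above 1 means the two items occur together more often than would be expected if they were independent.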
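
The three measures can be checked by hand on a small example. The sketch below is only an illustration: the five transactions and the helper names `support`, `confidence`, and `lift` are made up for this purpose and are not part of mlxtend.

```python
# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """confidence(antecedent -> consequent) / support(consequent)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


# bread appears in 4 of 5 transactions, bread and milk together in 3 of 5.
print("support(bread)         =", support({"bread"}, transactions))
print("confidence(bread->milk) =", round(confidence({"bread"}, {"milk"}, transactions), 4))
print("lift(bread->milk)       =", round(lift({"bread"}, {"milk"}, transactions), 4))
```

In this toy data the lift of bread -> milk is below 1, so buying bread makes milk slightly less likely than its overall frequency would suggest.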

We will use the mlxtend library to implement Apriori.

This library expects the dataset in the following format.

    transaction_id    Cake   Balloon   Caps
    1                  0       1        1
    2                  1       0        0
    3                  1       1        1
    4                  0       0        0
   
Here 1 indicates that the item was bought in that transaction, and 0 that it was not.
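
As a rough sketch of how such a table can be produced, the snippet below builds it with pandas from a list of purchased items per transaction. The `transactions` dictionary and the `items` list are hypothetical inputs chosen to reproduce the table above.

```python
import pandas as pd

# Hypothetical raw data: transaction id -> items bought in that transaction.
transactions = {
    1: ["Balloon", "Caps"],
    2: ["Cake"],
    3: ["Cake", "Balloon", "Caps"],
    4: [],
}
items = ["Cake", "Balloon", "Caps"]

# One row per transaction, one column per item, 1 if the item was bought.
onehot = pd.DataFrame(
    [{item: int(item in bought) for item in items} for bought in transactions.values()],
    index=pd.Index(list(transactions), name="transaction_id"),
)
print(onehot)
```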

Now we will import our libraries and the dataset.

In [1]:
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

In [None]:
dp = pd.read_csv('https://raw.githubusercontent.com/harshit5674/DATA-MINING/main/datasets/apriori1.csv', encoding="ISO-8859-1")
dp.head()

In [None]:
dp['Description'] = dp['Description'].str.strip()

The statement above strips leading and trailing whitespace from the item descriptions.

In [None]:
# Some transactions have a negative quantity, which is not a valid purchase, so remove them.
df1 = dp[dp.Quantity > 0]

In [None]:
table = pd.pivot_table(data=df1, index='InvoiceNo', columns='Description', values='Quantity', aggfunc='sum', fill_value=0)

The statement above pivots the dataset into the layout described above, with one row per invoice and one column per item.

In [None]:
table.head()

You will notice that the table contains the quantity of each item in each transaction. The format described above needs only binary values (0 or 1), so we convert the quantities.

In [None]:
def convert_into_binary(x):
    if x > 0:
        return 1
    else:
        return 0

In [None]:
table = table.applymap(convert_into_binary)
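
An alternative worth considering is a vectorized comparison, which avoids calling a Python function on every cell. In recent pandas versions `applymap` is deprecated in favour of `DataFrame.map`, and recent mlxtend versions may warn when the input is not boolean. The sketch below uses a small made-up `quantities` table in place of the real pivot table.

```python
import pandas as pd

# Hypothetical stand-in for the pivoted quantity table.
quantities = pd.DataFrame(
    {"Cake": [0, 2, 1], "Caps": [3, 0, 1]},
    index=pd.Index([1, 2, 3], name="InvoiceNo"),
)

# True where at least one unit was bought, False otherwise.
basket = quantities > 0
print(basket)

# If 0/1 integers are preferred over booleans:
basket_int = basket.astype(int)
print(basket_int)
```

Applied to the notebook, the same idea would read `table = table > 0`.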

In [None]:
# Remove the POSTAGE item, as it is just a seal that almost every transaction contains.
table.drop(columns=['POSTAGE'],inplace=True)

In [None]:
# Call the apriori function with a minimum support of 4%:
# an itemset must appear in at least 4% of all transactions to be kept.
frequent_itemsets = apriori(table, min_support=0.04, use_colnames=True)

In [None]:
frequent_itemsets

The first step in generating association rules is to find all the frequent itemsets, on which binary partitions can be performed to get the antecedent and the consequent.

Frequent itemsets are the ones which occur at least a minimum number of times in the transactions. Technically, these are the itemsets whose support value (the fraction of transactions containing the itemset) meets a minimum threshold, min_support. We set min_support=0.04 in the notebook above.
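
To make the idea concrete, here is a minimal level-wise sketch of how frequent itemsets can be found. It is a simplified illustration written for this tutorial, not mlxtend's implementation; the toy `transactions` and the function name `frequent_itemsets` are assumptions. It relies on the Apriori property: an itemset can only be frequent if all of its subsets are frequent.

```python
from itertools import combinations

# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    candidates = [frozenset([item]) for item in items]  # level 1: single items
    k = 1
    while candidates:
        # Keep the candidates of size k that meet the support threshold.
        level = {}
        for cand in candidates:
            s = sum(1 for t in transactions if cand <= t) / n
            if s >= min_support:
                level[cand] = s
        result.update(level)

        # Join frequent k-itemsets into (k+1)-itemsets, then prune any
        # candidate that has an infrequent k-subset (Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return result


for itemset, s in frequent_itemsets(transactions, min_support=0.4).items():
    print(sorted(itemset), s)
```

With `min_support=0.4`, the pair milk and butter (present in only one of five transactions) is dropped, so the three-item set is never counted at all.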

In [None]:
# Generate association rules from the frequent itemsets.
# Here we filter on lift and keep rules with a lift of at least 1.

rules_mlxtend = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules_mlxtend.head()

From a list of all possible candidate rules, we aim to identify rules that fall above a minimum threshold level (like min_confidence or min_lift).
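
The sketch below illustrates this step on a toy example: it splits one frequent itemset into every possible antecedent and consequent pair (the binary partitions) and keeps only the rules whose confidence meets a threshold. The data and the helper names `candidate_rules` and `rules_above` are hypothetical and only meant to show the idea, not how mlxtend does it internally.

```python
from itertools import combinations

# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)


def candidate_rules(itemset):
    """Yield every (antecedent, consequent) split of `itemset` into two non-empty parts."""
    items = sorted(itemset)
    for r in range(1, len(items)):
        for antecedent in combinations(items, r):
            consequent = frozenset(items) - frozenset(antecedent)
            yield frozenset(antecedent), consequent


def rules_above(itemset, min_confidence):
    """Keep the candidate rules whose confidence is at least `min_confidence`."""
    kept = []
    for antecedent, consequent in candidate_rules(itemset):
        conf = support(antecedent | consequent) / support(antecedent)
        if conf >= min_confidence:
            kept.append((antecedent, consequent, conf))
    return kept


for antecedent, consequent, conf in rules_above({"bread", "butter"}, min_confidence=0.6):
    print(sorted(antecedent), "->", sorted(consequent), "confidence:", conf)
```

An itemset with k items yields 2^k - 2 candidate rules, which is why thresholds are needed to keep the output manageable.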

In [None]:
rules_mlxtend[ (rules_mlxtend['lift'] >= 3) & (rules_mlxtend['confidence'] >= 0.6) ].head()

antecedents and consequents -> The IF component of an association rule is known as the antecedent. The THEN component is known as the consequent. The antecedent and the consequent are disjoint; they have no items in common.

antecedent support -> This measure tells us how frequently the antecedent appears across all transactions.

consequent support -> This measure tells us how frequently the consequent appears across all transactions.
