<h2>Apriori Algorithm in Machine Learning</h2>
<h3><b>Dataset:</b> groceries - groceries.csv</h3>

The Apriori algorithm finds frequent itemsets and uses them to generate association rules. It is designed to work on databases that contain transactions. The association rules indicate how strongly or how weakly two items are connected. The algorithm uses a breadth-first search and a hash tree to count itemset occurrences efficiently. It is an iterative process that finds frequent itemsets of increasing size in a large dataset.
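The strength of a rule is usually expressed with three measures: support, confidence and lift, which are the same quantities that appear in the results later in this notebook. The sketch below computes them in plain Python. The toy transactions and the helper names `support`, `confidence` and `lift` are assumptions made for illustration; they are not part of the groceries dataset or of any library.

```python
# Toy transactions (hypothetical, for illustration only).
toy_transactions = [
    {"milk", "bread"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
    {"bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(base, add, transactions):
    """Share of transactions containing base that also contain add."""
    both = set(base) | set(add)
    return support(both, transactions) / support(base, transactions)

def lift(base, add, transactions):
    """Confidence divided by the support of add; 1.0 means no association."""
    return confidence(base, add, transactions) / support(add, transactions)

print(support({"milk", "bread"}, toy_transactions))
print(confidence({"milk"}, {"bread"}, toy_transactions))
print(lift({"milk"}, {"bread"}, toy_transactions))
```

A lift above 1 suggests the two sides of a rule occur together more often than they would if they were independent, which is why a `min_lift` threshold is a common filter.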

The algorithm was proposed by R. Agrawal and R. Srikant in 1994. It is mainly used for market basket analysis, where it helps find products that are often bought together. It can also be used in healthcare to find drug reactions in patients.

Suppose there are two transactions, A = {1,2,3,4,5} and B = {2,3,7}. Items 2 and 3 appear in both transactions, so they are the frequent items, and {2,3} is a frequent itemset.
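The two-transaction example can be worked through with a small level-wise search. This is a minimal sketch of the idea only: it counts candidates by scanning every transaction rather than using a hash tree, and the function name `frequent_itemsets` and the `min_count` parameter are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_count):
    """Level-wise (breadth-first) search for itemsets found in >= min_count transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then drop any candidate
        # that has an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent

A = {1, 2, 3, 4, 5}
B = {2, 3, 7}
result = frequent_itemsets([A, B], min_count=2)
print(sorted(sorted(s) for s in result))  # [[2], [2, 3], [3]]
```

With a minimum count of 2, only items 2 and 3 survive the first pass, so {2, 3} is the only candidate the second pass needs to count.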

Link: <a href="https://www.javatpoint.com/apriori-algorithm-in-machine-learning">Steps for Apriori Algorithm</a>

In [2]:
import pandas as pd

In [3]:
data = pd.read_csv('/Users/snehvora/Desktop/Machine Learning/groceries - groceries.csv')
data 

Unnamed: 0,Item(s),Item 1,Item 2,Item 3,Item 4,Item 5,Item 6,Item 7,Item 8,Item 9,...,Item 23,Item 24,Item 25,Item 26,Item 27,Item 28,Item 29,Item 30,Item 31,Item 32
0,4,citrus fruit,semi-finished bread,margarine,ready soups,,,,,,...,,,,,,,,,,
1,3,tropical fruit,yogurt,coffee,,,,,,,...,,,,,,,,,,
2,1,whole milk,,,,,,,,,...,,,,,,,,,,
3,4,pip fruit,yogurt,cream cheese,meat spreads,,,,,,...,,,,,,,,,,
4,4,other vegetables,whole milk,condensed milk,long life bakery product,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9830,17,sausage,chicken,beef,hamburger meat,citrus fruit,grapes,root vegetables,whole milk,butter,...,,,,,,,,,,
9831,1,cooking chocolate,,,,,,,,,...,,,,,,,,,,
9832,10,chicken,citrus fruit,other vegetables,butter,yogurt,frozen dessert,domestic eggs,rolls/buns,rum,...,,,,,,,,,,
9833,4,semi-finished bread,bottled water,soda,bottled beer,,,,,,...,,,,,,,,,,


In [4]:
# Drop the first column (the per-transaction item count), keeping Item 1 to Item 32
data = data.iloc[:, 1:]
data.shape

(9835, 32)

In [5]:
# Build one list of items per transaction from the first 1,000 rows.
# Empty cells are NaN; skip them so the string 'nan' is not counted as an item.
transactions = []
for i in range(0, 1000):
    transactions.append([str(item) for item in data.values[i] if pd.notna(item)])

# Each entry of transactions is the list of items bought in one transaction
transactions[0]

['citrus fruit', 'semi-finished bread', 'margarine', 'ready soups']

In [6]:
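from apyori import apriori

# min_support=0.008 keeps itemsets found in at least 8 of the 1,000 transactions;
# min_confidence and min_lift then filter the rules derived from those itemsets.
association_rules = apriori(transactions=transactions, min_support=0.008, min_confidence=0.3, min_lift=3, min_length=2)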

In [7]:
association_results = list(association_rules)
association_results

[RelationRecord(items=frozenset({'whipped/sour cream', 'berries'}), support=0.017, ordered_statistics=[OrderedStatistic(items_base=frozenset({'berries'}), items_add=frozenset({'whipped/sour cream'}), confidence=0.3617021276595745, lift=4.88786658999425)]),
 RelationRecord(items=frozenset({'root vegetables', 'butter'}), support=0.017, ordered_statistics=[OrderedStatistic(items_base=frozenset({'butter'}), items_add=frozenset({'root vegetables'}), confidence=0.3695652173913044, lift=3.3596837944664033)]),
 RelationRecord(items=frozenset({'candy', 'tropical fruit'}), support=0.009, ordered_statistics=[OrderedStatistic(items_base=frozenset({'candy'}), items_add=frozenset({'tropical fruit'}), confidence=0.35999999999999993, lift=3.7113402061855663)]),
 RelationRecord(items=frozenset({'coffee', 'long life bakery product'}), support=0.008, ordered_statistics=[OrderedStatistic(items_base=frozenset({'long life bakery product'}), items_add=frozenset({'coffee'}), confidence=0.3076923076923077, lif
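In the records above, the direction of each rule is stored in `items_base` (antecedent) and `items_add` (consequent), while `items` is an unordered frozenset. The sketch below flattens records into direction-aware rows. To stay self-contained it uses namedtuple stand-ins that mirror only the fields visible in the output above, and the helper name `rules_to_rows` is hypothetical. It reads the fields by attribute, so the same function should also accept the real `association_results` list.

```python
from collections import namedtuple

# Stand-ins mirroring the fields shown in the output above (an assumption
# made so the example runs without apyori installed).
OrderedStatistic = namedtuple("OrderedStatistic", "items_base items_add confidence lift")
RelationRecord = namedtuple("RelationRecord", "items support ordered_statistics")

def rules_to_rows(records):
    """Yield (antecedent, consequent, support, confidence, lift) for every rule."""
    for record in records:
        for stat in record.ordered_statistics:
            yield (
                ", ".join(sorted(stat.items_base)),
                ", ".join(sorted(stat.items_add)),
                record.support,
                stat.confidence,
                stat.lift,
            )

# First record, copied from the output above.
sample = [
    RelationRecord(
        items=frozenset({"whipped/sour cream", "berries"}),
        support=0.017,
        ordered_statistics=[
            OrderedStatistic(
                items_base=frozenset({"berries"}),
                items_add=frozenset({"whipped/sour cream"}),
                confidence=0.3617021276595745,
                lift=4.88786658999425,
            )
        ],
    )
]

rows = list(rules_to_rows(sample))
for base, add, supp, conf, lift_value in rows:
    print(f"{base} -> {add} (support={supp}, confidence={conf:.3f}, lift={lift_value:.3f})")
```

Iterating over every entry of `ordered_statistics` also covers records that carry more than one rule for the same itemset.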

In [8]:
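for item in association_results:
    # item.items is a frozenset, which has no defined order, so the two names
    # printed below label the item pair; they do not show the rule's direction.
    # The direction is items_base -> items_add in item.ordered_statistics[0].
    pair = item.items
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])
    print("Support: " + str(item.support))
    print("Confidence: " + str(item.ordered_statistics[0].confidence))
    print("Lift: " + str(item.ordered_statistics[0].lift))
    print("==========================================")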

Rule: whipped/sour cream -> berries
Support: 0.017
Confidence: 0.3617021276595745
Lift: 4.88786658999425
Rule: root vegetables -> butter
Support: 0.017
Confidence: 0.3695652173913044
Lift: 3.3596837944664033
Rule: candy -> tropical fruit
Support: 0.009
Confidence: 0.35999999999999993
Lift: 3.7113402061855663
Rule: coffee -> long life bakery product
Support: 0.008
Confidence: 0.3076923076923077
Lift: 4.048582995951417
Rule: curd -> sliced cheese
Support: 0.009
Confidence: 0.3214285714285714
Lift: 4.285714285714286
Rule: ham -> whipped/sour cream
Support: 0.008
Confidence: 0.38095238095238093
Lift: 5.148005148005148
Rule: hard cheese -> tropical fruit
Support: 0.009
Confidence: 0.37499999999999994
Lift: 3.865979381443298
Rule: newspapers -> oil
Support: 0.01
Confidence: 0.33333333333333337
Lift: 3.745318352059926
Rule: root vegetables -> packaged fruit/vegetables
Support: 0.011
Confidence: 0.3333333333333333
Lift: 3.0303030303030303
Rule: tropical fruit -> pip fruit
Support: 0.013
Confid