# Apriori Algorithm
Market basket analysis is typically implemented with association rule learning, a technique for discovering interesting relationships between variables in a dataset. One of the most popular algorithms for association rule learning is Apriori, which identifies frequent itemsets and generates association rules from them. These rules are evaluated with measures such as support, confidence, and lift, which quantify the strength of the association between items.

For example, consider a dataset of customer purchases from a grocery store. An association rule might be "If a customer buys bread, they are likely to also buy butter."

The metrics used to evaluate the strength of the association between items are:

# Support: 
It is a measure of the frequency of an itemset in the dataset. Mathematically, it is calculated as follows:

  support(I) = (number of transactions containing itemset I) / (total number of transactions)


# Confidence:
It tells us how often Y is bought given that X is bought, which makes it a measure of the reliability of an association rule. Mathematically, it is calculated as follows:

  confidence(X → Y) = support(X ∪ Y) / support(X)


# Lift: 

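It is a measure of the strength of an association rule: it compares the confidence of the rule with how often Y is bought overall. A lift above 1 means X and Y occur together more often than would be expected if they were independent, and a lift below 1 means they occur together less often. Mathematically, it is calculated as follows:

  lift(X → Y) = confidence(X → Y) / support(Y)

where X and Y are the antecedent and consequent of the rule, respectively.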
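The three metrics can be checked by hand on a small example. The sketch below uses five made-up baskets (not taken from the groceries dataset), and the helper names `support`, `confidence`, and `lift` are hypothetical, chosen only to mirror the formulas.

```python
# Toy transactions, invented for illustration only.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(X ∪ Y) / support(X)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """confidence(X → Y) / support(Y)."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


X, Y = {"bread"}, {"butter"}
print(f"support(X ∪ Y)    = {support(X | Y, transactions):.2f}")
print(f"confidence(X → Y) = {confidence(X, Y, transactions):.2f}")
print(f"lift(X → Y)       = {lift(X, Y, transactions):.4f}")
```

In this toy data bread and butter each appear in 4 of 5 baskets and together in 3, so the rule bread → butter has confidence 0.75 but a lift slightly below 1: butter is so common overall that buying bread does not make it more likely.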

In [16]:
import pandas as pd
import numpy as np

In [32]:
pd.set_option("display.max_rows", 200)

In [2]:
dataset = pd.read_csv("groceries - groceries.csv")
dataset.head(3)

Unnamed: 0,Item(s),Item 1,Item 2,Item 3,Item 4,Item 5,Item 6,Item 7,Item 8,Item 9,...,Item 23,Item 24,Item 25,Item 26,Item 27,Item 28,Item 29,Item 30,Item 31,Item 32
0,4,citrus fruit,semi-finished bread,margarine,ready soups,,,,,,...,,,,,,,,,,
1,3,tropical fruit,yogurt,coffee,,,,,,,...,,,,,,,,,,
2,1,whole milk,,,,,,,,,...,,,,,,,,,,


In [5]:
dataset.isnull().sum()

Item(s)       0
Item 1        0
Item 2     2159
Item 3     3802
Item 4     5101
Item 5     6106
Item 6     6961
Item 7     7606
Item 8     8151
Item 9     8589
Item 10    8939
Item 11    9185
Item 12    9367
Item 13    9484
Item 14    9562
Item 15    9639
Item 16    9694
Item 17    9740
Item 18    9769
Item 19    9783
Item 20    9797
Item 21    9806
Item 22    9817
Item 23    9821
Item 24    9827
Item 25    9828
Item 26    9828
Item 27    9829
Item 28    9830
Item 29    9831
Item 30    9834
Item 31    9834
Item 32    9834
dtype: int64

In [6]:
dataset.columns

Index(['Item(s)', 'Item 1', 'Item 2', 'Item 3', 'Item 4', 'Item 5', 'Item 6',
       'Item 7', 'Item 8', 'Item 9', 'Item 10', 'Item 11', 'Item 12',
       'Item 13', 'Item 14', 'Item 15', 'Item 16', 'Item 17', 'Item 18',
       'Item 19', 'Item 20', 'Item 21', 'Item 22', 'Item 23', 'Item 24',
       'Item 25', 'Item 26', 'Item 27', 'Item 28', 'Item 29', 'Item 30',
       'Item 31', 'Item 32'],
      dtype='object')

In [7]:
dataset.shape

(9835, 33)

In [13]:
# Build one list of items per customer (one basket per row), skipping NaN cells.
# Only string cells are kept, so the numeric "Item(s)" count column is skipped too.
market = []
for i in range(dataset.shape[0]):
    cus = []
    for j in dataset.columns:
        if isinstance(dataset[j][i], str):  # NaN is a float, item names are strings
            cus.append(dataset[j][i])
    market.append(cus)  # add this customer's basket, free of NaN values
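The nested loop visits every cell one at a time. As an alternative sketch, the same list of baskets can be built row by row with pandas. The small `toy` DataFrame below is invented to mimic the layout of the groceries file, and the names `toy`, `item_columns`, and `baskets` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Invented stand-in with the same layout as the groceries file:
# an item-count column followed by item columns padded with NaN.
toy = pd.DataFrame(
    {
        "Item(s)": [3, 1, 2],
        "Item 1": ["citrus fruit", "whole milk", "yogurt"],
        "Item 2": ["margarine", np.nan, "coffee"],
        "Item 3": ["ready soups", np.nan, np.nan],
    }
)

# Keep only the item columns ("Item(s)" does not start with "Item ").
item_columns = [c for c in toy.columns if c.startswith("Item ")]

# For each row, drop the NaN padding and keep the remaining item names.
baskets = [row.dropna().tolist() for _, row in toy[item_columns].iterrows()]

print(baskets)
```

Selecting the item columns by name makes the intent explicit, instead of relying on the type check to skip the count column.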

In [None]:
# market

In [20]:
# Flatten the list of baskets into a single list of items
l = []
for i in market:
    for j in i:
        l.append(j)

In [21]:
import collections

In [24]:
p = collections.Counter(l)  # count how many times each product was bought

In [25]:
# p.keys()
# p.values()

In [26]:
# Build a dictionary of item names and their counts
d = {"Name item":p.keys(),"Repeated Data":p.values()}

In [33]:
# Show the most frequently bought items first
pd.DataFrame(d).sort_values(by="Repeated Data", ascending=False)

Unnamed: 0,Name item,Repeated Data
7,whole milk,2513
11,other vegetables,1903
17,rolls/buns,1809
31,soda,1715
5,yogurt,1372
24,bottled water,1087
42,root vegetables,1072
4,tropical fruit,1032
52,shopping bags,969
50,sausage,924


In [None]:
# Use mlxtend to build a boolean matrix: one row per transaction, one column per product

In [34]:
from mlxtend.preprocessing import TransactionEncoder

In [35]:
tr = TransactionEncoder()
tr.fit(market)

In [37]:
df = pd.DataFrame(tr.transform(market),columns=tr.columns_)
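`TransactionEncoder` turns the list of baskets into a boolean table with one column per distinct item and one row per transaction. The sketch below reproduces that idea with plain pandas on invented baskets, so it illustrates the shape of the result rather than mlxtend's actual implementation.

```python
import pandas as pd

# Invented baskets for illustration.
baskets = [
    ["whole milk", "yogurt"],
    ["soda"],
    ["soda", "whole milk"],
]

# One column per distinct item, in sorted order.
items = sorted({item for basket in baskets for item in basket})

# A cell is True when the item appears in that basket.
encoded = pd.DataFrame(
    [[item in basket for item in items] for basket in baskets],
    columns=items,
)

print(encoded)
# The mean of a boolean column is the support of that single item.
print(encoded.mean())
```

Because each column mean is the fraction of rows that contain the item, this table is all that is needed to compute the support of any itemset.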

In [39]:
# df

In [40]:
from mlxtend.frequent_patterns import apriori

In [41]:
apriori(df, min_support=0.07, use_colnames=True, max_len=3).sort_values(by=["support"])
# (whole milk, other vegetables) is the pair bought together most often; sort_values orders the itemsets by support (ascending)

Unnamed: 0,support,itemsets
0,0.080529,(bottled beer)
1,0.110524,(bottled water)
2,0.077682,(canned beer)
3,0.082766,(citrus fruit)
4,0.072293,(fruit/vegetable juice)
5,0.079817,(newspapers)
6,0.193493,(other vegetables)
7,0.088968,(pastry)
8,0.075648,(pip fruit)
9,0.183935,(rolls/buns)
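To see what `apriori` does conceptually, here is a minimal level-wise sketch in plain Python. It is not mlxtend's implementation: the function name `frequent_itemsets` and the toy baskets are assumptions made for illustration. It relies on the Apriori property that every subset of a frequent itemset must itself be frequent, so larger candidates are only built from itemsets that already passed the support threshold.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_len=3):
    """Return {frozenset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    singles = {frozenset([item]) for t in transactions for item in t}
    current = {s: support(s) for s in singles}
    current = {s: sup for s, sup in current.items() if sup >= min_support}
    result = dict(current)

    k = 2
    while current and k <= max_len:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c
            for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: sup for c, sup in current.items() if sup >= min_support}
        result.update(current)
        k += 1
    return result


# Invented baskets for illustration.
baskets = [
    ["whole milk", "other vegetables", "yogurt"],
    ["whole milk", "other vegetables"],
    ["whole milk", "soda"],
    ["soda", "yogurt"],
]
found = frequent_itemsets(baskets, min_support=0.5)
for itemset, sup in sorted(found.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), sup)
```

With `min_support=0.5` on these four baskets, all four single items qualify, but the only pair that survives is whole milk with other vegetables, so no three-item candidate is ever generated.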
