Link: https://analyticsindiamag.com/hands-on-guide-to-market-basket-analysis-with-python-codes/

# Market Basket Analysis using Apriori Method

### Association Rule Learning

Association rule learning is a rule-based machine learning approach that discovers relationships between variables in a dataset. It has major applications in retail, including e-commerce: it reveals which products tend to be bought together, so that those products can be offered to customers as a bundle.

Association rule learning has three popular algorithms: Apriori, Eclat, and FP-Growth. In this article, we discuss the Apriori method.

## Apriori Algorithm in Market Basket Analysis

Apriori is a popular algorithm for market basket analysis. It is used for frequent itemset mining and association rule learning over databases of transactions. It takes a bottom-up approach: frequent itemsets are extended one item at a time, and each group of candidates is tested against the dataset, with candidates pruned using the fact that every subset of a frequent itemset must itself be frequent. This process continues until no further extensions are found. The resulting rules are evaluated with three measures: support, confidence, and lift.
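To make the bottom-up search concrete, here is a minimal pure-Python sketch of the level-wise frequent-itemset step. It is not apyori's implementation; the function name `apriori_frequent_itemsets` and the toy baskets are illustrative assumptions. The sketch uses only support; confidence and lift come into play afterwards, when rules are formed from the frequent itemsets.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Level-wise (bottom-up) search for frequent itemsets.

    Returns {frozenset(itemset): support}, where support is the fraction of
    transactions that contain every item of the itemset.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for basket in baskets if itemset <= basket) / n

    # Level 1: every single item that meets the support threshold.
    frequent = {}
    current = set()
    for item in {i for basket in baskets for i in basket}:
        candidate = frozenset([item])
        s = support(candidate)
        if s >= min_support:
            frequent[candidate] = s
            current.add(candidate)

    k = 2
    while current:
        # Join: extend frequent (k-1)-itemsets by one item to form k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: a k-itemset can only be frequent if all its (k-1)-subsets are.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Test the surviving candidates against the data.
        current = set()
        for c in candidates:
            s = support(c)
            if s >= min_support:
                frequent[c] = s
                current.add(c)
        k += 1
    return frequent


# Made-up baskets, for illustration only.
toy_transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
found = apriori_frequent_itemsets(toy_transactions, 0.6)
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With `min_support=0.6` (3 of 5 baskets) the search stops at pairs: the only 3-item candidate that survives pruning, `{bread, milk, diapers}`, appears in just 2 baskets.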

Loading the required libraries

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from apyori import apriori  # third-party package: pip install apyori
```

```python
# Loading the dataset. header=None because the file has no header row, so the
# first transaction is kept as data. Adjust the path to where the CSV is saved.
dataset = pd.read_csv("Market_Basket_Optimisation.csv", header=None)

dataset.head(5)
```

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | shrimp | almonds | avocado | vegetables mix | green grapes | whole weat flour | yams | cottage cheese | energy drink | tomato juice | low fat yogurt | green tea | honey | salad | mineral water | salmon | antioxydant juice | frozen smoothie | spinach | olive oil |
| 1 | burgers | meatballs | eggs |
| 2 | chutney |
| 3 | turkey | avocado |
| 4 | mineral water | milk | energy bar | whole wheat rice | green tea |


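The `info()` output below comes from an earlier run that read the file without `header=None`. pandas therefore used the first transaction as column names and counted only 7,500 rows; read with `header=None`, as above, the dataset has 7,501 transactions, each padded to 20 columns.

```python
dataset.info()
```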

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7500 entries, 0 to 7499
Data columns (total 20 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   shrimp             7500 non-null   object
 1   almonds            5746 non-null   object
 2   avocado            4388 non-null   object
 3   vegetables mix     3344 non-null   object
 4   green grapes       2528 non-null   object
 5   whole weat flour   1863 non-null   object
 6   yams               1368 non-null   object
 7   cottage cheese     980 non-null    object
 8   energy drink       653 non-null    object
 9   tomato juice       394 non-null    object
 10  low fat yogurt     255 non-null    object
 11  green tea          153 non-null    object
 12  honey              86 non-null     object
 13  salad              46 non-null     object
 14  mineral water      24 non-null     object
 15  salmon             7 non-null      object
 16  antioxydant juice  3 non
```

Once the dataset is loaded, we need the list of items in each transaction, so we run two nested loops: one over the transactions (rows) and one over the 20 columns of each transaction. The resulting list of lists is the training set from which the association rules are generated.

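```python
# Getting the list of transactions from the dataset: one list of item names per row.
# Note: empty cells are NaN, and str(NaN) adds the string 'nan' to every short basket.
values = dataset.values  # 2-D array: one row per transaction, 20 columns
transactions = []
for i in range(values.shape[0]):  # 7501 transactions when read with header=None
    transactions.append([str(values[i, j]) for j in range(values.shape[1])])
```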
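Because pandas pads short rows with NaN, `str()` turns every empty cell into the string `'nan'`, which apriori then treats as a product. A sketch of a variant that skips empty cells follows; `row_to_basket` is a hypothetical helper, and with it the number of rules will likely differ from the 102 reported at the end of this notebook.

```python
import math


def row_to_basket(row):
    """Hypothetical helper: one CSV row -> list of item names, skipping empty cells.

    read_csv fills missing cells with float NaN, and str(NaN) is the string
    'nan', which apriori would otherwise count as a product.
    """
    basket = []
    for cell in row:
        if cell is None or (isinstance(cell, float) and math.isnan(cell)):
            continue  # empty cell: nothing bought in this position
        basket.append(str(cell))
    return basket


# Toy rows shaped like dataset.values: short baskets padded with NaN.
nan = float("nan")
rows = [
    ["burgers", "meatballs", "eggs", nan, nan],
    ["chutney", nan, nan, nan, nan],
]
print([row_to_basket(r) for r in rows])  # [['burgers', 'meatballs', 'eggs'], ['chutney']]
```

In the notebook this would become `transactions = [row_to_basket(row) for row in dataset.values]`.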

With the training set ready, we run the Apriori algorithm to learn association rules from it. Suppose we only care about products bought at least 3 times a day. Assuming the data covers one week of sales, the minimum support is 3 purchases per day multiplied by 7 days, divided by the total number of transactions: (3 * 7) / 7501 ≈ 0.0028, which we round to 0.003. We want rules with at least 30% confidence, so the minimum confidence is 0.3. The minimum lift is 3, and the minimum length is 2 because we want associations between at least two items; note that apyori only accepts a maximum length (`max_length`), so `min_length` is ignored, but the lift threshold already excludes single-item results. These hyperparameters can be tuned to the business requirements.
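The threshold arithmetic, and what the three measures mean, can be sketched as follows. Support is the share of baskets containing the itemset, confidence is the share of baskets containing the rule's left side that also contain its right side, and lift is that confidence divided by the right side's own support. The helper names (`min_support_for`, `rule_metrics`, `passes`) and the toy baskets are hypothetical; the default thresholds are the ones used in this notebook.

```python
def min_support_for(purchases_per_day, days, n_transactions):
    """Support of an itemset bought `purchases_per_day` times a day for `days` days."""
    return purchases_per_day * days / n_transactions


def rule_metrics(transactions, base, add):
    """Support, confidence and lift of the rule `base -> add` (both sets of items)."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    n_base = sum(1 for b in baskets if base <= b)          # baskets containing base
    n_add = sum(1 for b in baskets if add <= b)            # baskets containing add
    n_both = sum(1 for b in baskets if (base | add) <= b)  # baskets containing both
    support = n_both / n
    confidence = n_both / n_base     # P(add | base)
    lift = confidence / (n_add / n)  # confidence relative to add's own support
    return support, confidence, lift


def passes(support, confidence, lift,
           min_support=0.003, min_confidence=0.30, min_lift=3):
    """True if a rule meets all three thresholds used in this notebook."""
    return support >= min_support and confidence >= min_confidence and lift >= min_lift


print(round(min_support_for(3, 7, 7501), 4))  # 0.0028

# Made-up baskets, for illustration only.
toy = [["bread", "butter"], ["bread", "butter", "milk"], ["bread"], ["milk"], ["eggs"]]
sup, conf, lift = rule_metrics(toy, {"bread"}, {"butter"})
print(round(sup, 3), round(conf, 3), round(lift, 3))  # 0.4 0.667 1.667
print(passes(sup, conf, lift))  # False: lift 1.667 is below 3
```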

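```python
# Training the Apriori algorithm on the dataset.
# Note: apyori reads only min_support, min_confidence, min_lift and max_length;
# min_length is not one of its options and is ignored.
rule_list = apriori(transactions, min_support=0.003, min_confidence=0.30, min_lift=3, min_length=2)
```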

```python
print(rule_list)
```

```
<generator object apriori at 0x000002178629C120>
```
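The printout shows that `apriori` returns a generator rather than a list, so its rules are produced lazily and can be iterated only once. The sketch below uses a hypothetical stand-in generator, `fake_rules`, to show the consequence: a second `list()` call returns nothing, so the results should be stored in a variable the first time they are materialized.

```python
def fake_rules():
    """Hypothetical stand-in for apyori's apriori(): yields records lazily."""
    yield "rule 1"
    yield "rule 2"


rule_gen = fake_rules()
first_pass = list(rule_gen)   # ['rule 1', 'rule 2']
second_pass = list(rule_gen)  # []: the generator is already exhausted
print(first_pass, second_pass)
```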


```python
# Converting the generator to a list consumes it; rule_list is empty afterwards.
list(rule_list)
```

```
[RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'escalope', 'pasta'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'herb & pepper', 'ground beef'}), support=0.015997866951073192, ordered_statistics=[OrderedStatistic(items_base=frozenset({'herb & pepper'}), items_add=frozenset({'ground beef'}), confidence=0.3234501347708895, lift=3.2919938411349285)]),
 RelationRecord(items=frozenset({'tomato sauce', 'ground beef'}), support=0.005332622317024397, ordered_statistics=[OrderedStatistic(items_base=frozenset({'tomato sauce'}), items_add=frozenset({'groun
```

Executing the code above generates the association rules between the retail items. To view them, we run the following code.

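```python
# Visualizing the list of rules.
# rule_list is a one-shot generator and was already consumed by list(rule_list)
# above, so run apriori again and keep the results as a list.
results = list(apriori(transactions, min_support=0.003, min_confidence=0.30, min_lift=3, min_length=2))
for rule in results:
    print('\n')
    print(rule)
    print('**********')
```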
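The raw `RelationRecord` printout is hard to scan. A minimal sketch that flattens the records into one row per rule, sorted by lift, relies only on the field names visible in the output above (`items`, `support`, `ordered_statistics`, `items_base`, `items_add`, `confidence`, `lift`). The namedtuples defined here are stand-ins so the example runs without apyori, and `records_to_rows` is a hypothetical helper; the two example records copy values printed above.

```python
from collections import namedtuple

# Stand-ins with the same field names as the records printed above.
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])


def records_to_rows(records):
    """Flatten apriori results into one dict per rule, sorted by lift (descending)."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append({
                "if_bought": ", ".join(sorted(stat.items_base)),
                "then_bought": ", ".join(sorted(stat.items_add)),
                "support": record.support,
                "confidence": stat.confidence,
                "lift": stat.lift,
            })
    rows.sort(key=lambda row: row["lift"], reverse=True)
    return rows


example = [
    RelationRecord(frozenset({"escalope", "mushroom cream sauce"}), 0.005732568990801226,
                   [OrderedStatistic(frozenset({"mushroom cream sauce"}), frozenset({"escalope"}),
                                     0.3006993006993007, 3.790832696715049)]),
    RelationRecord(frozenset({"escalope", "pasta"}), 0.005865884548726837,
                   [OrderedStatistic(frozenset({"pasta"}), frozenset({"escalope"}),
                                     0.3728813559322034, 4.700811850163794)]),
]
for row in records_to_rows(example):
    print(f"{row['if_bought']} -> {row['then_bought']}: "
          f"conf={row['confidence']:.3f}, lift={row['lift']:.2f}")
```

With the notebook's own results, `pd.DataFrame(records_to_rows(results))` would give a table that can be sorted and filtered further.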

As the output above shows, each rule is reported with its support, confidence and lift. The first rule, mushroom cream sauce → escalope, has a confidence of about 30%: roughly 30% of the transactions that contain mushroom cream sauce also contain escalope. The next rule, pasta → escalope, has a confidence of 37.29%. There are 102 rules generated in this experiment. The number of rules depends on the hyperparameter values; raising the minimum confidence (or lift) yields fewer, stronger rules.
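To see where the 30% confidence of the first rule comes from, the printed support, confidence and lift can be turned back into basket counts. Assuming 7,501 transactions, the printed values are consistent with 43 baskets containing both items, 143 containing mushroom cream sauce and 595 containing escalope; these counts are back-calculated from the output, not read from the data.

```python
n = 7501  # total transactions in this dataset

# Counts back-calculated from the printed values of the first rule
# (mushroom cream sauce -> escalope); they are not read from the CSV.
n_both = 43       # baskets with both mushroom cream sauce and escalope
n_sauce = 143     # baskets with mushroom cream sauce
n_escalope = 595  # baskets with escalope

support = n_both / n                  # P(sauce and escalope)
confidence = n_both / n_sauce         # P(escalope | sauce)
lift = confidence / (n_escalope / n)  # confidence / P(escalope)

print(f"support={support:.6f} confidence={confidence:.4f} lift={lift:.4f}")
# support=0.005733 confidence=0.3007 lift=3.7908
```

A lift of about 3.8 means escalope is roughly 3.8 times more likely to appear in a basket that contains mushroom cream sauce than in an average basket.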

This is one way to perform market basket analysis with association rule learning. In this experiment, we used the Apriori algorithm; other algorithms such as Eclat and FP-Growth can be used for the same purpose.