<a href="https://www.kaggle.com/code/elakapoor/market-basket-analysis?scriptVersionId=91109736" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [1]:
!pip install apyori

Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5974 sha256=afca70a91f9443476720fd5063f99cebfcda7a62780eb169f1a78f480a4ce64a
  Stored in directory: /root/.cache/pip/wheels/cb/f6/e1/57973c631d27efd1a2f375bd6a83b2a616c4021f24aab84080
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2


<center><h1> Market Basket Analysis using Apriori Algorithm </h1></center>
<img src = "https://images.unsplash.com/photo-1628102491629-778571d893a3?ixlib=rb-1.2.1&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=1160&q=80">

In [2]:
import numpy as np # linear algebra
import pandas as pd # data processing

from apyori import apriori

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/kaggle/input/association-rule-learningapriori/Market_Basket_Optimisation.csv


# What is Market Basket?
When we go to a supermarket or shop online, we usually try to purchase all the items we need together instead of buying each item separately. A market basket can therefore be defined as the group of items of interest that a person buys in one transaction.
Each trip to the market is a single transaction; in e-commerce, all items bought in a single login session form one transaction.

# Objective of Market Basket
1. Cross Selling: A strategy in which the seller encourages the customer to spend more by recommending related products that complement what the customer is already buying. The customer may end up spending more than he/she had originally planned.

2. Product Placement: In a supermarket you may notice that the milk products are kept together, and that as you move forward the bread-making items such as flour, butter, and eggs are placed just after them. This placement of items on the supermarket shelves follows a planogram, defined as a “diagram or model that indicates the placement of retail products on shelves in order to maximize sales”.



# Association Rules
Association rule mining is used to identify associations between items in a set and to find frequent patterns in a transactional or relational database (RDBMS). It is applied in marketing, basket data analysis (market basket analysis) in retailing, clustering, and classification. Its most common use is analyzing customer buying habits by finding associations between the different items that customers place in their “shopping baskets”.
The discovery of these associations can help retailers develop marketing strategies by gaining insight into which items are frequently purchased together by customers. The strategies may include:

1. Changing the store layout according to trends
2. Customer behavior analysis
3. Catalog design
4. Cross marketing on online stores
5. Identifying the trending items that customers buy
6. Customized emails with add-on sales, etc.

Online retailers and publishers can use this type of analysis to:

1. Inform the placement of content items on their media sites, or products in their catalog
2. Deliver targeted marketing

# Advantages of Market Basket Analysis
Market Basket Analysis (MBA) is applied to customer data from point-of-sale (PoS) systems.

**Advantages for retailers:**

1. Increases customer engagement
2. Boosts sales and increases RoI
3. Optimizes marketing strategies and campaigns
4. Identifies customer behavior and patterns

**Advantages to customers:**

1. In online retail, it helps customers by displaying more products related to the intended purchase.

# Algorithms used in Market Basket Analysis
The main objective of each algorithm is to estimate how likely items are to be bought together by customers. Some of the algorithms used for this analysis are:

1. AIS
2. SETM Algorithm
3. Apriori Algorithm
4. FP Growth

In this article we will focus on the **Apriori Algorithm**, which is one of the most widely used algorithms for this task.



# Apriori Algorithm
*Objective:*
It helps to find frequent itemsets in transactions and identifies association rules between these items. 

*Advantages:*
1. It is generally considered more accurate than the AIS and SETM algorithms and an improvement over them.
2. It is easy to implement and interpret.
3. It can be used on large datasets and can easily be parallelized.

*Disadvantages:*
1. Calculating support is expensive because it requires a pass over the entire dataset.
2. The number of candidate itemsets can grow quickly, which makes it computationally expensive.
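
To make the idea of "finding frequent itemsets" concrete, here is a minimal sketch of the level-wise search that Apriori is built on. It is not the `apyori` implementation; the function name `frequent_itemsets` and the toy baskets are assumptions made for illustration. Support here means the fraction of transactions that contain an itemset.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support.

    Hypothetical helper for illustration only, not part of apyori.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # One full pass over the data per candidate: the expensive step.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    all_items = {item for b in baskets for item in b}
    current = {}
    for item in all_items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Toy baskets (made up for this example).
toy = [
    ["milk", "bread", "butter"],
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread", "butter"],
]
result = frequent_itemsets(toy, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what gives the algorithm its name: an itemset can only be frequent if all of its subsets are frequent, so many candidates are discarded before their support is ever counted.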

*Key Metrics:*
It uses the concepts of Support, Confidence and Lift.

1. Support - The support of an item or itemset is the fraction of all transactions in which it appears.<br><br>
Support = (baskets containing shampoo and conditioner) / (total baskets) = 8/10 = 0.8<br><br>
In this example both products are bought together in 80% of the transactions. The higher the support, the more frequently the items are purchased together.<br>

2. Confidence - The likelihood that an item is purchased given that another item is purchased. In other words, it measures how often the second item appears in the transactions that contain the first.<br><br>
Confidence = (baskets containing shampoo and conditioner) / (baskets containing shampoo) = 5/6 ≈ 0.83<br><br>
In this example (which uses different counts from the support example), 83% of the time that a customer purchases shampoo, conditioner is also purchased.<br>

3. Lift - A metric that tells us by what factor the probability of buying Y increases when X is purchased: Lift(X → Y) = Confidence(X → Y) / Support(Y).
   - Lift = 1 → There is no relation between the purchases of the two products.
   - Lift > 1 → The products are likely to be bought together. The higher the lift, the higher the chances.
   - Lift < 1 → The products are unlikely to be bought together. They may be substitutes.
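
The three metrics above can be computed directly from a list of baskets. The sketch below uses five made-up baskets (not the shampoo counts quoted in the text) and hypothetical helper names `support`, `confidence`, and `lift`.

```python
# Five made-up baskets, for illustration only.
baskets = [
    {"shampoo", "conditioner"},
    {"shampoo", "conditioner", "soap"},
    {"shampoo"},
    {"conditioner"},
    {"soap"},
]


def support(items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= b for b in baskets) / len(baskets)


def confidence(x, y):
    """How often y appears in baskets that already contain x."""
    return support(set(x) | set(y)) / support(x)


def lift(x, y):
    """Confidence of x -> y relative to the baseline support of y."""
    return confidence(x, y) / support(y)


s = support({"shampoo", "conditioner"})
c = confidence({"shampoo"}, {"conditioner"})
l = lift({"shampoo"}, {"conditioner"})
print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

With these baskets the pair appears in 2 of 5 transactions (support 0.4), conditioner is present in 2 of the 3 shampoo baskets (confidence about 0.67), and the lift is slightly above 1, which indicates a weak positive association.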

In [3]:
path = "/kaggle/input/association-rule-learningapriori/Market_Basket_Optimisation.csv"
df = pd.read_csv(path, header = None)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


The dataset does not have a header row, so we pass `header=None`. Otherwise, the first transaction would be used as the header row. <br>
Let us check the number of rows and columns in the dataset, along with the null values and the column types.
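
The effect of `header=None` can be seen on a tiny in-memory CSV. The two-line string below is a made-up stand-in for the real file.

```python
import io

import pandas as pd

# Hypothetical two-line CSV with no header row, mimicking the dataset layout.
raw = "burgers,meatballs,eggs\nchutney,,\n"

# Default behaviour: the first line is consumed as column names.
with_default = pd.read_csv(io.StringIO(raw))

# header=None: every line is kept as data and columns are numbered 0, 1, 2.
with_none = pd.read_csv(io.StringIO(raw), header=None)

print(with_default.shape, list(with_default.columns))
print(with_none.shape, list(with_none.columns))
```

Without `header=None`, one transaction silently disappears into the column labels, which matters here because every row is a basket.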

In [4]:
# number of rows, columns
df.shape

(7501, 20)

In [5]:
# number of null values in each column
df.isnull().sum()

0        0
1     1754
2     3112
3     4156
4     4972
5     5637
6     6132
7     6520
8     6847
9     7106
10    7245
11    7347
12    7414
13    7454
14    7476
15    7493
16    7497
17    7497
18    7498
19    7500
dtype: int64

In [6]:
# type of column
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7501 entries, 0 to 7500
Data columns (total 20 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       7501 non-null   object
 1   1       5747 non-null   object
 2   2       4389 non-null   object
 3   3       3345 non-null   object
 4   4       2529 non-null   object
 5   5       1864 non-null   object
 6   6       1369 non-null   object
 7   7       981 non-null    object
 8   8       654 non-null    object
 9   9       395 non-null    object
 10  10      256 non-null    object
 11  11      154 non-null    object
 12  12      87 non-null     object
 13  13      47 non-null     object
 14  14      25 non-null     object
 15  15      8 non-null      object
 16  16      4 non-null      object
 17  17      4 non-null      object
 18  18      3 non-null      object
 19  19      1 non-null      object
dtypes: object(20)
memory usage: 1.1+ MB


In [7]:
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


In [8]:
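# Convert the dataframe into a list of lists (one inner list per transaction)
dataframe_list = []

# NOTE: range(1, 7501) starts at row 1, so the first row (index 0) is left out,
# and str() turns every missing cell into the string 'nan', which apriori
# then treats as an ordinary item.
for i in range(1, 7501):
    dataframe_list.append([str(df.values[i, j]) for j in range(0, 20)])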
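
The loop above pads every basket with the string `'nan'` and starts at row 1. A possible alternative, sketched here on a made-up three-row dataframe, keeps every row and drops the missing cells. Applied to the full dataset, this would use all 7501 transactions and remove `'nan'` from the itemsets, so the numbers reported in this notebook would change slightly.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset: one basket per row, NaN padding on the right.
toy_df = pd.DataFrame([
    ["burgers", "meatballs", "eggs"],
    ["chutney", np.nan, np.nan],
    ["turkey", "avocado", np.nan],
])

# Keep every row (including index 0) and skip missing cells
# instead of converting them to the string 'nan'.
transactions = [
    [item for item in row if pd.notna(item)]
    for row in toy_df.values.tolist()
]
print(transactions)
```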

In [9]:
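# Apply the Apriori algorithm: keep itemsets that occur in at least 0.45% of
# transactions and rules with confidence >= 0.2 and lift >= 3.
# (min_length is kept from the original call; apyori's own length option is
# max_length, so this argument may have no effect.)
association_rules = apriori(dataframe_list, min_support=0.0045, min_confidence=0.2,
                            min_lift=3, min_length=2)
association_results = list(association_rules)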

In [10]:
association_results[0]

RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004533333333333334, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.2905982905982906, lift=4.843304843304844)])

# Inferences from the above
1. Support: 0.0045 <br>
Support = (transactions containing chicken and light cream) / (total transactions)<br><br>
2. Confidence: 0.2906<br>
Confidence = (transactions containing chicken and light cream) / (transactions containing light cream)<br>
The confidence of 0.2906 shows that, of all the transactions that contain light cream, 29.06% also contain chicken.<br><br>
3. Lift: 4.84<br>
Lift = Confidence / ((transactions containing chicken) / (total transactions))<br>
The lift of 4.8433 tells us that customers who buy light cream are 4.8433 times more likely to buy chicken than customers in general.<br>
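
As a sanity check, the reported numbers can be rearranged to recover the individual supports of light cream and chicken. The sketch below uses only the three values printed in the `RelationRecord` above; the variable names are chosen for illustration.

```python
# Values copied from the RelationRecord output above.
support_both = 0.004533333333333334
confidence = 0.2905982905982906   # light cream -> chicken
lift = 4.843304843304844

# confidence = support_both / support(light cream)
support_light_cream = support_both / confidence
# lift = confidence / support(chicken)
support_chicken = confidence / lift

print(round(support_light_cream, 4), round(support_chicken, 4))

# Equivalent definition of lift: joint support over the product of the
# individual supports.
lift_check = support_both / (support_light_cream * support_chicken)
print(round(lift_check, 4))
```

So roughly 1.6% of baskets contain light cream and about 6% contain chicken, while 29% of the light cream baskets contain chicken, which is where the lift of about 4.84 comes from.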

# Create a dataframe

In [11]:
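# Collect the itemset and its metrics from each RelationRecord
Rule = []
Support = []
Confidence = []
Lift = []
for item in association_results:
    # item.items is the frozenset of products in the itemset
    Rule.append(list(item.items))
    Support.append(str(item.support))
    # Only the first ordered statistic (first rule) of each record is used
    first_stat = item.ordered_statistics[0]
    Confidence.append(str(first_stat.confidence))
    Lift.append(str(first_stat.lift))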

In [12]:
# Avoid naming the variable `dict`, which would shadow the built-in type
data = {'Rule': Rule, 'Support': Support, 'Confidence': Confidence, 'Lift': Lift}
d = pd.DataFrame(data)
d

Unnamed: 0,Rule,Support,Confidence,Lift
0,"[chicken, light cream]",0.0045333333333333,0.2905982905982906,4.843304843304844
1,"[escalope, mushroom cream sauce]",0.0057333333333333,0.3006993006993007,3.790327319739085
2,"[escalope, pasta]",0.0058666666666666,0.3728813559322034,4.700185158809287
3,"[herb & pepper, ground beef]",0.016,0.3234501347708895,3.2915549671393096
4,"[ground beef, tomato sauce]",0.0053333333333333,0.3773584905660377,3.840147461662528
5,"[whole wheat pasta, olive oil]",0.008,0.2714932126696833,4.130221288078346
6,"[pasta, shrimp]",0.0050666666666666,0.3220338983050848,4.514493901473151
7,"[chicken, light cream, nan]",0.0045333333333333,0.2905982905982906,4.843304843304844
8,"[chocolate, frozen vegetables, shrimp]",0.0053333333333333,0.2325581395348837,3.260160834601174
9,"[ground beef, spaghetti, cooking oil]",0.0048,0.5714285714285714,3.281557646029315
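
Because the metrics were stored with `str()`, the columns in this dataframe are text. One way to rank the rules, sketched here on three rows copied from the table above, is to cast the columns to floats and sort by lift. The names `rules` and `top` are assumptions for this example.

```python
import pandas as pd

# Three rows copied from the table above; metric values are strings.
rules = pd.DataFrame({
    "Rule": [
        ["chicken", "light cream"],
        ["escalope", "pasta"],
        ["herb & pepper", "ground beef"],
    ],
    "Support": ["0.0045333333333333", "0.0058666666666666", "0.016"],
    "Confidence": ["0.2905982905982906", "0.3728813559322034", "0.3234501347708895"],
    "Lift": ["4.843304843304844", "4.700185158809287", "3.2915549671393096"],
})

# Cast the metric columns to float so they sort numerically, not alphabetically.
rules = rules.astype({"Support": float, "Confidence": float, "Lift": float})

# Strongest associations first.
top = rules.sort_values("Lift", ascending=False).reset_index(drop=True)
print(top)
```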


# Conclusions
1. In this tutorial we went through market basket analysis using the Apriori algorithm.
2. We studied the key metrics used in the algorithm.
3. We checked the association rules and interpreted the values we got.
4. We summarized the results as a dataframe.

# Further Analysis
The above analysis can also be done using the other algorithms mentioned. We can also perform EDA on the resulting dataframe to visualize the results.<br>

I hope you enjoyed this work, which is based on internet research. If you have any feedback, please let me know in the comments section. Please upvote if you like the work!