Market basket analysis is used by companies to identify items that are frequently purchased together. 

**How Does Market Basket Analysis Work?**

Market basket analysis is frequently used by restaurants, retail stores, and online shopping platforms to encourage customers to make more purchases in a single visit. It is a marketing use case of data science that aims to increase sales and drive business growth, and it commonly relies on the Apriori algorithm.

**What is the Apriori Algorithm?**

The Apriori algorithm is the most common technique for performing market basket analysis.

It is used for association rule mining, a rule-based process for identifying associations between the items that customers purchase.
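As a rough illustration of how Apriori searches for frequent item combinations, here is a minimal pure-Python sketch. It assumes the usual level-wise formulation: start with frequent single items, join them into larger candidates, and discard any candidate that has an infrequent subset. The function name `apriori_frequent_itemsets` and the 0.5 threshold are illustrative choices, not part of any library, and the four baskets are the same ones tabulated later in this section.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset(items): support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset
        return sum(itemset <= t for t in transactions) / n

    # Level 1: keep the single items that are frequent on their own
    items = {item for t in transactions for item in t}
    current = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: a candidate survives only if all its (k-1)-subsets are frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent

baskets = [
    {"Milk", "Beer", "Eggs", "Bread"},
    {"Milk", "Bread"},
    {"Milk", "Bread", "Apples"},
    {"Bread", "Bananas", "Apples"},
]
frequent = apriori_frequent_itemsets(baskets, min_support=0.5)
for itemset, value in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), value)
```

The prune step is what keeps the search manageable: an itemset can only be frequent if every smaller itemset inside it is also frequent, so most large combinations are never counted.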

**What Are the Components of the Apriori Algorithm?**

The Apriori algorithm has three main components:

*   Support
*   Lift
*   Confidence

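To illustrate these components, consider four shopping baskets. Here is a tabular representation of this purchase data, where 1 means the item is in the basket and 0 means it is not:

|         | Milk | Beer | Eggs | Bread | Bananas | Apples |
|---------|------|------|------|-------|---------|--------|
| Basket1 | 1    | 1    | 1    | 1     | 0       | 0      |
| Basket2 | 1    | 0    | 0    | 1     | 0       | 0      |
| Basket3 | 1    | 0    | 0    | 1     | 0       | 1      |
| Basket4 | 0    | 0    | 0    | 1     | 1       | 1      |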


Let’s calculate the support, confidence, and lift.

**Support**

The first component of the Apriori algorithm is support. We use it to assess the overall popularity of a given product with the following formula:

Support(item) = Transactions comprising the item / Total transactions

In the table above, milk appears in three of the four baskets, so Support(Milk) = 3/4 = 0.75. Bread appears in all four, so Support(Bread) = 4/4 = 1.

A high support value indicates that the item is present in most purchases, so marketers may want to give it more attention.
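The support formula can be checked against the example table with a few lines of Python. This is a minimal sketch: the `baskets` list simply restates the table, and the helper name `support` is an illustrative choice.

```python
# Toy baskets copied from the table in this section
baskets = [
    {"Milk", "Beer", "Eggs", "Bread"},   # Basket1
    {"Milk", "Bread"},                   # Basket2
    {"Milk", "Bread", "Apples"},         # Basket3
    {"Bread", "Bananas", "Apples"},      # Basket4
]

def support(item, baskets):
    """Fraction of baskets that contain the item."""
    return sum(item in basket for basket in baskets) / len(baskets)

# Compute support for every distinct item in the data
all_items = sorted(set().union(*baskets))
supports = {item: support(item, baskets) for item in all_items}

for item, value in supports.items():
    print(f"{item}: {value:.2f}")
```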

**Confidence**

Confidence tells us how likely a customer is to buy one item given that they have already bought another. We calculate it using the following formula:

Confidence (Bread -> Milk) = Transactions comprising bread and milk / Transactions comprising bread = 3/4 = 0.75

**Lift**

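Finally, lift measures how much more likely milk is to be sold when bread is sold, compared with how often milk sells overall. It divides the confidence of the rule by the support of the item on the right-hand side:

Lift (Bread -> Milk) = Confidence (Bread -> Milk) / Support(Milk) = 0.75/0.75 = 1

A lift of 1 means that, in this small example, customers who buy bread are no more likely to buy milk than customers in general. A lift above 1 indicates that two items are bought together more often than their individual popularity would suggest, and a lift below 1 indicates that they are bought together less often.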
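To make confidence and lift concrete, the sketch below recomputes both for the example baskets. It assumes the standard definition of lift, in which the confidence of a rule is divided by the support of the item on the right-hand side of the rule. The helper names are illustrative.

```python
# Toy baskets copied from the table in this section
baskets = [
    {"Milk", "Beer", "Eggs", "Bread"},   # Basket1
    {"Milk", "Bread"},                   # Basket2
    {"Milk", "Bread", "Apples"},         # Basket3
    {"Bread", "Bananas", "Apples"},      # Basket4
]

def support(itemset, baskets):
    """Fraction of baskets containing every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Share of baskets with the antecedent that also contain the consequent."""
    return support({antecedent, consequent}, baskets) / support({antecedent}, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence of the rule relative to the consequent's overall support."""
    return confidence(antecedent, consequent, baskets) / support({consequent}, baskets)

conf_bread_milk = confidence("Bread", "Milk", baskets)
lift_bread_milk = lift("Bread", "Milk", baskets)
lift_apples_bananas = lift("Apples", "Bananas", baskets)

print("Confidence(Bread -> Milk):", conf_bread_milk)
print("Lift(Bread -> Milk):", lift_bread_milk)
print("Lift(Apples -> Bananas):", lift_apples_bananas)
```

Because bread is in every basket, knowing that a customer bought bread says nothing new about milk, which is why that lift comes out at exactly 1. The apples and bananas rule gives a lift above 1 because bananas are rare overall but appear in half of the baskets that contain apples.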

**Step 1: Pre-Requisites for Performing Market Basket Analysis**

Download the dataset "Groceries_dataset.csv".

**Step 2: Reading the Dataset**

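In [None]:
import pandas as pd
from google.colab import drive

# Mount Google Drive so the notebook can read the CSV file
drive.mount('/content/drive')

# Use an absolute path: the mount point starts with a leading slash
df = pd.read_csv('/content/drive/My Drive/Data/Groceries_dataset.csv')
df.head()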


**Step 3: Data Preparation for Market Basket Analysis**

Before we perform market basket analysis, we need to convert this data into a format that can easily be ingested into the Apriori algorithm. In other words, we need to turn it into a tabular structure comprising ones and zeros, as displayed in the bread and milk example above.

To achieve this, first group the items that have the same member number and date into a single transaction:

In [None]:
df['single_transaction'] = df['Member_number'].astype(str) + '_' + df['Date'].astype(str)

df.head()
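To see what the transaction key looks like, here is a small sketch on a hand-made frame. The column names match the ones used above, but the four rows are hypothetical values invented for illustration, not rows from the real dataset.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the dataset
sample = pd.DataFrame({
    "Member_number": [1001, 1001, 1002, 1001],
    "Date": ["01-01-2015", "01-01-2015", "01-01-2015", "02-01-2015"],
    "itemDescription": ["whole milk", "rolls/buns", "soda", "yogurt"],
})

# Same member on the same date -> same transaction key
sample["single_transaction"] = (
    sample["Member_number"].astype(str) + "_" + sample["Date"].astype(str)
)

n_transactions = sample["single_transaction"].nunique()
print(sample)
print("Distinct transactions:", n_transactions)
```

The first two rows share a key because they have the same member and date, so they will be treated as one basket in the next step.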

Now, let’s pivot this table so that each item becomes a column and each transaction becomes a row:

In [None]:
df2 = pd.crosstab(df['single_transaction'], df['itemDescription'])
df2.head()
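The following sketch shows what `pd.crosstab` produces on a tiny hypothetical input (the rows are invented for illustration). Note that the cells hold counts, so an item that appears twice in one transaction yields a 2 rather than a 1.

```python
import pandas as pd

# Hypothetical rows: "whole milk" is listed twice in the first transaction
sample = pd.DataFrame({
    "single_transaction": [
        "1001_01-01-2015", "1001_01-01-2015", "1001_01-01-2015", "1002_01-01-2015",
    ],
    "itemDescription": ["whole milk", "whole milk", "rolls/buns", "soda"],
})

# Rows are transactions, columns are items, cells are occurrence counts
counts = pd.crosstab(sample["single_transaction"], sample["itemDescription"])
print(counts)
```

Counts greater than 1 are the reason an additional encoding step is needed before the data matches the ones-and-zeros layout.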

The final data pre-processing step involves encoding all values in the above data frame to 0 and 1.

In [None]:
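def encode(item_freq):
    # Any positive purchase count becomes 1; zero stays 0
    return 1 if item_freq > 0 else 0

# Apply the encoder to every cell of the count table.
# Note: pandas 2.1+ deprecates applymap in favor of DataFrame.map.
basket_input = df2.applymap(encode)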
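As an alternative to applying a Python function cell by cell, the same encoding can be done with a vectorized comparison. The sketch below is one possible approach, shown on a small hypothetical count table, and checks that both routes give identical results.

```python
import pandas as pd

# Hypothetical count table in the same shape as the crosstab output
counts = pd.DataFrame(
    {"rolls/buns": [1, 0], "soda": [0, 1], "whole milk": [2, 0]},
    index=["t1", "t2"],
)

def encode(item_freq):
    # Any positive purchase count becomes 1; zero stays 0
    return 1 if item_freq > 0 else 0

# Element-wise route: map the function over each column
elementwise = counts.apply(lambda col: col.map(encode))

# Vectorized route: compare the whole frame at once, then cast to integers
vectorized = (counts > 0).astype(int)

same_result = bool((elementwise.values == vectorized.values).all())
print(vectorized)
print("Both approaches agree:", same_result)
```

On a large transaction table the vectorized form is usually much faster, because the comparison runs on whole columns instead of calling a Python function for each cell.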

**Step 4: Build the Apriori Algorithm for Market Basket Analysis**

Now, let’s import the Apriori algorithm from the mlxtend Python package and use it to discover item combinations that are frequently bought together:

In [None]:
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

frequent_itemsets = apriori(basket_input, min_support=0.001, use_colnames=True)

rules = association_rules(frequent_itemsets, metric="lift")

rules.head()

To get the most frequent item combinations in the entire dataset, let’s sort the rules by support, confidence, and lift, in descending order:

In [None]:
rules.sort_values(["support", "confidence", "lift"], axis=0, ascending=False).head(8)
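When `sort_values` receives several columns, the first column is the primary key and later columns only break ties. The sketch below demonstrates this on a toy table whose rule names and metric values are hypothetical, chosen only to show the ordering behavior.

```python
import pandas as pd

# Hypothetical rules and metric values, for illustration only
toy_rules = pd.DataFrame({
    "rule": ["A -> B", "C -> D", "E -> F"],
    "support": [0.02, 0.05, 0.05],
    "confidence": [0.40, 0.10, 0.30],
    "lift": [1.5, 0.9, 1.2],
})

# Support decides the order first; confidence breaks the tie between the last two rules
ranked = toy_rules.sort_values(["support", "confidence", "lift"], ascending=False)
order = ranked["rule"].tolist()
print(ranked)
```

Here "A -> B" has the highest confidence and lift but still ranks last, because its support is the lowest. If the goal is to surface the strongest associations rather than the most common ones, sorting by lift first may be a better fit.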

The resulting table shows that the four product combinations most frequently bought together are:

*   Rolls and milk
*   Yogurt and milk
*   Sausages and milk
*   Soda and vegetables

One reason for this could be that the grocery store ran a promotion on these items together or displayed them within the same line of sight to improve sales.