<a href="https://colab.research.google.com/github/fkivuti/Care-5-Retail-Market-Basket-Project/blob/main/Market_Basket_Analysis_Projectwk10.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Practice Notebook: Market Basket Analysis

# Background Information

Care five is a German multinational retail corporation headquartered in Berlin, Germany.
It is the eighth-largest retailer in the world by revenue. It operates a chain of
hypermarkets, grocery stores, and convenience stores, which, as of January 2021,
comprised 12,00 stores in over 30 countries.

# Problem statement

As a data analyst working for one of the stores, I must perform market basket
analysis to help the store maximize revenue. More specifically, my task is to analyze transactional data to identify the top 10 product combinations likely to be purchased together.

# Tasks to be performed

- Define the business question
- Perform data importation and loading
- Perform data preprocessing
- Find frequent itemsets
- Generate association rules
- Interpret the metrics and provide recommendations

# Pre-requisites

In [None]:
# Import the required libraries
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

# Data importation and loading

In [None]:
# Step 1: Loading and Previewing the dataset
# ---
# 
care5_df = pd.read_csv("MarketBasketAnalysis.csv")
care5_df.head()

Unnamed: 0,A,Quantity,Transaction,Store,Product
0,30000,2,93194,6,Magazine
1,30001,2,93194,6,Candy Bar
2,30002,2,93194,6,Candy Bar
3,30003,2,93194,6,Candy Bar
4,30004,2,93194,6,Candy Bar


# Data processing

In [None]:
# Step 1: Data processing 
# ---
# We group the dataframe by Transaction and Product
# and count how many rows each pair has.
care5_df2 = care5_df.groupby(["Transaction","Product"]).size().reset_index(name="Count")
care5_df2.head()

Unnamed: 0,Transaction,Product,Count
0,93194,Candy Bar,4
1,93194,Magazine,1
2,93197,Pencils,1
3,93200,Candy Bar,3
4,93200,Magazine,1


In [None]:
# Step 2: We consolidate the items into one row per transaction,
# with one column per product holding its count (0 if absent).
# The counts are converted to 0/1 in Step 3.
# ---
#
care5_df3 = (care5_df2.groupby(['Transaction', 'Product'])['Count']
          .sum().unstack().reset_index().fillna(0)
          .set_index('Transaction'))

care5_df3.head()

Transaction,Bow,Candy Bar,Deodorant,Greeting Cards,Magazine,Markers,Pain Reliever,Pencils,Pens,Perfume,Photo Processing,Prescription Med,Shampoo,Soap,Toothbrush,Toothpaste,Wrapping Paper
93194,0.0,4.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93197,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93200,0.0,3.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93206,0.0,0.0,0.0,1.0,1.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93212,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0


In [None]:
# Step 3: Data processing
# ---
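# We then use our custom encoding function to convert
# all the values to 0 or 1.
# The Apriori implementation expects a 0/1 (or boolean) matrix, not counts.
# ---
#
def encode_units(x):
    # Any count of 1 or more means the product is in the basket.
    return 1 if x >= 1 else 0

# Note: pandas 2.1+ deprecates applymap in favour of DataFrame.map.
care5_df4 = care5_df3.applymap(encode_units)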

care5_df4.head()

Transaction,Bow,Candy Bar,Deodorant,Greeting Cards,Magazine,Markers,Pain Reliever,Pencils,Pens,Perfume,Photo Processing,Prescription Med,Shampoo,Soap,Toothbrush,Toothpaste,Wrapping Paper
93194,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
93197,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
93200,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
93206,0,0,0,1,1,0,0,1,0,0,0,0,0,0,0,0,0
93212,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
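As a compact alternative to Steps 1 to 3, the same basket matrix can be sketched in one call with `pd.crosstab`. The toy rows below are an assumption that mirrors the first rows of the raw data preview, and the names `raw` and `basket` are illustrative only. The result is boolean rather than 0/1, which mlxtend's `apriori` should also accept.

```python
import pandas as pd

# Toy rows mirroring the raw data preview (illustrative subset, not the full file).
raw = pd.DataFrame({
    "Transaction": [93194, 93194, 93194, 93194, 93194, 93197],
    "Product": ["Magazine", "Candy Bar", "Candy Bar",
                "Candy Bar", "Candy Bar", "Pencils"],
})

# crosstab counts rows per (Transaction, Product); "> 0" turns counts into
# True/False flags, i.e. "was this product in the basket at all?".
basket = pd.crosstab(raw["Transaction"], raw["Product"]) > 0

print(basket)
```

On the real data, `raw` would simply be `care5_df`; the custom `encode_units` function is then not needed.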


In [None]:
# Step 4: Generating frequent itemsets
# ---
# We generate the frequent itemsets with the apriori() function,
# passing these parameters:
# ---
# care5_df4 - our one-hot encoded transactional dataset
# min_support = 0.01 - the minimum support threshold, set at 1%
# use_colnames = True - show the product names in the itemsets column.
# With use_colnames = False the itemsets are shown as column indices.
# ---
# 
bs_frequent_itemsets = apriori(care5_df4, min_support=0.01, use_colnames=True)
bs_frequent_itemsets.head()

Unnamed: 0,support,itemsets
0,0.051591,(Bow)
1,0.175736,(Candy Bar)
2,0.15284,(Greeting Cards)
3,0.231936,(Magazine)
4,0.020071,(Pain Reliever)
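To make the `support` column concrete, here is a minimal sketch that counts support by brute force on the five baskets visible in the one-hot preview. It is not the Apriori algorithm itself (there is no candidate pruning), the helper `frequent_itemsets` is hypothetical, and the 0.4 threshold is chosen only because the toy sample is so small.

```python
from itertools import combinations

# The five baskets shown in the one-hot preview (illustrative subset only).
transactions = {
    93194: {"Candy Bar", "Magazine"},
    93197: {"Pencils"},
    93200: {"Candy Bar", "Magazine"},
    93206: {"Greeting Cards", "Magazine", "Pencils"},
    93212: {"Toothbrush"},
}

def frequent_itemsets(baskets, min_support, max_len=2):
    """Brute-force support counting (hypothetical helper, no Apriori pruning)."""
    n = len(baskets)
    items = sorted(set().union(*baskets.values()))
    result = {}
    for size in range(1, max_len + 1):
        for candidate in combinations(items, size):
            # Support = share of baskets that contain every item of the candidate.
            hits = sum(1 for basket in baskets.values() if set(candidate) <= basket)
            support = hits / n
            if support >= min_support:
                result[frozenset(candidate)] = support
    return result

supports = frequent_itemsets(transactions, min_support=0.4)
for itemset, support in sorted(supports.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

Apriori reaches the same itemsets more efficiently by only extending itemsets that already meet the threshold.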


In [None]:
# Step 5: Finding the association rules
# ---
# We keep only the rules whose lift is at least 1.
care5_rules = association_rules(bs_frequent_itemsets, metric="lift", min_threshold=1)

# Sorting the rules by confidence, highest first
care5_rules.sort_values("confidence", ascending=False, inplace=True)

# Previewing the top 10 association rules
care5_rules.head(10)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
46,"(Pencils, Toothpaste)",(Candy Bar),0.022748,0.175736,0.011002,0.48366,2.752198,0.007005,1.596359
22,"(Greeting Cards, Magazine)",(Candy Bar),0.037467,0.175736,0.017247,0.460317,2.61937,0.010662,1.527313
40,"(Toothpaste, Magazine)",(Candy Bar),0.029884,0.175736,0.013232,0.442786,2.51961,0.007981,1.47926
28,"(Toothpaste, Greeting Cards)",(Candy Bar),0.033304,0.175736,0.01457,0.4375,2.48953,0.008718,1.465358
21,"(Candy Bar, Magazine)",(Greeting Cards),0.039994,0.15284,0.017247,0.431227,2.821431,0.011134,1.489452
51,"(Pencils, Magazine)",(Greeting Cards),0.028546,0.15284,0.012043,0.421875,2.760244,0.00768,1.465358
50,"(Pencils, Greeting Cards)",(Magazine),0.029884,0.231936,0.012043,0.402985,1.737486,0.005112,1.286508
20,"(Candy Bar, Greeting Cards)",(Magazine),0.04609,0.231936,0.017247,0.374194,1.61335,0.006557,1.227319
57,"(Toothpaste, Magazine)",(Greeting Cards),0.029884,0.15284,0.011151,0.373134,2.441344,0.006583,1.351422
34,"(Pencils, Magazine)",(Candy Bar),0.028546,0.175736,0.010407,0.364583,2.074609,0.005391,1.297202
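The ranking depends on the metric used for sorting. As a rough illustration, the sketch below re-sorts the ten rules shown in the output by lift instead of confidence. The values are copied from that output, the variable names are illustrative, and only these ten rows are considered, not the full rule set.

```python
# (antecedents, consequent, confidence, lift) copied from the ten rules in the output.
top_by_confidence = [
    (("Pencils", "Toothpaste"), "Candy Bar", 0.483660, 2.752198),
    (("Greeting Cards", "Magazine"), "Candy Bar", 0.460317, 2.619370),
    (("Toothpaste", "Magazine"), "Candy Bar", 0.442786, 2.519610),
    (("Toothpaste", "Greeting Cards"), "Candy Bar", 0.437500, 2.489530),
    (("Candy Bar", "Magazine"), "Greeting Cards", 0.431227, 2.821431),
    (("Pencils", "Magazine"), "Greeting Cards", 0.421875, 2.760244),
    (("Pencils", "Greeting Cards"), "Magazine", 0.402985, 1.737486),
    (("Candy Bar", "Greeting Cards"), "Magazine", 0.374194, 1.613350),
    (("Toothpaste", "Magazine"), "Greeting Cards", 0.373134, 2.441344),
    (("Pencils", "Magazine"), "Candy Bar", 0.364583, 2.074609),
]

# Re-rank the same ten rules by lift, highest first.
by_lift = sorted(top_by_confidence, key=lambda rule: rule[3], reverse=True)

for antecedents, consequent, confidence, lift in by_lift[:3]:
    print(f"{' + '.join(antecedents)} -> {consequent}: "
          f"lift={lift:.2f}, confidence={confidence:.2f}")
```

In the notebook itself, `care5_rules.sort_values("lift", ascending=False).head(10)` would rank all generated rules by lift rather than only the ten previewed here.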


**Observation**

* The output above shows the top 10 association rules sorted by confidence. In every one of them, support is at least 1% and lift is above 1.

* The first rule, "if Pencils and Toothpaste, then Candy Bar", has a support of 0.011002, which means that about 1.1% of all transactions contain Pencils, Toothpaste, and Candy Bar together.

* Its confidence of 0.48 means that Candy Bar appears in about 48% of the transactions that contain both Pencils and Toothpaste.

* The lift of 2.75 (greater than 1) shows that Candy Bar purchases are positively associated with purchases of Pencils and Toothpaste, rather than being independent of them.

* Put differently, a lift of 2.75 means that a transaction containing Pencils and Toothpaste is about 2.75 times as likely to contain Candy Bar as a transaction picked at random.

* Therefore, we can conclude that there is evidence that the purchase of Pencils and Toothpaste is associated with the purchase of Candy Bar. Placing the three items near each other could therefore increase the store's revenue.
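The metrics of the first rule can be re-derived from its three support values, which is a quick way to check the interpretation. The sketch below uses the standard definitions of confidence, lift, leverage, and conviction; the inputs are read from the rules table, and the variable names are illustrative.

```python
# Values read from the first rule in the table: (Pencils, Toothpaste) -> (Candy Bar).
antecedent_support = 0.022748   # share of baskets with Pencils and Toothpaste
consequent_support = 0.175736   # share of baskets with Candy Bar
rule_support = 0.011002         # share of baskets with all three items

# confidence = P(consequent | antecedent)
confidence = rule_support / antecedent_support
# lift = confidence relative to the baseline rate of the consequent
lift = confidence / consequent_support
# leverage = observed joint support minus the support expected under independence
leverage = rule_support - antecedent_support * consequent_support
# conviction = how much more often the rule would fail if the items were independent
conviction = (1 - consequent_support) / (1 - confidence)

print(f"confidence={confidence:.4f} lift={lift:.4f} "
      f"leverage={leverage:.4f} conviction={conviction:.4f}")
```

The recomputed values match the table up to rounding of the inputs: confidence is about 0.48 and lift is about 2.75.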
