<a href="https://colab.research.google.com/github/seremmartin64-ops/ML/blob/main/MarketBasketAnalysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Market Basket Analysis

Market Basket Analysis is a data mining technique used to uncover relationships between items purchased together in a transaction. It helps retailers understand customer purchasing behavior by identifying patterns, such as which products are frequently bought together.

## Key Concepts

### 1. Association Rules
These are rules that describe the relationships between items. For example, if customers often buy bread and butter together, the rule might be:
- **If bread is bought, butter is likely to be bought.**

### 2. Support
This measures how often items appear together in transactions. It’s calculated as the proportion of transactions that include both items.

### 3. Confidence
This indicates the likelihood that a customer who buys one item will also buy another. It’s calculated as the support of the item set divided by the support of the first item.

### 4. Lift
This measures the strength of the association between items, comparing the observed support of the items together to the support expected if they were independent. A lift greater than 1 indicates a positive correlation.
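
Written as formulas for a rule $A \Rightarrow B$, the three measures above can be stated as follows. The symbols $A$, $B$ (item sets) and $N$ (total number of transactions) are notation introduced here for illustration.

$$
\begin{aligned}
\mathrm{support}(A \cup B) &= \frac{\text{transactions containing both } A \text{ and } B}{N} \\
\mathrm{confidence}(A \Rightarrow B) &= \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)} \\
\mathrm{lift}(A \Rightarrow B) &= \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)\,\mathrm{support}(B)} = \frac{\mathrm{confidence}(A \Rightarrow B)}{\mathrm{support}(B)}
\end{aligned}
$$

If $A$ and $B$ were independent, $\mathrm{support}(A \cup B)$ would equal $\mathrm{support}(A)\,\mathrm{support}(B)$ and the lift would be exactly 1.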

## Applications
Retailers use Market Basket Analysis to:
- Optimize product placement
- Create promotions
- Improve inventory management

Ultimately, this enhances sales and customer satisfaction.

<img src="https://i.ytimg.com/vi/zkpskbTGw6I/maxresdefault.jpg">

In [None]:
# Market Basket Analysis uses the Apriori algorithm, an unsupervised machine learning
# algorithm that associates one item set with another.

#Components of Apriori
#1. Support
#2. Confidence
#3. Lift

#-- Support
    # Support refers to the baseline popularity of a product. It is the number of transactions containing that product
    # divided by the total number of transactions. Hence, we get
    # Support (Biscuits) = (Transactions containing biscuits) / (Total transactions)
    # = 400/4000 = 10 percent.

#-- Confidence
    # Confidence refers to the likelihood that customers who bought biscuits also bought
    # chocolates. So, you need to divide the number of transactions that
    # contain both biscuits and chocolates by the total number of biscuit transactions.
    # Confidence = (Transactions containing both biscuits and chocolates) / (Total
    # transactions involving biscuits)
    # = 200/400
    # = 50 percent.

#-- Lift
    # Continuing the example above, lift refers to the increase in the ratio of the sale of chocolates when you sell biscuits.
    # It is the confidence of the rule divided by the support of the consequent (chocolates):
    # Lift = Confidence (Biscuits -> Chocolates) / Support (Chocolates)
    # = 50/10 = 5, assuming chocolates also appear in 10 percent of transactions (a figure the example does not state).
    # It means that customers who buy biscuits are five times more likely to buy chocolates than customers in general.
    # If the lift value is below one, people are unlikely to buy both items together. The larger the value, the stronger the combination.
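
The biscuits and chocolates arithmetic can be checked with a few lines of plain Python. This is a minimal sketch: the counts 4000, 400 and 200 come from the example above, while `chocolate_transactions = 400` is an assumption added here, because lift is normally computed against the support of the consequent (chocolates), which the example does not state.

```python
# Counts taken from the worked example above.
total_transactions = 4000
biscuit_transactions = 400
both_transactions = 200

# ASSUMPTION (not given in the example): chocolates also appear in 400 transactions.
chocolate_transactions = 400

support_biscuits = biscuit_transactions / total_transactions
support_chocolates = chocolate_transactions / total_transactions
support_both = both_transactions / total_transactions

# Confidence of the rule biscuits -> chocolates.
confidence = both_transactions / biscuit_transactions

# Lift compares that confidence with how common chocolates are overall.
lift = confidence / support_chocolates

print(f"support(biscuits)   = {support_biscuits:.0%}")
print(f"support(both)       = {support_both:.0%}")
print(f"confidence          = {confidence:.0%}")
print(f"lift                = {lift:.1f}")
```

With a different chocolate count the confidence stays at 50 percent but the lift changes, which is why the consequent's support matters.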



In [None]:
# MARKET ANALYSIS FOR A SUPERMARKET STORE
# OBJECTIVE: WHICH ITEMS ARE ASSOCIATED TOGETHER TO BOOST SALES (LIFT)

# Import the Data Analysis Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Reading the Retail Data
retail_data = pd.read_csv('https://msi.martial.co.ke/data/OnlineRetail.csv')
retail_data.head(10)

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,12/1/2010 8:26,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,12/1/2010 8:26,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,12/1/2010 8:26,2.75,17850.0,United Kingdom
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,12/1/2010 8:26,3.39,17850.0,United Kingdom
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,12/1/2010 8:26,3.39,17850.0,United Kingdom
5,536365,22752,SET 7 BABUSHKA NESTING BOXES,2,12/1/2010 8:26,7.65,17850.0,United Kingdom
6,536365,21730,GLASS STAR FROSTED T-LIGHT HOLDER,6,12/1/2010 8:26,4.25,17850.0,United Kingdom
7,536366,22633,HAND WARMER UNION JACK,6,12/1/2010 8:28,1.85,17850.0,United Kingdom
8,536366,22632,HAND WARMER RED POLKA DOT,6,12/1/2010 8:28,1.85,17850.0,United Kingdom
9,536367,84879,ASSORTED COLOUR BIRD ORNAMENT,32,12/1/2010 8:34,1.69,13047.0,United Kingdom


In [None]:
# Data Analysis
# a) Check for empty records; if there are any, we can drop them.
# The records with an empty CustomerID are the customers who pay via cash.
# Let's drop the empty records.
retail_data.isnull().sum()

retail_data.dropna(inplace=True)

retail_data.isnull().sum()


Unnamed: 0,0
InvoiceNo,0
StockCode,0
Description,0
Quantity,0
InvoiceDate,0
UnitPrice,0
CustomerID,0
Country,0


In [None]:
# Let's check how many countries this business operates in.
# You can relate this to the branches that the retail business has.

# Customer purchasing behaviour depends on the region,
# e.g. clothes bought in Saudi Arabia might differ from clothes bought in the Netherlands.

retail_data.groupby('Country').size().sort_values(ascending=False)

Country,Number of rows
United Kingdom,361878
Germany,9495
France,8491
EIRE,7485
Spain,2533
Netherlands,2371
Belgium,2069
Switzerland,1877
Portugal,1480
Australia,1259


In [None]:
# Let's check for France: one row per invoice, one column per product, cells hold the summed quantity
basketdf = (retail_data[retail_data['Country'] =="France"]
          .groupby(['InvoiceNo', 'Description'])['Quantity']
          .sum().unstack().reset_index().fillna(0)
          .set_index('InvoiceNo'))

basketdf

# The other alternative is to use a pivot table.

Description,50'S CHRISTMAS GIFT BAG LARGE,DOLLY GIRL BEAKER,I LOVE LONDON MINI BACKPACK,NINE DRAWER OFFICE TIDY,SET 2 TEA TOWELS I LOVE LONDON,SPACEBOY BABY GIFT SET,TRELLIS COAT RACK,10 COLOUR SPACEBOY PEN,12 COLOURED PARTY BALLOONS,12 EGG HOUSE PAINTED WOOD,...,WRAP SUKI AND FRIENDS,WRAP VINTAGE PETALS DESIGN,YELLOW COAT RACK PARIS FASHION,YELLOW GIANT GARDEN THERMOMETER,ZINC STAR T-LIGHT HOLDER,ZINC FOLKART SLEIGH BELLS,ZINC HERB GARDEN CONTAINER,ZINC METAL HEART DECORATION,ZINC T-LIGHT HOLDER STAR LARGE,ZINC T-LIGHT HOLDER STARS SMALL
InvoiceNo,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
536370,0.0,0.0,0.0,0.0,24.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
536852,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
536974,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
537065,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
537463,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
C579532,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
C579562,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
C580161,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
C580263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
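
The pivot table alternative mentioned in the cell above could look roughly like this. The sketch uses a small hypothetical frame (`toy`, with made-up invoice numbers and products) rather than `retail_data`, and checks that `pivot_table` produces the same invoice-by-product matrix as the `groupby` and `unstack` chain.

```python
import pandas as pd

# Hypothetical rows standing in for the filtered retail data.
toy = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A2", "A2"],
    "Description": ["PEN", "BAG", "PEN", "PEN", "CUP"],
    "Quantity": [2, 1, 3, 4, 5],
})

# Same approach as the cell above: group, sum, then spread products into columns.
via_groupby = (toy.groupby(["InvoiceNo", "Description"])["Quantity"]
                  .sum().unstack().fillna(0))

# Pivot table version: one row per invoice, one column per product,
# quantities summed, missing combinations filled with 0.
via_pivot = toy.pivot_table(index="InvoiceNo", columns="Description",
                            values="Quantity", aggfunc="sum", fill_value=0)

print(via_pivot)
```

Both versions sum repeated lines for the same product on the same invoice, so the choice between them is mostly a matter of readability.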


In [None]:
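# One-Hot Encoding
# Any quantity greater than zero is encoded as 1; zero or negative quantities are encoded as 0.
import warnings
warnings.filterwarnings('ignore')

def encode_quantities(x):
    # Return 1 if the item appears on the invoice (quantity > 0), otherwise 0.
    # A single comparison covers every numeric value, including fractions between 0 and 1.
    return 1 if x > 0 else 0

# applymap applies the function to every cell (newer pandas versions name this DataFrame.map).
basket_sets = basketdf.applymap(encode_quantities)
basket_sets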

Description,50'S CHRISTMAS GIFT BAG LARGE,DOLLY GIRL BEAKER,I LOVE LONDON MINI BACKPACK,NINE DRAWER OFFICE TIDY,SET 2 TEA TOWELS I LOVE LONDON,SPACEBOY BABY GIFT SET,TRELLIS COAT RACK,10 COLOUR SPACEBOY PEN,12 COLOURED PARTY BALLOONS,12 EGG HOUSE PAINTED WOOD,...,WRAP SUKI AND FRIENDS,WRAP VINTAGE PETALS DESIGN,YELLOW COAT RACK PARIS FASHION,YELLOW GIANT GARDEN THERMOMETER,ZINC STAR T-LIGHT HOLDER,ZINC FOLKART SLEIGH BELLS,ZINC HERB GARDEN CONTAINER,ZINC METAL HEART DECORATION,ZINC T-LIGHT HOLDER STAR LARGE,ZINC T-LIGHT HOLDER STARS SMALL
InvoiceNo,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
536370,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
536852,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
536974,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
537065,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
537463,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
C579532,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
C579562,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
C580161,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
C580263,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
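
The same encoding can also be expressed as a single vectorized comparison instead of a cell-by-cell function. The sketch below assumes a small hypothetical quantity table (`toy_basket`) in place of `basketdf`; the negative quantity is included to show that it maps to 0.

```python
import pandas as pd

# Hypothetical quantities: rows are invoices, columns are products.
toy_basket = pd.DataFrame(
    {"PEN": [2.0, 0.0, -1.0], "BAG": [0.0, 5.0, 0.0]},
    index=["A1", "A2", "C3"],
)

# The comparison runs over the whole frame at once: True where quantity > 0.
# astype(int) turns True/False into 1/0 to match the table above.
toy_sets = (toy_basket > 0).astype(int)

print(toy_sets)
```

Dropping `.astype(int)` leaves a boolean frame, which recent mlxtend releases appear to prefer as input to `apriori`.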


In [None]:
# Finally, let's apply the Apriori algorithm.
# Import apriori and association_rules from the mlxtend library.
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules



In [None]:
# The most frequently bought products, i.e. item sets whose support is at least min_support (apriori)
import warnings
warnings.filterwarnings('ignore')

frequent_itemsets = apriori(basket_sets, min_support=0.10, use_colnames=True)
frequent_itemsets



Unnamed: 0,support,itemsets
0,0.106987,(LUNCH BAG APPLE DESIGN)
1,0.131004,(LUNCH BAG RED RETROSPOT)
2,0.10262,(LUNCH BAG SPACEBOY DESIGN )
3,0.100437,(LUNCH BAG WOODLAND)
4,0.122271,(LUNCH BOX WITH CUTLERY RETROSPOT )
5,0.144105,(PLASTERS IN TIN CIRCUS PARADE )
6,0.115721,(PLASTERS IN TIN SPACEBOY)
7,0.146288,(PLASTERS IN TIN WOODLAND ANIMALS)
8,0.655022,(POSTAGE)
9,0.159389,(RABBIT NIGHT LIGHT)
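
To make the `min_support` threshold concrete, here is a minimal pure-Python sketch of the level-wise idea behind Apriori, run on a handful of hypothetical transactions. It is an illustration only, not mlxtend's implementation: the function names and the toy data are made up for this example.

```python
from itertools import combinations

# Hypothetical transactions; each set is one invoice.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Level-wise search: size-k candidates are built only from frequent size-(k-1) sets."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    result = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold.
        frequent = {}
        for candidate in current:
            s = support(candidate, transactions)
            if s >= min_support:
                frequent[candidate] = s
        result.update(frequent)
        k += 1
        # Join frequent sets into candidates of size k.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune any candidate that has an infrequent subset of size k-1.
        current = [c for c in candidates
                   if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))]
    return result

found = frequent_itemsets(transactions, min_support=0.4)
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

The pruning step is what keeps the search manageable: an item set can only be frequent if all of its subsets are, so most combinations never need to be counted.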


In [None]:
# Apply the Association Rules
association_rules(frequent_itemsets, metric="lift", min_threshold=1)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(POSTAGE),(LUNCH BAG RED RETROSPOT),0.655022,0.131004,0.104803,0.16,1.221333,1.0,0.018993,1.034519,0.525316,0.153846,0.033367,0.48
1,(LUNCH BAG RED RETROSPOT),(POSTAGE),0.131004,0.655022,0.104803,0.8,1.221333,1.0,0.018993,1.724891,0.208543,0.153846,0.420253,0.48
2,(POSTAGE),(PLASTERS IN TIN CIRCUS PARADE ),0.655022,0.144105,0.126638,0.193333,1.341616,1.0,0.032246,1.061027,0.738106,0.188312,0.057517,0.536061
3,(PLASTERS IN TIN CIRCUS PARADE ),(POSTAGE),0.144105,0.655022,0.126638,0.878788,1.341616,1.0,0.032246,2.84607,0.297502,0.188312,0.648638,0.536061
4,(POSTAGE),(PLASTERS IN TIN WOODLAND ANIMALS),0.655022,0.146288,0.117904,0.18,1.230448,1.0,0.022082,1.041112,0.542897,0.172524,0.039488,0.492985
5,(PLASTERS IN TIN WOODLAND ANIMALS),(POSTAGE),0.146288,0.655022,0.117904,0.80597,1.230448,1.0,0.022082,1.777964,0.219381,0.172524,0.437559,0.492985
6,(POSTAGE),(RABBIT NIGHT LIGHT),0.655022,0.159389,0.141921,0.216667,1.359361,1.0,0.037518,1.073121,0.76631,0.211039,0.068139,0.553539
7,(RABBIT NIGHT LIGHT),(POSTAGE),0.159389,0.655022,0.141921,0.890411,1.359361,1.0,0.037518,3.147926,0.314486,0.211039,0.682331,0.553539
8,(POSTAGE),(RED TOADSTOOL LED NIGHT LIGHT),0.655022,0.152838,0.135371,0.206667,1.35219,1.0,0.035259,1.067851,0.755002,0.201299,0.06354,0.54619
9,(RED TOADSTOOL LED NIGHT LIGHT),(POSTAGE),0.152838,0.655022,0.135371,0.885714,1.35219,1.0,0.035259,3.018559,0.307449,0.201299,0.668716,0.54619
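
As a sanity check on how to read this table, the confidence, lift and leverage columns can be recomputed from the three support columns. The sketch below copies the values of rule 1, (LUNCH BAG RED RETROSPOT) -> (POSTAGE), from the output above; the formulas are the standard definitions, and small differences in the last digits come from the rounded inputs.

```python
# Values copied from rule 1 in the table above.
antecedent_support = 0.131004   # LUNCH BAG RED RETROSPOT
consequent_support = 0.655022   # POSTAGE
support = 0.104803              # both together

# confidence = support(A and B) / support(A)
confidence = support / antecedent_support

# lift = confidence / support(B)
lift = confidence / consequent_support

# leverage = support(A and B) - support(A) * support(B)
leverage = support - antecedent_support * consequent_support

print(f"confidence = {confidence:.4f}")
print(f"lift       = {lift:.4f}")
print(f"leverage   = {leverage:.4f}")
```

Every rule shown pairs a product with POSTAGE, which appears in about 65 percent of the French invoices, and the lifts are only slightly above 1, so these associations are fairly weak.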
