Skip to content

haihapham/Market-Basket-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 

Repository files navigation

MARKET BASKET ANALYSIS - SQL SERVER

1. OVERVIEW.

Market Basket Analysis (MBA) also known as association rule learning or affinity analysis, is a data mining technique that can be used in various fields, such as marketing, bioinformatics, education field, nuclear science, etc. The main aim of MBA in marketing is to provide information to the retailer to understand the purchase behavior of the buyer, which can help the retailer in correct decision-making (Kaur & Kang, 2016).
The objective of this analysis is to determine when customers have 2 certain products in their shopping cart or basket, which items they will be likely to purchase with.

2. MARKET BASKET ANALYSIS WITH ADVENTUREWORKS2019 DATABASE.

  • The AdventureWorksDW21019 sample database of Microsoft SQL Server includes the following tables: image
  • FactInternetSales table, recording sales through internet channel, will be mainly used to analyze our basket with two attributes: SalesOrderNumber and ProductKey.
SELECT TOP 10 
	SalesOrderNumber
	, ProductKey
FROM FactInternetSales;
  • Result:
    image

2.1. Query order with at least 3 different products.

SELECT 
    SalesOrderNumber
    , COUNT(ProductKey) AS NumberofProducts
FROM FactInternetSales
GROUP BY SalesOrderNumber
HAVING COUNT(ProductKey) >= 3;

We have the following result:
image

2.2. Query Sales Order Number and Product Key for these above orders.

WITH OrderList AS
(
SELECT
        SalesOrderNumber
        , COUNT(ProductKey) AS NumberofProducts
    FROM FactInternetSales
    GROUP BY SalesOrderNumber
    HAVING COUNT(ProductKey) >= 3
)
SELECT
    OrderList.SalesOrderNumber
    , InSales.ProductKey
FROM OrderList
JOIN FactInternetSales AS InSales 
ON Orderlist.SalesOrderNumber = InSales.SalesOrderNumber;

We have the following result:
image

2.3. List out baskets of 3 products in the same order.

WITH OrderList AS
(
    SELECT
        SalesOrderNumber
        , COUNT(ProductKey) AS NumberofProducts
    FROM FactInternetSales
    GROUP BY SalesOrderNumber
    HAVING COUNT(ProductKey) >= 3
),
OrderInfo AS 
(
    SELECT
        OrderList.SalesOrderNumber
        , InSales.ProductKey
    FROM OrderList
    JOIN FactInternetSales AS InSales 
    ON Orderlist.SalesOrderNumber = InSales.SalesOrderNumber
)

SELECT 
	OrderInfo1.SalesOrderNumber
	, OrderInfo1.ProductKey AS Product1
	, OrderInfo2.ProductKey AS Product2
    , OrderInfo3.ProductKey AS Product3
FROM OrderInfo AS OrderInfo1
JOIN OrderInfo AS OrderInfo2 
    ON OrderInfo2.SalesOrderNumber = OrderInfo1.SalesOrderNumber
JOIN OrderInfo AS OrderInfo3 
    ON OrderInfo3.SalesOrderNumber = OrderInfo1.SalesOrderNumber   
WHERE OrderInfo1.ProductKey < OrderInfo2.ProductKey  
    AND OrderInfo2.ProductKey < OrderInfo3.ProductKey;
  • We self-join 3 CTE tables (tables of Sales Order Number and Product Key) to create a new table with an Order Number and 3 different products in that order.*
  • Condition to avoid duplicate - WHERE:
    OrderInfo1.ProductKey < OrderInfo2.ProductKey < OrderInfo3.ProductKey.
  • We have the following result:
    image

2.4. Measure the frequency of the third product with 2 existing first products.

WITH OrderList AS
(
    SELECT
        SalesOrderNumber
        , COUNT(ProductKey) AS NumberofProducts
    FROM FactInternetSales
    GROUP BY SalesOrderNumber
    HAVING COUNT(ProductKey) >= 3
),
OrderInfo AS 
(
    SELECT
        OrderList.SalesOrderNumber
        , InSales.ProductKey
    FROM OrderList
    JOIN FactInternetSales AS InSales 
    ON Orderlist.SalesOrderNumber = InSales.SalesOrderNumber
)

SELECT DISTINCT
	OrderInfo1.ProductKey AS Product1
	, OrderInfo2.ProductKey AS Product2
    , OrderInfo3.ProductKey AS Product3
    , COUNT(OrderInfo3.ProductKey) OVER(PARTITION BY OrderInfo1.ProductKey, OrderInfo2.ProductKey, OrderInfo3.ProductKey) AS Frequency
FROM OrderInfo AS OrderInfo1
JOIN OrderInfo AS OrderInfo2 
    ON OrderInfo2.SalesOrderNumber = OrderInfo1.SalesOrderNumber
JOIN OrderInfo AS OrderInfo3 
    ON OrderInfo3.SalesOrderNumber = OrderInfo1.SalesOrderNumber   
WHERE OrderInfo1.ProductKey < OrderInfo2.ProductKey  
    AND OrderInfo2.ProductKey < OrderInfo3.ProductKey
ORDER BY Frequency DESC;

We have the following result:
image

  • From the above result, we can recognize that when there are products number 477 and 478 in customers' baskets, they are most likely to buy products number 485.
  • We can apply MBA in purchasing suggestions for customers or product arrangements in supermarkets to improve overall sales.

3. CONCLUSION

SQL can help us to query data needed for Market Basket Analysis by suggesting them the most relevant product besides what they have chose to optimize revenue for the organizations or businesses and make the right business decisions.

REFERENCE

  1. Kaur, M. and Kang, S. (2016) “Market basket analysis: Identify the changing trends of Market Data Using Association Rule Mining,” Procedia Computer Science, 85, pp. 78–85. Available at: https://doi.org/10.1016/j.procs.2016.05.180.

About

Market Basket analysis - SQL

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published