Title - Implementation of Market Basket Analysis

Market Basket Analysis (MBA) is a technique in data mining and retail analytics that discovers associations between items that customers frequently purchase together. It provides insight into customer behavior and helps businesses optimize product placement, promotions, and inventory management. This section covers the key concepts and steps involved in implementing Market Basket Analysis.

1. Apriori Algorithm:
The Apriori algorithm is a fundamental approach to Market Basket Analysis. It performs association rule mining: it first finds frequent itemsets in transactional data and then derives association rules from them. The name reflects the algorithm's use of prior knowledge about frequent itemsets, known as the Apriori property: if an itemset is frequent, all of its subsets must also be frequent. Equivalently, if an itemset is infrequent, none of its supersets can be frequent, so the algorithm can discard such candidates without counting them.
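
The level-wise search described above can be sketched in plain Python. This is a minimal illustration, not the MLxtend implementation: the function name apriori_frequent_itemsets and the toy baskets are hypothetical, and support is recounted naively for clarity rather than speed.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if support(c) >= min_support}
    frequent = {c: support(c) for c in current}

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent


# Hypothetical toy baskets, not taken from Day1.csv.
toy = [
    {"Bread", "Milk"},
    {"Bread", "Butter", "Milk"},
    {"Chips", "Apple"},
    {"Bread", "Butter"},
]
result = apriori_frequent_itemsets(toy, min_support=0.5)
```

In this toy run the candidate {Bread, Butter, Milk} is pruned without counting, because its subset {Butter, Milk} is not frequent.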

2. Transactional Data:
Market Basket Analysis requires transactional data, which consists of customer purchase records. Each transaction represents a set of items bought together by a customer during a specific purchase instance. The data is typically represented in a binary matrix format, where each row represents a transaction, and each column corresponds to an item. If an item is present in a transaction, it is denoted by 1; otherwise, it is denoted by 0.
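
As a minimal sketch of this representation, assuming the transactions arrive as lists of item names, the hypothetical helper to_binary_matrix below builds one row per transaction and one column per item. The toy transactions are made up for illustration.

```python
def to_binary_matrix(transactions):
    """Return (sorted item names, list of 0/1 rows), one row per transaction."""
    baskets = [set(t) for t in transactions]
    items = sorted({item for b in baskets for item in b})
    rows = [[1 if item in b else 0 for item in items] for b in baskets]
    return items, rows


# Hypothetical toy transactions, not taken from Day1.csv.
toy = [["Wine", "Bread"], ["Chips", "Apple"], ["Bread"]]
items, matrix = to_binary_matrix(toy)
```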

3. Support, Confidence, and Lift:
Three essential metrics are used in Market Basket Analysis to identify significant associations between items:

   a. Support: It measures the frequency of an itemset in the dataset. It is calculated as the number of transactions containing the itemset divided by the total number of transactions.

   b. Confidence: It measures the conditional probability that an item B is purchased given that item A is purchased. It is calculated as the support of the combined itemset (A and B) divided by the support of item A.

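   c. Lift: It measures the strength of the association between items A and B. It is calculated as the confidence of the association divided by the support of item B. A lift greater than 1 means A and B occur together more often than would be expected if they were independent, a lift of 1 indicates independence, and a lift below 1 means they occur together less often than expected.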
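
The three definitions above translate directly into code. The following is a small sketch on made-up baskets; the function names and the data are illustrative assumptions, not part of any library.

```python
def support(baskets, itemset):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(b) for b in baskets) / len(baskets)


def confidence(baskets, a, b):
    """Estimated P(B | A): support of A and B together over support of A."""
    return support(baskets, set(a) | set(b)) / support(baskets, a)


def lift(baskets, a, b):
    """Confidence of A -> B divided by the support of B."""
    return confidence(baskets, a, b) / support(baskets, b)


# Hypothetical toy baskets, not taken from Day1.csv.
toy = [
    {"Bread", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk", "Butter"},
    {"Milk"},
    {"Chips"},
]
bread_support = support(toy, {"Bread"})         # 3 of 5 baskets
bread_milk_conf = confidence(toy, {"Bread"}, {"Milk"})
bread_milk_lift = lift(toy, {"Bread"}, {"Milk"})
```

Here Bread and Milk appear together in 2 of 5 baskets, so the confidence of Bread -> Milk is 0.4 / 0.6 and the lift is slightly above 1.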

4. Steps for Market Basket Analysis:

   a. Data Preprocessing: Convert the transactional data into a suitable format (e.g., binary matrix) to represent customer purchases.

   b. Generating Frequent Itemsets: Use the Apriori algorithm to identify frequent itemsets, i.e., sets of items that meet a predefined minimum support threshold.

   c. Generating Association Rules: From the frequent itemsets, generate association rules that meet a minimum threshold on a chosen metric such as confidence or lift. Each rule has the form A -> B, where A (the antecedent) and B (the consequent) are disjoint itemsets.

   d. Interpretation: Analyze the generated association rules to gain insights into customer purchasing patterns and identify meaningful product associations.
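
Step (c) can be sketched as follows, assuming the frequent itemsets are already available as a mapping from itemset to support. The function generate_rules and the support values are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def generate_rules(frequent, min_confidence):
    """Build (antecedent, consequent, support, confidence, lift) tuples.

    `frequent` maps frozenset -> support and must contain every subset of
    each itemset it holds, which is guaranteed for frequent itemsets.
    """
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                consequent = itemset - antecedent
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    rule_lift = conf / frequent[consequent]
                    rules.append((antecedent, consequent, sup, conf, rule_lift))
    return rules


# Hypothetical frequent itemsets with their supports.
frequent = {
    frozenset({"Bread"}): 0.75,
    frozenset({"Milk"}): 0.5,
    frozenset({"Butter"}): 0.5,
    frozenset({"Bread", "Milk"}): 0.5,
    frozenset({"Bread", "Butter"}): 0.5,
}
rules = generate_rules(frequent, min_confidence=0.8)
```

With a confidence threshold of 0.8, only Milk -> Bread and Butter -> Bread survive; the reverse rules have confidence 0.5 / 0.75 and are dropped.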

5. Interpretation of Results:
The output of Market Basket Analysis includes frequent itemsets and association rules. Frequent itemsets represent sets of items that are frequently purchased together, while association rules indicate the strength of the relationships between items. These results are valuable for retailers to understand cross-selling opportunities, optimize product bundling, and design targeted marketing strategies.

6. Real-World Applications:
Market Basket Analysis finds applications in various industries, such as retail, e-commerce, and marketing:

   a. Retail Merchandising: Retailers can use the insights from Market Basket Analysis to optimize product placement and design effective cross-selling strategies.

   b. E-commerce Recommendations: E-commerce platforms use association rules to recommend related products to customers based on their purchase history.

   c. Inventory Management: By understanding item associations, businesses can manage inventory efficiently and ensure sufficient stock availability for popular item combinations.

In conclusion, Market Basket Analysis is a valuable technique for discovering associations between items purchased by customers. Algorithms such as Apriori and FP-Growth generate the frequent itemsets from which association rules are derived, enabling businesses to uncover insights into customer behavior and optimize their retail strategies. With applications across retail, e-commerce, and marketing, Market Basket Analysis remains a useful tool for data-driven decision-making and customer-centric marketing.

In [None]:
import pandas as pd
# fpgrowth mines the frequent itemsets; association_rules derives rules from them
from mlxtend.frequent_patterns import association_rules, fpgrowth

In [None]:
data = pd.read_csv('Day1.csv')

In [None]:
data

Unnamed: 0,Wine,Chips,Bread,Butter,Milk,Apple
0,Wine,,Bread,Butter,Milk,
1,,,Bread,Butter,Milk,
2,,Chips,,,,Apple
3,Wine,Chips,Bread,Butter,Milk,Apple
4,Wine,Chips,,,Milk,
5,Wine,Chips,Bread,Butter,,Apple
6,Wine,Chips,,,Milk,
7,Wine,,Bread,,,Apple
8,Wine,,Bread,Butter,Milk,
9,,Chips,Bread,Butter,,Apple


In [None]:
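# One-hot encode: each non-empty cell becomes 1 in a "<column>_<value>" column; missing cells become 0
encoded_data = pd.get_dummies(data)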

In [None]:
encoded_data

Unnamed: 0,Wine_Wine,Chips_Chips,Bread_Bread,Butter_Butter,Milk_Milk,Apple_Apple
0,1,0,1,1,1,0
1,0,0,1,1,1,0
2,0,1,0,0,0,1
3,1,1,1,1,1,1
4,1,1,0,0,1,0
5,1,1,1,1,0,1
6,1,1,0,0,1,0
7,1,0,1,0,0,1
8,1,0,1,1,1,0
9,0,1,1,1,0,1


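Frequent Itemset Mining:
The code uses the FP-Growth algorithm to mine frequent itemsets from the encoded data. FP-Growth is an alternative to Apriori that returns the same frequent itemsets for a given support threshold without generating candidate itemsets level by level. The fpgrowth() function from MLxtend is called with the min_support parameter set to 0.01, so only itemsets with a support of at least 1% are considered frequent. With six items there are at most 63 non-empty itemsets, and the output below lists 62 of them, so this threshold filters out almost nothing. The use_colnames parameter is set to True so that the output shows the column names of the encoded data (for example, Milk_Milk) instead of column indices.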


In [None]:
frequent_itemsets = fpgrowth(encoded_data, min_support=0.01, use_colnames=True)



In [None]:
frequent_itemsets

Unnamed: 0,support,itemsets
0,0.761905,(Milk_Milk)
1,0.714286,(Bread_Bread)
2,0.714286,(Wine_Wine)
3,0.666667,(Butter_Butter)
4,0.666667,(Apple_Apple)
...,...,...
58,0.238095,"(Bread_Bread, Milk_Milk, Chips_Chips, Butter_B..."
59,0.190476,"(Apple_Apple, Chips_Chips, Bread_Bread, Milk_M..."
60,0.190476,"(Apple_Apple, Chips_Chips, Bread_Bread, Milk_M..."
61,0.190476,"(Chips_Chips, Bread_Bread, Milk_Milk, Wine_Win..."


Association Rule Generation:
The code generates association rules based on the frequent itemsets mined in the previous step. The association_rules() function is called with the metric parameter set to "lift" and the min_threshold parameter set to 1, indicating that only rules with a lift of at least 1 should be considered.

In [None]:
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

In [None]:
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(Bread_Bread),(Milk_Milk),0.714286,0.761905,0.571429,0.800000,1.0500,0.027211,1.190476,0.166667
1,(Milk_Milk),(Bread_Bread),0.761905,0.714286,0.571429,0.750000,1.0500,0.027211,1.142857,0.200000
2,(Milk_Milk),(Wine_Wine),0.761905,0.714286,0.619048,0.812500,1.1375,0.074830,1.523810,0.507692
3,(Wine_Wine),(Milk_Milk),0.714286,0.761905,0.619048,0.866667,1.1375,0.074830,1.785714,0.423077
4,(Bread_Bread),(Wine_Wine),0.714286,0.714286,0.571429,0.800000,1.1200,0.061224,1.428571,0.375000
...,...,...,...,...,...,...,...,...,...,...
443,"(Wine_Wine, Butter_Butter)","(Bread_Bread, Apple_Apple, Chips_Chips, Milk_M...",0.476190,0.238095,0.142857,0.300000,1.2600,0.029478,1.088435,0.393939
444,(Apple_Apple),"(Chips_Chips, Butter_Butter, Bread_Bread, Milk...",0.666667,0.190476,0.142857,0.214286,1.1250,0.015873,1.030303,0.333333
445,(Bread_Bread),"(Apple_Apple, Chips_Chips, Milk_Milk, Wine_Win...",0.714286,0.142857,0.142857,0.200000,1.4000,0.040816,1.071429,1.000000
446,(Wine_Wine),"(Apple_Apple, Chips_Chips, Bread_Bread, Milk_M...",0.714286,0.190476,0.142857,0.200000,1.0500,0.006803,1.011905,0.166667
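
The table also reports leverage, conviction, and Zhang's metric, which the theory section does not define. As a rough check, the sketch below recomputes the metrics for rule 0 (Bread -> Milk) from its three support values using the standard definitions. The helper name rule_metrics is hypothetical and is not part of MLxtend.

```python
def rule_metrics(sup_a, sup_b, sup_ab):
    """Recompute rule metrics from the supports of A, B, and A together with B."""
    confidence = sup_ab / sup_a
    lift = confidence / sup_b
    # Leverage: observed joint support minus the support expected under independence.
    leverage = sup_ab - sup_a * sup_b
    # Conviction: infinite when the rule never fails (confidence == 1).
    conviction = float("inf") if confidence == 1 else (1 - sup_b) / (1 - confidence)
    # Zhang's metric: leverage scaled into the range [-1, 1].
    zhang = leverage / max(sup_ab * (1 - sup_a), sup_a * (sup_b - sup_ab))
    return confidence, lift, leverage, conviction, zhang


# Support values copied from rule 0 in the table above.
confidence, lift, leverage, conviction, zhang = rule_metrics(
    sup_a=0.714286,   # antecedent support (Bread_Bread)
    sup_b=0.761905,   # consequent support (Milk_Milk)
    sup_ab=0.571429,  # support of the rule
)
```

Up to rounding, the results agree with the confidence, lift, leverage, conviction, and zhangs_metric columns of that row.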


Displaying the Top Association Rules:
Finally, the code sorts the association rules based on their lift values in descending order and displays the top 10 rules using the head() function.

In [None]:
print(rules.sort_values('lift', ascending=False).head(10))

                                 antecedents  \
416    (Bread_Bread, Apple_Apple, Milk_Milk)   
233               (Bread_Bread, Chips_Chips)   
427  (Wine_Wine, Chips_Chips, Butter_Butter)   
236             (Apple_Apple, Butter_Butter)   
276  (Wine_Wine, Chips_Chips, Butter_Butter)   
279               (Bread_Bread, Apple_Apple)   
273  (Apple_Apple, Wine_Wine, Butter_Butter)   
422    (Bread_Bread, Milk_Milk, Chips_Chips)   
421  (Apple_Apple, Wine_Wine, Butter_Butter)   
282               (Bread_Bread, Chips_Chips)   

                                 consequents  antecedent support  \
416  (Wine_Wine, Chips_Chips, Butter_Butter)            0.380952   
233             (Apple_Apple, Butter_Butter)            0.380952   
427    (Bread_Bread, Apple_Apple, Milk_Milk)            0.238095   
236               (Bread_Bread, Chips_Chips)            0.476190   
276               (Bread_Bread, Apple_Apple)            0.238095   
279  (Wine_Wine, Chips_Chips, Butter_Butter)            0.52381