# What is Market Basket Analysis?
- Market basket analysis is a data mining technique used by retailers to increase sales by better understanding customer purchasing patterns. 
- It involves analyzing large data sets, such as purchase history, to reveal product groupings, as well as products that are likely to be purchased together.

# Types of market basket analysis
Retailers should understand the following types of market basket analysis:

- **Predictive market basket analysis**. This type considers items purchased in sequence to identify cross-sell opportunities.
- **Differential market basket analysis**. This type considers data across different stores, as well as purchases from different customer groups during different times of the day, month or year. If a rule holds in one dimension, such as store, time period or customer group, but does not hold in the others, analysts can determine the factors responsible for the exception. These insights can lead to new product offers that drive higher sales.
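
As a rough illustration of the differential idea, the sketch below computes the confidence of one rule separately for each store and compares the results. The helper `rule_confidence`, the store names and the baskets are all made up for this example.

```python
def rule_confidence(transactions, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    if not with_antecedent:
        return None  # the rule cannot be evaluated in this segment
    return sum(consequent <= set(t) for t in with_antecedent) / len(with_antecedent)

# Hypothetical baskets grouped by store (made-up data).
baskets_by_store = {
    "store_a": [["Bread", "Milk"], ["Bread", "Milk", "Eggs"], ["Bread"], ["Milk"]],
    "store_b": [["Bread"], ["Bread", "Eggs"], ["Bread", "Milk"], ["Eggs"]],
}

# Confidence of the rule Bread -> Milk, evaluated per store.
per_store = {
    store: rule_confidence(baskets, {"Bread"}, {"Milk"})
    for store, baskets in baskets_by_store.items()
}
print(per_store)
```

In this toy data the rule is about twice as strong in `store_a` as in `store_b`, which is the kind of gap an analyst would then try to explain.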

# Algorithms for market basket analysis
- In market basket analysis, association rules are used to predict the likelihood of products being purchased together.
- Association rules count the frequency of items that occur together, seeking to find associations that occur far more often than expected.
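
The three metrics most often used to score an association rule are support, confidence and lift. The sketch below computes them by hand on a handful of made-up baskets. The function names are illustrative and are not part of any library.

```python
# Toy transactions (made up for illustration, not the notebook's data).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk", "Butter"},
    {"Milk"},
    {"Bread", "Milk", "Eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's baseline support.

    A value above 1 means the items co-occur more often than they would
    if they were purchased independently.
    """
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

print("support(Bread, Milk):", support({"Bread", "Milk"}, transactions))
print("confidence(Bread -> Milk):", confidence({"Bread"}, {"Milk"}, transactions))
print("lift(Bread -> Milk):", lift({"Bread"}, {"Milk"}, transactions))
```

Here the lift of Bread -> Milk is slightly below 1, so in this toy data buying bread does not make milk more likely than its baseline.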

# Examples of market basket analysis
- Amazon's website is a well-known example of market basket analysis in use. On a product page, Amazon presents users with related products under the headings **"Frequently bought together"** and **"Customers who bought this item also bought."**

# Benefits of market basket analysis
- Market basket analysis can **increase sales and customer satisfaction**. Using data to determine that products are often purchased together, retailers can **optimize product placement**, offer special deals and create new product bundles to encourage further sales of these combinations.

- These improvements can generate additional sales for the retailer while **making the shopping experience more productive** and valuable for customers. Retailers that use market basket analysis well may also see stronger brand loyalty from their customers.

# About Project:
- Here we create dummy data that imitates actual store transaction data.
- Since the dummy data is generated by us, no preprocessing or EDA is needed.
- We do, however, need to convert the data into a one-hot transaction format using TransactionEncoder.
- We then mine association rules and display the associations that are found.
- Finally, a custom function looks up associations for unseen data.

# Code:

In [1]:
# Install the required library
!pip install mlxtend



In [2]:
# Imports are written here
import joblib
import random
import pandas as pd

from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Creating Dummy Data: 

In [3]:
# List of possible items
items = ['Bread', 'Milk', 'Eggs', 'Cheese', 'Yogurt', 'Butter']

# Number of transactions
num_transactions = 100000

# Generate dummy data
data = []
for transaction_id in range(1, num_transactions + 1):
    num_items_in_transaction = random.randint(1, len(items))
    items_purchased = random.sample(items, num_items_in_transaction)
    data.append({'Transaction ID': transaction_id, 'Items Purchased': ', '.join(items_purchased)})

In [4]:
# Splitting the 'Items Purchased' string in each dictionary of 'data' into individual items,
# creating a list of transactions where each transaction is represented as a list of items.
transactions = [d['Items Purchased'].split(', ') for d in data]

In [5]:
# Displaying the transactions

transactions[:10]

[['Butter'],
 ['Bread', 'Cheese', 'Milk'],
 ['Milk', 'Yogurt', 'Cheese', 'Butter', 'Eggs'],
 ['Milk', 'Yogurt'],
 ['Butter', 'Bread'],
 ['Milk', 'Yogurt', 'Bread'],
 ['Cheese'],
 ['Eggs', 'Milk', 'Butter', 'Yogurt', 'Bread'],
 ['Yogurt', 'Cheese', 'Butter', 'Bread', 'Eggs', 'Milk'],
 ['Bread']]

- Now that the data has been created, we can convert it to a DataFrame, but first we need to encode it.

In [6]:
# Convert the data to the format required by mlxtend
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
joblib.dump(te, './transaction_encoder.joblib') # Save the encoder to a file
df = pd.DataFrame(te_ary, columns=te.columns_)  # converting it to dataframe

In [7]:
# Displaying the columns of the dataframe
te.columns_

['Bread', 'Butter', 'Cheese', 'Eggs', 'Milk', 'Yogurt']

In [8]:
# Displaying the actual dataframe
df

| | Bread | Butter | Cheese | Eggs | Milk | Yogurt |
|---|---|---|---|---|---|---|
| 0 | False | True | False | False | False | False |
| 1 | True | False | True | False | True | False |
| 2 | False | True | True | True | True | True |
| 3 | False | False | False | False | True | True |
| 4 | True | True | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... |
| 99995 | True | False | True | False | True | True |
| 99996 | True | True | True | True | True | True |
| 99997 | False | False | False | False | False | True |
| 99998 | True | True | True | False | True | True |
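
To make the encoding step concrete, here is a plain-Python sketch of the same one-hot idea: one column per distinct item (sorted alphabetically, as in the column list shown earlier) and one boolean row per transaction. `one_hot_encode` is a hypothetical helper written for this illustration, not mlxtend's implementation.

```python
def one_hot_encode(transactions):
    """Return (columns, rows): sorted unique items and one boolean row per transaction."""
    columns = sorted({item for transaction in transactions for item in transaction})
    rows = []
    for transaction in transactions:
        present = set(transaction)
        # True where the transaction contains the column's item, False otherwise.
        rows.append([item in present for item in columns])
    return columns, rows

# Three small baskets in the same list-of-lists shape as `transactions`.
sample = [['Butter'], ['Bread', 'Cheese', 'Milk'], ['Milk', 'Yogurt']]
columns, rows = one_hot_encode(sample)

print(columns)
for row in rows:
    print(row)
```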


In [9]:
# Saving the dataframe into csv file for later use
df.to_csv("./transctionalData.csv", index=False)

In [10]:
# Find frequent itemsets using Apriori
frequent_itemsets = apriori(df, min_support=0.01, use_colnames=True)

In [11]:
# Generate association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)

In [12]:
# Frequent Itemsets
frequent_itemsets

| | support | itemsets |
|---|---|---|
| 0 | 0.58510 | (Bread) |
| 1 | 0.58372 | (Butter) |
| 2 | 0.58483 | (Cheese) |
| 3 | 0.58257 | (Eggs) |
| 4 | 0.58409 | (Milk) |
| ... | ... | ... |
| 58 | 0.19453 | (Butter, Cheese, Milk, Bread, Yogurt) |
| 59 | 0.19335 | (Butter, Eggs, Milk, Bread, Yogurt) |
| 60 | 0.19386 | (Eggs, Cheese, Milk, Bread, Yogurt) |
| 61 | 0.19455 | (Butter, Eggs, Cheese, Milk, Yogurt) |
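
For readers curious about what Apriori does, the following is a minimal, unoptimized sketch of the level-wise search: find frequent single items, join them into larger candidates, and prune any candidate that has an infrequent subset. `apriori_sketch` and the toy baskets are assumptions made for this illustration. mlxtend's `apriori` is the function actually used in this notebook.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    current = {}
    for item in {item for b in baskets for item in b}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent

# Made-up baskets for illustration.
toy = [['Bread', 'Milk'], ['Bread', 'Butter'], ['Bread', 'Milk', 'Butter'],
       ['Milk'], ['Bread', 'Milk', 'Eggs']]
result = apriori_sketch(toy, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

The prune step is what keeps the search tractable: in the toy data, (Bread, Butter, Milk) is never counted because (Butter, Milk) is already known to be infrequent.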


In [13]:
# Association Rules
rules

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (Bread) | (Butter) | 0.58510 | 0.58372 | 0.38990 | 0.666382 | 1.141612 | 0.048365 | 1.247774 | 0.298977 |
| 1 | (Butter) | (Bread) | 0.58372 | 0.58510 | 0.38990 | 0.667957 | 1.141612 | 0.048365 | 1.249538 | 0.297986 |
| 2 | (Bread) | (Cheese) | 0.58510 | 0.58483 | 0.39098 | 0.668228 | 1.142602 | 0.048796 | 1.251370 | 0.300806 |
| 3 | (Cheese) | (Bread) | 0.58483 | 0.58510 | 0.39098 | 0.668536 | 1.142602 | 0.048796 | 1.251720 | 0.300610 |
| 4 | (Eggs) | (Bread) | 0.58257 | 0.58510 | 0.38846 | 0.666804 | 1.139641 | 0.047598 | 1.245213 | 0.293536 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 408 | (Eggs, Cheese, Yogurt) | (Milk, Bread, Butter) | 0.29205 | 0.29278 | 0.16609 | 0.568704 | 1.942428 | 0.080584 | 1.639755 | 0.685331 |
| 409 | (Milk, Bread, Yogurt) | (Eggs, Butter, Cheese) | 0.29160 | 0.29303 | 0.16609 | 0.569582 | 1.943766 | 0.080642 | 1.642518 | 0.685396 |
| 410 | (Milk, Bread, Cheese) | (Eggs, Butter, Yogurt) | 0.29244 | 0.29025 | 0.16609 | 0.567946 | 1.956746 | 0.081209 | 1.642733 | 0.691033 |
| 411 | (Milk, Cheese, Yogurt) | (Eggs, Bread, Butter) | 0.29070 | 0.29063 | 0.16609 | 0.571345 | 1.965885 | 0.081604 | 1.654874 | 0.692687 |
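
The single-item and pair metrics in this table can be sanity-checked analytically, because the dummy generator draws a basket size uniformly from 1 to 6 and then samples that many of the 6 items without replacement. The sketch below works out the expected values with exact fractions. The variable names are chosen for this illustration.

```python
from fractions import Fraction

n_items = 6                    # size of the item catalogue in the dummy generator
sizes = range(1, n_items + 1)  # basket size k is uniform on 1..6

# P(a given item is in the basket) = E[k] / n
single_support = sum(Fraction(k, n_items) for k in sizes) / n_items

# P(two given items are both in the basket) = E[k(k-1)] / (n(n-1))
pair_support = sum(Fraction(k * (k - 1), n_items * (n_items - 1)) for k in sizes) / n_items

confidence = pair_support / single_support
lift = pair_support / (single_support ** 2)

print("single support:", float(single_support))
print("pair support:  ", float(pair_support))
print("confidence:    ", float(confidence))
print("lift:          ", float(lift))
```

The expected values (7/12, 7/18, 2/3 and 8/7, or roughly 0.583, 0.389, 0.667 and 1.143) sit close to the observed rows for rules such as (Bread) -> (Butter). The lift above 1 is therefore a side effect of the varying basket size in the generator, not evidence of a real affinity between products.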


In [19]:
# Prediction on Unseen Data:

def recommend_items(unseen_data):
    """
    Look up the mined association rules that match a basket of items.

    :param unseen_data: list of strings containing the product names.
    :return: dict mapping "antecedent -> consequent" to the rule's confidence.
    """
    # Encode the basket with the fitted encoder. The encoded frame is built for
    # inspection only; the rule filter below works on the raw item names.
    te_unseen_ary = te.transform([unseen_data])
    df_unseen = pd.DataFrame(te_unseen_ary, columns=te.columns_)

    # Keep rules whose antecedent contains every item in the basket.
    # Note that the antecedent may also contain items that are not in the basket.
    basket = set(unseen_data)
    filtered_rules = rules[rules['antecedents'].apply(lambda x: basket.issubset(x))]

    output_dict = {'response': {}}

    for _, rule in filtered_rules.iterrows():
        antecedent = ', '.join(rule['antecedents'])
        consequent = ', '.join(rule['consequents'])
        score = rule['confidence']
        output_dict['response'][f"{antecedent} -> {consequent}"] = score

    return output_dict
  

In [20]:
# Define a basket of unseen items and look up the matching rules
unseen_data = ['Milk', 'Bread'] 
output_dict = recommend_items(unseen_data)
 
# Display the output dictionary
print(output_dict)

{'response': {'Milk, Bread -> Butter': 0.7496415403523146, 'Milk, Bread -> Cheese': 0.74877099549365, 'Milk, Bread -> Eggs': 0.7468506759524784, 'Milk, Bread -> Yogurt': 0.7466202376075379, 'Milk, Bread, Butter -> Cheese': 0.8006694446341963, 'Milk, Bread, Cheese -> Butter': 0.8016003282724662, 'Milk, Bread -> Butter, Cheese': 0.6002150757886112, 'Eggs, Bread, Milk -> Butter': 0.7976619013336076, 'Milk, Bread, Butter -> Eggs': 0.7946922604003006, 'Milk, Bread -> Eggs, Butter': 0.595734330192544, 'Milk, Bread, Butter -> Yogurt': 0.7974929981556118, 'Milk, Bread, Yogurt -> Butter': 0.8007201646090535, 'Milk, Bread -> Butter, Yogurt': 0.5978338795575584, 'Eggs, Bread, Milk -> Cheese': 0.8003702560937982, 'Milk, Bread, Cheese -> Eggs': 0.7983176036109972, 'Milk, Bread -> Eggs, Cheese': 0.5977570667759114, 'Milk, Bread, Yogurt -> Cheese': 0.7983196159122085, 'Milk, Bread, Cheese -> Yogurt': 0.7960265353576803, 'Milk, Bread -> Cheese, Yogurt': 0.5960415813191314, 'Eggs, Bread, Milk -> Yogurt

In [22]:
# Define a second basket of unseen items and look up the matching rules
unseen_data = [ 'Bread', 'Milk', 'Butter'] 
output_dict = recommend_items(unseen_data)
 
# Display the output dictionary
print(output_dict)

{'response': {'Milk, Bread, Butter -> Cheese': 0.8006694446341963, 'Milk, Bread, Butter -> Eggs': 0.7946922604003006, 'Milk, Bread, Butter -> Yogurt': 0.7974929981556118, 'Eggs, Bread, Milk, Butter -> Cheese': 0.8362057850174067, 'Milk, Bread, Butter, Cheese -> Eggs': 0.8299633137104343, 'Milk, Bread, Butter -> Eggs, Cheese': 0.6645262654552907, 'Milk, Bread, Butter, Cheese -> Yogurt': 0.8298353382817166, 'Milk, Bread, Butter, Yogurt -> Cheese': 0.8331406055933873, 'Milk, Bread, Butter -> Yogurt, Cheese': 0.6644237994398525, 'Eggs, Bread, Milk, Butter -> Yogurt': 0.831005286457214, 'Milk, Bread, Butter, Yogurt -> Eggs': 0.8280868559681357, 'Milk, Bread, Butter -> Eggs, Yogurt': 0.6603934694992828, 'Butter, Eggs, Milk, Bread, Yogurt -> Cheese': 0.8590121541246444, 'Butter, Eggs, Milk, Bread, Cheese -> Yogurt': 0.8536698190789472, 'Butter, Milk, Bread, Yogurt, Cheese -> Eggs': 0.8538014702102502, 'Eggs, Bread, Milk, Butter -> Cheese, Yogurt': 0.7138436412085787, 'Milk, Bread, Butter, Yog
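
Because `recommend_items` keeps every rule whose antecedent contains the basket, the outputs above include rules such as 'Milk, Bread, Butter -> Cheese' for the basket ['Milk', 'Bread'], even though Butter is not in that basket. One possible alternative, sketched below, keeps only rules whose antecedent is fully covered by the basket and suggests only items the shopper does not already have. `recommend_from_rules`, the tuple format and the confidence values are assumptions made for this illustration, not part of the notebook.

```python
def recommend_from_rules(basket, rules):
    """
    basket: iterable of item names.
    rules:  iterable of (antecedent, consequent, confidence) tuples, where
            antecedent and consequent are sets of item names.
    Returns {item: best confidence} for items not already in the basket.
    """
    basket = set(basket)
    suggestions = {}
    for antecedent, consequent, confidence in rules:
        # A rule applies only if the shopper already has the whole antecedent.
        if not set(antecedent) <= basket:
            continue
        # Simplification: the rule's confidence is used as the score of each new item.
        for item in set(consequent) - basket:
            suggestions[item] = max(confidence, suggestions.get(item, 0.0))
    # Highest-confidence suggestions first.
    return dict(sorted(suggestions.items(), key=lambda kv: kv[1], reverse=True))

# Made-up rules and confidences for illustration.
toy_rules = [
    ({"Milk", "Bread"}, {"Butter"}, 0.75),
    ({"Milk"}, {"Bread"}, 0.67),
    ({"Milk", "Bread", "Butter"}, {"Cheese"}, 0.80),
    ({"Bread"}, {"Eggs", "Milk"}, 0.55),
]
suggested = recommend_from_rules(["Milk", "Bread"], toy_rules)
print(suggested)
```

With the notebook's `rules` DataFrame, the tuples could be built with `zip(rules['antecedents'], rules['consequents'], rules['confidence'])`.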

In [None]:
# That's it, folks. You can play around with it.
# Try the same workflow, including EDA, on a larger real dataset.
# Tweak the confidence threshold and the metric used for the associations.
# That's a wrap!

# [Check out previous Market Basket Analysis](https://www.kaggle.com/code/meetnagadia/market-basket-analysis)

In [None]:
# End Note:
# -----------------------------------------------------------------------------
# Market Basket Analysis Project Summary:

# In this project, we leveraged market basket analysis to uncover associations
# between items in transactional data. The workflow involved generating a dummy
# dataset, transforming it into a suitable format for association rule mining,
# and applying the Apriori algorithm to generate association rules.

# Key Steps:
# 1. Data Creation: Created dummy data that imitates actual store transactions.
# 2. Data Preprocessing: Prepared the data by encoding transactions.
# 3. Association Rule Mining: Used Apriori to identify frequent itemsets and generate rules.
# 4. Serialization: Saved the TransactionEncoder for potential future use.
# 5. Prediction on Unseen Data: Used the fitted encoder and the mined rules to look up associations for new baskets.

# The project aimed to reveal insights into item associations, allowing for data-driven
# decision-making in areas such as inventory management, marketing strategies, and
# customer recommendations.

# Moving Forward:
# The workflow and insights gained from this analysis serve as a foundation for further
# exploration and optimization. Considerations for future work may include refining
# preprocessing steps, experimenting with different association rule algorithms, or
# integrating the findings into business processes.

# Thank you for joining this journey through market basket analysis!

# -----------------------------------------------------------------------------