
# "Implementing Association-Based Strategies for Enhanced Customer Engagement and Sales Optimization"

### **Objective:**

The objective of this Market Basket Analysis is to uncover hidden patterns and relationships within transactional data using association rule mining. By identifying associations between items frequently purchased together, we aim to improve marketing strategies, optimize product placement, and enhance the customer shopping experience.

### **Business Case:**

In the era of data-driven decision-making, businesses must harness the power of transactional data to enhance operational efficiency and customer satisfaction. Association rule mining offers an opportunity to reveal insights that can guide various aspects of the business.


**Association rule mining** is a data mining technique used to uncover relationships and patterns within large datasets. These relationships highlight which items or events tend to occur together. It is commonly applied to transactional data, such as customer purchase histories and web clickstreams, to reveal insights that can guide decision-making and strategy development.

The fundamental concept behind association rule mining is the discovery of rules of the form "If X, then Y," where X and Y are sets of items. These rules help us understand which items are frequently purchased or accessed together. The strength of an association rule is measured by metrics like support, confidence, and lift.

- **Support:** The support of an itemset is the proportion of transactions in which the itemset appears. It indicates how frequently an itemset occurs in the dataset.

- **Confidence:** The confidence of a rule "X → Y" measures the likelihood that itemset Y is purchased given that itemset X is purchased. It's calculated as the proportion of transactions containing both X and Y over the transactions containing X.

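- **Lift:** The lift of a rule "X → Y" measures how much more likely Y is to be purchased when X is purchased, compared with how often Y is purchased overall. It's calculated as the ratio of the confidence of the rule to the support of Y. A lift of 1 means X and Y occur together as often as independence would predict; values above 1 indicate a positive association.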
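
To make the three definitions concrete, here is a minimal sketch that computes them by hand. The five toy transactions and the helper names (`support`, `confidence`, `lift`) are made up for illustration and are not part of the retail dataset or of any library.

```python
# Toy transactions (hypothetical data, for illustration only)
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"butter", "jam"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """support(X and Y together) divided by support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

def lift(x, y, transactions):
    """confidence(X -> Y) divided by support(Y)."""
    return confidence(x, y, transactions) / support(y, transactions)

x, y = {"bread"}, {"butter"}
s = support(x | y, transactions)   # 2 of 5 transactions contain both items
c = confidence(x, y, transactions) # 2 of the 3 bread transactions also contain butter
l = lift(x, y, transactions)       # confidence relative to butter's overall support (3 of 5)
print(f"support={s:.3f} confidence={c:.3f} lift={l:.3f}")
```

Here the lift is only slightly above 1, so bread and butter co-occur barely more often than they would if purchased independently.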

Association rule mining is commonly used for various purposes:

1. **Market Basket Analysis:** Understanding which products are frequently purchased together helps in optimizing product placement, cross-selling, and upselling strategies in retail environments.

2. **Web Clickstream Analysis:** Revealing the sequences of web pages that users tend to visit can enhance website design and user experience.

3. **Customer Behavior Analysis:** Discovering patterns in customer behaviors can lead to personalized recommendations, loyalty programs, and customer segmentation.

4. **Supply Chain Optimization:** Identifying relationships in supply chain data can lead to more efficient inventory management and logistics planning.

5. **Fraud Detection:** Detecting unusual sequences of events can help uncover fraudulent activities.

Association rule mining algorithms, such as Apriori and FP-Growth, are used to mine large datasets efficiently for interesting and actionable associations. The resulting rules help businesses form informed strategies, optimize operations, and improve customer experiences.


**Challenges:**

1. **Pattern Discovery:** Uncovering meaningful associations from large and complex transactional datasets requires sophisticated analysis techniques.

2. **Product Placement:** Effectively placing related items together can lead to increased cross-selling and improved customer experience.

3. **Customized Recommendations:** Personalized product recommendations can greatly influence purchasing decisions and overall revenue.

**Business Strategy:**

By employing association rule mining, we can identify frequently co-occurring items and create actionable insights to drive strategic decision-making.

**Benefits:**

1. **Cross-Selling Opportunities:** Discovering associations between products can facilitate strategic cross-selling campaigns.

2. **Optimized Inventory:** Improved understanding of item relationships can guide inventory management and stock placement.

3. **Personalized Experience:** Leveraging associations for product recommendations enhances customer satisfaction and loyalty.

**Expected Outcomes:**

1. **Increased Revenue:** Implementing targeted cross-selling strategies based on association rules can lead to higher average transaction values.

2. **Improved Customer Satisfaction:** Providing relevant and personalized product suggestions enhances the shopping experience.

3. **Efficient Inventory Management:** Optimized product placement minimizes stockouts and reduces excess inventory.


# Steps Involved in Executing Association Rule Mining
Here's a general outline of the process:

**1. Data Preparation:**

- Import the relevant dataset containing transactional or event data.

**2. Data Preprocessing:**

- Handle missing values, outliers, and inconsistencies in the dataset.
- Transform the data into a suitable format for analysis, often called the "basket" format.

**3. Itemset Generation:**

- Identify all unique items in the dataset.
- Generate frequent itemsets: sets of items that appear together frequently in transactions.

**4. Rule Generation:**

- Based on the frequent itemsets, generate association rules that meet predefined support and confidence thresholds.

**5. Rule Evaluation:**

- Evaluate the generated rules using metrics like support, confidence, and lift.
- Filter out rules that do not meet the desired quality criteria.

**6. Interpretation:**

- Analyze the generated association rules to understand the insights and patterns they reveal.
- Identify meaningful and actionable associations that can guide decision-making.

**7. Strategy Formulation:**

- Based on the insights gained from the association rules, develop strategies tailored to your business goals.
- These strategies could involve cross-selling, upselling, product placement, customer segmentation, and more.
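
Steps 3 to 5 of the outline can be sketched in plain Python on a handful of baskets. This is a simplified level-wise search in the spirit of Apriori, not the mlxtend implementation used later in this notebook; the toy baskets and both thresholds are hypothetical.

```python
from itertools import combinations

# Hypothetical toy baskets and thresholds, for illustration only
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "jam"},
    {"butter", "jam"},
    {"bread", "butter", "milk"},
]
MIN_SUPPORT = 0.4
MIN_CONFIDENCE = 0.6

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Step 3: frequent itemsets, grown one item at a time
items = sorted(set().union(*transactions))
frequent = {}                      # frozenset -> support
k = 1
current = [frozenset([i]) for i in items]
while current:
    kept = {c: support(c) for c in current}
    kept = {c: s for c, s in kept.items() if s >= MIN_SUPPORT}
    frequent.update(kept)
    k += 1
    # Candidate k-itemsets are unions of two surviving (k-1)-itemsets
    current = list({a | b for a in kept for b in kept if len(a | b) == k})

# Steps 4 and 5: split each frequent itemset into antecedent -> consequent
rules = []
for itemset, s in frequent.items():
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            consequent = itemset - antecedent
            conf = s / frequent[antecedent]
            if conf >= MIN_CONFIDENCE:
                rule_lift = conf / frequent[consequent]
                rules.append((antecedent, consequent, s, conf, rule_lift))

for antecedent, consequent, s, conf, rule_lift in rules:
    print(sorted(antecedent), "->", sorted(consequent),
          f"support={s:.2f} confidence={conf:.2f} lift={rule_lift:.2f}")
```

Looking up `frequent[antecedent]` is safe because every subset of a frequent itemset is itself frequent, which is the property that lets the search discard candidates early.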



# 1. Data Preparation:

In [None]:
#Loading necessary packages
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
import warnings
warnings.filterwarnings('ignore')

In [None]:
#ignoring Deprecation warning
warnings.filterwarnings("ignore", category=DeprecationWarning)

In [None]:
#importing dataset
sales_df = pd.read_excel('http://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx')

In [None]:
#shape of dataset
sales_df.shape

(541909, 8)

In [None]:
#copying dataset to carry out analysis and modifications on copied dataset
myretaildata = sales_df.copy()

# 2. Data Preprocessing

In [None]:
#Data Cleaning
myretaildata['Description'] = myretaildata['Description'].str.strip() #removes spaces from beginning and end
myretaildata.dropna(axis=0, subset=['InvoiceNo'], inplace=True) #removes rows with a missing invoice number
myretaildata['InvoiceNo'] = myretaildata['InvoiceNo'].astype('str') #converting invoice number to be string
myretaildata = myretaildata[~myretaildata['InvoiceNo'].str.contains('C')] #remove the credit transactions
myretaildata.head()

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850.0,United Kingdom
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom


In [None]:
#checking the number of transaction rows (line items) per country
myretaildata['Country'].value_counts()

United Kingdom          487622
Germany                   9042
France                    8408
EIRE                      7894
Spain                     2485
Netherlands               2363
Belgium                   2031
Switzerland               1967
Portugal                  1501
Australia                 1185
Norway                    1072
Italy                      758
Channel Islands            748
Finland                    685
Cyprus                     614
Sweden                     451
Unspecified                446
Austria                    398
Denmark                    380
Poland                     330
Japan                      321
Israel                     295
Hong Kong                  284
Singapore                  222
Iceland                    182
USA                        179
Canada                     151
Greece                     145
Malta                      112
United Arab Emirates        68
European Community          60
RSA                         58
Lebanon 

In [None]:
# Separating transactions for Germany
basket_germany = (myretaildata[myretaildata['Country'] =="Germany"]
         .groupby(['InvoiceNo', 'Description'])['Quantity']
         .sum().unstack().reset_index().fillna(0)
         .set_index('InvoiceNo'))

# filters the dataset to include only transactions from customers in Germany.
# groups the filtered data by 'InvoiceNo' and 'Description'  and sums up the 'Quantity' of each item in each transaction.
# unstacks the data, essentially pivoting the 'Description' column to become individual columns for each unique item description.
# The rows represent transactions, and the values in the columns are the quantities of items in those transactions.
# the 'InvoiceNo' column is set as the index of the DataFrame
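
The reshaping described in the comments above can be seen on a tiny table. This is a sketch with a hypothetical five-row DataFrame (`toy`) that only mirrors the column names used in the notebook.

```python
import pandas as pd

# Hypothetical line items, mirroring the columns used above
toy = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A1", "A2", "A2"],
    "Description": ["MUG", "PEN", "MUG", "PEN", "BAG"],
    "Quantity": [2, 1, 3, 4, 1],
})

toy_basket = (toy.groupby(["InvoiceNo", "Description"])["Quantity"]
                 .sum()       # A1/MUG appears twice, so its quantities add up to 5
                 .unstack()   # one column per Description, NaN where an item is absent
                 .fillna(0))  # an absent item means a quantity of 0
print(toy_basket)
```

Each row is one invoice and each column one item. The `reset_index()` / `set_index('InvoiceNo')` round trip in the Germany cell leaves `InvoiceNo` as the index as well, so this shorter chain should give the same layout.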

In [None]:
#basket_germany = myretaildata.groupby(['InvoiceNo','Description']).agg({'Quantity':'sum'}).reset_index().pivot(index='InvoiceNo',columns='Description').fillna(0)

In [None]:
basket_germany.head()

InvoiceNo,10 COLOUR SPACEBOY PEN,12 COLOURED PARTY BALLOONS,12 IVORY ROSE PEG PLACE SETTINGS,12 MESSAGE CARDS WITH ENVELOPES,12 PENCIL SMALL TUBE WOODLAND,12 PENCILS SMALL TUBE RED RETROSPOT,12 PENCILS SMALL TUBE SKULL,12 PENCILS TALL TUBE POSY,12 PENCILS TALL TUBE RED RETROSPOT,12 PENCILS TALL TUBE SKULLS,...,YULETIDE IMAGES GIFT WRAP SET,ZINC HEART T-LIGHT HOLDER,ZINC STAR T-LIGHT HOLDER,ZINC BOX SIGN HOME,ZINC FOLKART SLEIGH BELLS,ZINC HEART LATTICE T-LIGHT HOLDER,ZINC METAL HEART DECORATION,ZINC T-LIGHT HOLDER STAR LARGE,ZINC T-LIGHT HOLDER STARS SMALL,ZINC WILLIE WINKIE CANDLE STICK
536527,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
536840,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
536861,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
536967,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
536983,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
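#converting all positive values to 1 and everything else to 0
def my_encode_units(x):
    # a quantity of at least 1 means the item is present in the invoice;
    # anything else (zero or negative) is treated as absent
    return 1 if x >= 1 else 0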

my_basket_sets = basket_germany.applymap(my_encode_units)
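
As an alternative to the element-wise function, the same presence/absence encoding can be written as a single vectorized comparison. This is a sketch on a hypothetical two-invoice basket; for whole-number quantities `> 0` matches the `>= 1` rule above, and it yields boolean columns, which is the input type mlxtend's `apriori` recommends.

```python
import pandas as pd

# Hypothetical basket with quantities, including one negative value
toy_basket = pd.DataFrame(
    {"BAG": [0.0, 1.0], "MUG": [5.0, 0.0], "PEN": [-2.0, 4.0]},
    index=pd.Index(["A1", "A2"], name="InvoiceNo"),
)

# True where the invoice contains the item, False for zero or negative quantities
toy_sets = toy_basket > 0
print(toy_sets)
```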

In [None]:
my_basket_sets.drop('POSTAGE', inplace=True, axis=1) #Remove "postage" as an item

# 3. Itemset Generation

In [None]:
#Generating frequent itemsets using the Apriori algorithm
my_frequent_itemsets = apriori(my_basket_sets, min_support=0.05, use_colnames=True)

In [None]:
my_frequent_itemsets

Unnamed: 0,support,itemsets
0,0.102845,(6 RIBBONS RUSTIC CHARM)
1,0.070022,(ALARM CLOCK BAKELIKE PINK)
2,0.065646,(CHARLOTTE BAG APPLES DESIGN)
3,0.050328,(CHILDRENS CUTLERY DOLLY GIRL)
4,0.061269,(COFFEE MUG APPLES DESIGN)
5,0.063457,(FAWN BLUE HOT WATER BOTTLE)
6,0.07221,(GUMBALL COAT RACK)
7,0.056893,(IVORY KITCHEN SCALES)
8,0.063457,(JAM JAR WITH PINK LID)
9,0.091904,(JAM MAKING SET PRINTED)


# 4. Rule Generation

In [None]:
#generating rules using association_rules
my_rules = association_rules(my_frequent_itemsets, metric="lift", min_threshold=1)

# metric="lift": This parameter specifies the metric to be used for evaluating the association rules. In this case, the "lift" metric is used.
# Lift is a measure of how much more likely two items are to be purchased together compared to if they were purchased independently.
# A lift value greater than 1 indicates a positive association.

# min_threshold=1: This parameter sets the minimum threshold for the lift value. Only rules with a lift value greater than or equal to 1 will be considered.

# 5. Rule Evaluation:

In [None]:
# Evaluate and print the association rules
for index, row in my_rules.iterrows():
    print("Rule:", row['antecedents'], "->", row['consequents'])
    print("Support:", row['support'])
    print("Confidence:", row['confidence'])
    print("Lift:", row['lift'])
    print("Leverage:", row['leverage'])
    print("Conviction:", row['conviction'])
    print("="*50)


Rule: frozenset({'PLASTERS IN TIN CIRCUS PARADE'}) -> frozenset({'PLASTERS IN TIN WOODLAND ANIMALS'})
Support: 0.06783369803063458
Confidence: 0.5849056603773585
Lift: 4.242887091943696
Leverage: 0.05184607060603594
Conviction: 2.0769842848617466
Rule: frozenset({'PLASTERS IN TIN WOODLAND ANIMALS'}) -> frozenset({'PLASTERS IN TIN CIRCUS PARADE'})
Support: 0.06783369803063458
Confidence: 0.4920634920634921
Lift: 4.242887091943696
Leverage: 0.05184607060603594
Conviction: 1.7404266958424508
Rule: frozenset({'PLASTERS IN TIN CIRCUS PARADE'}) -> frozenset({'ROUND SNACK BOXES SET OF 4 FRUITS'})
Support: 0.05032822757111598
Confidence: 0.4339622641509434
Lift: 2.754454926624738
Leverage: 0.03205665337157468
Conviction: 1.4883296863603208
Rule: frozenset({'ROUND SNACK BOXES SET OF 4 FRUITS'}) -> frozenset({'PLASTERS IN TIN CIRCUS PARADE'})
Support: 0.05032822757111598
Confidence: 0.3194444444444445
Lift: 2.7544549266247382
Leverage: 0.03205665337157468
Conviction: 1.2989773589961149
Rule: fro

In [None]:
#viewing the first 10 rules
my_rules.head(10)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(PLASTERS IN TIN CIRCUS PARADE),(PLASTERS IN TIN WOODLAND ANIMALS),0.115974,0.137856,0.067834,0.584906,4.242887,0.051846,2.076984,0.86458
1,(PLASTERS IN TIN WOODLAND ANIMALS),(PLASTERS IN TIN CIRCUS PARADE),0.137856,0.115974,0.067834,0.492063,4.242887,0.051846,1.740427,0.886524
2,(PLASTERS IN TIN CIRCUS PARADE),(ROUND SNACK BOXES SET OF 4 FRUITS),0.115974,0.157549,0.050328,0.433962,2.754455,0.032057,1.48833,0.720512
3,(ROUND SNACK BOXES SET OF 4 FRUITS),(PLASTERS IN TIN CIRCUS PARADE),0.157549,0.115974,0.050328,0.319444,2.754455,0.032057,1.298977,0.75607
4,(PLASTERS IN TIN CIRCUS PARADE),(ROUND SNACK BOXES SET OF4 WOODLAND),0.115974,0.245077,0.056893,0.490566,2.001685,0.02847,1.481887,0.56607
5,(ROUND SNACK BOXES SET OF4 WOODLAND),(PLASTERS IN TIN CIRCUS PARADE),0.245077,0.115974,0.056893,0.232143,2.001685,0.02847,1.15129,0.662876
6,(PLASTERS IN TIN SPACEBOY),(PLASTERS IN TIN WOODLAND ANIMALS),0.107221,0.137856,0.061269,0.571429,4.145125,0.046488,2.01167,0.849877
7,(PLASTERS IN TIN WOODLAND ANIMALS),(PLASTERS IN TIN SPACEBOY),0.137856,0.107221,0.061269,0.444444,4.145125,0.046488,1.607002,0.880076
8,(ROUND SNACK BOXES SET OF4 WOODLAND),(PLASTERS IN TIN WOODLAND ANIMALS),0.245077,0.137856,0.074398,0.303571,2.202098,0.040613,1.237951,0.723103
9,(PLASTERS IN TIN WOODLAND ANIMALS),(ROUND SNACK BOXES SET OF4 WOODLAND),0.137856,0.245077,0.074398,0.539683,2.202098,0.040613,1.640006,0.633174
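
The outline's rule evaluation step mentions filtering out rules that miss the desired quality criteria. One way that could look is sketched below on five rows copied from the table above; in the notebook the same filter would be applied to `my_rules`. The two thresholds are hypothetical and should be tuned to the business context, and item sets are written as plain strings here for brevity.

```python
import pandas as pd

# Rows 0, 2, 5, 6 and 9 copied from the rules table above
sample_rules = pd.DataFrame(
    {
        "antecedents": [
            "PLASTERS IN TIN CIRCUS PARADE",
            "PLASTERS IN TIN CIRCUS PARADE",
            "ROUND SNACK BOXES SET OF4 WOODLAND",
            "PLASTERS IN TIN SPACEBOY",
            "PLASTERS IN TIN WOODLAND ANIMALS",
        ],
        "consequents": [
            "PLASTERS IN TIN WOODLAND ANIMALS",
            "ROUND SNACK BOXES SET OF 4 FRUITS",
            "PLASTERS IN TIN CIRCUS PARADE",
            "PLASTERS IN TIN WOODLAND ANIMALS",
            "ROUND SNACK BOXES SET OF4 WOODLAND",
        ],
        "confidence": [0.584906, 0.433962, 0.232143, 0.571429, 0.539683],
        "lift": [4.242887, 2.754455, 2.001685, 4.145125, 2.202098],
    },
    index=[0, 2, 5, 6, 9],
)

MIN_CONFIDENCE = 0.5  # hypothetical threshold
MIN_LIFT = 2.0        # hypothetical threshold

# Keep only rules that clear both thresholds, strongest lift first
strong = (
    sample_rules[(sample_rules["confidence"] >= MIN_CONFIDENCE)
                 & (sample_rules["lift"] >= MIN_LIFT)]
    .sort_values("lift", ascending=False)
)
print(strong)
```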


# 6. Interpretation

Rules were generated and evaluated using support, confidence, lift, leverage, conviction, and Zhang's metric (the `zhangs_metric` column in the output). Each row corresponds to a rule and contains the antecedents (items before the arrow), the consequents (items after the arrow), and the calculated metrics. Let's interpret the first few rules:

1. **Rule 0:**
   - Antecedents: PLASTERS IN TIN CIRCUS PARADE
   - Consequents: PLASTERS IN TIN WOODLAND ANIMALS
   - Support: 0.067834 (6.78% of transactions contain both items)
   - Confidence: 0.584906 (58.49% of transactions with the antecedent also contain the consequent)
   - Lift: 4.242887 (The items are bought together 4.24 times as often as expected if they were independent)
   - Leverage: 0.051846 (Observed co-occurrence minus the co-occurrence expected under independence)
   - Conviction: 2.076984 (Values above 1 mean the consequent depends on the antecedent; higher implies a stronger association)
   - Zhang's Metric: 0.864580 (Ranges from -1 to 1; positive values indicate a positive association)

2. **Rule 1:**
   - Antecedents: PLASTERS IN TIN WOODLAND ANIMALS
   - Consequents: PLASTERS IN TIN CIRCUS PARADE
   - Support: 0.067834
   - Confidence: 0.492063
   - Lift: 4.242887
   - Leverage: 0.051846
   - Conviction: 1.740427
   - Zhang's Metric: 0.886524

3. **Rule 2:**
   - Antecedents: PLASTERS IN TIN CIRCUS PARADE
   - Consequents: ROUND SNACK BOXES SET OF 4 FRUITS
   - Support: 0.050328
   - Confidence: 0.433962
   - Lift: 2.754455
   - Leverage: 0.032057
   - Conviction: 1.488330
   - Zhang's Metric: 0.720512

And so on for the remaining rules.

These rules provide insights into item associations based on their occurrence in transactions. The metrics help assess the strength and significance of these associations. For example, a high confidence and lift indicate a strong association between the antecedent and consequent items. These rules can guide strategic decisions such as product recommendations, cross-selling, and marketing campaigns to maximize the benefits of identified associations.
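
As a sanity check on the interpretation, the derived metrics for Rule 0 can be recomputed from the three support values in the rules table. This is a sketch using the standard definitions of each metric; the inputs are the rounded six-decimal figures from the table, so the results agree only to about four decimal places.

```python
# Supports for Rule 0, copied from the rules table (rounded to six decimals)
s_a = 0.115974   # antecedent support: PLASTERS IN TIN CIRCUS PARADE
s_c = 0.137856   # consequent support: PLASTERS IN TIN WOODLAND ANIMALS
s_ac = 0.067834  # support of both items together

confidence = s_ac / s_a                      # P(consequent | antecedent)
lift = confidence / s_c                      # confidence relative to the consequent's baseline
leverage = s_ac - s_a * s_c                  # observed minus expected co-occurrence
conviction = (1 - s_c) / (1 - confidence)    # how often the rule would be wrong under independence vs. observed
zhang = leverage / max(s_ac * (1 - s_a), s_a * (s_c - s_ac))  # Zhang's metric, between -1 and 1

print(f"confidence={confidence:.4f} lift={lift:.4f} leverage={leverage:.4f} "
      f"conviction={conviction:.4f} zhang={zhang:.4f}")
```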

# 7. Strategy Formulation

Here are some strategies that can be formulated based on the above association rules:

1. **Strategy: Cross-Promotion of Plasters in Tin**
   - Rule 0: If customers buy "PLASTERS IN TIN CIRCUS PARADE," recommend "PLASTERS IN TIN WOODLAND ANIMALS."
   - Strategy: Create a promotional offer where purchasing the "CIRCUS PARADE" plasters leads to a discounted bundle with "WOODLAND ANIMALS" plasters. This encourages customers to explore related products.

2. **Strategy: Bundle Offer for Plasters**
   - Rule 2: If customers buy "PLASTERS IN TIN CIRCUS PARADE," consider suggesting "ROUND SNACK BOXES SET OF 4 FRUITS."
   - Strategy: Offer a special deal where purchasing "CIRCUS PARADE" plasters includes a complimentary "FRUITS" snack box set. This leverages the strong association and encourages larger purchases.

3. **Strategy: Plasters and Snack Box Combo**
   - Rule 4: Customers purchasing "PLASTERS IN TIN CIRCUS PARADE" also tend to buy "ROUND SNACK BOXES SET OF4 WOODLAND."
   - Strategy: Promote a combo package that includes both "CIRCUS PARADE" plasters and "WOODLAND" snack box sets at a discounted price. Highlight the convenience of having both items together.

4. **Strategy: Targeted Product Recommendations**
   - Rule 6: Customers buying "PLASTERS IN TIN SPACEBOY" are associated with "PLASTERS IN TIN WOODLAND ANIMALS."
   - Strategy: Implement personalized recommendations where customers who purchase space-themed products receive suggestions for related items, such as the "WOODLAND ANIMALS" plasters.

5. **Strategy: Upsell with Confidence**
   - Rule 8: Customers purchasing "ROUND SNACK BOXES SET OF4 WOODLAND" have a higher likelihood of buying "PLASTERS IN TIN WOODLAND ANIMALS."
   - Strategy: Offer an upsell opportunity by recommending "WOODLAND ANIMALS" plasters to customers who purchase snack box sets. Emphasize the matching themes of the products.

6. **Strategy: Mutual Promotion of Plasters**
   - Rule 9: If customers buy "PLASTERS IN TIN WOODLAND ANIMALS," promote "ROUND SNACK BOXES SET OF4 WOODLAND."
   - Strategy: Implement a reciprocal promotion where customers who purchase "WOODLAND ANIMALS" plasters are suggested to add "WOODLAND" snack box sets, creating a mutually beneficial promotion.

These strategies leverage the associations identified in the rules to enhance the shopping experience, increase transaction value, and encourage customers to explore related products. Remember to track the implementation of these strategies, monitor customer responses, and make adjustments based on the outcomes and evolving customer preferences.

## **Conclusion:**

Leveraging association rule mining as part of our business strategy holds the potential to drive significant improvements across various aspects of our operations. By uncovering hidden patterns in transactional data, we can create actionable insights that influence marketing efforts, inventory management, and customer engagement. This data-driven approach aligns with our goal of enhancing customer satisfaction and maximizing revenue.

## Thank you for Reading till the end

## Raviteja
https://www.linkedin.com/in/raviteja-padala/