
# Implementation of the Apriori Association Rule Algorithm


## **Objective:**
The goal is to use association rule mining to identify meaningful patterns in customer purchase behavior. With a minimum support count of 2 (2 of the 9 transactions, about 22%) and a minimum confidence of 50%, we aim to formulate targeted cross-selling strategies that increase sales and customer engagement.


## Association rules
Association rules describe patterns in data where items or events tend to occur together. Given a dataset of customer purchases, they reveal which items are often bought together. For example, if customers often buy bread and butter in the same transaction, association rule mining surfaces that connection as the rule "bread → butter."

In simple terms, an association rule says that when one thing happens, there is a good chance another will happen too. Businesses can use this to understand customer behavior and act on it, for example by suggesting related products or creating special deals to increase sales.
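
To make "a good chance" measurable, rules are usually scored with support, confidence, and lift. The sketch below computes the three for the bread-and-butter example on a handful of toy baskets. The baskets and the helper names `support`, `confidence`, and `lift` are invented for illustration and are not part of this notebook's dataset or of any library.

```python
from fractions import Fraction

# Hypothetical toy baskets, invented purely for illustration
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return Fraction(sum(itemset <= basket for basket in baskets), len(baskets))

def confidence(antecedent, consequent, baskets):
    """Share of baskets with the antecedent that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence relative to the consequent's baseline frequency."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

print(support({"bread", "butter"}, baskets))       # 3/5
print(confidence({"bread"}, {"butter"}, baskets))  # 3/4
print(lift({"bread"}, {"butter"}, baskets))        # 5/4
```

Exact fractions are used instead of floats so the printed values are easy to check by hand. A lift above 1, as here, means the two items appear together more often than they would if they were bought independently.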

# Steps involved in association rule mining
Here's a general outline of the process:

**1. Data Preparation:**

- Import the relevant dataset containing transactional or event data.

**2. Data Preprocessing:**

- Handle missing values, outliers, and inconsistencies in the dataset.
- Transform the data into the "basket" format: one row per transaction and one boolean column per item.

**3. Itemset Generation:**

- Identify all unique items in the dataset.
- Generate frequent itemsets: sets of items that appear together frequently in transactions.

**4. Rule Generation:**

- Based on the frequent itemsets, generate association rules that meet predefined support and confidence thresholds.

**5. Rule Evaluation:**

- Evaluate the generated rules using metrics like support, confidence, and lift.
- Filter out rules that do not meet the desired quality criteria.

**6. Interpretation:**

- Analyze the generated association rules to understand the insights and patterns they reveal.
- Identify meaningful and actionable associations that can guide decision-making.

**7. Strategy Formulation:**

- Based on the insights gained from the association rules, develop strategies tailored to your business goals.
- These strategies could involve cross-selling, upselling, product placement, customer segmentation, and more.



## Apply association rule mining with a minimum support count of 2 and a minimum confidence of 50%

# 1. Data Preparation:

In [None]:
import pandas as pd  # to create the DataFrame
import warnings

# Suppress all warnings (including DeprecationWarning) to keep the output clean
warnings.filterwarnings('ignore')

data = {
    'Transaction_ID': ['T100', 'T200', 'T300', 'T400', 'T500', 'T600', 'T700', 'T800', 'T900'],
    'Item_ID': ['I1, I2, I5', 'I2, I4', 'I2, I3', 'I1, I2, I4', 'I1, I3', 'I2, I3', 'I1, I3', 'I1, I2, I3, I5', 'I1, I2, I3']
}

# Create a DataFrame
df = pd.DataFrame(data, columns=['Transaction_ID', 'Item_ID'])

# Replace the 'I' prefix with 'Item-' in the Item_ID column (e.g. 'I1' becomes 'Item-1')
df['Item_ID'] = df['Item_ID'].str.replace('I', 'Item-', regex=False)

# Display the modified DataFrame
df

| | Transaction_ID | Item_ID |
|---|---|---|
| 0 | T100 | Item-1, Item-2, Item-5 |
| 1 | T200 | Item-2, Item-4 |
| 2 | T300 | Item-2, Item-3 |
| 3 | T400 | Item-1, Item-2, Item-4 |
| 4 | T500 | Item-1, Item-3 |
| 5 | T600 | Item-2, Item-3 |
| 6 | T700 | Item-1, Item-3 |
| 7 | T800 | Item-1, Item-2, Item-3, Item-5 |
| 8 | T900 | Item-1, Item-2, Item-3 |


# 2. Data Preprocessing:

In [None]:
# One-hot encode the comma-separated item strings so that every distinct item
# gets exactly one column. Series.str.get_dummies splits each string on ', '.
# (Calling pd.get_dummies on a DataFrame of split lists encodes each list
# position separately, which yields several columns with the same item name
# and distorts every support count computed afterwards.)
grouped_df = (
    df.set_index('Transaction_ID')['Item_ID']
      .str.get_dummies(sep=', ')
      .astype(bool)  # boolean basket format: True if the item is in the transaction
)

grouped_df

| Transaction_ID | Item-1 | Item-2 | Item-3 | Item-4 | Item-5 |
|---|---|---|---|---|---|
| T100 | True | True | False | False | True |
| T200 | False | True | False | True | False |
| T300 | False | True | True | False | False |
| T400 | True | True | False | True | False |
| T500 | True | False | True | False | False |
| T600 | False | True | True | False | False |
| T700 | True | False | True | False | False |
| T800 | True | True | True | False | True |
| T900 | True | True | True | False | False |
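
The basket format has one row per transaction and exactly one boolean column per distinct item. As a library-free illustration, the sketch below rebuilds that layout from the raw strings of step 1 using only the standard library. The names `raw`, `baskets`, `matrix`, and `counts` are illustrative and not part of the notebook's pipeline.

```python
# Raw transactions from step 1, after the 'I' -> 'Item-' renaming
raw = {
    "T100": "Item-1, Item-2, Item-5",
    "T200": "Item-2, Item-4",
    "T300": "Item-2, Item-3",
    "T400": "Item-1, Item-2, Item-4",
    "T500": "Item-1, Item-3",
    "T600": "Item-2, Item-3",
    "T700": "Item-1, Item-3",
    "T800": "Item-1, Item-2, Item-3, Item-5",
    "T900": "Item-1, Item-2, Item-3",
}

# Split each string into a set of items
baskets = {tid: set(items.split(", ")) for tid, items in raw.items()}

# Every distinct item becomes exactly one column
columns = sorted(set().union(*baskets.values()))

# One boolean row per transaction
matrix = {
    tid: [item in basket for item in columns]
    for tid, basket in baskets.items()
}

# Column totals: number of transactions that contain each item
counts = {
    item: sum(row[i] for row in matrix.values())
    for i, item in enumerate(columns)
}

print(columns)
print(counts)
```

The column totals give the number of transactions containing each item, which is a convenient sanity check on any encoded basket table before it is passed to a mining algorithm.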


# 3. Itemset Generation:

In [None]:
#importing libraries
from mlxtend.frequent_patterns import apriori, association_rules

# Applying the Apriori algorithm to find frequent itemsets
# min_support is set to a relative value (2 divided by the total number of transactions)
# use_colnames is set to True to use the actual item names in the resulting DataFrame
frequent_itemsets = apriori(grouped_df, min_support=2/grouped_df.shape[0], use_colnames=True)
frequent_itemsets

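| | support | itemsets |
|---|---|---|
| 0 | 0.666667 | (Item-1) |
| 1 | 0.777778 | (Item-2) |
| 2 | 0.666667 | (Item-3) |
| 3 | 0.222222 | (Item-4) |
| 4 | 0.222222 | (Item-5) |
| 5 | 0.444444 | (Item-1, Item-2) |
| 6 | 0.444444 | (Item-1, Item-3) |
| 7 | 0.222222 | (Item-1, Item-5) |
| 8 | 0.444444 | (Item-2, Item-3) |
| 9 | 0.222222 | (Item-2, Item-4) |
| 10 | 0.222222 | (Item-2, Item-5) |
| 11 | 0.222222 | (Item-1, Item-2, Item-3) |
| 12 | 0.222222 | (Item-1, Item-2, Item-5) |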
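
To show what a level-wise (Apriori-style) search does, here is a minimal pure-Python sketch run on the nine transactions from step 1 with a minimum support count of 2. It is an illustration of the join and prune idea under simple assumptions, not mlxtend's implementation, and the function name `frequent_itemsets_levelwise` is hypothetical.

```python
from itertools import combinations

# The nine transactions from step 1, written as sets
transactions = [
    {"Item-1", "Item-2", "Item-5"},            # T100
    {"Item-2", "Item-4"},                      # T200
    {"Item-2", "Item-3"},                      # T300
    {"Item-1", "Item-2", "Item-4"},            # T400
    {"Item-1", "Item-3"},                      # T500
    {"Item-2", "Item-3"},                      # T600
    {"Item-1", "Item-3"},                      # T700
    {"Item-1", "Item-2", "Item-3", "Item-5"},  # T800
    {"Item-1", "Item-2", "Item-3"},            # T900
]

def frequent_itemsets_levelwise(transactions, min_count):
    """Return {frozenset: count} for every itemset found in >= min_count transactions."""
    def count(candidate):
        return sum(candidate <= t for t in transactions)

    # Level 1: frequent single items
    current = {}
    for item in sorted(set().union(*transactions)):
        candidate = frozenset([item])
        n = count(candidate)
        if n >= min_count:
            current[candidate] = n
    frequent = dict(current)

    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count the surviving candidates against the transactions
        current = {}
        for c in candidates:
            n = count(c)
            if n >= min_count:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent

result = frequent_itemsets_levelwise(transactions, min_count=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

Dividing each printed count by 9 gives the relative support, so the output can be compared line by line with what a library reports for the same data.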


# 4. Association Rule Generation:

In [None]:

# The metric 'confidence' is used to evaluate the strength of the rules
# min_threshold is set to 0.5, ensuring that only rules with at least 50% confidence are included

association_rules_df = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.5)
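
Conceptually, rule generation splits each frequent itemset into an antecedent and a consequent in every possible way and keeps the splits whose confidence reaches the threshold. The sketch below does this for the single itemset {Item-1, Item-2, Item-5} using the transactions from step 1. The helpers `support` and `rules_from_itemset` are hypothetical names for illustration, not mlxtend functions.

```python
from fractions import Fraction
from itertools import combinations

# The nine transactions from step 1, written as sets
transactions = [
    {"Item-1", "Item-2", "Item-5"},            # T100
    {"Item-2", "Item-4"},                      # T200
    {"Item-2", "Item-3"},                      # T300
    {"Item-1", "Item-2", "Item-4"},            # T400
    {"Item-1", "Item-3"},                      # T500
    {"Item-2", "Item-3"},                      # T600
    {"Item-1", "Item-3"},                      # T700
    {"Item-1", "Item-2", "Item-3", "Item-5"},  # T800
    {"Item-1", "Item-2", "Item-3"},            # T900
]

def support(itemset):
    """Exact fraction of transactions that contain every item in `itemset`."""
    return Fraction(sum(itemset <= t for t in transactions), len(transactions))

def rules_from_itemset(itemset, min_confidence):
    """Yield (antecedent, consequent, confidence) for each qualifying split."""
    itemset = frozenset(itemset)
    for size in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(sorted(itemset), size)):
            consequent = itemset - antecedent
            conf = support(itemset) / support(antecedent)
            if conf >= min_confidence:
                yield antecedent, consequent, conf

rules = list(rules_from_itemset({"Item-1", "Item-2", "Item-5"}, Fraction(1, 2)))
for antecedent, consequent, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), f"confidence = {float(conf):.2f}")
```

Of the six possible splits of this itemset, the two with "Item-1" or "Item-2" alone as the antecedent fall below 50% confidence and are dropped.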

# 5. Rule Evaluation:

In [None]:
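# Keep rules with a support count of at least 2 and a confidence of at least 50%

min_support = 0.2      # just below 2/9 ≈ 0.222, i.e. 2 of the 9 transactions
min_confidence = 0.5


filtered_rules = association_rules_df[
    (association_rules_df['support'] >= min_support) &
    (association_rules_df['confidence'] >= min_confidence)
]

# Sort the rules by lift in descending order and renumber the rows
sorted_rules = filtered_rules.sort_values(by='lift', ascending=False).reset_index(drop=True)

# Display the filtered and sorted rules (the order of rules with equal lift may vary)
sorted_rules

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (Item-5) | (Item-1, Item-2) | 0.222222 | 0.444444 | 0.222222 | 1.0 | 2.25 | 0.123457 | inf | 0.714286 |
| 1 | (Item-1, Item-2) | (Item-5) | 0.444444 | 0.222222 | 0.222222 | 0.5 | 2.25 | 0.123457 | 1.555556 | 1.0 |
| 2 | (Item-5) | (Item-1) | 0.222222 | 0.666667 | 0.222222 | 1.0 | 1.5 | 0.074074 | inf | 0.428571 |
| 3 | (Item-2, Item-5) | (Item-1) | 0.222222 | 0.666667 | 0.222222 | 1.0 | 1.5 | 0.074074 | inf | 0.428571 |
| 4 | (Item-4) | (Item-2) | 0.222222 | 0.777778 | 0.222222 | 1.0 | 1.285714 | 0.049383 | inf | 0.285714 |
| 5 | (Item-5) | (Item-2) | 0.222222 | 0.777778 | 0.222222 | 1.0 | 1.285714 | 0.049383 | inf | 0.285714 |
| 6 | (Item-1, Item-5) | (Item-2) | 0.222222 | 0.777778 | 0.222222 | 1.0 | 1.285714 | 0.049383 | inf | 0.285714 |
| 7 | (Item-1) | (Item-3) | 0.666667 | 0.666667 | 0.444444 | 0.666667 | 1.0 | 0.0 | 1.0 | 0.0 |
| 8 | (Item-3) | (Item-1) | 0.666667 | 0.666667 | 0.444444 | 0.666667 | 1.0 | 0.0 | 1.0 | 0.0 |
| 9 | (Item-1) | (Item-2) | 0.666667 | 0.777778 | 0.444444 | 0.666667 | 0.857143 | -0.074074 | 0.666667 | -0.333333 |
| 10 | (Item-2) | (Item-1) | 0.777778 | 0.666667 | 0.444444 | 0.571429 | 0.857143 | -0.074074 | 0.777778 | -0.428571 |
| 11 | (Item-2) | (Item-3) | 0.777778 | 0.666667 | 0.444444 | 0.571429 | 0.857143 | -0.074074 | 0.777778 | -0.428571 |
| 12 | (Item-3) | (Item-2) | 0.666667 | 0.777778 | 0.444444 | 0.666667 | 0.857143 | -0.074074 | 0.666667 | -0.333333 |
| 13 | (Item-1, Item-2) | (Item-3) | 0.444444 | 0.666667 | 0.222222 | 0.5 | 0.75 | -0.074074 | 0.666667 | -0.375 |
| 14 | (Item-2, Item-3) | (Item-1) | 0.444444 | 0.666667 | 0.222222 | 0.5 | 0.75 | -0.074074 | 0.666667 | -0.375 |
| 15 | (Item-1, Item-3) | (Item-2) | 0.444444 | 0.777778 | 0.222222 | 0.5 | 0.642857 | -0.123457 | 0.444444 | -0.5 |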
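
For reference, every metric column can be derived from three supports of a rule $A \rightarrow C$: $s(A)$, $s(C)$, and $s(A \cup C)$, the share of transactions containing both sides. The definitions below are the commonly used ones and reproduce the values shown in the table. The worked line is a hand calculation for the rule Item-5 → Item-1, whose supports can be counted directly from the nine transactions (Item-5 in 2, Item-1 in 6, both in 2).

$$
\begin{aligned}
\text{confidence}(A \rightarrow C) &= \frac{s(A \cup C)}{s(A)} \\
\text{lift}(A \rightarrow C) &= \frac{\text{confidence}(A \rightarrow C)}{s(C)} \\
\text{leverage}(A \rightarrow C) &= s(A \cup C) - s(A)\,s(C) \\
\text{conviction}(A \rightarrow C) &= \frac{1 - s(C)}{1 - \text{confidence}(A \rightarrow C)} \\
\text{Zhang's metric}(A \rightarrow C) &= \frac{s(A \cup C) - s(A)\,s(C)}{\max\{\, s(A \cup C)\,(1 - s(A)),\; s(A)\,(s(C) - s(A \cup C)) \,\}}
\end{aligned}
$$

$$
\text{confidence} = \frac{2/9}{2/9} = 1, \qquad
\text{lift} = \frac{1}{6/9} = 1.5, \qquad
\text{leverage} = \frac{2}{9} - \frac{2}{9}\cdot\frac{6}{9} = \frac{6}{81} \approx 0.074
$$

Conviction is infinite for this rule because the confidence is 1, which makes the denominator zero.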


# 6. Interpretation



Support is the share of all nine transactions that contain every item in a rule, and confidence is the share of transactions containing the antecedent that also contain the consequent. A lift above 1 means the items appear together more often than they would if purchased independently; a lift below 1 means less often.

1. "Item-5" → "Item-1" and "Item-2": Both transactions that contain "Item-5" (T100 and T800) also contain "Item-1" and "Item-2," so the confidence is 100% at a support of 22.22% (2 of 9 transactions). The lift of 2.25 is the highest of all rules: "Item-1" and "Item-2" appear together in only 44.44% of all transactions, but in every transaction with "Item-5."

2. "Item-1" and "Item-2" → "Item-5": Of the four transactions containing both "Item-1" and "Item-2," two also contain "Item-5." The confidence is 50% and the lift is again 2.25.

3. "Item-5" → "Item-1," and "Item-2" and "Item-5" → "Item-1": The confidence is 100% in both cases. The lift is 1.5 because "Item-1" already appears in 66.67% of all transactions.

4. "Item-4" → "Item-2," "Item-5" → "Item-2," and "Item-1" and "Item-5" → "Item-2": The confidence is 100% in each case, but the lift is only 1.29 because "Item-2" is in 7 of the 9 transactions anyway.

5. "Item-1" → "Item-3" and "Item-3" → "Item-1": The support is 44.44% and the confidence is 66.67% in both directions, but the lift is 1.0 and the leverage is 0. Buying one of these items says nothing about whether the other is bought.

6. "Item-1" and "Item-2" (both directions) and "Item-2" and "Item-3" (both directions): The support is 44.44% and the confidence lies between 57.14% and 66.67%, but the lift is 0.86. These pairs are frequent only because the individual items are popular; they occur together slightly less often than independence would predict.

7. "Item-1" and "Item-2" → "Item-3," "Item-2" and "Item-3" → "Item-1," and "Item-1" and "Item-3" → "Item-2": Each rule just meets the 50% confidence threshold at a support of 22.22%, with lifts of 0.75, 0.75, and 0.64. These are the weakest rules.

Every rule with a lift above 1 rests on only two transactions, so these associations should be treated as hypotheses to confirm on more data.

These rules provide insights into item associations based on transaction data. The interpretation helps identify patterns of co-occurrence and potential cross-selling opportunities.

# 7. Strategy Formulation


1. **Cross-Sell Enhancement**:
   - Strategy: Promote "Item-5" to customers who buy "Item-1" and "Item-2" together.
   - Approach: The rule "Item-1" and "Item-2" → "Item-5" has the highest lift (2.25) at a confidence of 50%. When both "Item-1" and "Item-2" are in the cart, recommend adding "Item-5."

2. **Complementary Product Promotion**:
   - Strategy: Recommend "Item-1" and "Item-2" to customers who purchase "Item-5."
   - Approach: Every transaction containing "Item-5" also contains "Item-1" and "Item-2" (confidence 100%, lift 2.25). When a customer buys "Item-5," suggest both items as complementary products.

3. **Bundle Purchase Incentives**:
   - Strategy: Offer "Item-1," "Item-2," and "Item-5" as a bundle.
   - Approach: Because these three items form the strongest association in the data, package them together and offer a discount or incentive for buying the full set.

4. **Add-On Recommendations**:
   - Strategy: Suggest "Item-2" to customers who buy "Item-4."
   - Approach: Both transactions containing "Item-4" also contain "Item-2" (confidence 100%, lift 1.29). Show "Item-2" as an add-on whenever "Item-4" is selected.

5. **Prioritize by Lift**:
   - Strategy: Avoid spending discounts on combinations of "Item-1," "Item-2," and "Item-3."
   - Approach: These items are bought together often, but with a lift of 1.0 or below they are not bought together more than their individual popularity explains. Reserve promotional budget for the rules with a lift above 1.



### **Conclusion:**
Through the implementation of association rule mining with defined thresholds, we have successfully identified strong and meaningful relationships between various products in our dataset. This insight has allowed us to create targeted cross-selling strategies that promote related product purchases. By recommending complementary products to customers, offering bundle deals, and highlighting frequently co-purchased items, we can influence purchase decisions and increase sales. Continuous monitoring and adaptation of these strategies based on customer responses will enable us to maintain a competitive edge and provide a more personalized shopping journey for our customers.

## Thank you for reading till the end

## **-Raviteja**
https://www.linkedin.com/in/raviteja-padala/