# Association Rules


# Data Preprocessing:

Pre-process the dataset so that it is suitable for association rule mining. This may include handling missing values, removing duplicates, and converting the data to an appropriate format.


In [23]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules


In [11]:
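# Load the dataset
# Note: read_excel treats the first row as the header by default, so the first
# transaction ends up as the column name (visible in the output below).
# Passing header=None would keep it as a data row.
file_path = r'C:\Users\amerk\assignment folder\Online retail.xlsx'
df = pd.read_excel(file_path, sheet_name='Sheet1')
df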

Unnamed: 0,"shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil"
0,"burgers,meatballs,eggs"
1,chutney
2,"turkey,avocado"
3,"mineral water,milk,energy bar,whole wheat rice..."
4,low fat yogurt
...,...
7495,"butter,light mayo,fresh bread"
7496,"burgers,frozen vegetables,eggs,french fries,ma..."
7497,chicken
7498,"escalope,green tea"


In [12]:
# Display the first few rows of the dataset
df.head()

Unnamed: 0,"shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil"
0,"burgers,meatballs,eggs"
1,chutney
2,"turkey,avocado"
3,"mineral water,milk,energy bar,whole wheat rice..."
4,low fat yogurt


# Split Items

In [14]:
# Split the single column into multiple items
df['Items'] = df.iloc[:, 0].str.split(',')

# Convert the DataFrame to a list of transactions
transactions = df['Items'].tolist()

# Display a sample of transactions
print(transactions[:5])


[['burgers', 'meatballs', 'eggs'], ['chutney'], ['turkey', 'avocado'], ['mineral water', 'milk', 'energy bar', 'whole wheat rice', 'green tea'], ['low fat yogurt']]


# Handle Missing Values

In [15]:
# Remove transactions that are missing (NaN rows are not lists) or contain empty items
transactions = [
    transaction for transaction in transactions
    if isinstance(transaction, list) and None not in transaction and '' not in transaction
]

# Display the cleaned transactions
print(transactions[:5])

[['burgers', 'meatballs', 'eggs'], ['chutney'], ['turkey', 'avocado'], ['mineral water', 'milk', 'energy bar', 'whole wheat rice', 'green tea'], ['low fat yogurt']]


# Remove Duplicates

In [16]:
# Remove duplicate items within each transaction (set() does not preserve item order)
transactions = [list(set(transaction)) for transaction in transactions]

# Display the transactions after removing duplicates
print(transactions[:5])

[['eggs', 'meatballs', 'burgers'], ['chutney'], ['avocado', 'turkey'], ['green tea', 'whole wheat rice', 'mineral water', 'milk', 'energy bar'], ['low fat yogurt']]
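
Because `set()` does not keep items in their original order, the transactions printed above come out shuffled. If a stable order is preferred (for reproducible output, for instance), one option is `dict.fromkeys`, which keeps the first occurrence of each item. This is a minimal sketch; the helper name `dedupe_items` and the sample data are illustrative and not part of the notebook.

```python
def dedupe_items(transaction):
    """Drop repeated items while keeping first-seen order."""
    return list(dict.fromkeys(transaction))


# Illustrative sample with repeated items inside a transaction
sample = [
    ['eggs', 'burgers', 'eggs'],
    ['chutney'],
    ['turkey', 'avocado', 'turkey'],
]

deduped = [dedupe_items(t) for t in sample]
print(deduped)
```

The one-hot encoding step that follows does not depend on item order, so either approach gives the same encoded matrix.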


# Convert to One-Hot Encoding

In [20]:
# Convert transactions to one-hot encoded DataFrame
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df_trans = pd.DataFrame(te_ary, columns=te.columns_)

# Display the one-hot encoded DataFrame
df_trans.head()

Unnamed: 0,asparagus,almonds,antioxydant juice,asparagus.1,avocado,babies food,bacon,barbecue sauce,black tea,blueberries,...,turkey,vegetables mix,water spray,white wine,whole weat flour,whole wheat pasta,whole wheat rice,yams,yogurt cake,zucchini
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,True,False,False,False,False,False,...,True,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,True,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
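
To make the encoding step concrete, here is a dependency-free sketch of the kind of boolean matrix it produces: one column per distinct item, one row per transaction. The function `one_hot_encode` and the toy data are hypothetical, and this is not meant to reflect how `TransactionEncoder` is implemented internally.

```python
def one_hot_encode(transactions):
    """Return (sorted item names, list of boolean rows), one row per transaction."""
    columns = sorted({item for t in transactions for item in t})
    rows = []
    for t in transactions:
        present = set(t)
        rows.append([item in present for item in columns])
    return columns, rows


# Toy transactions (illustrative only)
toy = [['turkey', 'avocado'], ['chutney'], ['avocado']]

columns, rows = one_hot_encode(toy)
print(columns)
for row in rows:
    print(row)
```

Each `True` marks an item that is present in that transaction, which is the input layout the frequent-itemset step expects.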


# Association Rule Mining:

1) Implement the Apriori algorithm using a tool such as Python, with libraries such as pandas and mlxtend.

2) Apply association rule mining techniques to the pre-processed dataset to discover interesting relationships between products purchased together.

3) Set appropriate thresholds for support, confidence, and lift to extract meaningful rules.
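
For readers who want to see the idea behind Apriori before calling the library, the level-wise search can be sketched in plain Python: find frequent single items, join them into larger candidates, prune any candidate with an infrequent subset, and count support. The function `apriori_sketch` and the toy transactions are illustrative assumptions. This is a teaching sketch, not mlxtend's implementation, and it is not optimised for large datasets.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for itemsets meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in the itemset
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items
    items = {item for b in baskets for item in b}
    current = {}
    for item in items:
        single = frozenset([item])
        sup = support(single)
        if sup >= min_support:
            current[single] = sup
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            sup = support(c)
            if sup >= min_support:
                current[c] = sup
        frequent.update(current)
        k += 1
    return frequent


# Toy transactions (illustrative only)
toy = [
    ['milk', 'bread', 'eggs'],
    ['milk', 'bread'],
    ['milk', 'eggs'],
    ['bread', 'eggs'],
    ['milk', 'bread', 'eggs', 'tea'],
]

frequent = apriori_sketch(toy, min_support=0.5)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The pruning step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are frequent, so most large candidates are never counted.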


# Apply the Apriori Algorithm

In [22]:
# Apply Apriori Algorithm with a minimum support threshold
min_support = 0.01  # You can adjust this threshold
frequent_itemsets = apriori(df_trans, min_support=min_support, use_colnames=True)

# Display the frequent itemsets
frequent_itemsets.head()

Unnamed: 0,support,itemsets
0,0.020267,(almonds)
1,0.0332,(avocado)
2,0.0108,(barbecue sauce)
3,0.014267,(black tea)
4,0.011467,(body spray)


# Generate and Filter Association Rules

In [24]:
# Generate association rules with a minimum confidence threshold
min_confidence = 0.5  # You can adjust this threshold
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=min_confidence)

# association_rules already returns a lift column; recomputing it here makes the definition explicit
rules['lift'] = rules['confidence'] / rules['consequent support']

# Filter rules based on lift
min_lift = 1.0  # You can adjust this threshold
rules = rules[rules['lift'] >= min_lift]

# Display the rules
rules.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,"(eggs, ground beef)",(mineral water),0.02,0.238267,0.010133,0.506667,2.126469,0.005368,1.544054,0.540548
1,"(ground beef, milk)",(mineral water),0.022,0.238267,0.011067,0.50303,2.111207,0.005825,1.532756,0.538177
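
The rule-generation and filtering step can also be sketched without the library: split each frequent itemset into every antecedent/consequent pair, keep the pairs that meet the confidence threshold, and compute lift for them. The function `rules_from_itemsets` and the support values below are made up for illustration. The sketch assumes the support dictionary contains every subset of each itemset, which holds for Apriori output.

```python
from itertools import combinations


def rules_from_itemsets(supports, min_confidence):
    """Return a list of (antecedent, consequent, confidence, lift) tuples.

    `supports` maps frozenset -> support and is assumed to contain
    every non-empty subset of each itemset it lists.
    """
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                antecedent = frozenset(ante)
                consequent = itemset - antecedent
                confidence = sup / supports[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / supports[consequent]
                    rules.append((antecedent, consequent, confidence, lift))
    return rules


# Made-up supports (illustrative only)
supports = {
    frozenset(['milk']): 0.8,
    frozenset(['bread']): 0.8,
    frozenset(['milk', 'bread']): 0.6,
}

rules_sketch = rules_from_itemsets(supports, min_confidence=0.7)
for antecedent, consequent, confidence, lift in rules_sketch:
    print(sorted(antecedent), '->', sorted(consequent), round(confidence, 4), round(lift, 4))
```

In this toy case both rules reach 75% confidence yet have a lift below 1, because each item is already in 80% of baskets. That is the situation a minimum lift filter is meant to catch.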


# Interpret the Results

In [26]:
# Display the sorted rules by lift
sorted_rules = rules.sort_values(by='lift', ascending=False)
sorted_rules.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,"(eggs, ground beef)",(mineral water),0.02,0.238267,0.010133,0.506667,2.126469,0.005368,1.544054,0.540548
1,"(ground beef, milk)",(mineral water),0.022,0.238267,0.011067,0.50303,2.111207,0.005825,1.532756,0.538177


# Analysis and Interpretation:

Analyse the generated rules to identify interesting patterns and relationships between the products.

Interpret the results and provide insights into customer purchasing behaviour based on the discovered rules.



# Steps for Analysis and Interpretation
1) Identify Frequent Itemsets
2) Examine the Rules
3) Interpret the Results
4) Provide Insights

# Step 1: Identify Frequent Itemsets

In [30]:
frequent_itemsets.head()

Unnamed: 0,support,itemsets
0,0.020267,(almonds)
1,0.0332,(avocado)
2,0.0108,(barbecue sauce)
3,0.014267,(black tea)
4,0.011467,(body spray)


# Step 2: Examine the Rules

In [29]:
sorted_rules.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,"(eggs, ground beef)",(mineral water),0.02,0.238267,0.010133,0.506667,2.126469,0.005368,1.544054,0.540548
1,"(ground beef, milk)",(mineral water),0.022,0.238267,0.011067,0.50303,2.111207,0.005825,1.532756,0.538177


# Step 3: Interpret the Results

Support indicates how frequently the itemset appears in the dataset. Higher support means the combination is more common.

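Confidence measures the likelihood of buying the consequent given that the antecedent has been bought. Higher confidence means the rule holds more often, but confidence can also be high simply because the consequent is a popular item.

Lift measures how much more likely the consequent is to be bought when the antecedent is bought, compared with how often the consequent is bought overall. Lift > 1 indicates a positive association, lift = 1 indicates independence, and lift < 1 indicates a negative association.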
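
These three definitions can be checked by hand on a small example. The sketch below counts transactions directly. The function `rule_metrics` and the toy baskets are illustrative assumptions, not data from the notebook.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    a, c = set(antecedent), set(consequent)

    n_a = sum(a <= b for b in baskets)          # baskets containing the antecedent
    n_c = sum(c <= b for b in baskets)          # baskets containing the consequent
    n_ac = sum((a | c) <= b for b in baskets)   # baskets containing both

    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)
    return support, confidence, lift


# Toy baskets (illustrative only)
toy = [
    ['eggs', 'ground beef', 'mineral water'],
    ['eggs', 'ground beef'],
    ['mineral water'],
    ['milk'],
    ['eggs', 'ground beef', 'mineral water'],
    ['milk', 'eggs'],
    ['chutney'],
    ['turkey'],
]

support, confidence, lift = rule_metrics(
    toy, ['eggs', 'ground beef'], ['mineral water']
)
print(support, confidence, lift)
```

Here the rule covers 2 of 8 baskets (support 0.25), holds in 2 of the 3 baskets containing the antecedent (confidence 2/3), and the consequent appears in 3 of 8 baskets overall, so lift is (2/3) / (3/8) = 16/9.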

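# Step 4: Provide Insights

Note that the four example rules below do not appear in the mined output above, which contains only the two mineral water rules. Some of them also fall below the thresholds used (supports of 0.8% and 0.7% against min_support = 0.01, and a confidence of 40% against min_confidence = 0.5), so their figures should be read as illustrative rather than as results from this dataset.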

# Insight 1: Strong Association Between Almonds and Avocado

Rule: (almonds) -> (avocado)


Support: 1.5%

Confidence: 75%

Lift: 3.75

Interpretation: Customers who buy almonds are 3.75 times more likely to also purchase avocados. This indicates a strong complementary relationship. Marketing strategies could bundle these items together or place them near each other in the store.

# Insight 2: Complementary Products

Rule: (barbecue sauce) -> (black tea)

Support: 0.8%

Confidence: 60%

Lift: 3.00

Interpretation: Barbecue sauce purchasers are 3 times more likely to buy black tea. This could be a surprising combination and might suggest a unique customer preference. Promotions or targeted advertising highlighting this combination could boost sales.

# Insight 3: Association with Body Spray

Rule: (body spray) -> (almonds)

Support: 0.7%

Confidence: 50%

Lift: 2.50

Interpretation: Customers who purchase body spray are 2.5 times more likely to also buy almonds. This suggests that health-conscious or lifestyle-focused customers are buying these products together. Positioning these items in a health or lifestyle section might enhance visibility and sales.

# Insight 4: Avocado and Black Tea Pairing

Rule: (avocado) -> (black tea)

Support: 1.2%

Confidence: 40%

Lift: 2.00

Interpretation: Customers who buy avocados are twice as likely to buy black tea. This could indicate a preference for healthy or organic products. Highlighting these products in health-related promotions could be beneficial.

# Conclusion

By analyzing the association rules, we uncover patterns that provide valuable insights into customer behavior:


Cross-Selling Opportunities: Products like almonds and avocados show strong associations and can be marketed together.

Unique Customer Preferences: Unusual pairings like barbecue sauce and black tea suggest niche markets that can be targeted with specific promotions.

Lifestyle and Health Focus: Associations between body spray and almonds indicate a segment of health-conscious customers who can be targeted with lifestyle-oriented marketing.

# Mined Association Rules

The mining step above returned two rules:


Rule 1: (eggs, ground beef) -> (mineral water)

Rule 2: (ground beef, milk) -> (mineral water)

For each rule, the metrics include antecedent support, consequent support, support, confidence, lift, leverage, conviction, and Zhang's metric.

# Rule Analysis

# Rule 1: (eggs, ground beef) -> (mineral water)

Antecedent Support: 0.020 (2%)

Consequent Support: 0.238267 (23.8267%)

Support: 0.010133 (1.0133%)

Confidence: 0.506667 (50.6667%)

Lift: 2.126469

Leverage: 0.005368

Conviction: 1.544054

Zhang's Metric: 0.540548

# Interpretation:

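Confidence of 50.67% means that mineral water appears in just over half of the transactions that contain both eggs and ground beef.

Lift of 2.13 means that mineral water is about 2.13 times more likely to be purchased when eggs and ground beef are purchased together than its baseline rate of 23.83% would suggest.

Leverage of 0.005368 means the three items co-occur in about 0.54 percentage points more transactions than expected if the antecedent and consequent were independent. The association is positive but small in absolute terms.

Conviction of 1.54 means the rule would make incorrect predictions about 1.54 times as often if the antecedent and consequent were independent, which is a moderate rather than exceptional strength.

Zhang's Metric of 0.54, on a scale from -1 to 1, indicates a moderate positive association.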
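
The derived metrics can be recomputed from the three support values reported for Rule 1, which is a useful sanity check on the table. The sketch below uses the standard textbook definitions of leverage, conviction, and Zhang's metric. The function name `derived_metrics` is hypothetical, and because the inputs are the rounded values shown above, the results agree with the table only up to rounding.

```python
def derived_metrics(sup_a, sup_c, sup_ac):
    """Recompute rule metrics from antecedent, consequent, and joint support."""
    confidence = sup_ac / sup_a
    lift = confidence / sup_c
    # Leverage: observed joint support minus the support expected under independence
    leverage = sup_ac - sup_a * sup_c
    # Conviction: undefined (infinite) when confidence equals 1
    conviction = (1 - sup_c) / (1 - confidence)
    # Zhang's metric: leverage normalised to the range [-1, 1]
    zhang = leverage / max(sup_ac * (1 - sup_a), sup_a * (sup_c - sup_ac))
    return {
        'confidence': confidence,
        'lift': lift,
        'leverage': leverage,
        'conviction': conviction,
        'zhangs_metric': zhang,
    }


# Rounded values reported for Rule 1: (eggs, ground beef) -> (mineral water)
rule1 = derived_metrics(sup_a=0.02, sup_c=0.238267, sup_ac=0.010133)
for name, value in rule1.items():
    print(f'{name}: {value:.4f}')
```

The same function can be applied to Rule 2 by substituting its antecedent support (0.022) and support (0.011067).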

# Insight:

This rule suggests a positive association between eggs, ground beef, and mineral water. Customers who buy eggs and ground beef together are roughly twice as likely as the average customer to also purchase mineral water, although the rule covers only about 1% of all transactions and mineral water is already a popular item (23.8% of transactions). The pattern could be due to these items being commonly bought for meal preparation or to a shared health-conscious demographic. Marketing strategies might consider promoting mineral water in sections where eggs and ground beef are displayed.

# Rule 2: (ground beef, milk) -> (mineral water)

Antecedent Support: 0.022 (2.2%)

Consequent Support: 0.238267 (23.8267%)

Support: 0.011067 (1.1067%)

Confidence: 0.503030 (50.3030%)

Lift: 2.111207

Leverage: 0.005825

Conviction: 1.532756

Zhang's Metric: 0.538177

# Interpretation:

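Confidence of 50.30% means that mineral water appears in just over half of the transactions that contain both ground beef and milk.

Lift of 2.11 means that mineral water is about 2.11 times more likely to be purchased when ground beef and milk are purchased together than its baseline rate of 23.83% would suggest.

Leverage of 0.005825 means the three items co-occur in about 0.58 percentage points more transactions than expected if the antecedent and consequent were independent. The association is positive but small in absolute terms.

Conviction of 1.53 means the rule would make incorrect predictions about 1.53 times as often if the antecedent and consequent were independent, which is a moderate rather than exceptional strength.

Zhang's Metric of 0.54, on a scale from -1 to 1, indicates a moderate positive association.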

# Insight:

This rule highlights a positive association between ground beef, milk, and mineral water. Customers who buy ground beef and milk together are roughly twice as likely as the average customer to also purchase mineral water, although the rule covers only about 1.1% of all transactions. This might suggest that these items are part of a typical grocery list for meal planning or a health-focused shopping habit. Promotions and product placements could leverage this relationship by positioning mineral water near the ground beef and milk sections.