# Association Rules Analysis: Online Retail Transactions


### In this notebook, we perform market basket analysis using the Apriori algorithm to discover associations between products.


In [2]:

# ## 📥 Import necessary libraries
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules


In [3]:
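# ## 📄 Load the dataset
file_path = 'Online retail.xlsx'  # Change if your path is different
# Note: the sheet has no header row, so read_excel uses the first transaction as the
# column name (see the df.head() output). Pass header=None to keep that row as data.
df = pd.read_excel(file_path)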

In [4]:
# ## 🔎 Explore the data
df.head()

Unnamed: 0,"shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil"
0,"burgers,meatballs,eggs"
1,chutney
2,"turkey,avocado"
3,"mineral water,milk,energy bar,whole wheat rice..."
4,low fat yogurt


In [None]:
# ## Preprocess the data
# Split each row (comma-separated string) into a list of items
transactions = df.iloc[:, 0].apply(lambda x: x.split(','))

In [None]:
# Convert to list of lists
transactions_list = transactions.tolist()

In [7]:


# Get unique items
all_items = set(item.strip() for sublist in transactions_list for item in sublist)
all_items = list(all_items)  # Convert set to list for DataFrame columns

In [9]:


# Create one-hot encoded DataFrame
encoded_df = pd.DataFrame(0, index=range(len(transactions_list)), columns=all_items)


# Fill in 1 where item is present
for i, items in enumerate(transactions_list):
    for item in items:
        encoded_df.at[i, item.strip()] = 1

# Check the encoded data
encoded_df.head()

Unnamed: 0,cider,bacon,french fries,corn,melons,cereals,antioxydant juice,pickles,spaghetti,yams,...,oatmeal,almonds,fromage blanc,brownies,candy bars,mineral water,whole wheat rice,white wine,low fat yogurt,protein bar
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,1,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
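
For reference, the same encoding can be sketched in plain Python, without pandas. This is only an illustration on three made-up baskets; the `baskets` list and the `one_hot` helper are hypothetical names, not part of the notebook. It strips whitespace and ignores duplicate items within a basket.

```python
def one_hot(baskets):
    """Return (sorted item names, list of boolean rows), one row per basket."""
    # Normalize each basket: strip whitespace, drop empty strings and duplicates.
    cleaned = [{item.strip() for item in basket if item.strip()} for basket in baskets]
    # Column order: all distinct items, sorted alphabetically.
    items = sorted(set().union(*cleaned))
    # One boolean per column, True where the basket contains the item.
    rows = [[item in basket for item in items] for basket in cleaned]
    return items, rows

# Hypothetical sample data for illustration only.
baskets = [["burgers", " eggs"], ["chutney"], ["eggs", "turkey", "eggs"]]
items, rows = one_hot(baskets)
print(items)
for row in rows:
    print(row)
```

Boolean values are used here instead of 0/1; either form describes the same presence/absence matrix.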


In [10]:


# ## ⚡ Apply Apriori algorithm
frequent_itemsets = apriori(encoded_df, min_support=0.02, use_colnames=True)
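
To make the `min_support` threshold concrete, here is a minimal pure-Python sketch of the level-wise search that Apriori performs. It is an illustration on a toy dataset, not mlxtend's implementation; the function name `frequent_itemsets_sketch` and the sample baskets are assumptions made for this example.

```python
from itertools import combinations

def frequent_itemsets_sketch(baskets, min_support):
    """Return {frozenset(itemset): support} for itemsets meeting min_support."""
    baskets = [set(b) for b in baskets]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    items = sorted({item for b in baskets for item in b})
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    result = {}
    k = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset)
        k += 1
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in result for s in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return result

# Hypothetical toy baskets, not taken from the dataset.
toy = [
    ["spaghetti", "ground beef", "olive oil"],
    ["spaghetti", "ground beef"],
    ["spaghetti", "olive oil"],
    ["milk", "eggs"],
    ["milk", "spaghetti"],
]
found = frequent_itemsets_sketch(toy, min_support=0.4)
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

Raising `min_support` shrinks the result quickly, because an itemset can only be frequent if all of its subsets are; that property is what lets the search discard candidates early.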



In [11]:


# ## 🔗 Generate association rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

# Show first few rules
rules.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(french fries),(green tea),0.170933,0.132,0.028533,0.166927,1.264596,1.0,0.00597,1.041925,0.252373,0.103984,0.040238,0.191544
1,(green tea),(french fries),0.132,0.170933,0.028533,0.216162,1.264596,1.0,0.00597,1.057701,0.241053,0.103984,0.054553,0.191544
2,(french fries),(eggs),0.170933,0.179733,0.0364,0.212949,1.184803,1.0,0.005678,1.042202,0.188136,0.115825,0.040493,0.207735
3,(eggs),(french fries),0.179733,0.170933,0.0364,0.202522,1.184803,1.0,0.005678,1.039611,0.190155,0.115825,0.038102,0.207735
4,(french fries),(milk),0.170933,0.1296,0.023733,0.138846,1.071339,1.0,0.00158,1.010736,0.080318,0.085742,0.010622,0.160987


In [12]:


# ## Analyze example rules
# Key columns: antecedents, consequents, support, confidence, lift
rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].sort_values(by='lift', ascending=False).head(10)

Unnamed: 0,antecedents,consequents,support,confidence,lift
20,(spaghetti),(ground beef),0.0392,0.225115,2.290857
21,(ground beef),(spaghetti),0.0392,0.398915,2.290857
27,(spaghetti),(olive oil),0.022933,0.1317,2.003547
26,(olive oil),(spaghetti),0.022933,0.348884,2.003547
68,(soup),(mineral water),0.023067,0.456464,1.915771
69,(mineral water),(soup),0.023067,0.09681,1.915771
71,(frozen vegetables),(milk),0.0236,0.247552,1.910127
70,(milk),(frozen vegetables),0.0236,0.182099,1.910127
48,(burgers),(eggs),0.0288,0.330275,1.837585
49,(eggs),(burgers),0.0288,0.160237,1.837585
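
Each pair appears twice in this table because lift is symmetric: A → B and B → A share the same support and lift and differ only in confidence. A small sketch of collapsing the mirrored rows into one entry per item pair follows; the `top_rules` tuples are copied from the first six rows of the output above, and the variable names are illustrative.

```python
# (antecedent, consequent, support, confidence, lift), copied from the table above.
top_rules = [
    ("spaghetti", "ground beef", 0.0392, 0.225115, 2.290857),
    ("ground beef", "spaghetti", 0.0392, 0.398915, 2.290857),
    ("spaghetti", "olive oil", 0.022933, 0.1317, 2.003547),
    ("olive oil", "spaghetti", 0.022933, 0.348884, 2.003547),
    ("soup", "mineral water", 0.023067, 0.456464, 1.915771),
    ("mineral water", "soup", 0.023067, 0.09681, 1.915771),
]

# Group both directions under an unordered pair and keep the more confident direction.
best_by_pair = {}
for antecedent, consequent, support, confidence, lift in top_rules:
    pair = frozenset((antecedent, consequent))
    if pair not in best_by_pair or confidence > best_by_pair[pair][3]:
        best_by_pair[pair] = (antecedent, consequent, support, confidence, lift)

for antecedent, consequent, support, confidence, lift in best_by_pair.values():
    print(f"{antecedent} -> {consequent}: confidence={confidence:.3f}, lift={lift:.2f}")
```

Keeping the higher-confidence direction is one reasonable choice for reporting; which direction matters in practice depends on which product is being promoted.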




### Interview Questions

### What is lift, and why is it important?
#### Lift measures how much more likely two items are bought together than if they were independent.
#### Lift > 1 indicates a positive association.
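
As a quick check, lift can be recomputed from the other columns of the rules table as confidence(A→B) / support(B). The sketch below does this for the first rule shown earlier, (french fries) → (green tea). The numbers are copied from that output, and the helper name `lift` is just for this example; any small mismatch in the last digits comes from rounding in the printed values.

```python
def lift(confidence, consequent_support):
    """lift(A -> B) = confidence(A -> B) / support(B)."""
    return confidence / consequent_support

# Values copied from rule 0 in the rules.head() output.
recomputed = lift(confidence=0.166927, consequent_support=0.132)
reported = 1.264596
print(round(recomputed, 4), reported)
```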

### What is support and confidence? How do you calculate them?
#### Support: Fraction of transactions containing an item or set of items.
#### Confidence: Likelihood of buying B when A is bought.
#### confidence(A→B) = support(A ∪ B) / support(A), where support(A ∪ B) is the fraction of transactions containing both A and B.
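
These definitions can be checked by hand on a tiny example. This sketch counts transactions directly; the five baskets and the helper names `support` and `confidence` are made up for illustration and are not taken from the dataset.

```python
# Hypothetical baskets for illustration only.
baskets = [
    {"burgers", "eggs"},
    {"burgers", "eggs", "milk"},
    {"burgers"},
    {"eggs", "milk"},
    {"milk"},
]

def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """Share of antecedent baskets that also contain the consequent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

print(support({"burgers"}))               # 3 of 5 baskets contain burgers
print(support({"burgers", "eggs"}))       # 2 of 5 baskets contain both
print(confidence({"burgers"}, {"eggs"}))  # 2 of the 3 burger baskets contain eggs
```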

### What are some limitations of association rule mining?
#### - Too many trivial rules
#### - High computation on large data
#### - Only correlations, not causation
#### - Requires careful threshold tuning
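
To illustrate the threshold-tuning point, the sketch below counts how many of the ten top-lift rules shown earlier would survive different minimum-confidence cutoffs. The confidence values are copied from that table; the cutoffs are arbitrary choices for this example.

```python
# Confidence of the ten highest-lift rules, copied from the sorted rules table.
confidences = [0.225115, 0.398915, 0.1317, 0.348884, 0.456464,
               0.09681, 0.247552, 0.182099, 0.330275, 0.160237]

# Count the rules that meet each (arbitrary) minimum-confidence cutoff.
surviving = {cutoff: sum(c >= cutoff for c in confidences)
             for cutoff in (0.1, 0.2, 0.3, 0.4)}

for cutoff, count in surviving.items():
    print(f"min confidence {cutoff:.1f}: {count} of {len(confidences)} rules kept")
```

Even on ten rules, moving the cutoff by a tenth changes the result noticeably, which is why thresholds are usually chosen by inspecting several settings rather than fixed in advance.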



### Key Observations

#### Spaghetti and ground beef
- Highest lift in the table (2.29), with about 40% confidence when ground beef is the antecedent and about 23% in the reverse direction.
- Suggests customers often buy these together, plausibly to prepare pasta dishes.
- Possible strategy: create combo promotions or place these items near each other.

#### Spaghetti and olive oil
- Lift of about 2.00, meaning the pair appears together roughly twice as often as independence would predict.
- Customers may associate spaghetti with healthy or Italian-style cooking.

#### Soup and mineral water
- Lift of 1.92 and the highest confidence among the top rules (about 46% from soup to mineral water).
- The reverse rule has only about 10% confidence, and mineral water appears in roughly 24% of all baskets (confidence / lift), which partly explains the high soup → mineral water figure.
- Customers buying soup may be health-conscious and are likely to also pick up mineral water, although the data alone cannot confirm the motive.
- Possible in-store placement: position mineral water near soups to encourage cross-selling.

#### Frozen vegetables and milk
- Lift of 1.91, suggesting these are purchased together more often than chance would predict.
- May reflect customers preparing balanced, home-cooked meals.

#### Burgers and eggs
- Lift of 1.84 and about 33% confidence from burgers to eggs.
- Customers may be preparing protein-rich or hearty breakfasts.

### Overall Insights
Products with lift greater than 1 show positive associations, and higher lift values indicate stronger ones.

Confidence is directional: it estimates how likely the consequent is to be in a basket that already contains the antecedent.

These insights can be used to:

- Plan product placement and store layout.
- Design targeted promotions and combo offers.
- Understand customer meal-prep patterns and preferences.

### Conclusion
Market basket analysis using Apriori revealed meaningful associations that can inform marketing strategies and may help improve sales through personalized recommendations and smarter inventory planning.