In [1]:
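import pandas as pd

# Note: read_excel treats the first row as the header by default. The output below
# suggests the first basket (shrimp, almonds, ...) became the column name instead of
# a transaction; header=None would keep it as data, at the cost of changing the counts.
df = pd.read_excel('Online retail.xlsx')
df.head()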

Unnamed: 0,"shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil"
0,"burgers,meatballs,eggs"
1,chutney
2,"turkey,avocado"
3,"mineral water,milk,energy bar,whole wheat rice..."
4,low fat yogurt


In [2]:
# Handle missing values: replace NaN with an empty string so .split() works on every row
df.fillna('', inplace=True)

# Remove duplicate rows
df.drop_duplicates(inplace=True)

# Convert the dataframe into a list of lists (transactions)
transactions = []
for index, row in df.iterrows():
    # Split the comma-separated string into items, strip whitespace, and skip empty items.
    # row.iloc[0] selects by position; row[0] on a string-labelled Series is deprecated.
    items = [item.strip() for item in row.iloc[0].split(',') if item.strip()]
    if items:  # Only add the transaction if it is not empty after stripping
        transactions.append(items)

print("Number of transactions after preprocessing:", len(transactions))
print("First 5 transactions after preprocessing:", transactions[:5])

Number of transactions after preprocessing: 5175
First 5 transactions after preprocessing: [['burgers', 'meatballs', 'eggs'], ['chutney'], ['turkey', 'avocado'], ['mineral water', 'milk', 'energy bar', 'whole wheat rice', 'green tea'], ['low fat yogurt']]


In [None]:
!pip install apyori

In [5]:
from apyori import apriori
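# Apply the Apriori algorithm.
# Note: apyori reads min_support, min_confidence, min_lift and max_length;
# min_length does not appear to be among its options and is likely ignored.
rules = apriori(transactions, min_support=0.004, min_confidence=0.2, min_lift=3, min_length=2)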

# Materialize the generator. Each element is a RelationRecord for one itemset
# and can hold several rules in its ordered_statistics.
results = list(rules)

# Print the first few records
print("Number of rules found:", len(results))
print("First 5 rules:")
for record in results[:5]:
    print(record)

Number of rules found: 30
First 5 rules:
RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.006570048309178744, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29310344827586204, lift=3.4949547115843)])
RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.006956521739130435, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.28800000000000003, lift=3.4341013824884796)])
RelationRecord(items=frozenset({'escalope', 'pasta'}), support=0.00502415458937198, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.28888888888888886, lift=3.444700460829493)])
RelationRecord(items=frozenset({'fromage blanc', 'honey'}), support=0.004830917874396135, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add

In [6]:
# Function to extract rule information and lift
def extract_rule_info(relation_record):
    rules_list = []
    for ordered_stat in relation_record.ordered_statistics:
        rules_list.append({
            "items_base": list(ordered_stat.items_base),
            "items_add": list(ordered_stat.items_add),
            "support": relation_record.support,
            "confidence": ordered_stat.confidence,
            "lift": ordered_stat.lift
        })
    return rules_list

# Extract all rules with their metrics
all_rules = []
for record in results:
    all_rules.extend(extract_rule_info(record))

# Sort the rules by lift in descending order
sorted_rules = sorted(all_rules, key=lambda x: x['lift'], reverse=True)

# Display the sorted rules
print("Association Rules (Sorted by Lift):")
for i, rule in enumerate(sorted_rules):
    print(f"Rule {i+1}: {rule['items_base']} -> {rule['items_add']}")
    print(f"Support: {rule['support']:.4f}, Confidence: {rule['confidence']:.4f}, Lift: {rule['lift']:.4f}")
    print("-" * 20)

Association Rules (Sorted by Lift):
Rule 1: ['frozen vegetables', 'soup'] -> ['milk', 'mineral water']
Support: 0.0044, Confidence: 0.3833, Lift: 5.6517
--------------------
Rule 2: ['mineral water', 'whole wheat pasta'] -> ['olive oil']
Support: 0.0054, Confidence: 0.3944, Lift: 4.5052
--------------------
Rule 3: ['olive oil', 'frozen vegetables'] -> ['milk', 'mineral water']
Support: 0.0048, Confidence: 0.2941, Lift: 4.3363
--------------------
Rule 4: ['pasta'] -> ['shrimp']
Support: 0.0071, Confidence: 0.4111, Lift: 4.1634
--------------------
Rule 5: ['milk', 'soup'] -> ['mineral water', 'frozen vegetables']
Support: 0.0044, Confidence: 0.2072, Lift: 4.1084
--------------------
Rule 6: ['tomato sauce'] -> ['spaghetti', 'ground beef']
Support: 0.0044, Confidence: 0.2212, Lift: 3.9601
--------------------
Rule 7: ['milk', 'mineral water', 'frozen vegetables'] -> ['soup']
Support: 0.0044, Confidence: 0.2771, Lift: 3.9075
--------------------
Rule 8: ['shrimp', 'ground beef'] -> ['he
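
Several of the rules printed above (Rules 1, 5 and 7) come from the same four-item set and share its support of 0.0044, so they are rearrangements of one pattern rather than independent findings. A minimal sketch of one way to collapse them, keeping the highest-lift rule per itemset, is shown here. The helper `best_rule_per_itemset` and the `sample_rules` list are illustrative assumptions; the sample values are copied from the printed output and have the same shape as `all_rules`.

```python
# Hypothetical sample in the same shape as `all_rules`; values copied from the output above.
sample_rules = [
    {"items_base": ["frozen vegetables", "soup"], "items_add": ["milk", "mineral water"],
     "support": 0.0044, "confidence": 0.3833, "lift": 5.6517},
    {"items_base": ["pasta"], "items_add": ["shrimp"],
     "support": 0.0071, "confidence": 0.4111, "lift": 4.1634},
    {"items_base": ["milk", "soup"], "items_add": ["mineral water", "frozen vegetables"],
     "support": 0.0044, "confidence": 0.2072, "lift": 4.1084},
    {"items_base": ["milk", "mineral water", "frozen vegetables"], "items_add": ["soup"],
     "support": 0.0044, "confidence": 0.2771, "lift": 3.9075},
]


def best_rule_per_itemset(rules):
    """Keep only the highest-lift rule for each underlying itemset."""
    best = {}
    for rule in rules:
        # Rules built from the same items share this key regardless of direction.
        key = frozenset(rule["items_base"]) | frozenset(rule["items_add"])
        if key not in best or rule["lift"] > best[key]["lift"]:
            best[key] = rule
    return sorted(best.values(), key=lambda r: r["lift"], reverse=True)


distinct_rules = best_rule_per_itemset(sample_rules)
for rule in distinct_rules:
    print(rule["items_base"], "->", rule["items_add"], f"lift={rule['lift']:.4f}")
```

On this sample the four rules reduce to two distinct patterns. Whether to keep the highest-lift or the highest-confidence variant is a judgment call that depends on how the rules will be used.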

## Conclusion

In this assignment, we applied Association Rule Mining using the Apriori algorithm on the Online Retail dataset.  
We successfully preprocessed the data, generated frequent itemsets, and extracted association rules with support, confidence, and lift measures.

Key takeaways:
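- Rules such as `{pasta} → {shrimp}` (confidence 0.41, lift 4.16) and `{fromage blanc} → {honey}` point to item pairs that are bought together far more often than chance would predict.
- High lift values indicate strong associations that can guide **product bundling**, **cross-selling**, **store layout optimization**, and **targeted promotions**. Each rule here still covers well under 1% of transactions (support between 0.0044 and 0.0071 in the sorted list), so the absolute number of affected baskets is small.
- Insights from these rules can help retailers improve the shopping experience, boost sales, and optimize inventory management.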

Overall, association rules provide a powerful way to uncover hidden patterns in transactional data and translate them into actionable business strategies.


**Q1. What is Lift and why is it important in Association Rules?**  
- **Lift** measures how much more often A and B are bought together than would be expected if they were purchased independently.  
- **Formula:**  
  \[
  \text{Lift}(A \to B) = \frac{\text{Confidence}(A \to B)}{\text{Support}(B)}
  \]  
- **Importance:**  
  - Lift > 1 → Positive association (good for cross-selling).  
  - Lift = 1 → No association (independent).  
  - Lift < 1 → Negative association (items are bought together less often than expected under independence).
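
As a sanity check on the lift formula, the figures printed for the `light cream → chicken` rule can be plugged back in. The sketch below assumes the 5175 transactions reported after preprocessing and recovers the transaction counts those metrics imply; the function name `implied_counts` is illustrative and not part of apyori.

```python
def implied_counts(n_transactions, support_ab, confidence, lift):
    """Back out raw transaction counts from a rule's reported metrics."""
    count_ab = round(support_ab * n_transactions)        # baskets containing A and B
    count_a = round(count_ab / confidence)                # baskets containing A
    count_b = round(n_transactions * confidence / lift)   # baskets containing B
    return count_ab, count_a, count_b


# Values copied from the first RelationRecord printed earlier in the notebook.
n = 5175
count_ab, count_a, count_b = implied_counts(
    n,
    support_ab=0.006570048309178744,
    confidence=0.29310344827586204,
    lift=3.4949547115843,
)

# Recompute lift from the counts: confidence(A -> B) / support(B).
recomputed_lift = (count_ab / count_a) / (count_b / n)
print(count_ab, count_a, count_b, round(recomputed_lift, 4))
```

With these inputs the counts come out as 34, 116 and 434: chicken appears in about 8.4% of all baskets but in about 29% of baskets that contain light cream, and the ratio of those two shares is the reported lift of roughly 3.49.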


**Q2. What is Support and Confidence? How are they calculated?**  
- **Support(A → B):** Fraction of all transactions that contain both A and B.  
  \[
  \text{Support}(A \to B) = \frac{\text{Transactions containing both } A \text{ and } B}{\text{Total transactions}}
  \]  

- **Confidence(A → B):** Estimated probability that a customer buys B given that they bought A.  
  \[
  \text{Confidence}(A \to B) = \frac{\text{Support}(A \to B)}{\text{Support}(A)}
  \]
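
The two definitions above can be computed directly by counting. This is a minimal sketch on a made-up list of five baskets; `toy_transactions`, `support` and `confidence` are illustrative names, not the functions apyori uses internally.

```python
# Hypothetical toy data: five baskets, each stored as a set of item names.
toy_transactions = [
    {"pasta", "shrimp", "olive oil"},
    {"pasta", "shrimp"},
    {"pasta", "milk"},
    {"shrimp", "milk"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of A and B together, divided by the support of A alone."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


pasta_shrimp_support = support({"pasta", "shrimp"}, toy_transactions)
pasta_to_shrimp_conf = confidence({"pasta"}, {"shrimp"}, toy_transactions)
print(f"support={pasta_shrimp_support:.2f}, confidence={pasta_to_shrimp_conf:.2f}")
```

Here pasta and shrimp appear together in 2 of 5 baskets (support 0.40), and pasta appears in 3 baskets, so the confidence of `{pasta} → {shrimp}` is 2/3.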


**Q3. What are some limitations or challenges of Association Rule Mining?**  
1. **Too many rules:** With low thresholds, the algorithm generates a large number of rules, many of them trivial or redundant.  
2. **No sequence/timing:** It only finds co-occurrence, not the order of purchase.  
3. **Computationally expensive:** The number of candidate itemsets grows combinatorially with the number of distinct items, so large catalogues require heavy computation.  
4. **Business relevance:** Some rules may be statistically strong but not useful in practice.
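
To give a feel for the first and third points, the sketch below counts how many itemsets could exist up to a given size. The catalogue size of 120 items is a made-up figure, not a property of this dataset, and `candidate_itemsets` is an illustrative helper. Apriori prunes most of these candidates using the minimum support threshold, so the result is an upper bound rather than the work actually done.

```python
from math import comb


def candidate_itemsets(n_items, max_size):
    """Upper bound on the number of distinct itemsets of size 1..max_size."""
    return sum(comb(n_items, k) for k in range(1, max_size + 1))


n_items = 120  # hypothetical number of distinct products
for max_size in (1, 2, 3, 4):
    print(max_size, candidate_itemsets(n_items, max_size))
```

For 120 items there are 7,260 possible itemsets of size 1 or 2 and 288,100 up to size 3, which is why a higher `min_support` or a cap on itemset length is usually needed to keep both runtime and the number of rules manageable.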
