# **Association Rule Mining with Simulated Data**

##### Importing Necessary Libraries

In [19]:
# Import pandas for data manipulation and analysis
# Import apriori and association_rules from mlxtend for association rule mining
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

### **1. Simulate Transaction Data**

In [20]:
# Simulated transaction data:
# Each sublist represents one shopping transaction containing 2 to 4 items.
# The items are drawn from a pool of 8 unique grocery products.
transactions = [
    ['Bread', 'Milk'],
    ['Bread', 'Eggs', 'Cheese'],
    ['Milk', 'Eggs', 'Butter'],
    ['Milk', 'Cheese', 'Apples'],
    ['Bread', 'Butter', 'Eggs'],
    ['Milk', 'Cheese', 'Juice'],
    ['Bread', 'Milk', 'Butter'],
    ['Cheese', 'Eggs', 'Tomatoes'],
    ['Bread', 'Butter', 'Apples'],
    ['Milk', 'Bread', 'Cheese', 'Eggs']
]

___
- Created 10 simulated transactions as a Python list of lists.
- Each transaction contains 2 to 4 items.
- Items are drawn from a pool of 8 unique grocery items (Bread, Milk, Eggs, Cheese, Butter, Apples, Juice, Tomatoes).
____
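
The properties listed above can be confirmed directly from the list. The following is a minimal sketch; the names `sizes` and `unique_items` are illustrative and do not appear in the original notebook.

```python
# Same simulated transactions as in the cell above, repeated so this block runs on its own.
transactions = [
    ['Bread', 'Milk'],
    ['Bread', 'Eggs', 'Cheese'],
    ['Milk', 'Eggs', 'Butter'],
    ['Milk', 'Cheese', 'Apples'],
    ['Bread', 'Butter', 'Eggs'],
    ['Milk', 'Cheese', 'Juice'],
    ['Bread', 'Milk', 'Butter'],
    ['Cheese', 'Eggs', 'Tomatoes'],
    ['Bread', 'Butter', 'Apples'],
    ['Milk', 'Bread', 'Cheese', 'Eggs']
]

# Number of items in each transaction (illustrative helper name).
sizes = [len(t) for t in transactions]

# Distinct items across all transactions (illustrative helper name).
unique_items = sorted({item for t in transactions for item in t})

print(f"Transactions: {len(transactions)}")
print(f"Items per transaction: min={min(sizes)}, max={max(sizes)}")
print(f"Unique items ({len(unique_items)}): {unique_items}")
```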



### **2. Analyze with Apriori**

In [21]:
# One-hot encode the transaction data:
# 1. Extract all unique items across all transactions and sort them.
items = sorted(set(item for transaction in transactions for item in transaction))

# 2. Create a list of dictionaries where each transaction is represented by True/False 
#    for the presence of each item (one-hot encoding).
encoded_transactions = [{item: (item in transaction) for item in items} for transaction in transactions]

# 3. Convert the list of dictionaries to a pandas DataFrame.
df = pd.DataFrame(encoded_transactions)

# Apply the Apriori algorithm to find frequent itemsets with minimum support of 0.3 (30%).
frequent_itemsets = apriori(df, min_support=0.3, use_colnames=True)

____
- Converted the transaction data into a one-hot encoded pandas DataFrame.
  - Each unique item becomes a column, and each transaction becomes a row of `True`/`False` values.
- Applied the Apriori algorithm (`apriori` from `mlxtend.frequent_patterns`) to identify frequent itemsets.
- Set the minimum support threshold to **0.3 (30%)**, which keeps only itemsets that appear in at least 3 of the 10 transactions.
____
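
To make the support threshold concrete, the sketch below recomputes the frequent itemsets in plain Python, without `mlxtend`. It is a brute-force enumeration, not the Apriori algorithm itself: Apriori reaches the same result but prunes candidates whose subsets are already infrequent. The `support` helper and the `frequent` dictionary are illustrative names, not part of the original notebook.

```python
from itertools import combinations

# Same simulated transactions as above, repeated so this block runs on its own.
transactions = [
    ['Bread', 'Milk'],
    ['Bread', 'Eggs', 'Cheese'],
    ['Milk', 'Eggs', 'Butter'],
    ['Milk', 'Cheese', 'Apples'],
    ['Bread', 'Butter', 'Eggs'],
    ['Milk', 'Cheese', 'Juice'],
    ['Bread', 'Milk', 'Butter'],
    ['Cheese', 'Eggs', 'Tomatoes'],
    ['Bread', 'Butter', 'Apples'],
    ['Milk', 'Bread', 'Cheese', 'Eggs']
]

min_support = 0.3
n = len(transactions)
items = sorted({item for t in transactions for item in t})


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(set(itemset) <= set(t) for t in transactions) / n


# Enumerate every non-empty combination of items (255 candidates for 8 items)
# and keep those whose support reaches the threshold.
frequent = {
    combo: support(combo)
    for size in range(1, len(items) + 1)
    for combo in combinations(items, size)
    if support(combo) >= min_support
}

for combo, s in frequent.items():
    print(f"{s:.1f}  {combo}")
```

Exhaustive enumeration is only practical for a handful of items, since the number of candidates doubles with each item added; that growth is the reason Apriori prunes its search.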


### **3. Generate Rules and Show Results**

In [22]:
# Generate association rules
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.7)

# Show results
print("Frequent Itemsets:\n", frequent_itemsets)
print("\nAssociation Rules:\n", rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])

# Explain the first rule in plain language (skipped when no rule meets the threshold)
if not rules.empty:
    row = rules.iloc[0]
    print(f"\nRule: If a customer buys {set(row['antecedents'])}, they are likely to also buy {set(row['consequents'])} with {round(row['confidence']*100, 2)}% confidence.")


Frequent Itemsets:
    support         itemsets
0      0.6          (Bread)
1      0.4         (Butter)
2      0.5         (Cheese)
3      0.5           (Eggs)
4      0.6           (Milk)
5      0.3  (Bread, Butter)
6      0.3    (Bread, Eggs)
7      0.3    (Bread, Milk)
8      0.3   (Eggs, Cheese)
9      0.3   (Milk, Cheese)

Association Rules:
   antecedents consequents  support  confidence  lift
0    (Butter)     (Bread)      0.3        0.75  1.25

Rule: If a customer buys {'Butter'}, they are likely to also buy {'Bread'} with 75.0% confidence.


____
- Generated association rules from the frequent itemsets using the `association_rules` function.
- Used **confidence** as the metric with a minimum threshold of **0.7 (70%)**; only one rule, Butter → Bread, meets it.
- Displayed the rules along with their support, confidence, and lift values.
- Interpreted one rule in plain language to explain what it means in a real-world shopping scenario.
  - Example: 75% of the transactions that contain Butter also contain Bread (confidence 0.75). The lift of 1.25 means Butter buyers purchase Bread 1.25 times as often as shoppers overall.
_____
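
The three reported metrics for the Butter → Bread rule can be reproduced by hand from the raw transactions. The sketch below uses the standard definitions (confidence is the rule's support divided by the antecedent's support; lift is confidence divided by the consequent's support). The `support` helper is an illustrative name, not part of the original notebook.

```python
# Same simulated transactions as above, repeated so this block runs on its own.
transactions = [
    ['Bread', 'Milk'],
    ['Bread', 'Eggs', 'Cheese'],
    ['Milk', 'Eggs', 'Butter'],
    ['Milk', 'Cheese', 'Apples'],
    ['Bread', 'Butter', 'Eggs'],
    ['Milk', 'Cheese', 'Juice'],
    ['Bread', 'Milk', 'Butter'],
    ['Cheese', 'Eggs', 'Tomatoes'],
    ['Bread', 'Butter', 'Apples'],
    ['Milk', 'Bread', 'Cheese', 'Eggs']
]

n = len(transactions)


def support(*items):
    """Fraction of transactions that contain all of the given items."""
    return sum(set(items) <= set(t) for t in transactions) / n


antecedent, consequent = 'Butter', 'Bread'

rule_support = support(antecedent, consequent)   # share of transactions with both items
confidence = rule_support / support(antecedent)  # share of Butter transactions that include Bread
lift = confidence / support(consequent)          # confidence relative to Bread's overall share

print(f"support={rule_support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
# prints: support=0.30, confidence=0.75, lift=1.25
```

A lift above 1 indicates that the two items appear together more often than they would if purchased independently. With only 10 simulated transactions, these figures illustrate the calculation and should not be read as evidence of a real purchasing pattern.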

