In [None]:
# Importing required libraries
import pandas as pd
import plotly.express as px
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

In [None]:
# Sample dataset
data = {'Transaction': [1, 2, 3, 4],
       'Milk': [1, 0, 1, 0],
       'Bread': [1, 1, 0, 1],
       'Butter': [0, 1, 1, 1]}
# apriori expects boolean (True/False) columns, so cast the 0/1 values
df = pd.DataFrame(data).set_index('Transaction').astype(bool)
# Step 1: Apply Apriori to find frequent itemsets
frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)
print("Frequent Itemsets:\n", frequent_itemsets)
# Step 2: Generate association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
print("\nAssociation Rules:\n", rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])

Frequent Itemsets:
    support         itemsets
0     0.50           (Milk)
1     0.75          (Bread)
2     0.75         (Butter)
3     0.50  (Bread, Butter)

Association Rules:
 Empty DataFrame
Columns: [antecedents, consequents, support, confidence, lift]
Index: []
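
The empty rules table follows directly from the numbers: the only frequent pair is (Bread, Butter), and both rules that can be built from it fall just short of the 0.7 confidence threshold. A minimal sketch in plain Python recomputes this by hand; the `support` helper is hypothetical and not part of mlxtend.

```python
# The same four transactions as the sample dataset, written as item sets.
transactions = [
    {"Milk", "Bread"},
    {"Bread", "Butter"},
    {"Milk", "Butter"},
    {"Bread", "Butter"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

pair = {"Bread", "Butter"}
# confidence(A -> B) = support(A and B) / support(A)
conf_bread_to_butter = support(pair) / support({"Bread"})
conf_butter_to_bread = support(pair) / support({"Butter"})

print(f"Bread -> Butter confidence: {conf_bread_to_butter:.3f}")
print(f"Butter -> Bread confidence: {conf_butter_to_bread:.3f}")
print("Any rule reaches 0.7:",
      max(conf_bread_to_butter, conf_butter_to_bread) >= 0.7)
```

Both confidences are 0.5 / 0.75, roughly 0.667, so a `min_threshold` of 0.6 should admit both rules while 0.7 admits none.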




Consider a penguin-themed souvenir shop. It has transaction data representing which items customers purchased together. Each transaction is a collection of items (e.g., Penguin Plush Toy, Penguin Mug, etc.).
1. **Clustering:**
* *Goal*: Group similar transactions based on the items purchased.
* *Method*: Techniques like K-means clustering could be applied to identify groups of similar purchase behaviors. For example, customers who frequently buy plush toys might cluster together, while those who buy calendars and mugs form another group.
2. **Association Rule Learning:**
* *Goal*: Discover relationships between items purchased together.
* *Method*: This is what the Apriori algorithm, a form of unsupervised learning, does in this notebook. It produces rules such as "customers who buy a Penguin Mug also buy a Penguin Plush Toy", each scored by support, confidence, and lift. Rules with a lift above 1 point to purchasing patterns that can inform marketing strategies.
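
The clustering idea in item 1 can be sketched as follows. This is a hand-rolled K-means (Lloyd's algorithm) in plain Python, run on hypothetical one-hot baskets that are not taken from the shop data; in practice a library implementation such as scikit-learn's `KMeans` would normally be used, and Euclidean distance on binary vectors is itself a simplification.

```python
import random

# Hypothetical baskets, one-hot encoded as [plush, mug, calendar, t_shirt].
baskets = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 1],
]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm with a fixed number of iterations (no convergence test)."""
    rng = random.Random(seed)
    # Start from k distinct baskets chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each basket joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

labels, centroids = kmeans(baskets, k=2)
for basket, label in zip(baskets, labels):
    print(label, basket)
```

Each centroid can be read as the share of baskets in that cluster containing each item, which is what makes the groups interpretable as purchase profiles.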

**Insights Gained**
1. *Targeted Marketing:* By understanding which items are frequently purchased together, the shop can create targeted marketing campaigns. For instance, they might offer discounts on plush toys when a customer buys a mug.
2. *Inventory Management:* The shop can optimize inventory based on the insights gained. If certain items are often bought together, they can ensure these items are stocked near each other or bundled for promotions.

**Conclusion**

Unsupervised learning, as demonstrated through the example of the penguin-themed souvenir shop, allows businesses to uncover hidden patterns and relationships in the data. By leveraging these insights, they can improve customer satisfaction and optimize sales strategies without needing predefined labels or categories.


In [13]:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Sample transaction data
data1 = {
    'Transaction': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Penguin Plush Toy': [1, 1, 0, 1, 1, 0, 1, 0, 0, 1],
    'Penguin Mug': [1, 1, 1, 0, 1, 1, 1, 1, 1, 0],
    'Penguin Calendar': [0, 1, 0, 0, 0, 1, 0, 1, 0, 1],
    'Penguin T-Shirt': [0, 0, 1, 1, 0, 0, 1, 1, 0, 0],
    'Penguin Keychain': [0, 0, 0, 1, 1, 0, 0, 0, 1, 0]
}

# Create DataFrame and set Transaction as index
df1 = pd.DataFrame(data1).set_index('Transaction')

# Convert DataFrame to boolean type
df1 = df1.astype(bool)

# Step 1: Apply Apriori to find frequent itemsets
frequent_itemsets = apriori(df1, min_support=0.3, use_colnames=True)  # Lowered support threshold
print("Frequent Itemsets:\n", frequent_itemsets)

# Step 2: Generate association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.4)  # Lowered threshold
print("\nAssociation Rules:\n", rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])

Frequent Itemsets:
    support                          itemsets
0      0.6               (Penguin Plush Toy)
1      0.8                     (Penguin Mug)
2      0.4                (Penguin Calendar)
3      0.4                 (Penguin T-Shirt)
4      0.3                (Penguin Keychain)
5      0.4  (Penguin Mug, Penguin Plush Toy)
6      0.3   (Penguin Calendar, Penguin Mug)
7      0.3    (Penguin Mug, Penguin T-Shirt)

Association Rules:
            antecedents          consequents  support  confidence      lift
0        (Penguin Mug)  (Penguin Plush Toy)      0.4    0.500000  0.833333
1  (Penguin Plush Toy)        (Penguin Mug)      0.4    0.666667  0.833333
2   (Penguin Calendar)        (Penguin Mug)      0.3    0.750000  0.937500
3    (Penguin T-Shirt)        (Penguin Mug)      0.3    0.750000  0.937500


Let's analyze the frequent itemsets and association rules for the penguin-themed souvenir shop.

Frequent Itemsets:

| support | itemsets                                 |
|---------|------------------------------------------|
| 0.6     | (Penguin Plush Toy)                      |
| 0.8     | (Penguin Mug)                            |
| 0.4     | (Penguin Calendar)                       |
| 0.4     | (Penguin T-Shirt)                        |
| 0.3     | (Penguin Keychain)                       |
| 0.4     | (Penguin Mug, Penguin Plush Toy)         |
| 0.3     | (Penguin Calendar, Penguin Mug)          |
| 0.3     | (Penguin Mug, Penguin T-Shirt)           |


Association Rules:

| antecedents          | consequents          | support | confidence | lift     |
|----------------------|----------------------|---------|------------|----------|
| (Penguin Mug)        | (Penguin Plush Toy)  | 0.4     | 0.500000   | 0.833333 |
| (Penguin Plush Toy)  | (Penguin Mug)        | 0.4     | 0.666667   | 0.833333 |
| (Penguin Calendar)   | (Penguin Mug)        | 0.3     | 0.750000   | 0.937500 |
| (Penguin T-Shirt)    | (Penguin Mug)        | 0.3     | 0.750000   | 0.937500 |

### Frequent Itemsets Analysis

1. **Single Item Frequencies**:
   - **Penguin Mug**: Support of 0.8 (80% of transactions include this item)  
     - This is the most popular item, indicating it has broad appeal and possibly high demand.
   - **Penguin Plush Toy**: Support of 0.6 (60% of transactions)  
     - This item is also very popular, but not as much as the mug.
   - **Penguin Calendar**: Support of 0.4 (40% of transactions)  
   - **Penguin T-Shirt**: Support of 0.4 (40% of transactions)  
   - **Penguin Keychain**: Support of 0.3 (30% of transactions)  
     - These items have moderate popularity, suggesting they are still relevant but less frequently purchased.

2. **Item Combinations**:
   - **(Penguin Mug, Penguin Plush Toy)**: Support of 0.4  
     - This is the most frequent pair, yet it is slightly rarer than the 0.48 expected if the two items were bought independently (0.8 × 0.6), so frequency alone does not show a strong association.
   - **(Penguin Calendar, Penguin Mug)**: Support of 0.3  
   - **(Penguin Mug, Penguin T-Shirt)**: Support of 0.3  
     - The mug appears in every frequent pair, which largely reflects its 80% support rather than a special affinity with the other items.
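
The supports above can be reproduced with a level-wise count, which is the core idea behind Apriori. The sketch below is a simplified two-level version in plain Python, not mlxtend's implementation; item names drop the "Penguin" prefix for brevity, and `MIN_COUNT` and `count` are illustrative names.

```python
from itertools import combinations

# The ten penguin-shop transactions, rewritten as item sets.
transactions = [
    {"Plush Toy", "Mug"},
    {"Plush Toy", "Mug", "Calendar"},
    {"Mug", "T-Shirt"},
    {"Plush Toy", "T-Shirt", "Keychain"},
    {"Plush Toy", "Mug", "Keychain"},
    {"Mug", "Calendar"},
    {"Plush Toy", "Mug", "T-Shirt"},
    {"Mug", "Calendar", "T-Shirt"},
    {"Mug", "Keychain"},
    {"Plush Toy", "Calendar"},
]
MIN_COUNT = 3  # min_support=0.3 on 10 transactions

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

# Level 1: frequent single items.
items = sorted(set().union(*transactions))
frequent_items = [i for i in items if count([i]) >= MIN_COUNT]

# Level 2: candidate pairs are built only from frequent single items,
# because a pair can never be more frequent than either of its members.
frequent_pairs = {
    pair: count(pair) / len(transactions)
    for pair in combinations(frequent_items, 2)
    if count(pair) >= MIN_COUNT
}

for pair, sup in frequent_pairs.items():
    print(pair, sup)
```

Only the three mug pairs survive the threshold; every other pair occurs in at most two of the ten transactions.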

### Association Rules Analysis

1. **Rule 1**: If a customer buys **Penguin Mug**, they are likely to buy **Penguin Plush Toy**.
   - **Support**: 0.4  
   - **Confidence**: 0.5 (50% of those who buy a mug also buy a plush toy)  
   - **Lift**: 0.83  
     - A lift below 1 indicates a slight negative association: mug buyers purchase the plush toy less often (50%) than customers overall (60%).

2. **Rule 2**: If a customer buys **Penguin Plush Toy**, they are likely to buy **Penguin Mug**.
   - **Support**: 0.4  
   - **Confidence**: 0.67 (67% of plush toy buyers also buy a mug)  
   - **Lift**: 0.83  
     - The lift matches Rule 1 because lift is symmetric. The 67% confidence is still below the mug's 80% baseline, so buying a plush toy does not make a mug purchase more likely.

3. **Rule 3**: If a customer buys **Penguin Calendar**, they are likely to buy **Penguin Mug**.
   - **Support**: 0.3  
   - **Confidence**: 0.75 (75% of calendar buyers also buy a mug)  
   - **Lift**: 0.94  
     - The confidence looks high, but it is below the mug's 80% baseline support. A lift just under 1 means calendar and mug purchases are close to independent.

4. **Rule 4**: If a customer buys **Penguin T-Shirt**, they are likely to buy **Penguin Mug**.
   - **Support**: 0.3  
   - **Confidence**: 0.75 (75% of T-shirt buyers also buy a mug)  
   - **Lift**: 0.94  
     - As with the previous rule, the confidence mostly reflects how common the mug is; buying a T-shirt does not raise the chance of a mug purchase.
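
Lift compares a rule's confidence with the consequent's baseline support, so a value below 1 means the consequent is bought less often alongside the antecedent than it is overall. A short sketch, using supports copied from the frequent-itemsets table (item names shortened, helper names illustrative), recomputes the four rules:

```python
# Supports copied from the frequent-itemsets output.
support = {
    frozenset({"Mug"}): 0.8,
    frozenset({"Plush Toy"}): 0.6,
    frozenset({"Calendar"}): 0.4,
    frozenset({"T-Shirt"}): 0.4,
    frozenset({"Mug", "Plush Toy"}): 0.4,
    frozenset({"Calendar", "Mug"}): 0.3,
    frozenset({"Mug", "T-Shirt"}): 0.3,
}

def rule_metrics(antecedent, consequent):
    """Return (confidence, lift) for the rule antecedent -> consequent."""
    a, c = frozenset({antecedent}), frozenset({consequent})
    confidence = support[a | c] / support[a]
    lift = confidence / support[c]  # 1.0 would mean independence
    return confidence, lift

results = {}
for a, c in [("Mug", "Plush Toy"), ("Plush Toy", "Mug"),
             ("Calendar", "Mug"), ("T-Shirt", "Mug")]:
    confidence, lift = rule_metrics(a, c)
    results[(a, c)] = (confidence, lift)
    print(f"{a} -> {c}: confidence={confidence:.3f}, "
          f"baseline={support[frozenset({c})]:.1f}, lift={lift:.3f}")
```

In every row the confidence sits below the consequent's baseline, which is exactly what a lift under 1 expresses.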

### Overall Insights

- **Central Item**: The **Penguin Mug** appears in 80% of transactions, which is why it shows up in every frequent pair. Because all four rules have a lift below 1, that co-occurrence reflects the mug's popularity rather than a specific affinity with other items.

- **Cross-Promotion Opportunities**: Bundles or discounts that pair the mug with other items can still be tested, but the rules here give no evidence that customers already favor these combinations. With only 10 transactions, any conclusion is tentative.

- **Stock Management**: Given the high support for the mug, ensure it is well-stocked. Items like the plush toy, calendar, and T-shirt should also be stocked, but their inventory can be adjusted based on their lower support values.

- **Targeted Advertising**: Recommendations based on association rules work best when a rule's lift is clearly above 1. None of the rules found here meets that bar, so more data is needed before building campaigns on them.

In summary,

In [14]:
# Create the DataFrame
data = {
    "Transaction ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Items Purchased": [
        "Milk, Bread, Eggs",
        "Bread, Butter",
        "Milk, Bread, Butter, Eggs",
        "Eggs, Bacon",
        "Milk, Bread, Bacon",
        "Bread, Butter, Eggs",
        "Milk, Eggs",
        "Bread, Eggs",
        "Milk, Butter",
        "Bacon, Bread"
    ]
}

df = pd.DataFrame(data)

# Display the DataFrame
print(df)

   Transaction ID            Items Purchased
0               1          Milk, Bread, Eggs
1               2              Bread, Butter
2               3  Milk, Bread, Butter, Eggs
3               4                Eggs, Bacon
4               5         Milk, Bread, Bacon
5               6        Bread, Butter, Eggs
6               7                 Milk, Eggs
7               8                Bread, Eggs
8               9               Milk, Butter
9              10               Bacon, Bread
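
The DataFrame above stores each basket as one comma-separated string, while the next cell passes `TransactionEncoder` a list of item lists. A minimal sketch of that conversion in plain Python follows; the manual one-hot step is only meant to show roughly what the encoder produces, and the variable names are illustrative. On the DataFrame itself, `df["Items Purchased"].str.split(", ")` would perform the same split.

```python
# The "Items Purchased" column, as plain strings.
rows = [
    "Milk, Bread, Eggs",
    "Bread, Butter",
    "Milk, Bread, Butter, Eggs",
    "Eggs, Bacon",
    "Milk, Bread, Bacon",
    "Bread, Butter, Eggs",
    "Milk, Eggs",
    "Bread, Eggs",
    "Milk, Butter",
    "Bacon, Bread",
]

# Split each string into a list of item names, trimming stray spaces.
transactions = [[item.strip() for item in row.split(",")] for row in rows]

# One-hot encode by hand: one boolean column per distinct item.
columns = sorted({item for t in transactions for item in t})
one_hot = [[item in t for item in columns] for t in transactions]

print(columns)
print(transactions[0], "->", one_hot[0])
```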


In [None]:
# -----------------------------
# 1. Prepare Transaction Data
# -----------------------------
transactions = [
    ["Milk", "Bread", "Eggs"],
    ["Bread", "Butter"],
    ["Milk", "Bread", "Butter", "Eggs"],
    ["Eggs", "Bacon"],
    ["Milk", "Bread", "Bacon"],
    ["Bread", "Butter", "Eggs"],
    ["Milk", "Eggs"],
    ["Bread", "Eggs"],
    ["Milk", "Butter"],
    ["Bacon", "Bread"]
]

# Convert to one-hot encoded DataFrame
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)

# -----------------------------
# 2. Apply Apriori Algorithm
# -----------------------------
frequent_itemsets = apriori(df, min_support=0.3, use_colnames=True)
# Generate association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)

# Display results
print("Frequent Itemsets:")
print(frequent_itemsets)

print("\nAssociation Rules:")
print(rules)

Frequent Itemsets:
   support         itemsets
0      0.3          (Bacon)
1      0.7          (Bread)
2      0.4         (Butter)
3      0.6           (Eggs)
4      0.5           (Milk)
5      0.3  (Bread, Butter)
6      0.4    (Bread, Eggs)
7      0.3    (Bread, Milk)
8      0.3     (Milk, Eggs)

Association Rules:
  antecedents consequents  antecedent support  consequent support  support  \
0    (Butter)     (Bread)                 0.4                 0.7      0.3   
1     (Bread)      (Eggs)                 0.7                 0.6      0.4   
2      (Eggs)     (Bread)                 0.6                 0.7      0.4   
3      (Milk)     (Bread)                 0.5                 0.7      0.3   
4      (Milk)      (Eggs)                 0.5                 0.6      0.3   
5      (Eggs)      (Milk)                 0.6                 0.5      0.3   

   confidence      lift  representativity  leverage  conviction  \
0    0.750000  1.071429               1.0      0.02    1.200000   


The **Apriori output** above breaks down into two parts: **frequent itemsets** and **association rules**.

---

### 📦 **Frequent Itemsets Analysis**

| Itemset             | Support | Interpretation |
|---------------------|---------|----------------|
| (Bread)             | 0.7     | Appears in 70% of transactions — most frequent item. |
| (Eggs)              | 0.6     | Commonly purchased, in 60% of transactions. |
| (Milk)              | 0.5     | Bought in half of all transactions. |
| (Butter)            | 0.4     | Moderately frequent. |
| (Bacon)             | 0.3     | Least frequent among single items. |
| (Bread, Eggs)       | 0.4     | Most frequent pair; appears in 40% of transactions. |
| (Bread, Butter)     | 0.3     | Classic combo, but less frequent. |
| (Bread, Milk)       | 0.3     | Also appears in 30% of transactions. |
| (Milk, Eggs)        | 0.3     | Appears in 30% of transactions, exactly what independence predicts (0.5 × 0.6). |

---

### 🔗 **Association Rules Analysis**

| Rule                        | Confidence | Lift     | Interpretation |
|-----------------------------|------------|----------|----------------|
| Butter → Bread              | 0.75       | 1.07     | Highest confidence: 75% of Butter buyers also buy Bread, slightly above Bread's 70% baseline. |
| Bread → Eggs                | 0.57       | 0.95     | **Lift < 1**: Bread buyers buy Eggs slightly less often (57%) than customers overall (60%). |
| Eggs → Bread                | 0.67       | 0.95     | Same pair, same lift; 67% is just under Bread's 70% baseline. |
| Milk → Bread                | 0.60       | 0.86     | **Lift < 1**: Milk buyers buy Bread less often (60%) than customers overall (70%). |
| Milk → Eggs                 | 0.60       | 1.00     | Neutral lift: knowing about Milk adds nothing to predicting Eggs. |
| Eggs → Milk                 | 0.50       | 1.00     | Same as above; no predictive power. |

---

### 📊 **Key Insights**

- **Bread** is the most frequent item and appears in many combinations, but a high confidence for a Bread consequent mostly reflects its 70% baseline support; only lift shows whether the antecedent adds information.
- **Butter → Bread** is the only rule with **lift > 1**, meaning Butter buyers are slightly more likely to buy Bread than customers overall (75% vs. 70%).
- **Milk and Eggs** co-occur exactly as often as independence predicts, so neither item helps predict the other (lift = 1).
- Rules with **lift < 1** mean the consequent is bought less often together with the antecedent than it is overall, i.e. a slight negative association. With only 10 transactions, all of these differences are small.
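
The rules output also lists `leverage` and `conviction` columns. As a hedged sketch, the standard definitions of these metrics can be applied to the supports from the frequent-itemsets output; the `metrics` helper is illustrative and not part of mlxtend.

```python
# Supports taken from the frequent-itemsets output.
support = {
    frozenset({"Bread"}): 0.7,
    frozenset({"Butter"}): 0.4,
    frozenset({"Eggs"}): 0.6,
    frozenset({"Milk"}): 0.5,
    frozenset({"Bread", "Butter"}): 0.3,
    frozenset({"Bread", "Eggs"}): 0.4,
    frozenset({"Bread", "Milk"}): 0.3,
    frozenset({"Eggs", "Milk"}): 0.3,
}

def metrics(antecedent, consequent):
    """Confidence, lift, leverage and conviction for antecedent -> consequent."""
    a, c = frozenset({antecedent}), frozenset({consequent})
    joint = support[a | c]
    confidence = joint / support[a]
    lift = confidence / support[c]
    # Leverage: observed joint support minus the value expected under independence.
    leverage = joint - support[a] * support[c]
    # Conviction: how much more often the rule would fail if A and C were independent.
    conviction = float("inf") if confidence == 1 else (1 - support[c]) / (1 - confidence)
    return confidence, lift, leverage, conviction

rules = [("Butter", "Bread"), ("Bread", "Eggs"), ("Eggs", "Bread"),
         ("Milk", "Bread"), ("Milk", "Eggs"), ("Eggs", "Milk")]
table = {rule: metrics(*rule) for rule in rules}

for (a, c), (conf, lift, lev, conv) in table.items():
    print(f"{a} -> {c}: conf={conf:.2f} lift={lift:.2f} "
          f"leverage={lev:+.2f} conviction={conv:.2f}")
```

For Butter → Bread this gives a leverage of about 0.02 and a conviction of about 1.2, matching the first row of the printed rules table.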

---

The next cell visualizes these results: a bar chart of itemset support and a support-confidence scatter plot of the rules.

In [None]:

# -----------------------------
# 3. Prepare Data for Plotting
# -----------------------------
# Convert itemsets to string for bar chart
# sorted() gives each frozenset a stable, alphabetical label
frequent_itemsets['itemsets_str'] = frequent_itemsets['itemsets'].apply(lambda x: ', '.join(sorted(x)))

# Add readable rule column to association rules
rules['rule'] = rules.apply(lambda row: f"{', '.join(sorted(row['antecedents']))} → {', '.join(sorted(row['consequents']))}", axis=1)

# -----------------------------
# 4. Visualization
# -----------------------------
# Bar chart for frequent itemsets
fig_bar = px.bar(
    frequent_itemsets,
    x='itemsets_str',
    y='support',
    labels={'itemsets_str': 'Itemsets', 'support': 'Support'},
    title='Frequent Itemsets Support',
    color='support'
)
fig_bar.show()

# Scatter plot for association rules
fig_scatter = px.scatter(
    rules,
    x='support',
    y='confidence',
    text='rule',  # label each point with its "antecedent → consequent" string
    size='lift',
    color='lift',
    labels={'support': 'Support', 'confidence': 'Confidence', 'lift': 'Lift'},
    title='Association Rules: Support vs Confidence'
)
fig_scatter.show()