# 🧾 Apriori Algorithm: From Concept to Code

**Goal:** Learn what Apriori is, how it works, why its results make sense in data mining, and how to interpret frequent itemsets and association rules in Python.

**You will:**
1) Build a small transaction dataset  
2) One-hot encode it for Apriori  
3) Mine frequent itemsets with different supports  
4) Generate association rules and filter by confidence/lift  
5) Interpret results and run “what-if” experiments


## 0. Setup & Imports

We’ll use `pandas` for data handling and `mlxtend` for Apriori and association rule helpers.  
If `mlxtend` isn’t installed, run the cell below.


In [None]:
# If needed, uncomment to install:
# !pip install -q mlxtend

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

pd.set_option("display.max_colwidth", None)


## 1. What is Apriori? (Concept)

**Apriori** is a frequent itemset mining algorithm. It relies on the **Apriori property**:

> If an itemset is frequent, **all of its subsets** must also be frequent.  
> Contrapositive: if an itemset is **not** frequent, none of its supersets can be frequent.

**Pipeline:**
1. Count support for single items; keep those ≥ `min_support`.
2. Join them to form 2-item candidates; count support; prune.
3. Repeat for size 3, 4, … until no more frequent sets.
4. Generate **association rules** from frequent itemsets and evaluate with **confidence** and **lift**.
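
The first three pipeline steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not `mlxtend`'s implementation: `apriori_sketch`, its helper names, and the five-basket `demo` list are hypothetical and exist only for this example.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (level-wise search)."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset
        return sum(itemset <= basket for basket in baskets) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Step 1: frequent single items
    singles = {frozenset([item]) for basket in baskets for item in basket}
    current = keep_frequent(singles)

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: drop candidates with any infrequent (k-1)-subset (Apriori property)
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
    return frequent


# Hypothetical five-basket example (not the notebook's dataset)
demo = [
    ["milk", "bread", "beer"],
    ["milk", "bread"],
    ["bread", "beer"],
    ["milk", "bread", "beer"],
    ["milk"],
]
result = apriori_sketch(demo, min_support=0.6)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In this example `{milk, beer}` reaches only 0.4 support, so the candidate `{milk, bread, beer}` is pruned without ever being counted. That skipped count is the saving the Apriori property buys.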

- **Support**: fraction of transactions containing an itemset.  
- **Confidence**: P(consequent | antecedent).  
- **Lift**: Confidence divided by the baseline frequency of the consequent; >1 means positive association beyond chance.
## 2. Build a Small Transaction Dataset

We’ll start with a toy “market basket” dataset. Feel free to edit or expand.



In [None]:
transactions = [
    ['milk', 'bread', 'eggs'],
    ['milk', 'bread'],
    ['milk', 'diapers', 'beer', 'bread'],
    ['bread', 'diapers', 'beer', 'cola'],
    ['milk', 'bread', 'diapers', 'beer'],
    ['milk', 'eggs'],
    ['bread', 'eggs'],
    ['milk', 'cola', 'chips'],
    ['bread', 'butter'],
    ['milk', 'butter', 'bread'],
]

print(f"Number of transactions: {len(transactions)}")
for i, t in enumerate(transactions[:5], start=1):
    print(f"T{i}: {t}")


## 3. One-Hot Encode the Transactions

`mlxtend`’s `TransactionEncoder` converts a list of lists into a boolean (0/1) matrix that Apriori expects.
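
To see what that boolean matrix looks like, here is a hand-rolled sketch in plain Python. It is only an illustration of the layout (one row per transaction, one column per distinct item, columns sorted by name), not the encoder itself; the three-basket `demo` list is hypothetical.

```python
# Hypothetical mini-dataset, used only to illustrate the encoding
demo = [["milk", "bread"], ["bread", "eggs"], ["milk"]]

# One column per distinct item, in sorted order
columns = sorted({item for basket in demo for item in basket})

# One row per transaction: True where the item is in the basket
matrix = [[item in basket for item in columns] for basket in demo]

print(columns)
for row in matrix:
    print(row)
# ['bread', 'eggs', 'milk']
# [True, False, True]
# [True, True, False]
# [False, False, True]
```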


In [None]:
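te = TransactionEncoder()  # one possible solution: learn the item vocabulary, then encode
df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)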



## 4. Mine Frequent Itemsets with Apriori

Start with a reasonable `min_support`.  
Try changing it (e.g., `0.3`, `0.5`, `0.6`) and see how the results change.


In [None]:
min_support = 0.3  # <-- experiment here

frequent_itemsets = apriori(
    df,
    min_support=min_support,
    use_colnames=True  # show item labels instead of column indices
)
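# Sort by length then support for readability (one possible approach)
frequent_itemsets = (
    frequent_itemsets
    .assign(length=frequent_itemsets["itemsets"].apply(len))
    .sort_values(["length", "support"], ascending=[True, False])
    .reset_index(drop=True)
)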

print(f"Frequent itemsets with min_support = {min_support}:")
display(frequent_itemsets)


In [None]:
min_support = 0.5  # <-- experiment here

frequent_itemsets = apriori(
    df,
    min_support=min_support,
    use_colnames=True  # show item labels instead of column indices
)
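# Sort by length then support for readability (one possible approach)
frequent_itemsets = (
    frequent_itemsets
    .assign(length=frequent_itemsets["itemsets"].apply(len))
    .sort_values(["length", "support"], ascending=[True, False])
    .reset_index(drop=True)
)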

print(f"Frequent itemsets with min_support = {min_support}:")
display(frequent_itemsets)

In [None]:
min_support = 0.6  # <-- experiment here

frequent_itemsets = apriori(
    df,
    min_support=min_support,
    use_colnames=True  # show item labels instead of column indices
)
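# Sort by length then support for readability (one possible approach)
frequent_itemsets = (
    frequent_itemsets
    .assign(length=frequent_itemsets["itemsets"].apply(len))
    .sort_values(["length", "support"], ascending=[True, False])
    .reset_index(drop=True)
)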

print(f"Frequent itemsets with min_support = {min_support}:")
display(frequent_itemsets)

### Why these results match Apriori
- Itemsets appear only if **all their subsets** were also frequent at previous levels.
- Lowering `min_support` admits more itemsets (including larger ones), while raising it prunes aggressively.
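
Both points can be checked by brute force on the toy dataset from Section 2. The sketch below uses no `mlxtend`: it enumerates every possible itemset, which is only feasible because the dataset is tiny. The function name `frequent_bruteforce` is hypothetical.

```python
from itertools import combinations

transactions = [
    ['milk', 'bread', 'eggs'], ['milk', 'bread'],
    ['milk', 'diapers', 'beer', 'bread'], ['bread', 'diapers', 'beer', 'cola'],
    ['milk', 'bread', 'diapers', 'beer'], ['milk', 'eggs'], ['bread', 'eggs'],
    ['milk', 'cola', 'chips'], ['bread', 'butter'], ['milk', 'butter', 'bread'],
]
baskets = [frozenset(t) for t in transactions]
items = sorted(set().union(*baskets))


def frequent_bruteforce(min_support):
    """Count every possible itemset directly (no pruning)."""
    found = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = sum(set(combo) <= b for b in baskets) / len(baskets)
            if s >= min_support:
                found[frozenset(combo)] = s
    return found


for min_support in (0.3, 0.5, 0.6):
    found = frequent_bruteforce(min_support)
    # Apriori property: every non-empty proper subset of a frequent itemset is frequent
    subsets_ok = all(
        frozenset(sub) in found
        for itemset in found
        for r in range(1, len(itemset))
        for sub in combinations(itemset, r)
    )
    print(min_support, len(found), subsets_ok)
# 0.3 10 True
# 0.5 3 True
# 0.6 2 True
```

The count drops from 10 itemsets to 3 to 2 as the threshold rises, and the subset check holds at every level.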

## 5. Generate Association Rules

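We convert frequent itemsets into rules and compute **confidence** and **lift**.  
Filter rules to show only informative ones (e.g., confidence ≥ 0.6).  
Rules need itemsets with at least two items, so if your last mining run used a high `min_support` (such as `0.6`) and left only single items, re-run it with a lower value (e.g., `0.3`) first.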


In [None]:

rules = association_rules(
    frequent_itemsets,
    metric="confidence",
    min_threshold=0.6,  # keep rules with confidence >= 0.6
).sort_values(["confidence", "lift"], ascending=False)

print("All rules (filtered by confidence >= 0.6):")
cols_to_show = ["antecedents", "consequents", "support", "confidence", "lift",]
display(rules[cols_to_show].reset_index(drop=True))


### Interpreting the metrics
- **support**: fraction of all transactions containing `antecedents ∪ consequents`
- **confidence**: P(consequents | antecedents)
- **lift**: confidence / support(consequents).  
  - If **lift > 1** → positive association beyond chance  
  - If **lift ≈ 1** → independent  
  - If **lift < 1** → negative association
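
As a worked check, the three metrics can be computed by hand for two rules on the toy dataset from Section 2. This is a plain-Python sketch; the helper names `support` and `rule_metrics` are hypothetical and not part of `mlxtend`.

```python
transactions = [
    ['milk', 'bread', 'eggs'], ['milk', 'bread'],
    ['milk', 'diapers', 'beer', 'bread'], ['bread', 'diapers', 'beer', 'cola'],
    ['milk', 'bread', 'diapers', 'beer'], ['milk', 'eggs'], ['bread', 'eggs'],
    ['milk', 'cola', 'chips'], ['bread', 'butter'], ['milk', 'butter', 'bread'],
]
baskets = [set(t) for t in transactions]
n = len(baskets)


def support(items):
    """Fraction of transactions containing all the given items."""
    items = set(items)
    return sum(items <= b for b in baskets) / n


def rule_metrics(antecedent, consequent):
    """Return (support, confidence, lift) for antecedent -> consequent."""
    s_both = support(set(antecedent) | set(consequent))
    confidence = s_both / support(antecedent)
    lift = confidence / support(consequent)
    return s_both, confidence, lift


for a, c in [(["bread"], ["milk"]), (["diapers"], ["beer"])]:
    s, conf, lift = rule_metrics(a, c)
    print(f"{a} -> {c}: support={s:.2f}, confidence={conf:.3f}, lift={lift:.3f}")
# ['bread'] -> ['milk']: support=0.50, confidence=0.625, lift=0.893
# ['diapers'] -> ['beer']: support=0.30, confidence=1.000, lift=3.333
```

`bread -> milk` has decent confidence but a lift below 1, because milk is already in 70% of baskets. `diapers -> beer` has lower support but a lift well above 1.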

## 6. Focus on High-Value Rules

Let’s filter to rules with **lift > 1** (stronger than chance) and **confidence ≥ 0.7**.


In [None]:
lift_min = 1.0
conf_min = 0.7

strong_rules = rules.query("lift > @lift_min and confidence >= @conf_min") \
                    .sort_values(["lift","confidence"], ascending=False)

print(f"Rules with lift > {lift_min} and confidence >= {conf_min}:")
display(strong_rules[cols_to_show].reset_index(drop=True))


# Implementing the Apriori Algorithm in Python

The [Apriori Algorithm](https://www.geeksforgeeks.org/machine-learning/apriori-algorithm/) is a classic data mining algorithm used for market basket analysis. It finds associations between items in large transactional datasets. A common real-world application is product recommendation, where items are suggested to users based on their shopping cart contents. Companies like Walmart have used this algorithm to improve product suggestions and drive sales.

In this article we’ll walk through a step-by-step implementation of the Apriori algorithm in Python using the `mlxtend` library.

### Step 1: Importing Required Libraries
Before we begin, we need to import the necessary Python libraries: pandas, NumPy and mlxtend.

In [None]:
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

### Step 2: Loading and exploring the data
We start by loading a popular groceries dataset. This dataset contains customer transactions with details like customer ID, transaction date, and the item purchased.

In [None]:
import pandas as pd
df = pd.read_csv("Groceries_dataset.csv")
print(df.head())

Each row represents one item in a customer's basket on a given date.
To use the Apriori algorithm we must convert this into full transactions per customer per visit.

### Step 3: Group Items by Transaction
We group items purchased together by the same customer on the same day to form one transaction.
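
What this grouping does can be sketched without pandas. The rows below are hypothetical stand-ins in the same shape as the CSV (`Member_number`, `Date`, `itemDescription`), not actual records from the dataset, and the `demo_` names are chosen only for this example.

```python
from collections import defaultdict

# Hypothetical rows: (Member_number, Date, itemDescription)
demo_rows = [
    (1001, "01-01-2015", "whole milk"),
    (1001, "01-01-2015", "rolls/buns"),
    (1001, "05-01-2015", "soda"),
    (1002, "01-01-2015", "whole milk"),
]

# Collect items under a (member, date) key: one key = one transaction
demo_baskets = defaultdict(list)
for member, date, item in demo_rows:
    demo_baskets[(member, date)].append(item)

demo_transactions = list(demo_baskets.values())
print(demo_transactions)
# [['whole milk', 'rolls/buns'], ['soda'], ['whole milk']]
```

The same member shopping on two different dates yields two separate transactions, which matches the per-visit grouping described above.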

In [None]:
basket = df.groupby(['Member_number', 'Date'])['itemDescription'].apply(list).reset_index()
transactions = basket['itemDescription'].tolist()
print(transactions[:5])  # preview the first few baskets rather than printing every transaction

### Step 4: Convert to One-Hot Format
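Apriori needs the data in True/False format, where each value answers "did this item appear in the basket?". We use `TransactionEncoder` for this:


In [None]:
from mlxtend.preprocessing import TransactionEncoder
# One possible solution; `df_encoded` is a name chosen here so the raw `df` stays available for Step 7
te = TransactionEncoder()
df_encoded = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)


### Step 5: Run Apriori Algorithm
Now we find frequent itemsets, that is, combinations of items that often occur together. Here `min_support=0.01` keeps only itemsets that appear in at least 1% of transactions.

In [None]:
from mlxtend.frequent_patterns import apriori
# One possible solution, using the encoded frame from Step 4
frequent_itemsets = apriori(df_encoded, min_support=0.01, use_colnames=True)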


### Step 6: Generate Association Rules
Now we find rules such as "if bread and butter are bought, milk is also likely to be bought."

- Support: fraction of transactions that contain every item in the rule (antecedent and consequent together).
- Confidence: probability of buying item B given that item A is bought.
- Lift: confidence relative to how often B is bought overall. A value above 1 means A and B occur together more often than chance would predict.

In [None]:
from mlxtend.frequent_patterns import association_rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.1)
rules = rules[rules['antecedents'].apply(lambda x: len(x) >= 1) & rules['consequents'].apply(lambda x: len(x) >= 1)]
print("Association Rules:", rules.shape[0])
rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head(5)

### Step 7: Visualize the Most Popular Items
Let’s see which items are most frequently bought:

In [None]:
import matplotlib.pyplot as plt
top_items = df['itemDescription'].value_counts().head(10)

#Please add your name on title
top_items.plot(kind='bar', title='<Your Name>+ Top 10 Most Purchased Items')

# Please set fontsize (12 is only an example value)
plt.xlabel("Item", fontsize=12)
plt.ylabel("Count", fontsize=12)
plt.show()

As the output above shows, whole milk is the most frequently bought item, followed by other vegetables, rolls/buns and soda.