Problem Statement
Apriori discovers frequent itemsets and association rules in transactional data. It is commonly used for market basket analysis and is currently missing from aprender.
Use Cases:
- Market basket analysis ("customers who bought X also bought Y")
- Recommendation systems
- Cross-selling strategies
- Web usage mining
Example Rules:
- {milk, bread} → {butter} (support=0.3, confidence=0.8)
- support=0.3: 30% of all transactions contain milk, bread, and butter
- confidence=0.8: 80% of transactions that contain milk and bread also contain butter
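To make the metrics concrete, here is a minimal sketch of how support, confidence, and lift could be computed by brute force. The function names, the item ids, and the five-transaction toy dataset are all hypothetical and are not part of aprender; the numbers differ from the example rule above.

```rust
/// Hypothetical toy data. Item ids are assumptions for illustration:
/// 0 = milk, 1 = bread, 2 = butter, 3 = eggs.
fn toy_transactions() -> Vec<Vec<usize>> {
    vec![
        vec![0, 1, 2],
        vec![0, 1],
        vec![0, 1, 2],
        vec![1, 3],
        vec![0, 2],
    ]
}

/// Fraction of transactions that contain every item in `itemset`.
fn support(transactions: &[Vec<usize>], itemset: &[usize]) -> f32 {
    if transactions.is_empty() {
        return 0.0;
    }
    let hits = transactions
        .iter()
        .filter(|t| itemset.iter().all(|item| t.contains(item)))
        .count();
    hits as f32 / transactions.len() as f32
}

/// confidence(A -> C) = support(A ∪ C) / support(A)
fn confidence(transactions: &[Vec<usize>], antecedent: &[usize], consequent: &[usize]) -> f32 {
    let mut both = antecedent.to_vec();
    both.extend_from_slice(consequent);
    let denom = support(transactions, antecedent);
    if denom == 0.0 {
        0.0
    } else {
        support(transactions, &both) / denom
    }
}

/// lift(A -> C) = confidence(A -> C) / support(C)
fn lift(transactions: &[Vec<usize>], antecedent: &[usize], consequent: &[usize]) -> f32 {
    let denom = support(transactions, consequent);
    if denom == 0.0 {
        0.0
    } else {
        confidence(transactions, antecedent, consequent) / denom
    }
}
```

On this toy data, the rule {milk, bread} → {butter} has support 2/5 = 0.4, confidence 0.4 / 0.6 ≈ 0.67, and lift 0.67 / 0.6 ≈ 1.11. A lift above 1 suggests the antecedent and consequent co-occur more often than independence would predict.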
Proposed Solution
Implement the Apriori algorithm following EXTREME TDD.
Algorithm
Steps:
- Find frequent 1-itemsets (items above min_support)
- Generate candidate k-itemsets from frequent (k-1)-itemsets
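- Prune candidates using the Apriori principle: if an itemset is infrequent, all of its supersets are infrequent
- Count support for the remaining candidates and repeat for increasing k until no new frequent itemsets are found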
- Generate association rules from frequent itemsets
- Filter rules by min_confidence
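The level-wise search in the steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the proposed aprender implementation: `support`, `generate_candidates`, and `frequent_itemsets` are hypothetical names, itemsets are kept as sorted `Vec<usize>`, and the threshold is treated as inclusive (`>=`), which the issue does not specify.

```rust
use std::collections::HashSet;

/// Fraction of transactions that contain every item in `itemset`.
fn support(transactions: &[Vec<usize>], itemset: &[usize]) -> f32 {
    if transactions.is_empty() {
        return 0.0;
    }
    let hits = transactions
        .iter()
        .filter(|t| itemset.iter().all(|item| t.contains(item)))
        .count();
    hits as f32 / transactions.len() as f32
}

/// Builds candidate k-itemsets from frequent (k-1)-itemsets.
/// Assumes `prev` is lexicographically sorted, and that every itemset in it
/// is sorted, non-empty, and of the same length.
fn generate_candidates(prev: &[Vec<usize>]) -> Vec<Vec<usize>> {
    let prev_set: HashSet<Vec<usize>> = prev.iter().cloned().collect();
    let mut candidates = Vec::new();
    for (i, a) in prev.iter().enumerate() {
        for b in &prev[i + 1..] {
            let k = a.len();
            // Join step: only merge itemsets that agree on all but the last item.
            if a[..k - 1] != b[..k - 1] {
                continue;
            }
            let mut cand = a.clone();
            cand.push(b[k - 1]); // stays sorted because a < b lexicographically
            // Prune step (Apriori principle): every (k-1)-subset must be frequent.
            let all_subsets_frequent = (0..cand.len()).all(|skip| {
                let mut sub = cand.clone();
                sub.remove(skip);
                prev_set.contains(&sub)
            });
            if all_subsets_frequent {
                candidates.push(cand);
            }
        }
    }
    candidates
}

/// Returns every frequent itemset together with its support, smallest sets first.
fn frequent_itemsets(transactions: &[Vec<usize>], min_support: f32) -> Vec<(Vec<usize>, f32)> {
    // Level 1 candidates: each distinct item on its own.
    let mut items: Vec<usize> = transactions.iter().flatten().copied().collect();
    items.sort_unstable();
    items.dedup();
    let mut level: Vec<Vec<usize>> = items.into_iter().map(|i| vec![i]).collect();

    let mut result = Vec::new();
    while !level.is_empty() {
        // Count support and keep the candidates that meet the threshold.
        let mut frequent = Vec::new();
        for cand in level {
            let s = support(transactions, &cand);
            if s >= min_support {
                result.push((cand.clone(), s));
                frequent.push(cand);
            }
        }
        // Next level: join + prune; an empty result ends the loop.
        level = generate_candidates(&frequent);
    }
    result
}
```

This version rescans all transactions for every candidate, which keeps the sketch short but is slow on large datasets; a real implementation would likely count all candidates of one level in a single pass.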
Implementation
API Design:
pub struct Apriori {
    min_support: f32,     // minimum fraction of transactions an itemset must appear in
    min_confidence: f32,  // minimum confidence a rule must reach to be kept
    frequent_itemsets: Option<Vec<ItemSet>>,  // None until fit() has run
    rules: Option<Vec<AssociationRule>>,      // None until fit() has run
}

pub struct ItemSet {
    items: Vec<usize>,
    support: f32,
}

pub struct AssociationRule {
    antecedent: Vec<usize>, // If
    consequent: Vec<usize>, // Then
    support: f32,
    confidence: f32,
    lift: f32,
}

// Signatures only; method bodies are omitted in this design sketch.
impl Apriori {
    pub fn fit(&mut self, transactions: &[Vec<usize>]) -> Result<(), &'static str>;
    pub fn frequent_itemsets(&self) -> &[ItemSet];
    pub fn association_rules(&self) -> &[AssociationRule];
}
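As a rough illustration of how the `AssociationRule` fields could be filled, the sketch below derives rules from already-mined frequent itemsets. It is an assumption-laden example, not the proposed implementation: `generate_rules` is a hypothetical helper, the struct is a standalone copy with public fields, the input is assumed to contain every subset of each frequent itemset (which Apriori guarantees), and itemsets are assumed to have fewer than 32 items so a `u32` bitmask can enumerate splits.

```rust
use std::collections::HashMap;

/// Standalone copy of the proposed struct, with public fields for the sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct AssociationRule {
    pub antecedent: Vec<usize>,
    pub consequent: Vec<usize>,
    pub support: f32,
    pub confidence: f32,
    pub lift: f32,
}

/// Hypothetical helper: turns (sorted itemset, support) pairs into rules
/// whose confidence is at least `min_confidence`.
pub fn generate_rules(itemsets: &[(Vec<usize>, f32)], min_confidence: f32) -> Vec<AssociationRule> {
    // Support lookup for antecedents and consequents.
    let support_of: HashMap<Vec<usize>, f32> = itemsets.iter().cloned().collect();

    let mut rules = Vec::new();
    for (items, support) in itemsets {
        if items.len() < 2 {
            continue; // a rule needs a non-empty antecedent and consequent
        }
        // Every mask except 0 and all-ones selects a non-empty proper subset.
        for mask in 1u32..(1u32 << items.len()) - 1 {
            let mut antecedent = Vec::new();
            let mut consequent = Vec::new();
            for (bit, &item) in items.iter().enumerate() {
                if mask & (1 << bit) != 0 {
                    antecedent.push(item);
                } else {
                    consequent.push(item);
                }
            }
            // Skip splits whose parts are missing from the input.
            let (sa, sc) = match (support_of.get(&antecedent), support_of.get(&consequent)) {
                (Some(&sa), Some(&sc)) if sa > 0.0 && sc > 0.0 => (sa, sc),
                _ => continue,
            };
            let confidence = *support / sa; // support(A ∪ C) / support(A)
            let lift = confidence / sc;     // confidence / support(C)
            if confidence >= min_confidence {
                rules.push(AssociationRule {
                    antecedent,
                    consequent,
                    support: *support,
                    confidence,
                    lift,
                });
            }
        }
    }
    rules
}
```

Enumerating all splits is exponential in the itemset size, which is usually acceptable because frequent itemsets tend to be small; whether `fit` should compute rules eagerly or lazily is left open here.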
Success Criteria
- ✅ Apriori with frequent itemset mining
- ✅ Association rule generation
- ✅ Support, confidence, lift metrics
- ✅ 10+ tests (including retail dataset)
- ✅ Zero clippy warnings
- ✅ Example: examples/market_basket.rs
Estimated Effort
Timeline: 3-4 days
Complexity: Medium (combinatorial enumeration, pruning)