# Introduction to Association Rule Mining in Python


Welcome to this tutorial on Market Basket Analysis, which covers the basics of implementing the Apriori algorithm and association rule mining in Python.
We'll use Python to perform Market Basket Analysis, a popular application of association rule mining, a technique for uncovering relationships and patterns in transactional data.



## Sources

These materials have been adapted from:

- [Frequent itemsets via the Apriori algorithm](https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/)

- [Association rules generation from frequent itemsets](https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/)

## What is Market Basket Analysis?

Market Basket Analysis (MBA) is a data mining technique used to identify the relationships between items purchased together by customers. It originated from the retail industry, particularly from the concept of analyzing transactions in a physical or online store. The name "market basket" comes from the analogy of customers' shopping baskets filled with items they intend to purchase.


- **Market basket analysis**
    - Construct association rules
        - Association rules are generated to identify patterns in the dataset.
        - These rules reveal which items are frequently purchased together, providing insights into customer behavior.

- **Association rules**
    - Association rules are if-then statements that capture relationships between sets of items in a transaction.
        - **{antecedent}→{consequent}**
            - An association rule is typically written as {antecedent}→{consequent}.
            - For example, if customers who purchase fiction books (the antecedent) are also likely to buy biography books (the consequent), the rule is {fiction}→{biography}.


### Some use cases of market basket analysis

1. **Improving Product Recommendations in an E-commerce Store:**
    
    MBA helps e-commerce platforms suggest related or complementary products to customers based on their purchase history.

2. **Developing a Netflix-style Recommendation Engine:**

    Media streaming platforms like Netflix utilize MBA to recommend movies or TV shows to users based on their viewing history, preferences, and similar viewing patterns of other users.

3. **Optimizing Inventory Management:**

    MBA assists in optimizing inventory levels by identifying which products are often purchased together.



## Overview of Apriori Algorithm

The **Apriori algorithm** is a classic algorithm for association rule learning, which analyzes a dataset to identify relationships between items. For instance, in a dataset of grocery store transactions, association rule learning can uncover items that are frequently bought together. Apriori finds the frequent itemsets by relying on one key property: every subset of a frequent itemset must itself be frequent, so any itemset that contains an infrequent subset can be skipped.



### Metrics and Pruning


To select the most relevant rules from the many possible ones in a business scenario, we rely on metrics and pruning:

- A metric is a score that measures how strong or useful a rule is.
  - Example:
    - {humor} → {poetry}: 0.81 ✓
    - {fiction} → {travel}: 0.23 ✗
- Pruning uses these metrics to discard irrelevant or uninformative rules, so that only meaningful rules are retained.
  - Example:
    - Retain: {humor} → {poetry} ✓
    - Discard: {fiction} → {travel} ✗

#### The Support Metric (the Simplest Metric):

- The support metric measures the proportion of transactions that contain a specific itemset.
- Formula:
  $$
  \text{support} = \frac{\text{number of transactions with item(s)}}{\text{total number of transactions}}
  $$
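As a rough illustration, support can be computed directly from a list of transactions. The sketch below uses a small, made-up set of baskets; the `transactions` data and the `support` helper are illustrative assumptions, not part of any library.

```python
# Hypothetical transactions (made up for illustration), one set of items per basket.
transactions = [
    {"Milk", "Bread"},
    {"Milk", "Eggs"},
    {"Bread"},
    {"Milk", "Bread", "Eggs"},
    {"Cheese"},
]


def support(itemset, transactions):
    """Return the fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


print(support({"Milk"}, transactions))           # 3 of 5 baskets -> 0.6
print(support({"Milk", "Bread"}, transactions))  # 2 of 5 baskets -> 0.4
```

Representing each basket as a Python `set` lets the subset test `itemset <= basket` express "the basket contains all of these items" in one step.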

#### The Confidence Metric:

- Confidence complements support by measuring how reliably the consequent appears whenever the antecedent does.
- It estimates the probability of purchasing item $Y$ given that item $X$ has been purchased.
- Formula:
  $$
  \text{confidence}(X \rightarrow Y) = \frac{\text{support}(X \cap Y)}{\text{support}(X)}
  $$
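Since confidence is the conditional probability of $Y$ given $X$, a quick worked example divides the joint support by the support of the antecedent. The numbers below are the supports of {Milk} and {Milk, Bread} from the worked example later in this tutorial; the `confidence` helper is a hypothetical name used only for illustration.

```python
def confidence(support_xy, support_x):
    """Estimate P(Y | X): the share of transactions containing X that also contain Y."""
    return support_xy / support_x


# support({Milk, Bread}) = 0.5 and support({Milk}) = 0.7 in the worked example.
conf_milk_bread = confidence(support_xy=0.5, support_x=0.7)
print(round(conf_milk_bread, 3))  # 0.714
```

In words, roughly 71% of the baskets that contain milk also contain bread.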

#### The Lift Metric:

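- Lift compares the proportion of transactions containing both items with the proportion expected if the two items were purchased independently of each other.
- A lift greater than 1 means the items occur together more often than expected under independence, a lift of 1 means they appear independent, and a lift below 1 means they occur together less often than expected.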
- Formula:
  $$
  \text{lift}(X \rightarrow Y) = \frac{\text{support}(X \cap Y)}{\text{support}(X) \times \text{support}(Y)}
  $$
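To make the interpretation concrete, the sketch below evaluates lift for two rules using supports from the worked example later in this tutorial. The `lift` helper is a hypothetical name introduced for illustration.

```python
def lift(support_xy, support_x, support_y):
    """Observed joint support divided by the joint support expected under independence."""
    return support_xy / (support_x * support_y)


# {Milk} -> {Bread}: support 0.5, with support({Milk}) = 0.7 and support({Bread}) = 0.6
print(round(lift(0.5, 0.7, 0.6), 2))  # 1.19, bought together more often than expected

# {Bread} -> {Eggs}: support 0.3, with support({Bread}) = 0.6 and support({Eggs}) = 0.5
print(round(lift(0.3, 0.6, 0.5), 2))  # 1.0, consistent with independence
```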

#### The Conviction Metric:

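- Conviction compares how often $X$ would appear without $Y$ if the two were independent with how often $X$ actually appears without $Y$, that is, how often the rule makes a wrong prediction.
- A conviction of 1 indicates independence. Higher values mean the consequent depends more strongly on the antecedent, and a rule that always holds (confidence of 1) has infinite conviction.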
- Formula:
  $$
  \text{conviction}(X \rightarrow Y) = \frac{1 - \text{support}(Y)}{1 - \text{confidence}(X \rightarrow Y)}
  $$
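A small sketch of the conviction formula follows, again using supports from the worked example later in this tutorial. The `conviction` helper is hypothetical, and returning infinity when the confidence is 1 is a convention assumed here to avoid dividing by zero.

```python
import math


def conviction(support_y, confidence_xy):
    """Conviction of X -> Y from the consequent's support and the rule's confidence."""
    if confidence_xy >= 1.0:
        # The rule never fails, so the denominator is zero; treat conviction as infinite.
        return math.inf
    return (1 - support_y) / (1 - confidence_xy)


# {Milk} -> {Bread}: confidence = support({Milk, Bread}) / support({Milk}) = 0.5 / 0.7
confidence_milk_bread = 0.5 / 0.7
print(round(conviction(support_y=0.6, confidence_xy=confidence_milk_bread), 2))  # 1.4
```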

#### The Leverage Metric:

- Leverage, akin to lift, compares observed and expected co-occurrence, but as a difference rather than a ratio.
- Its values lie within the range [-1, 1], and a value of 0 indicates independence, which makes it easy to interpret.
- Formula:
  $$
  \text{leverage}(X \rightarrow Y) = \text{support}(X \cap Y) - \text{support}(X) \times \text{support}(Y)
  $$
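The difference form can be checked with the same supports from the worked example later in this tutorial. The `leverage` helper below is a hypothetical name used for illustration.

```python
def leverage(support_xy, support_x, support_y):
    """Observed joint support minus the joint support expected under independence."""
    return support_xy - support_x * support_y


# {Milk} -> {Bread}: 0.5 observed versus 0.7 * 0.6 = 0.42 expected
print(round(leverage(0.5, 0.7, 0.6), 2))  # 0.08

# {Milk} -> {Eggs}: 0.4 observed versus 0.7 * 0.5 = 0.35 expected
print(round(leverage(0.4, 0.7, 0.5), 2))  # 0.05
```

A leverage of 0.08 means that milk and bread appear together in 8 percentage points more baskets than independence would predict.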


### Algorithm steps

The Apriori algorithm is a fundamental technique in Market Basket Analysis for finding the frequent itemsets in transactional data, from which association rules are then derived. Let's walk through each step of the algorithm with an example:

#### Step 1: Generate Initial Itemsets

- **Description:** Start with itemsets containing just a single item (individual items).
- **Example:**

| Itemset |
|---------|
| {Milk}  |
| {Bread} |
| {Eggs}  |
| {Cheese}|
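In code, Step 1 amounts to collecting the distinct items that occur in the data. The `transactions` list below is hypothetical: it was constructed for this sketch so that its supports reproduce the values used in the tables of this walkthrough.

```python
# Hypothetical baskets (an assumption for illustration), built so that the
# supports match the tables in this walkthrough: 10 transactions in total.
transactions = (
    [{"Milk", "Bread", "Eggs"}] * 2
    + [{"Milk", "Bread"}] * 3
    + [{"Milk", "Eggs"}] * 2
    + [{"Bread", "Eggs"}]
    + [{"Cheese"}] * 2
)

# Collect every distinct item, then wrap each one in its own single-item itemset.
items = sorted({item for basket in transactions for item in basket})
initial_itemsets = [frozenset([item]) for item in items]

print(items)  # ['Bread', 'Cheese', 'Eggs', 'Milk']
```

Using `frozenset` for itemsets is a design choice assumed here: frozensets are hashable, so they can later serve as dictionary keys when supports are stored.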

#### Step 2: Determine Support for Itemsets

- **Description:** Calculate the support for each itemset, indicating the frequency of occurrence in the dataset.
- **Example:**

| Itemset | Support |
|---------|---------|
| {Milk}  | 0.7     |
| {Bread} | 0.6     |
| {Eggs}  | 0.5     |
| {Cheese}| 0.2     |
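One way to compute these supports is to count how many baskets contain each item and divide by the number of baskets. The `transactions` list below is a hypothetical dataset, constructed for this sketch so that the counts reproduce the table above.

```python
from collections import Counter

# Hypothetical baskets (an assumption for illustration): 10 transactions in total.
transactions = (
    [{"Milk", "Bread", "Eggs"}] * 2
    + [{"Milk", "Bread"}] * 3
    + [{"Milk", "Eggs"}] * 2
    + [{"Bread", "Eggs"}]
    + [{"Cheese"}] * 2
)

# Each basket is a set, so an item is counted at most once per transaction.
item_counts = Counter(item for basket in transactions for item in basket)
n_transactions = len(transactions)
item_support = {item: count / n_transactions for item, count in item_counts.items()}

for item in sorted(item_support, key=item_support.get, reverse=True):
    print(f"{{{item}}}: {item_support[item]}")
# {Milk}: 0.7
# {Bread}: 0.6
# {Eggs}: 0.5
# {Cheese}: 0.2
```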

#### Step 3: Prune Itemsets

- **Description:** Keep the itemsets that meet the minimum support threshold (0.4 in this example) and discard the rest.
- **Example:**

| Itemset | Support | Kept? |
|---------|---------|-------|
| {Milk}  | 0.7     | ✓     |
| {Bread} | 0.6     | ✓     |
| {Eggs}  | 0.5     | ✓     |
| {Cheese}| 0.2     | ✗     |
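The pruning step is essentially a threshold filter. The sketch below applies it to the supports from the table, keyed by item name for brevity; the variable names are illustrative.

```python
min_support = 0.4  # minimum support threshold from the example

# Supports of the single-item itemsets, taken from the table.
item_support = {"Milk": 0.7, "Bread": 0.6, "Eggs": 0.5, "Cheese": 0.2}

# Keep only the items whose support reaches the threshold.
kept = {item: s for item, s in item_support.items() if s >= min_support}
pruned = sorted(set(item_support) - set(kept))

print(list(kept))  # ['Milk', 'Bread', 'Eggs']
print(pruned)      # ['Cheese']
```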

#### Step 4: Generate Candidate Itemsets

- **Description:** Combine the itemsets kept in Step 3 into candidate itemsets that are one item larger, then compute the support of each candidate and prune it as in Steps 2 and 3.
- **Example:**

| Candidate Itemset | Support | Kept? |
|-------------------|---------|-------|
| {Milk, Bread}     | 0.5     | ✓     |
| {Milk, Eggs}      | 0.4     | ✓     |
| {Bread, Eggs}     | 0.3     | ✗     |
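For the first round, candidate generation can be sketched with `itertools.combinations`, pairing up the single items that survived Step 3. The list `frequent_items` is simply those surviving items written out by hand.

```python
from itertools import combinations

frequent_items = ["Milk", "Bread", "Eggs"]  # single items kept in Step 3

# Pair up the surviving items to form the two-item candidate itemsets.
candidate_pairs = list(combinations(frequent_items, 2))

print(candidate_pairs)
# [('Milk', 'Bread'), ('Milk', 'Eggs'), ('Bread', 'Eggs')]
```

Because {Cheese} was pruned, no candidate containing Cheese is ever generated or counted, which is how Apriori keeps the search space small.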


#### Step 5: Repeat Until Convergence

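- **Description:** Repeat Steps 2 to 4 with the progressively larger candidate itemsets until no new frequent itemsets are found.
- **Example:** The only possible three-item candidate, {Milk, Bread, Eggs}, contains the infrequent itemset {Bread, Eggs}, so it cannot be frequent and the algorithm stops.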


#### Final Result: 

- After several iterations, the algorithm converges to the following frequent itemsets:

| Itemset       | Support |
|---------------|---------|
| {Milk}        | 0.7     |
| {Bread}       | 0.6     |
| {Eggs}        | 0.5     |
| {Milk, Bread} | 0.5     |
| {Milk, Eggs}  | 0.4     |

By following these steps, the Apriori algorithm efficiently identifies the frequent itemsets in transactional data. Association rules are then generated from these itemsets and scored with the metrics described above, providing valuable insights into customer purchasing patterns.
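Putting the steps together, the whole level-wise search can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used by any particular library: the function name `apriori_sketch` is hypothetical, and the `transactions` list is made-up data constructed so that its supports match the tables above.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting `min_support`."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for basket in transactions if itemset <= basket) / n

    # Step 1: start from the single-item itemsets.
    current = {frozenset([item]) for basket in transactions for item in basket}
    frequent = {}
    k = 1
    while current:
        # Steps 2 and 3: compute supports, keep itemsets that reach the threshold.
        level = {s: support(s) for s in current}
        level = {s: sup for s, sup in level.items() if sup >= min_support}
        frequent.update(level)

        # Step 4: join frequent k-itemsets into candidates with k + 1 items.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Drop any candidate that has an infrequent subset of size k - 1.
        current = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


# Hypothetical baskets (an assumption for illustration): 10 transactions in total.
transactions = (
    [{"Milk", "Bread", "Eggs"}] * 2
    + [{"Milk", "Bread"}] * 3
    + [{"Milk", "Eggs"}] * 2
    + [{"Bread", "Eggs"}]
    + [{"Cheese"}] * 2
)

frequent_itemsets = apriori_sketch(transactions, min_support=0.4)
for itemset, sup in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), -kv[1])):
    print(sorted(itemset), sup)
```

On this data the sketch returns the same five itemsets as the final table. The candidate {Milk, Bread, Eggs} is discarded without counting its support, because its subset {Bread, Eggs} is not frequent.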


## Using the Apriori algorithm in Python


```python
import warnings

# Suppress warnings to keep the notebook output readable.
warnings.filterwarnings("ignore")
```