Association rule mining is the study of `"what goes with what"`, that is, `"which items tend to go with other items"`. It is also known as **market basket analysis** because it is commonly used to find associations between items in transaction records.

While shopping online, you may have noticed the shopping application’s user interface shows pictures of two or more products side by side along with the caption `"Frequently bought together"` or a list of products with the caption `"People who bought this item also bought"`. This is called **cross-selling**.

An online retailer can try to sell a more expensive product with more features in the frequently bought list. This is called **upselling**.

Market basket analysis uses transaction data to analyze and categorize customer-purchasing patterns. The aim is to find actionable information that can improve business strategy. We can determine customer groups based on the combinations of products they buy. We can then direct promotions by the profitability of the various customer profiles.

For example, cosmetics and healthy foods may define a "beauty-conscious" profile. However, how the goods are related is important as well. For instance, most customers who buy healthy foods may also purchase cosmetics, while only a few customers buying cosmetics may also purchase healthy foods. In such a scenario, the promotion of healthy foods may aid in increasing cosmetics sales, but the promotion of cosmetics may not be successful in promoting additional sales of healthy foods.

**Here are the different uses of association rules in marketing:**

- Optimal layout of stores, catalogs, and web pages
- Selecting products for promotions 
- Product placement and space allocation in stores
 

**Possible solutions in marketing include:**

**1. Affinity positioning**

Placing products that are likely to be purchased together close to one another, for example, coffee and coffee filters.

**2. Cross-selling**

Selling a product along with another related product. For example, people who buy cold medicines and Kleenex may also buy orange juice.

In market basket analysis, the input is the list of purchases in each transaction (the purchaser’s identity may or may not be known). The goal is to identify purchase patterns, including seasonal and geographical ones: items that tend to be purchased together, such as bread and butter, milk and bread, or cold drinks and snacks, and items that tend to be purchased sequentially.

### Association Rule Mining

Association rule mining works on the concept of IF-THEN statements, that is, “If A Then B” and it is denoted as A → B. You may have come across this notation in mathematical logic. This is also known as a conditional statement.

<div>
<img src="attachment:276d6e19-ea48-4407-9010-9a7a418fa724-img3.png" alt="Drawing" style="height: 200px;"/>
</div>


Consider the rule {Bread} → {Milk}. Here, the “IF” part ({Bread}) is called the antecedent, and the “THEN” part ({Milk}) is called the consequent.

<div>
<img src="attachment:ARM1-2.png" alt="Drawing" style="width: 600px;"/>
</div>

The general notation of an association rule is {item 𝑖, item 𝑗, …} → {item 𝑘, item 𝑙, …}. Here, {item 𝑖, item 𝑗, …} is the antecedent set and {item 𝑘, item 𝑙, …} is the consequent set.

**Here are some of the association rules the given data supports:**

- {Bread} → {Milk}
- {Milk, Bread} → {Coke}
- {Bread, Diapers} → {Milk, Coke}

So, given a set of transactions, we can find rules that predict the occurrence of an item or a set of items based on the occurrence of other items or sets of items in the transaction.

**There are two key problems while developing good rules:**

- Computation: Searching for patterns in large databases is computationally expensive, and there are many potential rules to evaluate.
- Evaluation: How do we determine whether a rule is “good” or “potentially useful”?

A practical solution is to consider only those itemsets that occur with high frequency in the database. These sets are called frequent itemsets.

**Let's understand this with an example:**

<div>
<img src="attachment:eae4c3eb-55c5-419a-9419-af5d10351731-img3.png" alt="Drawing" style="height: 200px;"/>
</div>

The table below shows the frequency of valid itemsets for the data with exactly one item per itemset.

![ARM2.png](attachment:ARM2.png)

The following table shows the frequency of valid itemsets for the data with exactly two items per itemset.

<div>
<img src="attachment:ARM3.png" alt="Drawing" style="height: 400px;"/>
</div>

The marked itemsets have a frequency below the cut-off `threshold of 2`.

As we can see, the frequency of a combination of items is less than or equal to the frequency of each individual item in it. For example, the frequency of `{Bread, Diapers}` is 3, whereas the frequency of `{Bread}` is 4.

In general, if the set `{item A, item B}` is not frequent, then no set containing both item A and item B is frequent. Such sets do not need to be considered and can be excluded from the search. In other words, a superset of an itemset that is not frequent is also not frequent. Equivalently, every subset of a frequent itemset is also a frequent itemset.
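
The counting and pruning idea above can be sketched in a few lines of Python. The transaction table in this notebook is an image, so the six baskets below are an assumed reconstruction chosen only to agree with the counts quoted in the text; the names `TRANSACTIONS`, `MIN_COUNT`, and `itemset_counts` are illustrative, not part of any library.

```python
from collections import Counter
from itertools import combinations

# ASSUMPTION: hypothetical baskets, reconstructed to match the counts in the text.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
    {"Milk", "Beer", "Eggs"},
]
MIN_COUNT = 2  # cut-off threshold used in the text


def itemset_counts(transactions, size):
    """Count how many transactions contain each itemset of the given size."""
    counts = Counter()
    for basket in transactions:
        for itemset in combinations(sorted(basket), size):
            counts[frozenset(itemset)] += 1
    return counts


singles = itemset_counts(TRANSACTIONS, 1)
pairs = itemset_counts(TRANSACTIONS, 2)
triples = itemset_counts(TRANSACTIONS, 3)

frequent_pairs = {s: c for s, c in pairs.items() if c >= MIN_COUNT}

# A pair never occurs more often than either of its individual items.
assert all(c <= singles[frozenset({item})] for s, c in pairs.items() for item in s)

# Every frequent triple is built only from frequent pairs, so triples that
# contain an infrequent pair could have been skipped without counting them.
for triple, c in triples.items():
    if c >= MIN_COUNT:
        assert all(frozenset(p) in frequent_pairs for p in combinations(triple, 2))

print(pairs[frozenset({"Bread", "Diapers"})], singles[frozenset({"Bread"})])  # 3 4
```

This brute-force version counts every itemset of each size; a practical implementation would use the pruning rule to avoid generating candidates that contain an infrequent subset.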

Usually, the frequency of occurrence of an itemset is converted into a proportion called **`support`**. The support of an itemset is equal to the ratio of the frequency of an itemset to the total number of transactions in the data set.

<div>
<img src="attachment:ARM4.png" alt="Drawing" style="height: 200px;"/>
</div>

### Support 

**`Support = Number of transactions in which all the items in the rule appear / Total number of transactions in the data`**

<div>
<img src="attachment:bf69a96f-e65e-49f4-b698-717441845146-img3.png" alt="Drawing" style="height: 200px;"/>
</div>


**Let’s consider the rule {Diapers} → {Beer}**.
**The support for this rule is calculated as:**

- Number of transactions in which Diapers and Beer appear = 3
- Total number of transactions = 6
- Support = 3/6 = 0.5 = 50%
 

**Consider another rule {Milk, Diapers} → {Beer}**.
**The support for this rule is calculated as:**

- Number of transactions in which Milk, Diapers, and Beer appear = 2
- Total number of transactions = 6
- Support = 2/6 ≈ 0.33 (33.33%)
 

**Consider another rule {Bread, Diapers} → {Beer}**.
**The support for this rule is calculated as:**

- Number of transactions in which Bread, Diapers, and Beer appear = 2
- Total number of transactions = 6
- Support = 2/6 ≈ 0.33 (33.33%)

Now, consider the rules `{Diapers} → {Beer}` and `{Beer} → {Diapers}`. Both these rules will have the same support of 50%. How do we differentiate between these two rules? Are they both equally good rules? Is support a good enough measure of the strength of a rule?
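
As a minimal sketch, the support calculations above can be reproduced in Python. The baskets are an assumed reconstruction consistent with the counts quoted in the text (the original table is an image), and the helper names `support` and `rule_support` are hypothetical.

```python
# ASSUMPTION: hypothetical baskets, reconstructed to match the counts in the text.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
    {"Milk", "Beer", "Eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def rule_support(antecedent, consequent, transactions):
    """Support of a rule: support of the union of antecedent and consequent."""
    return support(set(antecedent) | set(consequent), transactions)


print(rule_support({"Diapers"}, {"Beer"}, TRANSACTIONS))  # 0.5
print(rule_support({"Beer"}, {"Diapers"}, TRANSACTIONS))  # 0.5
print(round(rule_support({"Milk", "Diapers"}, {"Beer"}, TRANSACTIONS), 4))  # 0.3333
```

Because the rule's support depends only on the union of the two sides, swapping antecedent and consequent cannot change it, which is why support alone cannot distinguish `{Diapers} → {Beer}` from `{Beer} → {Diapers}`.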

### Confidence

Confidence measures how reliable a rule is: among the transactions that contain the antecedent, it is the proportion that also contain the consequent.

**Confidence** = `Number of transactions in which all the items in the rule appear / Number of transactions with items in the antecedent itemset`.

**Consider the rule {Diapers} → {Beer}. The confidence for this rule is calculated as:**

- Number of transactions with Diapers and Beer = 3
- Number of transactions with Diapers = 4
- Confidence = 3/4 = 0.75 = 75%
 

**Consider another rule {Milk, Diapers} → {Beer}. The confidence for this rule is calculated as:**

- Number of transactions with Milk, Diapers, and Beer = 2
- Number of transactions with Milk and Diapers = 3
- Confidence = 2/3 ≈ 0.67 (66.67%)
 

**Consider another rule {Bread, Diapers} → {Beer}. The confidence for this rule is calculated as:**

- Number of transactions with Bread, Diapers, and Beer = 2
- Number of transactions with Bread and Diapers = 3
- Confidence = 2/3 ≈ 0.67 (66.67%)

**In the above transactions**:

- Support({Milk, Diapers} → {Beer})  = 2/6 = 33.33%
- Support({Beer} → {Milk, Diapers}) = 2/6 = 33.33%

**But,**

- Confidence({Milk, Diapers} → {Beer})  = 2/3 = 66.67% 
- Confidence({Beer} → {Milk, Diapers})  = 2/4 = 50%

Hence, we can conclude `Confidence({Milk, Diapers} → {Beer})` ≠ `Confidence({Beer} → {Milk, Diapers})`. In fact, `{Milk, Diapers} → {Beer}` seems to be a better rule in terms of confidence.
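
The asymmetry of confidence can be checked with a short sketch. As before, the baskets are an assumed reconstruction that matches the counts quoted in the text, and `count` and `confidence` are illustrative helper names.

```python
# ASSUMPTION: hypothetical baskets, reconstructed to match the counts in the text.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
    {"Milk", "Beer", "Eggs"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in transactions if itemset <= basket)


def confidence(antecedent, consequent, transactions):
    """Share of antecedent transactions that also contain the consequent.

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)


print(confidence({"Diapers"}, {"Beer"}, TRANSACTIONS))  # 0.75
print(round(confidence({"Milk", "Diapers"}, {"Beer"}, TRANSACTIONS), 4))  # 0.6667
print(confidence({"Beer"}, {"Milk", "Diapers"}, TRANSACTIONS))  # 0.5
```

The numerator is the same in both directions; only the denominator (the antecedent count) changes, which is what makes confidence direction-dependent.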

### Lift

If almost everybody buys milk, then any rule with milk as the consequent will have relatively high support and confidence values, even if the antecedent and milk are unrelated. A measure called the **lift ratio** addresses this issue.

**Consider the rule {Bread} → {Milk}. In terms of probability, we have the following:**

- Confidence({Bread} → {Milk}) = P({Milk} | {Bread}) = P({Milk} and {Bread}) / P({Bread}) = (3/6) / (4/6) = 3/4 = 0.75
- Also, consider the probability of the consequent P({Milk}) = 5/6 ≈ 0.83.
- If {Milk} and {Bread} were independent events, then P({Milk} and {Bread}) would be equal to P({Milk})P({Bread}), and the confidence of the rule would reduce to P({Milk}). So, under independence, the confidence would be 0.83, which is higher than the observed 0.75. In other words, among transactions that include Bread, Milk has a smaller chance of showing up than in the full set of transactions.

The lift ratio allows us to judge the strength of an association rule against a benchmark value. If the antecedent set and the consequent set are independent, we can rewrite the confidence of a rule as follows: Confidence(X → Y) = P(Y). Here, P(Y) is the benchmark confidence.
 
So, in the example above, P(Milk) = 0.83 is the benchmark confidence. There is an 83% chance a randomly selected transaction will include Milk.

**Lift ratio = Confidence / Benchmark confidence**
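
Written in terms of probabilities, with X denoting the antecedent set and Y the consequent set (symbols as used for the benchmark confidence above), the same definition reads:

$$
\text{Lift}(X \to Y) = \frac{\text{Confidence}(X \to Y)}{P(Y)} = \frac{P(X \text{ and } Y)}{P(X)\,P(Y)}
$$

A lift ratio of 1 corresponds to independence of X and Y. The last expression is symmetric in X and Y, so the rules X → Y and Y → X always have the same lift ratio, even though their confidences may differ.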

**Let’s calculate the lift ratio of the rule {Coke} → {Milk}:**

- Benchmark confidence = P(consequent) = P(Milk) = 5/6
- Confidence of {Coke} → {Milk} = 2/2 = 1
- Lift ratio = (1) / (5/6) = 6/5 = 1.2
- Among transactions that include Coke, there is a 100% chance that Milk is also present. This is better than the 5/6 probability of Milk in a transaction picked at random. The probability is greater by a factor of 1.2.
 

**Let’s calculate the lift ratio of the rule {Bread} → {Milk}:**

- Benchmark confidence = P(consequent) = P(Milk) = 5/6
- Confidence of {Bread} → {Milk} = 3/4
- Lift ratio = (3/4) / (5/6) = 0.9
 

**Let’s calculate the lift ratio of the rule {Diapers} → {Beer}:**

- Benchmark confidence = P(consequent) = P(Beer) = 4/6
- Confidence of {Diapers} → {Beer} = 3/4
- Lift ratio = (3/4) / (4/6)  = 9/8 = 1.125
 

**Let’s calculate the lift ratio of the rule {Eggs} → {Milk}:**

- Benchmark confidence = P(consequent) = P(Milk) = 5/6
- Confidence of {Eggs} → {Milk} = 1/2
- Lift ratio = (1/2) / (5/6)  = 6/10 = 0.6 

**Let’s try to understand what the numbers above mean:**

- Consider the rule {Eggs} → {Milk}. Its lift ratio is 0.6. In other words, transactions that contain Eggs include Milk less often than expected by chance alone.
- Consider the rules {Coke} → {Milk} and {Diapers} → {Beer}. Their lift ratios are 1.2 and 1.125, respectively. In other words, transactions that contain the antecedent include the consequent (Milk and Beer, respectively) more often than expected by chance alone.
- Consider the rule {Bread} → {Milk}. Its lift ratio is 0.9. In other words, transactions that contain Bread include Milk almost as often as (though strictly, less often than) expected by chance alone.

![ARM5.png](attachment:ARM5.png)

The rules `{Bread} → {Milk}` and `{Diapers} → {Beer}` have the same support (0.5) and confidence (0.75), but `{Diapers} → {Beer}` has a lift ratio of 1.125, whereas `{Bread} → {Milk}` has a lift ratio of 0.9. This means that in `{Diapers} → {Beer}` the consequent occurs with the antecedent more often than expected by pure chance, whereas in `{Bread} → {Milk}` it occurs less often than expected by pure chance. So, we can conclude that `{Diapers} → {Beer}` is a better rule than `{Bread} → {Milk}`.


Note also that the rule `{Coke} → {Milk}` has a **low support (0.33) but high confidence and lift**. Therefore, it is important to evaluate association rules on all of these performance metrics together rather than on any single one.
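
To compare the rules side by side, the sketch below recomputes support, confidence, and lift ratio for the four rules used in the lift examples. It relies on an assumed reconstruction of the transaction table (the original is an image) that agrees with the counts quoted in the text; `count`, `metrics`, and `RULES` are hypothetical names. Exact fractions are used to avoid floating-point rounding in the ratios.

```python
from fractions import Fraction

# ASSUMPTION: hypothetical baskets, reconstructed to match the counts in the text.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
    {"Milk", "Beer", "Eggs"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in transactions if itemset <= basket)


def metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift ratio) of a rule as exact fractions."""
    n = len(transactions)
    both = count(set(antecedent) | set(consequent), transactions)
    support = Fraction(both, n)
    confidence = Fraction(both, count(antecedent, transactions))
    benchmark = Fraction(count(consequent, transactions), n)  # P(consequent)
    return support, confidence, confidence / benchmark


RULES = [
    ({"Coke"}, {"Milk"}),
    ({"Bread"}, {"Milk"}),
    ({"Diapers"}, {"Beer"}),
    ({"Eggs"}, {"Milk"}),
]

for antecedent, consequent in RULES:
    s, c, lift = metrics(antecedent, consequent, TRANSACTIONS)
    print(
        sorted(antecedent), "->", sorted(consequent),
        f"support={float(s):.2f} confidence={float(c):.2f} lift={float(lift):.3f}",
    )
```

The helper assumes that both the antecedent and the consequent occur at least once; otherwise `Fraction` raises `ZeroDivisionError`.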