# Association Rules


_An association rule is an implication expression of the form $X \rightarrow Y$, where $X$ and $Y$ are disjoint itemsets_. $X$ is called the *antecedent* and $Y$ the *consequent* of the rule.

Rule generation is a common task in the mining of frequent patterns.

E.g.:
1. $\{\text{Apple}\} \rightarrow \{\text{Orange}\}$ suggests that people who buy Apple are also likely to buy Orange.
2. $\{\text{Onion}, \text{Potato}\} \rightarrow \{\text{Burger}\}$, found in the sales data of a supermarket, would indicate that if a customer buys Onions and Potatoes together, they are likely to also buy a Burger.


* `Association rule` learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. 
  * It is intended to identify strong rules discovered in databases using some measures of interestingness.

* Based on the concept of strong rules, Rakesh Agrawal, Tomasz Imieliński and Arun Swami introduced association rules for discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets.
  * Such information can be used as the basis for decisions about marketing activities like promotional pricing or product placements.

## Applications

* Association rules are employed in many application areas including: 
  * Web usage mining
  * Intrusion detection
  * Continuous production
  * Bioinformatics, inter alia. 
  
In contrast with sequence mining, association rule learning typically does not consider the order of items either within a transaction or across transactions. 

To evaluate the "interest" of an association rule, different metrics have been developed.

## Terminology (Metrics) in Association Rules

$$\begin{array}{lllll}\hline
\text{ID} & \text{Items} & & & \\\hline
1 & \text{Apple} & \text{Orange} & & \\
2 & \text{Apple} & \text{Banana} & & \\
3 & \text{Apple} & \text{Coke} & \text{Orange} & \\
4 & \text{Orange} & \text{Coke} & & \\
5 & \text{Orange} & \text{Coffee} & & \\
6 & \text{Apple} & \text{Coffee} & \text{Orange} & \text{Coke} \\
\hline
\end{array}$$
	



### Support (Frequency constraint)


<img src="https://raw.githubusercontent.com/tec03/Datasets/main/images/p_of_a.png" width="500" align="left">

Support = Area of the orange circle



 Support is an indication of `how frequently` the itemset appears in the dataset.
 Support is used to measure the abundance or frequency (often interpreted as significance or importance) of an itemset in a database. 
 



$$\text{Support(x)} = \cfrac{\text{Number of transactions in which x appears}}{\text{Total number of transactions}}  \in  [0,1] 
$$
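As a quick check of the formula, here is a minimal Python sketch (standard library only) that computes support over the six transactions in the table above. The names `transactions` and `support` are our own, not from any library; each transaction is stored as a `set`, so "the itemset appears in the transaction" becomes a subset test.

```python
# The six transactions from the table above, one set per transaction ID.
transactions = [
    {"Apple", "Orange"},
    {"Apple", "Banana"},
    {"Apple", "Coke", "Orange"},
    {"Orange", "Coke"},
    {"Orange", "Coffee"},
    {"Apple", "Coffee", "Orange", "Coke"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


print(round(support({"Apple"}, transactions), 2))            # 0.67
print(round(support({"Apple", "Orange"}, transactions), 2))  # 0.5
```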


Define a minimum support count (the minimum support threshold expressed as a number of transactions).
* We set it to 3.

We refer to an itemset as a "frequent itemset" if its support is greater than or equal to a specified minimum support threshold.

$$\begin{array}{lc}\hline
\text{Item} & \text{Support count}\\\hline
\text{Apple} & 4\\
\text{Banana} & 1\\
\text{Coffee} & 2\\
\text{Coke} & 3\\
\text{Orange} & 5\\
\hline
\end{array}$$

  * An item must appear in at least three transactions; otherwise, it is not considered further.
    * The key property of support is *downward closure*: all subsets of a frequent itemset (support ≥ min. support threshold) are also frequent.
    * This property (more precisely, the fact that no superset of an infrequent itemset can be frequent) is used to prune the search space (usually a tree of itemsets of increasing size) in level-wise algorithms (e.g., the Apriori algorithm).

$$\begin{array}{lcc}\hline
\text{Item} & \text{Support count} & \text{Support}\\\hline
\text{Apple} & 4 & 4/6 \approx 0.67\\
\text{Coke} & 3 & 3/6 = 0.50\\
\text{Orange} & 5 & 5/6 \approx 0.83\\
\hline
\end{array}$$
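The table above can be reproduced with a short sketch, assuming the same six transactions and the minimum support count of 3 chosen earlier (`MIN_SUPPORT_COUNT` is our own name). `collections.Counter` tallies how many transactions contain each item, and anything below the threshold is dropped.

```python
from collections import Counter

# Same six transactions as in the table above.
transactions = [
    {"Apple", "Orange"},
    {"Apple", "Banana"},
    {"Apple", "Coke", "Orange"},
    {"Orange", "Coke"},
    {"Orange", "Coffee"},
    {"Apple", "Coffee", "Orange", "Coke"},
]

MIN_SUPPORT_COUNT = 3  # the threshold chosen in the text

# Count, for each single item, how many transactions contain it.
counts = Counter(item for t in transactions for item in t)

# Keep only the items that reach the minimum support count.
frequent = {item: c for item, c in counts.items() if c >= MIN_SUPPORT_COUNT}

for item in sorted(frequent):
    print(item, frequent[item], round(frequent[item] / len(transactions), 2))
```

Banana (1) and Coffee (2) fall below the threshold, leaving Apple, Coke and Orange as in the table.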

* The support metric is defined for itemsets, not association rules.

In general, all subsets of a frequent itemset are also frequent, due to the *downward closure* property (_if the support of any subset of an itemset is below a given threshold, then the support of the itemset itself is also below that threshold_).
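To see how downward closure prunes the search, below is a simplified Apriori-style sketch (a teaching illustration, not mlxtend's or any other library's implementation; `frequent_itemsets` and `support_count` are our own names). It grows itemsets one level at a time and discards any size-k candidate that has an infrequent (k-1)-subset before counting its support.

```python
from itertools import combinations

transactions = [
    {"Apple", "Orange"},
    {"Apple", "Banana"},
    {"Apple", "Coke", "Orange"},
    {"Orange", "Coke"},
    {"Orange", "Coffee"},
    {"Apple", "Coffee", "Orange", "Coke"},
]
MIN_SUPPORT_COUNT = 3


def support_count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_count):
    """Level-wise search: size-k candidates are built only from frequent (k-1)-itemsets."""
    items = sorted({i for t in transactions for i in t})
    # Level 1: frequent single items.
    level = [frozenset([i]) for i in items
             if support_count(frozenset([i]), transactions) >= min_count]
    result = {s: support_count(s, transactions) for s in level}
    k = 2
    while level:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must already be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in result for s in combinations(c, k - 1))}
        # Count support only for the candidates that survived pruning.
        level = [c for c in candidates if support_count(c, transactions) >= min_count]
        result.update({c: support_count(c, transactions) for c in level})
        k += 1
    return result


result = frequent_itemsets(transactions, MIN_SUPPORT_COUNT)
for itemset, c in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), c)
```

With a minimum support count of 3, it reports {Apple}, {Coke}, {Orange}, {Apple, Orange} and {Coke, Orange}. The triple {Apple, Coke, Orange} is never counted, because its subset {Apple, Coke} appears in only 2 transactions.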

The rules table produced by an association rule mining implementation (for example, mlxtend's `association_rules` function) contains three different support metrics:
	 
* 'antecedent support': the proportion of transactions that contain the antecedent `x`
* 'consequent support': the proportion of transactions that contain the consequent `y`
* 'support': the proportion of transactions that contain both `x` and `y`, i.e., Support(`x` $\cap$ `y`)

NB: 'support' is bounded by 'antecedent support' and 'consequent support': support ≤ min('antecedent support', 'consequent support'), since a transaction containing both `x` and `y` contains each of them.
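The three metrics can be computed by hand for the rule Apple → Orange. In this sketch (our own helper, not a library call), the combined itemset is the set union `x | y`: a transaction supports the rule when it contains every item of `x` and every item of `y`, which is the event written $x \cap y$ in the formulas of this section.

```python
transactions = [
    {"Apple", "Orange"},
    {"Apple", "Banana"},
    {"Apple", "Coke", "Orange"},
    {"Orange", "Coke"},
    {"Orange", "Coffee"},
    {"Apple", "Coffee", "Orange", "Coke"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


x, y = {"Apple"}, {"Orange"}
antecedent_support = support(x, transactions)  # 4 of 6 transactions
consequent_support = support(y, transactions)  # 5 of 6 transactions
rule_support = support(x | y, transactions)    # 3 of 6: both x and y present

# The joint support can never exceed either marginal support.
assert rule_support <= min(antecedent_support, consequent_support)
print(round(antecedent_support, 2), round(consequent_support, 2), round(rule_support, 2))
```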


* The disadvantage of `support` is the rare item problem.
    * Items that occur very infrequently in the data set are pruned, even though they might still produce interesting and potentially valuable rules.


###  Confidence

Confidence is an indication of how often the rule has been found to be true.

<img src = 'https://raw.githubusercontent.com/tec03/Datasets/main/images/p_of_ab.png' width = 400>

$$\begin{aligned}
\text{Confidence}(x \rightarrow y) &= \frac{\text{Support}(x \cap y)}{\text{Support}(x)} \in [0, 1]\\
&= P(y \mid x)\\
&= \frac{P(x \cap y)}{P(x)}
\end{aligned}$$

* It signifies the likelihood of `item y` being purchased when `item x` is purchased.
* The confidence of a rule `x` $\rightarrow$ `y` is the probability of seeing `y` in a transaction given that the transaction also contains `x`.
* Unlike support, confidence is not the probability of `x` and `y` appearing in the same transaction; it only considers the transactions that contain the antecedent (`x`).
* Confidence is the conditional probability $P(y \mid x)$ of the `consequent` given the `antecedent`.

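$\text{Confidence}(\text{Apple} \rightarrow \text{Orange}) = \cfrac{\text{Support}(\text{Apple} \cap \text{Orange})}{\text{Support}(\text{Apple})} = \cfrac{3/6}{4/6} = 75\%$
* 75% of the customers who purchased an Apple also bought an Orange
* Out of the 4 transactions with Apple (IDs 1, 2, 3, 6), 3 transactions (IDs 1, 3, 6) also contain Orange
* Apple implies Orange with 75% confidence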

 						
NB: The metric is not symmetric; it is directed.
 * confidence(`x` $\rightarrow$ `y`) $\neq$ confidence(`y` $\rightarrow$ `x`) in general
 * confidence(`x` $\rightarrow$ `y`) = 1 if `y` appears in every transaction that contains `x`.
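A small sketch of the worked example and of the asymmetry, again on the six transactions (the `count` and `confidence` helpers are our own). Working with counts keeps the arithmetic exact: 3/4 for Apple → Orange, but 3/5 for Orange → Apple. Coke → Orange reaches confidence 1 because every transaction with Coke also contains Orange, even though Orange often appears without Coke.

```python
transactions = [
    {"Apple", "Orange"},
    {"Apple", "Banana"},
    {"Apple", "Coke", "Orange"},
    {"Orange", "Coke"},
    {"Orange", "Coffee"},
    {"Apple", "Coffee", "Orange", "Coke"},
]


def count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def confidence(x, y, transactions):
    """Confidence(x -> y) = support(x ∩ y) / support(x) = count(x ∩ y) / count(x)."""
    return count(set(x) | set(y), transactions) / count(x, transactions)


print(confidence({"Apple"}, {"Orange"}, transactions))  # 0.75 (3 / 4)
print(confidence({"Orange"}, {"Apple"}, transactions))  # 0.6  (3 / 5)
print(confidence({"Coke"}, {"Orange"}, transactions))   # 1.0  (3 / 3)
```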

Confidence is not downward closed and was developed together with support (the so-called support-confidence framework).

While `support` is used to prune the search space and only leave potentially interesting rules, `confidence` is used in a second step to filter rules that exceed a min confidence threshold. 


* A disadvantage of `confidence` is that it is sensitive to the frequency of the consequent (`y`) in the data set.
  * Because of the way `confidence` is calculated, consequents `y` with higher `support` automatically produce higher `confidence` values, even if there is no association between the items.
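To illustrate this sensitivity, consider a hypothetical set of baskets (made up for this example) in which `Bread` appears in every basket. Any rule with `Bread` as the consequent then has confidence 1, although buying Jam or Tea tells us nothing about buying Bread; dividing by the consequent's support, as lift does, brings the value back to 1 (no association). The helper names are our own.

```python
# Hypothetical baskets, made up for illustration: "Bread" is in every basket.
baskets = [
    {"Bread", "Jam"},
    {"Bread"},
    {"Bread", "Tea"},
    {"Bread", "Jam", "Tea"},
]


def count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def confidence(x, y, transactions):
    """count(x and y together) / count(x)."""
    return count(set(x) | set(y), transactions) / count(x, transactions)


conf_jam = confidence({"Jam"}, {"Bread"}, baskets)  # 2 / 2
conf_tea = confidence({"Tea"}, {"Bread"}, baskets)  # 2 / 2

# Lift divides by the consequent's support, which is 4 / 4 = 1.0 here.
support_bread = count({"Bread"}, baskets) / len(baskets)
lift_tea = conf_tea / support_bread
print(conf_jam, conf_tea, lift_tea)  # 1.0 1.0 1.0
```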


###   Lift
  The ratio of the `observed support` to that `expected` if `x` and `y` were independent.

  Used to measure how much more often the `x` and `y` of a rule `x` → `y` occur together than we would expect if they were statistically independent. 

$$\begin{aligned}
\text{Lift}(x \rightarrow y) &= \frac{\text{Confidence}(x \rightarrow y)}{\text{Support}(y)} \in [0, \infty]\\
&= \frac{\text{Support}(x \cap y)}{\text{Support}(x) \times \text{Support}(y)}\\
&= \frac{P(x \cap y)}{P(x)\,P(y)}
\end{aligned}$$

* Compares how often `x` and `y` appear together in the data set with what would be expected if `x` and `y` were statistically independent.
* The rationale in a sales setting is to find out how many times more often items `x` and `y` are sold together than would be expected if they were sold independently.
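Applying the lift formula to the six transactions from the table gives a useful contrast with confidence. In this sketch (our own helper functions), lift is computed from raw counts as $\text{count}(x \cap y) \cdot N / (\text{count}(x) \cdot \text{count}(y))$, which is algebraically the same as the formula above. Apple → Orange has 75% confidence but a lift of only 0.9, because Orange appears in 5 of the 6 transactions anyway; Coke → Orange has a lift of 1.2.

```python
transactions = [
    {"Apple", "Orange"},
    {"Apple", "Banana"},
    {"Apple", "Coke", "Orange"},
    {"Orange", "Coke"},
    {"Orange", "Coffee"},
    {"Apple", "Coffee", "Orange", "Coke"},
]


def count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def lift(x, y, transactions):
    """support(x ∩ y) / (support(x) * support(y)), computed from raw counts."""
    n = len(transactions)
    return count(set(x) | set(y), transactions) * n / (count(x, transactions) * count(y, transactions))


print(lift({"Apple"}, {"Orange"}, transactions))  # 0.9
print(lift({"Coke"}, {"Orange"}, transactions))   # 1.2
print(lift({"Orange"}, {"Apple"}, transactions))  # 0.9 (lift is symmetric)
```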




#### Takeaways
* `Lift(x → y) = 1` implies no association between the items.
    * `x` and `y` appear together about as often as expected.
    * The occurrence of `x` has (almost) no effect on the occurrence of `y`.
    * If `x` and `y` are independent, the lift score is exactly 1.

* `Lift(x → y) > 1`: item `y` is likely to be bought if item `x` is bought.
    * `x` and `y` appear together more often than expected.
    * The occurrence of `x` has a positive effect on the occurrence of `y`.
* `Lift(x → y) < 1`: item `y` is unlikely to be bought if item `x` is bought.
    * `x` and `y` appear together less often than expected.
    * The occurrence of `x` has a negative effect on the occurrence of `y`.
    * `x` is not a reason to buy `y`.

### Lift Case 1: Independent (there is an intersection; it can be large or small)

 If `x` and `y` are independent, then P(x $\cap$ y) = P(x)*P(y)

i.e., \begin{align*}  
    \text{Lift (x} \rightarrow \text{y)} = 1
    \end{align*}
    
    
If buying item `x` is independent of buying item `y`, then knowing that the customer bought item `x` offers no support (or confidence) for guessing that they will also buy item `y`.

### What if $P(x\cap y) = P(x) = P(y)$


\begin{align*}  
    \text{Lift (x} \rightarrow \text{y)} 
    & = \cfrac{P(x \cap y)}{P{(x)}*P{(y)}}\\
    & = \cfrac{P(x)}{P{(x)}*P{(y)}}\\
    & = \cfrac{1}{P{(y)}} = \cfrac{1}{P{(x)}}
\end{align*} 

#### If P(y) is large (say 0.99): 

\begin{align*}  
    \text{Lift (x} \rightarrow \text{y)} 
    & = \cfrac{1}{P{(y)}} = \cfrac{1}{0.99} = 1.0101
\end{align*}

*  Buying item `y` is very common (and so is buying `x`).
*  Even if they both appear in a single transaction, this is more likely due to `commonality` than `association`, so the lift is close to 1 (no association, as if independent).

#### If P(y) is small (say 0.01): 

\begin{align*}  
    \text{Lift (x} \rightarrow \text{y)} 
    & = \cfrac{1}{P{(y)}} = \cfrac{1}{0.01} = 100
\end{align*}

*  Buying item `y` is very rare (and so is buying `x`).
* Yet whenever one of them appears, the other appears too. Such co-occurrence of two rare items is very unlikely to happen by chance, hence there is a strong association and the lift is large.

### Lift Case 2: Mutually exclusive (no intersection at all, or an intersection at a minuscule scale)

 If `x` and `y` are mutually exclusive, then P(x $\cap$ y) = 0

i.e., \begin{align*}  
    \text{Lift (x} \rightarrow \text{y)} = 0
    \end{align*}
    
Buying item `x` immediately indicates that the customer will not buy item `y`.

Now suppose the intersection is not empty but minuscule, say

\begin{align*}  
    P(x \cap y) = 0.01
\end{align*} 


\begin{align*}  
    \text{Lift (x} \rightarrow \text{y)} 
    & = \cfrac{P(x \cap y)}{P{(x)}*P{(y)}}\\
    & = \cfrac{0.01}{P{(x)}*P{(y)}}, \text{ whose size depends on } P(x) \text{ and } P(y)
\end{align*} 



* In the case P(x) and P(y) are both `large` (say 0.9), the lift is small: 0.01/(0.9 × 0.9) ≈ 0.012.
* Individually, each item is `commonly bought`, but they are rarely bought together.


* In the case P(x) and P(y) are both `small` (say 0.1), the lift is 1: 0.01/(0.1 × 0.1) = 1.
* Two uncommon items are bought together, and they `may or may not be associated`.
   * There is not enough evidence to suggest either.

### Conclusion - Lift

*  Large overlap indicates strong association.
*  Small overlap indicates weak association.
*  The result also depends on the probability (frequency of transactions within the total transactions) of buying each individual item.


*  Lift close to 1: buying item `x` has almost no effect on buying item `y`.
    * No association between the items.
*  Large lift: customers who buy item `x` are likely to buy item `y` as well.
*  Small lift (less than 1): buying item `x` decreases the likelihood of (discourages) buying item `y`.

###   Conviction

$$\text{conviction}(x\rightarrow y) = \frac{1 - \text{support}(y)}{1 - \text{confidence}(x\rightarrow y)}, \;\;\; \in [0, \infty]$$



The ratio of the expected frequency that `x` occurs without `y` (that is to say, the frequency that the rule makes an incorrect prediction) if `x` and `y` were independent divided by the observed frequency of incorrect predictions.

* Conviction compares the probability that `x` appears without `y` if they were independent with the actual frequency of the appearance of `x` without `y`.
  * In that respect it is similar to lift; however, in contrast to lift, it is a directed measure.


* A high conviction value means that `y` depends strongly on `x`.
  *  For instance, with a perfect confidence score the denominator becomes 0 (1 - 1), and the conviction score is defined as 'inf'.
 

* Similar to lift, if items are independent, the conviction is 1.
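The conviction formula can be sketched the same way on the six transactions (our own helper, standard library only). Rewriting $(1 - \text{support}(y)) / (1 - \text{confidence}(x \rightarrow y))$ with counts gives $(N - c_y)\,c_x / (N\,(c_x - c_{xy}))$; when $c_x = c_{xy}$ the confidence is 1 and the sketch returns `math.inf`, matching the 'inf' convention above. The two directions of Apple/Orange give different values, showing that conviction is directed.

```python
import math

transactions = [
    {"Apple", "Orange"},
    {"Apple", "Banana"},
    {"Apple", "Coke", "Orange"},
    {"Orange", "Coke"},
    {"Orange", "Coffee"},
    {"Apple", "Coffee", "Orange", "Coke"},
]


def count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def conviction(x, y, transactions):
    """(1 - support(y)) / (1 - confidence(x -> y)), rewritten with counts.

    Assumes `x` appears in at least one transaction.
    """
    n = len(transactions)
    c_x = count(x, transactions)
    c_y = count(y, transactions)
    c_xy = count(set(x) | set(y), transactions)
    if c_x == c_xy:
        # Confidence is 1, so the denominator is 0: conviction is defined as inf.
        return math.inf
    return (n - c_y) * c_x / (n * (c_x - c_xy))


print(round(conviction({"Apple"}, {"Orange"}, transactions), 3))  # 0.667
print(round(conviction({"Orange"}, {"Apple"}, transactions), 3))  # 0.833
print(conviction({"Coke"}, {"Orange"}, transactions))             # inf
```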


###   Leverage


$$\text{leverage}(x\rightarrow y) = \text{support}(x\rightarrow y) - \text{support}(x) \times \text{support}(y), \;\;\; \in [-1, 1]$$



* Computes the difference between the observed frequency of `x` and `y` appearing together and the frequency that would be expected if `x` and `y` were independent. 
* A `leverage` value of 0 indicates independence.
 

The rationale in a sales setting is to find out how many more units (items `x` and `y` together) are sold than would be expected if the items sold independently.
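A sketch of leverage on the six transactions from the table, using `fractions.Fraction` so the results are exact (the helper names are our own). Apple → Orange comes out slightly negative (the pair co-occurs a little less often than independence predicts) and Coke → Orange slightly positive.

```python
from fractions import Fraction

transactions = [
    {"Apple", "Orange"},
    {"Apple", "Banana"},
    {"Apple", "Coke", "Orange"},
    {"Orange", "Coke"},
    {"Orange", "Coffee"},
    {"Apple", "Coffee", "Orange", "Coke"},
]


def support(itemset, transactions):
    """Exact support of `itemset` as a Fraction."""
    itemset = set(itemset)
    return Fraction(sum(1 for t in transactions if itemset <= t), len(transactions))


def leverage(x, y, transactions):
    """support(x ∩ y) - support(x) * support(y)."""
    return support(set(x) | set(y), transactions) - support(x, transactions) * support(y, transactions)


print(leverage({"Apple"}, {"Orange"}, transactions))  # -1/18
print(leverage({"Coke"}, {"Orange"}, transactions))   # 1/12
```

Here $3/6 - (4/6)(5/6) = -1/18$ and $3/6 - (3/6)(5/6) = 1/12$.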

## Data mining functions

### 1. Possible combinations

If we have 4 items, a, b, c, and d, what is the maximum number of unique (non-empty) itemsets?

{ a, b, c, d, ab, ac, ... ,abcd}

$$2^4 - 1 = 15$$
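The count can be checked by brute force. This sketch uses `itertools.combinations` to list every non-empty itemset of size 1 to 4 over the four items a, b, c, d from the example.

```python
from itertools import combinations

items = ["a", "b", "c", "d"]

# All non-empty itemsets: every combination of size 1 .. len(items).
itemsets = ["".join(c) for k in range(1, len(items) + 1) for c in combinations(items, k)]

print(len(itemsets))  # 15
print(itemsets)
```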

### 2. TransactionEncoder

We can transform a list of transactions into a one-hot encoded (Boolean) array with mlxtend's `TransactionEncoder`.

```python
from mlxtend.preprocessing import TransactionEncoder  # mlxtend: machine learning extensions

dataset = [['Apple', 'Beer', 'Rice', 'Chicken'],
           ['Apple', 'Beer', 'Rice'],
           ['Apple', 'Beer'],
           ['Apple', 'Bananas'],
           ['Milk', 'Beer', 'Rice', 'Chicken'],
           ['Milk', 'Beer', 'Rice'],
           ['Milk', 'Beer'],
           ['Apple', 'Bananas']]
```

```python
import pandas as pd
```

```python
df = pd.DataFrame(dataset)  # shorter transactions are padded with None
df
```

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | Apple | Beer | Rice | Chicken |
| 1 | Apple | Beer | Rice | None |
| 2 | Apple | Beer | None | None |
| 3 | Apple | Bananas | None | None |
| 4 | Milk | Beer | Rice | Chicken |
| 5 | Milk | Beer | Rice | None |
| 6 | Milk | Beer | None | None |
| 7 | Apple | Bananas | None | None |


```python
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
te_ary
```

```
array([[ True, False,  True,  True, False,  True],
       [ True, False,  True, False, False,  True],
       [ True, False,  True, False, False, False],
       [ True,  True, False, False, False, False],
       [False, False,  True,  True,  True,  True],
       [False, False,  True, False,  True,  True],
       [False, False,  True, False,  True, False],
       [ True,  True, False, False, False, False]])
```

```python
te.columns_
```

```
['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']
```

```python
ndf = pd.DataFrame(te_ary, columns=te.columns_)
ndf
```

|   | Apple | Bananas | Beer | Chicken | Milk | Rice |
|---|---|---|---|---|---|---|
| 0 | True | False | True | True | False | True |
| 1 | True | False | True | False | False | True |
| 2 | True | False | True | False | False | False |
| 3 | True | True | False | False | False | False |
| 4 | False | False | True | True | True | True |
| 5 | False | False | True | False | True | True |
| 6 | False | False | True | False | True | False |
| 7 | True | True | False | False | False | False |


`True` if transaction `i` (row) contains the item of the corresponding column; `False` otherwise.
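For intuition, the encoding can be reproduced in plain Python. This is a minimal sketch of the idea, not mlxtend's actual implementation: the columns are the distinct items in sorted order (matching `te.columns_` above), and each row holds `True` where the transaction contains that column's item.

```python
dataset = [['Apple', 'Beer', 'Rice', 'Chicken'],
           ['Apple', 'Beer', 'Rice'],
           ['Apple', 'Beer'],
           ['Apple', 'Bananas'],
           ['Milk', 'Beer', 'Rice', 'Chicken'],
           ['Milk', 'Beer', 'Rice'],
           ['Milk', 'Beer'],
           ['Apple', 'Bananas']]

# Columns: every distinct item, sorted alphabetically.
columns = sorted({item for transaction in dataset for item in transaction})

# One Boolean row per transaction: True where the transaction contains the column's item.
encoded = []
for transaction in dataset:
    present = set(transaction)
    encoded.append([item in present for item in columns])

print(columns)     # ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']
print(encoded[0])  # [True, False, True, True, False, True]
```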

Alternatively, we can use pandas `get_dummies()`:



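```python
# One indicator column per (column position, item) pair; None cells are ignored.
# Note: pandas >= 2.0 returns True/False columns; the 1/0 output below is from an older version.
dummies = pd.get_dummies(df)
dummies
```

|   | 0_Apple | 0_Milk | 1_Bananas | 1_Beer | 2_Rice | 3_Chicken |
|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 1 | 1 | 1 |
| 1 | 1 | 0 | 0 | 1 | 1 | 0 |
| 2 | 1 | 0 | 0 | 1 | 0 | 0 |
| 3 | 1 | 0 | 1 | 0 | 0 | 0 |
| 4 | 0 | 1 | 0 | 1 | 1 | 1 |
| 5 | 0 | 1 | 0 | 1 | 1 | 0 |
| 6 | 0 | 1 | 0 | 1 | 0 | 0 |
| 7 | 1 | 0 | 1 | 0 | 0 | 0 |

```python
# Strip the column-position prefix so the columns are plain item names.
# This works here because each item only ever occurs at one position in this dataset;
# an item seen at several positions would get several columns (e.g. 0_Beer and 1_Beer).
dummies = dummies.rename(columns={'0_Apple': 'Apple',
                                  '0_Milk': 'Milk',
                                  '1_Bananas': 'Bananas',
                                  '1_Beer': 'Beer',
                                  '2_Rice': 'Rice',
                                  '3_Chicken': 'Chicken'})
dummies
```

|   | Apple | Milk | Bananas | Beer | Rice | Chicken |
|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 1 | 1 | 1 |
| 1 | 1 | 0 | 0 | 1 | 1 | 0 |
| 2 | 1 | 0 | 0 | 1 | 0 | 0 |
| 3 | 1 | 0 | 1 | 0 | 0 | 0 |
| 4 | 0 | 1 | 0 | 1 | 1 | 1 |
| 5 | 0 | 1 | 0 | 1 | 1 | 0 |
| 6 | 0 | 1 | 0 | 1 | 0 | 0 |
| 7 | 1 | 0 | 1 | 0 | 0 | 0 |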




### Nested `for loop`



```python
type(df.values[1][3])
```

```
NoneType
```

```python
dfValueListA = []
type(dfValueListA)
```

```
list
```

```python
# Flatten the DataFrame row by row into a single list (None placeholders included)
for i in range(df.shape[0]):
    for j in range(df.shape[1]):
        dfValueListA.append(df.values[i][j])
```

```python
type(dfValueListA[7])
```

```
NoneType
```
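The nested loop above flattens the rows one cell at a time. A more compact sketch of the same traversal uses `itertools.chain.from_iterable`; here `rows` is a hand-written copy of the padded rows of `df.values` (with `None` in the empty slots), so the snippet runs on its own.

```python
from itertools import chain

# Hand-written copy of df.values: None fills the slots of the shorter transactions.
rows = [
    ['Apple', 'Beer', 'Rice', 'Chicken'],
    ['Apple', 'Beer', 'Rice', None],
    ['Apple', 'Beer', None, None],
    ['Apple', 'Bananas', None, None],
    ['Milk', 'Beer', 'Rice', 'Chicken'],
    ['Milk', 'Beer', 'Rice', None],
    ['Milk', 'Beer', None, None],
    ['Apple', 'Bananas', None, None],
]

# chain.from_iterable walks the rows left to right, exactly like the nested loop.
flat = list(chain.from_iterable(rows))

print(len(flat))      # 32
print(type(flat[7]))  # <class 'NoneType'>
```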

To get a `list` of strings instead, first convert every cell to `str`:

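```python
# Convert every cell to str (None becomes the string 'None').
# In pandas >= 2.1, DataFrame.applymap is deprecated in favour of DataFrame.map.
dfB = df.applymap(str)
type(dfB.values[1][3])
```

```
str
```

```python
dfValueListB = []
type(dfValueListB)
```

```
list
```

```python
for i in range(dfB.shape[0]):
    for j in range(dfB.shape[1]):
        dfValueListB.append(dfB.values[i][j])
# dfValueListB is now a list of strings
```

```python
type(dfValueListB[7])
```

```
str
```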

Here, we have a list of strings (`str`).

Alternatively, build one list of strings per transaction (row):

```python
dfListC = []
for i in range(df.shape[0]):
    # One list per row; str() turns the None placeholders into 'None'
    dfListC.append([str(df.values[i, j]) for j in range(df.shape[1])])
dfListC
```

```
[['Apple', 'Beer', 'Rice', 'Chicken'],
 ['Apple', 'Beer', 'Rice', 'None'],
 ['Apple', 'Beer', 'None', 'None'],
 ['Apple', 'Bananas', 'None', 'None'],
 ['Milk', 'Beer', 'Rice', 'Chicken'],
 ['Milk', 'Beer', 'Rice', 'None'],
 ['Milk', 'Beer', 'None', 'None'],
 ['Apple', 'Bananas', 'None', 'None']]
```
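The `'None'` strings are padding, not items. Assuming no real item is literally named `'None'`, a list comprehension can strip them and recover the original transactions, which is the list-of-lists shape passed to `TransactionEncoder` earlier. This is a sketch; `dfListC` is re-typed here so the snippet is self-contained.

```python
dfListC = [
    ['Apple', 'Beer', 'Rice', 'Chicken'],
    ['Apple', 'Beer', 'Rice', 'None'],
    ['Apple', 'Beer', 'None', 'None'],
    ['Apple', 'Bananas', 'None', 'None'],
    ['Milk', 'Beer', 'Rice', 'Chicken'],
    ['Milk', 'Beer', 'Rice', 'None'],
    ['Milk', 'Beer', 'None', 'None'],
    ['Apple', 'Bananas', 'None', 'None'],
]

# Drop the 'None' padding strings produced by str(None), keeping the item order.
transactions = [[item for item in row if item != 'None'] for row in dfListC]

print(transactions[2])  # ['Apple', 'Beer']
```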

