## Evaluating the Quality of Association Rules

## Overview

A strong association rule (one with high support and confidence) is not necessarily interesting for a specific application. Several measures have been developed to help evaluate association rules beyond support and confidence. `mlxtend` implements two of them: the Kulczynski measure and the imbalance ratio.

#### Kulczynski Measure:

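The Kulczynski measure $K_{A,B}$ can be interpreted as the average of the confidence of $A \Rightarrow B$ and the confidence of $B \Rightarrow A$.

The Kulczynski measure $K_{A,B} \in [0, 1]$ of the itemsets $A \subseteq I$ and
$B \subseteq I$ such that $A \cap B = \varnothing$, where $I$ is the set of all items, is given by

$$K_{A,B} = \frac{V_{A \Rightarrow B} + V_{B \Rightarrow A}}{2}$$

where $V_{A \Rightarrow B}$ denotes the confidence of the rule $A \Rightarrow B$. Written in terms of support, this is

$$K_{A,B} = \frac{1}{2} \left[\frac{sup(A \cup B)}{sup(A)} + \frac{sup(A \cup B)}{sup(B)} \right]$$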

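- If $K_{A,B} = 0$, then $A$ and $B$ never occur together: $A \subseteq T$ implies that $B \nsubseteq T$ for any transaction $T$
- If $K_{A,B} = 1$, then $A$ and $B$ always occur together: $A \subseteq T$ implies that $B \subseteq T$, and vice versa, for any transaction $T$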
- Note that the Kulczynski measure is symmetric: $K_{A,B} = K_{B,A}$
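
The formula above can be sketched directly in plain Python. This is a minimal illustration that works from three support values; the function name `kulczynski_from_supports` and its parameters are hypothetical and are not part of the `mlxtend` API.

```python
def kulczynski_from_supports(sup_ab, sup_a, sup_b):
    """Average of conf(A => B) and conf(B => A), computed from supports.

    Hypothetical helper for illustration only (not an mlxtend function).
    sup_ab is the support of A union B; sup_a and sup_b are the supports
    of the individual itemsets.
    """
    if sup_a <= 0 or sup_b <= 0:
        raise ValueError("both itemsets need non-zero support")
    conf_a_to_b = sup_ab / sup_a  # confidence of A => B
    conf_b_to_a = sup_ab / sup_b  # confidence of B => A
    return 0.5 * (conf_a_to_b + conf_b_to_a)


# A appears in half of the transactions, B in all of them,
# and they co-occur in half of the transactions.
print(kulczynski_from_supports(0.5, 0.5, 1.0))  # 0.75
```

Swapping the last two arguments leaves the result unchanged, which reflects the symmetry noted above.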

#### Imbalance Ratio:

The imbalance ratio $I_{A,B}$ can be interpreted as the absolute difference between the support counts of $A$ and $B$, divided by the number of transactions that contain $A$, $B$, or both.

The imbalance ratio $I_{A,B} \in [0, 1]$ of the itemsets $A \subseteq I$ and $B \subseteq I$ is given by

$$I_{A,B} = \frac{|N_A - N_B|}{N_A + N_B - N_{A \cup B}}$$

where $N_X$ denotes the support count of the itemset $X$, i.e. the number of transactions that contain $X$.

- If $I_{A,B} = 0$, then $A$ and $B$ have the same support
- If $I_{A,B} = 1$, then either $A$ or $B$ has zero support
- Note that the imbalance ratio is symmetric: $I_{A,B} = I_{B,A}$
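
As a rough sketch, the imbalance ratio can be computed from three support counts in plain Python. The function name `imbalance_from_counts` and its parameters are hypothetical, chosen for illustration; they are not part of the `mlxtend` API.

```python
def imbalance_from_counts(n_a, n_b, n_ab):
    """Imbalance ratio |N_A - N_B| / (N_A + N_B - N_{A u B}).

    Hypothetical helper for illustration only (not an mlxtend function).
    n_a and n_b are the support counts of A and B; n_ab is the number
    of transactions that contain both A and B.
    """
    # Number of transactions that contain A, B, or both
    denominator = n_a + n_b - n_ab
    if denominator <= 0:
        raise ValueError("at least one itemset must occur in the data")
    return abs(n_a - n_b) / denominator


# A occurs in 3 transactions, B in 4, and both together in 3.
print(imbalance_from_counts(3, 4, 3))  # 0.25
```

Using relative supports instead of counts gives the same ratio, since the number of transactions cancels out of the numerator and denominator.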

## References

[1] Chapter 6 of J. Han, M. Kamber, J. Pei, “Data Mining: Concepts and Techniques”, 3rd edition, Elsevier/Morgan Kaufmann, 2012

## Example 1 -- Evaluate the Kulczynski Measure of an Association Rule


```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.frequent_patterns import metrics

dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
te_ary = te.fit_transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Mine itemsets with support >= 0.6, then rules with confidence >= 0.7
freq_items = apriori(df, min_support=0.6, use_colnames=True)
rules = association_rules(freq_items, metric="confidence", min_threshold=0.7)
rules
```

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (Eggs) | (Kidney Beans) | 0.8 | 1.0 | 0.8 | 1.0 | 1.0 | 0.0 | inf |
| 1 | (Kidney Beans) | (Eggs) | 1.0 | 0.8 | 0.8 | 0.8 | 1.0 | 0.0 | 1.0 |
| 2 | (Eggs) | (Onion) | 0.8 | 0.6 | 0.6 | 0.75 | 1.25 | 0.12 | 1.6 |
| 3 | (Onion) | (Eggs) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf |
| 4 | (Milk) | (Kidney Beans) | 0.6 | 1.0 | 0.6 | 1.0 | 1.0 | 0.0 | inf |
| 5 | (Onion) | (Kidney Beans) | 0.6 | 1.0 | 0.6 | 1.0 | 1.0 | 0.0 | inf |
| 6 | (Yogurt) | (Kidney Beans) | 0.6 | 1.0 | 0.6 | 1.0 | 1.0 | 0.0 | inf |
| 7 | (Eggs, Onion) | (Kidney Beans) | 0.6 | 1.0 | 0.6 | 1.0 | 1.0 | 0.0 | inf |
| 8 | (Eggs, Kidney Beans) | (Onion) | 0.8 | 0.6 | 0.6 | 0.75 | 1.25 | 0.12 | 1.6 |
| 9 | (Onion, Kidney Beans) | (Eggs) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf |


```python
a = frozenset(['Onion'])
b = frozenset(['Kidney Beans', 'Eggs'])
metrics.kulczynski_measure(rules, a, b)
```

```
0.875
```
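
As a check, this value can be reproduced by hand from the supports listed in the rules table, taking $A = \{\text{Onion}\}$ and $B = \{\text{Kidney Beans}, \text{Eggs}\}$ as in the call above. The table gives $sup(A) = 0.6$, $sup(B) = 0.8$, and $sup(A \cup B) = 0.6$, so

$$K_{A,B} = \frac{1}{2} \left[\frac{0.6}{0.6} + \frac{0.6}{0.8} \right] = \frac{1}{2} (1 + 0.75) = 0.875$$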

## Example 2 -- Evaluate the Imbalance Ratio of an Association Rule

```python
a = frozenset(['Onion'])
b = frozenset(['Kidney Beans', 'Eggs'])
metrics.imbalance_ratio(freq_items, a, b)
```

```
0.2500000000000001
```
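
The result can be verified by counting in the five transactions of the dataset: Onion occurs in 3 of them, the pair Kidney Beans and Eggs occurs in 4, and all three items occur together in 3. With $N_A = 3$, $N_B = 4$, and $N_{A \cup B} = 3$:

$$I_{A,B} = \frac{|3 - 4|}{3 + 4 - 3} = \frac{1}{4} = 0.25$$

The trailing digits in the printed value are most likely floating-point rounding from working with relative supports (0.6 and 0.8) rather than integer counts.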