## **APRIORI**

[apriori documentation (mlxtend)](http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/)

---
#### **Support**
The support metric is defined for itemsets, not association rules. The table produced by the association rule mining algorithm contains three different support metrics: 'antecedent support', 'consequent support', and 'support'. Here, 'antecedent support' is the proportion of transactions that contain the antecedent A, and 'consequent support' is the proportion of transactions that contain the consequent C. The 'support' metric is the support of the combined itemset A ∪ C. Note that 'support' is bounded above by the other two: it can never exceed min('antecedent support', 'consequent support').

**Typically, support is used to measure the abundance or frequency (often interpreted as significance or importance) of an itemset in a database. We refer to an itemset as a "frequent itemset" if its support meets a specified minimum-support threshold. Note that, due to the downward closure property, all subsets of a frequent itemset are also frequent.**
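
As a minimal sketch of these definitions, the snippet below computes the three support values for one rule on a small, made-up list of transactions. The item names and the helper `support` are illustrative assumptions; they are not part of mlxtend or of `random_data.csv`.

```python
from fractions import Fraction

# Hypothetical toy transactions, not taken from random_data.csv.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread"},
    {"milk", "eggs"},
    {"eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

sup_a = support({"bread"}, transactions)           # antecedent support
sup_c = support({"milk"}, transactions)            # consequent support
sup_ac = support({"bread", "milk"}, transactions)  # support of A ∪ C

# The combined support can never exceed either individual support.
assert sup_ac <= min(sup_a, sup_c)
print(sup_a, sup_c, sup_ac)  # 3/5 3/5 2/5
```

Every transaction that contains both items also contains each item on its own, so the combined itemset can only be as frequent as its rarest part. This is the downward closure property on a small scale.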

---
#### **Confidence**
The confidence of a rule A->C is the probability of seeing the consequent in a transaction given that it also contains the antecedent. Note that the metric is not symmetric (it is directed); for instance, the confidence for A->C is generally different from the confidence for C->A. The confidence is 1 (maximal) for a rule A->C if the consequent appears in every transaction that contains the antecedent.
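
The asymmetry is easy to see on a toy example. The sketch below assumes the usual estimate confidence(A->C) = support(A ∪ C) / support(A); the transactions and the helpers `support` and `confidence` are hypothetical and only serve as illustration.

```python
from fractions import Fraction

# Hypothetical toy transactions: milk is bought more often than bread.
transactions = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"milk"},
    {"milk"},
    {"eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from the transaction list."""
    a, c = set(antecedent), set(consequent)
    return support(a | c, transactions) / support(a, transactions)

print(confidence({"bread"}, {"milk"}, transactions))  # 1
print(confidence({"milk"}, {"bread"}, transactions))  # 1/2
```

Every basket with bread also has milk, so bread->milk reaches the maximum of 1, while only half of the milk baskets contain bread.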

---
#### **Lift**
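The lift metric is commonly used to measure how much more often the antecedent and consequent of a rule A->C occur together than we would expect if they were statistically independent. If A and C are independent, the lift score will be exactly 1; values above 1 indicate that A and C occur together more often than expected, and values below 1 indicate that they occur together less often.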
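
A minimal sketch of the lift computation, assuming the usual definition lift(A->C) = support(A ∪ C) / (support(A) × support(C)). Both transaction lists and the helper names are made up for illustration.

```python
from fractions import Fraction

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def lift(antecedent, consequent, transactions):
    """Observed joint support divided by the joint support expected under independence."""
    a, c = set(antecedent), set(consequent)
    expected = support(a, transactions) * support(c, transactions)
    return support(a | c, transactions) / expected

# Hypothetical data: bread and milk are independent in the first list
# and always bought together in the second.
independent = [{"bread", "milk"}, {"bread"}, {"milk"}, {"eggs"}]
associated = [{"bread", "milk"}, {"bread", "milk"}, {"eggs"}, {"eggs"}]

print(lift({"bread"}, {"milk"}, independent))  # 1
print(lift({"bread"}, {"milk"}, associated))   # 2
```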

---
#### **Leverage**
Leverage computes the difference between the observed frequency of A and C appearing together and the frequency that would be expected if A and C were independent. A leverage value of 0 indicates independence.
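
The same idea can be sketched for leverage, assuming the definition leverage(A->C) = support(A ∪ C) - support(A) × support(C). The three transaction lists below are hypothetical and chosen so that the result is zero, positive, and negative in turn.

```python
from fractions import Fraction

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def leverage(antecedent, consequent, transactions):
    """Observed joint support minus the joint support expected under independence."""
    a, c = set(antecedent), set(consequent)
    expected = support(a, transactions) * support(c, transactions)
    return support(a | c, transactions) - expected

# Hypothetical data for the three possible signs.
independent = [{"bread", "milk"}, {"bread"}, {"milk"}, {"eggs"}]
together = [{"bread", "milk"}, {"bread", "milk"}, {"eggs"}, {"eggs"}]
never_together = [{"bread"}, {"bread"}, {"milk"}, {"milk"}]

print(leverage({"bread"}, {"milk"}, independent))     # 0
print(leverage({"bread"}, {"milk"}, together))        # 1/4
print(leverage({"bread"}, {"milk"}, never_together))  # -1/4
```

Unlike lift, which is a ratio, leverage is a difference of proportions, so it stays on the same scale as support.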

---
#### **Conviction**
Conviction is defined as (1 - support(C)) / (1 - confidence(A->C)). A high conviction value means that the consequent is highly dependent on the antecedent. For instance, in the case of a perfect confidence score, the denominator becomes 0 (due to 1 - 1), for which the conviction score is defined as 'inf'. Similar to lift, if items are independent, the conviction is 1.
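
The zero-denominator case deserves explicit handling in code. The sketch below assumes the definition conviction(A->C) = (1 - support(C)) / (1 - confidence(A->C)) and returns `math.inf` when the confidence is 1. The data and helper names are hypothetical.

```python
import math
from fractions import Fraction

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def conviction(antecedent, consequent, transactions):
    """Conviction of antecedent -> consequent; infinite for perfect confidence."""
    a, c = set(antecedent), set(consequent)
    conf = support(a | c, transactions) / support(a, transactions)
    if conf == 1:
        return math.inf  # denominator 1 - conf would be zero
    return (1 - support(c, transactions)) / (1 - conf)

# Hypothetical data: every bread basket also contains milk.
transactions = [{"bread", "milk"}, {"bread", "milk"}, {"milk"}, {"milk"}, {"eggs"}]
# Hypothetical data in which bread and milk are independent.
independent = [{"bread", "milk"}, {"bread"}, {"milk"}, {"eggs"}]

print(conviction({"bread"}, {"milk"}, transactions))  # inf
print(conviction({"milk"}, {"bread"}, transactions))  # 6/5
print(conviction({"bread"}, {"milk"}, independent))   # 1
```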

---

In [8]:
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder
import pandas as pd

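# Encoder that turns lists of items into a one-hot (boolean) matrix.
te = TransactionEncoder()

# Group the rows by transaction and collect each transaction's items into a list.
df_full = pd.read_csv("random_data.csv")
df_agg = df_full.groupby("transaction_id")["item"].agg(list).reset_index(name="items")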

In [10]:
def get_binary(item_no):
    """Return a one-hot DataFrame of all baskets that contain `item_no`."""
    # Keep only the baskets that include the target item.
    included_carts = [basket for basket in df_agg["items"] if item_no in basket]

    # One row per kept basket, one boolean column per distinct item.
    binary_matrix = te.fit(included_carts).transform(included_carts)
    binary_matrix = pd.DataFrame(binary_matrix, columns=te.columns_)
    return binary_matrix
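
To make the two steps in `get_binary` concrete (filter the baskets, then one-hot encode them), here is a dependency-free sketch of the same idea. It is not how `TransactionEncoder` is implemented; the function `one_hot_baskets`, the sorted column order, and all item codes except `T8WS1V` are assumptions made for illustration.

```python
def one_hot_baskets(baskets, item):
    """Keep the baskets containing `item` and one-hot encode them.

    Returns (columns, rows): the sorted distinct items and, for each kept
    basket, a list of booleans saying which of those items it contains.
    """
    kept = [set(b) for b in baskets if item in b]
    columns = sorted(set().union(*kept)) if kept else []
    rows = [[col in basket for col in columns] for basket in kept]
    return columns, rows

# Hypothetical baskets; only T8WS1V is an item code from the notebook.
baskets = [
    ["T8WS1V", "AAA111"],
    ["BBB222"],
    ["T8WS1V", "BBB222", "CCC333"],
]

columns, rows = one_hot_baskets(baskets, "T8WS1V")
print(columns)  # ['AAA111', 'BBB222', 'CCC333', 'T8WS1V']
for row in rows:
    print(row)
```

The column for the target item is `True` in every row, which matches the first row of the transposed matrix shown further down.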

In [12]:
bi_mtx = get_binary("T8WS1V")

In [13]:
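def binary_sum(matrix):
    """Transpose to one row per item and rank items by how many baskets contain them."""
    matrix = matrix.T  # rows: items, columns: baskets
    matrix["sum"] = matrix.sum(axis=1)  # number of baskets containing each item
    matrix = matrix.sort_values("sum", ascending=False)
    matrix["position"] = list(range(len(matrix)))  # rank, 0 = most frequent item
    return matrix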

In [14]:
binary_sum(bi_mtx)

| item | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | sum | position |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T8WS1V | True | True | True | True | True | True | True | True | True | True | ... | True | True | True | True | True | True | True | True | 98 | 0 |
| T68NQ6 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | True | False | True | 13 | 1 |
| WMZTKY | False | False | False | False | True | False | False | False | False | False | ... | False | False | False | True | False | False | False | False | 11 | 2 |
| CI7DTE | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | 11 | 3 |
| AYRJQX | False | False | False | False | False | True | False | False | False | False | ... | False | False | True | False | True | False | False | False | 10 | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 75J8MU | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | 1 | 45702 |
| TV3NFS | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | True | False | False | False | 1 | 45703 |
| TV40DU | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | 1 | 45704 |
| M36LHC | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | True | False | False | 1 | 45705 |


In [17]:
def get_apriori(item_no, sup):
    """Frequent itemsets among the baskets containing `item_no`, sorted by support (descending)."""
    data = apriori(get_binary(item_no), use_colnames=True, min_support=sup)
    return data.sort_values("support", ascending=False)

In [19]:
get_apriori("T8WS1V", 0.08)

|  | support | itemsets |
|---|---|---|
| 131 | 1.000000 | (T8WS1V) |
| 282 | 0.132653 | (T8WS1V, T68NQ6) |
| 130 | 0.132653 | (T68NQ6) |
| 294 | 0.112245 | (T8WS1V, WMZTKY) |
| 143 | 0.112245 | (WMZTKY) |
| ... | ... | ... |
| 123 | 0.081633 | (R7DBT0) |
| 124 | 0.081633 | (R9P0AZ) |
| 125 | 0.081633 | (RB6O6K) |
| 126 | 0.081633 | (RI0K4M) |
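
In the output above, each single item has the same support as its pair with T8WS1V (for example 0.132653 for both `(T68NQ6)` and `(T8WS1V, T68NQ6)`). This follows from `get_binary` keeping only baskets that contain the target item. The sketch below illustrates the effect on hypothetical baskets, where `"T"` stands in for the target item and `support` is an illustrative helper, not an mlxtend function.

```python
from fractions import Fraction

# Hypothetical baskets; "T" plays the role of the target item.
baskets = [
    {"T", "x"},
    {"T", "x", "y"},
    {"T"},
    {"T", "y"},
    {"x"},
    {"y"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for b in baskets if itemset <= b)
    return Fraction(hits, len(baskets))

target = "T"
filtered = [b for b in baskets if target in b]

# Inside the filtered baskets, adding the target to an itemset changes nothing.
assert support({"x"}, filtered) == support({target, "x"}, filtered)

# Support inside the filtered baskets equals confidence(target -> x) on all baskets.
conf_full = support({target, "x"}, baskets) / support({target}, baskets)
assert support({"x"}, filtered) == conf_full
print(support({"x"}, filtered))  # 1/2
```

Read this way, the support column of the filtered result can be interpreted as the confidence of the rule T8WS1V -> item over the complete set of transactions.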
