### **Association Analysis Metrics**

* Let's implement the support, confidence, lift, leverage, and conviction metrics from scratch.

In [1]:
def calculate_support(transactions, itemset):
    '''Calculate the support for a given itemset in a list of transactions.
    
    Support is defined as the proportion of transactions in the list that contain
    the given itemset. This function counts how many transactions contain the 
    itemset and divides that by the total number of transactions.
    
    Parameters:
        transactions (list of list): A list of transactions, each transaction is a list of items.
        itemset (list): The items that make up the itemset to calculate support for.
    
    Returns:
        float: The support value for the itemset, a number between 0 and 1.'''
    num_transactions = len(transactions)
    num_transactions_containing_itemset = sum(1 for transaction in transactions if all(item in transaction for item in itemset))
    
    return num_transactions_containing_itemset / num_transactions

In [2]:
def calculate_confidence(transactions, antecedent, consequent):
    '''Calculate the confidence for a rule defined by an antecedent leading to a consequent.
    
    Confidence is defined as the proportion of transactions containing the antecedent
    that also contain the consequent. This function uses the support of the antecedent
    and the support of the combined itemset (antecedent and consequent) to calculate confidence.
    
    Parameters:
        transactions (list of list): A list of transactions, each transaction is a list of items.
        antecedent (list): The antecedent itemset in the association rule.
        consequent (list): The consequent itemset in the association rule.
    
    Returns:
    float: The confidence value for the rule, a number between 0 and 1.'''
    support_X_and_Y = calculate_support(transactions, antecedent + consequent)
    support_X = calculate_support(transactions, antecedent)
    return support_X_and_Y / support_X

In [3]:
def calculate_lift(transactions, antecedent, consequent):
    '''Calculate the lift for a rule defined by an antecedent leading to a consequent.
    
    Lift measures how much more often the antecedent and consequent of a rule occur together
    than we would expect if they were statistically independent. This function calculates lift
    by dividing the confidence of the rule by the support of the consequent.
    
    Parameters:
    transactions (list of list): A list of transactions, each transaction is a list of items.
    antecedent (list): The antecedent itemset in the association rule.
    consequent (list): The consequent itemset in the association rule.
    
    Returns:
    float: The lift value for the rule. A value greater than 1 indicates a positive association.'''
    confidence = calculate_confidence(transactions, antecedent, consequent)
    support_Y = calculate_support(transactions, consequent)
    return confidence / support_Y

In [4]:
def calculate_leverage(transactions, antecedent, consequent):
    """
    Calculate the leverage for an association rule.
    
    Leverage provides a measure of the difference in probability between 
    the observed frequency of A and B appearing together and the frequency 
    that would be expected if A and B were independent.
    
    Parameters:
        transactions (list of list): A list of transactions, each transaction is a list of items.
        antecedent (list): The antecedent itemset in the association rule.
        consequent (list): The consequent itemset in the association rule.
    
    Returns:
    float: The leverage value for the rule.
    """
    support_X_and_Y = calculate_support(transactions, antecedent + consequent)
    support_X = calculate_support(transactions, antecedent)
    support_Y = calculate_support(transactions, consequent)
    return support_X_and_Y - (support_X * support_Y)


In [5]:
def calculate_conviction(transactions, antecedent, consequent):
    """
    Calculate the conviction of an association rule.
    
    Conviction compares how often X would appear without Y if X and Y were
    independent with how often X actually appears without Y.
    A value above 1 means the rule fails less often than it would under independence;
    the value is infinite when the rule always holds (confidence of 1).
    
    Parameters:
        transactions (list of list): A list of transactions, each transaction is a list of items.
        antecedent (list): The antecedent itemset in the association rule.
        consequent (list): The consequent itemset in the association rule.
    
    Returns:
    float: The conviction value for the rule, which is high if consequent is highly dependent on antecedent.
    """
    support_Y = calculate_support(transactions, consequent)
    confidence_X_to_Y = calculate_confidence(transactions, antecedent, consequent)
    
    # To handle the case when confidence is 1, which would cause division by zero,
    # we return infinity.
    if confidence_X_to_Y == 1:
        return float('inf')
    
    return (1 - support_Y) / (1 - confidence_X_to_Y)
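
* For reference, the five functions above compute the following quantities. The notation is an assumption added here for readability: $T$ is the list of transactions, $t$ a single transaction, and $X$, $Y$ the antecedent and consequent itemsets (matching the `support_X` and `support_Y` variable names in the code).

$$
\begin{aligned}
\text{support}(X) &= \frac{|\{\, t \in T : X \subseteq t \,\}|}{|T|} \\
\text{confidence}(X \to Y) &= \frac{\text{support}(X \cup Y)}{\text{support}(X)} \\
\text{lift}(X \to Y) &= \frac{\text{confidence}(X \to Y)}{\text{support}(Y)} \\
\text{leverage}(X \to Y) &= \text{support}(X \cup Y) - \text{support}(X)\,\text{support}(Y) \\
\text{conviction}(X \to Y) &= \frac{1 - \text{support}(Y)}{1 - \text{confidence}(X \to Y)}
\end{aligned}
$$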

#### **Sample Dataset**

In [6]:
# Sample transaction database
transactions = [
    ['milk', 'bread', 'orange juice'],
    ['milk', 'bread'],
    ['milk', 'cookies'],
    ['bread', 'butter'],
    ['milk', 'bread', 'butter'],
    ['bread', 'cookies'],
]

* **Support**

In [7]:
support_milk = calculate_support(transactions, ['milk'])
support_bread = calculate_support(transactions, ['bread'])
print(f"Support for 'milk': {support_milk}")

Support for 'milk': 0.6666666666666666


In [8]:
print(f"Support for 'bread': {support_bread}")

Support for 'bread': 0.8333333333333334


* **Confidence**

In [9]:
confidence_milk_to_bread = calculate_confidence(transactions, ['milk'], ['bread'])
print(f"Confidence for 'milk' -> 'bread': {confidence_milk_to_bread}")

Confidence for 'milk' -> 'bread': 0.75


* **Lift**

In [10]:
lift_milk_to_bread = calculate_lift(transactions, ['milk'], ['bread'])
print(f"Lift for 'milk' -> 'bread': {lift_milk_to_bread}")

Lift for 'milk' -> 'bread': 0.8999999999999999


* **Leverage**

In [11]:
# Calculate leverage for 'milk' -> 'bread'
leverage_milk_bread = calculate_leverage(transactions, ['milk'], ['bread'])
print("Leverage of 'milk' -> 'bread':", leverage_milk_bread)

Leverage of 'milk' -> 'bread': -0.05555555555555558


* **Conviction**


In [12]:
conviction_milk_bread = calculate_conviction(transactions, ['milk'], ['bread'])
print("Conviction of 'milk' -> 'bread':", conviction_milk_bread)

Conviction of 'milk' -> 'bread': 0.6666666666666665


* In the plain Python version above, we never search for frequent itemsets, because we compute the metrics directly for itemsets and rules that we picked by hand. In a typical association rule mining workflow, finding the frequent itemsets comes first: an itemset counts as "frequent" when its support meets a chosen minimum threshold, and confidence, lift, and the other rule metrics are then calculated only for rules built from those itemsets.
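
* As a rough illustration of that thresholding step, the sketch below enumerates every candidate itemset in the sample data and keeps those whose support meets a minimum. The helper names `support` and `find_frequent_itemsets` and the threshold of 0.3 are assumptions made for this example. It is a brute-force search, not the Apriori algorithm, which prunes candidates instead of testing them all.

```python
from itertools import combinations

# Same sample transaction database as above
transactions = [
    ['milk', 'bread', 'orange juice'],
    ['milk', 'bread'],
    ['milk', 'cookies'],
    ['bread', 'butter'],
    ['milk', 'bread', 'butter'],
    ['bread', 'cookies'],
]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if set(itemset) <= set(t))
    return hits / len(transactions)


def find_frequent_itemsets(transactions, min_support):
    """Brute-force search (hypothetical helper): test every candidate itemset."""
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            s = support(transactions, candidate)
            if s >= min_support:
                frequent[frozenset(candidate)] = s
                found_any = True
        if not found_any:
            # If no itemset of this size is frequent, no larger one can be,
            # because adding items can never increase support.
            break
    return frequent


frequent = find_frequent_itemsets(transactions, min_support=0.3)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 3))
```

With this threshold, 'orange juice' drops out because it appears in only one of the six transactions, so no rule involving it would be generated.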

* Let's use the mlxtend library, which you have used in your homework, to calculate the same metrics.

In [13]:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

# Sample transaction database
transactions = [
    ['milk', 'bread', 'orange juice'],
    ['milk', 'bread'],
    ['milk', 'cookies'],
    ['bread', 'butter'],
    ['milk', 'bread', 'butter'],
    ['bread', 'cookies'],
]

# Initialize TransactionEncoder
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Use apriori to find frequent itemsets with min_support
frequent_itemsets = apriori(df, min_support=0.01, use_colnames=True)

# Use association_rules to calculate all metrics including leverage and conviction
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.01)

# Keep only the columns of interest, including leverage and conviction
rules = rules[['antecedents', 'consequents', 'support', 'confidence', 'lift', 'leverage', 'conviction']]

print(rules)

              antecedents            consequents   support  confidence  lift  \
0                (butter)                (bread)  0.333333    1.000000  1.20   
1                 (bread)               (butter)  0.333333    0.400000  1.20   
2               (cookies)                (bread)  0.166667    0.500000  0.60   
3                 (bread)              (cookies)  0.166667    0.200000  0.60   
4                  (milk)                (bread)  0.500000    0.750000  0.90   
5                 (bread)                 (milk)  0.500000    0.600000  0.90   
6          (orange juice)                (bread)  0.166667    1.000000  1.20   
7                 (bread)         (orange juice)  0.166667    0.200000  1.20   
8                  (milk)               (butter)  0.166667    0.250000  0.75   
9                (butter)                 (milk)  0.166667    0.500000  0.75   
10                 (milk)              (cookies)  0.166667    0.250000  0.75   
11              (cookies)               

In [14]:
support_milk_mlextend = frequent_itemsets[frequent_itemsets['itemsets'] == {'milk'}]['support'].values[0]

In [15]:
# Select the rule "milk" -> "bread"
selected_rule = rules[(rules['antecedents'] == frozenset({'milk'})) & (rules['consequents'] == frozenset({'bread'}))]

# Display the confidence and lift for the rule "milk" -> "bread"
confidence_milk_to_bread_mlextend = selected_rule['confidence'].values[0]
lift_milk_to_bread_mlextend = selected_rule['lift'].values[0]

In [16]:
# The 'rules' DataFrame now contains the conviction and leverage metrics for each rule
# You can filter the rules for the specific "milk" -> "bread" rule
selected_rule = rules[(rules['antecedents'] == frozenset({'milk'})) & (rules['consequents'] == frozenset({'bread'}))]

# Display the conviction and leverage for the rule "milk" -> "bread"
conviction_milk_to_bread_mlextend = selected_rule['conviction'].values[0]
leverage_milk_to_bread_mlextend = selected_rule['leverage'].values[0]

In [17]:
support_milk_mlextend

0.6666666666666666

In [18]:
confidence_milk_to_bread_mlextend

0.75

In [19]:
lift_milk_to_bread_mlextend

0.8999999999999999

In [20]:
conviction_milk_to_bread_mlextend

0.6666666666666665

In [21]:
leverage_milk_to_bread_mlextend

-0.05555555555555558

#### **Possible Interpretation of Results**

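* The lift of 0.9 (below 1) and the negative leverage both indicate that "milk" and "bread" occur together slightly less often than would be expected if they were statistically independent, which points to a weak negative association. The confidence of 0.75 looks high on its own, but it is lower than the support of "bread" (about 0.83), so knowing that a basket contains "milk" makes "bread" slightly less likely, not more. The conviction below 1 says the same thing from the other direction: the rule "milk" -> "bread" fails (milk without bread) more often than it would under independence. In short, "bread" does not depend on "milk"; it is likely to be purchased regardless of whether "milk" is purchased.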
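
* The floating-point outputs above (such as 0.8999999999999999) can hide how simple the underlying ratios are. The sketch below recomputes the "milk" -> "bread" metrics with exact fractions from Python's standard `fractions` module. The helper `count` is a name introduced only for this example.

```python
from fractions import Fraction

# Same sample transaction database as above
transactions = [
    ['milk', 'bread', 'orange juice'],
    ['milk', 'bread'],
    ['milk', 'cookies'],
    ['bread', 'butter'],
    ['milk', 'bread', 'butter'],
    ['bread', 'cookies'],
]


def count(itemset):
    """Number of transactions that contain every item in `itemset` (hypothetical helper)."""
    return sum(1 for t in transactions if set(itemset) <= set(t))


n = len(transactions)
support_milk = Fraction(count(['milk']), n)           # 4 of 6 transactions
support_bread = Fraction(count(['bread']), n)         # 5 of 6 transactions
support_both = Fraction(count(['milk', 'bread']), n)  # 3 of 6 transactions

confidence = support_both / support_milk
lift = confidence / support_bread
leverage = support_both - support_milk * support_bread
conviction = (1 - support_bread) / (1 - confidence)

print(f"confidence = {confidence}")   # 3/4
print(f"lift       = {lift}")         # 9/10
print(f"leverage   = {leverage}")     # -1/18
print(f"conviction = {conviction}")   # 2/3
```

Seen as fractions, the comparison with independence is direct: lift is 9/10 rather than 1, leverage is -1/18 rather than 0, and conviction is 2/3 rather than 1.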