### Understanding Association Rule Measures

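Testing the confidence, conviction, and lift measures for a rule X -> Y. Here S(...) denotes support (the fraction of transactions that match) and -Y means that Y is absent.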

    Confidence = S(X,Y) / S(X)
    Conviction = S(X) S(-Y) / S(X,-Y)
    Lift       = S(X,Y) / (S(X) S(Y))
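
Two identities follow from these definitions and make the measures easier to compare. The sketch below writes the document's -Y as \neg Y and uses S(X,-Y) = S(X) - S(X,Y), which holds because every transaction containing X either contains Y or does not. It assumes S(X) > 0 and S(Y) > 0.

```latex
\begin{aligned}
\text{Lift} &= \frac{S(X,Y)}{S(X)\,S(Y)} = \frac{\text{Confidence}}{S(Y)} \\[4pt]
\text{Conviction} &= \frac{S(X)\,S(\neg Y)}{S(X,\neg Y)}
  = \frac{S(X)\,\bigl(1 - S(Y)\bigr)}{S(X) - S(X,Y)}
  = \frac{1 - S(Y)}{1 - \text{Confidence}}
\end{aligned}
```

Both measures therefore depend only on the confidence of the rule and on how common Y is overall.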

Write small functions that calculate these measures.
Then try various X and Y values to see how they influence each measure.

In [2]:
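def S(X):
    """Support of X: fraction of transactions in which X is present (1)."""
    return X.count(1) / len(X)

def S_Not(X):
    """Support of not-X: fraction of transactions in which X is absent (0)."""
    return X.count(0) / len(X)

def S_X_Y(X, Y):
    """Joint support: fraction of transactions that contain both X and Y."""
    count = sum(1 for x, y in zip(X, Y) if x == 1 and y == 1)
    return count / len(X)

def S_X_NotY(X, Y):
    """Fraction of transactions that contain X but not Y."""
    count = sum(1 for x, y in zip(X, Y) if x == 1 and y == 0)
    return count / len(X)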

def Conv(X, Y):
    """Conviction of X -> Y: S(X) * S(-Y) / S(X,-Y)."""
    S_X = S(X)
    S_NotY = S_Not(Y)
    S_X_NotY_val = S_X_NotY(X, Y)
    if S_X_NotY_val != 0:
        return (S_X * S_NotY) / S_X_NotY_val
    else:
        # X never occurs without Y, so the rule never fails: report infinity
        return float('inf')

def Conf(X, Y):
    """Confidence of X -> Y: S(X,Y) / S(X)."""
    S_X = S(X)
    S_X_Y_val = S_X_Y(X, Y)
    if S_X != 0:
        return S_X_Y_val / S_X
    else:
        return float('nan')  # X never occurs, so confidence is undefined (0/0)

def Lift(X, Y):
    """Lift of X -> Y: S(X,Y) / (S(X) * S(Y))."""
    S_X = S(X)
    S_Y = S(Y)
    S_X_Y_val = S_X_Y(X, Y)
    if (S_X * S_Y) != 0:
        return S_X_Y_val / (S_X * S_Y)
    else:
        return float('nan')  # X or Y never occurs, so lift is undefined (0/0)

# Example usage
X = [1, 1, 1, 1, 0, 0]
Y = [1, 1, 0, 0, 1, 1]

print('Conviction: {0:5.2f}'.format(Conv(X, Y)))
print('Confidence: {0:5.2f}'.format(Conf(X, Y)))
print('Lift: {0:5.2f}'.format(Lift(X, Y)))

Conviction:  0.67
Confidence:  0.50
Lift:  0.75


In [3]:
# Each position (column) in X and Y is one transaction
# Y appears in only half of the X transactions (lift below 1: slightly negative association)
X=[1,1,1,1,0,0]
Y=[1,1,0,0,1,1]
print('Conviction:{0:5.2f}'.format(Conv(X,Y)))
print('Confidence:{0:5.2f}'.format(Conf(X,Y)))
print('Lift:{0:5.2f}'.format(Lift(X,Y)))


Conviction: 0.67
Confidence: 0.50
Lift: 0.75


In [4]:
# X is in every transaction; only some of them contain Y
X=[1,1,1,1,1,1]
Y=[1,1,0,0,0,0]
print('Conviction:{0:5.2f}'.format(Conv(X,Y)))
print('Confidence:{0:5.2f}'.format(Conf(X,Y)))
print('Lift:{0:5.2f}'.format(Lift(X,Y)))

Conviction: 1.00
Confidence: 0.33
Lift: 1.00


In [5]:
#try many other combinations
#All X has Y;
#Only some X has Y
#every Y has X
#high confidence but lots of Y not related to X 
#high confidence, high conviction

# Test various combinations
combinations = [
    ([1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 0, 0]),  # All X has Y
    ([1, 1, 1, 1, 0, 0], [1, 0, 1, 0, 1, 0]),  # Only some X has Y
    ([1, 1, 0, 0, 1, 1], [1, 1, 0, 0, 1, 1]),  # Every Y has X
    ([1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 1]),  # High confidence but lots of Y not related to X
    ([1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 0, 0])   # High confidence, high conviction
]

for X, Y in combinations:
    print(f"X: {X}, Y: {Y}")
    print(f"Conviction: {Conv(X, Y):5.2f}")
    print(f"Confidence: {Conf(X, Y):5.2f}")
    print(f"Lift: {Lift(X, Y):5.2f}")
    print()

X: [1, 1, 1, 1, 0, 0], Y: [1, 1, 1, 1, 0, 0]
Conviction:   inf
Confidence:  1.00
Lift:  1.50

X: [1, 1, 1, 1, 0, 0], Y: [1, 0, 1, 0, 1, 0]
Conviction:  1.00
Confidence:  0.50
Lift:  1.00

X: [1, 1, 0, 0, 1, 1], Y: [1, 1, 0, 0, 1, 1]
Conviction:   inf
Confidence:  1.00
Lift:  1.50

X: [1, 1, 1, 0, 0, 0], Y: [1, 1, 1, 1, 1, 1]
Conviction:   inf
Confidence:  1.00
Lift:  1.00

X: [1, 1, 1, 1, 0, 0], Y: [1, 1, 1, 1, 0, 0]
Conviction:   inf
Confidence:  1.00
Lift:  1.50
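
One way to "play around" more systematically is to hold X fixed and vary how many of its transactions also contain Y. The sketch below is self-contained and illustrative only: `rule_measures` is a hypothetical helper (not one of the functions defined above) that works from integer counts, and the fixed tail `[1, 0]` for the non-X transactions is an arbitrary assumption.

```python
import math

def rule_measures(X, Y):
    """Return (confidence, conviction, lift) for the rule X -> Y.

    X and Y are equal-length lists of 0/1 flags, one entry per transaction.
    Hypothetical helper for this sketch; counts are kept as integers so that
    the "rule never fails" case is detected exactly.
    """
    n = len(X)
    n_x = sum(X)                                    # transactions containing X
    n_y = sum(Y)                                    # transactions containing Y
    n_xy = sum(1 for x, y in zip(X, Y) if x and y)  # containing both
    n_x_not_y = n_x - n_xy                          # containing X but not Y

    if n_x == 0:
        # X never occurs: none of the measures is defined
        return math.nan, math.nan, math.nan

    confidence = n_xy / n_x
    lift = (n_xy * n) / (n_x * n_y) if n_y else math.nan
    conviction = (n_x * (n - n_y)) / (n * n_x_not_y) if n_x_not_y else math.inf
    return confidence, conviction, lift

# X is present in the first four of six transactions.
X = [1, 1, 1, 1, 0, 0]

# k = number of X transactions that also contain Y (0 to 4).
for k in range(5):
    Y = [1] * k + [0] * (4 - k) + [1, 0]
    confidence, conviction, lift = rule_measures(X, Y)
    print(f"k={k}  Y={Y}  Confidence: {confidence:5.2f}  "
          f"Conviction: {conviction:5.2f}  Lift: {lift:5.2f}")
```

In this sweep, lift and conviction both equal 1 at k = 2, where the confidence matches the overall rate of Y. Both measures fall below 1 for smaller k and rise above 1 for larger k, and conviction becomes infinite once every X transaction contains Y.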



### Question
1. What can you conclude about the measures for X -> Y?

    - High conviction is when  ....
    - High confidence is when ....
    - High lift is ..

### Answer
1. High Conviction:
    - **Definition**: Conviction compares how often the rule X -> Y would fail if X and Y were independent, S(X)S(-Y), with how often it actually fails, S(X,-Y).
    - **High Conviction**: Conviction is high when X rarely occurs without Y, relative to how often Y is absent overall. A value of 1 means X and Y look independent, a value below 1 means the rule fails more often than chance would predict, and the value is infinite when the rule never fails.
    - **Example**: If X = [1, 1, 1, 1, 0, 0] and Y = [1, 1, 1, 1, 0, 0], X never occurs without Y, so S(X,-Y) = 0 and the conviction is infinite.

2. High Confidence:
    - **Definition**: Confidence is the proportion of transactions containing X that also contain Y.
    - **High Confidence**: Confidence is high when most transactions that contain X also contain Y. It ignores how common Y is overall, so high confidence alone does not show that X predicts Y: with X = [1, 1, 1, 0, 0, 0] and Y = [1, 1, 1, 1, 1, 1] the confidence is 1.00 but the lift is only 1.00.
    - **Example**: If X = [1, 1, 1, 1, 0, 0] and Y = [1, 1, 1, 1, 0, 0], the confidence is 1.00 because every transaction that contains X also contains Y.

3. High Lift:
    - **Definition**: Lift measures how much more likely Y is in transactions that contain X than in transactions overall. It is the ratio of the observed support S(X,Y) to the support S(X)S(Y) expected if X and Y were independent.
    - **High Lift**: Lift is well above 1 when the presence of X makes Y considerably more likely than its baseline rate. A lift of 1 indicates independence, and a lift below 1 indicates that X makes Y less likely.
    - **Example**: If X = [1, 1, 1, 1, 0, 0] and Y = [1, 1, 1, 1, 0, 0], the lift is 1.50: Y occurs in every transaction that contains X but in only 4 of the 6 transactions overall.
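
These conclusions can be checked numerically. Since S(X,-Y) = S(X) - S(X,Y), the definitions imply Lift = Confidence / S(Y) and Conviction = (1 - S(Y)) / (1 - Confidence). The sketch below uses those two identities with made-up, purely illustrative values for the confidence and for S(Y); the function names are hypothetical and it assumes S(Y) > 0.

```python
import math

def lift_from(confidence, support_y):
    """Lift of X -> Y from its confidence and the overall support of Y."""
    return confidence / support_y  # assumes support_y > 0

def conviction_from(confidence, support_y):
    """Conviction of X -> Y from its confidence and the overall support of Y."""
    if confidence == 1.0:
        return math.inf  # the rule never fails
    return (1.0 - support_y) / (1.0 - confidence)

# (label, confidence, S(Y)): illustrative values, not taken from any data set
cases = [
    ("confident rule, rare Y  ", 0.9, 0.2),
    ("confident rule, common Y", 0.9, 0.9),
    ("weak rule, rare Y       ", 0.3, 0.2),
]

for label, confidence, support_y in cases:
    print(f"{label}  Confidence: {confidence:5.2f}  "
          f"Conviction: {conviction_from(confidence, support_y):5.2f}  "
          f"Lift: {lift_from(confidence, support_y):5.2f}")
```

The first two cases share the same confidence, yet only the first has lift and conviction well above 1. In the second case Y is so common that knowing X adds nothing, which is why confidence should be read together with the other two measures.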
