# **What is affinity analysis?**

Affinity analysis is a type of data mining that measures the similarity between
samples (objects). This could be the similarity between the following:
  
  • users on a website, in order to provide varied services or targeted advertising

  • items to sell to those users, in order to recommend movies or products
  
  • human genes, in order to find people who share the same ancestors

We can measure affinity in a number of ways. For instance, we can record how
frequently two products are purchased together. We can also record how accurate
a statement such as "when a person buys object 1, they also buy object 2" is
across the data.
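
As a rough illustration of both measurements, the sketch below counts co-purchases on a small made-up dataset. The array `toy` and its values are hypothetical and are not taken from the affinity dataset used in this notebook.

```python
import numpy as np

# Hypothetical toy data: 4 transactions (rows) x 3 products (columns).
# A 1 means the product was part of that transaction.
toy = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
])

# Entry (i, j) counts the transactions that contain both product i and
# product j; the diagonal holds each product's own purchase count.
co_counts = toy.T @ toy
print(co_counts)

# Accuracy of the statement "a person who buys product 0 also buys product 1":
# co-purchases of 0 and 1 divided by all purchases of product 0.
accuracy_0_to_1 = co_counts[0, 1] / co_counts[0, 0]
print(f"{accuracy_0_to_1:.3f}")
```

The matrix product is one convenient way to get every pairwise count at once; an explicit loop over transactions gives the same numbers.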

In [3]:
import numpy as np
dataset_filename = "/content/sample_data/affinity_dataset.txt"
x = np.loadtxt(dataset_filename)

In [17]:
# x.shape is (rows, columns): one row per sample, one column per feature.
num_samples, num_features = x.shape
print(f'{num_samples} lines.')
print(f'{num_features} columns.')

100 lines.
5 columns.


In [4]:
x[:5]

array([[0., 0., 1., 1., 1.],
       [1., 1., 0., 1., 0.],
       [1., 0., 1., 1., 0.],
       [0., 0., 1., 1., 1.],
       [0., 1., 0., 0., 1.]])

In [45]:
features = ["bread", "milk", "cheese", "apples", "bananas"]
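
To make the encoding concrete, here is a minimal sketch that turns one row into item names. It assumes the columns follow the order of the `features` list, which matches the later use of index 3 for apples and index 4 for bananas. The name `first_row` is introduced only for this illustration and copies the first row of the `x[:5]` output.

```python
features = ["bread", "milk", "cheese", "apples", "bananas"]

# First row of the x[:5] output, copied by hand for illustration.
first_row = [0., 0., 1., 1., 1.]

# Keep the names whose flag is 1, i.e. the items in this transaction.
bought = [name for name, flag in zip(features, first_row) if flag == 1]
print(bought)  # ['cheese', 'apples', 'bananas']
```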

## If a person buys product X, then they are likely to purchase product Y.

In [7]:
# Checking how many people bought apples based on the data.
apple_sales = 0
for person in x:
  if person[3] == 1: # This person bought apples.
    apple_sales += 1
print(f'{apple_sales} people bought apples.')

36 people bought apples.


In [20]:
# Of the people who bought apples, count how many also bought bananas.
rule_valid = 0
rule_invalid = 0
for person in x:
  if person[3] == 1:
    if person[4] == 1:
      rule_valid += 1
    else:
      rule_invalid += 1
print(f'{rule_valid} people bought apples and bananas.')
print(f'{rule_invalid} people bought apples but not bananas.')

21 people bought apples and bananas.
15 people bought apples but not bananas.


In [30]:
# Now we have the counts needed to compute support and confidence.
support = rule_valid # Support is the number of samples in which the rule holds.
confidence = rule_valid/apple_sales # Confidence is the fraction of apple buyers who also bought bananas.
print("The support is {0} and the confidence is {1:.3f}.".format(support, confidence))
print("As a percentage that is {0:.1f}%.".format(confidence*100))

The support is 21 and the confidence is 0.583.
As a percentage that is 58.3%.
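
The same two numbers can also be computed without explicit loops. The following is a minimal sketch rather than the notebook's own method: `rule_stats` is a hypothetical helper, and `sample_rows` holds only the five rows shown in the `x[:5]` output, so its results differ from the full 100-row dataset.

```python
import numpy as np

def rule_stats(data, premise, conclusion):
    """Return (support, confidence) for the rule premise -> conclusion."""
    has_premise = data[:, premise] == 1
    has_both = has_premise & (data[:, conclusion] == 1)
    support = int(has_both.sum())        # samples where the rule holds
    n_premise = int(has_premise.sum())   # samples where the premise applies
    # Confidence is undefined when the premise never occurs.
    confidence = support / n_premise if n_premise else float("nan")
    return support, confidence

# The five rows from the x[:5] output, copied by hand for illustration.
sample_rows = np.array([
    [0., 0., 1., 1., 1.],
    [1., 1., 0., 1., 0.],
    [1., 0., 1., 1., 0.],
    [0., 0., 1., 1., 1.],
    [0., 1., 0., 0., 1.],
])

# Rule: apples (column 3) -> bananas (column 4).
support_5, confidence_5 = rule_stats(sample_rows, 3, 4)
print(support_5, confidence_5)  # 2 0.5
```

Calling `rule_stats(x, 3, 4)` on the full array should reproduce the support and confidence printed above, assuming `x` is loaded as in the first cell.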


In [46]:
from collections import defaultdict
# Now compute for all possible rules
valid_rules = defaultdict(int)
invalid_rules = defaultdict(int)
num_occurences = defaultdict(int)

for sample in x:
    for premise in range(num_features):
        if sample[premise] == 0: continue
        # Record that the premise was bought in another transaction
        num_occurences[premise] += 1
        for conclusion in range(num_features):
          if premise == conclusion: # It makes little sense to compare x -> x.
            continue
          if sample[conclusion] == 1:
            # The person also bought the conclusion item, so the rule holds.
            valid_rules[(premise, conclusion)] += 1
          else:
            # Then the person only bought the premise, but not the conclusion.
            invalid_rules[(premise, conclusion)] += 1

support = valid_rules # Support is the number of samples in which each rule holds.
confidence = defaultdict(float)
for premise, conclusion in valid_rules.keys():
    confidence[(premise, conclusion)] = valid_rules[(premise, conclusion)] / num_occurences[premise]

for premise, conclusion in confidence:
    premise_name = features[premise]
    conclusion_name = features[conclusion]
    print("Rule: If a person buys {0} they will also buy {1}".format(premise_name, conclusion_name))
    print(" - Confidence: {0:.3f}".format(confidence[(premise, conclusion)]))
    print(" - Support: {0}".format(support[(premise, conclusion)]))
    print("")

Rule: If a person buys cheese they will also buy apples
 - Confidence: 0.610
 - Support: 25

Rule: If a person buys cheese they will also buy bananas
 - Confidence: 0.659
 - Support: 27

Rule: If a person buys apples they will also buy cheese
 - Confidence: 0.694
 - Support: 25

Rule: If a person buys apples they will also buy bananas
 - Confidence: 0.583
 - Support: 21

Rule: If a person buys bananas they will also buy cheese
 - Confidence: 0.458
 - Support: 27

Rule: If a person buys bananas they will also buy apples
 - Confidence: 0.356
 - Support: 21

Rule: If a person buys bread they will also buy milk
 - Confidence: 0.519
 - Support: 14

Rule: If a person buys bread they will also buy apples
 - Confidence: 0.185
 - Support: 5

Rule: If a person buys milk they will also buy bread
 - Confidence: 0.304
 - Support: 14

Rule: If a person buys milk they will also buy apples
 - Confidence: 0.196
 - Support: 9

Rule: If a person buys apples they will also buy bread
 - Confidence: 0.139
 