# Association Rules and Apriori Algorithm

- The *apriori* algorithm



In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

## Mining Association Rules

![](images/basket.png)

## Formal Statement of Problem

A set of *items*

- $I = \{i_1, i_2, \ldots, i_m\}$

A set of *transactions*

- $D$, where each transaction $T$ is a set of items such that $T \subseteq I$

A unique *identifier* for each transaction

- $TID$
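
To make the notation concrete, here is a minimal sketch of one way to hold $I$, $D$, and the $TID$s in Python. The item names and identifiers are made up for illustration and are not taken from any dataset in this notebook.

```python
# Hypothetical toy data: item names and TIDs are invented for illustration.
I = frozenset({"milk", "beer", "diapers", "bread"})

# D maps each transaction identifier (TID) to its set of items T.
D = {
    1: frozenset({"milk", "bread"}),
    2: frozenset({"beer", "diapers"}),
    3: frozenset({"milk", "beer", "diapers"}),
}

# Every transaction T must be a subset of the item set I.
all_valid = all(T <= I for T in D.values())
print(all_valid)
```

Sets are used here because a transaction records which items occur, not their order or quantity.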


### Example

We have information on 60 transactions.

- 10 transactions contain milk
- 20 transactions contain beer
- 30 transactions contain diapers
- 15 transactions that contain diapers also contain beer
- 5 transactions that contain diapers also contain milk

### Support

The proportion of all transactions that contain an item.

```
Support(diaper) = (Transactions with diaper) / (Total Transactions)
```

### Confidence

The likelihood that item B is bought given that item A is bought.

```
Confidence(A -> B) = (Transactions with A and B) / (Transactions with A)
```


### Lift

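How much more likely B is to be bought when A is bought, relative to how often B is bought overall. A lift of 1 means A and B occur together about as often as they would if they were independent.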

```
Lift(A -> B) = Confidence(A -> B) / Support(B)
```
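
The three measures can be computed directly from a list of transactions. The sketch below is one possible way to do it. The helper names `support`, `confidence`, and `lift` and the toy transactions are assumptions made for illustration; they are not part of any library.

```python
# Toy transactions invented for illustration (not the store data used later).
transactions = [
    {"diapers", "beer"},
    {"diapers", "beer", "milk"},
    {"diapers"},
    {"beer"},
    {"milk"},
    {"bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(a, b, transactions):
    """Support of A and B together, divided by the support of A."""
    return support(set(a) | set(b), transactions) / support(a, transactions)

def lift(a, b, transactions):
    """Confidence(A -> B) divided by Support(B)."""
    return confidence(a, b, transactions) / support(b, transactions)

print(support({"diapers"}, transactions))
print(confidence({"diapers"}, {"beer"}, transactions))
print(lift({"diapers"}, {"beer"}, transactions))
```

Dividing two supports gives the same confidence as dividing the raw counts, because the total number of transactions cancels.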


In [3]:
support_diaper = 30/60

In [4]:
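# Confidence(beer -> diapers): 15 of the 20 beer transactions also contain diapers
confidence_beer_diapers = 15/20
# Confidence(milk -> diapers): 5 of the 10 milk transactions also contain diapers
confidence_milk_diapers = 5/10

In [5]:
# Lift(A -> diapers) = Confidence(A -> diapers) / Support(diapers)
lift_beer_diaper = confidence_beer_diapers/support_diaper
lift_milk_diaper = confidence_milk_diapers/support_diaper

In [6]:
lift_beer_diaper

1.5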

In [7]:
lift_milk_diaper

1.0

In [8]:
import math

for lift in [lift_beer_diaper, lift_milk_diaper]:
    # Compare floats with a tolerance rather than exact equality
    if math.isclose(lift, 1.0):
        print('No Association')
    elif lift < 1:
        print('Not likely to be bought together')
    else:
        print('These are likely')

These are likely
No Association


### Implementation

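- Set minimum thresholds for support and confidence.
- Extract the itemsets whose support meets the minimum support.
- Compute the confidence of the rules formed from those itemsets and keep the rules that meet the minimum confidence.
- Sort the remaining rules in descending order of lift.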
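
The steps above can be sketched in plain Python. This is a minimal illustration, not the implementation used by `apyori`. The function names `frequent_itemsets` and `rules` and the toy transactions are assumptions made for the example.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search: grow candidates only from frequent itemsets."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while current:
        # Count the support of each candidate and keep those meeting the threshold.
        level = {}
        for c in current:
            s = sum(c <= t for t in transactions) / n
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any candidate
        # that has an infrequent k-subset (the Apriori property).
        k += 1
        current = {a | b for a in level for b in level if len(a | b) == k}
        current = {
            c for c in current
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

def rules(frequent, min_confidence):
    """Build rules base -> add from each frequent itemset, sorted by lift."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for base in map(frozenset, combinations(itemset, r)):
                add = itemset - base
                conf = supp / frequent[base]
                if conf >= min_confidence:
                    out.append((set(base), set(add), supp, conf, conf / frequent[add]))
    return sorted(out, key=lambda rule: rule[4], reverse=True)

# Toy transactions invented for illustration.
transactions = [frozenset(t) for t in [
    {"diapers", "beer"}, {"diapers", "beer", "milk"}, {"diapers", "milk"},
    {"beer"}, {"bread", "milk"}, {"diapers", "beer", "bread"},
]]
found = rules(frequent_itemsets(transactions, min_support=0.5), min_confidence=0.6)
for base, add, supp, conf, lift in found:
    print(base, '-->', add, round(supp, 4), round(conf, 4), round(lift, 4))
```

The pruning step relies on the fact that every subset of a frequent itemset is itself frequent, which is also why `frequent[base]` and `frequent[add]` are always available when a rule is built.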

In [11]:
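# Note: the first row of this file is a transaction, not a header, so the default
# read_csv call turns it into column names (pass header=None to keep it as data)
df = pd.read_csv('data/store_data.csv')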

In [12]:
df.head()

Unnamed: 0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
0,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
1,chutney,,,,,,,,,,,,,,,,,,,
2,turkey,avocado,,,,,,,,,,,,,,,,,,
3,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,
4,low fat yogurt,,,,,,,,,,,,,,,,,,,


In [13]:
df.columns

Index(['shrimp', 'almonds', 'avocado', 'vegetables mix', 'green grapes',
       'whole weat flour', 'yams', 'cottage cheese', 'energy drink',
       'tomato juice', 'low fat yogurt', 'green tea', 'honey', 'salad',
       'mineral water', 'salmon', 'antioxydant juice', 'frozen smoothie',
       'spinach', 'olive oil'],
      dtype='object')

In [16]:
#!pip install apyori
from apyori import apriori

In [17]:
# Convert each row to a list of strings; empty cells (NaN) become the string 'nan'
records = []
for i in range(df.shape[0]):
    records.append([str(df.values[i, j]) for j in range(df.shape[1])])

In [18]:
len(records)

7500

In [19]:
records[0]

['burgers',
 'meatballs',
 'eggs',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan']

In [20]:
records[-1]

['eggs',
 'frozen smoothie',
 'yogurt cake',
 'low fat yogurt',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan']
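
Because every row is padded with the string `'nan'`, the algorithm treats `'nan'` as an ordinary item, and it can appear in the mined rules. A minimal sketch of one way to drop the padding before mining follows. The names `sample_records` and `cleaned` are introduced only for this illustration, and the rows are shortened copies of the records shown above.

```python
# Two rows copied in shortened form from the records shown above.
sample_records = [
    ['burgers', 'meatballs', 'eggs', 'nan', 'nan'],
    ['eggs', 'frozen smoothie', 'yogurt cake', 'low fat yogurt', 'nan'],
]

# Drop the 'nan' placeholders so each transaction lists only real items.
cleaned = [[item for item in row if item != 'nan'] for row in sample_records]
print(cleaned[0])
print(cleaned[1])
```

Filtering should not change the support of any itemset made of real items, since it only removes the placeholder. The rest of this notebook keeps the unfiltered `records`, so its outputs still include `'nan'`.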

In [21]:
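# min_support, min_confidence and min_lift filter the rules; min_length does not appear
# to be an option apyori acts on (the first result below has only two items)
association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=3)
results = list(association_rules)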

In [22]:
results[0]

RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004533333333333334, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.2905982905982906, lift=4.843304843304844)])

In [23]:
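for result in results[:10]:
    # result.items is a frozenset, so this order is arbitrary and only the first two
    # items are shown; the rule's actual direction is in ordered_statistics
    items = list(result.items)
    print('Rule: ' + items[0] + '  --> ' + items[1])
    print('Support {:.4f}'.format(result.support))
    stat = result.ordered_statistics[0]
    print('Confidence {:.4f}'.format(stat.confidence))
    print('Lift {:.4f}'.format(stat.lift))
    print('============\n')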

Rule: light cream  --> chicken
Support 0.0045
Confidence 0.2906
Lift 4.8433

Rule: escalope  --> mushroom cream sauce
Support 0.0057
Confidence 0.3007
Lift 3.7903

Rule: pasta  --> escalope
Support 0.0059
Confidence 0.3729
Lift 4.7002

Rule: herb & pepper  --> ground beef
Support 0.0160
Confidence 0.3235
Lift 3.2916

Rule: ground beef  --> tomato sauce
Support 0.0053
Confidence 0.3774
Lift 3.8401

Rule: whole wheat pasta  --> olive oil
Support 0.0080
Confidence 0.2715
Lift 4.1302

Rule: pasta  --> shrimp
Support 0.0051
Confidence 0.3220
Lift 4.5145

Rule: nan  --> light cream
Support 0.0045
Confidence 0.2906
Lift 4.8433

Rule: frozen vegetables  --> chocolate
Support 0.0053
Confidence 0.2326
Lift 3.2602

Rule: ground beef  --> cooking oil
Support 0.0048
Confidence 0.5714
Lift 3.2816

