# Association Rule Learning

Association rule learning helps us identify relationships between sets of items. For example, in a grocery store, we can identify two items that people usually buy together. According to a widely told story, baby diapers and beer were found to be bought together quite frequently. This association can be leveraged by placing the two products far apart, so that the shopper has to walk past many other products and may buy something that was not planned.

Two common methods for identifying such associations are:
1. Apriori
2. Eclat

### Import libraries and data

In [1]:
import pandas as pd
import numpy as np
from apyori import apriori

In [2]:
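# Note: the file has no header row, so read_csv treats the first transaction
# as the column names (see the output below); header=None would keep it as data.
dataset = pd.read_csv('Market_Basket_Optimisation.csv')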
dataset.head(5)

Unnamed: 0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
0,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
1,chutney,,,,,,,,,,,,,,,,,,,
2,turkey,avocado,,,,,,,,,,,,,,,,,,
3,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,
4,low fat yogurt,,,,,,,,,,,,,,,,,,,


### Apriori

**Key terms**
1. **Support (X)** = Number of documents with X / Total number of documents
2. **Confidence (X -> Y)** = Number of documents with both X and Y / Number of documents with X
3. **Lift (X -> Y)** = Confidence (X -> Y) / Support (Y)
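
To make the three definitions concrete, here is a minimal sketch on five invented transactions. The toy data and the helper names (`support_of`, `confidence_of`, `lift_of`) are assumptions made for illustration; they are not part of the dataset or of the notebook cells. Note that lift divides the confidence of X -> Y by the support of the consequent Y.

```python
# Toy transactions, invented for illustration only (not taken from the dataset).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support_of(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence_of(x, y, transactions):
    """Share of the transactions containing X that also contain Y."""
    both = support_of(set(x) | set(y), transactions)
    return both / support_of(x, transactions)

def lift_of(x, y, transactions):
    """Confidence of X -> Y relative to how common Y is overall."""
    return confidence_of(x, y, transactions) / support_of(y, transactions)

print("support(bread)          =", support_of({"bread"}, transactions))
print("confidence(bread->milk) =", confidence_of({"bread"}, {"milk"}, transactions))
print("lift(bread->milk)       =", lift_of({"bread"}, {"milk"}, transactions))
```

A lift above 1 suggests that X and Y occur together more often than they would if they were independent, while a lift below 1 suggests the opposite.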

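The algorithm gets its name from its use of `prior` knowledge: every subset of a frequent itemset must itself be frequent, so larger candidates are built only from smaller itemsets that have already met the minimum support.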

**Algorithm**
1. Decide minimum values for confidence and support
2. Consider all itemsets with support more than minimum support
3. Take all rules that have confidence more than minimum confidence
4. Sort the rules by decreasing lift
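
The four steps above can be sketched in plain Python as a level-wise search. This is a minimal illustration under stated assumptions, not the `apyori` implementation and not the pairwise code used later in this notebook: the function names `frequent_itemsets` and `rules` and the toy transactions are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search: size-k candidates come only from frequent (k-1)-itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: keep the single items that meet the minimum support.
    current = {}
    for item in {i for t in transactions for i in t}:
        s = supp(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            s = supp(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent

def rules(frequent, min_confidence):
    """Split each frequent itemset into X -> Y and keep the confident rules."""
    out = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for x in combinations(itemset, r):
                x = frozenset(x)
                y = itemset - x
                conf = s / frequent[x]
                if conf >= min_confidence:
                    out.append((set(x), set(y), conf, conf / frequent[y]))
    # Sort by decreasing lift (last element of each tuple).
    return sorted(out, key=lambda rule: rule[3], reverse=True)

# Toy data, invented for illustration only.
toy = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
       {"milk"}, {"bread", "milk"}]
for x, y, conf, lift in rules(frequent_itemsets(toy, 0.4), 0.7):
    print(sorted(x), "->", sorted(y), "confidence:", conf, "lift:", lift)
```

The prune step is where the prior knowledge pays off: a candidate is never counted against the data if one of its subsets has already failed the minimum support.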

I'll first define certain constants to be used for the algorithm.

In [3]:
MIN_SUPPORT = 0.003
MIN_CONFIDENCE = 0.2
MIN_LIFT = 3
TOTAL_LISTS = dataset.shape[0]

Next, I'll extract all the unique items in the complete dataset and record, for each item, the rows (lists) in which it appears.

In [4]:
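items = set()       # every distinct item in the dataset
item_docs = dict()  # item -> row indices of the lists that contain it
for i in range(dataset.shape[0]):
    for j in range(dataset.shape[1]):
        item = dataset.iloc[i, j]
        # Empty cells are read as NaN (a float), so keep only strings
        if isinstance(item, str):
            items.add(item)
            item_docs.setdefault(item, []).append(i)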

Now that I have every item and the lists it appears in, the next step is to calculate the `support` for each item. For this, I create a dictionary whose keys are the items and whose values are the supports of those items.

In [5]:
support = {}
for item in items:
    support[item] = len(item_docs[item]) / TOTAL_LISTS

Next, I iterate over all possible pairs of items and compute the confidence and lift for each pair. After checking that the minimum values for support, confidence and lift are met (the support check is applied to the first item of the pair), I append the pair to the result list.
Finally, I sort the result list in decreasing order of lift.

In [6]:
result = []
for item1 in items:
    for item2 in items:
        if item1 != item2:
            # Number of lists that contain both items
            common_count = len(np.intersect1d(item_docs[item1], item_docs[item2]))
            confidence = common_count / len(item_docs[item1])
            lift = confidence / support[item2]
            if confidence >= MIN_CONFIDENCE and support[item1] >= MIN_SUPPORT and lift >= MIN_LIFT:
                result.append({'firstItem': item1, 'secondItem': item2, 'lift': lift})
associations = sorted(result, key=lambda x: x['lift'], reverse=True)

I can now just print the top 10 associations that we derived using the **Apriori algorithm**.

In [7]:
for i in range(10):
    print('If someone buys \033[1m{}\033[0m, they are also likely to buy \033[1m{}\033[0m with a lift of \033[1m{}\033[0m'
          .format(associations[i]['firstItem'], associations[i]['secondItem'], associations[i]['lift']))

If someone buys [1mhand protein bar[0m, they are also likely to buy [1mprotein bar[0m with a lift of [1m12.45157719977864[0m
If someone buys [1mchocolate bread[0m, they are also likely to buy [1mred wine[0m with a lift of [1m8.886255924170616[0m
If someone buys [1mpet food[0m, they are also likely to buy [1mred wine[0m with a lift of [1m8.70490376245285[0m
If someone buys [1msparkling water[0m, they are also likely to buy [1mhot dogs[0m with a lift of [1m7.880220646178093[0m
If someone buys [1mdessert wine[0m, they are also likely to buy [1msalmon[0m with a lift of [1m5.717552887364208[0m
If someone buys [1mfromage blanc[0m, they are also likely to buy [1mhoney[0m with a lift of [1m5.178127589063794[0m
If someone buys [1mlight cream[0m, they are also likely to buy [1mchicken[0m with a lift of [1m4.843304843304844[0m
If someone buys [1mpasta[0m, they are also likely to buy [1mescalope[0m with a lift of [1m4.700185158809286[0m
If someone bu

### Eclat

**Key terms**
1. **Support (X and Y)** = Number of documents with X and Y both / Total number of documents

**Algorithm**
1. Decide a minimum value for support
2. Consider all itemsets with support more than minimum support
3. Sort the itemsets by decreasing support
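
As a rough sketch of these steps, Eclat is usually described with a "vertical" layout: each item maps to the set of transaction ids that contain it, and the support of a larger itemset comes from intersecting those sets. The function name `eclat` and the toy transactions below are assumptions for illustration, and the sketch goes beyond the pairs used in this notebook.

```python
def eclat(transactions, min_support):
    """Depth-first itemset search using vertical tid-sets."""
    n = len(transactions)

    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # Each candidate is (item, tids), where tids are the transactions
        # that contain the prefix together with that item.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[frozenset(itemset)] = len(tids) / n
            # Grow the itemset by intersecting with the remaining candidates.
            nxt = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) / n >= min_support:
                    nxt.append((other, common))
            if nxt:
                extend(itemset, nxt)

    start = sorted(((item, tids) for item, tids in tidsets.items()
                    if len(tids) / n >= min_support), key=lambda p: p[0])
    extend((), start)
    # Sort the itemsets by decreasing support.
    return sorted(frequent.items(), key=lambda kv: kv[1], reverse=True)

# Toy data, invented for illustration only.
toy = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
       {"milk"}, {"bread", "milk"}]
for itemset, s in eclat(toy, 0.4):
    print(sorted(itemset), s)
```

Because only support is computed, no confidence or lift is needed, which is what makes Eclat the simpler of the two methods.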

I calculate the support for each pair of values and then sort them in the decreasing order based on `support`.

In [8]:
# Use a new name so the per-item `support` dictionary from In [5] is not overwritten
pair_support = []
for item1 in items:
    for item2 in items:
        if item1 != item2:
            # Number of lists that contain both items
            common_count = len(np.intersect1d(item_docs[item1], item_docs[item2]))
            curr_support = common_count / TOTAL_LISTS
            if curr_support >= MIN_SUPPORT:
                pair_support.append({'firstItem': item1, 'secondItem': item2, 'support': curr_support})
associations = sorted(pair_support, key=lambda x: x['support'], reverse=True)
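
Since the support of a pair is symmetric, the nested loop above visits every pair twice, once in each order. A possible variation, sketched here with `itertools.combinations`, visits each unordered pair once. The small `item_docs`, `total` and `min_support` values are hypothetical stand-ins for the notebook's `item_docs`, `TOTAL_LISTS` and `MIN_SUPPORT`.

```python
from itertools import combinations

# Hypothetical stand-ins for the notebook's item_docs, TOTAL_LISTS and MIN_SUPPORT.
item_docs = {"a": [0, 1, 2], "b": [1, 2, 3], "c": [3]}
total = 4
min_support = 0.5

pairs = []
# combinations() yields each unordered pair exactly once.
for item1, item2 in combinations(sorted(item_docs), 2):
    common = set(item_docs[item1]) & set(item_docs[item2])
    curr_support = len(common) / total
    if curr_support >= min_support:
        pairs.append({"items": (item1, item2), "support": curr_support})

# Highest support first.
pairs.sort(key=lambda p: p["support"], reverse=True)
print(pairs)
```

With this variation the printed list would contain each pair a single time, at the cost of no longer reading as a directed "if X then Y" statement.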

I can now just print the top 10 associations that we derived using the **Eclat algorithm**.

In [9]:
for i in range(10):
    print('If someone buys \033[1m{}\033[0m, they are also likely to buy \033[1m{}\033[0m with a support of \033[1m{}\033[0m'
          .format(associations[i]['firstItem'], associations[i]['secondItem'], associations[i]['support']))

If someone buys [1mspaghetti[0m, they are also likely to buy [1mmineral water[0m with a support of [1m0.05973333333333333[0m
If someone buys [1mmineral water[0m, they are also likely to buy [1mspaghetti[0m with a support of [1m0.05973333333333333[0m
If someone buys [1mchocolate[0m, they are also likely to buy [1mmineral water[0m with a support of [1m0.05266666666666667[0m
If someone buys [1mmineral water[0m, they are also likely to buy [1mchocolate[0m with a support of [1m0.05266666666666667[0m
If someone buys [1meggs[0m, they are also likely to buy [1mmineral water[0m with a support of [1m0.05093333333333333[0m
If someone buys [1mmineral water[0m, they are also likely to buy [1meggs[0m with a support of [1m0.05093333333333333[0m
If someone buys [1mmilk[0m, they are also likely to buy [1mmineral water[0m with a support of [1m0.048[0m
If someone buys [1mmineral water[0m, they are also likely to buy [1mmilk[0m with a support of [1m0.048[0m
