# Association Rule Mining-based Recommender System ---- Apriori algorithm

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Association-Rule-Mining-based-Recommender-System------Apriori-algorithm" data-toc-modified-id="Association-Rule-Mining-based-Recommender-System------Apriori-algorithm-1">Association Rule Mining-based Recommender System ---- Apriori algorithm</a></span><ul class="toc-item"><li><span><a href="#Business-Initiative" data-toc-modified-id="Business-Initiative-1.1">Business Initiative</a></span></li><li><span><a href="#Apriori-Algorithm" data-toc-modified-id="Apriori-Algorithm-1.2">Apriori Algorithm</a></span><ul class="toc-item"><li><span><a href="#Data-Processing" data-toc-modified-id="Data-Processing-1.2.1">Data Processing</a></span></li><li><span><a href="#Applying-Apriori" data-toc-modified-id="Applying-Apriori-1.2.2">Applying Apriori</a></span></li></ul></li><li><span><a href="#Business-Recommendations" data-toc-modified-id="Business-Recommendations-1.3">Business Recommendations</a></span></li></ul></li></ul></div>

## Business Initiative

In this notebook, I generated frequent itemsets and _association rules_ for a recommender system for an online retail company using **Association Rule Mining** via Apriori algorithm. 

Online retail company XYZ sells various products and looking to increase its revenue by promoting cross-selling (i.e. selling related or complementary items) opportunities to its customers. The company is looking to apply advanced analytics on its historical transactional data to answer the following business question:

**When a customer buys an item, what are the related or complementary items that can be presented to them to promote cross-selling?**

## Apriori Algorithm

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# !pip install apyori
from apyori import apriori

Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... [?25ldone
[?25h  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5974 sha256=4f6c2cf22c863f8ed373a9293cbb3257712abeb4331bc18faac9a2bf859e78b7
  Stored in directory: /Users/rayna/Library/Caches/pip/wheels/cb/f6/e1/57973c631d27efd1a2f375bd6a83b2a616c4021f24aab84080
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2


In [7]:
data = pd.read_excel('transactions_by_dept.xlsx')

In [8]:
data.head()

Unnamed: 0,POS Txn,Dept,ID,Sales U
0,16120100160021008773,0261:HOSIERY,250,2
1,16120100160021008773,0634:VITAMINS & HLTH AIDS,102,1
2,16120100160021008773,0879:PET SUPPLIES,158,2
3,16120100160021008773,0973:CANDY,175,2
4,16120100160021008773,0982:SPIRITS,176,1


### Data Processing

The Apriori library we are going to use requires our dataset to be in the form of a **list of lists**, where the whole dataset is a big list and each transaction in the dataset is an inner list within the outer big list.

In [10]:
data1 = data.groupby('POS Txn')['Dept'].apply(list).reset_index(name='new')

In [12]:
data1.head()

Unnamed: 0,POS Txn,new
0,16120100160021008773,"[0261:HOSIERY, 0634:VITAMINS & HLTH AIDS, 0879..."
1,16120100160021008774,"[0597:HEALTH AIDS, 0604:PERSONAL CARE]"
2,16120100160021008775,"[0819:PRE-RECORDED A/V, 0826:SMALL ELECTRICS, ..."
3,16120100160021008776,[0961:GENERAL GROCERIES]
4,16120100160021008777,[0982:SPIRITS]


In [13]:
records = data1['new'].to_list()

In [14]:
records

[['0261:HOSIERY',
  '0634:VITAMINS & HLTH AIDS',
  '0879:PET SUPPLIES',
  '0973:CANDY',
  '0982:SPIRITS',
  '0983:WINE',
  '0991:TOBACCO'],
 ['0597:HEALTH AIDS', '0604:PERSONAL CARE'],
 ['0819:PRE-RECORDED A/V', '0826:SMALL ELECTRICS', '0982:SPIRITS'],
 ['0961:GENERAL GROCERIES'],
 ['0982:SPIRITS'],
 ['0982:SPIRITS', '0991:TOBACCO'],
 ['0879:PET SUPPLIES', '0982:SPIRITS', '0983:WINE', '0984:BEER'],
 ['0530:SCHOOL/OFFIC SUPP',
  '0597:HEALTH AIDS',
  '0601:VALUE ZONE',
  '0634:VITAMINS & HLTH AIDS',
  '0836:HOUSEHOLD CLEANING'],
 ['0593:PRESTIGE COSMETICS',
  '0597:HEALTH AIDS',
  '0598:BABY CARE',
  '0836:HOUSEHOLD CLEANING',
  '0965:PERISHABLES',
  '0973:CANDY',
  '0983:WINE'],
 ['0837:GENERAL HOUSEWARES', '0982:SPIRITS'],
 ['0879:PET SUPPLIES', '0973:CANDY', '0984:BEER'],
 ['0983:WINE'],
 ['0962:BEVERAGES', '0982:SPIRITS'],
 ['0982:SPIRITS', '0983:WINE'],
 ['0982:SPIRITS'],
 ['0638:GEN SPORTING GOODS',
  '0961:GENERAL GROCERIES',
  '0973:CANDY',
  '0991:TOBACCO'],
 ['0646:SEASONAL', 

### Applying Apriori

In [17]:
association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)
# convert the rules found by the apriori class into a list
association_results = list(association_rules)

In [20]:
# We mined 36 rules
print(len(association_results))

36


In [21]:
print(association_results[0])

RelationRecord(items=frozenset({'0590:MASS COSMETICS', '0603:BEAUTY CARE'}), support=0.010174418604651164, ordered_statistics=[OrderedStatistic(items_base=frozenset({'0590:MASS COSMETICS'}), items_add=frozenset({'0603:BEAUTY CARE'}), confidence=0.40384615384615385, lift=6.267206477732794)])


In [23]:
#displays the rule, the support, the confidence, and lift for each rule in a more clear way:
for item in association_results:

    # first index of the inner list
    # Contains base item and add item
    pair = item[0] 
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])

    #second index of the inner list
    print("Support: " + str(item[1]))

    #third index of the list located at 0th
    #of the third index of the inner list

    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")

Rule: 0590:MASS COSMETICS -> 0603:BEAUTY CARE
Support: 0.010174418604651164
Confidence: 0.40384615384615385
Lift: 6.267206477732794
Rule: 0590:MASS COSMETICS -> 0604:PERSONAL CARE
Support: 0.005813953488372093
Confidence: 0.23076923076923078
Lift: 3.133603238866397
Rule: 0593:PRESTIGE COSMETICS -> 0603:BEAUTY CARE
Support: 0.006298449612403101
Confidence: 0.26
Lift: 4.034887218045113
Rule: 0603:BEAUTY CARE -> 0597:HEALTH AIDS
Support: 0.020348837209302327
Confidence: 0.21
Lift: 3.2589473684210524
Rule: 0597:HEALTH AIDS -> 0604:PERSONAL CARE
Support: 0.031007751937984496
Confidence: 0.31999999999999995
Lift: 4.345263157894736
Rule: 0634:VITAMINS & HLTH AIDS -> 0597:HEALTH AIDS
Support: 0.0048449612403100775
Confidence: 0.43478260869565216
Lift: 4.48695652173913
Rule: 0836:HOUSEHOLD CLEANING -> 0597:HEALTH AIDS
Support: 0.029554263565891473
Confidence: 0.305
Lift: 3.9843037974683546
Rule: 0598:BABY CARE -> 0604:PERSONAL CARE
Support: 0.0048449612403100775
Confidence: 0.28571428571428575


In [48]:
# organize the output into a dataframe and sort by Lift and Support
results = pd.DataFrame(columns = ['item1', 'item2', 'Support', 'Confidence', 'Lift'])

for item in association_results:
    pair = item[0] 
    items = [x for x in pair]
    results = results.append(pd.DataFrame({'item1': items[0], 'item2': items[1], 'Support': item[1], 'Confidence': item[2][0][2], 'Lift': item[2][0][3]}, index=[0]), ignore_index=True)

results.sort_values(by=['Lift', 'Support'])
results

Unnamed: 0,item1,item2,Support,Confidence,Lift
0,0590:MASS COSMETICS,0603:BEAUTY CARE,0.010174,0.403846,6.267206
1,0590:MASS COSMETICS,0604:PERSONAL CARE,0.005814,0.230769,3.133603
2,0593:PRESTIGE COSMETICS,0603:BEAUTY CARE,0.006298,0.26,4.034887
3,0603:BEAUTY CARE,0597:HEALTH AIDS,0.020349,0.21,3.258947
4,0597:HEALTH AIDS,0604:PERSONAL CARE,0.031008,0.32,4.345263
5,0634:VITAMINS & HLTH AIDS,0597:HEALTH AIDS,0.004845,0.434783,4.486957
6,0836:HOUSEHOLD CLEANING,0597:HEALTH AIDS,0.029554,0.305,3.984304
7,0598:BABY CARE,0604:PERSONAL CARE,0.004845,0.285714,3.879699
8,0603:BEAUTY CARE,0604:PERSONAL CARE,0.019864,0.308271,4.185991
9,0836:HOUSEHOLD CLEANING,0604:PERSONAL CARE,0.017926,0.243421,3.17988


## Business Recommendations

Make two business recommendations to deliver business outcomes from this association rule based recommender system.

From the mined association rules, we found that the support for 'Beauty Care' products is 0.01. The confidence is 0.403 which indicates that 40.3% of all transactions with 'Mass Cosmetics' contain 'Beauty Care' as well. The lift is 6.26 which means that 'Beauty Care' is 6.26 times more likely to be bought by the customers that buy 'Mass Cosmetics' as compared to its default sale.

From the mined association rules, we found that the support for 'Candy' products is 0.007. The confidence is 0.516 which indicates that 51.6% of all transactions with 'Spirits' contain 'Candy' as well. The lift is 5.55 which means that 'Candy' is 5.55 times more likely to be bought by the customers that buy 'Spirits' as compared to its default sale.

We recommend **Mass Cosmetics**, **Personal Care** and **Beauty Care** can be placed together so that when a customer buys one of the product he doesn't have to go far away to buy the other product.

People who buy one of the **Mass Cosmetics** can be targeted through an advertisement campaign to buy **Beauty Care**.

Both **Spirits** and **Candy** can be packaged together. Collective discounts can be offered on these products if the customer buys both of them.