# Firdaus Adi Nugroho - Marketing Analytics using Apriori

Market basket analysis is one of the key techniques large retailers use to uncover associations between items. It works by looking for combinations of items that frequently occur together in transactions; in other words, it lets retailers identify relationships between the items that people buy.
Association rules are widely used to analyze retail basket or transaction data. They are intended to identify strong rules in that data using measures of interestingness.<br><br>
The Apriori algorithm falls into the association rule category, which is used to identify underlying relations between different items. Apriori is a machine learning algorithm used to gain insight into the structured relationships between the items involved. Its most prominent practical application is recommending products based on the products already present in the user's cart.<br>
<img src="marketbasketanalysis.png"
     alt="Market basket analysis"
     style="float: left; margin-right: 10px;" />

In [14]:
#Import the library

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori

## Data Preprocessing

In [15]:
#Load the dataset

movie_data = pd.read_csv('dataset_MBA_coba.csv')
movie_data.head()

Unnamed: 0,Name,Item 1,Item 2,Item 3
0,Firdaus Adi Nugroho,HP,Racket,Watch
1,faizah,HP,Camera,Watch
2,andrem,Watch,Camera,Music Pad
3,laili,Camera,Watch,Mouse
4,Tara,HP,Watch,Music Pad


In [16]:
movie_data.drop(['Name'],axis=1, inplace=True)

We drop the "Name" column because only the three item columns are needed to run the Apriori algorithm.

In [17]:
# Count the rows in the dataset

num_records = len(movie_data)
print(num_records)

24


In [19]:
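# Convert each row into a list of item strings.
# Note: range(0, 23) covers only the first 23 of the 24 rows, and str() turns
# missing values into the literal item 'nan'; the outputs below reflect both.
records = []
for i in range(0, 23):
    records.append([str(movie_data.values[i,j]) for j in range(0,3)])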
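
The first rule printed further down contains the item `'nan'`, which comes from empty cells being converted with `str()`. Below is a minimal sketch of one way to avoid that, assuming empty cells arrive as `None` or a float NaN (which is how pandas usually represents them). The helper names `is_missing` and `rows_to_transactions` and the sample rows are hypothetical and not part of the original notebook.

```python
import math


def is_missing(value):
    """Return True for None or a float NaN (assumed representation of empty cells)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def rows_to_transactions(rows):
    """Convert rows of cell values into lists of item strings, skipping missing cells."""
    transactions = []
    for row in rows:
        items = [str(value) for value in row if not is_missing(value)]
        if items:  # ignore rows that are entirely empty
            transactions.append(items)
    return transactions


# Hypothetical rows; the second one has an empty third cell.
sample_rows = [
    ["HP", "Racket", "Watch"],
    ["Bag", "Soap", float("nan")],
]
print(rows_to_transactions(sample_rows))
```

With the notebook's DataFrame, the same helper could probably be called as `rows_to_transactions(movie_data.values)`, which would also cover every row instead of the first 23. Doing so would change the supports and rules reported below.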

## Build Apriori Algorithm
Suppose we only want item combinations that appear in at least 5% of the transactions.<br>

Let's look at the parameters in more detail:<br>
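- Support: the fraction of transactions in the dataset that contain the itemset.<br>
- Confidence: for a rule X → Y, the fraction of transactions containing X that also contain Y, i.e. how often the rule has been found to be true.<br>
- Lift: the ratio of the observed support of X and Y together to the support expected if X and Y were independent. A lift above 1 means the items occur together more often than independence would predict.<br><br>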
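
The three measures above can be computed directly from their definitions. The following is a small sketch in plain Python, not the code `apyori` uses internally; the function names are illustrative, and the toy dataset is just the five rows shown by `movie_data.head()`.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(X and Y) / support(X) for the rule X -> Y."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """confidence(X -> Y) / support(Y); 1.0 means X and Y look independent."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# The five rows displayed by movie_data.head(), used as a toy dataset.
transactions = [
    ["HP", "Racket", "Watch"],
    ["HP", "Camera", "Watch"],
    ["Watch", "Camera", "Music Pad"],
    ["Camera", "Watch", "Mouse"],
    ["HP", "Watch", "Music Pad"],
]

print(support(transactions, {"HP", "Camera"}))       # 1 of the 5 rows
print(confidence(transactions, {"HP"}, {"Camera"}))  # 1 of the 3 HP rows
print(lift(transactions, {"HP"}, {"Camera"}))
```

On these five rows the lift of HP → Camera comes out below 1 (about 0.56), meaning the two items appear together less often than independence would predict. The lift of HP → Watch is exactly 1, because Watch is in every row.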

So, we can define:
- The minimum support threshold is 0.05. <br>
- The minimum confidence for the rules is 5%, or 0.05. <br>
- Similarly, we specify the minimum lift as 1.2. <br>

We are interested in rules that involve at least two products. These values are chosen mostly arbitrarily and need to be fine-tuned empirically.

In [7]:
association_rules = apriori(records, min_support=0.05, min_confidence=0.05, min_lift=1.2)
association_results = list(association_rules)
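
To make the call above less of a black box, here is a minimal sketch of the level-wise search behind Apriori-style frequent itemset mining. It is an illustration under simplifying assumptions, not `apyori`'s actual implementation: the function name `frequent_itemsets` is hypothetical, supports are recomputed naively, and rule generation (confidence and lift filtering) is left out.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search: grow candidates one item at a time, pruning infrequent ones."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: single items that meet the threshold.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (the Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# The five rows displayed by movie_data.head(), used as a toy dataset.
transactions = [
    ["HP", "Racket", "Watch"],
    ["HP", "Camera", "Watch"],
    ["Watch", "Camera", "Music Pad"],
    ["Camera", "Watch", "Mouse"],
    ["HP", "Watch", "Music Pad"],
]

for itemset, value in frequent_itemsets(transactions, min_support=0.6).items():
    print(sorted(itemset), value)
```

With a threshold of 0.6 on this toy data, the candidate {HP, Watch, Camera} is discarded in the prune step without counting it, because its subset {HP, Camera} is not frequent. That shortcut is what keeps the search tractable on larger datasets.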

In [8]:
print(len(association_results))

8


In [12]:
print(association_results[0])

RelationRecord(items=frozenset({'Bag', 'nan'}), support=0.08695652173913043, ordered_statistics=[OrderedStatistic(items_base=frozenset({'Bag'}), items_add=frozenset({'nan'}), confidence=0.6666666666666666, lift=3.0666666666666664), OrderedStatistic(items_base=frozenset({'nan'}), items_add=frozenset({'Bag'}), confidence=0.4, lift=3.066666666666667)])


In [26]:
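results=[]
for item in association_results:

    # item[0] is the frozenset of items; only its first two members are kept.
    # A frozenset has no defined order, so List 1 / List 2 do not indicate
    # which item is the antecedent of the rule.
    pair = item[0]
    items = [x for x in pair]

    value0=str(items[0])
    value1=str(items[1])
    value2=str(item[1])[:7]          # support, truncated to 7 characters
    value3=str(item[2][0][2])[:7]    # confidence of the first ordered statistic
    value4=str(item[2][0][3])[:7]    # lift of the first ordered statistic

    rows=(value0, value1, value2, value3, value4)
    results.append(rows)

labels=['List 1', 'List 2','Support','Confidence','Lift' ]
movie_suggestion=pd.DataFrame.from_records(results, columns=labels)

# Support is stored as a string here, so this sort compares text, not numbers.
movie_suggestion.sort_values(by='Support', ascending=False)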

Unnamed: 0,List 1,List 2,Support,Confidence,Lift
3,Watch,HP,0.17391,1.0,1.53333
1,Racket,Guitar,0.13043,0.375,1.4375
0,Bag,,0.08695,0.66666,3.06666
2,Guitar,Soap,0.08695,0.25,1.4375
4,Racket,Soap,0.08695,0.33333,1.91666
5,Camera,Guitar,0.08695,0.4,1.31428
6,Camera,Guitar,0.08695,0.4,1.53333
7,Camera,Watch,0.08695,1.0,1.53333
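
The table above keeps only the first two items and the first statistic of each record. A possible alternative, sketched below, emits one row per rule and keeps the direction (antecedent and consequent) and numeric values. To stay self-contained it uses stand-in namedtuples that mirror the field names visible in the printed `association_results[0]`; the function name `rules_to_rows` and the column names are hypothetical.

```python
from collections import namedtuple

# Stand-ins mirroring the field names printed for association_results[0].
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])


def rules_to_rows(relation_records):
    """Flatten records into one row per rule, keeping the rule direction."""
    rows = []
    for record in relation_records:
        for stat in record.ordered_statistics:
            rows.append({
                "Antecedent": ", ".join(sorted(stat.items_base)),
                "Consequent": ", ".join(sorted(stat.items_add)),
                "Support": record.support,
                "Confidence": stat.confidence,
                "Lift": stat.lift,
            })
    # Numeric sort, highest support first.
    return sorted(rows, key=lambda row: row["Support"], reverse=True)


# The record printed by print(association_results[0]), rebuilt by hand.
example = RelationRecord(
    items=frozenset({"Bag", "nan"}),
    support=0.08695652173913043,
    ordered_statistics=[
        OrderedStatistic(frozenset({"Bag"}), frozenset({"nan"}), 0.6666666666666666, 3.0666666666666664),
        OrderedStatistic(frozenset({"nan"}), frozenset({"Bag"}), 0.4, 3.066666666666667),
    ],
)

for row in rules_to_rows([example]):
    print(row)
```

Because the rows are plain dictionaries, they could be passed to `pd.DataFrame(...)` to get a table like the one above, with item sets of any size shown in full.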


## Conclusion
The result above contains 8 rows. These are the rules that pass the thresholds defined earlier:
- Support: 0.05 <br>
- Confidence: 0.05 <br>
- Lift: 1.2 <br><br>

Watch and HP (handphone) appear together in many transactions, as their support shows (about 0.17, the highest in the table). If we place the handphone above or next to the watch, there is a high probability that the customer will buy the handphone as well. Cross-selling Watch and HP should also increase the number of transactions, because the lift is about 1.5, and a lift above 1 means the two items are bought together more often than would be expected if they were independent.<br><br>
Racket and Guitar also appear together fairly often (support of about 0.13), but unfortunately the confidence is low (0.375), so the probability that a Racket customer will also buy a Guitar is not very high.