# APRIORI MODELING FOR MARKET BASKET ANALYSIS
Apriori is one of the algorithms used for Market Basket Analysis (MBA).<BR>
It was proposed by Agrawal and Srikant in 1994. <BR>
The key terms to know when working with Apriori are support, confidence, lift and conviction. <br>
Apriori belongs to the family of association rule learning methods, so in this notebook the Apriori algorithm is used together with association rule learning.

# I. Import Libraries
To get started, let's import the libraries.

In [0]:
import pandas as pd
# TransactionEncoder turns a list of transactions into a one-hot boolean array
# (the unused OnehotTransactions import was dropped)
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

# II. Load Data
Read the data for Market Basket Analysis modeling into the `data` variable.<br>
The available items are shown below:

![Item](Item.jpg)

In [62]:
data = pd.read_csv('form.csv')
data.head()

| | Timestamp | Name | Item 1 | Item 2 | Item 3 |
|---|---|---|---|---|---|
| 0 | 2019/09/17 8:58:22 AM GMT+7 | Firdaus Adi Nugroho | HP | Racket | Watch |
| 1 | 2019/09/17 8:58:24 AM GMT+7 | faizah | HP | Camera | Watch |
| 2 | 2019/09/17 8:58:30 AM GMT+7 | andrem | Watch | Camera | Music Pad |
| 3 | 2019/09/17 8:58:30 AM GMT+7 | laili | Camera | Watch | Mouse |
| 4 | 2019/09/17 8:58:33 AM GMT+7 | Tara | HP | Watch | Music Pad |


In [7]:
data.shape

(24, 5)

To run the Apriori algorithm, we only need the item columns, so we drop the 'Timestamp' and 'Name' columns.

In [63]:
datafix=data.drop(['Timestamp', 'Name'], axis=1)
datafix.head()

| | Item 1 | Item 2 | Item 3 |
|---|---|---|---|
| 0 | HP | Racket | Watch |
| 1 | HP | Camera | Watch |
| 2 | Watch | Camera | Music Pad |
| 3 | Camera | Watch | Mouse |
| 4 | HP | Watch | Music Pad |


In [32]:
datafix.tail()

| | Item 1 | Item 2 | Item 3 |
|---|---|---|---|
| 19 | Soap | Bag | Guitar |
| 20 | Router | Bag | NaN |
| 21 | Watch | Racket | Soap |
| 22 | Music Pad | Soap | Watch |
| 23 | Mouse | Camera | Soap |


# III. Convert to List
The Apriori workflow used here expects the data as a **list** of transactions. So, in this case, the DataFrame needs to be converted to a list of lists as a preprocessing step before modeling.

In [64]:
datafixx = datafix[['Item 1', 'Item 2', 'Item 3']].values.tolist()  # .values.tolist() converts the DataFrame rows to a list of lists
datafixx

[['HP', 'Racket', 'Watch'],
 ['HP', 'Camera', 'Watch'],
 ['Watch', 'Camera', 'Music Pad'],
 ['Camera', 'Watch', 'Mouse'],
 ['HP', 'Watch', 'Music Pad'],
 ['Watch', 'Racket', 'Camera'],
 ['HP', 'Camera', 'Watch'],
 ['Watch', 'Camera', 'Music Pad'],
 ['Racket', 'Soap', 'Guitar'],
 ['Racket', 'Camera', 'Guitar'],
 ['Camera', 'Bag', nan],
 ['Music Pad', 'Guitar', 'Camera'],
 ['Camera', 'Watch', nan],
 ['Guitar', 'Camera', 'Music Pad'],
 ['Camera', 'Watch', 'Music Pad'],
 ['Camera', 'Racket', 'Guitar'],
 ['Guitar', 'Camera', 'Watch'],
 ['Guitar', 'Watch', nan],
 ['Camera', 'Watch', nan],
 ['Soap', 'Bag', 'Guitar'],
 ['Router', 'Bag', nan],
 ['Watch', 'Racket', 'Soap'],
 ['Music Pad', 'Soap', 'Watch'],
 ['Mouse', 'Camera', 'Soap']]

# IV. Modeling by Apriori Algorithm
The Apriori algorithm finds frequent itemsets by iterating over the dataset level by level. A frequent itemset is an itemset whose frequency of occurrence reaches a predetermined threshold (the minimum support). <br>
Association rule learning is a data mining technique for finding associative rules between combinations of items. <br>
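
The level-wise search behind Apriori can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the mlxtend implementation: the function name `apriori_sketch` and the five-row `sample` (the first five transactions of this dataset) are chosen here for illustration only.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items.
    current = {}
    for item in sorted({i for basket in baskets for i in basket}):
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    result = {}
    k = 1
    while current:
        result.update(current)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        next_level = {}
        for c in candidates:
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in current for sub in combinations(c, k - 1)):
                s = support(c)
                if s >= min_support:
                    next_level[c] = s
        current = next_level
    return result

# Illustrative sample: the first five transactions of the dataset.
sample = [
    ['HP', 'Racket', 'Watch'],
    ['HP', 'Camera', 'Watch'],
    ['Watch', 'Camera', 'Music Pad'],
    ['Camera', 'Watch', 'Mouse'],
    ['HP', 'Watch', 'Music Pad'],
]
found = apriori_sketch(sample, min_support=0.4)
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 3))
```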
Terminologies in association rule learning, for a rule X => Y:
  1. Support is an indication of how frequently the itemset appears in the dataset.
  2. Confidence is an indication of how often the rule has been found to be true, that is, how often transactions containing X also contain Y.
  3. Lift is the ratio of the observed support to that expected if X and Y were independent.
  4. Conviction is the ratio of the expected frequency that X occurs without Y (if X and Y were independent) to the observed frequency of incorrect predictions.

Each rule consists of two itemsets: X is called the antecedent or left-hand side (LHS), and Y is called the consequent or right-hand side (RHS).
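
To make the four measures concrete, here is a small sketch that computes them by hand for a single rule X => Y. The helper name `rule_metrics` and the ten-row `sample` (the first ten transactions of this dataset) are assumptions made for illustration; the sketch also assumes the antecedent occurs at least once, otherwise confidence is undefined.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, lift and conviction of the rule antecedent => consequent."""
    n = len(transactions)
    x, y = set(antecedent), set(consequent)
    baskets = [set(t) for t in transactions]

    support_x = sum(x <= b for b in baskets) / n          # P(X)
    support_y = sum(y <= b for b in baskets) / n          # P(Y)
    support_xy = sum((x | y) <= b for b in baskets) / n   # P(X and Y)

    confidence = support_xy / support_x                   # P(Y | X)
    lift = confidence / support_y                         # P(X and Y) / (P(X) * P(Y))
    # Conviction is infinite when the rule never fails (confidence == 1).
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - support_y) / (1 - confidence)

    return {
        "support": support_xy,
        "confidence": confidence,
        "lift": lift,
        "conviction": conviction,
    }

# Illustrative sample: the first ten transactions of the dataset.
sample = [
    ['HP', 'Racket', 'Watch'],
    ['HP', 'Camera', 'Watch'],
    ['Watch', 'Camera', 'Music Pad'],
    ['Camera', 'Watch', 'Mouse'],
    ['HP', 'Watch', 'Music Pad'],
    ['Watch', 'Racket', 'Camera'],
    ['HP', 'Camera', 'Watch'],
    ['Watch', 'Camera', 'Music Pad'],
    ['Racket', 'Soap', 'Guitar'],
    ['Racket', 'Camera', 'Guitar'],
]
racket_guitar = rule_metrics(sample, ['Racket'], ['Guitar'])
hp_watch = rule_metrics(sample, ['HP'], ['Watch'])
print(racket_guitar)
print(hp_watch)
```

Because the sample holds only ten of the 24 transactions, these numbers will differ from the ones mlxtend reports on the full dataset later in the notebook.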

## Transaction Encoder
The data contains some 'nan' (null) entries, so we need to remove them from each transaction first; otherwise 'nan' would be treated as an item by the algorithm.<br>
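
Removing the missing entries can also be done programmatically instead of retyping the list. The following is a minimal sketch in plain Python; `drop_missing` is a hypothetical helper name, and the three-row `raw` list is an illustrative excerpt of the converted data.

```python
import math

def drop_missing(rows):
    """Return each transaction without its missing entries (NaN or None)."""
    def is_missing(value):
        # NaN values coming from a DataFrame are floats; item names are strings.
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [[item for item in row if not is_missing(item)] for row in rows]

nan = float("nan")
raw = [
    ['Camera', 'Bag', nan],
    ['Guitar', 'Camera', 'Watch'],
    ['Router', 'Bag', nan],
]
cleaned = drop_missing(raw)
print(cleaned)
```

Applied to the full `datafixx` list, a helper like this would be expected to give the same transactions that are typed out by hand in the cell below.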

Next, **TransactionEncoder** encodes the transaction data, given as a Python list of lists, into a NumPy array. This is a more memory-efficient representation when working with large datasets, as is common in Market Basket Analysis.

Using a **TransactionEncoder** object, we can transform the dataset into an array format suitable for typical machine learning APIs. Via the fit method, the **TransactionEncoder** learns the unique labels in the dataset, and via the transform method, it transforms the input dataset (a Python list of lists) into a one-hot encoded boolean array.
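
The idea behind the fit and transform steps can be illustrated without the library. This is a plain-Python sketch of the concept, not how mlxtend implements it; the function names `fit_columns` and `one_hot` and the three-row `sample` are assumptions for illustration, and the sorted column order simply mirrors the alphabetical order seen in the notebook output.

```python
def fit_columns(transactions):
    """'fit' step: collect the unique item labels, sorted alphabetically."""
    return sorted({item for t in transactions for item in t})

def one_hot(transactions, columns):
    """'transform' step: one boolean row per transaction, one column per label."""
    rows = []
    for t in transactions:
        present = set(t)
        rows.append([label in present for label in columns])
    return rows

sample = [
    ['HP', 'Racket', 'Watch'],
    ['Camera', 'Bag'],
    ['Camera', 'Watch'],
]
columns = fit_columns(sample)
encoded = one_hot(sample, columns)
print(columns)
for row in encoded:
    print(row)
```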

In [73]:
datafixx = [['HP', 'Racket', 'Watch'],
           ['HP', 'Camera', 'Watch'],
           ['Watch', 'Camera', 'Music Pad'],
           ['Camera', 'Watch', 'Mouse'],
           ['HP', 'Watch', 'Music Pad'],
           ['Watch', 'Racket', 'Camera'],
           ['HP', 'Camera', 'Watch'],
           ['Watch', 'Camera', 'Music Pad'],
           ['Racket', 'Soap', 'Guitar'],
           ['Racket', 'Camera', 'Guitar'],
           ['Camera', 'Bag'],
           ['Music Pad', 'Guitar', 'Camera'],
           ['Camera', 'Watch'],
           ['Guitar', 'Camera', 'Music Pad'],
           ['Camera', 'Watch', 'Music Pad'],
           ['Camera', 'Racket', 'Guitar'],
           ['Guitar', 'Camera', 'Watch'],
           ['Guitar', 'Watch'],
           ['Camera', 'Watch'],
           ['Soap', 'Bag', 'Guitar'],
           ['Router', 'Bag'],
           ['Watch', 'Racket', 'Soap'],
           ['Music Pad', 'Soap', 'Watch'],
           ['Mouse', 'Camera', 'Soap']]
te = TransactionEncoder()
te_ary = te.fit(datafixx).transform(datafixx)
df = pd.DataFrame(te_ary, columns=te.columns_)
df

| | Bag | Camera | Guitar | HP | Mouse | Music Pad | Racket | Router | Soap | Watch |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | True | False | False | True | False | False | True |
| 1 | False | True | False | True | False | False | False | False | False | True |
| 2 | False | True | False | False | False | True | False | False | False | True |
| 3 | False | True | False | False | True | False | False | False | False | True |
| 4 | False | False | False | True | False | True | False | False | False | True |
| 5 | False | True | False | False | False | False | True | False | False | True |
| 6 | False | True | False | True | False | False | False | False | False | True |
| 7 | False | True | False | False | False | True | False | False | False | True |
| 8 | False | False | True | False | False | False | True | False | True | False |
| 9 | False | True | True | False | False | False | True | False | False | False |


In [81]:
frequent_itemsets = apriori(df, min_support=0.05, use_colnames=True)
frequent_itemsets

| | support | itemsets |
|---|---|---|
| 0 | 0.125 | (Bag) |
| 1 | 0.666667 | (Camera) |
| 2 | 0.333333 | (Guitar) |
| 3 | 0.166667 | (HP) |
| 4 | 0.083333 | (Mouse) |
| 5 | 0.291667 | (Music Pad) |
| 6 | 0.25 | (Racket) |
| 7 | 0.208333 | (Soap) |
| 8 | 0.625 | (Watch) |
| 9 | 0.208333 | (Camera, Guitar) |


In [78]:
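# Rules filtered by confidence only; stored in its own variable
# (the original call discarded its result, so it had no effect)
rules_by_confidence = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.05)
# Rules used in the rest of the notebook: keep only those with lift >= 1.5
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.5)
# Number of items on the left-hand side of each rule
rules["antecedent_len"] = rules["antecedents"].apply(len)
rules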

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | antecedent_len |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (Mouse) | (Camera) | 0.083333 | 0.666667 | 0.083333 | 1.0 | 1.5 | 0.027778 | inf | 1 |
| 1 | (Camera) | (Mouse) | 0.666667 | 0.083333 | 0.083333 | 0.125 | 1.5 | 0.027778 | 1.047619 | 1 |
| 2 | (Racket) | (Guitar) | 0.25 | 0.333333 | 0.125 | 0.5 | 1.5 | 0.041667 | 1.333333 | 1 |
| 3 | (Guitar) | (Racket) | 0.333333 | 0.25 | 0.125 | 0.375 | 1.5 | 0.041667 | 1.2 | 1 |
| 4 | (Watch) | (HP) | 0.625 | 0.166667 | 0.166667 | 0.266667 | 1.6 | 0.0625 | 1.136364 | 1 |
| 5 | (HP) | (Watch) | 0.166667 | 0.625 | 0.166667 | 1.0 | 1.6 | 0.0625 | inf | 1 |
| 6 | (Racket) | (Soap) | 0.25 | 0.208333 | 0.083333 | 0.333333 | 1.6 | 0.03125 | 1.1875 | 1 |
| 7 | (Soap) | (Racket) | 0.208333 | 0.25 | 0.083333 | 0.4 | 1.6 | 0.03125 | 1.25 | 1 |
| 8 | (Music Pad, Guitar) | (Camera) | 0.083333 | 0.666667 | 0.083333 | 1.0 | 1.5 | 0.027778 | inf | 2 |
| 9 | (Camera) | (Music Pad, Guitar) | 0.666667 | 0.083333 | 0.083333 | 0.125 | 1.5 | 0.027778 | 1.047619 | 1 |


In [80]:
rules.shape

(16, 10)

# V. Conclusion
With the following minimum thresholds:
- support : 0.05
- confidence : 0.05
- lift : 1.5

we can draw the following conclusions:

**From the support results above, it can be seen that**
- The itemsets of length one with the highest support are:
  1. {Camera} with support 66.7%
  2. {Watch} with support 62.5%
  3. {Guitar} with support 33.3%
  
- The itemsets of length two with the highest support are:
  1. {Watch, Camera} with support 41.7%
  2. {Camera, Guitar}, {Music Pad, Camera}, {Watch, Music Pad} with support 20.8%
  3. {Watch, HP} with support 16.7%
  
- The itemsets of length three with the highest support are:
  1. {Watch, Music Pad, Camera} with support 12.5%
  2. The remaining itemsets of length three have support 8.3%
  
So, the three best-selling items are Camera, Watch, and Guitar. Meanwhile, the strongest item combination is {Watch, Camera}. This combination reflects customer transaction behavior, and the seller can use it to gain more profit, for example by increasing production of those two items, bundling them as a promo package, or using them to raise awareness of other products.

**From the rules above, it can be seen that**<br>
The most reliable rules are those with confidence = 1, which include:
- {Mouse} => {Camera}
- {HP} => {Watch}
- {Music Pad, Guitar} => {Camera}
- {Camera, HP} => {Watch}

This means that in 100% of the transactions in this dataset that contain the **antecedent** **{X}**, the **consequent** **{Y}** is present as well: every time a customer bought **{X}**, **{Y}** was bought too.

