# Market Basket Analysis

Market Basket Analysis is one of the key techniques large retailers use to uncover associations between items. It looks for combinations of items that frequently occur together in transactions, which lets retailers identify relationships between the items people buy. <br>
Association rules are widely used to analyze retail basket or transaction data. The goal is to identify strong rules in that data using measures of interestingness such as support, confidence, and lift. <br>

[ref](https://towardsdatascience.com/a-gentle-introduction-on-market-basket-analysis-association-rules-fa4b986a40ce)
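
To make "items that occur together frequently" concrete, here is a minimal sketch that counts how often each pair of items shares a transaction. The `transactions` list is made-up toy data, not the dataset used in this notebook.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions, invented for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

pair_counts = Counter()
for basket in transactions:
    # Sort and de-duplicate so ("bread", "milk") and ("milk", "bread")
    # are counted as the same pair.
    for pair in combinations(sorted(set(basket)), 2):
        pair_counts[pair] += 1

# Share of transactions that contain each pair (the pair's support).
pair_support = {pair: n / len(transactions) for pair, n in pair_counts.items()}
print(pair_support)
```

Counting every pair like this only scales to small item catalogs, which is one reason to use an algorithm such as Apriori on real data.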

* * *

## Library 

The libraries used in this notebook:

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori

##  Read CSV

Load the dataset from a CSV file, then drop the `Name` column.

In [10]:
df = pd.read_csv('data/form.csv',delimiter=';')

In [11]:
df.drop('Name',axis=1,inplace=True)

In [12]:
df.shape

(24, 3)

## df.shape

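Get the number of rows and columns from `df.shape`, then convert the DataFrame into a list of lists called `records`, which is used for the association rules later. The `apriori` function expects the transactions as lists, so each row becomes one inner list.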

In [13]:
count_row = df.shape[0]
count_col = df.shape[1]

In [14]:
records = []
for i in range(count_row):
    records.append([str(df.values[i,j]) for j in range(count_col)])
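
Note that `str()` turns a missing cell into the literal string `'nan'`, which is then treated as an item (it appears as `Rule: nan` in the results further down). Below is a hedged sketch of one way to skip missing cells when building the records. `row_to_items` is a hypothetical helper and the sample rows are invented; with the notebook's data the rows would come from `df.values`.

```python
import math

def row_to_items(row):
    """Return the row's cells as strings, skipping missing values (None or NaN)."""
    items = []
    for cell in row:
        if cell is None:
            continue
        if isinstance(cell, float) and math.isnan(cell):
            continue
        items.append(str(cell))
    return items

# Invented sample rows for illustration only.
sample_rows = [
    ["Camera", "Watch", float("nan")],
    ["Guitar", None, "Soap"],
]
records_clean = [row_to_items(row) for row in sample_rows]
print(records_clean)  # [['Camera', 'Watch'], ['Guitar', 'Soap']]
```

Cleaning the records this way would change the supports, so the numbers would no longer match the output shown in this notebook.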

## Association Rules (Apriori)

Association rule mining is a technique to identify underlying relations between different items. <br>

[ref](https://stackabuse.com/association-rule-mining-via-apriori-algorithm-in-python/)

In [15]:
association_rules = apriori(records, min_support=0.2, min_confidence=0.2, min_lift=1, min_length=2)
association_results = list(association_rules)

## See the result

After running the Apriori algorithm, we want to see what it produced, so we loop over the results and print them.

In [16]:
for item in association_results:

    # item[0] is the itemset (a frozenset), so the order of the
    # items printed here is not necessarily the rule's direction
    pair = item[0]
    items = list(pair)

    if len(items) == 1:
        print("Rule: " + items[0])
    elif len(items) == 2:
        print("Rule: " + items[0] + " -> " + items[1])
    elif len(items) == 3:
        print("Rule: " + items[0] + " -> " + items[1] + " -> " + items[2])

    # item[1] is the support of the itemset
    print("Support: " + str(item[1]))

    # item[2] is the list of ordered statistics; item[2][0] is the first one,
    # whose fields are (items_base, items_add, confidence, lift)
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")
    print('')

Rule: Camera
Support: 0.6666666666666666
Confidence: 0.6666666666666666
Lift: 1.0

Rule: Guitar
Support: 0.3333333333333333
Confidence: 0.3333333333333333
Lift: 1.0

Rule: Music Pad
Support: 0.2916666666666667
Confidence: 0.2916666666666667
Lift: 1.0

Rule: Racket
Support: 0.25
Confidence: 0.25
Lift: 1.0

Rule: Soap
Support: 0.20833333333333334
Confidence: 0.20833333333333334
Lift: 1.0

Rule: Watch
Support: 0.625
Confidence: 0.625
Lift: 1.0

Rule: nan
Support: 0.20833333333333334
Confidence: 0.20833333333333334
Lift: 1.0

Rule: Camera -> Music Pad
Support: 0.20833333333333334
Confidence: 0.31250000000000006
Lift: 1.0714285714285716

Rule: Watch -> Camera
Support: 0.4166666666666667
Confidence: 0.6250000000000001
Lift: 1.0000000000000002

Rule: Watch -> Music Pad
Support: 0.20833333333333334
Confidence: 0.7142857142857143
Lift: 1.1428571428571428
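
One caveat about the loop above: `item[0]` is a frozenset, so the order in which its items are printed does not necessarily match the direction of the rule whose confidence and lift are shown. The direction is stored in the first two fields of each ordered statistic (`items_base` and `items_add`). The sketch below prints a rule from those fields. The two namedtuples are stand-ins that mirror the field names of apyori's result records so the example runs without the library, and `format_rule` is a hypothetical helper. The sample values are copied from the last result above; the choice of Music Pad as the base is inferred from the numbers (0.2083 / 0.2917 ≈ 0.7143).

```python
from collections import namedtuple

# Stand-ins for apyori's result types (assumption: same field names and order).
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"])

def format_rule(stat):
    """Render 'base -> add' from one ordered statistic."""
    base = ", ".join(sorted(stat.items_base)) or "(nothing)"
    add = ", ".join(sorted(stat.items_add))
    return f"{base} -> {add}"

record = RelationRecord(
    items=frozenset(["Watch", "Music Pad"]),
    support=0.20833333333333334,
    ordered_statistics=[
        OrderedStatistic(frozenset(["Music Pad"]), frozenset(["Watch"]),
                         0.7142857142857143, 1.1428571428571428),
    ],
)

for stat in record.ordered_statistics:
    print("Rule:", format_rule(stat))
    print("Confidence:", stat.confidence)
    print("Lift:", stat.lift)
```

Looping over all ordered statistics, rather than only the first, also shows the reverse direction of a pair when it passes the thresholds.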



## Explanation of the results

The first seven results contain only one item, so we focus on the last three. <br>
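
The single-item results suggest that `min_length=2` had no effect in the `apriori` call. Rather than skipping those results by eye, they can be filtered out in code. This is a sketch: `RelationRecord` is a stand-in namedtuple that mirrors the field names of apyori's result records so the example runs without the library, and the sample values are copied from the output above.

```python
from collections import namedtuple

# Stand-in for the records returned by apyori (assumption: same field names).
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"])

sample_results = [
    RelationRecord(frozenset(["Camera"]), 0.6666666666666666, []),
    RelationRecord(frozenset(["Watch"]), 0.625, []),
    RelationRecord(frozenset(["Camera", "Watch"]), 0.4166666666666667, []),
]

# Keep only itemsets with at least two items.
multi_item = [r for r in sample_results if len(r.items) >= 2]
for r in multi_item:
    print(sorted(r.items), r.support)
```

With the notebook's data, the same list comprehension would be applied to `association_results`.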

Support is an indication of how frequently the itemset appears in the dataset. <br>
Confidence is an indication of how often the rule has been found to be true. <br>
Lift is the ratio of the observed support to that expected if X and Y were independent. <br>
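
The three definitions above can be checked against the reported numbers. In this sketch the transaction counts are not read from the dataset. They are inferred by multiplying each reported support by the 24 rows, and the helper functions are hypothetical names used only here.

```python
from fractions import Fraction

n = 24  # number of transactions (rows in df)

# Counts inferred from the reported supports (support * 24), not read from the data.
count = {
    frozenset(["Camera"]): 16,
    frozenset(["Watch"]): 15,
    frozenset(["Music Pad"]): 7,
    frozenset(["Camera", "Music Pad"]): 5,
    frozenset(["Camera", "Watch"]): 10,
    frozenset(["Music Pad", "Watch"]): 5,
}

def support(items):
    """Fraction of transactions that contain every item in `items`."""
    return Fraction(count[frozenset(items)], n)

def confidence(base, add):
    """Of the transactions containing `base`, the share that also contain `add`."""
    return support(set(base) | set(add)) / support(base)

def lift(base, add):
    """Confidence divided by the support of `add`; 1 means independence."""
    return confidence(base, add) / support(add)

rules = [(["Camera"], ["Music Pad"]), (["Camera"], ["Watch"]), (["Music Pad"], ["Watch"])]
for base, add in rules:
    print(base, "->", add,
          float(support(base + add)),
          float(confidence(base, add)),
          float(lift(base, add)))
```

Up to floating-point rounding, the printed values agree with the support, confidence, and lift reported for the three two-item results.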

1. Rule: Camera -> Music Pad <br>
Support: 0.20833333333333334 <br>
Confidence: 0.31250000000000006 <br>
Lift: 1.0714285714285716 <br>

This rule has the lowest confidence of the three, so it is the one that holds least often: only about 31% of the transactions that contain Camera also contain Music Pad.

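2. Rule: Watch -> Camera <br>
Support: 0.4166666666666667 <br>
Confidence: 0.6250000000000001 <br>
Lift: 1.0000000000000002 <br>

This rule has good confidence and the highest support of the three: Watch and Camera appear together in about 42% of all transactions. The confidence of 0.625 equals 0.4167 / 0.6667, the pair's support divided by Camera's support, so it says that about 62% of Camera buyers also buy a Watch. The lift is almost exactly 1, however, which means the two items appear together about as often as they would if they were independent.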

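3. Rule: Watch -> Music Pad <br>
Support: 0.20833333333333334 <br>
Confidence: 0.7142857142857143 <br>
Lift: 1.1428571428571428 <br>

This rule has the highest confidence of the three, but its support is low. The confidence of 0.714 equals 0.2083 / 0.2917, the pair's support divided by Music Pad's support, so about 71% of Music Pad buyers also buy a Watch. Based on that, we can offer a Watch to customers who buy a Music Pad. The reverse direction is much weaker: only 0.2083 / 0.625, about 33%, of Watch buyers also buy a Music Pad.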