<h1 align="center"><font size="5">OVERVIEW</font></h1>

## 1. Introduction
**Market Basket Analysis** is the process of discovering frequent item sets in large transactional databases. Put another way, frequent item set mining leads to the discovery of associations and correlations among items. <br>
Large retailers use this technique to uncover associations between items. It works by looking for combinations of items that frequently occur together in transactions, which helps explain purchase behavior. The outcome, in simple terms, is a set of rules that can be read as “if this, then that”.
### Story
In this kernel we use the Apriori algorithm to perform a Market Basket Analysis for Laili Special Bakery. Laili Special Bakery wants to improve its production and effectiveness, so it has hired a Data Scientist, who plans to apply market basket analysis.
### Definition of Algorithm
Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.<br>

Based on the concept of strong rules, Rakesh Agrawal, Tomasz Imieliński and Arun Swami introduced association rules for discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets. For example, the rule **{onions,potatoes} ⇒ {burger}** found in the sales data of a supermarket would indicate that a customer who buys onions and potatoes together is likely to also buy hamburger meat. Such information can be used as the basis for decisions about marketing activities such as promotional pricing or product placement.
### 1.1 Support
Support is an indication of how frequently the item set appears in the data set.
\begin{equation*}
\mathrm{supp}(X \Rightarrow Y) = \frac{|X \cup Y|}{n}
\end{equation*}

In other words, it is the number of transactions containing both X and Y divided by the total number of transactions n. Rules with low support are rarely useful, because they are backed by too few transactions.
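
As a minimal sketch of the support formula, the snippet below counts how many transactions contain a given item set. The toy transactions are the first five rows of the dataset displayed in section 2, used here purely for illustration; the helper name `support` is an assumption of this sketch, not part of any library.

```python
# Toy transactions: the first five rows of the dataset shown in section 2,
# used only for illustration.
transactions = [
    {"MILK", "BREAD", "BISCUIT"},
    {"BREAD", "MILK", "BISCUIT", "CORNFLAKES"},
    {"BREAD", "TEA", "BOURNVITA"},
    {"JAM", "MAGGI", "BREAD", "MILK"},
    {"MAGGI", "TEA", "BISCUIT"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


# 3 of the 5 toy transactions contain both MILK and BREAD.
supp_milk_bread = support({"MILK", "BREAD"}, transactions)
print(supp_milk_bread)
```

Because support depends only on the union of the items, `support({"MILK", "BREAD"}, ...)` is the support of both MILK ⇒ BREAD and BREAD ⇒ MILK.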
### 1.2 Confidence
For a rule X⇒Y, confidence is the proportion of transactions containing X that also contain Y. It indicates how often the rule has been found to be true.
\begin{equation*}
\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}
\end{equation*}
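
The confidence formula can be sketched in a few lines. The toy transactions below are the first five rows of the dataset displayed in section 2, and the helper names `support` and `confidence` are assumptions made for this illustration.

```python
# Toy transactions: the first five rows of the dataset shown in section 2.
transactions = [
    {"MILK", "BREAD", "BISCUIT"},
    {"BREAD", "MILK", "BISCUIT", "CORNFLAKES"},
    {"BREAD", "TEA", "BOURNVITA"},
    {"JAM", "MAGGI", "BREAD", "MILK"},
    {"MAGGI", "TEA", "BISCUIT"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """conf(X => Y) = supp(X u Y) / supp(X)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Every toy transaction with MILK also has BREAD, while only 3 of the 4
# transactions with BREAD also have MILK.
conf_milk_bread = confidence({"MILK"}, {"BREAD"}, transactions)
conf_bread_milk = confidence({"BREAD"}, {"MILK"}, transactions)
print(round(conf_milk_bread, 3), round(conf_bread_milk, 3))
```

Unlike support, confidence is directional: swapping antecedent and consequent generally changes the value.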
### 1.3 Lift
The lift of a rule is the ratio of the observed support to that expected if X and Y were independent, and is defined as
\begin{equation*}
\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)\,\mathrm{supp}(Y)}
\end{equation*}
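
A small sketch of the lift formula follows, again on the first five rows of the dataset displayed in section 2. The helper names `support` and `lift` are assumptions made for this illustration.

```python
# Toy transactions: the first five rows of the dataset shown in section 2.
transactions = [
    {"MILK", "BREAD", "BISCUIT"},
    {"BREAD", "MILK", "BISCUIT", "CORNFLAKES"},
    {"BREAD", "TEA", "BOURNVITA"},
    {"JAM", "MAGGI", "BREAD", "MILK"},
    {"MAGGI", "TEA", "BISCUIT"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def lift(antecedent, consequent, transactions):
    """lift(X => Y) = supp(X u Y) / (supp(X) * supp(Y))."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / (
        support(antecedent, transactions) * support(consequent, transactions)
    )


# MILK and BREAD co-occur more often than independence would predict
# (lift above 1); BREAD and TEA co-occur less often (lift below 1).
lift_milk_bread = lift({"MILK"}, {"BREAD"}, transactions)
lift_bread_tea = lift({"BREAD"}, {"TEA"}, transactions)
print(round(lift_milk_bread, 3), round(lift_bread_tea, 3))
```

A lift of exactly 1 would mean the two sides of the rule appear together as often as expected if they were independent.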
### 1.4 Conviction
The conviction of a rule is defined as
\begin{equation*}
\mathrm{conv}(X \Rightarrow Y) = \frac{1 - \mathrm{supp}(Y)}{1 - \mathrm{conf}(X \Rightarrow Y)}
\end{equation*}

It can be interpreted as the ratio of the expected frequency with which X occurs without Y (if X and Y were independent) to the observed frequency of incorrect predictions. A high value means that the consequent depends strongly on the antecedent.
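
The conviction formula can be sketched as below, using the first five rows of the dataset displayed in section 2. The helper names are assumptions made for this illustration, and returning infinity when the confidence is 1 is a convention chosen here to avoid dividing by zero.

```python
import math

# Toy transactions: the first five rows of the dataset shown in section 2.
transactions = [
    {"MILK", "BREAD", "BISCUIT"},
    {"BREAD", "MILK", "BISCUIT", "CORNFLAKES"},
    {"BREAD", "TEA", "BOURNVITA"},
    {"JAM", "MAGGI", "BREAD", "MILK"},
    {"MAGGI", "TEA", "BISCUIT"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def conviction(antecedent, consequent, transactions):
    """conv(X => Y) = (1 - supp(Y)) / (1 - conf(X => Y))."""
    both = set(antecedent) | set(consequent)
    conf = support(both, transactions) / support(antecedent, transactions)
    if conf >= 1.0:
        # The rule never fails in the data, so the denominator is zero.
        return math.inf
    return (1 - support(consequent, transactions)) / (1 - conf)


conv_bread_milk = conviction({"BREAD"}, {"MILK"}, transactions)
conv_milk_bread = conviction({"MILK"}, {"BREAD"}, transactions)
print(round(conv_bread_milk, 3), conv_milk_bread)
```

In the toy data, MILK ⇒ BREAD is never wrong, so its conviction is unbounded, while BREAD ⇒ MILK gets a finite value above 1.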

## 2. Loading Data

#### Load Library

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from apyori import apriori

#### Importing the Dataset

In [2]:
supermarket = pd.read_csv('../Dataset/GroceryStoreDataSet.csv', index_col=False)
supermarket

Unnamed: 0,ITEM1,ITEM2,ITEM3,ITEM4
0,MILK,BREAD,BISCUIT,
1,BREAD,MILK,BISCUIT,CORNFLAKES
2,BREAD,TEA,BOURNVITA,
3,JAM,MAGGI,BREAD,MILK
4,MAGGI,TEA,BISCUIT,
5,BREAD,TEA,BOURNVITA,
6,MAGGI,TEA,CORNFLAKES,
7,MAGGI,BREAD,TEA,BISCUIT
8,JAM,MAGGI,BREAD,TEA
9,BREAD,MILK,,


In [3]:
num_records = len(supermarket)
num_records

20

#### Data Preprocessing

In [4]:
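# Convert each row into a list of item strings for apriori.
# Note: range(0, 3) reads only the first three columns, and str() turns
# missing values (NaN) into the literal string 'nan'.
records = []
for i in range(num_records):
    records.append([str(supermarket.values[i, j])
                    for j in range(0, 3)])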
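
As a hedged alternative to the preprocessing cell, the sketch below keeps every column and skips empty cells, so no `'nan'` item ends up in the transactions. The helper `clean_row` and the sample rows are assumptions made for this illustration; the rows copy the first three records displayed above, with `float("nan")` standing in for the empty cells.

```python
import math


def clean_row(row):
    """Return the row's items as strings, skipping None and NaN cells."""
    return [
        str(value)
        for value in row
        if value is not None
        and not (isinstance(value, float) and math.isnan(value))
    ]


# Stand-in for the first three rows of the dataset; NaN marks an empty cell.
raw_rows = [
    ["MILK", "BREAD", "BISCUIT", float("nan")],
    ["BREAD", "MILK", "BISCUIT", "CORNFLAKES"],
    ["BREAD", "TEA", "BOURNVITA", float("nan")],
]

records_clean = [clean_row(row) for row in raw_rows]
print(records_clean)
```

With the notebook's DataFrame, the same helper could be applied as `[clean_row(row) for row in supermarket.values]`. Rules mined from such records would differ from the results recorded below, which were produced with the original cell.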

## 3. Applying Apriori
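Specify the arguments of the `apriori` function:
- the list of transactions (`records`)
- min_support
- min_confidence
- min_lift
- min_length (the minimum number of items that you want in your rules, typically 2). Note that apyori's documented length option is `max_length`, so this argument may have no effect.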
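
To illustrate how the three thresholds act as filters, here is a brute-force sketch over single-item rules. It is not the Apriori algorithm itself (there is no level-wise candidate pruning) and not apyori's implementation; the function `pair_rules`, the threshold values and the toy transactions (the first five rows of the dataset shown in section 2) are all assumptions made for this illustration.

```python
from itertools import permutations

# Toy transactions: the first five rows of the dataset shown in section 2.
transactions = [
    {"MILK", "BREAD", "BISCUIT"},
    {"BREAD", "MILK", "BISCUIT", "CORNFLAKES"},
    {"BREAD", "TEA", "BOURNVITA"},
    {"JAM", "MAGGI", "BREAD", "MILK"},
    {"MAGGI", "TEA", "BISCUIT"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def pair_rules(transactions, min_support, min_confidence, min_lift):
    """Brute-force rules x -> y (single items) that pass all thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in permutations(items, 2):
        supp_xy = support({x, y}, transactions)
        if supp_xy < min_support:
            continue  # too rare to be considered at all
        conf = supp_xy / support({x}, transactions)
        lift = conf / support({y}, transactions)
        if conf >= min_confidence and lift >= min_lift:
            rules.append((x, y, supp_xy, conf, lift))
    return rules


rules = pair_rules(transactions, min_support=0.3,
                   min_confidence=0.5, min_lift=1.2)
for x, y, supp_xy, conf, lift in rules:
    print(f"{x} -> {y}: support={supp_xy:.2f}, "
          f"confidence={conf:.2f}, lift={lift:.2f}")
```

Raising any threshold can only shrink the list of rules, which is why low values such as `min_support=0.05` are used on small datasets.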

In [5]:
# Set the thresholds and run apriori on the transaction list
association_rules = apriori(records,
                           min_support=0.05,
                           min_confidence=0.5,
                           min_lift=3,
                           min_length=2)
association_results = list(association_rules)
print(len(association_results))

12


In [6]:
# Inspect the first association result
print(association_results[0])

RelationRecord(items=frozenset({'JAM', 'MAGGI'}), support=0.1, ordered_statistics=[OrderedStatistic(items_base=frozenset({'JAM'}), items_add=frozenset({'MAGGI'}), confidence=1.0, lift=4.0)])


In [7]:
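results = []
for item in association_results:
    # item[0] is the frozenset of all items in the rule
    pair = item[0]
    items = [x for x in pair]

    # Caveat: a frozenset is unordered, so items[0] -> items[1] may not match
    # the antecedent/consequent stored in item[2][0], and only the first two
    # items of larger item sets are shown
    value0 = items[0] + " -> " + items[1]
    value1 = str(item[1])        # support of the item set
    value2 = str(item[2][0][2])  # confidence of the first ordered statistic
    value3 = str(item[2][0][3])  # lift of the first ordered statistic

    rows = (value0, value1, value2, value3)
    results.append(rows)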
    
labels = ['Rule','Support', 'Confidence', 'Lift']
supermarket_suggestion = pd.DataFrame.from_records(results, columns=labels)
supermarket_suggestion

Unnamed: 0,Rule,Support,Confidence,Lift
0,JAM -> MAGGI,0.1,1.0,4.0
1,MILK -> nan,0.05,1.0,5.0
2,MILK -> BISCUIT,0.1,0.6666666666666667,3.333333333333333
3,COCK -> BISCUIT,0.1,1.0,6.666666666666667
4,TEA -> BISCUIT,0.05,1.0,3.333333333333333
5,BOURNVITA -> BREAD,0.1,0.6666666666666667,3.333333333333333
6,BOURNVITA -> SUGER,0.05,1.0,3.333333333333333
7,JAM -> BREAD,0.1,1.0,4.0
8,MILK -> nan,0.05,1.0,5.0
9,SUGER -> COFFEE,0.05,1.0,3.333333333333333
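
The rule strings can also be built from the named fields visible in the printed `RelationRecord` (`items_base`, `items_add`, `confidence`, `lift`) instead of from the unordered item set. The sketch below is one possible way to do that; the function `rules_to_rows` is an assumption of this illustration, and the two namedtuples are stand-ins that mirror the printed record so the snippet runs without apyori installed.

```python
from collections import namedtuple

# Stand-ins mirroring the fields of the record printed by cell In [6].
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)

example_results = [
    RelationRecord(
        items=frozenset({"JAM", "MAGGI"}),
        support=0.1,
        ordered_statistics=[
            OrderedStatistic(frozenset({"JAM"}), frozenset({"MAGGI"}), 1.0, 4.0)
        ],
    )
]


def rules_to_rows(results):
    """One row per ordered statistic: antecedent -> consequent plus metrics."""
    rows = []
    for record in results:
        for stat in record.ordered_statistics:
            antecedent = ", ".join(sorted(stat.items_base))
            consequent = ", ".join(sorted(stat.items_add))
            rows.append((f"{antecedent} -> {consequent}",
                         record.support, stat.confidence, stat.lift))
    return rows


rows = rules_to_rows(example_results)
print(rows)
```

In the notebook, `rules_to_rows(association_results)` could be passed to `pd.DataFrame.from_records(..., columns=labels)` exactly as in the cell above. The rules it lists may differ in direction from the recorded table.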


<h1 align="center"><font size="5">CONCLUSION</font></h1>
Very cool! The analysis above gives us meaningful results: the higher the lift value, the stronger the association between the items (a lift above 1 means the items are bought together more often than independence would predict). Bread shows up as a popular consequent, which makes sense because it is a bakery. Besides bread, let's look at the more interesting item associations (format: antecedent(s) -> consequent):

- JAM -> MAGGI
- MILK -> nan
- BISCUIT -> COFFEE
- TEA -> BISCUIT
- COFFEE -> SUGER
- BREAD -> MAGGI
- BREAD -> MILK
- COFFEE -> CORNFLAKES
- TEA -> COFFEE
- TEA -> MAGGI

So how is this knowledge useful for Laili Special Bakery? Businesses are always looking to optimize their setup and drive up their sales. Bakeries are no different, and this kind of analysis could be done for any kind of retail store or marketplace as well. Because the Data Scientist team now knows the associations between items and the common interests of the customers, the business can make decisions based on these findings. For example, Laili Special Bakery might want to place its freshly baked bread near its pastries if customers who purchase pastries also tend to buy bread. Besides product placement, Laili Special Bakery might also be interested in running a promotion with a free item, given the good chance that another item is sold as a result. For example, if it gave away some of its special toast for free one day, that might not only attract new regular customers, but there is also a good chance that those customers would still spend money on bread.