# Association Rule Learning in Data Science 

## LEC NO. 51) Apriori Algorithm:

##### "How do we solve Market Basket Analysis problems?" ANSWER:

###### Here we implement the Apriori algorithm on a dataset of 1000 transactions recorded over the course of a week at a French retail store. The retailer wants to find associations between the products in his shop so that he can offer his customers "Buy this and get that" deals.
###### The retailer's dataset is a list of the transactions made by his customers. Each row lists the products purchased in one transaction.
###### We will solve this problem in the steps below. Before we begin coding, we need to install the apyori package. To install it, run the following command in a Jupyter notebook.

In [101]:
!pip install apyori



### STEP-1) Importing Required Libraries
Let's first load the required libraries.

1) Numpy for carrying out efficient computations

2) Matplotlib for visualization of data

3) Pandas for handling DataFrames

4) The apriori function, imported from the apyori package

In [102]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori

### STEP-2)  Importing the Dataset
Now let's import the dataset and see what it
looks like, how many transactions it contains, and
what its shape is.

In [103]:
# Raw string (r'...') so the backslashes in the Windows path are not read as escape sequences
data = pd.read_csv(r'F:\DataScience\Store_data.csv')
data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


In the dataset above, each row is one transaction
made by a customer. The first row is the transaction
of the first customer. The columns have no meaningful
names: they are simply numbered positions, and each
cell holds one product from that transaction. Cells
are empty when a transaction has fewer than 20 products.


In [104]:
# printing the shape of dataframe
data.shape

(1000, 20)

In [105]:
# printing the size of dataframe
data.size

20000

### STEP-3) Data Preprocessing
The Apriori library we are going to use requires our
dataset to be in the form of a list of lists, where the
whole dataset is a big list and each transaction in
the dataset is an inner list within the outer big list.

Currently, we have data in the form of a pandas
dataframe. To convert our pandas dataframe into a
list of lists, execute the following code.

In [106]:
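# Build Transactions as a list of lists: one inner list of product names per row.
# str() turns every cell into a string, so empty cells (NaN) become the string 'nan'.
Transactions = []

n_rows, n_cols = data.shape  # (1000, 20) for this dataset
for i in range(n_rows):
    Transactions.append([str(data.values[i, j]) for j in range(n_cols)])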

In [107]:
# The same script written out as explicit nested loops:
# Transactions=[]
# for i in range(0,1000):
#     temp=[]
#     for j in range(0,20):
#         temp.append(str(data.values[i,j]))
#     Transactions.append(temp)
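
Because `str()` converts empty cells to the string 'nan', every transaction with fewer than 20 products carries 'nan' as if it were a product. A minimal sketch of one way to strip these placeholders follows. The helper name `drop_missing` and the small sample list are illustrative assumptions, not part of the original notebook.

```python
# Hypothetical helper (not part of the original notebook): remove the 'nan'
# placeholders that str() produces for empty DataFrame cells.
def drop_missing(transactions, placeholder="nan"):
    """Return a copy of transactions without the placeholder item."""
    return [[item for item in t if item != placeholder] for t in transactions]

# Small sample in the same shape as the Transactions list built above.
sample = [
    ["burgers", "meatballs", "eggs", "nan", "nan"],
    ["chutney", "nan", "nan", "nan", "nan"],
    ["turkey", "avocado", "nan", "nan", "nan"],
]

cleaned = drop_missing(sample)
print(cleaned)
```

Calling `drop_missing(Transactions)` before mining would keep 'nan' out of the itemsets. The outputs shown in this notebook were produced without this step.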

In [108]:
# Number of transactions; this should match the number of rows in the DataFrame above
print(len(Transactions))
# The first 5 transactions:
Transactions[0:5]

1000


[['shrimp',
  'almonds',
  'avocado',
  'vegetables mix',
  'green grapes',
  'whole weat flour',
  'yams',
  'cottage cheese',
  'energy drink',
  'tomato juice',
  'low fat yogurt',
  'green tea',
  'honey',
  'salad',
  'mineral water',
  'salmon',
  'antioxydant juice',
  'frozen smoothie',
  'spinach',
  'olive oil'],
 ['burgers',
  'meatballs',
  'eggs',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan'],
 ['chutney',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan'],
 ['turkey',
  'avocado',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan'],
 ['mineral water',
  'milk',
  'energy bar',
  'whole wheat rice',
  'green tea',
  'nan',
  'nan',
  'nan',
 

### STEP-4) Training the Apriori Model on the dataset
To train the model, we will use the apriori
function imported from the apyori package.
This function returns the association rules
mined from the dataset.

In [109]:
from apyori import apriori
rules = apriori(transactions=Transactions, min_support=0.003, min_confidence=0.2, min_lift=3, min_length=2, max_length=2)

In the code above, the first line imports the
apriori function. In the second line, the apriori
function returns the rules. It takes
the following parameters:

>transactions: The list of transactions. Here we pass Transactions.

>min_support: The minimum support, as a float. Here we use 0.003 (3/1000), so a combination of products must appear in at least 3 of the 1000 transactions recorded during the week.

>min_confidence: The minimum confidence. Here we use 0.2. It can be changed to suit the business problem.

>min_lift: The minimum lift. Here we use 3.

>min_length: Intended as the minimum number of products in an association. Here we use 2. This option does not appear among the keyword arguments listed in apyori's documentation (min_support, min_confidence, min_lift, max_length), so it may be ignored.

>max_length: The maximum number of products in an association. Here we use 2.
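
To make the effect of the three thresholds concrete, here is a minimal brute-force sketch that scores every ordered pair of items in a toy dataset and keeps only the pairs that pass `min_support`, `min_confidence`, and `min_lift`. It is plain pair enumeration without Apriori's candidate pruning, and it does not use apyori. The function name `pair_rules` and the toy transactions are assumptions made for illustration.

```python
from itertools import permutations

def pair_rules(transactions, min_support, min_confidence, min_lift):
    """Brute-force A -> B rules over item pairs (illustrative, no pruning)."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))
    # Support of a single item = fraction of baskets that contain it.
    support = {i: sum(i in b for b in baskets) / n for i in items}
    rules = []
    for a, b in permutations(items, 2):
        # Support of the pair = fraction of baskets that contain both items.
        support_ab = sum(a in bk and b in bk for bk in baskets) / n
        if support_ab < min_support:
            continue
        confidence = support_ab / support[a]   # P(b | a)
        lift = confidence / support[b]         # P(b | a) / P(b)
        if confidence >= min_confidence and lift >= min_lift:
            rules.append((a, b, support_ab, confidence, lift))
    return rules

# Toy data, invented for this example.
toy = [
    ["almonds", "burgers"],
    ["almonds", "burgers"],
    ["milk"],
    ["milk", "eggs"],
    ["eggs"],
    ["milk", "eggs"],
]

found = pair_rules(toy, min_support=0.3, min_confidence=0.5, min_lift=2)
for a, b, sup, conf, lift in found:
    print(f"{a} -> {b}: support={sup:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```

In this toy data, milk and eggs appear together as often as almonds and burgers, but each is common on its own, so their lift stays below the threshold and only the almonds/burgers pair survives.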

### STEP-5) Displaying the Association Rules
To display the rules produced by the apriori
function, we first collect them into a list.

In [110]:
associations= list(rules)
print(len(associations))

95


In [111]:
print("Running the code above gives", len(associations), "rules.")

Running the code above gives 95 rules.


In [112]:
associations

[RelationRecord(items=frozenset({'almonds', 'burgers'}), support=0.01, ordered_statistics=[OrderedStatistic(items_base=frozenset({'almonds'}), items_add=frozenset({'burgers'}), confidence=0.35714285714285715, lift=4.464285714285714)]),
 RelationRecord(items=frozenset({'almonds', 'soup'}), support=0.008, ordered_statistics=[OrderedStatistic(items_base=frozenset({'almonds'}), items_add=frozenset({'soup'}), confidence=0.2857142857142857, lift=3.913894324853229)]),
 RelationRecord(items=frozenset({'frozen smoothie', 'antioxydant juice'}), support=0.003, ordered_statistics=[OrderedStatistic(items_base=frozenset({'antioxydant juice'}), items_add=frozenset({'frozen smoothie'}), confidence=0.33333333333333337, lift=5.376344086021506)]),
 RelationRecord(items=frozenset({'asparagus', 'milk'}), support=0.003, ordered_statistics=[OrderedStatistic(items_base=frozenset({'asparagus'}), items_add=frozenset({'milk'}), confidence=0.5, lift=3.676470588235294)]),
 RelationRecord(items=frozenset({'babies f

### Understanding the Association Rules:

The rules are not ranked by relevance. In the
output above they appear in alphabetical order of
the items, so the topmost rule is not necessarily
the strongest one. To rank the rules, compare their
support, confidence, and lift values.
Let's have a look at the first association rule
as an example.


In [113]:
associations[0]

RelationRecord(items=frozenset({'almonds', 'burgers'}), support=0.01, ordered_statistics=[OrderedStatistic(items_base=frozenset({'almonds'}), items_add=frozenset({'burgers'}), confidence=0.35714285714285715, lift=4.464285714285714)])

The output above describes an association
between two items, 'almonds' and 'burgers'. The
rule states that customers who buy almonds
also tend to buy burgers.

>item-A) items_base = 'almonds' (the antecedent: the item already in the basket)

>item-B) items_add = 'burgers' (the consequent: the item added to it)

>The support of 0.01 is the number of transactions containing both almonds and burgers divided by the total number of transactions: 10 of the 1000 transactions, or 1%.

>The confidence of 0.3571 tells us that, of the transactions that contain almonds, 35.71% also contain burgers. Hence, if a customer buys almonds, there is a 35.71% chance that he also buys burgers.

>Finally, the lift of 4.46 tells us that a customer who buys almonds is 4.46 times as likely to buy burgers as customers in general.
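
As a worked check of these three numbers, the sketch below recomputes them from transaction counts. The count of 10 joint transactions follows from the reported support of 0.01 over 1000 transactions. The counts of 28 almond transactions and 80 burger transactions are not printed anywhere in the output; they are inferred here from the reported confidence and lift, so treat them as assumptions.

```python
# Counts for the rule almonds -> burgers.
n_total = 1000      # transactions in the dataset
n_both = 10         # contain almonds AND burgers (from support = 0.01)
n_almonds = 28      # assumed: inferred from confidence = 10 / 28
n_burgers = 80      # assumed: inferred from lift = confidence / 0.08

support = n_both / n_total                 # P(almonds and burgers)
confidence = n_both / n_almonds            # P(burgers | almonds)
lift = confidence / (n_burgers / n_total)  # P(burgers | almonds) / P(burgers)

print(f"support={support:.3f}, confidence={confidence:.4f}, lift={lift:.4f}")
```

A lift above 1 means the two items occur together more often than they would if they were bought independently; a lift of exactly 1 would mean that buying almonds tells us nothing about buying burgers.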

The other rules can be read in the same way. For example:

In [114]:
associations[1]

RelationRecord(items=frozenset({'almonds', 'soup'}), support=0.008, ordered_statistics=[OrderedStatistic(items_base=frozenset({'almonds'}), items_add=frozenset({'soup'}), confidence=0.2857142857142857, lift=3.913894324853229)])

For rule 2:

item-A) items_base = 'almonds' (the antecedent: the item already in the basket)

item-B) items_add = 'soup' (the consequent: the item added to it)

The support, confidence, and lift are interpreted in the same way as for the first rule.
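
To compare all rules at a glance, it can help to flatten each record into one row and sort the rows explicitly, for example by lift. The sketch below assumes records with the fields visible in the output above (`items`, `support`, `ordered_statistics`, and within each statistic `items_base`, `items_add`, `confidence`, `lift`). So that it runs without apyori, it defines stand-in namedtuples with the same field names. The helper name `flatten_rules` is hypothetical.

```python
from collections import namedtuple

# Stand-ins mirroring the field names seen in the printed output, so the
# sketch runs without apyori installed (an assumption for illustration).
RelationRecord = namedtuple("RelationRecord", "items support ordered_statistics")
OrderedStatistic = namedtuple("OrderedStatistic", "items_base items_add confidence lift")

def flatten_rules(records):
    """Turn each ordered statistic into one dict and sort by lift, highest first."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append({
                "base": ", ".join(sorted(stat.items_base)),
                "add": ", ".join(sorted(stat.items_add)),
                "support": record.support,
                "confidence": stat.confidence,
                "lift": stat.lift,
            })
    return sorted(rows, key=lambda row: row["lift"], reverse=True)

# The first three rules from the output above, retyped by hand.
sample = [
    RelationRecord(frozenset({"almonds", "burgers"}), 0.01, [
        OrderedStatistic(frozenset({"almonds"}), frozenset({"burgers"}),
                         0.35714285714285715, 4.464285714285714)]),
    RelationRecord(frozenset({"almonds", "soup"}), 0.008, [
        OrderedStatistic(frozenset({"almonds"}), frozenset({"soup"}),
                         0.2857142857142857, 3.913894324853229)]),
    RelationRecord(frozenset({"frozen smoothie", "antioxydant juice"}), 0.003, [
        OrderedStatistic(frozenset({"antioxydant juice"}), frozenset({"frozen smoothie"}),
                         0.33333333333333337, 5.376344086021506)]),
]

for row in flatten_rules(sample):
    print(f"{row['base']} -> {row['add']}: support={row['support']}, "
          f"confidence={row['confidence']:.4f}, lift={row['lift']:.2f}")
```

In the notebook, `flatten_rules(associations)` should work on the real list as long as its records expose these fields.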