# **Implement Apriori algorithm on dataset**


## **Name: Reetika Goel**


**Learning Objective:** Apply the Apriori algorithm to generate association rules and predict the next basket item.




**Dataset:** The dataset contains an Order ID, a User ID, and a product name for each purchased item.
The train and test datasets are stored as CSV files on Google Drive.



Links to the datasets:



**Train dataset:**


https://drive.google.com/drive/u/1/folders/1OXQYa_awmTkmr7s3n18ybo6MxcVMKoFG



**Test dataset:**



https://drive.google.com/drive/u/1/folders/1OXQYa_awmTkmr7s3n18ybo6MxcVMKoFG




**Problem Statement:** Treat each Order ID as a transaction ID, group the items by Order ID, and generate association rules using the thresholds below.


**MIN_SUP: 0.0045**


**MIN_CONF: 0.2**



**Assignment Description:**


This assignment has 6 parts:


**PART A:** Import libraries and Import dataset


**PART B:** Group the items in dataset by Order ID - Consider Order ID as Transaction ID


**PART C:** Data Pre-processing


**PART D:** Apply Apriori algorithm on the dataset


**PART E:** Visualize the Apriori results on Training data set


**PART F:** Predict the next basket item using Test dataset


## **PART A: Import libraries and Import dataset**

**Code snippet to mount the google drive on colab**

In [0]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


**Mention the location of your dataset on google drive**

In [0]:
cd '/content/drive/My Drive/Apriori_Datasets'

/content/drive/My Drive/Apriori_Datasets


**Install the Apyori library**

The Apyori library provides a ready-made implementation of the Apriori algorithm, so the model can be built with a single function call.

In [0]:
pip install apyori



**Import required libraries for Apriori implementation**

In [0]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd
from apyori import apriori    #from the apyori library, import the apriori function


**Code to load the train dataset csv file**

In [0]:
data_records = pd.read_csv("/content/drive/My Drive/Apriori_Datasets/TRAIN-ARULES.csv")

**Print the first five records to see what the dataset looks like**

In [0]:
data_records.head(5)

Unnamed: 0,order_id,user_id,product_name
0,1483,90,Organic Pink Lemonade Bunny Fruit Snacks
1,1483,90,Dark Chocolate Minis
2,1483,90,"Sparkling Water, Natural Mango Essenced"
3,1483,90,Peach-Pear Sparkling Water
4,1483,90,Organic Heritage Flakes Cereal


**Print the entire dataset**

In [0]:
data_records

Unnamed: 0,order_id,user_id,product_name
0,1483,90,Organic Pink Lemonade Bunny Fruit Snacks
1,1483,90,Dark Chocolate Minis
2,1483,90,"Sparkling Water, Natural Mango Essenced"
3,1483,90,Peach-Pear Sparkling Water
4,1483,90,Organic Heritage Flakes Cereal
5,1483,90,Popped Salted Caramel Granola Bars
6,1483,90,"Healthy Grains Granola Bar, Vanilla Blueberry"
7,1483,90,Flax Plus Organic Pumpkin Flax Granola
8,1483,90,Sweet & Salty Nut Almond Granola Bars
9,1483,90,Cool Mint Chocolate Energy Bar


## **PART B: Group the items in dataset by Order ID - Consider Order ID as Transaction ID**

**Group the data in dataset based on 'order_id' column**

In [0]:
g = data_records.groupby('order_id')['product_name']

**Code to access the groups formed based on 'order_id' column**

The **'groupby'** object is iterable: each iteration yields an **Order ID and the corresponding pandas Series - order_df (the product names in that order)**

In [0]:
for order_id, order_df in g:
  print(order_id)   #prints the Order ID
  print(order_df)   #prints the product names (a pandas Series) for that order

1483
0          Organic Pink Lemonade Bunny Fruit Snacks
1                              Dark Chocolate Minis
2           Sparkling Water, Natural Mango Essenced
3                        Peach-Pear Sparkling Water
4                    Organic Heritage Flakes Cereal
5                Popped Salted Caramel Granola Bars
6     Healthy Grains Granola Bar, Vanilla Blueberry
7            Flax Plus Organic Pumpkin Flax Granola
8             Sweet & Salty Nut Almond Granola Bars
9                    Cool Mint Chocolate Energy Bar
10                       Chocolate Chip Energy Bars
11         Trail Mix Fruit & Nut Chewy Granola Bars
Name: product_name, dtype: object
4595
12                                Creme De Menthe Thins
13    Milk Chocolate English Toffee Miniatures Candy...
14                    Baker's Pure Cane Ultrafine Sugar
15                                         Plain Bagels
16                                       Cinnamon Bread
Name: product_name, dtype: object
7099
17           

We have grouped the dataset by the **'order_id'** column using the groupby function in Pandas. Each group corresponds to one Order ID.

**Code to access a specific group**

In [0]:
g.get_group(2906390)

10996            Clementines
10997          Hass Avocados
10998    Dry Roasted Almonds
10999        English Muffins
Name: product_name, dtype: object

When we run the above code, we get the **product names** (a pandas Series) for **Order ID** - **2906390**

## **PART C: Data Pre-processing**

**Print shape of the dataset**

In [0]:
print(data_records.shape)

(12963, 3)


**12963** - total number of rows


**3** - total number of columns

**Create a list of transactions**

The Apyori library used for this assignment expects the dataset as a list of transactions, where each transaction is a collection of product names. Currently the data is in a Pandas dataframe grouped by Order ID. To build the list with one entry per order, execute the following script:

In [0]:
records = []                         #creates an empty list named 'records'
for order_id, productNm in g:
    records.append(productNm)    # appends the product names of one order (a Series) to 'records'
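
The same conversion can also be sketched in one step, producing plain Python lists instead of Series objects. This is a minimal illustration on a tiny stand-in dataframe (the name `toy` and its four rows, copied from the output shown earlier, are assumptions made for the example), not a replacement for the cell above:

```python
import pandas as pd

# Tiny stand-in for the training data; rows copied from the grouped output above
toy = pd.DataFrame({
    "order_id": [1483, 1483, 4595, 4595],
    "product_name": [
        "Dark Chocolate Minis",
        "Peach-Pear Sparkling Water",
        "Plain Bagels",
        "Cinnamon Bread",
    ],
})

# Group by order, collect each order's products into a list,
# then turn the result into a list of lists (one inner list per order)
transactions = toy.groupby("order_id")["product_name"].apply(list).tolist()
print(transactions)
```

Plain lists print more compactly than Series objects and do not carry the row index along with the product names.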

**Print the 'records' list**

In [0]:
print(records)

[0          Organic Pink Lemonade Bunny Fruit Snacks
1                              Dark Chocolate Minis
2           Sparkling Water, Natural Mango Essenced
3                        Peach-Pear Sparkling Water
4                    Organic Heritage Flakes Cereal
5                Popped Salted Caramel Granola Bars
6     Healthy Grains Granola Bar, Vanilla Blueberry
7            Flax Plus Organic Pumpkin Flax Granola
8             Sweet & Salty Nut Almond Granola Bars
9                    Cool Mint Chocolate Energy Bar
10                       Chocolate Chip Energy Bars
11         Trail Mix Fruit & Nut Chewy Granola Bars
Name: product_name, dtype: object, 12                                Creme De Menthe Thins
13    Milk Chocolate English Toffee Miniatures Candy...
14                    Baker's Pure Cane Ultrafine Sugar
15                                         Plain Bagels
16                                       Cinnamon Bread
Name: product_name, dtype: object, 17                       

**Print the length of records**

In [0]:
print(len(records))

1418


## **PART D: Apply Apriori algorithm on the dataset**

The Apriori algorithm works in two steps:


*  **Generate frequent itemsets** - find the itemsets whose support is at least **MIN_SUP**


*  **Generate confident association rules from the frequent itemsets** - keep the rules whose confidence is at least **MIN_CONF**
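
The two steps above can be sketched on a toy example. The transactions, the item names, and the thresholds `MIN_SUP_TOY` and `MIN_CONF_TOY` below are hypothetical and chosen only for illustration. For brevity the sketch enumerates all itemsets of size 1 and 2 by brute force, whereas Apriori proper builds each level's candidates only from the frequent itemsets of the previous level:

```python
from itertools import combinations

# Hypothetical toy transactions (not taken from the assignment dataset)
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
MIN_SUP_TOY = 0.5    # illustrative threshold
MIN_CONF_TOY = 0.6   # illustrative threshold


def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


# Step 1: keep the itemsets whose support is at least MIN_SUP_TOY
items = sorted(set().union(*transactions))
frequent = {}
for size in (1, 2):
    for combo in combinations(items, size):
        s = support(combo)
        if s >= MIN_SUP_TOY:
            frequent[frozenset(combo)] = s

# Step 2: from each frequent pair, keep rules X -> Y with
# confidence = support(X and Y together) / support(X) >= MIN_CONF_TOY
rules = []
for itemset, s in frequent.items():
    if len(itemset) < 2:
        continue
    for item in itemset:
        base = itemset - {item}
        confidence = s / support(base)
        if confidence >= MIN_CONF_TOY:
            rules.append((sorted(base), item, confidence))

for base, item, confidence in rules:
    print(base, "-->", item, "confidence:", round(confidence, 3))
```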


To apply the Apriori algorithm to the dataset, we use the **apriori function** imported from the **apyori library**. The function takes the transactions plus a few optional keyword arguments.

**First parameter:** the list of transactions to extract rules from


**Second parameter:** min_support, keeps only the itemsets whose support is at least the specified value


**Third parameter:** min_confidence, keeps only the rules whose confidence is at least the specified threshold


**Fourth parameter:** min_lift, specifies the minimum lift value for the shortlisted rules


**Fifth parameter:** max_length, specifies the maximum number of items in an itemset (apyori has no min_length option)

**MIN_SUP:** 0.0045


**MIN_CONF:** 0.2


In [0]:
#execute the script

#Train Apriori model, calculate frequent itemset and association rules
association_rules = apriori(records, min_support=0.0045, min_confidence=0.2)
print("Association Rules: ", association_rules)

#convert the association rules into a list 
association_results = list(association_rules)

Association Rules:  <generator object apriori at 0x7f2c50add990>


## **PART E: Visualize the Apriori results on Training data set**

**Print the association rules in the list form**

In [0]:
print(association_results)

[RelationRecord(items=frozenset({'0% Greek Strained Yogurt', 'Apples'}), support=0.006346967559943582, ordered_statistics=[OrderedStatistic(items_base=frozenset({'0% Greek Strained Yogurt'}), items_add=frozenset({'Apples'}), confidence=0.6428571428571428, lift=41.43506493506493), OrderedStatistic(items_base=frozenset({'Apples'}), items_add=frozenset({'0% Greek Strained Yogurt'}), confidence=0.40909090909090906, lift=41.43506493506493)]), RelationRecord(items=frozenset({'Bag of Organic Bananas', '0% Greek Strained Yogurt'}), support=0.007052186177715092, ordered_statistics=[OrderedStatistic(items_base=frozenset({'0% Greek Strained Yogurt'}), items_add=frozenset({'Bag of Organic Bananas'}), confidence=0.7142857142857143, lift=5.387537993920973)]), RelationRecord(items=frozenset({'Soda', '0% Greek Strained Yogurt'}), support=0.006346967559943582, ordered_statistics=[OrderedStatistic(items_base=frozenset({'0% Greek Strained Yogurt'}), items_add=frozenset({'Soda'}), confidence=0.64285714285

**Visualize the association rules**

In [0]:
for result in association_results:
    items = [x for x in result.items]
    print("Items :"+str(items))
    print("Support :"+str(result.support))
    print("----------------------------------------")
    for OrderedStatistic in result.ordered_statistics:
        items_base = [x for x in OrderedStatistic.items_base]
        items_add = [x for x in OrderedStatistic.items_add]
        print ("  Rule :" + str(items_base) + " --> " + str(items_add))
        print ("  Confidence : " + str(OrderedStatistic.confidence))
        print ("  Lift : " + str(OrderedStatistic.lift))
        print ("  ===========================")
    print("**************************************************************************************")

print("DONE**")
print(len(association_results))

Items :['0% Greek Strained Yogurt', 'Apples']
Support :0.006346967559943582
----------------------------------------
  Rule :['0% Greek Strained Yogurt'] --> ['Apples']
  Confidence : 0.6428571428571428
  Lift : 41.43506493506493
  Rule :['Apples'] --> ['0% Greek Strained Yogurt']
  Confidence : 0.40909090909090906
  Lift : 41.43506493506493
**************************************************************************************
Items :['Bag of Organic Bananas', '0% Greek Strained Yogurt']
Support :0.007052186177715092
----------------------------------------
  Rule :['0% Greek Strained Yogurt'] --> ['Bag of Organic Bananas']
  Confidence : 0.7142857142857143
  Lift : 5.387537993920973
**************************************************************************************
Items :['Soda', '0% Greek Strained Yogurt']
Support :0.006346967559943582
----------------------------------------
  Rule :['0% Greek Strained Yogurt'] --> ['Soda']
  Confidence : 0.6428571428571428
  Lift : 12.318532818
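
As a sanity check on the first rule, the printed support, confidence, and lift can be reproduced from raw counts. The counts below (9 orders with both items, 14 with the yogurt, 22 with apples) are inferred from the printed values and the 1418 transactions; they are assumptions, not numbers read from the dataset:

```python
# Counts inferred from the printed output (assumptions, not read from the data)
n_transactions = 1418   # len(records)
n_yogurt = 14           # orders containing '0% Greek Strained Yogurt'
n_apples = 22           # orders containing 'Apples'
n_both = 9              # orders containing both items

# support(X, Y) = fraction of orders that contain both items
support = n_both / n_transactions

# confidence(X -> Y) = support(X, Y) / support(X)
conf_yogurt_to_apples = n_both / n_yogurt
conf_apples_to_yogurt = n_both / n_apples

# lift(X -> Y) = confidence(X -> Y) / support(Y); it is symmetric in X and Y
lift = conf_yogurt_to_apples / (n_apples / n_transactions)

print("Support    :", support)
print("Confidence :", conf_yogurt_to_apples, "(Yogurt -> Apples)")
print("Confidence :", conf_apples_to_yogurt, "(Apples -> Yogurt)")
print("Lift       :", lift)
```

A lift well above 1 means the two items appear together far more often than they would if they were bought independently.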

## **PART F: Predict the next basket item using Test dataset**

**Load the test dataset**

In [0]:
#load the test dataset
test_records = pd.read_csv("/content/drive/My Drive/Apriori_Datasets/testarules.csv")
#rows with missing values are kept: unused item slots (e.g. Item4, Item5) are NaN and are skipped later
test_records.head()


Unnamed: 0,Item1,Item2,Item3,Item4,Item5
0,Dark Chocolate Minis,Organic Pink Lemonade Bunny Fruit Snacks,Peach-Pear Sparkling Water,,


**Iterate through each row of the test dataset**

In [0]:
#Convert each row of the test dataset into a list of items (one basket per row)
test_records_list = []

for _, row in test_records.iterrows():
  #start a new basket for every row and skip the empty (NaN) cells
  basket = [str(item) for item in row if pd.notna(item)]
  test_records_list.append(basket)
  
#Print the test records in a list form
print(test_records_list)


[['Dark Chocolate Minis', 'Organic Pink Lemonade Bunny Fruit Snacks', 'Peach-Pear Sparkling Water']]


**Predict the next item in a basket using list of Association rules**

In [0]:
#Predict the next basket item

class PredictItem:
  def predict(self):
    for each_record in test_records_list:   #Items already present in the shopping cart
      basket = sorted(each_record)          #sort so that the comparison ignores item order
      for result in association_results:    #Iterate through the list of association rules
        for stat in result.ordered_statistics:
          items_base = sorted(stat.items_base)   #left-hand side of the rule
          items_add = list(stat.items_add)       #right-hand side of the rule
          if items_base == basket:               #the rule applies only if its left-hand side equals the basket
            print('Items bought with {} is --> {}'.format(each_record, items_add))          #Next item in the basket (predicted based on association rules)
            
      print('\n' + '**************************************************')

**Call the predict function**

In [0]:
item = PredictItem()    #create an object of PredictItem class
item.predict()           #calling the predict function 

Items bought with ['Dark Chocolate Minis', 'Organic Pink Lemonade Bunny Fruit Snacks', 'Peach-Pear Sparkling Water'] is --> ['Maple Pumpkin Seeds with Sea Salt Chewy with a Crunch Granola Bars']
Items bought with ['Dark Chocolate Minis', 'Organic Pink Lemonade Bunny Fruit Snacks', 'Peach-Pear Sparkling Water'] is --> ['Organic Graham Crunch Cereal']
Items bought with ['Dark Chocolate Minis', 'Organic Pink Lemonade Bunny Fruit Snacks', 'Peach-Pear Sparkling Water'] is --> ['Organic Heritage Flakes Cereal']
Items bought with ['Dark Chocolate Minis', 'Organic Pink Lemonade Bunny Fruit Snacks', 'Peach-Pear Sparkling Water'] is --> ['Sparkling Water, Natural Mango Essenced']

**************************************************


In the notation **{X -> Y}**


X is called the from-itemset (the items already in the basket) and,


Y is called the to-itemset (the items predicted to be added)
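
The lookup of Y for a given X can be sketched with plain Python sets. The three rules below are copied from the training output shown earlier; the single-item `basket` is a hypothetical example, and sorting the matches by confidence is an addition for illustration (the predict function above prints matches in rule order):

```python
# Rules copied from the training output: (X, Y, confidence)
rules = [
    (frozenset({"0% Greek Strained Yogurt"}), frozenset({"Apples"}), 0.6428571428571428),
    (frozenset({"Apples"}), frozenset({"0% Greek Strained Yogurt"}), 0.40909090909090906),
    (frozenset({"0% Greek Strained Yogurt"}), frozenset({"Bag of Organic Bananas"}), 0.7142857142857143),
]

basket = ["0% Greek Strained Yogurt"]   # hypothetical basket (X)

# Keep the rules whose from-itemset X equals the basket, regardless of item order
matches = [(sorted(y), conf) for x, y, conf in rules if x == frozenset(basket)]

# Optional: list the most confident prediction first
matches.sort(key=lambda match: match[1], reverse=True)

for items_add, conf in matches:
    print("Items bought with {} is --> {} (confidence {:.3f})".format(basket, items_add, conf))
```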


From the above result, we can see that 

**#Items already present in the basket**


for X = ['Dark Chocolate Minis', 'Organic Pink Lemonade Bunny Fruit Snacks', 'Peach-Pear Sparkling Water']                 


**# Next basket items predicted based on association rules**


Y = 
['Maple Pumpkin Seeds with Sea Salt Chewy with a Crunch Granola Bars']


['Organic Graham Crunch Cereal']


['Organic Heritage Flakes Cereal']


['Sparkling Water, Natural Mango Essenced']
    

