# Eclat model
- The Eclat model is a simplified version of the **Apriori model**.
- It can be built by adapting the Apriori model code.
- In Eclat we do not consider rules, only sets of items and their support.
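
To make the "sets of items" idea concrete, here is a minimal sketch of how Eclat is commonly described: each item is mapped to the set of transaction ids that contain it, and the support of a pair is obtained by intersecting those sets. The toy transactions and all variable names are hypothetical, and this is not the code used in the rest of this notebook, which adapts the Apriori code instead.

```python
from itertools import combinations

# Hypothetical toy transactions, not taken from the dataset.
transactions = [
    ["pasta", "escalope", "olive oil"],
    ["pasta", "escalope"],
    ["olive oil", "chicken"],
    ["pasta", "olive oil"],
]

# Vertical layout: map each item to the set of transaction ids that contain it.
tid_lists = {}
for tid, basket in enumerate(transactions):
    for item in basket:
        tid_lists.setdefault(item, set()).add(tid)

# Support of a pair = size of the intersection of the two id sets / number of transactions.
pair_support = {
    (a, b): len(tid_lists[a] & tid_lists[b]) / len(transactions)
    for a, b in combinations(sorted(tid_lists), 2)
}

# Keep only the pairs that occur at least once, highest support first.
frequent_pairs = sorted(
    ((pair, s) for pair, s in pair_support.items() if s > 0),
    key=lambda entry: entry[1],
    reverse=True,
)
print(frequent_pairs)
```

Only the support is computed here; no confidence or lift is involved, which is the simplification the list above refers to.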

# 1. Importing the Libraries

In [8]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# 2. Data Preprocessing

In [9]:
# 1. Here we only need to import the dataset.
#### data_set = pd.read_csv('Market_Basket_Optimisation.csv')

# 2. The first row of the file holds data, not column names, so we have to tell pandas with "header = None"...
# ... otherwise it would treat the first row as the names of the columns and not as data.
data_set = pd.read_csv('Market_Basket_Optimisation.csv', header = None)

# 3.0 To train the Apriori model on the dataset we will use a function called apriori(), ...
# ... which takes the dataset as an argument but expects it in a certain format, ...
# ... and that format is unfortunately not a pandas dataframe. We therefore have to recreate the dataset ...
# ... from the original pandas dataframe in the format expected by the apriori() function. ...
# ... This format is called a "list of transactions": a list in which each transaction is itself a list of products.

# 3.1 1st step to create a list is to initialize the list as an empty list.
transaction = []

# 3.2 2nd step is to append the rows of the pandas dataframe to the list using for loops...
# ... the outer loop goes over the rows of the dataframe, the inner loop goes over its 20 columns.
for i in range(0, 7501):      # 7501 and not 7500 because the upper bound is excluded.
                              # Each customer transaction must be stored as a list of products, i.e. in the end ....
                              # ... we have a list of lists.
                              # i goes from 0 to 7500 and j from 0 to 19, because the upper bound is excluded
    ### transaction.append(data_set.values[i, j] for j in range(0, 20) )   # values allows us to access the values of the dataframe.
# 3.3 3rd step: in the Apriori model all the elements in the lists must be strings, otherwise the model will not be able to learn
# ... the rules. So the 3rd step is to convert every element to a string with str(), giving a list of lists of strings.
    transaction.append([str(data_set.values[i, j]) for j in range(0, 20) ])
# 4. Here we don't have to split the dataset into dependent and independent variables.
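
The loop above converts every cell with str(). If some rows hold fewer than 20 products, pandas fills the remaining cells with NaN, and str() turns those into the string 'nan', which then sits in the transaction like any other product. A possible variant that skips empty cells is sketched below on a hypothetical toy dataframe (`toy_data_set` and `toy_transactions` are illustration names, not part of the notebook).

```python
import pandas as pd

# Hypothetical toy dataframe standing in for data_set; None marks an empty cell.
toy_data_set = pd.DataFrame([
    ["milk", "bread", None],
    ["eggs", None, None],
])

# Build one list of strings per row, skipping the empty cells.
toy_transactions = [
    [str(item) for item in row if pd.notna(item)]
    for row in toy_data_set.values.tolist()
]
print(toy_transactions)
```

Iterating over the rows directly also avoids hard-coding the 7501 rows and 20 columns.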


# 3. Training the Eclat model on the dataset

In [10]:
# Will use the Apyori Module to train the model and not scikit learn as usual.

# 1. Importing the Apyori Module
from apyori import apriori         # The apriori() function belongs to the apyori module; it trains the model...
                                   # and returns the rules it has learned, with their support, confidence and lift.

# 2. Training the Apriori Model on the dataset
    # apriori() function takes 6 arguments: transactions, min_support, min_confidence, min_lift, min_length, max_length
        # transactions: the dataset on which the apriori model will be trained; its value is our list, i.e. transaction.
        # min_support: the minimum fraction of all transactions in which a set of products must appear to be kept.
            # (Here we want to consider the products that appear in at least 3 transactions a day, i.e. 3 * 7 transactions a week.)
            # min_support = per day * 7 / (total number of transactions) = 3 * 7 / 7501 = 0.0028, rounded to 0.003
        # min_confidence: rule of thumb is 0.2
        # min_lift: rule of thumb is at least 3
        # Use case: buy one product A and get another product B for free. Therefore the rules we want at the end must have only 2 products, ...
            # ... i.e. one product on the left hand side of the rule and the other product on the right hand side of the rule.
            # Therefore we need to add two more arguments: min_length = 2 and max_length = 2

        # We keep min_confidence = 0.2 and min_lift = 3 even though Eclat does not need them, because they keep only the strong associations.
rules = apriori(transactions = transaction, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2 )
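
The min_support arithmetic from the comments above can be checked in a few lines. The variable names are only for illustration; the numbers are the ones given in the comments.

```python
# Values taken from the comments above.
transactions_per_day = 3      # minimum number of daily transactions containing the products
days = 7                      # the comments scale the daily count to one week
total_transactions = 7501     # number of rows in the dataset

min_support = transactions_per_day * days / total_transactions
print(round(min_support, 4))  # 0.0028, which the notebook rounds to 0.003
```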

# Visualizing the results

# 4. Displaying the first results coming directly from the output of the apriori function

In [11]:
# 1. Display the result as a list
results = list(rules)

results

[RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'escalope', 'pasta'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'fromage blanc', 'honey'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

# 5. Putting the results well organised into a Pandas DataFrame

In [12]:
# Since scrolling the above result left and right is overwhelming, we will display the result well organised into a pandas dataframe.

def inspect(results):          # <= The inspect() function takes the result of the apriori() function as an argument and returns the rules as a list of tuples, ...
                               # ... so that we can sort them by a metric in descending order. In the above result they are not sorted.

# For the left hand side: in each result, index 2 accesses ordered_statistics, [0] takes its first OrderedStatistic, ...
# ... the next [0] takes items_base, and after converting that frozenset to a tuple the last [0] takes its single product.
# ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]
    # items_base=frozenset({'light cream'})
    lhs         = [tuple(result[2][0][0])[0] for result in results]  # <= The product on the left hand side of each rule

# For the right hand side: same path, but index 1 of the OrderedStatistic accesses items_add instead of items_base.
# ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]
    # items_add=frozenset({'chicken'})
    rhs         = [tuple(result[2][0][1])[0] for result in results]  # <= The product on the right hand side of each rule

    supports    = [result[1] for result in results]                  # <= The support of each result (index 1)

    # In Eclat the confidence and lift are not needed therefore we have to remove them.
    #### confidences = [result[2][0][2] for result in results]            # <= It will take as argument the confidence of all the rules
    #### lifts       = [result[2][0][3] for result in results]            # <= It will take as argument the lift of all the rules
    #### return list(zip(lhs, rhs, supports, confidences, lifts))         # <= It will return all the rules with rhs, lhs, supports, confidences and lifts in a list of tuples.

    return list(zip(lhs, rhs, supports))  # <= It returns product 1, product 2 and the support of each pair as a list of tuples.
# At the end we create a final Pandas dataframe which takes as input the output of the inspect() function. ...
    # ... In the Apriori version the column names were: 1st column = lhs, 2nd column = rhs, 3rd column = supports, 4th column = confidences, 5th column = lifts
####resultsinDataFrame = pd.DataFrame(inspect(results), columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])

# Since there is no rule in Eclat, we only consider product 1 and product 2, instead of the left and right hand sides seen in Apriori.
resultsinDataFrame = pd.DataFrame(inspect(results), columns = ['Product 1', 'Product 2', 'Support'])
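
The positional indexing in inspect() (for example result[2][0][0]) can be hard to follow. The printed output shows field names (items, support, ordered_statistics, items_base, items_add), so if the records behave like named tuples, as that output suggests, the same values can be read by name. The sketch below does not import apyori: it defines stand-in namedtuples with those field names, and `inspect_by_name` and `sample_results` are hypothetical names. The sample values are copied from the first two records printed above.

```python
from collections import namedtuple

# Stand-ins that mirror the field names visible in the printed output above.
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])

sample_results = [
    RelationRecord(
        items=frozenset({"light cream", "chicken"}),
        support=0.004532728969470737,
        ordered_statistics=[OrderedStatistic(
            frozenset({"light cream"}), frozenset({"chicken"}),
            0.29059829059829057, 4.84395061728395)],
    ),
    RelationRecord(
        items=frozenset({"escalope", "mushroom cream sauce"}),
        support=0.005732568990801226,
        ordered_statistics=[OrderedStatistic(
            frozenset({"mushroom cream sauce"}), frozenset({"escalope"}),
            0.3006993006993007, 3.790832696715049)],
    ),
]

def inspect_by_name(results):
    """Return (product 1, product 2, support) tuples using field names instead of indices."""
    rows = []
    for record in results:
        first_stat = record.ordered_statistics[0]
        product_1 = next(iter(first_stat.items_base))   # single product in items_base
        product_2 = next(iter(first_stat.items_add))    # single product in items_add
        rows.append((product_1, product_2, record.support))
    return rows

print(inspect_by_name(sample_results))
```

Because namedtuples also support indexing, record[2][0][0] and record.ordered_statistics[0].items_base refer to the same value.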


Here we display the result directly in sorted form.

# 6. Displaying the result sorted by descending support

In [None]:
# 1. Using a pre-built pandas function, nlargest(), to sort the result in descending order of support
resultsinDataFrame.nlargest(n = 10, columns = 'Support')         # n = number of rows we want to return, i.e. display the 10 rows with the highest support
                                                                 # columns = the column by which we want our result to be sorted.
                                                                 # keep = which rows to keep when several rows tie on that value ('first', 'last' or 'all')

| | Product 1 | Product 2 | Support |
|---|---|---|---|
| 4 | herb & pepper | ground beef | 0.015998 |
| 7 | whole wheat pasta | olive oil | 0.007999 |
| 2 | pasta | escalope | 0.005866 |
| 1 | mushroom cream sauce | escalope | 0.005733 |
| 5 | tomato sauce | ground beef | 0.005333 |
| 8 | pasta | shrimp | 0.005066 |
| 0 | light cream | chicken | 0.004533 |
| 3 | fromage blanc | honey | 0.003333 |
| 6 | light cream | olive oil | 0.0032 |
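
The effect of the keep argument of nlargest() only shows when rows tie on the sorting column. A small sketch on a hypothetical toy dataframe (the products and support values are made up for illustration):

```python
import pandas as pd

# Hypothetical toy frame; rows 1 and 2 tie on the second-highest support.
toy = pd.DataFrame({
    "Product 1": ["a", "b", "c", "d"],
    "Product 2": ["x", "y", "z", "w"],
    "Support": [0.02, 0.01, 0.01, 0.005],
})

top_first = toy.nlargest(n=2, columns="Support")              # default keep='first'
top_all = toy.nlargest(n=2, columns="Support", keep="all")    # also keeps rows tied with the last one
print(top_first)
print(top_all)
```

With keep='all' the result can hold more than n rows, which may be preferable when dropping one of two equally frequent pairs would be arbitrary.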


### Interpretation of the result for row 4 in the above table
- Herb & pepper and ground beef are the set of 2 products that is purchased together most frequently among the pairs in the table.
  -  They have the highest support in the table.
      - Their support is 0.015998, which means that these 2 products appear together in about 1.6% of the transactions in the dataset.
### The other rows are read the same way.
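
A support value is a fraction of all transactions, so it can be turned into a percentage or an approximate number of transactions. The sketch below uses the support of row 4 from the table and the 7501 rows read during preprocessing; the variable names are for illustration only.

```python
total_transactions = 7501          # number of rows read in the preprocessing step
support = 0.015998                 # support of (herb & pepper, ground beef) from the table

approx_count = round(support * total_transactions)
print(approx_count)                # 120
print(f"{support:.1%}")            # 1.6%
```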