## Step 1 — List the Transaction ID (TID) set of each product
The first step is to make a list that contains, for each product, a list of the Transaction IDs in which the product occurs. This list is represented in the following table.

<img src='https://miro.medium.com/max/373/1*scuicjurFLxZhq0i4_g1BQ.png'><br>
The ECLAT Algorithm. The Transaction ID (TID) sets for each product.

Each of these lists is called a Transaction ID set, or TID set for short.
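As a minimal sketch of this step in plain Python, the TID sets can be built in a single pass over the transactions. The `toy_transactions` list and the `build_tid_sets` helper are hypothetical names introduced only for illustration; they are not part of any library, and the data is a small made-up sample rather than the full example.

```python
from collections import defaultdict

# Hypothetical toy data: each inner list is one transaction, and its
# position in the outer list serves as the transaction ID (TID).
toy_transactions = [
    ['beer', 'wine', 'cheese'],
    ['beer', 'potato chips'],
    ['wine', 'cheese'],
    ['potato chips'],
]

def build_tid_sets(transactions):
    """Map each product to the set of transaction IDs that contain it."""
    tid_sets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tid_sets[item].add(tid)
    return dict(tid_sets)

tid_sets = build_tid_sets(toy_transactions)
print(sorted(tid_sets['beer']))  # [0, 1]
```

Storing the IDs in a `set` rather than a list is a design choice that pays off in the later steps, where TID sets are intersected.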

## Step 2 — Filter with minimum support
The next step is to decide on a value called the minimum support. The minimum support will serve to filter out products that do not occur often enough to be considered.

In the current example, we choose a minimum support of 7. As the table in Step 1 shows, two products have a TID set that contains fewer than 7 transactions: Flour and Butter. We therefore filter them out and obtain the following table:

<img src='https://miro.medium.com/max/371/1*4HFQGD-6ovH5RO99BYvrow.png'><br>
The ECLAT Algorithm. Filtering out products that do not reach minimum support.
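The filtering step can be sketched as a simple dictionary comprehension. This assumes the TID sets are stored as a dict of Python sets and that the minimum support is given as an absolute transaction count; the `filter_by_support` helper and the sample data are hypothetical and only serve as an illustration.

```python
# Hypothetical TID sets (product -> set of transaction IDs), for illustration only.
tid_sets = {
    'beer': {0, 1, 3, 4, 6},
    'wine': {0, 2, 5},
    'cheese': {0, 2, 5, 6},
    'flour': {3},
}

def filter_by_support(tid_sets, min_support):
    """Keep only the entries whose TID set holds at least min_support transactions."""
    return {item: tids for item, tids in tid_sets.items() if len(tids) >= min_support}

frequent = filter_by_support(tid_sets, min_support=3)
print(sorted(frequent))  # ['beer', 'cheese', 'wine']
```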

## Step 3 — Compute the Transaction ID set of each product pair
We now move on to pairs of products and repeat what we did in Step 1, this time for each product pair.

The interesting thing about the ECLAT algorithm is that this step is done by taking the intersection of the two products' TID sets. This is what sets it apart from the Apriori algorithm.

ECLAT is generally faster because intersecting two sets of transaction IDs is much cheaper than scanning every individual transaction for the presence of a product pair (as Apriori does). The image below shows how easy it is to pick out the transaction IDs shared by the product pair Wine and Cheese:

<img src='https://miro.medium.com/max/303/1*VQyZfkbV68Qjhv8ovTOSYA.png'><br>
The ECLAT Algorithm. Finding the intersection of Transaction IDs is easier than scanning the whole database.

Taking the intersection for each product pair (ignoring the products that did not reach minimum support individually) gives the following table:

<img src='https://miro.medium.com/max/427/1*FDQyBPRfMlkGnDEH0xcTcg.png'><br>
The Transaction ID sets for all product pairs that are still in the race.
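With TID sets stored as Python sets, the intersection described in this step is a single `&` operation per pair. The sketch below uses hypothetical sample data and a hypothetical `pair_tid_sets` helper, so treat it as an illustration rather than a reference implementation.

```python
from itertools import combinations

# Hypothetical TID sets of products that already passed the support filter.
tid_sets = {
    'wine': {0, 2, 4, 5},
    'cheese': {0, 2, 5, 6},
    'beer': {0, 1, 3},
}

def pair_tid_sets(tid_sets):
    """Return the TID set of every product pair as the intersection of both TID sets."""
    pairs = {}
    # Sorting the product names gives each pair one canonical (a, b) key.
    for a, b in combinations(sorted(tid_sets), 2):
        pairs[(a, b)] = tid_sets[a] & tid_sets[b]
    return pairs

pair_sets = pair_tid_sets(tid_sets)
print(sorted(pair_sets[('cheese', 'wine')]))  # [0, 2, 5]
```

Note that no transaction is re-read here: the pair's TID set is derived entirely from the two single-product TID sets.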

## Step 4 — Filter out the pairs that do not reach minimum support
As before, we need to filter out results that do not reach the minimum support of 7. This leaves us with only two remaining product pairs: Wine & Cheese and Beer & Potato Chips.

<img src='https://miro.medium.com/max/424/1*rFfIs17a7NCQWxwLWyCMQQ.png'><br>
The ECLAT Algorithm. There are two product pairs that meet support.

## Step 5 — Continue as long as you can make new pairs above support
From this point on, you repeat the steps as long as possible, each time intersecting the TID sets of the combinations that survived the previous round to build larger ones. For the current example, if we form groups of three products, we find that none of them reaches the minimum support level. Therefore, the final result consists of the two pairs obtained in the previous step.
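Putting the five steps together, the whole procedure can be sketched as a short depth-first search in plain Python. This is a minimal illustration under stated assumptions, not the implementation used by any particular library: the `eclat` function is a hypothetical helper, the minimum support is an absolute count of 7, and the transactions are the 20 example transactions of this tutorial.

```python
transactions = [
    ['beer', 'wine', 'cheese'], ['beer', 'potato chips'],
    ['eggs', 'flour', 'butter', 'cheese'],
    ['eggs', 'flour', 'butter', 'beer', 'potato chips'],
    ['wine', 'cheese'], ['potato chips'],
    ['eggs', 'flour', 'butter', 'wine', 'cheese'],
    ['eggs', 'flour', 'butter', 'beer', 'potato chips'],
    ['wine', 'beer'], ['beer', 'potato chips'],
    ['butter', 'eggs'], ['beer', 'potato chips'],
    ['flour', 'eggs'], ['beer', 'potato chips'],
    ['eggs', 'flour', 'butter', 'wine', 'cheese'],
    ['beer', 'wine', 'potato chips', 'cheese'],
    ['wine', 'cheese'], ['beer', 'potato chips'],
    ['wine', 'cheese'], ['beer', 'potato chips'],
]

def eclat(prefix, candidates, min_support, results):
    """Record every frequent itemset that extends `prefix` (hypothetical helper).

    `candidates` is a list of (item, tid_set) pairs in which each tid_set
    is already the TID set of prefix + (item,).
    """
    # Steps 2 and 4: drop candidates below the minimum support.
    frequent = [(item, tids) for item, tids in candidates if len(tids) >= min_support]
    for i, (item, tids) in enumerate(frequent):
        itemset = prefix + (item,)
        results[itemset] = len(tids)
        # Steps 3 and 5: intersect with the remaining items to grow the itemset.
        extensions = [(other, tids & other_tids) for other, other_tids in frequent[i + 1:]]
        eclat(itemset, extensions, min_support, results)
    return results

# Step 1: build the TID set of every product.
tid_sets = {}
for tid, items in enumerate(transactions):
    for item in items:
        tid_sets.setdefault(item, set()).add(tid)

start = sorted(tid_sets.items(), key=lambda kv: kv[0])
results = eclat((), start, min_support=7, results={})
pairs = {itemset: count for itemset, count in results.items() if len(itemset) >= 2}
print(pairs)  # {('beer', 'potato chips'): 9, ('cheese', 'wine'): 7}
```

The recursion stops on its own once no extension reaches the minimum support, which is why no group of three products appears in the result.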

In [1]:
! pip install pyECLAT
! pip install numpy
! pip install pandas
! pip install plotly


In [7]:
import pandas as pd
from pyECLAT import ECLAT

In [4]:
# store the item sets as lists of strings in a list
transactions = [
    ['beer', 'wine', 'cheese'],
    ['beer', 'potato chips'],
    ['eggs', 'flour', 'butter', 'cheese'],
    ['eggs', 'flour', 'butter', 'beer', 'potato chips'],
    ['wine', 'cheese'],
    ['potato chips'],
    ['eggs', 'flour', 'butter', 'wine', 'cheese'],
    ['eggs', 'flour', 'butter', 'beer', 'potato chips'],
    ['wine', 'beer'],
    ['beer', 'potato chips'],
    ['butter', 'eggs'],
    ['beer', 'potato chips'],
    ['flour', 'eggs'],
    ['beer', 'potato chips'],
    ['eggs', 'flour', 'butter', 'wine', 'cheese'],
    ['beer', 'wine', 'potato chips', 'cheese'],
    ['wine', 'cheese'],
    ['beer', 'potato chips'],
    ['wine', 'cheese'],
    ['beer', 'potato chips']
]

In [5]:
# convert the transaction list into a dataframe (one row per transaction)
data = pd.DataFrame(transactions)
# preview the first 10 of the 20 rows
data.head(10)

|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | beer | wine | cheese | | |
| 1 | beer | potato chips | | | |
| 2 | eggs | flour | butter | cheese | |
| 3 | eggs | flour | butter | beer | potato chips |
| 4 | wine | cheese | | | |
| 5 | potato chips | | | | |
| 6 | eggs | flour | butter | wine | cheese |
| 7 | eggs | flour | butter | beer | potato chips |
| 8 | wine | beer | | | |
| 9 | beer | potato chips | | | |


In [6]:
# we are looking for itemSETS,
# so we do not want any individual products returned
min_n_products = 2

# we want a minimum support of 7 transactions,
# but we have to express it as a fraction of all transactions (7/20 = 0.35)
min_support = 7 / len(transactions)

# we place no limit on the size of the item sets,
# so we set the maximum to the length of the longest transaction
max_length = max(len(x) for x in transactions)

In [9]:
# create an instance of eclat
my_eclat = ECLAT(data=data, verbose=True)

# fit the algorithm
rule_indices, rule_supports = my_eclat.fit(min_support=min_support, min_combination=min_n_products, max_combination=max_length)
print(rule_supports)

{'beer & potato chips': 0.45, 'cheese & wine': 0.35}
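The supports printed above are fractions of the 20 transactions. As a small sanity check, they can be converted back into transaction counts to compare against the minimum support of 7. The `support_counts` name is introduced here only for illustration, and the dictionary is copied from the output above rather than recomputed.

```python
# Relative supports as printed above (fraction of all transactions).
rule_supports = {'beer & potato chips': 0.45, 'cheese & wine': 0.35}
n_transactions = 20  # number of transactions in the example

# Convert each fraction back to an absolute transaction count.
support_counts = {
    itemset: round(support * n_transactions)
    for itemset, support in rule_supports.items()
}
print(support_counts)  # {'beer & potato chips': 9, 'cheese & wine': 7}
```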





# Interpretation
The interpretation is that, within the transactions of our night store, two product combinations stand out. Beer and Potato Chips are bought together in 45% of the transactions (9 of 20), and Wine and Cheese in 35% (7 of 20). It could therefore be a good idea to place those products close together so that customers can easily get to both of them.