# ECLAT (Equivalence Class Transformation): Vertical Apriori Algorithm

## Basic Ideas

* Both Apriori and FP-growth use the horizontal data format: each transaction is stored as the list of its items.
* Eclat mines frequent itemsets using the vertical data format.
* It is a depth-first search (DFS) based algorithm.
* Each item is stored together with the IDs of the transactions that contain it (its *tidset*).
* It uses an intersection-based approach to compute the support of an itemset: the support is the size of the intersection of its items' tidsets.
* Itemsets are checked in `lexicographic order`
  * (depth-first traversal of the prefix tree).
* The search follows the general scheme for searching with canonical forms that have the prefix property and a perfect extension rule: only canonical extensions are generated, so each itemset is visited exactly once.
* Eclat generates more candidate itemsets than Apriori, because it (usually) does not store the support of all visited itemsets.
  * As a consequence, it cannot fully exploit the Apriori property for pruning.
* Eclat uses a purely vertical transaction representation.
* No subset tests and no subset generation are needed to compute the support.
  * Instead, the support of an itemset is determined by intersecting transaction lists.
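The ideas above can be sketched in plain Python: convert the transactions to tidsets, then walk the prefix tree depth-first in lexicographic order, computing every support by intersecting tidsets. This is a minimal illustration under our own assumptions (an absolute `min_count` threshold and the hypothetical helpers `to_vertical` and `eclat_dfs`); it is not pyECLAT's implementation. The nine transactions are the ones used later in this notebook, read off its binary matrix.

```python
from typing import Dict, List, Set, Tuple

# The nine transactions used later in this notebook (read off eclat.df_bin).
TRANSACTIONS: List[List[str]] = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]


def to_vertical(transactions: List[List[str]]) -> Dict[str, Set[int]]:
    """Horizontal (tid -> items) to vertical (item -> tidset) format."""
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def eclat_dfs(transactions: List[List[str]], min_count: int) -> Dict[Tuple[str, ...], int]:
    """Return every frequent itemset (a lexicographically sorted tuple) with its support count."""
    vertical = to_vertical(transactions)
    # Frequent single items, in lexicographic order.
    roots = sorted(
        ((item, tids) for item, tids in vertical.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    found: Dict[Tuple[str, ...], int] = {}

    def dfs(prefix: Tuple[str, ...], siblings: List[Tuple[str, Set[int]]]) -> None:
        for i, (item, tids) in enumerate(siblings):
            itemset = prefix + (item,)
            found[itemset] = len(tids)
            # Canonical extensions only: combine with items that come later,
            # so every itemset is generated exactly once.
            children = []
            for other, other_tids in siblings[i + 1:]:
                common = tids & other_tids  # support via tidset intersection
                if len(common) >= min_count:
                    children.append((other, common))
            if children:
                dfs(itemset, children)

    dfs((), roots)
    return found


frequent = eclat_dfs(TRANSACTIONS, min_count=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (-kv[1], kv[0])):
    print(" & ".join(itemset), count)
```

With `min_count=2` this returns 13 frequent itemsets; the largest are (I1, I2, I3) and (I1, I2, I5), each occurring in 2 transactions. An infrequent prefix such as (I1, I4) is never extended, which is where the depth-first search saves work.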


NB: Eclat cannot fully exploit the Apriori property because it does not store the support of all explored itemsets, not because this information is unavailable in principle. If all computed support values were stored, Eclat could be implemented so that every support value needed for full Apriori pruning is available.



```python
import pandas as pd
```

```python
# Install the library once, e.g. `pip install pyECLAT` in a terminal
# pip install pyECLAT
```

```text
Successfully installed pyECLAT-1.0.2
```


```python
# Alternative: pyECLAT ships example datasets (Example1 and Example2)
# from pyECLAT import Example2
# df = Example2().get()

# Load the transactions: one row per transaction, one item per cell, no header row
path = 'https://raw.githubusercontent.com/tec03/Datasets/main/datasets/eclatEg2.csv'
df = pd.read_csv(path, header=None)
df.head()
```

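|   | 0  | 1  | 2   | 3   |
|---|----|----|-----|-----|
| 0 | I1 | I2 | I5  | NaN |
| 1 | I2 | I4 | NaN | NaN |
| 2 | I2 | I3 | NaN | NaN |
| 3 | I1 | I2 | I4  | NaN |
| 4 | I1 | I3 | NaN | NaN |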
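This wide layout is the horizontal format from the Basic Ideas, padded with NaN because transactions have different lengths. A minimal sketch of turning it into one item list per transaction, using the first three rows shown above (`df_sample` and `transactions` are illustrative names, kept separate so the notebook's `df` is not overwritten):

```python
import pandas as pd

# First rows of the wide, header-less layout above; None marks padding (shown as NaN).
df_sample = pd.DataFrame([["I1", "I2", "I5", None],
                          ["I2", "I4", None, None],
                          ["I2", "I3", None, None]])

# Horizontal format: one list of items per transaction, with the padding dropped.
transactions = [row.dropna().tolist() for _, row in df_sample.iterrows()]
print(transactions)
```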


```python
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       9 non-null      object
 1   1       9 non-null      object
 2   2       4 non-null      object
 3   3       1 non-null      object
dtypes: object(4)
memory usage: 416.0+ bytes
```



```python
from pyECLAT import ECLAT  # import the ECLAT class

eclat = ECLAT(data=df)  # build the ECLAT object from the transactions
eclat.df_bin  # binary DataFrame: 1 if the item occurs in the transaction, else 0
```

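|   | I2 | I5 | I4 | I3 | I1 |
|---|----|----|----|----|----|
| 0 | 1  | 1  | 0  | 0  | 1  |
| 1 | 1  | 0  | 1  | 0  | 0  |
| 2 | 1  | 0  | 0  | 1  | 0  |
| 3 | 1  | 0  | 1  | 0  | 1  |
| 4 | 0  | 0  | 0  | 1  | 1  |
| 5 | 1  | 0  | 0  | 1  | 0  |
| 6 | 0  | 0  | 0  | 1  | 1  |
| 7 | 1  | 1  | 0  | 1  | 1  |
| 8 | 1  | 0  | 0  | 1  | 1  |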
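The binary matrix is one step away from the vertical (tidset) format described in the Basic Ideas: each column's 1-positions are that item's transaction IDs. A minimal sketch, rebuilding the matrix from the printed values so it runs on its own (`tidsets` is our name, not a pyECLAT attribute):

```python
import pandas as pd

# Binary matrix copied from the eclat.df_bin output above
# (rows = transactions, columns = items, 1 = item present).
df_bin = pd.DataFrame({
    "I2": [1, 1, 1, 1, 0, 1, 0, 1, 1],
    "I5": [1, 0, 0, 0, 0, 0, 0, 1, 0],
    "I4": [0, 1, 0, 1, 0, 0, 0, 0, 0],
    "I3": [0, 0, 1, 0, 1, 1, 1, 1, 1],
    "I1": [1, 0, 0, 1, 1, 0, 1, 1, 1],
})

# Vertical format: for each item (column), the set of row labels where it is 1.
tidsets = {item: set(df_bin.index[df_bin[item] == 1].tolist()) for item in df_bin.columns}
print(tidsets)
```

Intersecting two of these sets, e.g. `tidsets["I1"] & tidsets["I2"]`, gives the transactions containing both items, which is exactly how Eclat computes the support of a pair.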


Counting how many transactions contain each item (column sums):

```python
items_total = eclat.df_bin.astype(int).sum(axis=0)
items_total
```

```text
I2    7
I5    2
I4    2
I3    6
I1    6
dtype: int64
```
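Dividing these counts by the number of transactions (9) gives the relative supports that `eclat.fit` reports later, e.g. 7/9 ≈ 0.777778 for I2. A small sketch using the printed counts (`counts` is a local copy, named so it does not overwrite `items_total`):

```python
import pandas as pd

# Item counts as printed above (number of transactions containing each item).
counts = pd.Series({"I2": 7, "I5": 2, "I4": 2, "I3": 6, "I1": 6})
n_transactions = 9  # rows in eclat.df_bin

# Relative support of each single item = count / number of transactions.
item_support = counts / n_transactions
print(item_support.round(6))
```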

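```python
# Item frequency table. Note: this reassigns `df`; the raw transactions are no
# longer needed here because `eclat` was already built from them.
df = pd.DataFrame({'items': items_total.index,
                   'transactions': items_total.values
                   })
df
```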

|   | items | transactions |
|---|-------|--------------|
| 0 | I2    | 7            |
| 1 | I5    | 2            |
| 2 | I4    | 2            |
| 3 | I3    | 6            |
| 4 | I1    | 6            |


Counting the number of items in each transaction (row sums):

```python
items_per_transaction = eclat.df_bin.astype(int).sum(axis=1)
items_per_transaction
```

```text
0    3
1    2
2    2
3    3
4    2
5    2
6    2
7    4
8    3
dtype: int64
```

Sorting the item counts into a new DataFrame for visualization:

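```python
# sort_values returns a new DataFrame, sorted by frequency
df_table = df.sort_values("transactions",
                          ascending=False
                          )
# Top 5 most popular items (background_gradient needs matplotlib installed)
df_table.head(5).style.background_gradient(cmap='Blues')
```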

|   | items | transactions |
|---|-------|--------------|
| 0 | I2    | 7            |
| 3 | I3    | 6            |
| 4 | I1    | 6            |
| 1 | I5    | 2            |
| 2 | I4    | 2            |


Creating a treemap with Plotly:

```python
import plotly.express as px

df_table["all"] = "Tree Map"  # common root node, so every item sits under one parent

fig = px.treemap(df_table.head(50),
                 path=['all', "items"],
                 values='transactions',
                 color=df_table["transactions"].head(50),
                 hover_data=['items'],
                 color_continuous_scale='Blues',
                 )
fig.show()
```

## ECLAT algorithm

```python
min_support = 8/100  # an itemset must appear in at least 8% of transactions

min_combination = 2  # smallest itemset size to report

max_combination = max(items_per_transaction)  # largest itemset size: most items in any transaction

# The size limits and separator are commented out, so pyECLAT's defaults are used
# (the log below stops at "3 by 3", i.e. itemsets of at most 3 items)
rule_indices, rule_supports = eclat.fit(min_support=min_support,
                                        # min_combination=min_combination,
                                        # max_combination=max_combination,
                                        # separator=',',
                                        # verbose=True
                                        )
```

```text
Combination 1 by 1
5it [00:00, 122.28it/s]
Combination 2 by 2
10it [00:00, 176.61it/s]
Combination 3 by 3
10it [00:00, 173.88it/s]
```
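Two quick checks help read this output. A sketch, assuming pyECLAT keeps an itemset when its support is at least `min_support` (our assumption, not confirmed here): with 9 transactions, 8% corresponds to 0.72 transactions, so a single occurrence is already enough. The iteration counts 5, 10 and 10 equal C(5,1), C(5,2) and C(5,3), which is consistent with pyECLAT evaluating every combination of the five items up to size 3.

```python
import math

n_transactions = 9         # rows in eclat.df_bin
n_items = 5                # distinct items I1 to I5
support_threshold = 8 / 100

# Smallest whole number of transactions with count / n_transactions >= support_threshold.
min_count = math.ceil(support_threshold * n_transactions)
print("minimum count:", min_count)

# Number of k-item combinations of the 5 items, for k = 1, 2, 3.
combos = {k: math.comb(n_items, k) for k in (1, 2, 3)}
print("combinations per size:", combos)
```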


```python
# One row per itemset: its label (items joined by " & ") and its relative support
result = pd.DataFrame(list(rule_supports.items()),
                      columns=['itemsets', 'support']
                      )

result.sort_values(by=['support'],
                   ascending=False
                   )
```

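|    | itemsets      | support  |
|----|---------------|----------|
| 0  | I2            | 0.777778 |
| 3  | I3            | 0.666667 |
| 4  | I1            | 0.666667 |
| 7  | I2 & I3       | 0.444444 |
| 8  | I2 & I1       | 0.444444 |
| 12 | I3 & I1       | 0.444444 |
| 14 | I2 & I5 & I1  | 0.222222 |
| 2  | I4            | 0.222222 |
| 5  | I2 & I5       | 0.222222 |
| 6  | I2 & I4       | 0.222222 |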
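As a cross-check on the table above, the supports can be recomputed by brute force: enumerate itemsets of up to 3 items and intersect their tidsets (taken from the binary matrix). `supports_up_to` is a hypothetical helper for illustration only, not part of pyECLAT. It joins items in sorted order, so its labels read `I1 & I2` where pyECLAT prints `I2 & I1`.

```python
from itertools import combinations

# Tidsets (item -> transaction indices), read off the binary matrix shown earlier.
TIDSETS = {
    "I1": {0, 3, 4, 6, 7, 8},
    "I2": {0, 1, 2, 3, 5, 7, 8},
    "I3": {2, 4, 5, 6, 7, 8},
    "I4": {1, 3},
    "I5": {0, 7},
}
N_TRANSACTIONS = 9


def supports_up_to(tidsets, n, max_size, min_support):
    """Support of every itemset with 1..max_size items, via tidset intersection."""
    out = {}
    for k in range(1, max_size + 1):
        for combo in combinations(sorted(tidsets), k):
            common = set.intersection(*(tidsets[item] for item in combo))
            support = len(common) / n
            # Skip itemsets that never occur and those below the threshold.
            if common and support >= min_support:
                out[" & ".join(combo)] = support
    # Highest support first, like the sorted result table.
    return dict(sorted(out.items(), key=lambda kv: kv[1], reverse=True))


check = supports_up_to(TIDSETS, N_TRANSACTIONS, max_size=3, min_support=8 / 100)
for label, support in check.items():
    print(f"{label}: {support:.6f}")
```

Under these assumptions it finds 18 itemsets with nonzero support, and every row listed above appears among them with the same support value.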

