#### Introduction
The Eclat (Equivalence Class Clustering and bottom-up Lattice Traversal) algorithm is a popular method for frequent itemset mining in large transactional databases. Unlike the Apriori algorithm, which uses a breadth-first search strategy, Eclat uses a depth-first search strategy to discover frequent itemsets.

#### Key Points:
###### Depth-First Search: Eclat explores the itemset lattice depth-first, extending one itemset at a time instead of generating all candidates of a given length level by level.
###### Intersection-based: The algorithm intersects transaction ID (TID) lists to find frequent itemsets; the support of an itemset is the size of the intersection of its items' TID lists.
###### Memory Usage: Eclat can be more memory-intensive than Apriori because it has to keep the transaction IDs for each itemset it is working on.
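
To make the intersection idea concrete, here is a minimal sketch in plain Python. It is not part of any library, and the small `tid_lists` mapping and the `support_count` helper are made up purely for illustration.

```python
# Hypothetical vertical layout: item -> set of transaction IDs containing it
tid_lists = {
    "bread": {0, 1, 3, 4},
    "milk": {0, 2, 3},
    "jam": {1, 3},
}

def support_count(items, tid_lists):
    """Count the transactions that contain every item in `items`."""
    # Intersect the TID sets of all items; what remains are the
    # transactions in which the whole itemset occurs
    tids = set.intersection(*(tid_lists[item] for item in items))
    return len(tids)

print(support_count(["bread", "milk"], tid_lists))         # 2 (transactions 0 and 3)
print(support_count(["bread", "milk", "jam"], tid_lists))  # 1 (transaction 3)
```

Adding an item can only shrink the intersection, which is why an infrequent itemset never needs to be extended further.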

#### Scenarios where the Eclat Algorithm is Beneficial
Eclat is particularly useful in scenarios where:

###### Large Datasets: The dataset contains a large number of transactions, so counting support by intersecting TID lists instead of rescanning the whole database pays off.
###### Sparse Data: The data is sparse, meaning that there are many items but only a few transactions contain a particular item, so the TID lists are short and cheap to intersect.
###### Frequent Itemsets Needed: The goal is to find frequent itemsets without rescanning the database for every candidate length, as Apriori does.

#### Methods for Implementing the Eclat Algorithm
The Eclat algorithm can be implemented using the following steps:

1. Transform the Dataset: Convert the transactional database into a vertical data format, where each item is associated with a list of transaction IDs.
2. Depth-First Search: Perform a depth-first search on the vertical data to find frequent itemsets by intersecting TID lists.
3. Pruning: Use the support threshold to prune infrequent itemsets, since none of their supersets can be frequent.
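
The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration under simplifying assumptions (TID lists stored as Python sets, support given as an absolute count), not the implementation used by any particular library; the function names `to_vertical` and `eclat` and the `toy` data are hypothetical.

```python
from collections import defaultdict

def to_vertical(transactions):
    """Step 1: map each item to the set of transaction IDs that contain it."""
    vertical = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)

def eclat(transactions, min_count):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    vertical = to_vertical(transactions)
    # Step 3 applied to single items: drop items below the threshold
    frequent_items = sorted(
        ((item, tids) for item, tids in vertical.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    results = {}

    def extend(prefix, prefix_tids, candidates):
        # Step 2: depth-first extension of `prefix` with each remaining item
        for i, (item, tids) in enumerate(candidates):
            new_tids = prefix_tids & tids if prefix else tids
            if len(new_tids) < min_count:
                continue  # Step 3: prune, no superset can be frequent
            itemset = prefix + (item,)
            results[frozenset(itemset)] = len(new_tids)
            # Only items after position i are tried, so each itemset is built once
            extend(itemset, new_tids, candidates[i + 1:])

    extend((), set(), frequent_items)
    return results

# Toy data for illustration only
toy = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
for itemset, count in eclat(toy, min_count=3).items():
    print(sorted(itemset), count)
```

With `min_count=3`, every single item and every pair is frequent, while `{a, b, c}` occurs in only two transactions and is pruned.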

#### Example Use Case
Let's now look at an example use case to make the topic more practical and applied. In this article, we will take a small dataset of transactions from a night store. For each transaction, we simply have a list of products.

!pip install pyECLAT

In [1]:
# store the item sets as lists of strings in a list
transactions = [
    ['beer', 'wine', 'cheese'],
    ['beer', 'potato chips'],
    ['eggs', 'flower', 'butter', 'cheese'],
    ['eggs', 'flower', 'butter', 'beer', 'potato chips'],
    ['wine', 'cheese'],
    ['potato chips'],
    ['eggs', 'flower', 'butter', 'wine', 'cheese'],
    ['eggs', 'flower', 'butter', 'beer', 'potato chips'],
    ['wine', 'beer'],
    ['beer', 'potato chips'],
    ['butter', 'eggs'],
    ['beer', 'potato chips'],
    ['flower', 'eggs'],
    ['beer', 'potato chips'],
    ['eggs', 'flower', 'butter', 'wine', 'cheese'],
    ['beer', 'wine', 'potato chips', 'cheese'],
    ['wine', 'cheese'],
    ['beer', 'potato chips'],
    ['wine', 'cheese'],
    ['beer', 'potato chips']
]

###### The pyECLAT library takes a data frame as input. You can simply convert your list of transactions into a data frame and the pyECLAT package will take care of the rest. Because the transactions have different lengths, pandas pads the shorter rows with None values; this is not a problem.

In [2]:

import pandas as pd

# you simply convert the transaction list into a dataframe
data = pd.DataFrame(transactions)
data

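|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | beer | wine | cheese | None | None |
| 1 | beer | potato chips | None | None | None |
| 2 | eggs | flower | butter | cheese | None |
| 3 | eggs | flower | butter | beer | potato chips |
| 4 | wine | cheese | None | None | None |
| 5 | potato chips | None | None | None | None |
| 6 | eggs | flower | butter | wine | cheese |
| 7 | eggs | flower | butter | beer | potato chips |
| 8 | wine | beer | None | None | None |
| 9 | beer | potato chips | None | None | None |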


In [3]:

# we are looking for itemSETS
# we do not want to have any individual products returned
min_n_products = 2

# we want a minimum support count of 7 transactions,
# but it has to be expressed as a fraction of all transactions (7/20 = 0.35)
min_support = 7/len(transactions)

# we put no limit on the size of the itemsets,
# so we set the maximum to the length of the longest transaction
max_length = max(len(x) for x in transactions)

In [5]:
from pyECLAT import ECLAT

# create an instance of eclat
my_eclat = ECLAT(data=data, verbose=True)

# fit the algorithm
rule_indices, rule_supports = my_eclat.fit(min_support=min_support,
                                           min_combination=min_n_products,
                                           max_combination=max_length)

100%|██████████| 8/8 [00:00<00:00, 1494.96it/s]
100%|██████████| 8/8 [00:00<00:00, 111476.52it/s]
100%|██████████| 8/8 [00:00<00:00, 2658.62it/s]

Combination 2 by 2
10it [00:00, 630.08it/s]

Combination 3 by 3
10it [00:00, 963.88it/s]

Combination 4 by 4
5it [00:00, 836.65it/s]

Combination 5 by 5
1it [00:00, 575.03it/s]


In [6]:
rule_indices

{'cheese & wine': [0, 4, 6, 14, 15, 16, 18],
 'beer & potato chips': [1, 3, 7, 9, 11, 13, 15, 17, 19]}

In [7]:

print(rule_supports)

{'cheese & wine': 0.35, 'beer & potato chips': 0.45}
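
The two outputs are directly related: each support value is the number of matching transaction indices divided by the total number of transactions. The short check below illustrates this with the values copied from the output above; the variable names `n_transactions` and `recomputed` are introduced here only for illustration.

```python
# Values copied from the rule_indices output above
rule_indices = {
    'cheese & wine': [0, 4, 6, 14, 15, 16, 18],
    'beer & potato chips': [1, 3, 7, 9, 11, 13, 15, 17, 19],
}
n_transactions = 20  # length of the transactions list
min_support = 7 / n_transactions

# support = (number of transactions containing the itemset) / (all transactions)
recomputed = {
    itemset: len(indices) / n_transactions
    for itemset, indices in rule_indices.items()
}
print(recomputed)  # {'cheese & wine': 0.35, 'beer & potato chips': 0.45}

# both itemsets meet the threshold; 'cheese & wine' sits exactly on it
print(all(support >= min_support for support in recomputed.values()))  # True
```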


#### Conclusion
The Eclat algorithm is an efficient method for mining frequent itemsets in large and sparse transactional databases. By combining a depth-first search with intersections of transaction ID lists, it computes the support of each itemset without rescanning the database, at the cost of keeping those TID lists in memory. This makes it well suited to frequent pattern discovery tasks such as market basket analysis and web usage mining. The pyECLAT example above shows how to apply the algorithm to a small sample dataset: with a minimum support of 7 out of 20 transactions (0.35), the frequent itemsets of two or more products are 'cheese & wine' and 'beer & potato chips'.