# ECLAT
---
The Eclat (Equivalence Class Transformation) algorithm is another classic data mining algorithm used for mining frequent itemsets and discovering association rules in a database. It differs from the Apriori algorithm in terms of its methodology and efficiency. This page aims to provide an overview of the Eclat algorithm, its method, key parameters, advantages, limitations, and a basic implementation guide using Python.

The Eclat algorithm uses a depth-first search strategy to find frequent itemsets in a dataset. Instead of repeatedly scanning the transactions to count candidate itemsets, as Apriori does, Eclat stores the data in a vertical format: for each item it maintains a tidset, the set of IDs of the transactions in which that item appears. Eclat then recursively combines frequent itemsets by intersecting their tidsets; the size of an intersection is the support count of the combined itemset. Because the database is scanned only once to build the tidsets and support is then obtained from intersections alone, Eclat can be efficient for mining frequent itemsets in large databases.
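
The tidset idea can be illustrated with a minimal pure-Python sketch. It is not the pyECLAT implementation: the function name `eclat_sketch` and the four toy transactions are hypothetical, chosen only to show the vertical layout and the depth-first intersection step.

```python
def eclat_sketch(transactions, min_support):
    """Toy Eclat: return {itemset (tuple): support} for all frequent itemsets."""
    n = len(transactions)
    min_count = min_support * n

    # Single scan of the data: build the vertical layout (item -> set of transaction IDs)
    tidsets = {}
    for tid, txn in enumerate(transactions):
        for item in set(txn):
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs that are already frequent together with prefix
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids) / n
            # Intersect with the remaining items to grow the itemset depth-first
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_count:
                    suffix.append((other, shared))
            if suffix:
                extend(itemset, suffix)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend((), start)
    return frequent


# Hypothetical toy data, not taken from the market basket dataset
toy_transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
toy_result = eclat_sketch(toy_transactions, min_support=0.5)
for itemset, support in toy_result.items():
    print(itemset, support)
```

With a 50% threshold, every single item and every pair is frequent in this toy data, while the triple (bread, butter, milk) occurs in only one of four transactions and is pruned.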

### **Key Parameters**
| **Parameter**             | **Description**                                                               |
|:--------------------------|:------------------------------------------------------------------------------|
| `min_support`             | User-defined threshold (a decimal between 0 and 1) that determines the minimum frequency at which an itemset must be present in the dataset to be considered 'frequent'.|
| `min_combination`    | User-defined minimum size of the itemsets to be considered frequent.<br><br>Setting a higher value for `min_combination` will result in the algorithm only considering larger itemsets as frequent. This can lead to discovering fewer but potentially more significant association rules or patterns. It filters out smaller itemsets, which may include common but less interesting associations.<br><br>Setting a lower value for `min_combination` allows the algorithm to find smaller frequent itemsets. This can lead to a larger number of discovered itemsets, including more specific and potentially noisy patterns. It may be useful for finding fine-grained associations but can also result in a higher volume of results to analyze.|
| `max_combination`          | User-defined maximum size of the itemsets to be considered.<br><br>Setting a higher value for `max_combination` allows the algorithm to consider larger itemsets as frequent. This can be useful when you have prior knowledge that certain associations or patterns involve a larger number of items. However, it may also increase computational complexity and runtime.<br><br>Setting a lower value for `max_combination` limits the size of itemsets considered by the algorithm. It can lead to faster execution and a smaller number of results, focusing on more concise patterns. However, you might miss associations that involve larger sets of items.|
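
To get a feel for how `max_combination` affects runtime, the short calculation below counts the item combinations of each size. The figure of 30 items is an inference, not something the library reports: it is the value for which the counts match the `435it` and `4060it` progress lines shown later in this walkthrough.

```python
from math import comb

# Assumption: 30 single items clear the support threshold (inferred from the progress output)
n_frequent_items = 30

combination_counts = {k: comb(n_frequent_items, k) for k in range(2, 5)}
for size, count in combination_counts.items():
    print(f"itemsets of size {size}: {count} combinations")
```

Under that assumption, raising `max_combination` from 3 to 4 would add tens of thousands of further combinations to evaluate, which is why larger values can slow the run noticeably.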

## Install the pyECLAT library

In [18]:
!pip install pyECLAT



## Import the libraries

In [19]:
import numpy as np
import pandas as pd
import warnings
# A single blanket filter already silences every warning category
warnings.filterwarnings("ignore")
pd.set_option("display.max_columns", None)
pd.set_option("display.max_colwidth", None)

from pyECLAT import ECLAT

## Load the dataset

In [20]:
url = "https://github.com/robitussin/CCADMACL/blob/5b91f8f5149bd03375c1529f4e0d352d7f4f2a9e/10%20-%20Eclat%20Algorithm/implementation/market_basket_optimization.csv?raw=true"

df = pd.read_csv(url, header=None)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


## Generate transaction lists

In [21]:
txns = df.fillna("").values.tolist()
txns = [[item for item in txn if item != ''] for txn in txns]
txns = [[item.strip() for item in txn] for txn in txns]

# Create a list of unique ids for the transactions
ids = [i + 1 for i in range(len(txns))]

# Build one {'TID', 'Item'} record for every item of every transaction
data = []
for tid, txn in zip(ids, txns):
    data.extend({'TID': tid, 'Item': item} for item in txn)

df_txn = pd.DataFrame(data)
df_txn.head(25)

Unnamed: 0,TID,Item
0,1,shrimp
1,1,almonds
2,1,avocado
3,1,vegetables mix
4,1,green grapes
5,1,whole weat flour
6,1,yams
7,1,cottage cheese
8,1,energy drink
9,1,tomato juice
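
The long (TID, Item) table is one step away from the vertical layout that Eclat works on. The sketch below builds item-to-tidset mappings from the five transactions shown in the dataset preview above, using only the standard library. The variable names `sample_txns`, `tidsets`, and `shared` are illustrative and are not part of pyECLAT.

```python
from collections import defaultdict

# The first five transactions from the dataset preview (TIDs start at 1)
sample_txns = [
    ["shrimp", "almonds", "avocado", "vegetables mix", "green grapes",
     "whole weat flour", "yams", "cottage cheese", "energy drink",
     "tomato juice", "low fat yogurt", "green tea", "honey", "salad",
     "mineral water", "salmon", "antioxydant juice", "frozen smoothie",
     "spinach", "olive oil"],
    ["burgers", "meatballs", "eggs"],
    ["chutney"],
    ["turkey", "avocado"],
    ["mineral water", "milk", "energy bar", "whole wheat rice", "green tea"],
]

# Vertical layout: each item maps to the set of transaction IDs that contain it
tidsets = defaultdict(set)
for tid, txn in enumerate(sample_txns, start=1):
    for item in txn:
        tidsets[item].add(tid)

print(sorted(tidsets["avocado"]))        # [1, 4]

# Support of a pair = size of the tidset intersection / number of transactions
shared = tidsets["mineral water"] & tidsets["green tea"]
print(sorted(shared))                    # [1, 5]
print(len(shared) / len(sample_txns))    # 0.4
```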


## Find the most frequent items

In [22]:
# Count how often each item occurs (value_counts sorts in descending order)
top_items = df_txn['Item'].value_counts().reset_index()

# Name the two resulting columns
df_top_items = pd.DataFrame(top_items)
df_top_items.columns = ['Item', 'Count']

# Calculate the percentage of transactions for each item
total_transactions = len(df)
df_top_items['% Count'] = (df_top_items['Count']*100 / total_transactions).round(2)

# Display the results
df_top_items.style.background_gradient(cmap='Blues')

Unnamed: 0,Item,Count,% Count
0,mineral water,1788,23.84
1,eggs,1348,17.97
2,spaghetti,1306,17.41
3,french fries,1282,17.09
4,chocolate,1230,16.4
5,green tea,991,13.21
6,milk,972,12.96
7,ground beef,737,9.83
8,frozen vegetables,715,9.53
9,pancakes,713,9.51


## Generate frequent itemsets using ECLAT

In [23]:
# Initiate an Eclat instance and load transactions DataFrame to the instance
eclat = ECLAT(data=df, verbose=True)

# Preview the binary (one-hot) dataframe built by the instance
eclat.df_bin.head()

100%|██████████| 120/120 [00:01<00:00, 94.36it/s] 
100%|██████████| 120/120 [00:00<00:00, 4099.07it/s]
100%|██████████| 120/120 [00:00<00:00, 4179.54it/s]


Unnamed: 0,ground beef,cottage cheese,soup,chutney,nonfat milk,melons,whole wheat rice,milk,dessert wine,salmon,antioxydant juice,strong cheese,babies food,pasta,white wine,flax seed,salt,blueberries,mayonnaise,soda,magazines,pepper,meatballs,eggs,avocado,black tea,cake,rice,grated cheese,chili,chicken,mineral water,gluten free bar,cream,candy bars,fromage blanc,yams,muffins,spaghetti,gums,cereals,zucchini,cauliflower,frozen vegetables,spinach,pickles,salad,french fries,carrots,butter,cider,red wine,pet food,bramble,champagne,hand protein bar,chocolate bread,light cream,green beans,protein bar,turkey,green grapes,cookies,tomatoes,bacon,yogurt cake,shampoo,herb & pepper,fresh bread,asparagus,barbecue sauce,almonds,shallot,burgers,sparkling water,low fat yogurt,oil,extra dark chocolate,tea,napkins,mushroom cream sauce,ketchup,chocolate,parmesan cheese,mint,brownies,body spray,whole wheat pasta,asparagus.1,bug spray,strawberries,toothpaste,water spray,tomato sauce,vegetables mix,hot dogs,oatmeal,escalope,energy bar,light mayo,shrimp,green tea,whole weat flour,tomato juice,sandwich,energy drink,corn,frozen smoothie,mashed potato,cooking oil,ham,olive oil,clothes accessories,burger sauce,honey,pancakes,fresh tuna,french wine,eggplant,mint green tea
0,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,1,1,1,0,1,0,1,0,0,0,1,0,0,1,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


### Display a list with all the names of the different items

In [24]:

unique_item_list = eclat.uniq_
print(unique_item_list)

['ground beef', 'cottage cheese', 'soup', 'chutney', 'nonfat milk', 'melons', 'whole wheat rice', 'milk', 'dessert wine', 'salmon', 'antioxydant juice', 'strong cheese', 'babies food', 'pasta', 'white wine', 'flax seed', 'salt', 'blueberries', 'mayonnaise', 'soda', 'magazines', 'pepper', 'meatballs', 'eggs', nan, 'avocado', 'black tea', 'cake', 'rice', 'grated cheese', 'chili', 'chicken', 'mineral water', 'gluten free bar', 'cream', 'candy bars', 'fromage blanc', 'yams', 'muffins', 'spaghetti', 'gums', 'cereals', 'zucchini', 'cauliflower', 'frozen vegetables', 'spinach', 'pickles', 'salad', 'french fries', 'carrots', 'butter', 'cider', 'red wine', 'pet food', 'bramble', 'champagne', 'hand protein bar', 'chocolate bread', 'light cream', 'green beans', 'protein bar', 'turkey', 'green grapes', 'cookies', 'tomatoes', 'bacon', 'yogurt cake', 'shampoo', 'herb & pepper', 'fresh bread', 'asparagus', 'barbecue sauce', 'almonds', 'shallot', 'burgers', 'sparkling water', 'low fat yogurt', 'oil', 

### Set parameters

In [25]:
min_support_threshold = 0.04
min_combination = 2
max_combination = 3

Apply the Eclat algorithm with the parameters set above: an itemset must appear in at least 4% of all transactions to be considered frequent, and each itemset must contain at least 2 and at most 3 items.
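
As a quick sanity check, the fractional threshold can be translated into a transaction count. The total of 7501 transactions is an assumption: it is not printed anywhere above, but it is consistent with the counts and percentages in the item frequency table. Whether pyECLAT compares with `>=` or `>` internally is not verified here.

```python
import math

n_transactions = 7501   # assumption: consistent with 1788 occurrences being 23.84%
min_support = 0.04

# Smallest number of transactions an itemset needs to reach the 4% threshold
min_count = math.ceil(min_support * n_transactions)
print(min_count)

# Cross-check against the lowest support in the results table (0.040928)
ground_beef_water_count = round(0.040928 * n_transactions)
print(ground_beef_water_count, ground_beef_water_count >= min_count)
```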

In [26]:
get_ECLAT_indexes, get_ECLAT_supports = eclat.fit(
    min_support=min_support_threshold,
    min_combination=min_combination,
    max_combination=max_combination,
    separator=' & ',
    verbose=True,
)

# Display results in a dataframe
result = pd.DataFrame(get_ECLAT_supports.items(),columns=['Item', 'Support'])
result = result.sort_values(by=['Support'], ascending=False).reset_index(drop=True)
result

Combination 2 by 2


435it [00:09, 47.37it/s]


Combination 3 by 3


4060it [01:01, 66.20it/s]


Unnamed: 0,Item,Support
0,mineral water & spaghetti,0.059725
1,mineral water & chocolate,0.05266
2,eggs & mineral water,0.050927
3,milk & mineral water,0.047994
4,ground beef & mineral water,0.040928


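Among individual items, the five with the highest support are mineral water (23.84%), eggs (17.97%), spaghetti (17.41%), french fries (17.09%), and chocolate (16.38%). Because `min_combination = 2`, single items do not appear in the results table above, which lists only multi-item itemsets.
The least frequent entries that still meet the 4% threshold are fresh bread (4.31%) and salmon (4.25%) among single items, and ground beef & mineral water (4.09%) among pairs.
The pairs with the highest support are mineral water & spaghetti (5.97%), mineral water & chocolate (5.27%), and eggs & mineral water (5.09%). Every pair listed above includes mineral water, the most frequent single item.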
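
Since frequent itemsets are usually a stepping stone to association rules, the supports above can be turned into confidence and lift with a few lines of arithmetic. The helper functions `confidence` and `lift` are illustrative and are not part of pyECLAT. The single-item supports are taken from the item frequency table, on the assumption that those percentages approximate the share of transactions containing each item.

```python
def confidence(support_pair, support_antecedent):
    """Estimated P(consequent | antecedent)."""
    return support_pair / support_antecedent


def lift(support_pair, support_a, support_b):
    """Ratio of observed co-occurrence to that expected under independence."""
    return support_pair / (support_a * support_b)


# Values copied from the tables above (fractions of all transactions)
s_water = 0.2384        # mineral water
s_spaghetti = 0.1741    # spaghetti
s_both = 0.059725       # mineral water & spaghetti

conf_val = confidence(s_both, s_water)
lift_val = lift(s_both, s_water, s_spaghetti)
print(f"confidence(mineral water -> spaghetti) = {conf_val:.3f}")
print(f"lift(mineral water, spaghetti) = {lift_val:.2f}")
```

Under these assumptions, roughly a quarter of the baskets containing mineral water also contain spaghetti, and a lift above 1 suggests the two items appear together more often than independence would predict.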