
In [2]:
%pip install pyECLAT

Collecting pyECLAT
  Downloading pyECLAT-1.0.2-py3-none-any.whl (6.3 kB)
Installing collected packages: pyECLAT
Successfully installed pyECLAT-1.0.2


# Importing dataset

In [6]:
# importing a sample dataset (Example1 and Example2 are datasets bundled with pyECLAT)
from pyECLAT import Example2
# storing the dataset in a variable
dataset = Example2().get()
# printing the dataset
dataset.head()

Unnamed: 0,0,1,2,3,4,5,6
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams
1,burgers,meatballs,eggs,,,,
2,chutney,,,,,,
3,turkey,avocado,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,


In [7]:
dataset.shape

(3001, 7)

In [10]:
dataset.describe()

Unnamed: 0,0,1,2,3,4,5,6
count,3001,2315,1774,1374,1048,775,581
unique,109,113,109,108,104,91,93
top,mineral water,mineral water,mineral water,mineral water,green tea,eggs,green tea
freq,221,183,161,87,67,44,44


# Visualizing Frequent Items

In [11]:
# importing the ECLAT module
from pyECLAT import ECLAT
# loading transactions DataFrame to ECLAT class
eclat = ECLAT(data=dataset)
# DataFrame of binary values
eclat.df_bin

Unnamed: 0,cereals,extra dark chocolate,muffins,shallot,energy bar,blueberries,mineral water,nonfat milk,light mayo,hand protein bar,...,green tea,cauliflower,salmon,ketchup,strong cheese,gums,fresh bread,green beans,parmesan cheese,frozen smoothie
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,1,0,1,0,0,0,...,1,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2996,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
2997,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2998,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2999,0,0,0,0,0,1,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In this binary dataset, every row represents a transaction and every column represents a product that may appear in a transaction. Every cell contains one of two possible values:

* 0 – the product was not included in the transaction
* 1 – the transaction contains the product
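
As a rough illustration of this encoding, the sketch below builds the same kind of 0/1 table by hand for rows 1 to 3 of the `dataset.head()` output above. It is plain Python for clarity and is not how pyECLAT builds `df_bin` internally; the names `transactions`, `products`, and `binary_rows` are made up for this example.

```python
# Baskets copied from rows 1-3 of dataset.head(); empty cells are dropped
transactions = [
    ["burgers", "meatballs", "eggs"],
    ["chutney"],
    ["turkey", "avocado"],
]

# Every distinct product becomes one column
products = sorted({item for basket in transactions for item in basket})

# One row per transaction: 1 if the product is in the basket, else 0
binary_rows = [
    [1 if product in basket else 0 for product in products]
    for basket in transactions
]

print(products)
for row in binary_rows:
    print(row)
```

Summing a column of such a table gives the number of transactions containing that product, and summing a row gives the number of items in that transaction.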

In [12]:
# count items in each column
items_total = eclat.df_bin.astype(int).sum(axis=0)
items_total

cereals                  54
extra dark chocolate     31
muffins                  69
shallot                  22
energy bar               80
                       ... 
gums                     27
fresh bread              91
green beans              17
parmesan cheese          58
frozen smoothie         144
Length: 119, dtype: int64

In [13]:
# count items in each row
items_per_transaction = eclat.df_bin.astype(int).sum(axis=1)
items_per_transaction

0       7
1       3
2       1
3       2
4       5
       ..
2996    1
2997    2
2998    3
2999    7
3000    5
Length: 3001, dtype: int64

In [17]:
import pandas as pd
# building a DataFrame of per-item transaction counts
df = pd.DataFrame({'items': items_total.index, 'transactions': items_total.values})
# sorted copy of the DataFrame, used for display and plotting
df_table = df.sort_values("transactions", ascending=False)
# top 5 most frequent products/items
df_table.head(5).style.background_gradient(cmap='gnuplot')

Unnamed: 0,items,transactions
6,mineral water,711
22,spaghetti,549
47,eggs,532
79,chocolate,485
46,french fries,463


In [32]:
# importing required module
import plotly.express as px
# constant column that serves as the single root node of the treemap
df_table["all"] = "All"
# creating a treemap of the 50 most frequent items using plotly
fig = px.treemap(df_table.head(50), path=['all', "items"], values='transactions',
                 color='transactions', hover_data=['items'],
                 color_continuous_scale='sunsetdark')
# plotting the treemap
fig.show()

In [30]:
fig = px.bar(df_table.head(50), y='items', x='transactions',
             orientation='h',
             title='Top Items by Transactions',
             labels={'items': 'Items', 'transactions': 'Transactions'},
             color='transactions',
             color_continuous_scale='sunsetdark')

# Customize the layout if needed
fig.update_layout(
    yaxis=dict(title='Items'),
    xaxis=dict(title='Transactions'),
    coloraxis_colorbar=dict(title='Transactions', tickformat=','),
)

# Show the bar chart
fig.show()

# Association Rules

* **Minimum support** – the minimum fraction of transactions that must contain an itemset (for example, 0.05 for 5%)
* **Minimum combinations** – the minimum number of items in an itemset
* **Maximum combinations** – the maximum number of items in an itemset

Note: the higher the maximum combinations value, the longer the calculation takes, because the number of candidate itemsets grows combinatorially.
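
To make the support threshold concrete, here is a minimal sketch that computes the support of an itemset as the share of transactions containing all of its items. The `support` helper and the toy transactions are hypothetical and are not part of pyECLAT or of the dataset used here.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for basket in transactions if wanted <= set(basket))
    return hits / len(transactions)

# Toy data for illustration only, not taken from the pyECLAT dataset
toy_transactions = [
    ["mineral water", "spaghetti", "eggs"],
    ["mineral water", "spaghetti"],
    ["eggs"],
    ["mineral water"],
]

pair_support = support(["mineral water", "spaghetti"], toy_transactions)
print(pair_support)          # the pair occurs in 2 of the 4 toy transactions
print(pair_support >= 0.05)  # whether the pair meets a 5% minimum support
```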

In [33]:
# an itemset must appear in at least 5% of transactions
min_support = 5/100
# itemsets must contain at least 2 items
min_combination = 2
# and at most as many items as the largest transaction
max_combination = max(items_per_transaction)
rule_indices, rule_supports = eclat.fit(min_support=min_support,
                                        min_combination=min_combination,
                                        max_combination=max_combination,
                                        separator=' & ',
                                        verbose=True)

Combination 2 by 2


253it [00:02, 99.40it/s] 


Combination 3 by 3


1771it [00:13, 129.58it/s]


Combination 4 by 4


8855it [01:10, 126.34it/s]


Combination 5 by 5


33649it [04:53, 114.68it/s]


Combination 6 by 6


100947it [14:47, 113.74it/s]


Combination 7 by 7


245157it [38:46, 105.40it/s]
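
The iteration counts in the progress output above (253, 1771, 8855, 33649, 100947, 245157) match the number of ways to choose 2 to 7 items out of 23. This suggests that 23 items met the minimum support on their own and that every combination of them was tested. The value 23 is inferred from those counts and is not reported by the library. The sketch below reproduces the counts with the standard library.

```python
from math import comb

# Assumption: inferred from the progress output, not reported by pyECLAT
n_items = 23

# Number of candidate itemsets of each size k from 2 to 7
candidate_counts = {k: comb(n_items, k) for k in range(2, 8)}
for k, count in candidate_counts.items():
    print(f"Combination {k} by {k}: {count} candidates")
```

This growth explains why the last step alone ran for more than 38 minutes. Lowering `max_combination` is the most direct way to shorten the run.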


The fit method of the ECLAT class returns two dictionaries, both keyed by itemset: *rule_indices* and *rule_supports*.
* rule_indices maps each itemset that satisfies the specified support and combination criteria to the indices of the transactions that contain it.
* rule_supports maps each of those itemsets to its support value.
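
The following sketch shows the vertical idea behind ECLAT: each item is stored with the set of transaction ids that contain it, and the support of an itemset comes from intersecting those sets. It is a brute-force illustration on toy data, assuming `min_combination` is at least 1. It is not pyECLAT's implementation, and `eclat_sketch` is a hypothetical name.

```python
from itertools import combinations

def eclat_sketch(transactions, min_support, min_combination, max_combination):
    """Brute-force vertical mining; illustrative only, not pyECLAT's code."""
    n = len(transactions)
    # Vertical format: item -> set of ids of the transactions containing it
    tidsets = {}
    for tid, basket in enumerate(transactions):
        for item in basket:
            tidsets.setdefault(item, set()).add(tid)

    # Keep only items that are frequent on their own
    frequent_items = sorted(
        item for item, tids in tidsets.items() if len(tids) / n >= min_support
    )

    indices, supports = {}, {}
    for k in range(min_combination, max_combination + 1):
        for combo in combinations(frequent_items, k):
            # Transactions containing all items: intersection of their tidsets
            common = set.intersection(*(tidsets[item] for item in combo))
            if len(common) / n >= min_support:
                key = " & ".join(combo)
                indices[key] = sorted(common)
                supports[key] = len(common) / n
    return indices, supports

# Toy data for illustration only
toy = [
    ["mineral water", "spaghetti", "eggs"],
    ["mineral water", "spaghetti"],
    ["eggs"],
    ["mineral water"],
]
indices, supports = eclat_sketch(toy, min_support=0.5,
                                 min_combination=2, max_combination=3)
print(indices)
print(supports)
```

On the toy data, only the pair of mineral water and spaghetti reaches the 50% threshold, so each dictionary holds a single entry.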

In [34]:
result = pd.DataFrame(rule_supports.items(), columns=['Item', 'Support'])
result.sort_values(by=['Support'], ascending=False)

Unnamed: 0,Item,Support
0,mineral water & spaghetti,0.060646


With a minimum support of 5%, mineral water and spaghetti form the only itemset of two or more items that qualifies: they are purchased together in about 6.1% of the transactions in the dataset (support 0.060646).