# Eclat

If you only need the most frequent sets of products bought together in a basket, without generating association rules, you can use **Eclat** instead of **Apriori**.

Eclat uses only the support value to find the most common sets of products.
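
The notebook does not define support, so here is the standard definition as a reminder. For an item set $X$ and a list of transactions $T$, the support is the fraction of transactions that contain every item in $X$:

$$
\operatorname{support}(X) = \frac{\left|\{\, t \in T : X \subseteq t \,\}\right|}{|T|}
$$

A support of 0.05, for instance, means the item set appears in 5% of all transactions.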

## Algorithm
- **Step 1** : Choose a minimum support value.
- **Step 2** : Select all item sets whose support across the transactions is higher than the minimum support.
- **Step 3** : Sort these sets by decreasing support.
- **END** : Your frequent item sets are ready.
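
The steps above can be sketched in plain Python. This is a minimal illustration, not the code this notebook runs (the notebook uses `apyori` further down). It uses the vertical layout usually associated with Eclat, where each item maps to the set of transaction ids that contain it. The function name `eclat_up_to_pairs`, the sample `baskets`, and the choice of `>=` for the threshold test are assumptions made for the example, and the sketch stops at pairs to mirror `max_length = 2`.

```python
from itertools import combinations

def eclat_up_to_pairs(transactions, min_support):
    """Return [(itemset, support), ...] for frequent 1- and 2-item sets."""
    n = len(transactions)
    # Vertical layout: item -> set of ids of the transactions that contain it.
    tidsets = {}
    for tid, basket in enumerate(transactions):
        for item in set(basket):
            tidsets.setdefault(frozenset([item]), set()).add(tid)

    # Step 2 for single items (assumption: ">=" keeps sets exactly at the threshold).
    frequent = {s: t for s, t in tidsets.items() if len(t) / n >= min_support}

    # Step 2 for pairs: a pair's support comes from intersecting the two id sets.
    for (a, tids_a), (b, tids_b) in combinations(list(frequent.items()), 2):
        common = tids_a & tids_b
        if len(common) / n >= min_support:
            frequent[a | b] = common

    # Step 3: sort by decreasing support.
    return sorted(((s, len(t) / n) for s, t in frequent.items()),
                  key=lambda pair: pair[1], reverse=True)

# Hypothetical toy data, not taken from the market basket dataset.
baskets = [["milk", "bread"], ["milk", "eggs"], ["milk", "bread", "eggs"], ["bread"]]
frequent_sets = eclat_up_to_pairs(baskets, min_support=0.5)
```

With these four baskets, `{bread, eggs}` appears in only one transaction (support 0.25), so it falls below the 0.5 threshold and is dropped, while `{milk, bread}` and `{milk, eggs}` are kept.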

## Importing the libraries

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
%pip install apyori

Note: you may need to restart the kernel to use updated packages.


## Data Preprocessing
- Because we use the `apriori` function to emulate Eclat, the data preprocessing step is the same as for Apriori.

In [3]:
# Load the dataset; header = None because the file has no header row.
dataset = pd.read_csv(r'../dataset/Market_Basket_Optimisation.csv', header = None)

# Convert the pandas data frame to a list of lists containing only lowercase strings.
# Empty cells are kept as the string 'nan' and filtered out later.
transactions = [[str(value).lower() for value in row] for row in dataset.values]
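
To make the `'nan'` handling concrete, the sketch below shows what the conversion does to empty cells, and one possible alternative that drops them while building the transactions. The `rows` list is hypothetical and stands in for `dataset.values`; `is_missing` is a helper introduced only for this illustration.

```python
import math

# Hypothetical rows standing in for dataset.values; NaN marks an empty cell.
rows = [
    ["Milk", "Bread", float("nan")],
    ["Eggs", float("nan"), float("nan")],
]

def is_missing(value):
    """True for the float NaN that marks an empty cell."""
    return isinstance(value, float) and math.isnan(value)

# Same conversion as in the notebook: NaN becomes the string 'nan'.
as_in_notebook = [[str(v).lower() for v in row] for row in rows]

# Alternative: skip empty cells so 'nan' never enters a transaction.
without_padding = [[str(v).lower() for v in row if not is_missing(v)] for row in rows]
```

With the first form, `'nan'` behaves like a very common product and has to be filtered from the results afterwards, which is what this notebook does. With the second form that filter would not be needed.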

## Train Eclat ARL Model

In [4]:
from apyori import apriori
rules = apriori(transactions = transactions, min_support = 0.003, max_length = 2)
results = list(rules)
results

[RelationRecord(items=frozenset({'almonds'}), support=0.020397280362618318, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'almonds'}), confidence=0.020397280362618318, lift=1.0)]),
 RelationRecord(items=frozenset({'antioxydant juice'}), support=0.008932142381015865, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'antioxydant juice'}), confidence=0.008932142381015865, lift=1.0)]),
 RelationRecord(items=frozenset({'asparagus'}), support=0.004666044527396347, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'asparagus'}), confidence=0.004666044527396347, lift=1.0)]),
 RelationRecord(items=frozenset({'avocado'}), support=0.03332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'avocado'}), confidence=0.03332888948140248, lift=1.0)]),
 RelationRecord(items=frozenset({'babies food'}), support=0.004532728969470737, ordered_statistics=[OrderedSta

## Visualize result

If we inspect `results`, we can see that each entry is a `RelationRecord` object like the one shown below.

```
RelationRecord(
    items=frozenset({'chicken', 'light cream'}), 
    support=0.004532728969470737, 
    ordered_statistics=[
        OrderedStatistic(
            items_base=frozenset({'light cream'}), 
            items_add=frozenset({'chicken'}), 
            confidence=0.29059829059829057, 
            lift=4.84395061728395
        )
    ]
),
```

We need to loop through all the results and build a new data frame with the columns "Product" and "Support".

That way we can sort by "Support" and keep the top 10 rows.

In [5]:
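data_values = []
min_length = 2  # keep only item sets with at least two products
for result in results:
    # Skip single items and any set containing the 'nan' placeholder for empty cells.
    if len(result.items) >= min_length and 'nan' not in result.items:
        data_values.append([result.items, result.support])
resultsinDataFrame = pd.DataFrame(data_values, columns=["Product", "Support"])
# nlargest returns the 10 rows with the highest support, in descending order.
resultsinDataFrame = resultsinDataFrame.nlargest(n=10, columns="Support")
resultsinDataFrame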

| | Product | Support |
|---|---|---|
| 697 | (mineral water, spaghetti) | 0.059725 |
| 241 | (mineral water, chocolate) | 0.05266 |
| 339 | (eggs, mineral water) | 0.050927 |
| 662 | (mineral water, milk) | 0.047994 |
| 577 | (mineral water, ground beef) | 0.040928 |
| 256 | (chocolate, spaghetti) | 0.039195 |
| 589 | (ground beef, spaghetti) | 0.039195 |
| 353 | (eggs, spaghetti) | 0.036528 |
| 320 | (eggs, french fries) | 0.036395 |
| 507 | (mineral water, frozen vegetables) | 0.035729 |
