In [None]:
import pandas as pd
import numpy as np 
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

**Let's load the dataset**

In [None]:
grocery = pd.read_csv('../input/supermarket/GroceryStoreDataSet.csv', names = ['Products'], sep = ',')

**Viewing the first 5 and last 5 records**

In [None]:
grocery.head(5)

In [None]:
grocery.tail(5)

In [None]:
grocery.info()

In [None]:
grocery.shape

**This shows that there are no null records: 20 records in total, all in a single column.**

In [None]:
grocery_df = grocery["Products"].str.split(",").tolist()  # list of transactions, each a list of product names
grocery_df

**Here, we split each record into its individual products, producing a list of transactions.**

**One-Hot Encoding**

Using `TransactionEncoder`, we convert the list of transactions into a one-hot encoded Boolean array with one row per transaction and one column per product.
Whether a customer bought a product or not is recorded as True/False, which we then convert to 1 and 0.
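To make the encoding concrete, here is a minimal sketch that builds the same kind of 0/1 table by hand with plain Python and pandas. The toy `transactions` list is hypothetical, and this only illustrates the idea; it is not how `TransactionEncoder` is implemented internally.

```python
import pandas as pd

# Hypothetical toy transactions, for illustration only (not the grocery dataset)
transactions = [
    ["MILK", "BREAD", "BISCUIT"],
    ["BREAD", "TEA"],
    ["MILK", "TEA"],
]

# Every distinct product, sorted so the column order is deterministic
items = sorted({item for t in transactions for item in t})

# One row per transaction, one column per product: 1 if bought, else 0
rows = [[int(item in basket) for item in items] for basket in map(set, transactions)]
onehot = pd.DataFrame(rows, columns=items)
print(onehot)
```

Each row is one shopping basket, so summing a column counts how many transactions contain that product.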

In [None]:
te = TransactionEncoder()
te_data = te.fit(grocery_df).transform(grocery_df)  # Boolean array: rows = transactions, columns = products
gdf = pd.DataFrame(te_data, columns = te.columns_)
gdf = gdf.astype(int)  # True/False -> 1/0 in a single step
gdf

In [None]:
gdf.sum().to_frame('Frequency').sort_values('Frequency',ascending=False).plot(kind='bar',
                                                                                  figsize=(12,8),
                                                                                  title="Frequent Items")
plt.show()

**Applying the Apriori Algorithm**

Next, we apply the Apriori algorithm. For this dataset, we set `min_support` to 0.2, so only itemsets that appear in at least 20% of the transactions are kept, and we display the resulting frequent itemsets.
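For intuition about what `apriori` does with `min_support`, below is a simplified, from-scratch sketch of the classic level-wise Apriori procedure (join, prune, count). The helper `apriori_sketch` and the toy data are hypothetical; this is a textbook version for illustration, not mlxtend's implementation.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Hypothetical helper: return {frozenset(itemset): support} for every
    itemset whose support is at least min_support (simplified Apriori)."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of transactions that contain every item of `itemset`
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items
    singletons = {frozenset([item]) for basket in baskets for item in basket}
    current = {s: support(s) for s in singletons}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep only candidates that meet the support threshold
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy transactions (not the grocery dataset)
toy = [["MILK", "BREAD"], ["BREAD", "TEA"], ["MILK", "BREAD", "TEA"], ["TEA"], ["MILK", "BREAD"]]
result = apriori_sketch(toy, min_support=0.4)
for itemset, sup in sorted(result.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(sup, 2))
```

The prune step is what makes Apriori efficient: if {MILK, TEA} is infrequent, no larger itemset containing it is ever counted.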

In [None]:
gdf1 = apriori(gdf, min_support = 0.2, use_colnames = True, verbose = 1)
gdf1

In [None]:
gdf1.sort_values(by = "support" , ascending = False)

**Frequent itemsets sorted by support, from highest to lowest**

**Next, we set the minimum confidence to 60%. In other words, we keep only the rules A → B where at least 60% of the transactions that contain product A also contain product B.**
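The confidence threshold can be checked by hand: conf(A → B) = support(A ∪ B) / support(A), where A ∪ B is the itemset containing both. The sketch below is a hypothetical helper on toy data, shown only to illustrate how `min_threshold = 0.6` filters rules.

```python
def confidence(antecedent, consequent, transactions):
    """Hypothetical helper: conf(A -> B) = support(A ∪ B) / support(A).
    The number of transactions cancels out, so raw counts are used.
    Returns 0.0 if the antecedent never occurs."""
    a = set(antecedent)
    ab = a | set(consequent)
    count_a = sum(a <= set(t) for t in transactions)
    count_ab = sum(ab <= set(t) for t in transactions)
    return count_ab / count_a if count_a else 0.0


# Hypothetical toy transactions (not the grocery dataset)
toy = [["MILK", "BREAD"], ["BREAD", "TEA"], ["MILK", "BREAD", "TEA"], ["TEA"], ["MILK", "BREAD"]]
min_confidence = 0.6
for a, b in [(["MILK"], ["BREAD"]), (["BREAD"], ["MILK"]), (["BREAD"], ["TEA"])]:
    c = confidence(a, b, toy)
    print(a, "->", b, round(c, 2), "kept" if c >= min_confidence else "dropped")
```

Note that confidence is not symmetric: MILK → BREAD and BREAD → MILK share the same support but have different confidence.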

In [None]:
gdf_rules = association_rules(gdf1, metric = 'confidence', min_threshold = 0.6)
gdf_rules

**From the table above (rule Milk → Bread):**

* Milk appears in 25% of all transactions (antecedent support)
* Milk and Bread are bought together in 20% of all transactions (support)
* 80% of the customers who buy Milk also buy Bread (confidence = 0.20 / 0.25 = 0.80)
* Customers who buy Milk are 1.23 times more likely to buy Bread than customers overall (lift)
* The conviction of the rule Milk → Bread is 1.75
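As a quick arithmetic check, the figures above follow from the standard rule metrics; the 1.75 in the last bullet is the rule's conviction. Bread's own support is not quoted in the text, so the value 0.65 below is an assumption inferred from the reported lift (0.80 / 1.23 ≈ 0.65).

```python
# Values quoted above for the rule Milk -> Bread
support_milk = 0.25        # antecedent support
support_milk_bread = 0.20  # support of {Milk, Bread}

# Assumption: Bread's (consequent) support is not quoted; 0.65 is the value
# implied by the reported lift, since 0.80 / 1.23 ≈ 0.65
support_bread = 0.65

confidence = support_milk_bread / support_milk       # P(Bread | Milk)
lift = confidence / support_bread                    # confidence relative to Bread's base rate
conviction = (1 - support_bread) / (1 - confidence)  # how much more often the rule would fail if Milk and Bread were independent

print(f"confidence={confidence:.2f}, lift={lift:.2f}, conviction={conviction:.2f}")
# confidence=0.80, lift=1.23, conviction=1.75
```

A conviction above 1 means the rule Milk → Bread is wrong less often than it would be if the two products were bought independently.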

In [None]:
gdf_rules.sort_values(by = "lift", ascending = False)

**Lift indicates:**

* Whether two products tend to be bought together: **lift greater than 1**
* Whether one product may be a substitute for the other, i.e. they are bought together less often than expected by chance: **lift less than 1**
* Whether there is no relationship between the products (they are independent): **lift equal to 1**
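The three cases follow from the definition lift(A → B) = support(A ∪ B) / (support(A) × support(B)): if A and B are independent, the numerator equals the denominator and lift is exactly 1. The helpers below are hypothetical, a minimal sketch of that reading; the Milk/Bread example assumes Bread's support is 0.65, as implied by the reported lift.

```python
import math


def lift(support_ab, support_a, support_b):
    """lift(A -> B) = support(A ∪ B) / (support(A) * support(B)).
    Under independence, support(A ∪ B) == support(A) * support(B), so lift == 1."""
    return support_ab / (support_a * support_b)


def interpret_lift(value, tol=1e-9):
    """Hypothetical helper mapping a lift value to the three cases listed above.
    `tol` absorbs floating-point noise around 1."""
    if math.isclose(value, 1.0, abs_tol=tol):
        return "no relationship (independent)"
    if value > 1:
        return "bought together more often than chance"
    return "bought together less often than chance (possible substitutes)"


# Milk -> Bread (Bread's support of 0.65 is an assumption inferred from the reported lift)
milk_bread = lift(0.20, 0.25, 0.65)
print(round(milk_bread, 2), interpret_lift(milk_bread))

# Hypothetical independent pair: support(A ∪ B) equals support(A) * support(B)
print(interpret_lift(lift(0.12, 0.3, 0.4)))
```

Unlike confidence, lift is symmetric in A and B, so it describes the pair rather than the direction of the rule.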