In [1]:
!pip install mlxtend

import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules

I downloaded the dataset from https://www.kaggle.com/shazadudwadia/supermarket. Now I import it with `pandas.read_csv` and look at how it is structured.  

In [None]:
# Each line of the file is one transaction, read into a single "products" column
df = pd.read_csv('GroceryStoreDataSet.csv', names=['products'], sep=',')
df.head()

Let's examine the shape of the dataset:

In [None]:
df.shape

Let's split each transaction into its products and store the result in a list called `data`:

In [None]:
# Turn each comma-separated transaction string into a list of product names
data = df["products"].apply(lambda x: x.split(",")).tolist()
data
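
To make the expected input format concrete, here is a small sketch that runs the same two steps on an in-memory sample. It assumes, as the `split(",")` call implies, that each line of the CSV holds one transaction whose items are separated by commas and wrapped in quotes, so that `read_csv` keeps the whole line in the single `products` column. The sample rows are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample in the assumed file format: one quoted transaction per line
sample_csv = io.StringIO(
    '"MILK,BREAD,BISCUIT"\n'
    '"BREAD,MILK,BISCUIT,CORNFLAKES"\n'
    '"BREAD,TEA,BOURNVITA"\n'
)

# The quotes keep each whole transaction in the single "products" column
sample_df = pd.read_csv(sample_csv, names=["products"], sep=",")

# Split every transaction string into a list of item names
sample_data = sample_df["products"].apply(lambda x: x.split(",")).tolist()

print(sample_df.shape)  # (3, 1)
print(sample_data[0])   # ['MILK', 'BREAD', 'BISCUIT']
```

If the real file were not quoted, `read_csv` with a single column name would spread the items over several fields, so it is worth checking `df.head()` against this expected shape.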

**Apriori Algorithm and One-Hot Encoding**
 
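mlxtend's Apriori implementation expects the transactions as a one-hot encoded table of True/False (or 1/0) values.  
Using `TransactionEncoder`, we convert the list of transactions into a one-hot encoded Boolean array.  
Each row is a transaction and each column a product: a product the customer bought in that transaction is marked 1 (True), and a product they did not buy is marked 0 (False).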


In [None]:
# One-hot encode the transactions: one column per product, one row per transaction
from mlxtend.preprocessing import TransactionEncoder
te = TransactionEncoder()
te_data = te.fit(data).transform(data)
df = pd.DataFrame(te_data, columns=te.columns_)
# Display as 1/0 for readability; df itself stays Boolean for apriori
df.astype(int)
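
The encoding that `TransactionEncoder` performs can be reproduced by hand, which makes its output easier to read. The sketch below is an illustrative pure-pandas equivalent (not mlxtend's own code) applied to a few made-up transactions: it builds one Boolean column per distinct item, in alphabetical order, and marks True where the item occurs in the transaction.

```python
import pandas as pd

# Hypothetical toy transactions (lists of item names)
transactions = [
    ["MILK", "BREAD", "BISCUIT"],
    ["BREAD", "TEA"],
    ["MILK", "TEA", "SUGAR"],
]

# One column per distinct item, sorted alphabetically
items = sorted({item for t in transactions for item in t})

# True if the item occurs in the transaction, False otherwise
one_hot = pd.DataFrame(
    [[item in t for item in items] for t in transactions],
    columns=items,
)

print(one_hot.astype(int))
```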

**Applying Apriori and Interpreting the Results**

The next step is to run the Apriori algorithm. The `apriori` function in mlxtend exposes several parameters (such as `min_support`, `use_colnames`, and `max_len`); for this model I only tune the minimum support.    
I set `min_support` to 0.2, so only itemsets that appear in at least 20% of the transactions are kept, and display the resulting frequent itemsets.
 

In [None]:
# Keep only itemsets that appear in at least 20% of the transactions
df = apriori(df, min_support=0.2, use_colnames=True, verbose=1)
df
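
To show what `min_support` controls, here is a minimal from-scratch sketch of the level-wise Apriori idea in plain Python. It is an illustration under simplifying assumptions (sets instead of a one-hot DataFrame, no performance optimizations), not mlxtend's implementation: support is the fraction of transactions containing an itemset, candidates of size k are built by joining frequent itemsets of size k-1, and any candidate with an infrequent subset is pruned before its support is counted. The function names and toy transactions are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori_sketch(transactions, min_support):
    """Illustrative level-wise Apriori; returns {frozenset(itemset): support}."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}

    # Level 1: every single item is a candidate
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Count support and keep only candidates that reach the threshold
        level = {}
        for c in candidates:
            s = support(c, transactions)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


# Hypothetical toy transactions
toy = [
    {"MILK", "BREAD", "BISCUIT"},
    {"BREAD", "TEA"},
    {"MILK", "BREAD", "TEA"},
    {"MILK", "BREAD"},
]
for itemset, s in sorted(apriori_sketch(toy, 0.5).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
```

With `min_support=0.5`, BISCUIT (0.25) and the pair {MILK, TEA} (0.25) are dropped, and the triple {BREAD, MILK, TEA} is pruned without being counted because its subset {MILK, TEA} is not frequent.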

I chose a minimum confidence of 60%.
In other words, a rule X → Y is kept only if at least 60% of the transactions that contain product X also contain product Y.

In [None]:
# Generate association rules and keep those with a confidence of at least 0.6
df_ar = association_rules(df, metric="confidence", min_threshold=0.6)
df_ar
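
The confidence filter can be illustrated with a small sketch, using confidence(X → Y) = support(X ∪ Y) / support(X). The helper below is hypothetical (it is not mlxtend's `association_rules`): it splits each frequent itemset into every possible antecedent X and consequent Y and keeps the rules that reach `min_confidence`. The support values are made up and assumed to list every frequent itemset.

```python
from itertools import combinations

# Hypothetical supports of all frequent itemsets, e.g. produced by an Apriori step
supports = {
    frozenset({"BREAD"}): 1.0,
    frozenset({"MILK"}): 0.75,
    frozenset({"TEA"}): 0.5,
    frozenset({"BREAD", "MILK"}): 0.75,
    frozenset({"BREAD", "TEA"}): 0.5,
}


def rules_by_confidence(supports, min_confidence):
    """Return (antecedent, consequent, confidence) for every rule X -> Y with
    confidence = support(X | Y) / support(X) >= min_confidence."""
    rules = []
    for itemset, support_xy in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        # Every non-empty proper subset of the itemset can serve as antecedent X
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                x = frozenset(antecedent)
                # Subsets of a frequent itemset are frequent, so x is in supports
                confidence = support_xy / supports[x]
                if confidence >= min_confidence:
                    rules.append((set(x), set(itemset - x), confidence))
    return rules


for antecedent, consequent, confidence in rules_by_confidence(supports, 0.6):
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 2))
```

With these made-up numbers, BREAD -> TEA (confidence 0.5) is filtered out, while BREAD -> MILK (0.75), MILK -> BREAD (1.0) and TEA -> BREAD (1.0) are kept.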


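For example, consider the rule at index 1 (sugar → bread):
* Antecedent support: sugar appears in 30% of all transactions.
* Consequent support: bread appears in 65% of all transactions.
* Support: sugar and bread are bought together in 20% of all transactions.
* Confidence: 67% of the customers who buy sugar also buy bread (0.20 / 0.30).
* Lift: about 1.03 (0.67 / 0.65), so customers who buy sugar are about 3% more likely to buy bread than customers in general.
* Conviction: 1.05 ((1 - 0.65) / (1 - 0.20 / 0.30)); a value this close to 1 indicates only a weak dependency between the two products.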
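
All of the figures above follow from the three support values. The sketch below recomputes them with the standard definitions of confidence, lift, leverage and conviction, the same metrics that appear as columns in the `association_rules` output. The function name is hypothetical, and leverage is included for completeness even though it is not discussed above.

```python
def rule_metrics(support_x, support_y, support_xy):
    """Metrics of the rule X -> Y from the supports of X, Y and X-and-Y."""
    confidence = support_xy / support_x       # P(Y | X)
    lift = confidence / support_y             # P(Y | X) / P(Y)
    leverage = support_xy - support_x * support_y
    # Conviction is treated as infinite when the rule is never wrong
    conviction = (
        float("inf") if confidence == 1 else (1 - support_y) / (1 - confidence)
    )
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Supports of the sugar -> bread rule described above
m = rule_metrics(support_x=0.30, support_y=0.65, support_xy=0.20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
# confidence: 0.667, lift: 1.026, leverage: 0.005, conviction: 1.050 (one per line)
```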
 

As a result, if items X and Y are frequently bought together, several steps can be taken to increase profit. For instance:

* Cross-selling can be improved by bundling the products together.
* The shop layout can be changed so that items that are often bought together are placed near each other.
* Promotional activities, such as advertising campaigns, can be run to increase sales of the products customers tend not to buy.
* Bundle discounts can be offered when a customer buys both products.
