<a href="https://colab.research.google.com/github/karima33/python_projects/blob/main/check9.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
#Importing libraries 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# mlxtend provides the apriori and association_rules functions used below
!pip install mlxtend


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [2]:
#Reading the data set 
data = pd.read_csv("/content/Market_Basket_Optimisation.csv", sep=',')
data.shape

(7500, 20)

In [3]:
data.head()

Unnamed: 0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
0,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
1,chutney,,,,,,,,,,,,,,,,,,,
2,turkey,avocado,,,,,,,,,,,,,,,,,,
3,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,
4,low fat yogurt,,,,,,,,,,,,,,,,,,,


The `apriori` function in mlxtend expects a one-hot encoded DataFrame: one column per item, holding 1 (or True) if the transaction contains the item and 0 (or False) otherwise. The following code converts the dataset to that format.

In [4]:
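encoded_values = []
for index, row in data.iterrows():
    market_basket_labels = {}
    # set(data) is the set of column names, so only items named in the header row are encoded
    uncommons = list(set(data) - set(row))       # header items absent from this transaction
    commons = list(set(data).intersection(row))  # header items present in this transaction
    for uc in uncommons:
        market_basket_labels[uc] = 0
    for common in commons:
        market_basket_labels[common] = 1
    encoded_values.append(market_basket_labels)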

market_basket_optimisation_encoded = pd.DataFrame(encoded_values)
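
The loop above takes its item vocabulary from the column names (`set(data)`), so an item that only appears in the body of the file, such as burgers or chutney, never gets a column. As a minimal sketch of an alternative, the vocabulary can be built from every transaction instead. The helper `one_hot_encode` and the list `toy_transactions` are hypothetical names introduced for illustration, and the three baskets are the first rows of `data.head()`; this is not the notebook's method.

```python
def one_hot_encode(transactions):
    """Return (items, rows): the sorted item vocabulary and one 0/1 dict per transaction."""
    # Vocabulary: every distinct item seen in any transaction.
    items = sorted({item for basket in transactions for item in basket})
    # One dict per transaction: 1 if the item is in the basket, 0 otherwise.
    rows = [
        {item: int(item in basket) for item in items}
        for basket in map(set, transactions)
    ]
    return items, rows


# First three baskets shown by data.head() (hypothetical variable name).
toy_transactions = [
    ["burgers", "meatballs", "eggs"],
    ["chutney"],
    ["turkey", "avocado"],
]

items, rows = one_hot_encode(toy_transactions)
print(items)
print(rows[2])
```

Passing `rows` to `pd.DataFrame` would give a 0/1 table in the same layout as `market_basket_optimisation_encoded`, with one column per distinct item.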

In [5]:
market_basket_optimisation_encoded.head()

Unnamed: 0,whole weat flour,antioxydant juice,spinach,avocado,salad,mineral water,green tea,tomato juice,shrimp,olive oil,vegetables mix,salmon,energy drink,green grapes,cottage cheese,low fat yogurt,frozen smoothie,yams,honey,almonds
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0


In [6]:
item_list = market_basket_optimisation_encoded.columns
item_count = list()
for item in item_list:
    # number of transactions in which this item appears
    item_count.append(len(market_basket_optimisation_encoded[market_basket_optimisation_encoded[item]==1]))

print(item_count)

[69, 66, 52, 249, 36, 1787, 990, 227, 535, 493, 192, 318, 199, 67, 238, 573, 474, 85, 355, 152]


Let's find the 10 products that occur most often across transactions, since such dominant products may skew the results.

In [7]:
#plotting the 10 most purchased items
import plotly.express as px
item_count_df = pd.DataFrame()
item_count_df['item'] = item_list
item_count_df['count'] = item_count

px.bar(item_count_df.sort_values(by = 'count', ascending = False).iloc[0:10,], x = 'item', y = 'count')

The plot shows that mineral water is the most purchased product, followed by green tea.

# Apriori algorithm
The Apriori algorithm mines association rules between items, that is, it describes how two or more items are related to one another. In other words, Apriori is an association rule learning method that finds patterns such as "people who bought product A also bought product B".

Apriori is a frequent pattern mining algorithm: its primary objective is to derive association rules from the itemsets that occur often in the data.

The main components of the Apriori algorithm are:
1. Support
2. Confidence
3. Lift

The Apriori algorithm uses frequently occurring itemsets to generate association rules. A minimum support threshold is supplied, and only itemsets whose support meets that threshold are considered frequent. The association rules are then derived from these frequent itemsets.
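
The level-wise search described above can be sketched in plain Python. This is a simplified illustration, not the mlxtend implementation: the function `frequent_itemsets` and the data in `toy_transactions` are hypothetical, and support is taken to be the fraction of transactions that contain an itemset.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search: return {itemset: support} for itemsets meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    singles = {frozenset([item]) for b in baskets for item in b}
    current = {s: support(s) for s in singles}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy data, not taken from the dataset.
toy_transactions = [
    ["mineral water", "green tea"],
    ["mineral water", "olive oil"],
    ["mineral water", "olive oil", "salmon"],
    ["green tea"],
]

toy_frequent = frequent_itemsets(toy_transactions, min_support=0.5)
for itemset, value in sorted(toy_frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), value)
```

The prune step is what keeps the search tractable: a candidate is only counted if all of its smaller subsets were already frequent, because an itemset can never be more frequent than any of its subsets.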

* Support: the number of transactions in which a product or set of products is bought together, divided by the total number of transactions.

* Confidence: the number of transactions in which Product 1 and Product 2 are both purchased, divided by the number of transactions in which Product 1 is purchased.

* Lift: the confidence of the rule Product 1 → Product 2 divided by the support of Product 2. A lift above 1 means Product 2 is bought more often when Product 1 is in the basket than it is overall; a lift of 1 means the two purchases are independent.
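
The three measures above can be computed directly from a list of transactions. The following is a minimal sketch on made-up data; the function `rule_metrics` and the list `toy_baskets` are hypothetical and only illustrate the definitions.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    baskets = [set(t) for t in transactions]
    a, c = set(antecedent), set(consequent)
    n = len(baskets)

    n_a = sum(1 for b in baskets if a <= b)         # baskets containing the antecedent
    n_c = sum(1 for b in baskets if c <= b)         # baskets containing the consequent
    n_ac = sum(1 for b in baskets if (a | c) <= b)  # baskets containing both

    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)
    return support, confidence, lift


# Hypothetical toy data, not taken from the dataset.
toy_baskets = [
    ["olive oil", "mineral water"],
    ["olive oil", "mineral water", "salmon"],
    ["olive oil"],
    ["mineral water"],
    ["green tea"],
]

support, confidence, lift = rule_metrics(toy_baskets, ["olive oil"], ["mineral water"])
print(support, confidence, lift)
```

Here both items appear together in 2 of 5 baskets (support 0.4), olive oil appears in 3 baskets (confidence 2/3), and mineral water appears in 3 of 5 baskets, so the lift is (2/3) / (3/5), slightly above 1.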



In [34]:
# Apply the Apriori algorithm
from mlxtend.frequent_patterns import apriori, association_rules
# min_support=0.01 keeps only the items or itemsets that appear in at least 1% of the transactions.
frequent_itemsets_ap = apriori(market_basket_optimisation_encoded, min_support=0.01, use_colnames=True)


In [35]:
frequent_itemsets_ap

Unnamed: 0,support,itemsets
0,0.0332,(avocado)
1,0.238267,(mineral water)
2,0.132,(green tea)
3,0.030267,(tomato juice)
4,0.071333,(shrimp)
5,0.065733,(olive oil)
6,0.0256,(vegetables mix)
7,0.0424,(salmon)
8,0.026533,(energy drink)
9,0.031733,(cottage cheese)


In [38]:
# Import the association_rules function, which derives rules from the frequent itemsets (those with support of at least min_support).
from mlxtend.frequent_patterns import association_rules

# Keep only the rules whose confidence is at least 0.2.
market_basket_rules = association_rules(frequent_itemsets_ap, metric="confidence", min_threshold=0.2)

In [44]:
# Sort the rules by support, highest first, for easier reading
result = pd.DataFrame(market_basket_rules)
result.sort_values(by='support', inplace=True, ascending=False)
# Rules for the items that are frequently purchased together
result

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
1,(green tea),(mineral water),0.132,0.238267,0.030933,0.234343,0.983534,-0.000518,0.994876
3,(olive oil),(mineral water),0.065733,0.238267,0.027467,0.41785,1.753707,0.011805,1.308483
5,(low fat yogurt),(mineral water),0.0764,0.238267,0.023867,0.312391,1.311098,0.005663,1.1078
2,(shrimp),(mineral water),0.071333,0.238267,0.023467,0.328972,1.380688,0.00647,1.135174
6,(frozen smoothie),(mineral water),0.0632,0.238267,0.020133,0.318565,1.337012,0.005075,1.117838
4,(salmon),(mineral water),0.0424,0.238267,0.016933,0.399371,1.676152,0.006831,1.268226
7,(honey),(mineral water),0.047333,0.238267,0.014933,0.315493,1.324117,0.003655,1.11282
0,(avocado),(mineral water),0.0332,0.238267,0.011467,0.345382,1.449559,0.003556,1.163629


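We can conclude that these item pairs are the ones most commonly purchased together. Every rule has mineral water as its consequent, which reflects that it is by far the most frequent item. The green tea rule has the highest support, but its lift is just below 1, so it shows no positive association.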
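
Because the rules table reports lift alongside support, the rules can also be ranked by strength of association rather than by frequency. The sketch below copies the values from the table into a plain list (the variable names are hypothetical) and keeps only rules with lift above 1. The cutoff of 1 is an assumption based on the usual reading of lift, not a setting used in the notebook.

```python
# (antecedent, consequent, support, confidence, lift) copied from the rules table
rules = [
    ("green tea", "mineral water", 0.030933, 0.234343, 0.983534),
    ("olive oil", "mineral water", 0.027467, 0.417850, 1.753707),
    ("low fat yogurt", "mineral water", 0.023867, 0.312391, 1.311098),
    ("shrimp", "mineral water", 0.023467, 0.328972, 1.380688),
    ("frozen smoothie", "mineral water", 0.020133, 0.318565, 1.337012),
    ("salmon", "mineral water", 0.016933, 0.399371, 1.676152),
    ("honey", "mineral water", 0.014933, 0.315493, 1.324117),
    ("avocado", "mineral water", 0.011467, 0.345382, 1.449559),
]

# Keep rules with lift above 1 (assumed cutoff) and rank them by lift, strongest first.
positive_rules = [rule for rule in rules if rule[4] > 1]
ranked_rules = sorted(positive_rules, key=lambda rule: rule[4], reverse=True)

for antecedent, consequent, support, confidence, lift in ranked_rules:
    print(f"{antecedent} -> {consequent}: lift={lift:.3f}, confidence={confidence:.3f}")
```

With the DataFrame from the notebook, the equivalent filter would be `result[result['lift'] > 1].sort_values(by='lift', ascending=False)`.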