# Association Rule - Market Basket Analysis
- Affinity analysis is a data analysis and data mining technique that discovers co-occurrence relationships among different items.
- Market basket analysis is a technique that identifies the strength of association between pairs of products that are purchased together.
<br><br>
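- Association rules are usually filtered by setting a minimum threshold on a quality metric; the two main options are "confidence" and "lift". Both build on the **support** of an itemset, i.e. the fraction of all baskets that contain it.
- **Confidence** of a rule $A \rightarrow B$ is the proportion of baskets containing the antecedent itemset $A$ that also contain the consequent item $B$: $\text{conf}(A \rightarrow B) = \frac{\text{supp}(A \cup B)}{\text{supp}(A)}$.

- **Lift** measures how much the presence of the antecedent $A$ raises the likelihood of the consequent $B$ compared with how often $B$ is bought anyway: $\text{lift}(A \rightarrow B) = \frac{\text{conf}(A \rightarrow B)}{\text{supp}(B)}$. A lift above 1 indicates a positive association, exactly 1 indicates independence, and below 1 a negative association.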
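A small worked example can make these measures concrete. The sketch below computes support, confidence and lift by hand on five made-up baskets (the item names and baskets are hypothetical, not taken from the groceries dataset), using exact fractions so the arithmetic is easy to check.

```python
from fractions import Fraction

# Hypothetical toy baskets; each basket is the set of items in one transaction
baskets = [
    {"milk", "bread"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "bread"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    hits = sum(1 for b in baskets if itemset <= b)
    return Fraction(hits, len(baskets))

def confidence(antecedent, consequent, baskets):
    """Share of baskets containing the antecedent that also contain the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence divided by the consequent's baseline support (> 1 means positive association)."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

a, c = {"milk"}, {"butter"}
print("support   :", support(a | c, baskets))    # 2/5
print("confidence:", confidence(a, c, baskets))  # 1/2
print("lift      :", lift(a, c, baskets))        # 5/4
```

In this toy data, milk → bread has a higher confidence (3/4) than milk → butter (1/2), yet its lift is only 15/16 because bread appears in most baskets anyway. This is why lift is often used alongside confidence as a threshold.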

<div class="alert alert - block alert-info">
<h1> Table of Contents </h1></div> <a class ="anchor" id = "0.1"></a>

1. [Importing Libraries](#1)
2. [Loading Dataset](#2)
3. [Data Exploration](#3)
4. [Feature Engineering](#4)
5. [Apriori Algorithm](#5)

<div class="alert alert - block alert-info">
<h1>1. Importing Libraries </h1></div> <a class ="anchor" id = "1"></a>

[Back to Table of Contents](#0.1)

In [None]:
# This Python 3 environment is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
import os

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import plotly.express as px

# apyori is not always preinstalled; install it on first run (IPython "!" shell escape)
try:
    import apyori
except ImportError:
    !pip install apyori

from apyori import apriori

# List every file under the read-only input directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

<div class ="alert alert - block alert-info">
    <h1> 2. Loading Dataset</h1></div><a class ="anchor" id = "2"></a>

[Back to Table of Contents](#0.1)

In [None]:
data = pd.read_csv("/kaggle/input/groceries-dataset/Groceries_dataset.csv")
print("Data Dimension:", data.shape)
data.head()

In [None]:
data.isnull().any()

In [None]:
print("Total number of unique products:", data['itemDescription'].nunique())

<div class ="alert alert - block alert-info">
    <h1> 3. Data Exploration</h1></div><a class="anchor" id ="3"></a>
    
[Back to Table of Contents](#0.1)

In [None]:
# Top 10 most frequently sold products
print("Top 10 most frequently sold products (tabular representation)")
x = data['itemDescription'].value_counts().head(10)  # value_counts() already sorts in descending order
x

In [None]:
fig = px.bar(x= x.index, y= x.values)
fig.update_layout(title_text="Top 10 most frequently sold products (graphical representation)", xaxis_title="Products", yaxis_title="Count")
fig.show()

In [None]:
# Extract the year and month-year from the 'Date' strings (format dd-mm-yyyy) to explore sales over time
date_parts = data['Date'].str.split("-")
data["Year"] = date_parts.str[-1]
data["Month-Year"] = date_parts.str[1] + "-" + date_parts.str[-1]
data.head()

In [None]:
# Count items sold per month-year (bars are ordered by count, not by calendar)
monthly_sales = data["Month-Year"].value_counts(ascending=False)

fig1 = px.bar(x=monthly_sales.index,
              y=monthly_sales.values,
              color=monthly_sales.values,
              labels={'x': 'Month-Year', 'y': 'Count', 'color': 'Count'})

fig1.update_layout(title_text="Number of items sold per month")

fig1.show()
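Because `value_counts()` orders the bars by count, the Month-Year axis of the chart above is not in calendar order. A minimal sketch of a chronological alternative, assuming the `Date` strings are in dd-mm-yyyy form (the same assumption the split on "-" relies on), parses them with `pd.to_datetime` and buckets them by month period. The sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows in the same dd-mm-yyyy format as the 'Date' column
sample = pd.DataFrame({
    "Date": ["21-07-2015", "05-01-2015", "30-01-2015", "15-08-2014", "02-07-2015"],
})

# Parse the strings into real datetimes (day first), then count rows per calendar month
dates = pd.to_datetime(sample["Date"], format="%d-%m-%Y")
monthly = dates.dt.to_period("M").value_counts().sort_index()  # chronological order
print(monthly)
```

On the real data, the same two lines applied to `data["Date"]` could replace `data["Month-Year"].value_counts()`; converting the index with `monthly.index.astype(str)` before plotting keeps the axis labels simple.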

<div class ="alert alert - block alert-warning">
    <h4>Observations </h4></div>
- Milk is the most frequently purchased product, followed by vegetables.<br>
- Purchases peak in August/September, while February/March show the lowest demand.

<div class ="alert alert - block alert-info">
    <h1> 4. Feature Engineering</h1></div><a class="anchor" id ="4"></a>
    
[Back to Table of Contents](#0.1)

In [None]:
products = data['itemDescription'].unique()

In [None]:
# One-hot encode the products: one 0/1 column per product

dummy = pd.get_dummies(data['itemDescription'])
data = data.drop(columns=['itemDescription']).join(dummy)

data.head()

In [None]:
# Transaction: all products a customer bought on the same day count as one transaction

data1 = data.groupby(['Member_number', 'Date'])[list(products)].sum()
data1 = data1.reset_index()[list(products)]

print("New dimension:", data1.shape)
data1.head()

In [None]:
# Replace every non-zero count with the name of the product.
# Cast to object first so each column can hold both 0 and the product name.

data1 = data1.astype(object)
for product in products:
    data1.loc[data1[product] > 0, product] = product

data1.head()

In [None]:
print("Total Number of Transactions:", len(data1))

In [None]:
# Remove the zeros so each transaction becomes the list of product names bought together

transactions = [row[row != 0].tolist() for row in data1.values]
transactions = [t for t in transactions if t]  # discard empty transactions, if any
transactions[0:10]
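The one-hot, group-by and rename steps above produce one list of distinct products per (customer, day). A shorter route to the same kind of list, sketched here as an alternative rather than the notebook's method, groups the raw rows directly. The rows below are hypothetical and only mimic the `Member_number` / `Date` / `itemDescription` columns.

```python
import pandas as pd

# Hypothetical rows mimicking the Member_number / Date / itemDescription columns
raw = pd.DataFrame({
    "Member_number": [1001, 1001, 1001, 1002, 1001],
    "Date": ["01-01-2015", "01-01-2015", "01-01-2015", "01-01-2015", "02-01-2015"],
    "itemDescription": ["whole milk", "yogurt", "whole milk", "soda", "rolls/buns"],
})

# One transaction = one customer on one day; unique() drops an item bought twice that day,
# mirroring the "count > 0" logic of the one-hot approach
transactions_alt = (
    raw.groupby(["Member_number", "Date"])["itemDescription"]
    .unique()
    .apply(list)
    .tolist()
)
print(transactions_alt)  # [['whole milk', 'yogurt'], ['rolls/buns'], ['soda']]
```

This avoids building a wide product matrix, which can matter when there are many distinct products; the one-hot matrix is still useful if a basket-by-item table is needed for other tools.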

<div class ="alert alert - block alert-info">
    <h1>5. Apriori Algorithm</h1></div><a class="anchor" id ="5"></a>
    
[Back to Table of Contents](#0.1)

In [None]:
# Mine 2-item rules (max_length=2) with support >= 0.03%, confidence >= 5% and lift >= 3
rules = apriori(transactions, min_support=0.0003, min_confidence=0.05, min_lift=3, max_length=2)
association_results = list(rules)
print(association_results[0])

In [None]:
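for record in association_results:
    # Each RelationRecord holds the itemset, its support and a list of OrderedStatistic rules;
    # use the first rule so the printed direction (base -> add) matches its confidence and lift
    stat = record.ordered_statistics[0]

    print("Rule : ", ", ".join(stat.items_base), " -> ", ", ".join(stat.items_add))
    print("Support : ", str(record.support))
    print("Confidence : ", str(stat.confidence))
    print("Lift : ", str(stat.lift))

    print("=============================")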
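To show what the `min_support`, `min_confidence`, `min_lift` and `max_length=2` thresholds mean, here is a minimal pure-Python sketch of Apriori restricted to pair rules. It is not apyori's implementation (apyori handles itemsets of any length); the function name `pair_rules` and the toy baskets are hypothetical. The key Apriori idea is the pruning in pass 2: a pair can only be frequent if both of its items are frequent, so infrequent items are dropped before pairs are counted.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support, min_confidence, min_lift):
    """Sketch of Apriori limited to 2-item rules (similar in spirit to max_length=2)."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]

    # Pass 1: support of single items; keep only the frequent ones
    item_counts = Counter(item for basket in baskets for item in basket)
    frequent = {i for i, c in item_counts.items() if c / n >= min_support}

    # Pass 2: count only pairs whose members are both frequent (Apriori pruning)
    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket & frequent), 2):
            pair_counts[pair] += 1

    rules = []
    for (x, y), count in pair_counts.items():
        sup = count / n
        if sup < min_support:
            continue
        # Evaluate both directions: x -> y and y -> x
        for ante, cons in ((x, y), (y, x)):
            conf = count / item_counts[ante]
            lift = conf / (item_counts[cons] / n)
            if conf >= min_confidence and lift >= min_lift:
                rules.append((ante, cons, sup, conf, lift))
    return rules

# Hypothetical toy transactions
toy = [
    ["milk", "bread"],
    ["milk", "butter"],
    ["milk", "bread", "butter"],
    ["bread"],
    ["milk", "bread"],
]
for rule in pair_rules(toy, min_support=0.3, min_confidence=0.5, min_lift=1.1):
    print(rule)
```

With these thresholds, only butter → milk and milk → butter survive (lift 1.25); milk/bread is frequent but is rejected because its lift is below 1.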