**Market Basket Analysis** is one of the key techniques used by large retailers to uncover associations between items. It allows retailers to identify relationships between items that people frequently buy together.

Given a set of transactions, we can find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.

**Association Rule** – An implication expression of the form X -> Y, where X and Y are two disjoint itemsets.
**Rule Evaluation Metrics** –

**Support(s)** –
The number of transactions that include all items in both the {X} and {Y} parts of the rule, expressed as a fraction of the total number of transactions. It measures how frequently the items occur together across all transactions.
$\mathrm{Supp}(X \cup Y) = \dfrac{\sigma(X \cup Y)}{N}$, where $\sigma(X \cup Y)$ is the number of transactions containing every item in X and Y, and $N$ is the total number of transactions.
It is interpreted as the fraction of transactions that contain both X and Y.
**Confidence(c)** –
It is the ratio of the number of transactions that include all items in both {X} and {Y} to the number of transactions that include all items in {X}.
$\mathrm{Conf}(X \Rightarrow Y) = \dfrac{\mathrm{Supp}(X \cup Y)}{\mathrm{Supp}(X)}$
It measures how often the items in Y appear in transactions that also contain the items in X.
**Lift(l)** –
The lift of the rule X=>Y is the confidence of the rule divided by the expected confidence, assuming that the itemsets X and Y are independent of each other. Under independence, the expected confidence is simply the support of {Y}, that is, the fraction of all transactions that contain Y.
$\mathrm{Lift}(X \Rightarrow Y) = \dfrac{\mathrm{Conf}(X \Rightarrow Y)}{\mathrm{Supp}(Y)}$
A lift value near 1 indicates that X and Y appear together about as often as expected under independence; a value greater than 1 means they appear together more often than expected, and a value less than 1 means they appear together less often than expected. Larger lift values indicate a stronger association.
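
To make the three metrics concrete, here is a minimal standard-library sketch that computes them for a single rule. The transactions and the helper names `support`, `confidence`, and `lift` are made up for illustration; they are not part of the retail dataset or of mlxtend.

```python
# Toy transactions (hypothetical data, not taken from the retail dataset)
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Cheese"},
    {"Bread", "Milk", "Cheese"},
    {"Milk"},
    {"Bread", "Milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Supp(X u Y) / Supp(X): how often Y appears when X is present."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """Conf(X => Y) / Supp(Y): confidence relative to independence."""
    return confidence(x, y, transactions) / support(y, transactions)


x, y = {"Bread"}, {"Milk"}
print("support   :", round(support(x | y, transactions), 4))
print("confidence:", round(confidence(x, y, transactions), 4))
print("lift      :", round(lift(x, y, transactions), 4))
```

In this toy data, Bread and Milk each appear in 4 of 5 transactions and together in 3 of 5, so the lift of Bread => Milk comes out slightly below 1: the two items co-occur a little less often than independence would predict.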

Source: https://www.geeksforgeeks.org/association-rule/

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


In [None]:
## Read the dataset
df = pd.read_csv('/kaggle/input/online-retail-data-set-from-ml-repository/retail_dataset.csv', sep=',')
df.head()

In [None]:
df.info()

In [None]:
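## Build a list of transactions, one per row of the dataset
# Use every row (the original range(1, 315) skipped row 0) and drop any
# missing values so that NaN is not encoded as an item named 'nan'.
records = [[str(item) for item in row if pd.notna(item)] for row in df.values]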

In [None]:
records

In [None]:
## One-hot encode the transactions into a boolean DataFrame

te = TransactionEncoder()
te_ary = te.fit(records).transform(records)
df1 = pd.DataFrame(te_ary, columns = te.columns_)
df1.head()
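
As a rough illustration of what the encoding step produces, the following standard-library sketch builds the same kind of boolean table by hand. The `records_demo` list and its item names are invented for the example; this only mirrors the idea behind `TransactionEncoder`, which should still be used for real work.

```python
# Hypothetical mini-dataset: each inner list is one transaction
records_demo = [
    ["Bread", "Milk"],
    ["Milk", "Cheese"],
    ["Bread"],
]

# One column per distinct item, in sorted order
columns = sorted({item for record in records_demo for item in record})

# One row per transaction: True where the transaction contains the item
encoded = [[item in record for item in columns] for record in records_demo]

print(columns)
for row in encoded:
    print(row)
```

Each row has one True/False flag per item, which is the layout that `apriori` expects as input.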

In [None]:
## Find frequent itemsets (support >= 0.2) using the Apriori algorithm

frequent_itemsets = apriori(df1, min_support=0.2, use_colnames=True)
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(len)
frequent_itemsets
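
The effect of a minimum support threshold can be sketched without mlxtend. The brute-force search below is not the implementation behind `apriori`; it simply enumerates candidate itemsets by size and keeps those whose support reaches the threshold, stopping once a size yields no frequent itemset (the property that Apriori exploits). The function name `frequent_itemsets_bruteforce` and the toy data are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets_bruteforce(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, k):
            candidate_set = frozenset(candidate)
            supp = sum(candidate_set <= t for t in transactions) / n
            if supp >= min_support:
                result[candidate_set] = supp
                found_any = True
        if not found_any:
            # No frequent itemset of size k means none of size k + 1 either
            break
    return result


# Hypothetical transactions, not taken from the retail dataset
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Cheese"},
    {"Bread", "Milk", "Cheese"},
    {"Milk"},
    {"Bread", "Milk"},
]

frequent = frequent_itemsets_bruteforce(transactions, min_support=0.6)
for itemset, supp in frequent.items():
    print(sorted(itemset), round(supp, 2))
```

Enumerating every combination grows exponentially with the number of items, so this is only practical for tiny examples; the library's `apriori` prunes candidates level by level instead.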

In [None]:
## Generate association rules with a confidence of at least 0.5

association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)

In [None]:
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.0)
rules

In [None]:
rules["antecedent_len"] = rules["antecedents"].apply(lambda x: len(x))
rules

In [None]:
## Keep rules with at least two antecedent items, confidence > 0.5, and lift > 1.0

rules[ (rules['antecedent_len'] >= 2) &
       (rules['confidence'] > 0.5) &
       (rules['lift'] > 1.0) ]

In [None]:
## Plot confidence against lift, grouping points by antecedent length

import matplotlib.pyplot as plt
from mlxtend.plotting import category_scatter

fig = category_scatter(x="lift", y="confidence", label_col="antecedent_len",
                       data=rules, legend_loc="lower right")
plt.show()