## Experiment 3
Apply the Apriori algorithm to find frequently occurring itemsets in the given data, and generate strong association rules using support and confidence thresholds. Example: market basket analysis.

Performed on kaggle.com
Dataset available on https://www.kaggle.com/mashlyn/online-retail-ii-uci
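
For reference, the support of an itemset is the fraction of transactions that contain it, and the confidence of a rule A -> B is support(A and B together) divided by support(A). A minimal sketch on made-up baskets follows; the item names and the `support` and `confidence` helpers are illustrative assumptions, not part of the experiment:

```python
# Toy transactions (hypothetical data, for illustration only)
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of the combined itemset divided by support of the antecedent."""
    combined = set(antecedent) | set(consequent)
    return support(combined, transactions) / support(antecedent, transactions)

# 2 of the 4 baskets contain both bread and milk
print(support({"bread", "milk"}, baskets))
# 2 of the 3 baskets with bread also contain milk
print(confidence({"bread"}, {"milk"}, baskets))
```

A rule is usually called "strong" when both values clear the chosen thresholds.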

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [2]:
df = pd.read_csv('/kaggle/input/online-retail-ii-uci/online_retail_II.csv').dropna()
df.head()

In [3]:
items = df[['StockCode','Description']].drop_duplicates()
items.head()

In [4]:
df.info()

# **Creating Dataset**
We group items that share an invoice number (i.e. items that were purchased together) into lists.

In [5]:
dataset = df.groupby(['Invoice'])['StockCode'].apply(list)
dataset.head()

# Reference: https://stackoverflow.com/questions/53037888/pandas-groupby-to-list
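
To make the grouping step concrete, here is a small sketch on a hypothetical frame. The column names mirror the ones used above, but the invoice numbers and stock codes are invented:

```python
import pandas as pd

# Hypothetical invoice lines; one row per purchased item
toy = pd.DataFrame({
    "Invoice":   ["A1", "A1", "A2", "A2", "A2"],
    "StockCode": ["111", "222", "111", "333", "444"],
})

# Collect the stock codes of each invoice into one list per invoice
toy_baskets = toy.groupby("Invoice")["StockCode"].apply(list)
print(toy_baskets.tolist())
```

Each resulting list is one transaction (basket) for the algorithm.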

In [6]:
# convert the Series of lists to a plain Python list
dataset = dataset.tolist()[:5000]  # keep only the first 5000 invoices for this experiment
print(dataset[:5])

# Apriori Implementation

In [7]:
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori 
from mlxtend.frequent_patterns import association_rules

In [8]:
# encoding transactions
te = TransactionEncoder()
encoded_dataset = te.fit(dataset).transform(dataset)
transaction_df = pd.DataFrame(encoded_dataset,columns=te.columns_)
transaction_df.head()
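
The encoded frame has one boolean column per distinct item and one row per transaction. A rough pure-Python picture of that matrix, using invented stock codes (this is a sketch of the idea, not the `TransactionEncoder` implementation):

```python
# Toy transactions (hypothetical stock codes)
toy_transactions = [["111", "222"], ["111", "333"], ["222"]]

# Every distinct item, sorted; these play the role of the column names
toy_columns = sorted({item for t in toy_transactions for item in t})

# One boolean row per transaction: True where the item is present
toy_rows = [[item in t for item in toy_columns] for t in toy_transactions]

print(toy_columns)
for row in toy_rows:
    print(row)
```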

In [9]:
# obtain frequent itemsets
frequent_itemsets = apriori(transaction_df, min_support=0.005, use_colnames=True)  # low min_support because the dataset is large
frequent_itemsets.count()

In [10]:
frequent_itemsets
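
The level-wise search behind this step can be sketched in plain Python. The version below is a simplified illustration under the usual textbook description of Apriori (join, prune, count); it is not the mlxtend implementation, and `apriori_sketch` and the toy data are hypothetical:

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items
    all_items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in all_items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep candidates that meet the support threshold
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

toy_data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
result = apriori_sketch(toy_data, min_support=0.6)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With this toy data all singletons and pairs are frequent, while the triple `{a, b, c}` appears in only 2 of 5 transactions and is dropped.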

In [11]:
res = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.005)  # keeps rules with confidence >= 0.005
res
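
Conceptually, rules are produced by splitting each frequent itemset into an antecedent and a consequent and scoring the split. The following is a minimal sketch, assuming a dictionary of itemset supports that already contains every subset (as Apriori output does); `rules_sketch` and the support values are made up for illustration:

```python
from itertools import combinations

# Hypothetical itemset supports, e.g. from a frequent-itemset step
toy_supports = {
    frozenset({"a"}): 0.8,
    frozenset({"b"}): 0.8,
    frozenset({"a", "b"}): 0.6,
}

def rules_sketch(supports, min_confidence):
    """Return (antecedent, consequent, confidence, lift) for qualifying rules."""
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        # Every non-empty proper subset can serve as the antecedent
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                consequent = itemset - antecedent
                conf = sup / supports[antecedent]
                lift = conf / supports[consequent]
                if conf >= min_confidence:
                    rules.append((antecedent, consequent, conf, lift))
    return rules

for antecedent, consequent, conf, lift in rules_sketch(toy_supports, min_confidence=0.7):
    print(sorted(antecedent), "->", sorted(consequent), round(conf, 3), round(lift, 3))
```

A lift below 1, as in this toy case, suggests the two sides occur together less often than they would if they were independent.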

In [12]:
#function to obtain item description for given stock code
def re_label(codes):
    labels = []
    for code in codes:
        labels.append(items[items['StockCode']==code]['Description'].iloc[0])
    return tuple(labels)
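
`re_label` filters the whole `items` frame once per code. If that becomes slow, one possible alternative is to build a dictionary once and look codes up in it. The sketch below uses a hypothetical stand-in frame (`toy_items`) and keeps the first description seen per stock code, matching the `.iloc[0]` choice above:

```python
import pandas as pd

# Hypothetical stand-in for the `items` frame built earlier
toy_items = pd.DataFrame({
    "StockCode":   ["111", "111", "222"],
    "Description": ["RED MUG", "RED MUG LARGE", "BLUE PLATE"],
})

# Keep the first description per stock code, then index by code
code_to_description = (
    toy_items.drop_duplicates("StockCode")
             .set_index("StockCode")["Description"]
             .to_dict()
)

def re_label_fast(codes):
    """Map stock codes to descriptions with dictionary lookups."""
    return tuple(code_to_description[code] for code in codes)

print(re_label_fast(["222", "111"]))
```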

In [13]:
res2 = res[res['confidence'] >= 1]  # rules whose consequent appears in every transaction containing the antecedent
res2 = res2.reset_index()
res2

In [14]:
#sample row index 43354
print('antecedents:')
print(re_label(list(res2.iloc[43354]['antecedents'])))
print('consequents:')
print(re_label(list(res2.iloc[43354]['consequents'])))

In [15]:
#sample row index 0
print('antecedents:')
print(re_label(list(res2.iloc[0]['antecedents'])))
print('consequents:')
print(re_label(list(res2.iloc[0]['consequents'])))