**APMA 940: Mathematics of Data Science**  
Project: *Association Rule Learning with Applications to Market Basket Analysis*  
Author: Javier Almonacid  
Date: April 12, 2023  

---  


# MBA Example 1: Supermarket Data Set

We first run a more academic example, in which the data set is small and already formatted as a binary matrix. The data set, which comes from point-of-sale transactions in a small supermarket, contains 1361 transactions (rows) and 255 items (columns).

>Data Set Source: M. Barsky. CSCI 485: Data Mining 2012, Lab 7. Department of Computer Science, Vancouver Island University (2012). Available at [http://csci.viu.ca/~barskym/teaching/DM2012/labs/LAB7/PartII.html](http://csci.viu.ca/~barskym/teaching/DM2012/labs/LAB7/PartII.html). Accessed on April 12, 2023.
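To make "formatted as a binary matrix" concrete: each row is a transaction, each column is an item, and an entry records whether that item was bought. The minimal sketch below builds such a matrix from a hypothetical list of baskets with plain pandas. The baskets are invented for illustration and are not taken from the VIU file; mlxtend's `TransactionEncoder` is another common way to do this step.

```python
import pandas as pd

# Hypothetical toy transactions (not taken from the supermarket data set).
transactions = [
    ["Eggs", "2pct. Milk", "White Bread"],
    ["Eggs", "Potato Chips"],
    ["2pct. Milk"],
    ["White Bread", "2pct. Milk", "Eggs"],
]

# One column per distinct item, sorted so the column order is reproducible.
items = sorted({item for basket in transactions for item in basket})

# Entry (i, j) is True when transaction i contains item j.
encoded = pd.DataFrame(
    [[item in basket for item in items] for basket in transactions],
    columns=items,
)
print(encoded)
```

Each row of `encoded` plays the same role as a row of `marketbasket_viu.csv`, except that the file stores 0/1 rather than False/True.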

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

pd.set_option('display.max_columns', 75)

# Each row is a transaction and each column an item (0 = not bought, 1 = bought).
basket_encoded = pd.read_csv("marketbasket_viu.csv")
print("Number of rows: "+str(len(basket_encoded.index)))
```


Number of rows: 1361


## Preprocessing

Only two preprocessing tasks remain. First, because an association rule involves at least two items (at least one in the antecedent and one in the consequent), we remove all single-item transactions from the data set. Then, we convert the numeric entries (0 and 1) into boolean ones (False and True, respectively).
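A toy run of both steps on a hypothetical five-transaction matrix shows which rows survive the filter. It also makes one side effect visible: supports are computed relative to the transactions that remain (611 here instead of 1361), so an itemset with two or more items can only gain support from the filtering, since its count is unchanged while the denominator shrinks. The `support` helper is an illustrative function, not part of mlxtend.

```python
import pandas as pd

# Hypothetical 0/1 matrix: 5 transactions (rows) x 3 items (columns).
toy = pd.DataFrame({
    "Eggs": [1, 1, 0, 1, 0],
    "2pct. Milk": [1, 0, 1, 1, 0],
    "White Bread": [0, 0, 1, 1, 1],
})

# Step 1: keep only rows in which at least two items were purchased.
toy_filtered = toy[(toy > 0).sum(axis=1) >= 2]

# Step 2: 0 -> False, 1 -> True (astype(bool) maps every nonzero value to True).
toy_filtered = toy_filtered.astype(bool)


def support(df, itemset):
    """Fraction of rows (transactions) that contain every item in `itemset`."""
    return df[list(itemset)].astype(bool).all(axis=1).mean()


print(toy_filtered.index.tolist())                     # [0, 2, 3]
print(support(toy, ["Eggs", "2pct. Milk"]))            # 2 of 5 transactions
print(support(toy_filtered, ["Eggs", "2pct. Milk"]))   # 2 of 3 transactions
```

Rows 0, 2 and 3 survive, and the support of {Eggs, 2pct. Milk} rises from 2/5 to 2/3 because its count stays at 2 while the number of transactions drops from 5 to 3.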

```python
# Keep only those rows where more than one item was purchased.
basket_filtered = basket_encoded[(basket_encoded > 0).sum(axis=1) >= 2]

# Convert numeric entries into boolean type (0 -> False, nonzero -> True).
# astype(bool) replaces applymap(bool), which is deprecated in recent pandas.
basket_filtered = basket_filtered.astype(bool)

print("Number of rows after cleaning: "+str(len(basket_filtered.index)))
basket_filtered
```

Number of rows after cleaning: 611


Unnamed: 0,Hair Conditioner,Lemons,Standard coffee,Frozen Chicken Wings,98pct. Fat Free Hamburger,Sugar Cookies,Onions,Deli Ham,Dishwasher Detergent,Beets,40 Watt Lightbulb,Ice Cream,Cottage Cheese,Plain English Muffins,Strawberry Soda,Vanilla Ice Cream,Potato Chips,Strawberry Yogurt,Diet Soda,D Cell Batteries,Paper Towels,Mint Chocolate Bar,Salsa Dip,Buttered Popcorn,Cheese Crackers,Chocolate Bar,Rice Soup,Mouthwash,Sugar,Cheese Flavored Chips,Sweat Potatoes,Deodorant,Waffles,Decaf Coffee,Smoked Turkey Sliced,Screw Driver,Sesame Oil,...,Mixed Nuts,Chicken TV Dinner,Tissues,Garlic,Dried Fruit Mix,Cole Slaw,Donuts,Sliced Turkey,Sliced Chicken,Broccoli,Ranch Dip,Sponge,Frozen Corn,Paper Cups,Wheat Bread,Oven Cleaner,Tomato Sauce,Plastic Forks,Popcorn,Creamy Peanut Butter,Sweet Relish,Plain Muffins,Cheese Dip,Colby Cheese,Chicken Noodle Soup,Fingernail Clippers,Corned Beef,Lollipops,Plain White Bread,Blueberry Yogurt,Frozen Chicken Thighs,Mixed Vegetables,Souring Pads,Tuna Spread,Toilet Paper,White Wine,Columbian Coffee
1,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
7,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False
10,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
11,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
12,False,False,False,False,False,True,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1350,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False
1356,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False
1358,False,False,False,True,True,True,True,False,False,False,True,False,False,False,False,False,True,False,False,False,False,True,False,True,True,False,False,True,False,False,False,False,False,False,False,True,False,...,False,False,False,True,True,False,False,False,True,False,False,True,False,False,True,True,True,False,False,False,False,False,False,False,True,False,False,True,False,False,False,False,False,False,False,False,False
1359,True,False,False,False,True,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,True,False,False,True,False,...,False,False,False,False,False,True,True,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,True,True,False,False,False,False,False,False,True,False,True,False,True,False,False


## Frequent Itemset Mining

The data are now in the appropriate format. First, we use the Apriori algorithm to generate the frequent itemsets with a support of at least 0.04.
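Apriori relies on the downward-closure property: every subset of a frequent itemset is itself frequent. It therefore searches level by level, building candidate k-itemsets only from frequent (k-1)-itemsets and discarding any candidate that has an infrequent subset before its support is counted. The minimal pure-Python sketch below follows that scheme on hypothetical toy baskets; `apriori_sketch` is an illustrative name, not mlxtend's implementation, which works on the boolean DataFrame directly. For scale, with 611 transactions a threshold of 0.04 means an itemset must appear in at least 25 transactions (0.04 × 611 = 24.44), and 25/611 ≈ 0.040917 is exactly the smallest support in the output below.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Level-wise Apriori search; returns {frozenset(itemset): support}."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of `itemset`.
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items.
    items = {item for basket in baskets for item in basket}
    current = {s for s in (frozenset([i]) for i in items) if support(s) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join: unions of two frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune (Apriori property): drop candidates with an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count support only for the candidates that survived pruning.
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({s: support(s) for s in current})
        k += 1
    return frequent


# Hypothetical toy baskets.
baskets = [
    ["Eggs", "2pct. Milk", "White Bread"],
    ["Eggs", "2pct. Milk"],
    ["Eggs", "Potato Chips"],
    ["2pct. Milk", "White Bread"],
]
result = apriori_sketch(baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a threshold of 0.5 (two of four baskets), {Eggs, White Bread} is infrequent, so the only 3-item candidate {Eggs, 2pct. Milk, White Bread} is pruned without its support ever being counted.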

```python
# Generate the frequent itemsets and sort them by decreasing support.
frequent_itemsets = (
    apriori(basket_filtered, min_support=0.04, use_colnames=True)
    .sort_values("support", ascending=False)
    .reset_index(drop=True)
)
print("Frequent itemsets found: "+str(len(frequent_itemsets.index))+"\n")
frequent_itemsets
```

Frequent itemsets found: 1647



|  | support | itemsets |
|---|---|---|
| 0 | 0.252046 | ( Eggs) |
| 1 | 0.243863 | ( White Bread) |
| 2 | 0.219313 | ( 2pct. Milk) |
| 3 | 0.201309 | ( Potato Chips) |
| 4 | 0.201309 | ( 98pct. Fat Free Hamburger) |
| ... | ... | ... |
| 1642 | 0.040917 | ( White Bread, Jelly Filled Donuts) |
| 1643 | 0.040917 | ( French Fries, Canned Tuna) |
| 1644 | 0.040917 | ( White Bread, Cola, Pancake Mix) |
| 1645 | 0.040917 | ( Domestic Beer, Canned Tuna) |


## Creating the Association Rules

Next, we generate association rules with a confidence of at least 0.9 and sort them by lift in descending order.
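A rule is read off a frequent itemset I by splitting it into a nonempty antecedent A and consequent C = I − A (the items of I not in A). Its confidence is support(I)/support(A) and its lift is confidence/support(C). The sketch below enumerates every such split in pure Python, keeps the rules whose confidence reaches the threshold, and sorts them by lift, mirroring the call in the next cell. It is a brute-force illustration on an assumed input (a hypothetical dictionary mapping frozensets to supports), not mlxtend's code. Like mlxtend's `association_rules`, it keeps a rule whose metric equals the threshold exactly, which is why rules with a confidence of exactly 0.9 appear in the output.

```python
from itertools import combinations


def rules_sketch(supports, min_confidence):
    """Return (antecedent, consequent, confidence, lift) tuples, highest lift first.

    `supports` maps each frequent itemset (a frozenset) to its support.
    """
    rules = []
    for itemset, s_ac in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                # Subsets of a frequent itemset are frequent, so both are keys.
                s_a, s_c = supports[antecedent], supports[consequent]
                confidence = s_ac / s_a
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, confidence, confidence / s_c))
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


# Hypothetical supports of frequent itemsets from four toy baskets.
supports = {
    frozenset({"Eggs"}): 0.75,
    frozenset({"2pct. Milk"}): 0.75,
    frozenset({"White Bread"}): 0.5,
    frozenset({"Eggs", "2pct. Milk"}): 0.5,
    frozenset({"2pct. Milk", "White Bread"}): 0.5,
}
rules = rules_sketch(supports, min_confidence=0.9)
for a, c, conf, lift in rules:
    print(sorted(a), "->", sorted(c), round(conf, 3), round(lift, 3))
# prints: ['White Bread'] -> ['2pct. Milk'] 1.0 1.333
```

Only White Bread → 2pct. Milk survives the 0.9 threshold; lowering it to 0.6 would also admit the three rules with confidence 2/3.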

```python
# Derive rules with confidence >= 0.9 and sort them by decreasing lift.
assoc_rules = (
    association_rules(frequent_itemsets, metric="confidence", min_threshold=0.9)
    .sort_values("lift", ascending=False)
    .reset_index(drop=True)
)
print("Rules found: "+str(len(assoc_rules.index)))
assoc_rules
```

Rules found: 20


|  | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ( White Bread, Wheat Bread, Bananas) | ( 2pct. Milk) | 0.045827 | 0.219313 | 0.04419 | 0.964286 | 4.396855 | 0.03414 | 21.859247 | 0.809669 |
| 1 | ( Potato Chips, Apples, Eggs) | ( 2pct. Milk) | 0.04419 | 0.219313 | 0.042553 | 0.962963 | 4.390824 | 0.032862 | 21.07856 | 0.807956 |
| 2 | ( Pepperoni Pizza - Frozen, Eggs, Toothpaste) | ( 2pct. Milk) | 0.042553 | 0.219313 | 0.040917 | 0.961538 | 4.384328 | 0.031584 | 20.297872 | 0.806222 |
| 3 | ( White Bread, Potato Chips, Eggs, Toothpaste) | ( 2pct. Milk) | 0.047463 | 0.219313 | 0.04419 | 0.931034 | 4.245239 | 0.033781 | 11.319967 | 0.802533 |
| 4 | ( White Bread, Eggs, Toothpaste, Wheat Bread) | ( 2pct. Milk) | 0.04419 | 0.219313 | 0.040917 | 0.925926 | 4.221946 | 0.031225 | 10.53928 | 0.798425 |
| 5 | ( Cola, Popcorn Salt, Eggs) | ( 2pct. Milk) | 0.04419 | 0.219313 | 0.040917 | 0.925926 | 4.221946 | 0.031225 | 10.53928 | 0.798425 |
| 6 | ( Eggs, Popcorn Salt, Toothpaste) | ( 2pct. Milk) | 0.0491 | 0.219313 | 0.04419 | 0.9 | 4.103731 | 0.033422 | 7.806874 | 0.795372 |
| 7 | ( Potato Chips, Toothpaste, Wheat Bread) | ( 2pct. Milk) | 0.0491 | 0.219313 | 0.04419 | 0.9 | 4.103731 | 0.033422 | 7.806874 | 0.795372 |
| 8 | ( 2pct. Milk, Toothpaste, Bananas) | ( White Bread) | 0.04419 | 0.243863 | 0.042553 | 0.962963 | 3.948794 | 0.031777 | 20.415712 | 0.781283 |
| 9 | ( Potato Chips, Wheat Bread, 98pct. Fat Free... | ( White Bread) | 0.047463 | 0.243863 | 0.04419 | 0.931034 | 3.817866 | 0.032615 | 10.963993 | 0.77485 |
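Every metric column in this table follows from the three supports in the same row. Writing s(·) for support, mlxtend computes confidence = s(A∪C)/s(A), lift = confidence/s(C), leverage = s(A∪C) − s(A)s(C), conviction = (1 − s(C))/(1 − confidence), and Zhang's metric = leverage / max{s(A∪C)(1 − s(A)), s(A)(s(C) − s(A∪C))}. The check below recomputes row 0 with exact fractions, assuming its rounded supports correspond to 28, 134 and 27 of the 611 transactions (28/611 ≈ 0.045827, 134/611 ≈ 0.219313, 27/611 ≈ 0.04419). These counts are inferred from the table, not reported by the notebook.

```python
from fractions import Fraction

# Counts inferred from the rounded supports in row 0 (an assumption, see text).
n = 611
s_a = Fraction(28, n)    # antecedent support: {White Bread, Wheat Bread, Bananas}
s_c = Fraction(134, n)   # consequent support: {2pct. Milk}
s_ac = Fraction(27, n)   # support of antecedent and consequent together

confidence = s_ac / s_a
lift = confidence / s_c
leverage = s_ac - s_a * s_c
conviction = (1 - s_c) / (1 - confidence)
zhang = leverage / max(s_ac * (1 - s_a), s_a * (s_c - s_ac))

metrics = [("confidence", confidence), ("lift", lift), ("leverage", leverage),
           ("conviction", conviction), ("zhangs_metric", zhang)]
for name, value in metrics:
    print(f"{name}: {float(value):.6f}")
```

The printed values (0.964286, 4.396855, 0.034140, 21.859247, 0.809669) agree with row 0 of the table, which supports the inferred counts.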


```python
# Optional: export selected columns of the top 20 rules as a LaTeX table.
# print(assoc_rules[["antecedents",
#                    "consequents",
#                    "antecedent support",
#                    "consequent support",
#                    "support",
#                    "confidence",
#                    "lift",
#                    "conviction"]].head(20).to_latex())
```