# <b>Apriori Algorithm to Explore Complementary Items from Small Transactions </b> 

In this notebook, we apply the Apriori algorithm to find complementary items in small transactions. We define a small transaction as one containing at most 10 items. 

In [1]:
# import necessary libraries
import pandas as pd
from itertools import combinations
from operator import itemgetter
from time import time
import numpy as np
import itertools
import collections
from datetime import datetime

In [2]:
# read the transaction data and take a glance
df = pd.read_csv("../raw_data/ncr/items_transactions.csv")
df.head()

Unnamed: 0,global_transaction_id,item_id,dept_num,qty_sold,item_price,qty_is_weight,ticket_num,date,time_scanned
0,0,4889,6,3,99,0,2527,2020-07-01,07:03:30
1,0,3125,6,460,429,1,2527,2020-07-01,07:03:34
2,1,5,20,4,100,0,2528,2020-07-01,07:04:22
3,1,2,20,6,100,0,2528,2020-07-01,07:04:25
4,2,7013201045,1,1,299,0,2529,2020-07-01,07:05:06


In [3]:
# read the item descriptions and take a glance
items_descriptions = pd.read_csv("../raw_data/ncr/items_descriptions.csv")
items_descriptions.head()

Unnamed: 0,item_id,description,ecomm_description,category,item_type,upc
0,1,PAN DULCE SENCILLO,"Mexican Sweet Bread/Pan Dulce Mexicano, 1 Count",20101020,0,10
1,2,BOLILLO FRENCH ROLLS,"Bolillo, French Rolls, 1 Count",20101210,0,20
2,3,BOLILLO QUESO/CHILE JALAP,"Jalapeño and Cheese Bolillo, 1 Count",20101210,0,30
3,4,EMPANADA,,20101020,0,40
4,5,MIni Bolillo,"BOLILLO SMALL, 2 OZ",20101210,0,50


## <b> 1. Data Preprocessing </b>
In this section, we clean up the data in three steps:
<ul>
<li>1.1 Remove all non-food items</li>
<li>1.2 Reshape the dataframe to have one transaction per row</li>
<li>1.3 Extract the input data for the Apriori Algorithm</li>
</ul>
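
As a rough illustration of the three steps above, the following sketch runs them on a tiny made-up table. The toy data, the `non_food_ids` set, and the `max_items` variable are assumptions for illustration only; the notebook's own cells work on the real files and build the transaction lists differently.

```python
import pandas as pd

# Toy stand-in for the transactions table (values are made up)
toy = pd.DataFrame({
    "global_transaction_id": [0, 0, 1, 1, 1, 2],
    "item_id": [4889, 3125, 5, 2, 5555, 5555],
})

# 1.1 drop non-food items (hypothetical set of ids to exclude)
non_food_ids = {5555}
food_only = toy[~toy["item_id"].isin(non_food_ids)]

# 1.2 one transaction per row: collect the item ids of each transaction in a list
baskets = food_only.groupby("global_transaction_id")["item_id"].apply(list)

# 1.3 keep only small transactions (at most max_items items)
max_items = 10
small_baskets = baskets[baskets.apply(len) <= max_items]

print(small_baskets.to_dict())
```

Transaction 2 disappears entirely in this toy example because its only item is in the excluded set.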

<b>1.1 Remove all non-food items</b>

In [4]:
# filter out non food items
df = df[(df.item_id != 9492206955)] #plastic bag
df = df[(df.item_id != 5555)] #BAG EB&W CUSTOMER
df = df[(df.item_id != 9746490305)] #JUBILEE BIG ROLL PAPER TOWELS
df = df[(df.item_id != 74870310181)] #store_brand paper napkins

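# remove items containing "CRV" in their name
# str(x) guards against missing descriptions, which would otherwise raise a TypeError
items_descriptions["CRV?"] = items_descriptions["description"].apply(lambda x: "Yes" if "CRV" in str(x) else "No")
CRVs = items_descriptions[items_descriptions["CRV?"]=="Yes"]
CRVs = CRVs["item_id"].tolist()

# drop all CRV items in one pass instead of re-filtering the dataframe once per item
df = df[~df.item_id.isin(CRVs)]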

<b>1.2 Reshape dataframe</b>

In [5]:
# groupby() to have one transaction per row
df2 = df.groupby("global_transaction_id")["item_id"].apply(lambda x: str(x))
df2_frame = df2.to_frame()

# initial version of new dataframe - needs some preprocessing
df2_frame.head()

Unnamed: 0_level_0,item_id
global_transaction_id,Unnamed: 1_level_1
0,"0 4889\n1 3125\nName: 0, dtype: int64"
1,"2 5\n3 2\nName: 1, dtype: int64"
2,"4 7013201045\n5 4856406700\nName: 2, dty..."
3,6 7084781116\n8 7084781116\n10 7084...
5,13 20698000000\n14 20900000000\n15 20...


In [6]:
list_of_lists = []
for index, value in df2.items():
    only_item_ids = []
    # drop the trailing "Name: ..., dtype: ..." footer of the Series string
    row2 = value.split('Name')[0]
    row3 = row2.split('\n')
    for element in row3:
        if element != '':
            # each remaining line is "<row index> <item_id>"; keep the item_id
            only_item_ids.append(element.split(' ')[-1])
    list_of_lists.append(only_item_ids)

<b>1.3 Extract input data for Apriori</b>

In [7]:
def clean(x):
    # drop the 'None' placeholders and empty strings left by padding shorter transactions
    only_numbers = [element for element in x.split(',') if element not in ('None', '')]
    return ','.join(only_numbers)

In [8]:
table = pd.DataFrame(list_of_lists)

columns = [i for i in range(table.shape[1])]

table['items'] = table[columns].apply(lambda row: ','.join(row.values.astype(str)), axis=1)

table['items'] = table['items'].apply(lambda x: clean(x))

# items is the input data we need for the algorithm
data = pd.DataFrame(table['items'])

# remove all transactions which have more than 10 items
data["items list"] = data["items"].apply(lambda x: x.split(","))
data["remove?"] = data["items list"].apply(lambda x: "YES" if len(x) > 10 else "NO")
data_small_transactions = data[data["remove?"] == "NO"].reset_index()

# store this number
num_small_transactions = data_small_transactions.shape[0]

# 88,377 transactions with <= 10 items
print("Number of small transactions: "+str(num_small_transactions))

data = pd.DataFrame(data_small_transactions["items"])
data.head()

Number of small transactions: 88377


Unnamed: 0,items
0,"4889,3125"
1,"5,2"
2,"7013201045,4856406700"
3,"7084781116,7084781116,7084781116,7432309164"
4,"20698000000,20900000000,20900000000,2090000000..."


In [9]:
data_list = []
for i in range(len(data['items'])):
    data_list.append(data['items'][i].split(','))

# the data is ready for the apriori algorithm, and we save the data for future use
input_file='Apriori_input_small_transactions.txt'
with open(input_file,'w') as output:
    for key in data_list:
        output.write(' '.join([str(item) for item in key])+'\n')

## <b> 2. Apriori Algorithm </b>
In this section, we implement the Apriori algorithm to obtain the frequent itemsets. An itemset is frequent if its support count is greater than or equal to a threshold. To choose the threshold, we sort the single items by support count in descending order and take the minimum support count among the top 10 percent of them. 
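
For orientation, here is a minimal, self-contained sketch of the level-wise idea (join, prune, count) on toy transactions. It is not the notebook's implementation: the `apriori` function and the toy data are assumptions for illustration, and the sketch counts an item at most once per transaction.

```python
from itertools import combinations

def apriori(transactions, min_support_count):
    """Return {itemset as sorted tuple: support count} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]

    # L1: count single items, one count per transaction that contains the item
    counts = {}
    for basket in baskets:
        for item in basket:
            counts[(item,)] = counts.get((item,), 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support_count}
    frequent = dict(current)

    k = 2
    while current:
        prev = sorted(current)
        prev_set = set(prev)
        candidates = []
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                # join step: merge two (k-1)-itemsets that share their first k-2 items
                if prev[i][:k - 2] == prev[j][:k - 2]:
                    cand = tuple(sorted(set(prev[i]) | set(prev[j])))
                    # prune step: every (k-1)-subset must itself be frequent
                    if all(sub in prev_set for sub in combinations(cand, k - 1)):
                        candidates.append(cand)
        # count step: a basket supports a candidate if it contains all of its items
        counts = {c: sum(1 for b in baskets if b.issuperset(c)) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support_count}
        frequent.update(current)
        k += 1
    return frequent

# toy transactions (made up); with a threshold of 3 the triple (1, 2, 17) is not frequent
toy_transactions = [[1, 2, 17], [1, 2], [1, 17], [2, 17], [1, 2, 17]]
toy_frequent = apriori(toy_transactions, min_support_count=3)
for itemset in sorted(toy_frequent):
    print(itemset, toy_frequent[itemset])
```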

In [10]:
db = []
db_f = []
temp = {}
candidate = []  # list of candidate after join
frequent_Item = []
frequent_Item_sorted = {}
result_list = {}

# read in the data and get all the single items whose support count is at least min_support_count
def construct_L1(file):
    frequent_singles = {}  # avoids shadowing the built-in filter()
    count = {}
    with open(file, 'r') as infile:  # the file is closed automatically
        for line in infile:  # each line is a transaction
            data = set()
            for key in line.strip().split(' '):  # get all the elements in the line
                count[key] = count.get(key, 0) + 1
                if count[key] >= min_support_count:
                    frequent_singles[int(key)] = count[key]
                data.add(int(key))
            db.append(data)
    frequent_Item.append(frequent_singles)  # items in L1 stored as a dict in the frequent item list


# keep only the frequent items in each transaction, drop transactions left empty, and count the candidate pairs (C2)
def filter_db():
    global db
    global candidate
    global temp
    keys = sorted(frequent_Item[0].keys())
    for transaction in db:
        if len(transaction.intersection(keys)) != 0:
            db_f.append(sorted(transaction.intersection(keys)))  # now we get filtered data and are ready for next step
    candidate = list(itertools.combinations(sorted(frequent_Item[0].keys()), 2))  # construct C2
    for key in candidate:
        temp[key] = 0
    for transaction in db_f:
        pool = list(itertools.combinations(transaction, 2))
        for combo in pool:
            combo = sorted(combo)
            combo = tuple(combo)
            if temp.get(combo) is not None:
                temp[combo] += 1


def construct_Lk():
    global temp
    global Lk
    Lk = {}
    for key in temp:
        if temp.get(key) >= min_support_count:
            Lk[key] = temp[key]
    frequent_Item.append(Lk)


# remove the infrequent items in each transaction
def clean(size):
    global db_f
    keys = list(frequent_Item[size-1])
    if len(keys) != 0:
        # union of all items that still occur in some frequent itemset
        # (also works when there is only one frequent itemset)
        uni = set().union(*keys)
        for j in range(len(db_f)):
            db_f[j] = set(db_f[j]).intersection(uni)


def aprioriGen(k):
    del candidate[:] # ready for new candidate list
    temp.clear()
    temp_can = []
    keys = list(frequent_Item[k-2])
    for x in range(len(keys)):
        for y in range(x + 1, len(keys)):
            L1 = list(keys[x])[:k - 2]
            L2 = list(keys[y])[:k - 2]
            if L1 == L2:
                union = set(keys[x]).union(keys[y])  # set union
                temp_can.append(tuple(sorted(union)))
    frequent_prev = set(keys)
    for next_can in temp_can:
        # prune step: keep a candidate only if all of its (k-1)-subsets are frequent
        can_list = itertools.combinations(next_can, k - 1)
        if all(can in frequent_prev for can in can_list):
            candidate.append(next_can)
    for key in candidate:
        temp[key] = 0
    for transaction in db_f:
        pool = list(itertools.combinations(transaction, k))
        for combo in pool:
            combo = sorted(combo)
            combo = tuple(combo)
            if temp.get(combo) is not None:
                temp[combo] += 1


def sort():
    global frequent_Item_sorted
    global result_list
    keys = frequent_Item[0]
    for key in keys:
        tup = (key,)
        frequent_Item_sorted[tup] = keys.get(key)
    for i in range(len(frequent_Item)-1):
        frequent_Item_sorted.update(frequent_Item[i + 1])
    # order the itemsets lexicographically before writing them out
    result_list = collections.OrderedDict(sorted(frequent_Item_sorted.items()))


def result(file):
    with open(file,'w') as output:
        for key in result_list:
            output.write(' '.join([str(item) for item in key])+' ({:d})'.format(frequent_Item_sorted[key])+'\n')

In [11]:
# determine the minimum support count of the top 10 percent frequent single items

single_items = (data['items'].str.split(",", expand=True))\
        .apply(lambda column: column.value_counts()).sum(axis=1).dropna()

single_items_df = pd.DataFrame(single_items,columns=['count'])

# calculate the percentage of count of each single item
single_items_df['count_percentage'] = single_items_df['count'].apply(lambda x: x/num_small_transactions)

# sort the data frame by the percentage of count in descending order
single_items_df_2 = single_items_df.sort_values(by=['count_percentage'], ascending=False)

# get top 10 percent 
single_items_df_3 = single_items_df_2[:round(.1*single_items_df_2.shape[0])]

# min_support_count = 70
min_support_count = int(min(single_items_df_3['count']))
min_support_count

70
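
The threshold rule used above (the smallest support count among the top 10 percent of single items) can be sketched without pandas as follows. The function name `top_fraction_threshold`, the toy transactions, and the guard that keeps at least one item are assumptions for illustration. As in the cell above, every occurrence of an item is counted.

```python
from collections import Counter

def top_fraction_threshold(transactions, fraction=0.1):
    """Smallest count among the top `fraction` of single items, ranked by count."""
    counts = Counter(item for t in transactions for item in t)
    ranked = sorted(counts.values(), reverse=True)
    # keep at least one item so the result is defined for very small inputs (assumption)
    n_top = max(1, round(fraction * len(ranked)))
    return ranked[n_top - 1]

# toy data with 10 distinct items (made up)
toy_transactions = [
    ["a", "b"], ["a", "c"], ["a", "d"],
    ["a", "e", "f"], ["a", "g", "h", "i", "j"], ["b"],
]
print(top_fraction_threshold(toy_transactions, fraction=0.1))
```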

In [12]:
start_time = datetime.now()  # avoids shadowing the imported time function
output = 'filtered_result_70.txt'

size = 2
construct_L1(input_file)
filter_db()
while len(candidate) > 0:
    construct_Lk()  # find L set and store to frequent item list
    clean(size)     # clean database
    size += 1
    aprioriGen(size)    # generate next level candidate list
sort()
result(output)
print((datetime.now() - start_time).total_seconds())

7.109055


In [13]:
file = output
frequent_set = []
support_count = []
set_size = []
with open(file, 'r') as result_file:
    for line in result_file:  # each line is a frequent itemset followed by its support count, e.g. "1 2 (1033)"
        temp = line.split(' ')
        if len(temp[:-1]) == 1:
            frequent_set.append(int(temp[:-1][0]))
        else:
            frequent_set.append(tuple(([int(x) for x in temp[:-1]])))
        # strip the newline, then the surrounding parentheses
        support_count.append(int(temp[-1].strip()[1:-1]))
        set_size.append(len(temp)-1)
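
The result file lines have the form `item item ... (count)`. A more defensive way to parse one such line is sketched below with a regular expression. `parse_result_line` is a hypothetical helper, and unlike the cell above it returns single items as 1-tuples rather than bare integers.

```python
import re

# one or more space-separated integers, then the support count in parentheses
LINE_PATTERN = re.compile(r"^(?P<items>\d+(?: \d+)*) \((?P<count>\d+)\)$")

def parse_result_line(line):
    """Return (itemset tuple, support count) for a line such as '1 2 17 (254)'."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected line format: {line!r}")
    items = tuple(int(x) for x in match.group("items").split(" "))
    return items, int(match.group("count"))

print(parse_result_line("1 2 17 (254)\n"))
```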

In [14]:
data = {'frequent set':frequent_set,'set size':set_size, 'support count':support_count}
df3 = pd.DataFrame(data)
df3.head()

Unnamed: 0,frequent set,set size,support count
0,1,1,4539
1,"(1, 2)",2,1033
2,"(1, 2, 17)",3,254
3,"(1, 2, 18)",3,88
4,"(1, 2, 4011)",3,131


In [15]:
df3['set size'].value_counts()

1    894
2    447
3     67
4      3
Name: set size, dtype: int64

## <b> 3. Results </b>
Here we take the frequent itemsets and calculate the lift of every possible association rule for frequent itemsets of size 2, 3, 4, and 5. <br>
<br>
lift({X}->{Y}) = lift({Y}->{X}) = P(X,Y)/(P(X)P(Y)) <br>
<br>
A lift close to 1 means that the occurrence of X and the occurrence of Y in the same transaction are approximately independent events. 
So, we use a threshold of 1.5 and treat all association rules with lift >= 1.5 as strong association rules.
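
As a small worked example of the formula, the sketch below computes lift directly from support counts. With each probability estimated as count / N, the expression P(X,Y)/(P(X)P(Y)) simplifies to count(X,Y) * N / (count(X) * count(Y)). The function name `lift` and the counts are hypothetical and not taken from the data set.

```python
def lift(count_both, count_left, count_right, num_transactions):
    """Lift of X -> Y computed from support counts."""
    # (count_both / N) / ((count_left / N) * (count_right / N))
    # simplifies to count_both * N / (count_left * count_right)
    return count_both * num_transactions / (count_left * count_right)

# hypothetical counts: X in 20 of 100 transactions, Y in 25, both together in 10
print(lift(count_both=10, count_left=20, count_right=25, num_transactions=100))  # 2.0

# if X and Y were independent, we would expect 20 * 25 / 100 = 5 joint occurrences
print(lift(count_both=5, count_left=20, count_right=25, num_transactions=100))   # 1.0
```

In the first case the pair occurs twice as often as independence would predict, so it would pass the 1.5 threshold.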

In [16]:
# look up support counts by itemset (an int for single items, a tuple otherwise)
support_by_itemset = dict(zip(df3["frequent set"], df3["support count"]))

def calculate_lift(both_sides, left_side, right_side):
    # lift = P(X,Y) / (P(X) * P(Y)), with each probability estimated as support count / number of transactions
    percent_transactions_containing_both = support_by_itemset[both_sides]/num_small_transactions
    percent_transactions_containing_left_side = support_by_itemset[left_side]/num_small_transactions
    percent_transactions_containing_right_side = support_by_itemset[right_side]/num_small_transactions
    lift = (percent_transactions_containing_both)/(percent_transactions_containing_left_side*percent_transactions_containing_right_side)
    return lift

In [17]:
# calculate lifts for all possible association rules for frequent itemsets of 2, 3, 4, and 5

left_sides = []
right_sides = []
sizes = []
lifts = []
# item ids that have a description, built once so each membership test is cheap
known_item_ids = set(items_descriptions["item_id"].tolist())
for index, row in df3.iterrows():
    
    if row["set size"] == 2:
        atuple = row["frequent set"]
        # print any item without a description and skip the pair
        missing_items = [item for item in atuple if item not in known_item_ids]
        for item in missing_items:
            print(item)
        if not missing_items:
            name_0 = items_descriptions.loc[items_descriptions['item_id']==atuple[0],'description'].iloc[0].lower()
            name_1 = items_descriptions.loc[items_descriptions['item_id']==atuple[1],'description'].iloc[0].lower()
            itemset = (name_0, name_1)
            #0->1 and 1->0
            left_sides.append(name_0)
            right_sides.append(name_1)
            sizes.append(2)
            lifts.append(calculate_lift(atuple, atuple[0], atuple[1]))
        
    elif row["set size"] == 3:
        atuple = row["frequent set"]
        name_0 = items_descriptions.loc[items_descriptions['item_id']==atuple[0],'description'].iloc[0].lower()
        name_1 = items_descriptions.loc[items_descriptions['item_id']==atuple[1],'description'].iloc[0].lower()
        name_2 = items_descriptions.loc[items_descriptions['item_id']==atuple[2],'description'].iloc[0].lower()
        itemset = (name_0, name_1, name_2)
        #0->1,2 and 1,2->0
        left_sides.append(name_0)
        right_sides.append((name_1, name_2))
        sizes.append(3)
        lifts.append(calculate_lift(atuple, atuple[0], (atuple[1],atuple[2])))
        #1->0,2 and 0,2->1
        left_sides.append(name_1)
        right_sides.append((name_0, name_2))
        sizes.append(3)
        lifts.append(calculate_lift(atuple, atuple[1], (atuple[0],atuple[2])))
        #2->0,1 and 0,1->2
        left_sides.append(name_2)
        right_sides.append((name_0, name_1))
        sizes.append(3)
        lifts.append(calculate_lift(atuple, atuple[2], (atuple[0],atuple[1])))
        
    elif row["set size"] == 4:
        atuple = row["frequent set"]
        name_0 = items_descriptions.loc[items_descriptions['item_id']==atuple[0],'description'].iloc[0].lower()
        name_1 = items_descriptions.loc[items_descriptions['item_id']==atuple[1],'description'].iloc[0].lower()
        name_2 = items_descriptions.loc[items_descriptions['item_id']==atuple[2],'description'].iloc[0].lower()
        name_3 = items_descriptions.loc[items_descriptions['item_id']==atuple[3],'description'].iloc[0].lower()
        itemset = (name_0, name_1, name_2, name_3)
        #0->1,2,3 and 1,2,3->0
        left_sides.append(name_0)
        right_sides.append((name_1, name_2, name_3))
        sizes.append(4)
        lifts.append(calculate_lift(atuple, atuple[0], (atuple[1],atuple[2],atuple[3])))
        #1->0,2,3 and 0,2,3->1
        left_sides.append(name_1)
        right_sides.append((name_0, name_2, name_3))
        sizes.append(4)
        lifts.append(calculate_lift(atuple, atuple[1], (atuple[0],atuple[2],atuple[3])))
        #2->0,1,3 and 0,1,3->2
        left_sides.append(name_2)
        right_sides.append((name_0, name_1, name_3))
        sizes.append(4)
        lifts.append(calculate_lift(atuple, atuple[2], (atuple[0],atuple[1],atuple[3])))
        #3->0,1,2 and 0,1,2->3
        left_sides.append(name_3)
        right_sides.append((name_0, name_1, name_2))
        sizes.append(4)
        lifts.append(calculate_lift(atuple, atuple[3], (atuple[0],atuple[1],atuple[2])))
        #0,1->2,3 and 2,3->0,1
        left_sides.append((name_0, name_1))
        right_sides.append((name_2, name_3))
        sizes.append(4)
        lifts.append(calculate_lift(atuple, (atuple[0],atuple[1]), (atuple[2],atuple[3])))
        #0,2->1,3 and 1,3->0,2
        left_sides.append((name_0, name_2))
        right_sides.append((name_1, name_3))
        sizes.append(4)
        lifts.append(calculate_lift(atuple, (atuple[0],atuple[2]), (atuple[1],atuple[3])))
        #0,3->1,2 and 1,2->0,3
        left_sides.append((name_0, name_3))
        right_sides.append((name_1, name_2))
        sizes.append(4)
        lifts.append(calculate_lift(atuple, (atuple[0],atuple[3]), (atuple[1],atuple[2])))
        
    elif row["set size"] == 5:
        atuple = row["frequent set"]
        name_0 = items_descriptions.loc[items_descriptions['item_id']==atuple[0],'description'].iloc[0].lower()
        name_1 = items_descriptions.loc[items_descriptions['item_id']==atuple[1],'description'].iloc[0].lower()
        name_2 = items_descriptions.loc[items_descriptions['item_id']==atuple[2],'description'].iloc[0].lower()
        name_3 = items_descriptions.loc[items_descriptions['item_id']==atuple[3],'description'].iloc[0].lower()
        name_4 = items_descriptions.loc[items_descriptions['item_id']==atuple[4],'description'].iloc[0].lower()
        itemset = (name_0, name_1, name_2, name_3, name_4)
        #0->1,2,3,4 and 1,2,3,4->0
        left_sides.append(name_0)
        right_sides.append((name_1, name_2, name_3, name_4))
        sizes.append(5)
        lifts.append(calculate_lift(atuple, atuple[0], (atuple[1],atuple[2],atuple[3],atuple[4])))
        #1->0,2,3,4 and 0,2,3,4->1
        left_sides.append(name_1)
        right_sides.append((name_0, name_2, name_3, name_4))
        sizes.append(5)
        lifts.append(calculate_lift(atuple, atuple[1], (atuple[0],atuple[2],atuple[3],atuple[4])))
        #2->0,1,3,4 and 0,1,3,4->2
        left_sides.append(name_2)
        right_sides.append((name_0, name_1, name_3, name_4))
        sizes.append(5)
        lifts.append(calculate_lift(atuple, atuple[2], (atuple[0],atuple[1],atuple[3],atuple[4])))
        #3->0,1,2,4 and 0,1,2,4->3
        left_sides.append(name_3)
        right_sides.append((name_0, name_1, name_2, name_4))
        sizes.append(5)
        lifts.append(calculate_lift(atuple, atuple[3], (atuple[0],atuple[1],atuple[2],atuple[4])))
        #4->0,1,2,3 and 0,1,2,3->4
        left_sides.append(name_4)
        right_sides.append((name_0, name_1, name_2, name_3))
        sizes.append(5)
        lifts.append(calculate_lift(atuple, atuple[4], (atuple[0],atuple[1],atuple[2],atuple[3])))
        #0,1->2,3,4 and 2,3,4->0,1
        left_sides.append((name_0, name_1))
        right_sides.append((name_2, name_3, name_4))
        sizes.append(5)
        lifts.append(calculate_lift(atuple, (atuple[0],atuple[1]), (atuple[2],atuple[3],atuple[4])))
        #0,2->1,3,4 and 1,3,4->0,2
        left_sides.append((name_0, name_2))
        right_sides.append((name_1, name_3, name_4))
        sizes.append(5)
        lifts.append(calculate_lift(atuple, (atuple[0],atuple[2]), (atuple[1],atuple[3],atuple[4])))
        #0,3->1,2,4 and 1,2,4->0,3
        left_sides.append((name_0, name_3))
        right_sides.append((name_1, name_2, name_4))
        sizes.append(5)
        lifts.append(calculate_lift(atuple, (atuple[0],atuple[3]), (atuple[1],atuple[2],atuple[4])))
        #0,4->1,2,3 and 1,2,3->0,4
        left_sides.append((name_0, name_4))
        right_sides.append((name_1, name_2, name_3))
        sizes.append(5)
        lifts.append(calculate_lift(atuple, (atuple[0],atuple[4]), (atuple[1],atuple[2],atuple[3])))
        #1,2->0,3,4 and 0,3,4->1,2
        left_sides.append((name_1, name_2))
        right_sides.append((name_0, name_3, name_4))
        sizes.append(5)
        lifts.append(calculate_lift(atuple, (atuple[1],atuple[2]), (atuple[0],atuple[3],atuple[4])))
        #1,3->0,2,4 and 0,2,4->1,3
        left_sides.append((name_1, name_3))
        right_sides.append((name_0, name_2, name_4))
        sizes.append(5)
        lifts.append(calculate_lift(atuple, (atuple[1],atuple[3]), (atuple[0],atuple[2],atuple[4])))
        #1,4->0,2,3 and 0,2,3->1,4
        left_sides.append((name_1, name_4))
        right_sides.append((name_0, name_2, name_3))
        sizes.append(5)
        lifts.append(calculate_lift(atuple, (atuple[1],atuple[4]), (atuple[0],atuple[2],atuple[3])))
        #2,3->0,1,4 and 0,1,4->2,3
        left_sides.append((name_2, name_3))
        right_sides.append((name_0, name_1, name_4))
        sizes.append(5)
        lifts.append(calculate_lift(atuple, (atuple[2],atuple[3]), (atuple[0],atuple[1],atuple[4])))
        #2,4->0,1,3 and 0,1,3->2,4
        left_sides.append((name_2, name_4))
        right_sides.append((name_0, name_1, name_3))
        sizes.append(5)
        lifts.append(calculate_lift(atuple, (atuple[2],atuple[4]), (atuple[0],atuple[1],atuple[3])))
        #3,4->0,1,2 and 0,1,2->3,4
        left_sides.append((name_3, name_4))
        right_sides.append((name_0, name_1, name_2))
        sizes.append(5)
        lifts.append(calculate_lift(atuple, (atuple[3],atuple[4]), (atuple[0],atuple[1],atuple[2])))

    else:
        pass
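
The hand-written branches above list each unordered split of an itemset once, since lift is symmetric. A minimal sketch of how the same splits could be generated for any set size with `itertools.combinations` follows. `rule_splits` is a hypothetical helper, not part of the notebook, and unlike the loop above it returns single-item sides as 1-tuples.

```python
from itertools import combinations

def rule_splits(itemset):
    """Yield (left, right) splits of a sorted itemset, one per unordered pair of sides."""
    k = len(itemset)
    positions = range(k)
    for r in range(1, k // 2 + 1):
        for left_pos in combinations(positions, r):
            # when both sides have the same size, keep only the split whose
            # left side holds the first item, so mirrored splits appear once
            if 2 * r == k and 0 not in left_pos:
                continue
            left = tuple(itemset[i] for i in left_pos)
            right = tuple(itemset[i] for i in positions if i not in left_pos)
            yield left, right

# example with a made-up itemset of size 3
for left, right in rule_splits((1, 2, 17)):
    print(left, "->", right)
```

For sizes 2 to 5 this yields 1, 3, 7, and 15 splits, matching the number of rules appended per itemset in the branches above.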

In [18]:
results = pd.DataFrame({'left side': left_sides, 'right side': right_sides, 'set size': sizes, 'lift': lifts})
results.head()

Unnamed: 0,left side,right side,set size,lift
0,pan dulce sencillo,bolillo french rolls,2,1.94989
1,pan dulce sencillo,"(bolillo french rolls, premium sweet bread)",3,9.474194
2,bolillo french rolls,"(pan dulce sencillo, premium sweet bread)",3,2.064729
3,premium sweet bread,"(pan dulce sencillo, bolillo french rolls)",3,9.088518
4,pan dulce sencillo,"(bolillo french rolls, pan dulce figura/tapado)",3,11.422745


In [19]:
# filter and sort in one expression so no in-place change is made on a slice of results
strong_associations = results[results["lift"]>=1.5].sort_values(by="lift", ascending=False)
# 490 strong associations
strong_associations.shape

(490, 4)

In [20]:
strong_associations.head(10)

Unnamed: 0,left side,right side,set size,lift
550,*pepper- bell red,*pepper bell yellow,2,180.592754
449,*peppers-bell green,*pepper- bell red,2,82.905253
639,costillas de puerco/spareribs,trocitos frescos de puerco,2,48.845872
616,*squash chayote,*squash-mexican,2,44.087182
595,*carrots loose,*squash chayote,2,34.625853
665,wedges fries/papas fritas,chicken tenders,2,27.474753
647,frijoles fritos/refried beans,arroz/fried rice,2,22.926352
636,beef marinated flap meat,costillas de res para asar,2,22.472028
666,ceviche de camaron,camaron aguachile verde,2,21.691702
637,beef marinated flap meat,prna pllo adbda/mrntd leg meat,2,20.736466


In [21]:
# extract strong associations from itemsets of 2
strong_associations_itemsets_2 = strong_associations[strong_associations["set size"] == 2]
# 282 strong associations from itemsets of 2
strong_associations_itemsets_2.shape

(282, 4)

In [22]:
# extract strong associations from itemsets of 3
strong_associations_itemsets_3 = strong_associations[strong_associations["set size"] == 3]
# 187 strong associations from itemsets of 3
strong_associations_itemsets_3.shape

(187, 4)

In [23]:
# extract strong associations from itemsets of 4
strong_associations_itemsets_4 = strong_associations[strong_associations["set size"] == 4]
# 21 strong associations from itemsets of 4
strong_associations_itemsets_4.shape

(21, 4)

In [24]:
# extract strong associations from itemsets of 5
strong_associations_itemsets_5 = strong_associations[strong_associations["set size"] == 5]
# 0 strong associations from itemsets of 5
strong_associations_itemsets_5.shape

(0, 4)