### SDSC3002 Data Mining - Apriori Algorithm Assignment

This notebook is an assignment for **SDSC3002 Data Mining**. The objective is to find frequent itemsets using the **Apriori algorithm**. 

We need to implement an **acceleration method** and compare its performance against the original method in terms of execution time. The comparison will be conducted by measuring the time taken to find frequent itemsets that meet a **minimum support threshold** ranging from **0.0001 to 0.0005**, while varying the **K** value (size of the itemsets).

In [None]:
import pandas as pd
import numpy as np

### For time calculation:

In [None]:
from time import process_time

### Install and import the Apriori library

In [None]:
#pip install efficient_apriori
from efficient_apriori import apriori
from efficient_apriori import itemsets_from_transactions

### Transform data:

In [None]:
search_text = ","
replace_text = " "
with open(r"Path where the Text file is stored\File name.txt","r") as file:
    messy_data = file.read()
    cleaned_data = messy_data.replace(search_text, replace_text)

In [4]:
with open(r"Path where the Text file is stored\File name.txt","w") as file:
    file.write(cleaned_data)

### Load data with pandas dataframe

In [None]:

df1 = pd.read_csv("Path where the Text file is stored\File name.txt", header = None)
df2 = df1[0].str.split(" ",expand = True)

#Skip O to avoid confusion
s = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','P']

df2.columns = s
df2

Unnamed: 0,A,B,C,D,E,F,G,H,I,J,K,L,M,N,P
0,0,1,2,,,,,,,,,,,,
1,3,,,,,,,,,,,,,,
2,4,3,,,,,,,,,,,,,
3,5,6,7,8,0,9,10,,,,,,,,
4,11,12,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
668328,822,866,,,,,,,,,,,,,
668329,775,,,,,,,,,,,,,,
668330,762,,,,,,,,,,,,,,
668331,775,,,,,,,,,,,,,,


## Data Preprocessing

### Change data into list form:

In [None]:
records = []
for i in range(0,len(df2)):
    records.append(
        [
        str(df2.values[i,j])
        for j in range(0,len(df2.columns))
        ]
    )

### Remove "None" data in the list:

In [7]:
for i, j in enumerate(records):
    while "None" in records[i]:
        records[i].remove("None")

In the Apriori algorithm, "K" represents the size of the itemsets being considered in each iteration.

For example:
- K=1 refers to individual items (e.g., {Milk}, {Bread}).
- K=2 refers to pairs of items (e.g., {Milk, Bread}).
- K=3 refers to triplets (e.g., {Milk, Bread, Eggs}).

The algorithm iteratively increases K until no larger frequent itemsets meet the minimum support threshold.

#### ***Original Method***

### Under setting of min_support = 0.0001, find all frequent itemset.

In [9]:
# Import time module for performance measurement
t1_start = process_time()
itemsets, rules = apriori(records, min_support = 0.0001, min_confidence = 0)
# Stop the stopwatch / counter
t1_stop = process_time()
 
print("Elapsed time:", t1_stop, t1_start) 
print("Elapsed time during the whole program in seconds:", t1_stop - t1_start)

t1_time = t1_stop-t1_start

itemsets, _ = itemsets_from_transactions(records, min_support = 0.0001)

Elapsed time: 55.9375 29.765625
Elapsed time during the whole program in seconds: 26.171875


In [10]:
#The whole itemset
itemsets

{1: {('0',): 48863,
  ('1',): 100762,
  ('2',): 199,
  ('3',): 42471,
  ('4',): 10876,
  ('5',): 4157,
  ('6',): 2314,
  ('7',): 5922,
  ('8',): 8424,
  ('9',): 533,
  ('10',): 493,
  ('11',): 9252,
  ('12',): 5279,
  ('13',): 21425,
  ('14',): 3032,
  ('15',): 38893,
  ('16',): 11175,
  ('17',): 27739,
  ('18',): 5556,
  ('19',): 24971,
  ('20',): 777,
  ('21',): 54513,
  ('22',): 19701,
  ('23',): 23770,
  ('24',): 22835,
  ('25',): 2528,
  ('26',): 276,
  ('27',): 4886,
  ('28',): 4633,
  ('29',): 392,
  ('30',): 14188,
  ('31',): 9921,
  ('32',): 7503,
  ('33',): 45942,
  ('34',): 22339,
  ('35',): 28139,
  ('36',): 10734,
  ('37',): 3565,
  ('38',): 6249,
  ('39',): 8126,
  ('40',): 2595,
  ('41',): 1013,
  ('42',): 12144,
  ('43',): 4261,
  ('44',): 8754,
  ('45',): 8616,
  ('46',): 456,
  ('47',): 1004,
  ('48',): 2775,
  ('49',): 8150,
  ('50',): 13448,
  ('51',): 5158,
  ('52',): 4772,
  ('53',): 4536,
  ('54',): 295,
  ('55',): 598,
  ('56',): 539,
  ('57',): 6474,
  ('58',):

In [11]:
# Print output for verification
print("There are at most",len(itemsets))

There are at most 5


In [None]:
x = len(itemsets[1]) + len(itemsets[2]) + len(itemsets[3]) + len(itemsets[4]) + len(itemsets[5])
print("There are in total",x,"itemset could be generate with minimum support = 0.0001, and k vary from 1 to 5")

There are in total 18694 itemset could be generate with minimum support = 0.0001, and k vary from 1 to 5


### Number of size k frequent patterns for each size k with at least one frequent pattern:

In [None]:
# k = 1
itemsets[1]

{('0',): 48863,
 ('1',): 100762,
 ('2',): 199,
 ('3',): 42471,
 ('4',): 10876,
 ('5',): 4157,
 ('6',): 2314,
 ('7',): 5922,
 ('8',): 8424,
 ('9',): 533,
 ('10',): 493,
 ('11',): 9252,
 ('12',): 5279,
 ('13',): 21425,
 ('14',): 3032,
 ('15',): 38893,
 ('16',): 11175,
 ('17',): 27739,
 ('18',): 5556,
 ('19',): 24971,
 ('20',): 777,
 ('21',): 54513,
 ('22',): 19701,
 ('23',): 23770,
 ('24',): 22835,
 ('25',): 2528,
 ('26',): 276,
 ('27',): 4886,
 ('28',): 4633,
 ('29',): 392,
 ('30',): 14188,
 ('31',): 9921,
 ('32',): 7503,
 ('33',): 45942,
 ('34',): 22339,
 ('35',): 28139,
 ('36',): 10734,
 ('37',): 3565,
 ('38',): 6249,
 ('39',): 8126,
 ('40',): 2595,
 ('41',): 1013,
 ('42',): 12144,
 ('43',): 4261,
 ('44',): 8754,
 ('45',): 8616,
 ('46',): 456,
 ('47',): 1004,
 ('48',): 2775,
 ('49',): 8150,
 ('50',): 13448,
 ('51',): 5158,
 ('52',): 4772,
 ('53',): 4536,
 ('54',): 295,
 ('55',): 598,
 ('56',): 539,
 ('57',): 6474,
 ('58',): 17811,
 ('59',): 66523,
 ('60',): 8439,
 ('61',): 2619,
 ('62

In [None]:
print("There are",len(itemsets[1]),"frequent itemset could be generate with minimum support = 0.0001, and k = 1")

There are 733 frequent itemset could be generate with minimum support = 0.0001, and k = 1


In [None]:
# k = 2
itemsets[2]

{('0', '1'): 9823,
 ('0', '10'): 72,
 ('0', '100'): 139,
 ('0', '101'): 4210,
 ('0', '102'): 325,
 ('0', '103'): 217,
 ('0', '104'): 552,
 ('0', '105'): 174,
 ('0', '106'): 113,
 ('0', '11'): 598,
 ('0', '110'): 1753,
 ('0', '111'): 444,
 ('0', '114'): 496,
 ('0', '117'): 1143,
 ('0', '118'): 105,
 ('0', '12'): 209,
 ('0', '120'): 153,
 ('0', '121'): 265,
 ('0', '122'): 1013,
 ('0', '123'): 225,
 ('0', '124'): 318,
 ('0', '126'): 74,
 ('0', '127'): 80,
 ('0', '13'): 2319,
 ('0', '132'): 113,
 ('0', '134'): 578,
 ('0', '135'): 181,
 ('0', '136'): 181,
 ('0', '138'): 210,
 ('0', '14'): 71,
 ('0', '140'): 441,
 ('0', '141'): 120,
 ('0', '142'): 182,
 ('0', '143'): 762,
 ('0', '144'): 400,
 ('0', '145'): 399,
 ('0', '146'): 820,
 ('0', '147'): 189,
 ('0', '148'): 181,
 ('0', '149'): 104,
 ('0', '15'): 2949,
 ('0', '150'): 325,
 ('0', '151'): 359,
 ('0', '152'): 192,
 ('0', '155'): 68,
 ('0', '156'): 105,
 ('0', '157'): 329,
 ('0', '158'): 255,
 ('0', '16'): 702,
 ('0', '160'): 366,
 ('0', 

In [None]:
print("There are",len(itemsets[2]),"frequent itemset could be generate with minimum support = 0.0001, and k = 2")

There are 8133 frequent itemset could be generate with minimum support = 0.0001, and k = 2


In [None]:
# k = 3
itemsets[3]

{('0', '1', '101'): 924,
 ('0', '1', '104'): 110,
 ('0', '1', '11'): 108,
 ('0', '1', '110'): 371,
 ('0', '1', '111'): 89,
 ('0', '1', '114'): 79,
 ('0', '1', '117'): 245,
 ('0', '1', '122'): 214,
 ('0', '1', '124'): 82,
 ('0', '1', '13'): 425,
 ('0', '1', '134'): 79,
 ('0', '1', '140'): 81,
 ('0', '1', '143'): 178,
 ('0', '1', '144'): 90,
 ('0', '1', '145'): 101,
 ('0', '1', '146'): 190,
 ('0', '1', '15'): 551,
 ('0', '1', '150'): 72,
 ('0', '1', '157'): 84,
 ('0', '1', '158'): 67,
 ('0', '1', '16'): 154,
 ('0', '1', '160'): 67,
 ('0', '1', '17'): 309,
 ('0', '1', '170'): 115,
 ('0', '1', '180'): 71,
 ('0', '1', '19'): 631,
 ('0', '1', '194'): 119,
 ('0', '1', '195'): 108,
 ('0', '1', '21'): 2548,
 ('0', '1', '22'): 502,
 ('0', '1', '220'): 84,
 ('0', '1', '23'): 676,
 ('0', '1', '234'): 196,
 ('0', '1', '24'): 225,
 ('0', '1', '243'): 99,
 ('0', '1', '248'): 92,
 ('0', '1', '260'): 101,
 ('0', '1', '27'): 203,
 ('0', '1', '28'): 125,
 ('0', '1', '295'): 72,
 ('0', '1', '3'): 441,
 ('

In [None]:
print("There are",len(itemsets[3]),"frequent itemset could be generate with minimum support = 0.0001, and k = 3")

There are 7998 frequent itemset could be generate with minimum support = 0.0001, and k = 3


In [None]:
# k = 4
itemsets[4]

{('0', '1', '101', '21'): 309,
 ('0', '1', '101', '22'): 67,
 ('0', '1', '101', '23'): 80,
 ('0', '1', '101', '33'): 196,
 ('0', '1', '101', '34'): 200,
 ('0', '1', '101', '58'): 89,
 ('0', '1', '101', '59'): 199,
 ('0', '1', '101', '64'): 68,
 ('0', '1', '101', '96'): 67,
 ('0', '1', '110', '21'): 109,
 ('0', '1', '110', '33'): 67,
 ('0', '1', '110', '59'): 75,
 ('0', '1', '117', '35'): 70,
 ('0', '1', '13', '15'): 77,
 ('0', '1', '13', '21'): 101,
 ('0', '1', '13', '59'): 95,
 ('0', '1', '15', '17'): 67,
 ('0', '1', '15', '21'): 117,
 ('0', '1', '15', '33'): 110,
 ('0', '1', '15', '34'): 78,
 ('0', '1', '15', '35'): 78,
 ('0', '1', '15', '59'): 121,
 ('0', '1', '17', '33'): 71,
 ('0', '1', '19', '21'): 152,
 ('0', '1', '19', '23'): 129,
 ('0', '1', '19', '33'): 163,
 ('0', '1', '19', '34'): 72,
 ('0', '1', '19', '59'): 145,
 ('0', '1', '21', '22'): 129,
 ('0', '1', '21', '23'): 171,
 ('0', '1', '21', '3'): 118,
 ('0', '1', '21', '33'): 495,
 ('0', '1', '21', '34'): 462,
 ('0', '1', '

In [None]:
print("There are",len(itemsets[4]),"frequent itemset could be generate with minimum support = 0.0001, and k = 4")

There are 1740 frequent itemset could be generate with minimum support = 0.0001, and k = 4


In [None]:
# k = 5
itemsets[5]

{('0', '1', '101', '21', '33'): 71,
 ('0', '1', '101', '21', '34'): 80,
 ('0', '1', '21', '23', '33'): 67,
 ('0', '1', '21', '33', '34'): 106,
 ('0', '1', '21', '33', '58'): 111,
 ('0', '1', '21', '33', '59'): 144,
 ('0', '1', '21', '34', '59'): 109,
 ('0', '1', '21', '58', '59'): 67,
 ('0', '1', '23', '33', '59'): 73,
 ('0', '1', '33', '34', '59'): 86,
 ('0', '1', '33', '58', '59'): 102,
 ('1', '19', '21', '23', '33'): 82,
 ('1', '19', '21', '23', '59'): 90,
 ('1', '19', '21', '33', '59'): 83,
 ('1', '19', '23', '33', '59'): 100,
 ('1', '21', '22', '23', '33'): 77,
 ('1', '21', '22', '23', '59'): 75,
 ('1', '21', '22', '33', '59'): 68,
 ('1', '21', '23', '33', '58'): 80,
 ('1', '21', '23', '33', '59'): 128,
 ('1', '21', '33', '34', '58'): 96,
 ('1', '21', '33', '34', '59'): 115,
 ('1', '21', '33', '4', '59'): 70,
 ('1', '21', '33', '58', '59'): 146,
 ('1', '21', '33', '59', '66'): 80,
 ('1', '21', '33', '59', '96'): 67,
 ('1', '21', '34', '4', '59'): 72,
 ('1', '22', '23', '33', '58')

In [None]:
print("There are",len(itemsets[5]),"frequent itemset could be generate with minimum support = 0.0001, and k = 5")

There are 90 frequent itemset could be generate with minimum support = 0.0001, and k = 5


### Under setting of minimum support = 0.0002, find all frequent itemset.

In [23]:
# Import time module for performance measurement
t1_start = process_time()
itemsets, rules = apriori(records, min_support = 0.0002, min_confidence = 0)
# Stop the stopwatch / counter
t1_stop = process_time()
 
print("Elapsed time:", t1_stop, t1_start)
print("Elapsed time during the whole program in seconds:", t1_stop - t1_start)

t2_time = t1_stop - t1_start

itemsets, _ = itemsets_from_transactions(records, min_support = 0.0002)

Elapsed time: 91.046875 76.765625
Elapsed time during the whole program in seconds: 14.28125


In [24]:
#The whole itemset
itemsets

{1: {('0',): 48863,
  ('1',): 100762,
  ('2',): 199,
  ('3',): 42471,
  ('4',): 10876,
  ('5',): 4157,
  ('6',): 2314,
  ('7',): 5922,
  ('8',): 8424,
  ('9',): 533,
  ('10',): 493,
  ('11',): 9252,
  ('12',): 5279,
  ('13',): 21425,
  ('14',): 3032,
  ('15',): 38893,
  ('16',): 11175,
  ('17',): 27739,
  ('18',): 5556,
  ('19',): 24971,
  ('20',): 777,
  ('21',): 54513,
  ('22',): 19701,
  ('23',): 23770,
  ('24',): 22835,
  ('25',): 2528,
  ('26',): 276,
  ('27',): 4886,
  ('28',): 4633,
  ('29',): 392,
  ('30',): 14188,
  ('31',): 9921,
  ('32',): 7503,
  ('33',): 45942,
  ('34',): 22339,
  ('35',): 28139,
  ('36',): 10734,
  ('37',): 3565,
  ('38',): 6249,
  ('39',): 8126,
  ('40',): 2595,
  ('41',): 1013,
  ('42',): 12144,
  ('43',): 4261,
  ('44',): 8754,
  ('45',): 8616,
  ('46',): 456,
  ('47',): 1004,
  ('48',): 2775,
  ('49',): 8150,
  ('50',): 13448,
  ('51',): 5158,
  ('52',): 4772,
  ('53',): 4536,
  ('54',): 295,
  ('55',): 598,
  ('56',): 539,
  ('57',): 6474,
  ('58',):

In [25]:
# Print output for verification
print("There are at most",len(itemsets))

There are at most 5


In [None]:

x = len(itemsets[1]) + len(itemsets[2]) + len(itemsets[3]) + len(itemsets[4]) + len(itemsets[5])
print("There are in total",x,"itemset could be generate with minimum support = 0.0002, and k vary from 1 to 5")

There are in total 8219 itemset could be generate with minimum support = 0.0002, and k vary from 1 to 5


### Number of size k frequent patterns for each size k with at least one frequent pattern:

In [None]:

# k = 1
itemsets[1]

{('0',): 48863,
 ('1',): 100762,
 ('2',): 199,
 ('3',): 42471,
 ('4',): 10876,
 ('5',): 4157,
 ('6',): 2314,
 ('7',): 5922,
 ('8',): 8424,
 ('9',): 533,
 ('10',): 493,
 ('11',): 9252,
 ('12',): 5279,
 ('13',): 21425,
 ('14',): 3032,
 ('15',): 38893,
 ('16',): 11175,
 ('17',): 27739,
 ('18',): 5556,
 ('19',): 24971,
 ('20',): 777,
 ('21',): 54513,
 ('22',): 19701,
 ('23',): 23770,
 ('24',): 22835,
 ('25',): 2528,
 ('26',): 276,
 ('27',): 4886,
 ('28',): 4633,
 ('29',): 392,
 ('30',): 14188,
 ('31',): 9921,
 ('32',): 7503,
 ('33',): 45942,
 ('34',): 22339,
 ('35',): 28139,
 ('36',): 10734,
 ('37',): 3565,
 ('38',): 6249,
 ('39',): 8126,
 ('40',): 2595,
 ('41',): 1013,
 ('42',): 12144,
 ('43',): 4261,
 ('44',): 8754,
 ('45',): 8616,
 ('46',): 456,
 ('47',): 1004,
 ('48',): 2775,
 ('49',): 8150,
 ('50',): 13448,
 ('51',): 5158,
 ('52',): 4772,
 ('53',): 4536,
 ('54',): 295,
 ('55',): 598,
 ('56',): 539,
 ('57',): 6474,
 ('58',): 17811,
 ('59',): 66523,
 ('60',): 8439,
 ('61',): 2619,
 ('62

In [None]:
print("There are",len(itemsets[1]),"frequent itemset could be generate with minimum support = 0.0002, and k = 1")

There are 592 frequent itemset could be generate with minimum support = 0.0002, and k = 1


In [None]:
# k = 2
itemsets[2]

{('0', '1'): 9823,
 ('0', '100'): 139,
 ('0', '101'): 4210,
 ('0', '102'): 325,
 ('0', '103'): 217,
 ('0', '104'): 552,
 ('0', '105'): 174,
 ('0', '11'): 598,
 ('0', '110'): 1753,
 ('0', '111'): 444,
 ('0', '114'): 496,
 ('0', '117'): 1143,
 ('0', '12'): 209,
 ('0', '120'): 153,
 ('0', '121'): 265,
 ('0', '122'): 1013,
 ('0', '123'): 225,
 ('0', '124'): 318,
 ('0', '13'): 2319,
 ('0', '134'): 578,
 ('0', '135'): 181,
 ('0', '136'): 181,
 ('0', '138'): 210,
 ('0', '140'): 441,
 ('0', '142'): 182,
 ('0', '143'): 762,
 ('0', '144'): 400,
 ('0', '145'): 399,
 ('0', '146'): 820,
 ('0', '147'): 189,
 ('0', '148'): 181,
 ('0', '15'): 2949,
 ('0', '150'): 325,
 ('0', '151'): 359,
 ('0', '152'): 192,
 ('0', '157'): 329,
 ('0', '158'): 255,
 ('0', '16'): 702,
 ('0', '160'): 366,
 ('0', '162'): 325,
 ('0', '163'): 247,
 ('0', '164'): 159,
 ('0', '169'): 235,
 ('0', '17'): 1615,
 ('0', '170'): 572,
 ('0', '173'): 227,
 ('0', '174'): 344,
 ('0', '175'): 231,
 ('0', '176'): 223,
 ('0', '18'): 250,
 

In [None]:
print("There are",len(itemsets[2]),"frequent itemset could be generate with minimum support = 0.0002, and k = 2")

There are 4553 frequent itemset could be generate with minimum support = 0.0002, and k = 2


In [None]:
# k = 3
itemsets[3]

{('0', '1', '101'): 924,
 ('0', '1', '110'): 371,
 ('0', '1', '117'): 245,
 ('0', '1', '122'): 214,
 ('0', '1', '13'): 425,
 ('0', '1', '143'): 178,
 ('0', '1', '146'): 190,
 ('0', '1', '15'): 551,
 ('0', '1', '16'): 154,
 ('0', '1', '17'): 309,
 ('0', '1', '19'): 631,
 ('0', '1', '21'): 2548,
 ('0', '1', '22'): 502,
 ('0', '1', '23'): 676,
 ('0', '1', '234'): 196,
 ('0', '1', '24'): 225,
 ('0', '1', '27'): 203,
 ('0', '1', '3'): 441,
 ('0', '1', '30'): 138,
 ('0', '1', '33'): 1701,
 ('0', '1', '34'): 1449,
 ('0', '1', '35'): 517,
 ('0', '1', '36'): 179,
 ('0', '1', '38'): 204,
 ('0', '1', '4'): 466,
 ('0', '1', '42'): 210,
 ('0', '1', '44'): 294,
 ('0', '1', '45'): 216,
 ('0', '1', '50'): 218,
 ('0', '1', '58'): 685,
 ('0', '1', '59'): 2069,
 ('0', '1', '62'): 259,
 ('0', '1', '64'): 643,
 ('0', '1', '66'): 399,
 ('0', '1', '67'): 167,
 ('0', '1', '68'): 189,
 ('0', '1', '7'): 160,
 ('0', '1', '70'): 162,
 ('0', '1', '71'): 139,
 ('0', '1', '74'): 187,
 ('0', '1', '78'): 386,
 ('0', '

In [None]:
print("There are",len(itemsets[3]),"frequent itemset could be generate with minimum support = 0.0002, and k = 3")

There are 2734 frequent itemset could be generate with minimum support = 0.0002, and k = 3


In [None]:
# k = 4
itemsets[4]

{('0', '1', '101', '21'): 309,
 ('0', '1', '101', '33'): 196,
 ('0', '1', '101', '34'): 200,
 ('0', '1', '101', '59'): 199,
 ('0', '1', '19', '21'): 152,
 ('0', '1', '19', '33'): 163,
 ('0', '1', '19', '59'): 145,
 ('0', '1', '21', '23'): 171,
 ('0', '1', '21', '33'): 495,
 ('0', '1', '21', '34'): 462,
 ('0', '1', '21', '4'): 153,
 ('0', '1', '21', '58'): 211,
 ('0', '1', '21', '59'): 552,
 ('0', '1', '21', '64'): 161,
 ('0', '1', '21', '66'): 134,
 ('0', '1', '21', '96'): 179,
 ('0', '1', '22', '33'): 154,
 ('0', '1', '23', '33'): 199,
 ('0', '1', '23', '59'): 215,
 ('0', '1', '33', '34'): 290,
 ('0', '1', '33', '58'): 314,
 ('0', '1', '33', '59'): 450,
 ('0', '1', '33', '96'): 143,
 ('0', '1', '34', '59'): 344,
 ('0', '1', '34', '64'): 146,
 ('0', '1', '4', '59'): 142,
 ('0', '1', '58', '59'): 199,
 ('0', '1', '59', '64'): 142,
 ('0', '1', '59', '96'): 154,
 ('0', '101', '33', '59'): 140,
 ('0', '21', '33', '34'): 166,
 ('0', '21', '33', '58'): 185,
 ('0', '21', '33', '59'): 288,
 ('

In [None]:
print("There are",len(itemsets[4]),"frequent itemset could be generate with minimum support = 0.0002, and k = 4")

There are 334 frequent itemset could be generate with minimum support = 0.0002, and k = 4


In [None]:
# k = 5
itemsets[5]

{('0', '1', '21', '33', '59'): 144,
 ('1', '21', '33', '58', '59'): 146,
 ('104', '143', '15', '24', '30'): 233,
 ('104', '143', '15', '30', '31'): 159,
 ('104', '15', '24', '30', '31'): 134,
 ('143', '15', '24', '30', '31'): 141}

In [None]:
print("There are",len(itemsets[5]),"frequent itemset could be generate with minimum support = 0.0002, and k = 5")

There are 6 frequent itemset could be generate with minimum support = 0.0002, and k = 5


### Under setting of minimum support = 0.0003, find all frequent itemset.

In [37]:
# Import time module for performance measurement
t1_start = process_time()
itemsets, rules = apriori(records, min_support = 0.0003, min_confidence = 0)
# Stop the stopwatch / counter
t1_stop = process_time()
 
print("Elapsed time:", t1_stop, t1_start)
print("Elapsed time during the whole program in seconds:", t1_stop-t1_start)

t3_time = t1_stop - t1_start

itemsets, _ = itemsets_from_transactions(records, min_support=0.0003)

Elapsed time: 113.46875 103.203125
Elapsed time during the whole program in seconds: 10.265625


In [38]:
#The whole itemset
itemsets

{1: {('0',): 48863,
  ('1',): 100762,
  ('3',): 42471,
  ('4',): 10876,
  ('5',): 4157,
  ('6',): 2314,
  ('7',): 5922,
  ('8',): 8424,
  ('9',): 533,
  ('10',): 493,
  ('11',): 9252,
  ('12',): 5279,
  ('13',): 21425,
  ('14',): 3032,
  ('15',): 38893,
  ('16',): 11175,
  ('17',): 27739,
  ('18',): 5556,
  ('19',): 24971,
  ('20',): 777,
  ('21',): 54513,
  ('22',): 19701,
  ('23',): 23770,
  ('24',): 22835,
  ('25',): 2528,
  ('26',): 276,
  ('27',): 4886,
  ('28',): 4633,
  ('29',): 392,
  ('30',): 14188,
  ('31',): 9921,
  ('32',): 7503,
  ('33',): 45942,
  ('34',): 22339,
  ('35',): 28139,
  ('36',): 10734,
  ('37',): 3565,
  ('38',): 6249,
  ('39',): 8126,
  ('40',): 2595,
  ('41',): 1013,
  ('42',): 12144,
  ('43',): 4261,
  ('44',): 8754,
  ('45',): 8616,
  ('46',): 456,
  ('47',): 1004,
  ('48',): 2775,
  ('49',): 8150,
  ('50',): 13448,
  ('51',): 5158,
  ('52',): 4772,
  ('53',): 4536,
  ('54',): 295,
  ('55',): 598,
  ('56',): 539,
  ('57',): 6474,
  ('58',): 17811,
  ('59'

In [39]:
# Print output for verification
print("There are at most",len(itemsets))

There are at most 5


In [None]:

x = len(itemsets[1]) + len(itemsets[2]) + len(itemsets[3]) + len(itemsets[4]) + len(itemsets[5])
print("There are in total",x,"itemset could be generate with minimum support = 0.0003, and k vary from 1 to 5")

There are in total 5077 itemset could be generate with minimum support = 0.0003, and k vary from 1 to 5


### Number of size k frequent patterns for each size k with at least one frequent pattern:

In [None]:
# k = 1
itemsets[1]

{('0',): 48863,
 ('1',): 100762,
 ('3',): 42471,
 ('4',): 10876,
 ('5',): 4157,
 ('6',): 2314,
 ('7',): 5922,
 ('8',): 8424,
 ('9',): 533,
 ('10',): 493,
 ('11',): 9252,
 ('12',): 5279,
 ('13',): 21425,
 ('14',): 3032,
 ('15',): 38893,
 ('16',): 11175,
 ('17',): 27739,
 ('18',): 5556,
 ('19',): 24971,
 ('20',): 777,
 ('21',): 54513,
 ('22',): 19701,
 ('23',): 23770,
 ('24',): 22835,
 ('25',): 2528,
 ('26',): 276,
 ('27',): 4886,
 ('28',): 4633,
 ('29',): 392,
 ('30',): 14188,
 ('31',): 9921,
 ('32',): 7503,
 ('33',): 45942,
 ('34',): 22339,
 ('35',): 28139,
 ('36',): 10734,
 ('37',): 3565,
 ('38',): 6249,
 ('39',): 8126,
 ('40',): 2595,
 ('41',): 1013,
 ('42',): 12144,
 ('43',): 4261,
 ('44',): 8754,
 ('45',): 8616,
 ('46',): 456,
 ('47',): 1004,
 ('48',): 2775,
 ('49',): 8150,
 ('50',): 13448,
 ('51',): 5158,
 ('52',): 4772,
 ('53',): 4536,
 ('54',): 295,
 ('55',): 598,
 ('56',): 539,
 ('57',): 6474,
 ('58',): 17811,
 ('59',): 66523,
 ('60',): 8439,
 ('61',): 2619,
 ('62',): 7199,
 ('

In [None]:
print("There are",len(itemsets[1]),"frequent itemset could be generate with minimum support = 0.0003, and k = 1")

There are 524 frequent itemset could be generate with minimum support = 0.0003, and k = 1


In [None]:
# k = 2
itemsets[2]

{('0', '1'): 9823,
 ('0', '101'): 4210,
 ('0', '102'): 325,
 ('0', '103'): 217,
 ('0', '104'): 552,
 ('0', '11'): 598,
 ('0', '110'): 1753,
 ('0', '111'): 444,
 ('0', '114'): 496,
 ('0', '117'): 1143,
 ('0', '12'): 209,
 ('0', '121'): 265,
 ('0', '122'): 1013,
 ('0', '123'): 225,
 ('0', '124'): 318,
 ('0', '13'): 2319,
 ('0', '134'): 578,
 ('0', '138'): 210,
 ('0', '140'): 441,
 ('0', '143'): 762,
 ('0', '144'): 400,
 ('0', '145'): 399,
 ('0', '146'): 820,
 ('0', '15'): 2949,
 ('0', '150'): 325,
 ('0', '151'): 359,
 ('0', '157'): 329,
 ('0', '158'): 255,
 ('0', '16'): 702,
 ('0', '160'): 366,
 ('0', '162'): 325,
 ('0', '163'): 247,
 ('0', '169'): 235,
 ('0', '17'): 1615,
 ('0', '170'): 572,
 ('0', '173'): 227,
 ('0', '174'): 344,
 ('0', '175'): 231,
 ('0', '176'): 223,
 ('0', '18'): 250,
 ('0', '180'): 336,
 ('0', '182'): 298,
 ('0', '185'): 268,
 ('0', '19'): 2910,
 ('0', '191'): 313,
 ('0', '194'): 613,
 ('0', '195'): 604,
 ('0', '206'): 227,
 ('0', '21'): 5441,
 ('0', '214'): 264,
 

In [None]:
print("There are",len(itemsets[2]),"frequent itemset could be generate with minimum support = 0.0003, and k = 2")

There are 3081 frequent itemset could be generate with minimum support = 0.0003, and k = 2


In [None]:
# k = 3
itemsets[3]

{('0', '1', '101'): 924,
 ('0', '1', '110'): 371,
 ('0', '1', '117'): 245,
 ('0', '1', '122'): 214,
 ('0', '1', '13'): 425,
 ('0', '1', '15'): 551,
 ('0', '1', '17'): 309,
 ('0', '1', '19'): 631,
 ('0', '1', '21'): 2548,
 ('0', '1', '22'): 502,
 ('0', '1', '23'): 676,
 ('0', '1', '24'): 225,
 ('0', '1', '27'): 203,
 ('0', '1', '3'): 441,
 ('0', '1', '33'): 1701,
 ('0', '1', '34'): 1449,
 ('0', '1', '35'): 517,
 ('0', '1', '38'): 204,
 ('0', '1', '4'): 466,
 ('0', '1', '42'): 210,
 ('0', '1', '44'): 294,
 ('0', '1', '45'): 216,
 ('0', '1', '50'): 218,
 ('0', '1', '58'): 685,
 ('0', '1', '59'): 2069,
 ('0', '1', '62'): 259,
 ('0', '1', '64'): 643,
 ('0', '1', '66'): 399,
 ('0', '1', '78'): 386,
 ('0', '1', '8'): 221,
 ('0', '1', '82'): 219,
 ('0', '1', '88'): 283,
 ('0', '1', '91'): 230,
 ('0', '1', '96'): 566,
 ('0', '101', '15'): 225,
 ('0', '101', '21'): 573,
 ('0', '101', '22'): 213,
 ('0', '101', '23'): 266,
 ('0', '101', '33'): 693,
 ('0', '101', '34'): 430,
 ('0', '101', '35'): 21

In [None]:
print("There are",len(itemsets[3]),"frequent itemset could be generate with minimum support = 0.0003, and k = 3")

There are 1342 frequent itemset could be generate with minimum support = 0.0003, and k = 3


In [None]:
# k = 4
itemsets[4]

{('0', '1', '101', '21'): 309,
 ('0', '1', '21', '33'): 495,
 ('0', '1', '21', '34'): 462,
 ('0', '1', '21', '58'): 211,
 ('0', '1', '21', '59'): 552,
 ('0', '1', '23', '59'): 215,
 ('0', '1', '33', '34'): 290,
 ('0', '1', '33', '58'): 314,
 ('0', '1', '33', '59'): 450,
 ('0', '1', '34', '59'): 344,
 ('0', '21', '33', '59'): 288,
 ('0', '33', '58', '59'): 241,
 ('1', '19', '21', '23'): 228,
 ('1', '19', '21', '33'): 223,
 ('1', '19', '21', '59'): 248,
 ('1', '19', '23', '33'): 241,
 ('1', '19', '23', '59'): 265,
 ('1', '19', '33', '59'): 268,
 ('1', '21', '22', '23'): 212,
 ('1', '21', '22', '33'): 212,
 ('1', '21', '22', '59'): 228,
 ('1', '21', '23', '33'): 331,
 ('1', '21', '23', '59'): 348,
 ('1', '21', '33', '34'): 463,
 ('1', '21', '33', '58'): 497,
 ('1', '21', '33', '59'): 685,
 ('1', '21', '33', '66'): 214,
 ('1', '21', '33', '96'): 251,
 ('1', '21', '34', '4'): 202,
 ('1', '21', '34', '59'): 588,
 ('1', '21', '34', '64'): 343,
 ('1', '21', '34', '88'): 209,
 ('1', '21', '34',

In [None]:
print("There are",len(itemsets[4]),"frequent itemset could be generate with minimum support = 0.0003, and k = 4")

There are 129 frequent itemset could be generate with minimum support = 0.0003, and k = 4


In [None]:
# k = 5
itemsets[5]

{('104', '143', '15', '24', '30'): 233}

In [None]:
print("There are",len(itemsets[5]),"frequent itemset could be generate with minimum support = 0.0003, and k = 5")

There are 1 frequent itemset could be generate with minimum support = 0.0003, and k = 5


### Under setting of minimum support = 0.0004, find all frequent itemset.

In [51]:
# Import time module for performance measurement
t1_start = process_time()
itemsets, rules = apriori(records, min_support = 0.0004, min_confidence = 0)
# Stop the stopwatch / counter
t1_stop = process_time()
 
print("Elapsed time:", t1_stop, t1_start)
 
print("Elapsed time during the whole program in seconds:", t1_stop-t1_start)

t4_time = t1_stop - t1_start

itemsets, _ = itemsets_from_transactions(records, min_support=0.0004)

Elapsed time: 130.59375 122.390625
Elapsed time during the whole program in seconds: 8.203125


In [52]:
#The whole itemset
itemsets

{1: {('0',): 48863,
  ('1',): 100762,
  ('3',): 42471,
  ('4',): 10876,
  ('5',): 4157,
  ('6',): 2314,
  ('7',): 5922,
  ('8',): 8424,
  ('9',): 533,
  ('10',): 493,
  ('11',): 9252,
  ('12',): 5279,
  ('13',): 21425,
  ('14',): 3032,
  ('15',): 38893,
  ('16',): 11175,
  ('17',): 27739,
  ('18',): 5556,
  ('19',): 24971,
  ('20',): 777,
  ('21',): 54513,
  ('22',): 19701,
  ('23',): 23770,
  ('24',): 22835,
  ('25',): 2528,
  ('26',): 276,
  ('27',): 4886,
  ('28',): 4633,
  ('29',): 392,
  ('30',): 14188,
  ('31',): 9921,
  ('32',): 7503,
  ('33',): 45942,
  ('34',): 22339,
  ('35',): 28139,
  ('36',): 10734,
  ('37',): 3565,
  ('38',): 6249,
  ('39',): 8126,
  ('40',): 2595,
  ('41',): 1013,
  ('42',): 12144,
  ('43',): 4261,
  ('44',): 8754,
  ('45',): 8616,
  ('46',): 456,
  ('47',): 1004,
  ('48',): 2775,
  ('49',): 8150,
  ('50',): 13448,
  ('51',): 5158,
  ('52',): 4772,
  ('53',): 4536,
  ('54',): 295,
  ('55',): 598,
  ('56',): 539,
  ('57',): 6474,
  ('58',): 17811,
  ('59'

In [53]:
# Print output for verification
print("There are at most",len(itemsets))

There are at most 4


In [None]:
x = len(itemsets[1]) + len(itemsets[2]) + len(itemsets[3]) + len(itemsets[4])
print("There are in total",x,"itemset could be generate with minimum support = 0.0004, and k vary from 1 to 4")

There are in total 3556 itemset could be generate with minimum support = 0.0004, and k vary from 1 to 4


### Number of size k frequent patterns for each size k with at least one frequent pattern:

In [None]:
# k = 1
itemsets[1]

{('0',): 48863,
 ('1',): 100762,
 ('3',): 42471,
 ('4',): 10876,
 ('5',): 4157,
 ('6',): 2314,
 ('7',): 5922,
 ('8',): 8424,
 ('9',): 533,
 ('10',): 493,
 ('11',): 9252,
 ('12',): 5279,
 ('13',): 21425,
 ('14',): 3032,
 ('15',): 38893,
 ('16',): 11175,
 ('17',): 27739,
 ('18',): 5556,
 ('19',): 24971,
 ('20',): 777,
 ('21',): 54513,
 ('22',): 19701,
 ('23',): 23770,
 ('24',): 22835,
 ('25',): 2528,
 ('26',): 276,
 ('27',): 4886,
 ('28',): 4633,
 ('29',): 392,
 ('30',): 14188,
 ('31',): 9921,
 ('32',): 7503,
 ('33',): 45942,
 ('34',): 22339,
 ('35',): 28139,
 ('36',): 10734,
 ('37',): 3565,
 ('38',): 6249,
 ('39',): 8126,
 ('40',): 2595,
 ('41',): 1013,
 ('42',): 12144,
 ('43',): 4261,
 ('44',): 8754,
 ('45',): 8616,
 ('46',): 456,
 ('47',): 1004,
 ('48',): 2775,
 ('49',): 8150,
 ('50',): 13448,
 ('51',): 5158,
 ('52',): 4772,
 ('53',): 4536,
 ('54',): 295,
 ('55',): 598,
 ('56',): 539,
 ('57',): 6474,
 ('58',): 17811,
 ('59',): 66523,
 ('60',): 8439,
 ('61',): 2619,
 ('62',): 7199,
 ('

In [None]:
print("There are",len(itemsets[1]),"frequent itemset could be generate with minimum support = 0.0004, and k = 1")

There are 470 frequent itemset could be generate with minimum support = 0.0004, and k = 1


In [None]:
# k = 2
itemsets[2]

{('0', '1'): 9823,
 ('0', '101'): 4210,
 ('0', '102'): 325,
 ('0', '104'): 552,
 ('0', '11'): 598,
 ('0', '110'): 1753,
 ('0', '111'): 444,
 ('0', '114'): 496,
 ('0', '117'): 1143,
 ('0', '122'): 1013,
 ('0', '124'): 318,
 ('0', '13'): 2319,
 ('0', '134'): 578,
 ('0', '140'): 441,
 ('0', '143'): 762,
 ('0', '144'): 400,
 ('0', '145'): 399,
 ('0', '146'): 820,
 ('0', '15'): 2949,
 ('0', '150'): 325,
 ('0', '151'): 359,
 ('0', '157'): 329,
 ('0', '16'): 702,
 ('0', '160'): 366,
 ('0', '162'): 325,
 ('0', '17'): 1615,
 ('0', '170'): 572,
 ('0', '174'): 344,
 ('0', '180'): 336,
 ('0', '182'): 298,
 ('0', '185'): 268,
 ('0', '19'): 2910,
 ('0', '191'): 313,
 ('0', '194'): 613,
 ('0', '195'): 604,
 ('0', '21'): 5441,
 ('0', '218'): 323,
 ('0', '22'): 1860,
 ('0', '220'): 396,
 ('0', '23'): 2744,
 ('0', '234'): 699,
 ('0', '24'): 1237,
 ('0', '243'): 465,
 ('0', '248'): 324,
 ('0', '260'): 450,
 ('0', '27'): 746,
 ('0', '28'): 514,
 ('0', '295'): 287,
 ('0', '3'): 2205,
 ('0', '30'): 718,
 ('

In [None]:
print("There are",len(itemsets[2]),"frequent itemset could be generate with minimum support = 0.0004, and k = 2")

There are 2237 frequent itemset could be generate with minimum support = 0.0004, and k = 2


In [None]:
# k = 3
itemsets[3]

{('0', '1', '101'): 924,
 ('0', '1', '110'): 371,
 ('0', '1', '13'): 425,
 ('0', '1', '15'): 551,
 ('0', '1', '17'): 309,
 ('0', '1', '19'): 631,
 ('0', '1', '21'): 2548,
 ('0', '1', '22'): 502,
 ('0', '1', '23'): 676,
 ('0', '1', '3'): 441,
 ('0', '1', '33'): 1701,
 ('0', '1', '34'): 1449,
 ('0', '1', '35'): 517,
 ('0', '1', '4'): 466,
 ('0', '1', '44'): 294,
 ('0', '1', '58'): 685,
 ('0', '1', '59'): 2069,
 ('0', '1', '64'): 643,
 ('0', '1', '66'): 399,
 ('0', '1', '78'): 386,
 ('0', '1', '88'): 283,
 ('0', '1', '96'): 566,
 ('0', '101', '21'): 573,
 ('0', '101', '33'): 693,
 ('0', '101', '34'): 430,
 ('0', '101', '58'): 308,
 ('0', '101', '59'): 602,
 ('0', '110', '59'): 282,
 ('0', '117', '35'): 348,
 ('0', '13', '15'): 420,
 ('0', '13', '33'): 324,
 ('0', '13', '59'): 448,
 ('0', '15', '17'): 319,
 ('0', '15', '24'): 334,
 ('0', '15', '33'): 390,
 ('0', '15', '35'): 341,
 ('0', '15', '59'): 514,
 ('0', '19', '21'): 302,
 ('0', '19', '23'): 496,
 ('0', '19', '33'): 587,
 ('0', '19'

In [None]:
print("There are",len(itemsets[3]),"frequent itemset could be generate with minimum support = 0.0004, and k = 3")

There are 800 frequent itemset could be generate with minimum support = 0.0004, and k = 3


In [None]:
# k = 4
itemsets[4]

{('0', '1', '101', '21'): 309,
 ('0', '1', '21', '33'): 495,
 ('0', '1', '21', '34'): 462,
 ('0', '1', '21', '59'): 552,
 ('0', '1', '33', '34'): 290,
 ('0', '1', '33', '58'): 314,
 ('0', '1', '33', '59'): 450,
 ('0', '1', '34', '59'): 344,
 ('0', '21', '33', '59'): 288,
 ('1', '19', '33', '59'): 268,
 ('1', '21', '23', '33'): 331,
 ('1', '21', '23', '59'): 348,
 ('1', '21', '33', '34'): 463,
 ('1', '21', '33', '58'): 497,
 ('1', '21', '33', '59'): 685,
 ('1', '21', '34', '59'): 588,
 ('1', '21', '34', '64'): 343,
 ('1', '21', '4', '59'): 320,
 ('1', '21', '58', '59'): 277,
 ('1', '21', '59', '66'): 373,
 ('1', '21', '59', '96'): 309,
 ('1', '23', '33', '59'): 418,
 ('1', '33', '34', '59'): 348,
 ('1', '33', '58', '59'): 444,
 ('104', '143', '15', '24'): 461,
 ('104', '143', '15', '30'): 550,
 ('104', '143', '15', '31'): 287,
 ('104', '143', '24', '30'): 441,
 ('104', '143', '30', '31'): 322,
 ('104', '15', '17', '24'): 299,
 ('104', '15', '24', '30'): 670,
 ('104', '15', '30', '31'): 

In [None]:
print("There are",len(itemsets[4]),"frequent itemset could be generate with minimum support = 0.0004, and k = 4")

There are 49 frequent itemset could be generate with minimum support = 0.0004, and k = 4


In [None]:
# k = 5
#There are no association rule could be generate with minimum support = 0.0004, and k = 5

### Under setting of minimum support = 0.0005, find all frequent itemset.

In [64]:
# Import time module for performance measurement
t1_start = process_time()
itemsets, rules = apriori(records, min_support = 0.0005, min_confidence = 0)
# Stop the stopwatch / counter
t1_stop = process_time()
 
print("Elapsed time:", t1_stop, t1_start)
 
print("Elapsed time during the whole program in seconds:", t1_stop-t1_start)

t5_time = t1_stop - t1_start

itemsets, _ = itemsets_from_transactions(records, min_support=0.0005)

Elapsed time: 144.96875 137.78125
Elapsed time during the whole program in seconds: 7.1875


In [65]:
#The whole itemset
itemsets

{1: {('0',): 48863,
  ('1',): 100762,
  ('3',): 42471,
  ('4',): 10876,
  ('5',): 4157,
  ('6',): 2314,
  ('7',): 5922,
  ('8',): 8424,
  ('9',): 533,
  ('10',): 493,
  ('11',): 9252,
  ('12',): 5279,
  ('13',): 21425,
  ('14',): 3032,
  ('15',): 38893,
  ('16',): 11175,
  ('17',): 27739,
  ('18',): 5556,
  ('19',): 24971,
  ('20',): 777,
  ('21',): 54513,
  ('22',): 19701,
  ('23',): 23770,
  ('24',): 22835,
  ('25',): 2528,
  ('27',): 4886,
  ('28',): 4633,
  ('29',): 392,
  ('30',): 14188,
  ('31',): 9921,
  ('32',): 7503,
  ('33',): 45942,
  ('34',): 22339,
  ('35',): 28139,
  ('36',): 10734,
  ('37',): 3565,
  ('38',): 6249,
  ('39',): 8126,
  ('40',): 2595,
  ('41',): 1013,
  ('42',): 12144,
  ('43',): 4261,
  ('44',): 8754,
  ('45',): 8616,
  ('46',): 456,
  ('47',): 1004,
  ('48',): 2775,
  ('49',): 8150,
  ('50',): 13448,
  ('51',): 5158,
  ('52',): 4772,
  ('53',): 4536,
  ('55',): 598,
  ('56',): 539,
  ('57',): 6474,
  ('58',): 17811,
  ('59',): 66523,
  ('60',): 8439,
  ('

In [66]:
# Print output for verification
print("There are at most",len(itemsets))

There are at most 4


In [None]:
x = len(itemsets[1]) + len(itemsets[2]) + len(itemsets[3]) + len(itemsets[4])
print("There are in total",x,"itemset could be generate with minimum support = 0.0005, and k vary from 1 to 4")

There are in total 2695 itemset could be generate with minimum support = 0.0005, and k vary from 1 to 4


### Number of size k frequent patterns for each size k with at least one frequent pattern:

In [None]:
# k = 1
itemsets[1]

{('0',): 48863,
 ('1',): 100762,
 ('3',): 42471,
 ('4',): 10876,
 ('5',): 4157,
 ('6',): 2314,
 ('7',): 5922,
 ('8',): 8424,
 ('9',): 533,
 ('10',): 493,
 ('11',): 9252,
 ('12',): 5279,
 ('13',): 21425,
 ('14',): 3032,
 ('15',): 38893,
 ('16',): 11175,
 ('17',): 27739,
 ('18',): 5556,
 ('19',): 24971,
 ('20',): 777,
 ('21',): 54513,
 ('22',): 19701,
 ('23',): 23770,
 ('24',): 22835,
 ('25',): 2528,
 ('27',): 4886,
 ('28',): 4633,
 ('29',): 392,
 ('30',): 14188,
 ('31',): 9921,
 ('32',): 7503,
 ('33',): 45942,
 ('34',): 22339,
 ('35',): 28139,
 ('36',): 10734,
 ('37',): 3565,
 ('38',): 6249,
 ('39',): 8126,
 ('40',): 2595,
 ('41',): 1013,
 ('42',): 12144,
 ('43',): 4261,
 ('44',): 8754,
 ('45',): 8616,
 ('46',): 456,
 ('47',): 1004,
 ('48',): 2775,
 ('49',): 8150,
 ('50',): 13448,
 ('51',): 5158,
 ('52',): 4772,
 ('53',): 4536,
 ('55',): 598,
 ('56',): 539,
 ('57',): 6474,
 ('58',): 17811,
 ('59',): 66523,
 ('60',): 8439,
 ('61',): 2619,
 ('62',): 7199,
 ('63',): 9812,
 ('64',): 15804,


In [None]:
print("There are",len(itemsets[1]),"frequent itemset could be generate with minimum support = 0.0005, and k = 1 ")

There are 437 frequent itemset could be generate with minimum support = 0.0005, and k = 1 


In [None]:
# k = 2
itemsets[2]

{('0', '1'): 9823,
 ('0', '101'): 4210,
 ('0', '104'): 552,
 ('0', '11'): 598,
 ('0', '110'): 1753,
 ('0', '111'): 444,
 ('0', '114'): 496,
 ('0', '117'): 1143,
 ('0', '122'): 1013,
 ('0', '13'): 2319,
 ('0', '134'): 578,
 ('0', '140'): 441,
 ('0', '143'): 762,
 ('0', '144'): 400,
 ('0', '145'): 399,
 ('0', '146'): 820,
 ('0', '15'): 2949,
 ('0', '151'): 359,
 ('0', '16'): 702,
 ('0', '160'): 366,
 ('0', '17'): 1615,
 ('0', '170'): 572,
 ('0', '174'): 344,
 ('0', '180'): 336,
 ('0', '19'): 2910,
 ('0', '194'): 613,
 ('0', '195'): 604,
 ('0', '21'): 5441,
 ('0', '22'): 1860,
 ('0', '220'): 396,
 ('0', '23'): 2744,
 ('0', '234'): 699,
 ('0', '24'): 1237,
 ('0', '243'): 465,
 ('0', '260'): 450,
 ('0', '27'): 746,
 ('0', '28'): 514,
 ('0', '3'): 2205,
 ('0', '30'): 718,
 ('0', '31'): 524,
 ('0', '316'): 559,
 ('0', '32'): 731,
 ('0', '33'): 6331,
 ('0', '34'): 3563,
 ('0', '35'): 2400,
 ('0', '36'): 967,
 ('0', '38'): 809,
 ('0', '39'): 584,
 ('0', '4'): 1627,
 ('0', '42'): 1185,
 ('0', '4

In [None]:
print("There are",len(itemsets[2]),"frequent itemset could be generate with minimum support = 0.0005, and k = 2 ")

There are 1727 frequent itemset could be generate with minimum support = 0.0005, and k = 2 


In [None]:
# k = 3
itemsets[3]

{('0', '1', '101'): 924,
 ('0', '1', '110'): 371,
 ('0', '1', '13'): 425,
 ('0', '1', '15'): 551,
 ('0', '1', '19'): 631,
 ('0', '1', '21'): 2548,
 ('0', '1', '22'): 502,
 ('0', '1', '23'): 676,
 ('0', '1', '3'): 441,
 ('0', '1', '33'): 1701,
 ('0', '1', '34'): 1449,
 ('0', '1', '35'): 517,
 ('0', '1', '4'): 466,
 ('0', '1', '58'): 685,
 ('0', '1', '59'): 2069,
 ('0', '1', '64'): 643,
 ('0', '1', '66'): 399,
 ('0', '1', '78'): 386,
 ('0', '1', '96'): 566,
 ('0', '101', '21'): 573,
 ('0', '101', '33'): 693,
 ('0', '101', '34'): 430,
 ('0', '101', '59'): 602,
 ('0', '117', '35'): 348,
 ('0', '13', '15'): 420,
 ('0', '13', '59'): 448,
 ('0', '15', '33'): 390,
 ('0', '15', '35'): 341,
 ('0', '15', '59'): 514,
 ('0', '19', '23'): 496,
 ('0', '19', '33'): 587,
 ('0', '19', '59'): 561,
 ('0', '21', '23'): 337,
 ('0', '21', '33'): 999,
 ('0', '21', '34'): 733,
 ('0', '21', '58'): 400,
 ('0', '21', '59'): 1135,
 ('0', '21', '96'): 352,
 ('0', '22', '23'): 372,
 ('0', '22', '33'): 473,
 ('0', '2

In [None]:
print("There are",len(itemsets[3]),"frequent itemset could be generate with minimum support = 0.0005, and k = 3 ")

There are 505 frequent itemset could be generate with minimum support = 0.0005, and k = 3 


In [None]:
# k = 4
itemsets[4]

{('0', '1', '21', '33'): 495,
 ('0', '1', '21', '34'): 462,
 ('0', '1', '21', '59'): 552,
 ('0', '1', '33', '59'): 450,
 ('0', '1', '34', '59'): 344,
 ('1', '21', '23', '59'): 348,
 ('1', '21', '33', '34'): 463,
 ('1', '21', '33', '58'): 497,
 ('1', '21', '33', '59'): 685,
 ('1', '21', '34', '59'): 588,
 ('1', '21', '34', '64'): 343,
 ('1', '21', '59', '66'): 373,
 ('1', '23', '33', '59'): 418,
 ('1', '33', '34', '59'): 348,
 ('1', '33', '58', '59'): 444,
 ('104', '143', '15', '24'): 461,
 ('104', '143', '15', '30'): 550,
 ('104', '143', '24', '30'): 441,
 ('104', '15', '24', '30'): 670,
 ('104', '15', '30', '31'): 373,
 ('13', '15', '17', '24'): 361,
 ('143', '15', '17', '24'): 349,
 ('143', '15', '24', '30'): 599,
 ('143', '15', '30', '31'): 374,
 ('15', '17', '24', '30'): 357,
 ('15', '24', '30', '95'): 376}

In [None]:
print("There are",len(itemsets[4]),"frequent itemset could be generate with minimum support = 0.0005, and k = 4")

There are 26 frequent itemset could be generate with minimum support = 0.0005, and k = 4


In [None]:
# k = 5
#There are no association rule could be generate with minimum support = 0.0005, and k = 5

### Total time used to find all frequent itemset before using acceleration techniques in seconds:

In [None]:
before_op = t1_time + t2_time + t3_time + t4_time + t5_time 
before_op

66.109375

#### ***"Acceleration method"***

### Notice that there are some transactions with only one item. Since we want to focus on finding frequently bought together items, these transactions will be dropped.

In [None]:
df3 = (df2.dropna(subset = ['B'], how = 'all'))
df3

Unnamed: 0,A,B,C,D,E,F,G,H,I,J,K,L,M,N,P
0,0,1,2,,,,,,,,,,,,
2,4,3,,,,,,,,,,,,,
3,5,6,7,8,0,9,10,,,,,,,,
4,11,12,,,,,,,,,,,,,
5,13,14,15,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
668316,796,851,,,,,,,,,,,,,
668321,849,823,,,,,,,,,,,,,
668322,768,764,,,,,,,,,,,,,
668326,769,779,,,,,,,,,,,,,


In [79]:
# Print output for verification
print("There are ",len(df2)-len(df3),"of rows (transactions) dropped")

There are  270831 of rows (transactions) dropped


In [None]:
records = []
for i in range(0,len(df3)):
    records.append(
        [
        str(df2.values[i,j])
        for j in range(0,len(df3.columns))
        ]
    )

In [81]:
for i, j in enumerate(records):
    while "None" in records[i]:
        records[i].remove("None")

### Under setting of min_support = 0.0001, find all frequent itemset.

In [82]:
# Import time module for performance measurement
t1_start = process_time()
itemsets, rules = apriori(records, min_support = 0.0001, min_confidence = 0)
# Stop the stopwatch / counter
t1_stop = process_time()
 
print("Elapsed time:", t1_stop, t1_start) 
print("Elapsed time during the whole program in seconds:", t1_stop - t1_start)

t6_time = t1_stop-t1_start

itemsets, _ = itemsets_from_transactions(records, min_support = 0.0001)

Elapsed time: 182.671875 167.796875
Elapsed time during the whole program in seconds: 14.875


In [83]:
itemsets

{1: {('0',): 30006,
  ('1',): 69336,
  ('2',): 95,
  ('3',): 27018,
  ('4',): 6952,
  ('5',): 2313,
  ('6',): 1104,
  ('7',): 3762,
  ('8',): 5476,
  ('9',): 289,
  ('10',): 247,
  ('11',): 4821,
  ('12',): 3145,
  ('13',): 10718,
  ('14',): 1949,
  ('15',): 20036,
  ('16',): 6947,
  ('17',): 15049,
  ('18',): 2865,
  ('19',): 16370,
  ('20',): 416,
  ('21',): 38397,
  ('22',): 13020,
  ('23',): 15124,
  ('24',): 11523,
  ('25',): 1246,
  ('26',): 187,
  ('27',): 3111,
  ('28',): 2885,
  ('29',): 385,
  ('30',): 7992,
  ('31',): 5157,
  ('32',): 3703,
  ('33',): 27459,
  ('34',): 13047,
  ('35',): 13963,
  ('36',): 6066,
  ('37',): 2211,
  ('38',): 3843,
  ('39',): 4103,
  ('40',): 1311,
  ('41',): 499,
  ('42',): 5983,
  ('43',): 2300,
  ('44',): 5742,
  ('45',): 5622,
  ('46',): 140,
  ('47',): 713,
  ('48',): 1360,
  ('49',): 4080,
  ('50',): 5790,
  ('51',): 2517,
  ('52',): 2377,
  ('53',): 2768,
  ('54',): 166,
  ('55',): 326,
  ('56',): 284,
  ('57',): 3156,
  ('58',): 10006,
  

In [None]:
x = len(itemsets[1]) + len(itemsets[2]) + len(itemsets[3]) + len(itemsets[4]) + len(itemsets[5]) + len(itemsets[6])
print("There are in total",x,"itemset could be generate with minimum support = 0.0001, and k vary from 1 to 6")

There are in total 19224 itemset could be generate with minimum support = 0.0001, and k vary from 1 to 6


### Under setting of min_support = 0.0002, find all frequent itemset.

In [85]:
# Import time module for performance measurement
t1_start = process_time()
itemsets, rules = apriori(records, min_support = 0.0002, min_confidence = 0)
# Stop the stopwatch / counter
t1_stop = process_time()
 
print("Elapsed time:", t1_stop, t1_start) 
print("Elapsed time during the whole program in seconds:", t1_stop - t1_start)

t7_time = t1_stop-t1_start

itemsets, _ = itemsets_from_transactions(records, min_support = 0.0002)

Elapsed time: 202.640625 194.1875
Elapsed time during the whole program in seconds: 8.453125


In [86]:
itemsets

{1: {('0',): 30006,
  ('1',): 69336,
  ('2',): 95,
  ('3',): 27018,
  ('4',): 6952,
  ('5',): 2313,
  ('6',): 1104,
  ('7',): 3762,
  ('8',): 5476,
  ('9',): 289,
  ('10',): 247,
  ('11',): 4821,
  ('12',): 3145,
  ('13',): 10718,
  ('14',): 1949,
  ('15',): 20036,
  ('16',): 6947,
  ('17',): 15049,
  ('18',): 2865,
  ('19',): 16370,
  ('20',): 416,
  ('21',): 38397,
  ('22',): 13020,
  ('23',): 15124,
  ('24',): 11523,
  ('25',): 1246,
  ('26',): 187,
  ('27',): 3111,
  ('28',): 2885,
  ('29',): 385,
  ('30',): 7992,
  ('31',): 5157,
  ('32',): 3703,
  ('33',): 27459,
  ('34',): 13047,
  ('35',): 13963,
  ('36',): 6066,
  ('37',): 2211,
  ('38',): 3843,
  ('39',): 4103,
  ('40',): 1311,
  ('41',): 499,
  ('42',): 5983,
  ('43',): 2300,
  ('44',): 5742,
  ('45',): 5622,
  ('46',): 140,
  ('47',): 713,
  ('48',): 1360,
  ('49',): 4080,
  ('50',): 5790,
  ('51',): 2517,
  ('52',): 2377,
  ('53',): 2768,
  ('54',): 166,
  ('55',): 326,
  ('56',): 284,
  ('57',): 3156,
  ('58',): 10006,
  

In [None]:
x = len(itemsets[1]) + len(itemsets[2]) + len(itemsets[3]) + len(itemsets[4]) + len(itemsets[5])
print("There are in total",x,"itemset could be generate with minimum support = 0.0002, and k vary from 1 to 5")

There are in total 8328 itemset could be generate with minimum support = 0.0002, and k vary from 1 to 5


### Under setting of min_support = 0.0003, find all frequent itemset.

In [88]:
# Import time module for performance measurement
t1_start = process_time()
itemsets, rules = apriori(records, min_support = 0.0003, min_confidence = 0)
# Stop the stopwatch / counter
t1_stop = process_time()
 
print("Elapsed time:", t1_stop, t1_start) 
print("Elapsed time during the whole program in seconds:", t1_stop - t1_start)

t8_time = t1_stop-t1_start

itemsets, _ = itemsets_from_transactions(records, min_support = 0.0003)

Elapsed time: 215.125 209.328125
Elapsed time during the whole program in seconds: 5.796875


In [89]:
itemsets

{1: {('0',): 30006,
  ('1',): 69336,
  ('3',): 27018,
  ('4',): 6952,
  ('5',): 2313,
  ('6',): 1104,
  ('7',): 3762,
  ('8',): 5476,
  ('9',): 289,
  ('10',): 247,
  ('11',): 4821,
  ('12',): 3145,
  ('13',): 10718,
  ('14',): 1949,
  ('15',): 20036,
  ('16',): 6947,
  ('17',): 15049,
  ('18',): 2865,
  ('19',): 16370,
  ('20',): 416,
  ('21',): 38397,
  ('22',): 13020,
  ('23',): 15124,
  ('24',): 11523,
  ('25',): 1246,
  ('26',): 187,
  ('27',): 3111,
  ('28',): 2885,
  ('29',): 385,
  ('30',): 7992,
  ('31',): 5157,
  ('32',): 3703,
  ('33',): 27459,
  ('34',): 13047,
  ('35',): 13963,
  ('36',): 6066,
  ('37',): 2211,
  ('38',): 3843,
  ('39',): 4103,
  ('40',): 1311,
  ('41',): 499,
  ('42',): 5983,
  ('43',): 2300,
  ('44',): 5742,
  ('45',): 5622,
  ('46',): 140,
  ('47',): 713,
  ('48',): 1360,
  ('49',): 4080,
  ('50',): 5790,
  ('51',): 2517,
  ('52',): 2377,
  ('53',): 2768,
  ('54',): 166,
  ('55',): 326,
  ('56',): 284,
  ('57',): 3156,
  ('58',): 10006,
  ('59',): 42785

In [None]:
x = len(itemsets[1]) + len(itemsets[2]) + len(itemsets[3]) + len(itemsets[4]) + len(itemsets[5])
print("There are in total",x,"itemset could be generate with minimum support = 0.0003, and k vary from 1 to 5")

There are in total 5186 itemset could be generate with minimum support = 0.0003, and k vary from 1 to 5


### Under setting of min_support = 0.0004, find all frequent itemset.

In [91]:
# Import time module for performance measurement
t1_start = process_time()
itemsets, rules = apriori(records, min_support = 0.0004, min_confidence = 0)
# Stop the stopwatch / counter
t1_stop = process_time()
 
print("Elapsed time:", t1_stop, t1_start) 
print("Elapsed time during the whole program in seconds:", t1_stop - t1_start)

t9_time = t1_stop-t1_start

itemsets, _ = itemsets_from_transactions(records, min_support = 0.0004)

Elapsed time: 224.8125 220.0
Elapsed time during the whole program in seconds: 4.8125


In [92]:
itemsets

{1: {('0',): 30006,
  ('1',): 69336,
  ('3',): 27018,
  ('4',): 6952,
  ('5',): 2313,
  ('6',): 1104,
  ('7',): 3762,
  ('8',): 5476,
  ('9',): 289,
  ('10',): 247,
  ('11',): 4821,
  ('12',): 3145,
  ('13',): 10718,
  ('14',): 1949,
  ('15',): 20036,
  ('16',): 6947,
  ('17',): 15049,
  ('18',): 2865,
  ('19',): 16370,
  ('20',): 416,
  ('21',): 38397,
  ('22',): 13020,
  ('23',): 15124,
  ('24',): 11523,
  ('25',): 1246,
  ('26',): 187,
  ('27',): 3111,
  ('28',): 2885,
  ('29',): 385,
  ('30',): 7992,
  ('31',): 5157,
  ('32',): 3703,
  ('33',): 27459,
  ('34',): 13047,
  ('35',): 13963,
  ('36',): 6066,
  ('37',): 2211,
  ('38',): 3843,
  ('39',): 4103,
  ('40',): 1311,
  ('41',): 499,
  ('42',): 5983,
  ('43',): 2300,
  ('44',): 5742,
  ('45',): 5622,
  ('47',): 713,
  ('48',): 1360,
  ('49',): 4080,
  ('50',): 5790,
  ('51',): 2517,
  ('52',): 2377,
  ('53',): 2768,
  ('54',): 166,
  ('55',): 326,
  ('56',): 284,
  ('57',): 3156,
  ('58',): 10006,
  ('59',): 42785,
  ('60',): 439

In [None]:
x = len(itemsets[1]) + len(itemsets[2]) + len(itemsets[3]) + len(itemsets[4])
print("There are in total",x,"itemset could be generate with minimum support = 0.0004, and k vary from 1 to 4")

There are in total 3632 itemset could be generate with minimum support = 0.0004, and k vary from 1 to 4


### Under setting of min_support = 0.0005, find all frequent itemset.

In [94]:
# Import time module for performance measurement
t1_start = process_time()
itemsets, rules = apriori(records, min_support = 0.0005, min_confidence = 0)
# Stop the stopwatch / counter
t1_stop = process_time()
 
print("Elapsed time:", t1_stop, t1_start) 
print("Elapsed time during the whole program in seconds:", t1_stop - t1_start)

t10_time = t1_stop-t1_start

itemsets, _ = itemsets_from_transactions(records, min_support = 0.0005)

Elapsed time: 232.734375 228.78125
Elapsed time during the whole program in seconds: 3.953125


In [95]:
itemsets

{1: {('0',): 30006,
  ('1',): 69336,
  ('3',): 27018,
  ('4',): 6952,
  ('5',): 2313,
  ('6',): 1104,
  ('7',): 3762,
  ('8',): 5476,
  ('9',): 289,
  ('10',): 247,
  ('11',): 4821,
  ('12',): 3145,
  ('13',): 10718,
  ('14',): 1949,
  ('15',): 20036,
  ('16',): 6947,
  ('17',): 15049,
  ('18',): 2865,
  ('19',): 16370,
  ('20',): 416,
  ('21',): 38397,
  ('22',): 13020,
  ('23',): 15124,
  ('24',): 11523,
  ('25',): 1246,
  ('27',): 3111,
  ('28',): 2885,
  ('29',): 385,
  ('30',): 7992,
  ('31',): 5157,
  ('32',): 3703,
  ('33',): 27459,
  ('34',): 13047,
  ('35',): 13963,
  ('36',): 6066,
  ('37',): 2211,
  ('38',): 3843,
  ('39',): 4103,
  ('40',): 1311,
  ('41',): 499,
  ('42',): 5983,
  ('43',): 2300,
  ('44',): 5742,
  ('45',): 5622,
  ('47',): 713,
  ('48',): 1360,
  ('49',): 4080,
  ('50',): 5790,
  ('51',): 2517,
  ('52',): 2377,
  ('53',): 2768,
  ('55',): 326,
  ('56',): 284,
  ('57',): 3156,
  ('58',): 10006,
  ('59',): 42785,
  ('60',): 4399,
  ('61',): 1783,
  ('62',): 4

In [None]:
x = len(itemsets[1]) + len(itemsets[2]) + len(itemsets[3]) + len(itemsets[4])
print("There are in total",x,"itemset could be generate with minimum support = 0.0005, and k vary from 1 to 4")

There are in total 2749 itemset could be generate with minimum support = 0.0005, and k vary from 1 to 4


### Total time used to find all frequent itemset after optimisation in seconds:

In [None]:
after_op = t6_time + t7_time + t8_time + t9_time + t10_time 
after_op

37.890625

### Time difference in seconds:

In [None]:
#Total time changed
time_changed = before_op - after_op
time_changed

28.21875

In [None]:
#Setting minSupport 0.0001
first = t1_time - t6_time
first

11.296875

In [None]:
#Setting minSupport 0.0002
second = t2_time - t7_time
second

5.828125

In [None]:
#Setting minSupport 0.0003
third = t3_time - t8_time
third

4.46875

In [None]:
#Setting minSupport 0.0004
fourth = t4_time - t9_time
fourth

3.390625

In [None]:
#Setting minSupport 0.0005
fifth = t5_time - t10_time
fifth

3.234375

### Performance Comparison: Original vs. Accelerated Apriori Algorithm

Before using acceleration techniques, it took **66.1 seconds** to find all frequent patterns.

After implementing acceleration techniques, the execution time was reduced to **37.9 seconds**, making it **28.2 seconds faster** than the original approach while also identifying **more** frequent patterns.

#### Execution Time Improvement at Different Minimum Support Levels:
- **minSupport = 0.0001** → **11.3 seconds faster** than the original.
- **minSupport = 0.0002** → **5.8 seconds faster** than the original.
- **minSupport = 0.0003** → **4.5 seconds faster** than the original.
- **minSupport = 0.0004** → **3.4 seconds faster** than the original.
- **minSupport = 0.0005** → **3.2 seconds faster** than the original.

The acceleration method significantly improves efficiency, especially at lower minimum support values.
