# Introduction to Frequent Pattern Mining

Pattern mining is an important subfield of data mining and has been applied to a wide variety of problems. The idea behind pattern mining is to discover interesting or useful patterns in data. Pattern mining algorithms can be applied to many types of data to discover many types of patterns, such as direct/indirect associations and subgraphs.

In this tutorial, we focus on the problem of mining frequent patterns, and we implement one of the most popular algorithms for finding them, the Apriori algorithm. The algorithm was originally designed to discover common patterns in the transactions made by customers in stores. One famous example is Walmart's beer and diaper parable. By mining hundreds of millions of transaction records, Walmart reportedly discovered that young American men who buy diapers tend to buy beer as well, even though the two products seem completely unrelated. One possible explanation is that raising kids can be very stressful, so dads decide to buy beer to relax. According to the story, beer sales increased significantly after the beer was repositioned next to the diapers. Although this story is very likely an urban legend, the impact that the Apriori algorithm has had on frequent pattern mining is undeniable.

In the following sections, we will introduce the Apriori algorithm and walk you through its implementation step by step. On top of that, we will also optimize the algorithm to make it more computationally efficient.


# Apriori Algorithm
As mentioned earlier, the Apriori algorithm is a popular data mining algorithm for mining frequent itemsets and association rules. It is commonly applied to large transaction databases, such as customer transactions for market basket analysis and healthcare databases for adverse drug reaction detection.

Before diving into how the algorithm works, we will first introduce two important measures that are used in the Apriori algorithm, support and confidence. 


### Support
The support of an itemset X denotes the popularity of the itemset and is calculated as the proportion of transactions in which X appears.
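
As a minimal sketch, support can be computed directly from this definition. The `support` helper and the small `transactions` list are illustrative names that are not part of the implementation developed later; the list mirrors the six-transaction example database shown further down in this section.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)

# the six example transactions, one set of items per transaction
transactions = [
    {'Apple', 'Bread', 'Egg'},
    {'Apple', 'Dumpling', 'Egg'},
    {'Cheese', 'Dumpling'},
    {'Bread', 'Dumpling', 'Egg'},
    {'Apple', 'Bread', 'Cheese', 'Egg'},
    {'Apple', 'Bread', 'Cheese', 'Dumpling', 'Egg'},
]

print(support({'Egg'}, transactions))             # Egg is in 5 of 6 transactions
print(support({'Apple', 'Bread'}, transactions))  # both items are in 3 of 6 transactions
```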

### Confidence
The confidence of an association rule X ==> Y measures the likelihood of itemset Y being purchased when itemset X is purchased. It is calculated as the number of transactions that contain both X and Y divided by the number of transactions that contain X. For example, if the association rule {'A', 'B'} ==> {'C', 'D'} has a confidence of 0.8, it means that 80% of the transactions that contain items A and B also contain items C and D.
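
The sketch below computes confidence as the count of transactions containing both sides of the rule divided by the count of transactions containing the left-hand side. The helper names `count` and `confidence` are assumptions made for illustration, and the data again mirrors the example database shown later in this section. Note that the measure is not symmetric.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset.issubset(t))

def confidence(lhs, rhs, transactions):
    """Confidence of the rule lhs ==> rhs; assumes lhs occurs at least once."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    return count(lhs | rhs, transactions) / count(lhs, transactions)

transactions = [
    {'Apple', 'Bread', 'Egg'},
    {'Apple', 'Dumpling', 'Egg'},
    {'Cheese', 'Dumpling'},
    {'Bread', 'Dumpling', 'Egg'},
    {'Apple', 'Bread', 'Cheese', 'Egg'},
    {'Apple', 'Bread', 'Cheese', 'Dumpling', 'Egg'},
]

print(confidence({'Apple'}, {'Egg'}, transactions))  # 4 / 4 = 1.0
print(confidence({'Egg'}, {'Apple'}, transactions))  # 4 / 5 = 0.8
```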

### Property Used
The Apriori algorithm relies on two closely related properties to reduce the search space. The first is the anti-monotonicity property: if X and Y are two itemsets and X is a subset of Y, then the support of Y must be less than or equal to the support of X. In other words, the number of transactions that contain Y cannot be greater than the number of transactions that contain X. The second property follows from the first: for any infrequent itemset, all of its supersets are infrequent as well.
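
Both properties can be checked by brute force on a small database. The following sketch enumerates every itemset over the five example items and looks for violations; the variable names and the minimum count of 3 (a support of 0.5 over six transactions) are assumptions chosen for illustration.

```python
from itertools import combinations

transactions = [
    {'Apple', 'Bread', 'Egg'},
    {'Apple', 'Dumpling', 'Egg'},
    {'Cheese', 'Dumpling'},
    {'Bread', 'Dumpling', 'Egg'},
    {'Apple', 'Bread', 'Cheese', 'Egg'},
    {'Apple', 'Bread', 'Cheese', 'Dumpling', 'Egg'},
]
items = sorted(set().union(*transactions))

def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset).issubset(t))

# every non-empty itemset that can be built from the five items
all_itemsets = [frozenset(c) for r in range(1, len(items) + 1)
                for c in combinations(items, r)]

# property 1: a proper superset never occurs more often than its subset
violations = [(x, y) for x in all_itemsets for y in all_itemsets
              if x < y and count(y) > count(x)]

# property 2: no superset of an infrequent itemset is frequent
min_count = 3
infrequent = [s for s in all_itemsets if count(s) < min_count]
frequent_supersets = [(x, y) for x in infrequent for y in all_itemsets
                      if x < y and count(y) >= min_count]

print(len(violations), len(frequent_supersets))  # 0 0
```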



Now let's go through the major steps of the Apriori algorithm with an example.
Here is the example transaction database to help illustrate the algorithm.

| Transaction ID | Apple | Bread | Cheese | Dumpling | Egg
| :- | :-: | :-: | :-: | :-: | :-: 
|1|1|1|0|0|1
|2|1|0|0|1|1
|3|0|0|1|1|0
|4|0|1|0|1|1
|5|1|1|1|0|1
|6|1|1|1|1|1


1\. Compute the frequency of all items (itemsets with only one item) that appear in the transactions.

| Itemset | Frequency 
| :- | :-: 
| Apple | 4
| Bread |  4
| Cheese | 3
| Dumpling | 4
| Egg | 5

2\. Filter out the items whose support is lower than the threshold, since the second property tells us that all supersets of an infrequent itemset will also be infrequent. This gives us the single items that are purchased frequently. For example, if the minimum support threshold is 0.5, an item must appear in at least 3 of the 6 transactions. Every item meets this bar (Cheese sits exactly on it with a support of 3/6 = 0.5), so we are left with

| Itemset | Frequency 
| :- | :-: 
| Apple (A) | 4
| Bread (B) |  4
| Cheese (C) | 3
| Dumpling (D) | 4
| Egg (E) | 5

3\. Based on the items we obtained from step 2, generate all possible itemsets of 2 items and record the number of occurrences of each itemset in the transactions.

| Itemset | Frequency 
| :- | :-: 
| AB | 3
| AC | 2
| AD | 2
| AE | 4
| BC | 2
| BD | 2
| BE | 4
| CD | 2
| CE | 2
| DE | 3

4\. Again, filter out the itemsets whose support is lower than the support threshold.

| Itemset | Frequency 
| :- | :-: 
| AB | 3
| AE | 4
| BE | 4
| DE | 3

5\. Compute the frequency table of itemsets with 3 items by joining the frequent itemsets obtained from step 4.

| Itemset | Frequency 
| :- | :-: 
| ABE | 3
| ADE | 2
| BDE | 2

6\. Filter out the itemsets under the minimum support threshold.

| Itemset | Frequency 
| :- | :-: 
| ABE | 3

Therefore, given a 50% support threshold, the frequent patterns are {A, B, C, D, E, AB, AE, BE, DE, ABE}.

The Apriori algorithm can be divided into two major steps.
With k starting from 1, repeat the following steps until no new itemsets can be obtained by the self join rule.

1\. Find the frequent itemsets with k items from all transactions. 

2\. Find the frequent itemsets with k+1 items by applying the self join rule on the frequent itemsets with k items
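
The two steps can be sketched as a compact level-wise loop. This is a simplified illustration run on the six example transactions, not the implementation built in the rest of this tutorial; `frequentItemsets` and its variables are hypothetical names.

```python
def frequentItemsets(transactions, min_support):
    """Return {k: set of frequent k-itemsets} using a level-wise search."""
    n = len(transactions)
    # C1: every single item seen in the data
    candidates = {frozenset([item]) for t in transactions for item in t}
    levels = {}
    k = 1
    while candidates:
        # step 1: keep the candidates that meet the support threshold
        counts = {c: sum(1 for t in transactions if c.issubset(t))
                  for c in candidates}
        frequent = {c for c, cnt in counts.items() if cnt / n >= min_support}
        if not frequent:
            break
        levels[k] = frequent
        # step 2: self join the frequent k-itemsets into (k+1)-item candidates
        k += 1
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
    return levels

transactions = [
    frozenset({'Apple', 'Bread', 'Egg'}),
    frozenset({'Apple', 'Dumpling', 'Egg'}),
    frozenset({'Cheese', 'Dumpling'}),
    frozenset({'Bread', 'Dumpling', 'Egg'}),
    frozenset({'Apple', 'Bread', 'Cheese', 'Egg'}),
    frozenset({'Apple', 'Bread', 'Cheese', 'Dumpling', 'Egg'}),
]

levels = frequentItemsets(transactions, 0.5)
for k, itemsets in levels.items():
    print(k, sorted(sorted(s) for s in itemsets))
```

The loop stops either when no candidate survives the support filter or when the self join produces no larger candidates.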

In [1]:
from itertools import chain, combinations
import sys
import time

# Dataset
For this tutorial, we will be using the UCI Adult Data Set, which can be found here: https://archive.ics.uci.edu/ml/datasets/adult. The dataset contains the following fields of information for over 32,000 adults.

| Attribute | Type
| :- | :-: 
| age | continuous
| workclass | categorical
| fnlwgt | continuous
| education | categorical
| ed_num | continuous
| marital-status | categorical
| occupation | categorical
| relationship | categorical
| race | categorical
| sex | categorical
| capital-gain | continuous
| capital-loss | continuous
| hrs-per-week | continuous
| native-country | categorical
| annual-income  | categorical




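We will start by parsing the adult_data.csv file. The parseFile function converts each line of the data file into a list of labelled "attribute: value" strings, which is easier to read. Note that the function is written as a generator, so it can handle larger files without loading them into memory all at once.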

In [2]:
# column names of the Adult data set, in the order they appear in the file
FIELD_NAMES = ['age', 'workclass', 'fnlwgt', 'education', 'ed_num',
               'marital-status', 'occupation', 'relationship', 'race', 'sex',
               'capital-gain', 'capital-loss', 'hrs-per-week',
               'native-country', 'annual-income']

# parse the data file into labelled "attribute: value" items
def parseFile(input_file):
    with open(input_file, 'r') as f:
        for line in f:
            fields = line.strip().split(',')
            # skip blank or malformed lines (e.g. an empty line at the end of the file)
            if len(fields) != len(FIELD_NAMES):
                continue
            # yield one record at a time so we can deal with bigger files
            yield [name + ': ' + value.strip()
                   for name, value in zip(FIELD_NAMES, fields)]


In [3]:
# printing the first 10 lines of the parsed input file
count = 0
for line in parseFile('adult_data.csv'):
    if count == 10:
        break
    print (line)
    count += 1

['age: 39', 'workclass: State-gov', 'fnlwgt: 77516', 'education: Bachelors', 'ed_num: 13', 'marital-status: Never-married', 'occupation: Adm-clerical', 'relationship: Not-in-family', 'race: White', 'sex: Male', 'capital-gain: 2174', 'capital-loss: 0', 'hrs-per-week: 40', 'native-country: United-States', 'annual-income: <=50K']
['age: 50', 'workclass: Self-emp-not-inc', 'fnlwgt: 83311', 'education: Bachelors', 'ed_num: 13', 'marital-status: Married-civ-spouse', 'occupation: Exec-managerial', 'relationship: Husband', 'race: White', 'sex: Male', 'capital-gain: 0', 'capital-loss: 0', 'hrs-per-week: 13', 'native-country: United-States', 'annual-income: <=50K']
['age: 38', 'workclass: Private', 'fnlwgt: 215646', 'education: HS-grad', 'ed_num: 9', 'marital-status: Divorced', 'occupation: Handlers-cleaners', 'relationship: Not-in-family', 'race: White', 'sex: Male', 'capital-gain: 0', 'capital-loss: 0', 'hrs-per-week: 40', 'native-country: United-States', 'annual-income: <=50K']
['age: 53', 'w

In order to run the Apriori algorithm, we first need to create a set of all 1-itemsets (one_cset) that have appeared in the transactions. The initTransactionList function will take the input file, create the one_cset, store all transactions in the transaction_list, and return both of them.

In [4]:
# return the set of all 1-item itemsets that appear in the transactions
# one_cset: {{A}, {B}, {C}, ...}
def initTransactionList(input_file):
    # the transaction_list stores all the transactions
    # return the one_cset(C1) so we can start running the Apriori algorithm
    one_cset = set()
    transaction_list = []
    for record in parseFile(input_file):
        # use frozenset because it is hashable
        transaction = frozenset(record)
        # create the 1-item sets
        for item in transaction:
            one_cset.add(frozenset([item]))
        # append each transaction into the transaction list
        transaction_list.append(transaction)
    return one_cset, transaction_list

In [5]:
# print out the first 10 items of the one_cset and transaction_list
one_cset, transaction_list = initTransactionList('adult_data_test.csv')
for item in list(one_cset)[:10]:
    print(item)
for transaction in transaction_list[:10]:
    print(transaction)   

frozenset({'fnlwgt: 96975'})
frozenset({'fnlwgt: 183930'})
frozenset({'fnlwgt: 259014'})
frozenset({'capital-gain: 2176'})
frozenset({'native-country: Haiti'})
frozenset({'fnlwgt: 267989'})
frozenset({'fnlwgt: 188300'})
frozenset({'fnlwgt: 388093'})
frozenset({'fnlwgt: 309634'})
frozenset({'capital-gain: 15024'})
frozenset({'occupation: Adm-clerical', 'ed_num: 13', 'education: Bachelors', 'hrs-per-week: 40', 'annual-income: <=50K', 'native-country: United-States', 'race: White', 'workclass: State-gov', 'relationship: Not-in-family', 'fnlwgt: 77516', 'age: 39', 'capital-gain: 2174', 'sex: Male', 'capital-loss: 0', 'marital-status: Never-married'})
frozenset({'relationship: Husband', 'sex: Male', 'occupation: Exec-managerial', 'ed_num: 13', 'education: Bachelors', 'annual-income: <=50K', 'native-country: United-States', 'race: White', 'workclass: Self-emp-not-inc', 'hrs-per-week: 13', 'age: 50', 'marital-status: Married-civ-spouse', 'capital-loss: 0', 'fnlwgt: 83311', 'capital-gain: 0'})

The getLset function filters out the itemsets in the Cset (the candidate set) whose support is lower than the minimum support threshold and returns the Lset (the set of frequent itemsets). The freq_set dictionary is also passed in as an argument to keep track of the global count of every itemset counted so far. The support of each itemset in the current Cset is calculated from the local_set dictionary. Only itemsets with support greater than or equal to the minimum threshold are added to the returned Lset.

In [6]:
# calculate the support for each item in the itemset(cset)
# only add the item to the lset if it meets the minimum support requirement
def getLset(cset, transaction_list, freq_set, min_support):
    lset = set()
    local_set = {}
    # C1: local_set = {'A':countA,'B':countB,...}
    # calculate the count for each item
    for item in cset:
        for transaction in transaction_list:
            if item.issubset(transaction):
                # update global and local frequent set dictionaries
                # global
                if item not in freq_set:
                    freq_set[item] = 1
                else:
                    freq_set[item] += 1  
                # local
                if item not in local_set:
                    local_set[item] = 1
                else:
                    local_set[item] += 1
    # add the item to lset if it meets the minimum support requirement
    n = len(transaction_list)
    for item, count in local_set.items():
        support = count / n
        if support >= min_support:
            lset.add(item)
    return lset

In [7]:
# print out the first 10 itemsets in the one_lset and the updated freq_set to have a general idea of what one_lset and freq_set should look like
freq_set = {}
one_lset = getLset(one_cset, transaction_list, freq_set, 0.5)
for item in list(one_lset)[:10]:
    print(item)
for item in list(freq_set.items())[:10]:
    print(item)

frozenset({'annual-income: <=50K'})
frozenset({'workclass: Private'})
frozenset({'native-country: United-States'})
frozenset({'race: White'})
frozenset({'capital-loss: 0'})
frozenset({'capital-gain: 0'})
frozenset({'sex: Male'})
(frozenset({'fnlwgt: 96975'}), 1)
(frozenset({'fnlwgt: 183930'}), 1)
(frozenset({'fnlwgt: 259014'}), 1)
(frozenset({'capital-gain: 2176'}), 1)
(frozenset({'native-country: Haiti'}), 1)
(frozenset({'fnlwgt: 267989'}), 1)
(frozenset({'fnlwgt: 188300'}), 1)
(frozenset({'fnlwgt: 388093'}), 1)
(frozenset({'fnlwgt: 309634'}), 1)
(frozenset({'capital-gain: 15024'}), 9)


Up to this point, we have only recorded the 1-element itemsets whose support meets the threshold. Therefore, we need a selfJoin function that creates all possible 2-element itemsets by joining the set of 1-element itemsets with itself. Like the getLset function, the selfJoin function is used repeatedly in the Apriori algorithm to join the set of (k-1)-element itemsets with itself and create k-element itemsets.

In [8]:
# join a set of (k-1)-element itemsets with itself and return the k-element itemsets
def selfJoin(item_set, k):
    joined_k = set()
    for i in item_set:
        for j in item_set:
            union_set = i.union(j)
            if len(union_set) == k:
                joined_k.add(union_set)
    return joined_k

In [9]:
# print out the first 10 2-element itemsets 
two_itemsets = selfJoin(one_lset, 2)
for item in list(two_itemsets)[:10]:
    print(item)

frozenset({'workclass: Private', 'annual-income: <=50K'})
frozenset({'race: White', 'annual-income: <=50K'})
frozenset({'native-country: United-States', 'capital-gain: 0'})
frozenset({'workclass: Private', 'capital-gain: 0'})
frozenset({'race: White', 'capital-gain: 0'})
frozenset({'native-country: United-States', 'annual-income: <=50K'})
frozenset({'sex: Male', 'capital-gain: 0'})
frozenset({'capital-loss: 0', 'workclass: Private'})
frozenset({'native-country: United-States', 'capital-loss: 0'})
frozenset({'capital-loss: 0', 'race: White'})
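
selfJoin keeps every union of size k, including candidates that contain an infrequent (k-1)-element subset. By the second property described earlier, such candidates can never be frequent, so they could be dropped before any counting is done. The sketch below shows one way to do this; `pruneCandidates` is a hypothetical helper that is not part of the implementation in this tutorial, and the single-letter items abbreviate the worked example.

```python
from itertools import combinations

def pruneCandidates(candidates, prev_lset, k):
    """Keep the k-item candidates whose (k-1)-item subsets are all in prev_lset."""
    return {c for c in candidates
            if all(frozenset(s) in prev_lset for s in combinations(c, k - 1))}

# frequent 2-itemsets from the worked example
L2 = {frozenset('AB'), frozenset('AE'), frozenset('BE'), frozenset('DE')}
# the 3-item unions that a self join of L2 produces
C3 = {frozenset('ABE'), frozenset('ADE'), frozenset('BDE')}

# ADE and BDE are dropped because AD and BD are not frequent
print(pruneCandidates(C3, L2, 3))
```

Pruning does not change the final result; it only avoids counting candidates that are already known to be infrequent.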


Now let's put all the helper functions together to perform the Apriori algorithm. The algorithm proceeds as follows.

1\. Initialize the one_cset and the transaction list using the initTransactionList function

2\. Filter out infrequent itemsets from the current cset to get the current lset using the getLset function

3\. Call the selfJoin function on the current lset to form the next cset.

4\. Repeat steps 2 and 3 until there are no itemsets left in the current lset, that is, until no frequent (k+1)-element itemset can be found.

The freq_set dictionary keeps track of the counts of all itemsets counted so far, and the n_item_set dictionary keeps track of the frequent itemsets for each number of elements. With the n_item_set dictionary, we can then output the frequent itemsets. On top of that, the corresponding association rules can be generated by splitting each frequent itemset into every possible pair of left-hand side and right-hand side (a binary partition) and filtering out the rules whose confidence is below the minimum confidence threshold.
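
To make the binary partition concrete, the sketch below generates the rules for a single frequent itemset. `rulesFromItemset` is a hypothetical helper written for illustration, and the counts are taken from the worked example (A = Apple, B = Bread, E = Egg).

```python
from itertools import combinations

def rulesFromItemset(itemset, counts, min_confidence):
    """Split `itemset` into every (lhs, rhs) pair and keep the confident rules."""
    itemset = frozenset(itemset)
    rules = []
    # the left-hand side ranges over every non-empty proper subset
    for r in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), r):
            lhs = frozenset(lhs)
            rhs = itemset - lhs
            conf = counts[itemset] / counts[lhs]
            if conf >= min_confidence:
                rules.append((tuple(sorted(lhs)), tuple(sorted(rhs)), conf))
    return rules

# occurrence counts from the worked example
counts = {
    frozenset('A'): 4, frozenset('B'): 4, frozenset('E'): 5,
    frozenset('AB'): 3, frozenset('AE'): 4, frozenset('BE'): 4,
    frozenset('ABE'): 3,
}

for lhs, rhs, conf in rulesFromItemset('ABE', counts, 0.8):
    print(lhs, '==>', rhs, conf)  # ('A', 'B') ==> ('E',) 1.0
```

With a minimum confidence of 0.8, only {A, B} ==> {E} survives; the other five partitions have a confidence of 0.75 or 0.6.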



In [10]:
def apriori(input_file, min_support, min_confidence):
    # for storing global frequent itemsets
    freq_set = {}
    n_item_set = {}
    items_with_support = [] # storing frequent itemsets
    association_rules = [] # storing association rules
    
    one_cset, transaction_list = initTransactionList(input_file)
    current_lset = getLset(one_cset, transaction_list, freq_set, min_support)
    k = 1
    empty_set = set()
    while(current_lset != empty_set):
        # storing the current lset into the dictionary
        n_item_set[k] = current_lset
        # self join to get next k+1-item cset (C_k+1)
        next_cset = selfJoin(current_lset, k+1)
        # Pass C_k+1 through the min support criteria to get the next_lset (L_k+1)
        next_lset = getLset(next_cset, transaction_list, freq_set, min_support)
        # move to the next lset
        current_lset = next_lset
        k += 1
    
    n = len(transaction_list)
    # n_item_set is in the structure of {n: Ln set}
    # n_item_set = {1:[A,B,C,D,E],2:[AB,AC,AD,BD,CE],3:[ABC,ABD,ACD]}
    # go through every item in every L set, compute their support and save it in items_with_support
    for value in n_item_set.values():
        for item in value:
            support = freq_set[item] / n
            items_with_support.append((tuple(item), support))
    # items_with_support will look like this
    # [(A,supportA),(B,supportB),...,((A,B),supportAB),....]

    for key, value in n_item_set.items():
        for item in value:
            subsets = chain(*[combinations(item, i + 1) for i, a in enumerate(item)])
            subset = set()
            for x in subsets:
                subset.add(frozenset(x))
            for element in subset:
                rest = item.difference(element)
                if item != element:
                    confidence = freq_set[item] / freq_set[element]
                    if confidence >= min_confidence:
                        association_rules.append(((tuple(element), tuple(rest)), confidence))
    return items_with_support, association_rules
    

Print out frequent itemsets sorted by support and association rules sorted by confidence.

In [11]:
def printFrequentItems(items_with_support):
    i = 1
    for item, support in sorted(items_with_support, key=lambda line: line[1], reverse=True):
        print ('Frequent Items %d: %s, Support: %.4f' % (i, str(item), support))
        i += 1
            
def printRules(association_rules):
    i = 1
    for rule, confidence in sorted(association_rules, key=lambda line: line[1], reverse=True):
        left_part = rule[0]
        right_part = rule[1]
        print ("Association Rule %d: %s ==> %s, Confidence: %.4f" % (i, str(left_part), str(right_part), confidence))
        i += 1

In [12]:
# Print out the frequent itemsets and association rules obtained from the Apriori algorithm of the input file.
input_file = 'adult_data.csv'
min_support = 0.7
min_confidence = 0.7
frequent_items, association_rules = apriori(input_file, min_support, min_confidence)
print("========== Frequent Common Patterns ==========")
printFrequentItems(frequent_items)
print("========== Association Rules ==========")
printRules(association_rules)


Frequent Items 1: ('capital-loss: 0',), Support: 0.9533
Frequent Items 2: ('capital-gain: 0',), Support: 0.9167
Frequent Items 3: ('native-country: United-States',), Support: 0.8959
Frequent Items 4: ('capital-loss: 0', 'capital-gain: 0'), Support: 0.8701
Frequent Items 5: ('race: White',), Support: 0.8543
Frequent Items 6: ('native-country: United-States', 'capital-loss: 0'), Support: 0.8535
Frequent Items 7: ('native-country: United-States', 'capital-gain: 0'), Support: 0.8200
Frequent Items 8: ('capital-loss: 0', 'race: White'), Support: 0.8129
Frequent Items 9: ('native-country: United-States', 'race: White'), Support: 0.7869
Frequent Items 10: ('race: White', 'capital-gain: 0'), Support: 0.7803
Frequent Items 11: ('native-country: United-States', 'capital-loss: 0', 'capital-gain: 0'), Support: 0.7776
Frequent Items 12: ('annual-income: <=50K',), Support: 0.7592
Frequent Items 13: ('native-country: United-States', 'capital-loss: 0', 'race: White'), Support: 0.7478
Frequent Items 14

# Optimization of the Apriori Algorithm

Although the Apriori algorithm is easy to implement and understand, it can take a very long time to run as the data scales up. The reason is that Apriori generates all possible candidates at each level and requires multiple scans through the transaction list to determine whether the candidates meet the minimum support threshold.

However, we can significantly decrease the number of scans required and improve the runtime efficiency using the following trick. 

We first scan all transactions to generate L1, which contains the items, their support counts, and the IDs of the transactions in which they occur. We then use L1 to help generate L2, L2 to help generate L3, and so on, just as in the original Apriori algorithm. However, instead of scanning the entire transaction list over and over again, we only need to scan the transactions that contain the item with the minimum support count in the candidate itemset. For example, let (x, y) be one of the 2-itemsets in C2, and suppose x has a lower support count than y. Then we only need to scan the transactions whose IDs are recorded for x. Since a transaction must contain both x and y to be counted, this is sufficient to determine the correct support count while scanning the fewest transactions. By doing so, the number of transactions scanned can be greatly reduced.
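
The idea can be sketched on the six example transactions from the earlier walkthrough. The names `tids` and `countWithTids` are assumptions made for this illustration, and the sketch uses 0-based transaction indices for simplicity.

```python
transactions = [
    frozenset({'Apple', 'Bread', 'Egg'}),
    frozenset({'Apple', 'Dumpling', 'Egg'}),
    frozenset({'Cheese', 'Dumpling'}),
    frozenset({'Bread', 'Dumpling', 'Egg'}),
    frozenset({'Apple', 'Bread', 'Cheese', 'Egg'}),
    frozenset({'Apple', 'Bread', 'Cheese', 'Dumpling', 'Egg'}),
]

# item -> set of 0-based indices of the transactions in which the item occurs
tids = {}
for tid, transaction in enumerate(transactions):
    for item in transaction:
        tids.setdefault(item, set()).add(tid)

def countWithTids(itemset):
    """Count `itemset` by scanning only the transactions of its rarest item."""
    rarest = min(itemset, key=lambda item: len(tids[item]))
    return sum(1 for tid in tids[rarest] if itemset <= transactions[tid])

pair = frozenset({'Cheese', 'Egg'})
# only the 3 transactions that contain Cheese are scanned, instead of all 6
print(len(tids['Cheese']), countWithTids(pair))  # 3 2
```

The count is the same as a full scan would give, because any transaction that contains the whole itemset must also contain its rarest item.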

The idea is straightforward and the implementation is very similar to the original one, so we won't walk through the optimized algorithm step by step. The comments in the code should be self-explanatory.


In [13]:
def initTransactionListOptimized(input_file):
    L1_TID = {}
    one_cset = set()
    tid=0
    transaction_list = []
    for record in parseFile(input_file):
        transaction = frozenset(record)
        transaction_list.append(transaction)
        tid += 1
        for item in transaction:
            one_cset.add(frozenset([item]))
            # keep track of the IDs of the transactions in which each item occurs
            if item not in L1_TID:
                L1_TID[item] = set([tid])
            else:
                L1_TID[item].add(tid)
    return one_cset, transaction_list, L1_TID

def getLsetOptimized(cset, transaction_list, L1_count, L1_TID, freq_set, min_support):
    lset = set()
    local_set = {}
    for item in cset:         
        if all(x in L1_count for x in list(item)):
            min_item = findMinItem(item, L1_count)
            # TID is the set of transaction IDs for min_item
            TID = L1_TID[min_item]
            # only scan the transactions whose IDs are in TID
            for index in TID:
                if item.issubset(transaction_list[index-1]):
                    if item not in freq_set:
                        freq_set[item] = 1
                    else:
                        freq_set[item] += 1
                    if item not in local_set:
                        local_set[item] = 1
                    else:
                        local_set[item] += 1
    n = len(transaction_list)
    for item, count in local_set.items():
        support = count / n
        if support >= min_support:
            lset.add(item)
            
    return lset

# split the items into single item and return the item with the minimum support
def findMinItem(items, L1_count):
    minimum = sys.maxsize
    for key, value in L1_count.items():
        if key in items and value < minimum:
            minimum = value
            min_key = key
    return min_key

# generate the content of the dictionary L1_count
def getL1Count(one_cset, transaction_list, min_support):
    L1_count = {}
    local_set = {}
    for item in one_cset:
        for transaction in transaction_list:
            if item.issubset(transaction):
                if item not in local_set:
                    local_set[item] = 1
                else:
                    local_set[item] += 1
    n = len(transaction_list)
    for item, count in local_set.items():
        support = count / n
        if support >= min_support:
            key = [k for k in item][0]
            L1_count[key] = count
    return L1_count

def apriori_optimized(input_file, min_support, min_confidence):    
    freq_set = {}
    n_item_set = {}
    items_with_support = [] # storing frequent itemsets
    association_rules = [] # storing association rules
    
    # L1_TID keeps track of the IDs of the transactions in which each item occurs
    one_cset, transaction_list, L1_TID = initTransactionListOptimized(input_file)
    # L1_count keeps track of the count of L1 itemsets
    L1_count = getL1Count(one_cset, transaction_list, min_support)
    current_lset = getLsetOptimized(one_cset, transaction_list, L1_count, L1_TID, freq_set, min_support)
    k = 1
    empty_set = set()
    
    while(current_lset != empty_set):
        # storing the current lset into the dictionary
        n_item_set[k] = current_lset
        # self join to get next k+1-item cset (C_k+1)
        next_cset = selfJoin(current_lset, k+1)
        # Pass C_k+1 through the min support criteria to get the next_lset (L_k+1)
        next_lset = getLsetOptimized(next_cset, transaction_list, L1_count, L1_TID, freq_set, min_support)
        # move to the next lset
        current_lset = next_lset
        k += 1
        
    n = len(transaction_list)
    for value in n_item_set.values():
        for item in value:
            support = freq_set[item] / n
            items_with_support.append((tuple(item), support))
    for key, value in n_item_set.items():
        for item in value:
            subsets = chain(*[combinations(item, i + 1) for i, a in enumerate(item)])
            subset = set()
            for x in subsets:
                subset.add(frozenset(x))
            for element in subset:
                rest = item.difference(element)
                if item != element:
                    confidence = freq_set[item] / freq_set[element]
                    if confidence >= min_confidence:
                        association_rules.append(((tuple(element), tuple(rest)), confidence))
    return items_with_support, association_rules

In [14]:
# Print out the frequent itemsets and association rules obtained from the Apriori algorithm of the input file.
items_with_support, association_rules = apriori_optimized(input_file, min_support, min_confidence)
print("========== Frequent Common Patterns ==========")
printFrequentItems(items_with_support)
print("========== Association Rules ==========")
printRules(association_rules)


Frequent Items 1: ('capital-loss: 0',), Support: 0.9533
Frequent Items 2: ('capital-gain: 0',), Support: 0.9167
Frequent Items 3: ('native-country: United-States',), Support: 0.8959
Frequent Items 4: ('capital-loss: 0', 'capital-gain: 0'), Support: 0.8701
Frequent Items 5: ('race: White',), Support: 0.8543
Frequent Items 6: ('native-country: United-States', 'capital-loss: 0'), Support: 0.8535
Frequent Items 7: ('native-country: United-States', 'capital-gain: 0'), Support: 0.8200
Frequent Items 8: ('capital-loss: 0', 'race: White'), Support: 0.8129
Frequent Items 9: ('native-country: United-States', 'race: White'), Support: 0.7869
Frequent Items 10: ('race: White', 'capital-gain: 0'), Support: 0.7803
Frequent Items 11: ('native-country: United-States', 'capital-loss: 0', 'capital-gain: 0'), Support: 0.7776
Frequent Items 12: ('annual-income: <=50K',), Support: 0.7592
Frequent Items 13: ('native-country: United-States', 'capital-loss: 0', 'race: White'), Support: 0.7478
Frequent Items 14
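
To see how much the optimization helps on a given machine, the two versions can be timed. The helper below is a minimal sketch using `time.perf_counter`. `timeCall` is a hypothetical name, and the stand-in workload is only there so that the snippet runs on its own. In the notebook, one could pass `apriori` or `apriori_optimized` together with `input_file`, `min_support`, and `min_confidence` instead.

```python
import time

def timeCall(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# stand-in workload; replace with e.g.
# timeCall(apriori_optimized, input_file, min_support, min_confidence)
result, seconds = timeCall(sum, range(1000000))
print('finished in %.4f seconds' % seconds)
```

A single run can be noisy, so repeating the measurement a few times gives a more reliable comparison.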

# Other resources
In this tutorial, we introduced the Apriori algorithm and one way to optimize it by reducing the number of required scans. If you would like to know more about the algorithm and other optimization techniques, here are some relevant resources. We hope you enjoyed this tutorial!

1\. A Method to Optimize Apriori Algorithm for Frequent Items Mining: http://ieeexplore.ieee.org/document/7064142/?reload=true

2\. The Optimization and Improvement of the Apriori Algorithm: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4732130

3\. New Approach to Optimize the Time of Association Rules Extraction: https://arxiv.org/pdf/1312.4800.pdf

