Further improve generate_new_combinations

The apriori-gen function described in section 2.1.1 of the Apriori paper
has two steps; the first step was implemented in 96dfd4d.

The second step of the apriori-gen function is the prune step: it takes
each candidate c from the first step and checks that every (k-1)-tuple
built by removing a single element from c is in L(k-1). Candidates that
fail this check are discarded.
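
As a rough illustration of the prune check (a sketch, not the code from
this commit): the helper name `all_subsets_frequent` is hypothetical, the
candidate is assumed to be a sorted tuple, and a plain Python set stands
in for L(k-1).

```python
def all_subsets_frequent(candidate, frequent_prev):
    """Return True if every (k-1)-subset of `candidate` is in `frequent_prev`.

    `candidate` is a sorted tuple of k items; `frequent_prev` is a set of
    sorted (k-1)-tuples standing in for L(k-1). Both names are hypothetical.
    """
    for idx in range(len(candidate)):
        # Drop the item at position idx to form one (k-1)-subset.
        subset = candidate[:idx] + candidate[idx + 1:]
        if subset not in frequent_prev:
            return False
    return True


# Toy L(2): the pair (2, 4) is deliberately absent.
frequent_2 = {(1, 2), (1, 3), (1, 4), (2, 3)}
print(all_subsets_frequent((1, 2, 3), frequent_2))  # True
print(all_subsets_frequent((1, 2, 4), frequent_2))  # False: (2, 4) is missing
```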

Efficient lookups require a dedicated data structure; the Apriori paper
describes how to do it with hash trees, and prefix trees (also known as
tries) can be used as well.
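
To make the prefix-tree idea concrete, here is a minimal sketch built on
nested dicts. It is not pygtrie's implementation; the class name
`TupleTrie` and its methods are assumptions made for illustration, and it
supports only insertion and exact-membership tests.

```python
class TupleTrie:
    """Minimal prefix tree over tuples of hashable items (illustrative only)."""

    _END = object()  # sentinel key marking the end of a stored tuple

    def __init__(self, keys=()):
        self._root = {}
        for key in keys:
            self.add(key)

    def add(self, key):
        # Walk down one level per item, creating child nodes as needed.
        node = self._root
        for item in key:
            node = node.setdefault(item, {})
        node[self._END] = True

    def __contains__(self, key):
        node = self._root
        for item in key:
            node = node.get(item)
            if node is None:
                return False
        # A bare prefix of a stored tuple does not count as a member.
        return self._END in node


trie = TupleTrie([(1, 2), (1, 3), (2, 3)])
print((1, 3) in trie)  # True
print((1,) in trie)    # False: only a prefix, not a stored tuple
print((2, 4) in trie)  # False
```

Tuples that share a prefix share the nodes along that prefix, so a
membership test costs one dict lookup per item.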

This commit uses the third-party pygtrie module to check whether this
step improves performance in our case. We can then decide either to keep
this import or to write a stripped-down implementation.
dbarbier committed Dec 18, 2019
1 parent 680e2d8 commit b90a146
Showing 1 changed file with 12 additions and 2 deletions.
14 changes: 12 additions & 2 deletions mlxtend/frequent_patterns/apriori.py
@@ -6,6 +6,7 @@

 import numpy as np
 import pandas as pd
+import pygtrie
 from ..frequent_patterns import fpcommon as fpc


@@ -43,15 +44,24 @@ def generate_new_combinations(old_combinations):
     """

     length = len(old_combinations)
+    trie = pygtrie.Trie(list(zip(old_combinations, [1]*length)))
     for i, old_combination in enumerate(old_combinations):
         *head_i, _ = old_combination
         j = i + 1
         while j < length:
             *head_j, tail_j = old_combinations[j]
             if head_i != head_j:
                 break
-            yield from old_combination
-            yield tail_j
+            # Prune old_combination+(item,) if any subset is not frequent
+            candidate = tuple(old_combination) + (tail_j,)
+            for idx in range(len(candidate)):
+                test_candidate = list(candidate)
+                del test_candidate[idx]
+                if tuple(test_candidate) not in trie:
+                    # early exit from for-loop skips else clause just below
+                    break
+            else:
+                yield from candidate
             j = j + 1
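
For readers who want to try the join and prune logic without installing
pygtrie, the following self-contained sketch mirrors the loop above under
a few assumptions: a plain set replaces the trie, the function name
`generate_candidates` is hypothetical, and each candidate is yielded as a
whole tuple rather than as the flattened item stream the patched function
produces. The input is assumed to be a sorted list of sorted tuples.

```python
def generate_candidates(old_combinations):
    """Join + prune over a sorted list of sorted k-tuples (hypothetical helper)."""
    # A set of tuples stands in for the pygtrie.Trie used in the commit.
    frequent = set(map(tuple, old_combinations))
    length = len(old_combinations)
    for i, old_combination in enumerate(old_combinations):
        *head_i, _ = old_combination
        for j in range(i + 1, length):
            *head_j, tail_j = old_combinations[j]
            if head_i != head_j:
                break  # sorted input: no later tuple shares this prefix
            candidate = tuple(old_combination) + (tail_j,)
            # Keep the candidate only if every k-subset is frequent.
            if all(candidate[:idx] + candidate[idx + 1:] in frequent
                   for idx in range(len(candidate))):
                yield candidate


frequent_2 = [(1, 2), (1, 3), (1, 4), (2, 3)]
print(list(generate_candidates(frequent_2)))  # [(1, 2, 3)]
```

Here (1, 2, 4) and (1, 3, 4) come out of the join step but are pruned,
because (2, 4) and (3, 4) are not in the input.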

