## Section 1 -- Generating Frequent Itemsets

`from FIM import apriori`

The `apriori` function expects data in a one-hot encoded pandas DataFrame.
Suppose we have the following transaction data:

In [12]:
data = [['onion', 'beer', 'crisps', 'beef'],
        ['beer', 'tomato', 'crisps', 'eggs'],
        ['onion', 'crisps', 'eggs'],
        ['beer', 'eggs', 'beef'],
        ['onion', 'beer', 'carrot', 'crisps'],
        ['onion', 'eggs', 'beef'],
        ['onion', 'beer', 'carrot', 'crisps', 'eggs', 'beef'],
        ['onion', 'beer', 'crisps', 'eggs'],
        ['beer', 'tomato', 'carrot', 'eggs'],
        ['onion', 'crisps', 'eggs', 'beef'],
        ['beer', 'carrot', 'crisps', 'eggs']]

We can transform it into the right format via the `TransactionEncoder` as follows:

In [13]:
from FIM.utils import TransactionEncoder
te = TransactionEncoder()
df = te.fit_transform(data, set_pandas=True)
df

,beef,beer,carrot,crisps,eggs,onion,tomato
0,True,True,False,True,False,True,False
1,False,True,False,True,True,False,True
2,False,False,False,True,True,True,False
3,True,True,False,False,True,False,False
4,False,True,True,True,False,True,False
5,True,False,False,False,True,True,False
6,True,True,True,True,True,True,False
7,False,True,False,True,True,True,False
8,False,True,True,False,True,False,True
9,True,False,False,True,True,True,False
10,False,True,True,True,True,False,False

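To make the target format concrete, here is a minimal sketch of one-hot encoding in plain Python, without the `TransactionEncoder`. It assumes the columns are the sorted unique item names, which matches the table above; the names `columns` and `rows` are illustrative and not part of the library.

```python
# Transactions copied from the example above.
data = [['onion', 'beer', 'crisps', 'beef'],
        ['beer', 'tomato', 'crisps', 'eggs'],
        ['onion', 'crisps', 'eggs'],
        ['beer', 'eggs', 'beef'],
        ['onion', 'beer', 'carrot', 'crisps'],
        ['onion', 'eggs', 'beef'],
        ['onion', 'beer', 'carrot', 'crisps', 'eggs', 'beef'],
        ['onion', 'beer', 'crisps', 'eggs'],
        ['beer', 'tomato', 'carrot', 'eggs'],
        ['onion', 'crisps', 'eggs', 'beef'],
        ['beer', 'carrot', 'crisps', 'eggs']]

# Assumption: columns are the sorted unique item names.
columns = sorted({item for transaction in data for item in transaction})

# One row per transaction, one boolean per column.
rows = [[item in transaction for item in columns] for transaction in data]

print(columns)
print(rows[0])
```

Passing `rows` and `columns` to `pandas.DataFrame` would give a boolean DataFrame with the same layout as `df`.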

Now, let us return the items and itemsets with at least 30% support:

In [14]:
from FIM import apriori
apriori(df, min_support=0.3)

,support,itemsets
0,0.454545,(beef)
1,0.727273,(beer)
2,0.363636,(carrot)
3,0.727273,(crisps)
4,0.818182,(eggs)
5,0.636364,(onion)
6,0.363636,"(beef, eggs)"
7,0.363636,"(beef, onion)"
8,0.363636,"(beer, carrot)"
9,0.545455,"(beer, crisps)"
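
The support values in this table are the fraction of transactions that contain every item of an itemset. The sketch below recomputes two of them directly from the raw transactions; the helper `support` is a hypothetical name used for illustration and is not part of the library.

```python
# Transactions copied from the example above.
data = [['onion', 'beer', 'crisps', 'beef'],
        ['beer', 'tomato', 'crisps', 'eggs'],
        ['onion', 'crisps', 'eggs'],
        ['beer', 'eggs', 'beef'],
        ['onion', 'beer', 'carrot', 'crisps'],
        ['onion', 'eggs', 'beef'],
        ['onion', 'beer', 'carrot', 'crisps', 'eggs', 'beef'],
        ['onion', 'beer', 'crisps', 'eggs'],
        ['beer', 'tomato', 'carrot', 'eggs'],
        ['onion', 'crisps', 'eggs', 'beef'],
        ['beer', 'carrot', 'crisps', 'eggs']]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


print(round(support({'beer', 'crisps'}, data), 6))  # 6 of 11 -> 0.545455
print(round(support({'carrot'}, data), 6))          # 4 of 11 -> 0.363636
```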


By default, `apriori` returns the column names of the items.

In [15]:
apriori(df, min_support=0.3, show_colnames=False)

,support,itemsets
0,0.454545,(0)
1,0.727273,(1)
2,0.363636,(2)
3,0.727273,(3)
4,0.818182,(4)
5,0.636364,(5)
6,0.363636,"(0, 4)"
7,0.363636,"(0, 5)"
8,0.363636,"(1, 2)"
9,0.545455,"(1, 3)"


With the `show_colnames=False` argument, `apriori` returns the column indices of the items instead, which may be useful in downstream operations such as association rule mining.
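
If index-based itemsets need to be turned back into item names later, the column order of the one-hot DataFrame is enough to do so. A minimal sketch, assuming the column order shown in the table above; `index_itemsets` is a hand-written sample, not actual library output:

```python
# Column order of the one-hot encoded DataFrame shown earlier.
columns = ['beef', 'beer', 'carrot', 'crisps', 'eggs', 'onion', 'tomato']

# A few index-based itemsets, copied from the output above.
index_itemsets = [(0,), (1, 3), (0, 4)]

# Look up each column index to recover the item name.
named = [tuple(columns[i] for i in itemset) for itemset in index_itemsets]

print(named)  # [('beef',), ('beer', 'crisps'), ('beef', 'eggs')]
```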

In [16]:
apriori(df, min_support=0.3, show_colnames=True, max_len=2)

,support,itemsets
0,0.454545,(beef)
1,0.727273,(beer)
2,0.363636,(carrot)
3,0.727273,(crisps)
4,0.818182,(eggs)
5,0.636364,(onion)
6,0.363636,"(beef, eggs)"
7,0.363636,"(beef, onion)"
8,0.363636,"(beer, carrot)"
9,0.545455,"(beer, crisps)"


With the `max_len` argument, we can specify the maximum length of the itemsets returned.
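
To illustrate how a support threshold and a length cap interact, the following sketch enumerates itemsets by brute force. It is not the Apriori algorithm, because it does no candidate pruning, and it is not the library's implementation. The function name `brute_force_itemsets` is hypothetical.

```python
from itertools import combinations

# Transactions copied from the example above.
data = [['onion', 'beer', 'crisps', 'beef'],
        ['beer', 'tomato', 'crisps', 'eggs'],
        ['onion', 'crisps', 'eggs'],
        ['beer', 'eggs', 'beef'],
        ['onion', 'beer', 'carrot', 'crisps'],
        ['onion', 'eggs', 'beef'],
        ['onion', 'beer', 'carrot', 'crisps', 'eggs', 'beef'],
        ['onion', 'beer', 'crisps', 'eggs'],
        ['beer', 'tomato', 'carrot', 'eggs'],
        ['onion', 'crisps', 'eggs', 'beef'],
        ['beer', 'carrot', 'crisps', 'eggs']]


def brute_force_itemsets(transactions, min_support, max_len=None):
    """Return {itemset: support} for every itemset meeting min_support.

    Checks every combination of items up to max_len; no Apriori pruning.
    """
    transactions = [set(t) for t in transactions]
    items = sorted(set().union(*transactions))
    limit = len(items) if max_len is None else min(max_len, len(items))
    result = {}
    for size in range(1, limit + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            supp = count / len(transactions)
            if supp >= min_support:
                result[candidate] = supp
    return result


pairs_at_most = brute_force_itemsets(data, min_support=0.3, max_len=2)
print(max(len(itemset) for itemset in pairs_at_most))  # 2
print(round(pairs_at_most[('beer', 'crisps')], 6))     # 0.545455
```

On a dataset this small the brute-force approach is fine; with many distinct items the number of combinations grows quickly, which is the cost that Apriori-style pruning is designed to avoid.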

## Section 2 -- Selecting and Filtering Results

The advantage of working with pandas `DataFrames` is that we can use their convenient features to filter the results. For example, suppose we are only interested in itemsets of length 2 that have a support of at least 50%. We can get the desired results as follows:

In [18]:
frequent_itemsets = apriori(df, min_support=0.3, show_colnames=True)
frequent_itemsets["length"] = frequent_itemsets["itemsets"].apply(len)
frequent_itemsets

,support,itemsets,length
0,0.454545,(beef),1
1,0.727273,(beer),1
2,0.363636,(carrot),1
3,0.727273,(crisps),1
4,0.818182,(eggs),1
5,0.636364,(onion),1
6,0.363636,"(beef, eggs)",2
7,0.363636,"(beef, onion)",2
8,0.363636,"(beer, carrot)",2
9,0.545455,"(beer, crisps)",2


In [19]:
frequent_itemsets[ (frequent_itemsets['length'] == 2) &
                   (frequent_itemsets['support'] >= 0.5) ]

,support,itemsets,length
9,0.545455,"(beer, crisps)",2
10,0.545455,"(beer, eggs)",2
12,0.545455,"(eggs, crisps)",2
13,0.545455,"(onion, crisps)",2
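
The same boolean-mask pattern extends to filtering by content, for example keeping only the pairs that contain a given item. The sketch below builds a small stand-in DataFrame by hand from values shown above, so it runs without the library. Storing the itemsets as `frozenset` objects is an assumption made for illustration.

```python
import pandas as pd

# Hand-built stand-in for the apriori output (assumption: itemsets are frozensets).
frequent_itemsets = pd.DataFrame({
    "support": [0.818182, 0.545455, 0.545455, 0.363636],
    "itemsets": [frozenset({"eggs"}),
                 frozenset({"beer", "crisps"}),
                 frozenset({"beer", "eggs"}),
                 frozenset({"beef", "eggs"})],
})
frequent_itemsets["length"] = frequent_itemsets["itemsets"].apply(len)

# Boolean mask: True where the itemset contains 'eggs'.
contains_eggs = frequent_itemsets["itemsets"].apply(lambda s: "eggs" in s)

# Combine masks with & to keep only pairs that include 'eggs'.
selected = frequent_itemsets[contains_eggs & (frequent_itemsets["length"] == 2)]
print(selected)
```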
