## hmine: Frequent itemsets via the Hmine algorithm

A function that extracts frequent itemsets for association rule mining using the H-mine algorithm.

> from mlxtend.frequent_patterns import hmine

## Overview

H-mine [1] (memory-based hyperstructure mining of frequent patterns) is an algorithm for mining frequent patterns in memory, which its authors extend to handle large and/or dense databases.

It is built on a simple data structure based on hyper-links, the H-struct, and a mining algorithm, **H-mine**, which takes advantage of this data structure and dynamically adjusts links during the mining process.
A distinct feature of this method is that it has a very limited and precisely predictable space overhead and runs fast in a memory-based setting. Moreover, it can be scaled up
to very large databases by database partitioning, and when the data set becomes dense, (conditional) FP-trees can be constructed dynamically as part of the mining process.
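
To give a feel for depth-first mining over projected transactions, here is a deliberately simplified sketch. It is not the H-struct with hyper-links described in [1], and it is not how `mlxtend` implements `hmine`; the function name `mine_frequent_itemsets` and the toy data are assumptions made for illustration only.

```python
from collections import Counter


def mine_frequent_itemsets(transactions, min_count):
    """Toy depth-first miner over projected suffixes (illustrative only)."""
    n = len(transactions)
    # Deduplicate and sort every transaction so that each itemset is
    # generated exactly once, in one fixed item order.
    ordered = [tuple(sorted(set(t))) for t in transactions]
    results = {}

    def recurse(prefix, projected):
        # Count how often each item occurs in the projected suffixes.
        counts = Counter(item for suffix in projected for item in suffix)
        for item in sorted(counts):
            if counts[item] < min_count:
                continue
            itemset = prefix + (item,)
            results[itemset] = counts[item] / n
            # Project: keep only what follows `item` in the suffixes containing it.
            next_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            recurse(itemset, next_projected)

    recurse((), ordered)
    return results


toy = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
frequent = mine_frequent_itemsets(toy, min_count=2)
for itemset, supp in frequent.items():
    print(itemset, supp)
```

The sketch copies suffixes at every step, which is exactly the overhead the real H-mine algorithm avoids by re-linking entries in a single in-memory structure.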

## References

[1] Pei J, Han J, Lu H, Nishio S, Tang S and Yang D, "[H-Mine: Fast and space-preserving frequent pattern mining in large databases.](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=bcde042283427e23094f9d4d2b765771db5aa57f)" IIE Transactions, Vol. 39, pp. 593–605, 2007.

## Related
- [FP-Growth](./fpgrowth.md)
- [FP-Max](./fpmax.md)
- [Apriori](./apriori.md)

## Example 1 -- Generating Frequent Itemsets

The `hmine` function expects data in a one-hot encoded pandas DataFrame.
Suppose we have the following transaction data:

In [1]:
dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

We can transform it into the right format via the `TransactionEncoder` as follows:

In [2]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
df

|   | Apple | Corn | Dill | Eggs | Ice cream | Kidney Beans | Milk | Nutmeg | Onion | Unicorn | Yogurt |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | True | False | True | True | True | True | False | True |
| 1 | False | False | True | True | False | True | False | True | True | False | True |
| 2 | True | False | False | True | False | True | True | False | False | False | False |
| 3 | False | True | False | False | False | True | True | False | False | True | True |
| 4 | False | True | False | True | True | True | False | False | True | False | False |
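
For intuition, the encoding step can be approximated by hand in plain Python. This is only a sketch of the idea, assuming alphabetically sorted columns as in the output above; it is not the `TransactionEncoder` implementation, and the three-transaction `toy_dataset` is made up for brevity.

```python
# Hypothetical miniature dataset; note the duplicated 'Onion' in the second transaction.
toy_dataset = [["Milk", "Onion", "Eggs"],
               ["Onion", "Onion", "Eggs"],
               ["Milk", "Apple"]]

# One column per distinct item, sorted alphabetically.
toy_columns = sorted({item for transaction in toy_dataset for item in transaction})

# One boolean per (transaction, column) pair; duplicates collapse to a single True.
toy_rows = [[item in set(transaction) for item in toy_columns]
            for transaction in toy_dataset]

print(toy_columns)
for row in toy_rows:
    print(row)
```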


Now, let us return the items and itemsets with at least 60% support:

In [3]:
from mlxtend.frequent_patterns import hmine

hmine(df, min_support=0.6)

|   | support | itemsets |
|---|---|---|
| 0 | 0.8 | (3) |
| 1 | 0.8 | (3, 5) |
| 2 | 0.6 | (8, 3, 5) |
| 3 | 0.6 | (8, 3) |
| 4 | 1.0 | (5) |
| 5 | 0.6 | (5, 6) |
| 6 | 0.6 | (8, 5) |
| 7 | 0.6 | (10, 5) |
| 8 | 0.6 | (6) |
| 9 | 0.6 | (8) |
| 10 | 0.6 | (10) |
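
The support values can be checked by hand: the support of an itemset is the fraction of transactions that contain all of its items. The helper name `support` below is an assumption for illustration and is not part of `mlxtend`.

```python
transactions = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
                ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
                ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
                ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
                ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions)


print(support({'Eggs', 'Onion'}, transactions))   # 3 of 5 transactions
print(support({'Kidney Beans'}, transactions))    # all 5 transactions
print(support({'Milk', 'Onion'}, transactions))   # 1 of 5, below the 60% threshold
```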


By default, `hmine` returns the column indices of the items, which may be useful in downstream operations such as association rule mining. For better readability, we can set `use_colnames=True` to convert these integer values into the respective item names: 

In [4]:
hmine(df, min_support=0.6, use_colnames=True)

|   | support | itemsets |
|---|---|---|
| 0 | 0.8 | (Eggs) |
| 1 | 0.8 | (Eggs, Kidney Beans) |
| 2 | 0.6 | (Eggs, Kidney Beans, Onion) |
| 3 | 0.6 | (Eggs, Onion) |
| 4 | 1.0 | (Kidney Beans) |
| 5 | 0.6 | (Milk, Kidney Beans) |
| 6 | 0.6 | (Kidney Beans, Onion) |
| 7 | 0.6 | (Yogurt, Kidney Beans) |
| 8 | 0.6 | (Milk) |
| 9 | 0.6 | (Onion) |
| 10 | 0.6 | (Yogurt) |
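
The link between the integer itemsets and the named itemsets is a plain lookup into the DataFrame's column order. A minimal sketch, where `to_names` is a hypothetical helper and `item_columns` repeats the column order shown earlier:

```python
item_columns = ['Apple', 'Corn', 'Dill', 'Eggs', 'Ice cream', 'Kidney Beans',
                'Milk', 'Nutmeg', 'Onion', 'Unicorn', 'Yogurt']


def to_names(index_itemset, columns):
    """Translate an itemset of column indices into an itemset of item names."""
    return frozenset(columns[i] for i in index_itemset)


print(to_names((8, 3, 5), item_columns))
print(to_names((10, 5), item_columns))
```

Because itemsets are unordered sets, the order in which the names are printed may differ from the order of the indices.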


## Example 2 -- Hmine versus Apriori and FPGrowth

Since `hmine` is a memory-based algorithm, it can be orders of magnitude faster than the alternative Apriori algorithm on large datasets. However, it can be much slower than the FP-Growth algorithm. In the following example, we compare the running time of `hmine` with that of the `apriori` and `fpgrowth` algorithms. Note that the timings below reuse the small five-transaction dataset from Example 1, so they only hint at the differences that appear on large datasets.

In [5]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

In [6]:
from mlxtend.frequent_patterns import apriori

%timeit -n 100 -r 10 apriori(df, min_support=0.6, use_colnames=True)

3.41 ms ± 584 µs per loop (mean ± std. dev. of 10 runs, 100 loops each)


In [7]:
%timeit -n 100 -r 10 apriori(df, min_support=0.6, use_colnames=True, low_memory=True)

3.36 ms ± 404 µs per loop (mean ± std. dev. of 10 runs, 100 loops each)


In [8]:
from mlxtend.frequent_patterns import fpgrowth

%timeit -n 100 -r 10 fpgrowth(df, min_support=0.6, use_colnames=True)

1.18 ms ± 76.7 µs per loop (mean ± std. dev. of 10 runs, 100 loops each)


In [9]:
from mlxtend.frequent_patterns import hmine

%timeit -n 100 -r 10 hmine(df, min_support=0.6, use_colnames=True)

1.44 ms ± 94.8 µs per loop (mean ± std. dev. of 10 runs, 100 loops each)
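
The `%timeit` magic is only available in IPython and Jupyter. In a plain Python script, a comparable measurement can be sketched with the standard-library `timeit` module. The helper `time_call` and the stand-in workload are assumptions for illustration; in practice you would pass something like `lambda: hmine(df, min_support=0.6, use_colnames=True)`.

```python
import statistics
import timeit


def time_call(func, number=100, repeat=10):
    """Return (mean, std. dev.) of the per-loop time in seconds over `repeat` runs."""
    totals = timeit.repeat(func, number=number, repeat=repeat)
    per_loop = [total / number for total in totals]
    return statistics.mean(per_loop), statistics.stdev(per_loop)


# Stand-in workload so that the sketch runs without mlxtend installed.
mean_s, std_s = time_call(lambda: sorted(range(1000)), number=100, repeat=10)
print(f"{mean_s * 1e3:.3f} ms ± {std_s * 1e6:.1f} µs per loop "
      f"(mean ± std. dev. of 10 runs, 100 loops each)")
```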


## Example 3 -- Working with Sparse Representations

To save memory, you may want to represent your transaction data in the sparse format.
This is especially useful if you have lots of products and small transactions.

In [10]:
oht_ary = te.fit(dataset).transform(dataset, sparse=True)
sparse_df = pd.DataFrame.sparse.from_spmatrix(oht_ary, columns=te.columns_)
sparse_df

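|   | Apple | Corn | Dill | Eggs | Ice cream | Kidney Beans | Milk | Nutmeg | Onion | Unicorn | Yogurt |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 1 | 0 | True | 1 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 | 1 | 0 | True | 0 | 1 | 1 | 0 | 1 |
| 2 | 1 | 0 | 0 | 1 | 0 | True | 1 | 0 | 0 | 0 | 0 |
| 3 | 0 | 1 | 0 | 0 | 0 | True | 1 | 0 | 0 | 1 | 1 |
| 4 | 0 | 1 | 0 | 1 | 1 | True | 0 | 0 | 1 | 0 | 0 |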
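
To see why a sparse layout saves memory, compare the number of cells in the dense table with the number of entries that are actually set. The sketch below stores only the column indices of the items present in each transaction; it is a rough illustration in plain Python, not the storage format used internally by SciPy or pandas, and the variable names are made up.

```python
baskets = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

names = sorted({item for basket in baskets for item in basket})
col_index = {name: i for i, name in enumerate(names)}

# Sparse view: per transaction, keep only the indices of the columns that are set.
sparse_rows = [sorted({col_index[item] for item in basket}) for basket in baskets]

dense_cells = len(baskets) * len(names)
stored_entries = sum(len(row) for row in sparse_rows)
print(f"dense cells: {dense_cells}, stored entries: {stored_entries}")
```

The gap widens as the number of distinct products grows while individual transactions stay small.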


In [11]:
hmine(sparse_df, min_support=0.6, use_colnames=True, verbose=1)

2 itemset(s) from the suffixes on item(s) (Eggs)
1 itemset(s) from the suffixes on item(s) (Eggs, Kidney Beans)
0 itemset(s) from the suffixes on item(s) (Eggs, Kidney Beans, Onion)
0 itemset(s) from the suffixes on item(s) (Eggs, Onion)
3 itemset(s) from the suffixes on item(s) (Kidney Beans)
0 itemset(s) from the suffixes on item(s) (Kidney Beans, Milk)
0 itemset(s) from the suffixes on item(s) (Kidney Beans, Onion)
0 itemset(s) from the suffixes on item(s) (Kidney Beans, Yogurt)
0 itemset(s) from the suffixes on item(s) (Milk)
0 itemset(s) from the suffixes on item(s) (Onion)
0 itemset(s) from the suffixes on item(s) (Yogurt)


|   | support | itemsets |
|---|---|---|
| 0 | 0.8 | (Eggs) |
| 1 | 0.8 | (Eggs, Kidney Beans) |
| 2 | 0.6 | (Eggs, Kidney Beans, Onion) |
| 3 | 0.6 | (Eggs, Onion) |
| 4 | 1.0 | (Kidney Beans) |
| 5 | 0.6 | (Milk, Kidney Beans) |
| 6 | 0.6 | (Kidney Beans, Onion) |
| 7 | 0.6 | (Yogurt, Kidney Beans) |
| 8 | 0.6 | (Milk) |
| 9 | 0.6 | (Onion) |
| 10 | 0.6 | (Yogurt) |


## More Examples

Please note that the `hmine` function is designed as a drop-in replacement for `apriori`: it takes largely the same function arguments and returns its results in the same format. Thus, for more examples, please see the [`apriori`](./apriori.md) documentation.

## API

In [None]:
with open('../../api_modules/mlxtend.frequent_patterns/hmine.md', 'r') as f:
    print(f.read())