let fpgrowth and fpmax work directly with sparse input #622

Previously, transactions stored in a sparse matrix were first converted into a dense NumPy array in setup_fptree before processing. This commit builds the FPTree directly from sparse input, which may save memory and processing time. Both fpgrowth and fpmax use this code path. Nothing changes once the FPTree has been built.
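As a rough illustration of why the dense conversion is not needed, the sketch below reads item counts and per-transaction item lists straight from the CSR arrays. It is not the code from this PR: `sparse_item_support` and `iter_transactions` are hypothetical helper names, and the sketch assumes a CSR matrix without explicitly stored zeros or duplicate entries.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sparse_item_support(itemsets):
    """Count how many transactions contain each item (column).

    Hypothetical helper: assumes CSR input with no explicitly stored
    zeros and no duplicate entries, so every stored index is one hit.
    """
    return np.bincount(itemsets.indices, minlength=itemsets.shape[1])


def iter_transactions(itemsets):
    """Yield the column indices stored in each row, without densifying."""
    for i in range(itemsets.shape[0]):
        yield itemsets.indices[itemsets.indptr[i]:itemsets.indptr[i + 1]]


# Three transactions over four items, one-hot encoded.
dense = np.array(
    [[1, 0, 1, 0],
     [0, 1, 1, 0],
     [1, 1, 1, 0]],
    dtype=bool,
)
itemsets = csr_matrix(dense)

support = sparse_item_support(itemsets)
transactions = [t.tolist() for t in iter_transactions(itemsets)]
```

Only the stored entries are visited, so the work scales with the number of nonzero cells rather than with rows times columns.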

In a212806 we assumed that all stored values in a sparse DataFrame are non-null (nonzero), which should normally be true. To guard against the corner case of explicitly stored zeros, we now call itemsets.eliminate_zeros(). The alternative would be to replace

    nonnull = itemsets.indices[itemsets.indptr[i]:itemsets.indptr[i+1]]

by

    values = itemsets.data[itemsets.indptr[i]:itemsets.indptr[i+1]]
    nonnull = itemsets.indices[itemsets.indptr[i] + values.nonzero()[0]]

but that is slower. A test case is added.
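The corner case can be reproduced with a small, made-up CSR matrix that carries one explicitly stored zero. This is a sketch for illustration only; `row_items` and `row_items_filtered` are hypothetical helpers that mirror the two indexing variants discussed above, not functions from the library.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Row 0 stores three entries, one of which is an explicit zero (column 1).
data = np.array([1, 0, 1, 1])
indices = np.array([0, 1, 2, 1])
indptr = np.array([0, 3, 4])
itemsets = csr_matrix((data, indices, indptr), shape=(2, 3))


def row_items(m, i):
    """Stored column indices of row i; includes explicitly stored zeros."""
    return m.indices[m.indptr[i]:m.indptr[i + 1]]


def row_items_filtered(m, i):
    """Slower alternative: drop stored zeros for row i on every access."""
    values = m.data[m.indptr[i]:m.indptr[i + 1]]
    return m.indices[m.indptr[i] + values.nonzero()[0]]


before = row_items(itemsets, 0).tolist()             # stored zero leaks in
filtered = row_items_filtered(itemsets, 0).tolist()  # per-row filtering

itemsets.eliminate_zeros()                           # one in-place cleanup
after = row_items(itemsets, 0).tolist()              # plain slicing is now safe
```

Calling `eliminate_zeros()` once up front keeps the per-row lookup a plain slice, which is presumably why it is faster than filtering each row.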

Commits on Nov 6, 2019