Binary matrix as input #8
Labels: documentation, enhancement, help wanted
Projects
Workflow
For many people in the machine learning community, representing transactional datasets the way we do in skmine is unusual.
Moreover, a common workflow we will encounter at some point is feeding the output of our Transformers into sklearn.
In pattern mining, researchers are already familiar with matrix representations of transactional databases. Quoting Vreeken et al. from SLIM, Section 2.1:
Proposed solution
To bridge this gap, we should show some examples of working with a transactional dataset.
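As a minimal sketch of what such an example could contain, the snippet below builds the binary matrix view of a toy transactional database by hand with NumPy. The data and the `support` helper are hypothetical, for illustration only. Row i corresponds to transaction i, column j to item j, and the support of an itemset is the number of rows in which all of its columns are 1.

```python
import numpy as np

# Hypothetical toy transactional database: one list of items per transaction.
transactions = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"]]

# Fix a column order for the items (sorted, so the layout is reproducible).
items = sorted({item for t in transactions for item in t})
col = {item: j for j, item in enumerate(items)}

# D[i, j] == 1 iff item j occurs in transaction i.
D = np.zeros((len(transactions), len(items)), dtype=np.uint8)
for i, t in enumerate(transactions):
    for item in t:
        D[i, col[item]] = 1


def support(itemset):
    """Hypothetical helper: number of transactions (rows) containing every item of `itemset`."""
    cols = [col[item] for item in itemset]
    return int(D[:, cols].all(axis=1).sum())


print(items)                       # ['bread', 'butter', 'milk']
print(D)
print(support(["bread", "milk"]))  # 2
```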
We should be able to
The most straightforward solution seems to be using sklearn.preprocessing.MultiLabelBinarizer, in this way:
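A minimal sketch of this approach, assuming a toy list of transactions (this is not the snippet from the original issue): `fit_transform` returns a dense 0/1 array with one row per transaction and one column per item, `classes_` gives the (sorted) item order of the columns, and `inverse_transform` maps the matrix back to one tuple of items per transaction.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy transactional database.
transactions = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"]]

mlb = MultiLabelBinarizer()
# Rows are transactions, columns are items.
D = mlb.fit_transform(transactions)
print(mlb.classes_)  # item associated with each column
print(D)

# The encoding is invertible: back to transactions, items in column order.
print(mlb.inverse_transform(D))
```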
Describe alternatives you've considered, if relevant
If scikit-learn does not fit our purpose, we can still implement our own transformer.
However, the preferred solution would be to open a PR against scikit-learn.
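One concrete reason a custom transformer may be needed: `MultiLabelBinarizer.fit` only accepts `y`, so it cannot be used directly as a step of a `Pipeline`, which calls `fit_transform(X, y)` on its transformers. The sketch below wraps it behind the usual `fit(X, y=None)` / `transform(X)` signature. `TransactionBinarizer` is a hypothetical name, and the toy data and decision tree are only there to show a pipeline running end to end.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.tree import DecisionTreeClassifier


class TransactionBinarizer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: list of itemsets -> binary matrix.

    Wraps MultiLabelBinarizer behind the (X, y=None) signature
    that sklearn pipelines expect from transformers.
    """

    def fit(self, X, y=None):
        # Learn the item vocabulary (the matrix columns) from the transactions.
        self.mlb_ = MultiLabelBinarizer()
        self.mlb_.fit(X)
        return self

    def transform(self, X):
        # Items unseen during fit are ignored (MultiLabelBinarizer warns about them).
        return self.mlb_.transform(X)


# Hypothetical toy data: transactions with a binary class label each.
transactions = [["bread", "milk"], ["bread", "butter"], ["beer", "chips"], ["beer", "milk"]]
labels = [0, 0, 1, 1]

pipe = make_pipeline(TransactionBinarizer(), DecisionTreeClassifier(random_state=0))
pipe.fit(transactions, labels)
pred = pipe.predict([["bread", "butter"], ["beer"]])
print(pred)
```

Keeping the vocabulary fixed at `fit` time means the number of columns stays the same for any later input, which is what a downstream sklearn estimator requires.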
Additional context
Note that the MultiLabelBinarizer is only suitable for plain itemsets: it does not work out of the box on, e.g., sequences of itemsets, even though it can be tweaked for this purpose, e.g.:
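One possible tweak, sketched under the assumption that each sequence is a list of itemsets (this is not necessarily the tweak the original example used): prefix every item with the position of its itemset, e.g. "1:c", so that MultiLabelBinarizer only sees ordinary string labels. The `tag_positions` helper and the "position:item" encoding are hypothetical, chosen for illustration.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: each row is a sequence of itemsets, and order matters.
sequences = [
    [["a", "b"], ["c"]],  # {a, b} then {c}
    [["c"], ["a"]],       # {c} then {a}
]


def tag_positions(sequence):
    """Hypothetical helper: flatten a sequence of itemsets into "<position>:<item>" labels."""
    return [f"{pos}:{item}" for pos, itemset in enumerate(sequence) for item in itemset]


mlb = MultiLabelBinarizer()
D = mlb.fit_transform([tag_positions(s) for s in sequences])
print(list(mlb.classes_))  # ['0:a', '0:b', '0:c', '1:a', '1:c']
print(D)
```

Plain flattening would map [["a"], ["c"]] and [["c"], ["a"]] to the same row; position tags keep them apart. The price is that the same itemset at different positions lands in different columns, and the number of columns grows with sequence length.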