init preprocessing/ module with an ItemsetEncoder #6

remiadon · 2020-04-21T19:18:30Z

Describe the workflow you want to enable

In pattern mining it is well established that items are represented as non-negative integers (ids). FIMI datasets, for instance, contain only integer entries.
Therefore we should implement an encoder to go from any raw data to an integer-based representation of items, getting closer to the kind of input one can find in the literature.

Describe your proposed solution

Instantiate a preprocessing/ module with a simple class, namely ItemsetEncoder, that provides a simple two-way mapping (something like a bi-dict) to encode any hashable item into an integer, and the other way around.
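To make the proposal concrete, here is a minimal sketch of what such an encoder could look like. Only the class name ItemsetEncoder comes from this issue; the method names (`fit`, `transform`, `inverse_transform`) and the attributes are assumptions chosen for illustration, not an existing API of this project.

```python
from typing import Dict, Hashable, Iterable, List


class ItemsetEncoder:
    """Hypothetical two-way mapping between hashable items and integer ids."""

    def __init__(self) -> None:
        self.item_to_id: Dict[Hashable, int] = {}  # item -> integer id
        self.id_to_item: List[Hashable] = []       # integer id -> item

    def fit(self, transactions: Iterable[Iterable[Hashable]]) -> "ItemsetEncoder":
        # Assign ids in order of first appearance, starting at 0.
        for transaction in transactions:
            for item in transaction:
                if item not in self.item_to_id:
                    self.item_to_id[item] = len(self.id_to_item)
                    self.id_to_item.append(item)
        return self

    def transform(self, transactions: Iterable[Iterable[Hashable]]) -> List[List[int]]:
        # Raises KeyError for items that were not seen during fit.
        return [[self.item_to_id[item] for item in t] for t in transactions]

    def inverse_transform(self, encoded: Iterable[Iterable[int]]) -> List[List[Hashable]]:
        return [[self.id_to_item[i] for i in t] for t in encoded]


transactions = [["bread", "milk"], ["milk", "eggs"], ["bread", "eggs", "milk"]]
encoder = ItemsetEncoder().fit(transactions)
encoded = encoder.transform(transactions)        # [[0, 1], [1, 2], [0, 2, 1]]
decoded = encoder.inverse_transform(encoded)     # same content as `transactions`
```

Keeping a dict for the forward direction and a list for the backward direction gives constant-time lookups both ways without needing an external bi-dict dependency.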

Describe alternatives you've considered, if relevant

It is not obvious, though, that this brings a performance improvement for the mining method applied downstream. From my first benchmark, it does when the mining method makes a lot of inter-item comparisons (e.g. our LCM implementation), but this may not hold true for every method.

Additional context

In terms of data engineering: should we make this integer-based representation mandatory and forbid any other input to our methods? This would come at the cost of harder data ingestion, and may not guarantee better algorithms.

remiadon · 2020-06-16T13:38:00Z

Two things here:

pandas already provides some utilities to convert labeled indexes to their positions, so better not to reinvent the wheel
defining this "convert to integer" step as a preprocessing step makes an external post-processing step mandatory in most cases (converting integers back to labels)
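As an illustration of both points, the sketch below uses two existing pandas utilities, `pd.factorize` and `pd.Index.get_indexer`, to map labels to integer positions, and then shows the extra step needed to get labels back. The item names are made up for the example, and this is not necessarily how any method in this project does it internally.

```python
import pandas as pd

# Build integer codes from raw labels, in order of first appearance.
raw_items = ["bread", "milk", "bread", "eggs"]
codes, uniques = pd.factorize(raw_items)
# codes   -> [0, 1, 0, 2]
# uniques -> ['bread', 'milk', 'eggs']

# Convert labels to their positions in an existing (unique) index.
items = pd.Index(uniques)
positions = items.get_indexer(["milk", "eggs"])  # -> [1, 2]

# The post-processing step: convert integer positions back to labels.
labels = items.take(positions)                   # -> ['milk', 'eggs']
```

Note that `get_indexer` returns -1 for labels that are absent from the index, so unseen items need explicit handling.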

From my own experience it's not very costly to do this as an internal step, e.g. in SLIM.

So closing this

remiadon added the enhancement label (New feature or request) on Apr 21, 2020

remiadon closed this as completed Jun 16, 2020
