.. highlight:: shell

===============================
Compatibility with Scikit-Learn
===============================

`scikit-learn <https://scikit-learn.org/stable/>`_ is the gold standard for general-purpose
machine learning. As a rule of thumb, we follow scikit-learn's functional definitions.

-----------------

*scikit-learn* is a library for statistical learning, or **machine learning**.

*scikit-mine*, on the other hand, is a library for (still statistical) **pattern mining**.

So what does this change?
*scikit-mine* pays more attention to discrete values, because **it looks for co-occurring symbols in the data**.
To this end, we sometimes need to extend scikit-learn's capabilities so as to tightly integrate the notion
of symbols into our learning processes.
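
As a rough illustration of what "co-occurring symbols" means, the sketch below treats each row of the data as a *transaction* (a set of discrete symbols) and counts how many rows contain each pair of symbols. It uses plain Python only; it is not scikit-mine code, and the toy ``transactions`` and the ``min_support`` threshold are made up for the example.

.. code-block:: python

    from collections import Counter
    from itertools import combinations

    # Toy transactions: each row is a set of discrete symbols (made-up data).
    transactions = [
        {"bread", "butter", "milk"},
        {"bread", "butter"},
        {"milk", "cereal"},
        {"bread", "butter", "cereal"},
    ]


    def count_cooccurrences(rows):
        """Count, for each unordered pair of symbols, the rows containing both."""
        counts = Counter()
        for row in rows:
            # Sorting gives each pair one canonical order: ("a", "b"), never ("b", "a").
            counts.update(combinations(sorted(row), 2))
        return counts


    min_support = 2  # illustrative threshold: keep pairs found in at least 2 rows
    frequent_pairs = {
        pair: n
        for pair, n in count_cooccurrences(transactions).items()
        if n >= min_support
    }
    print(frequent_pairs)  # → {('bread', 'butter'): 3}

Pairs such as ``('bread', 'butter')`` are the kind of discrete regularity that pattern mining looks for, whereas most scikit-learn estimators expect numeric feature vectors.
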


Preprocessing
-------------
The *preprocessing* module implements a set of transformers/encoders
to get you from raw data to a more advanced, structured kind of data:
the kind of data that is easy to manage and likely to give you the best performance.

Sometimes *scikit-learn* provides exactly the tools we need, sometimes it does not.
**Scikit-mine addresses data ingestion by implementing its own preprocessing blocks,
in a fully scikit-learn-compatible way**.

The *preprocessing* module is designed to take this burden off you and to handle ingestion
smoothly: use it!
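
To show what "fully scikit-learn-compatible" means in practice, here is a minimal sketch of a *hypothetical* ``SymbolEncoder`` (not part of scikit-mine's actual API) that turns lists of symbols into a binary presence matrix. It follows the usual scikit-learn estimator conventions: hyperparameters are stored unchanged in ``__init__``, ``fit`` learns state and returns ``self``, learned attributes end with an underscore, and ``fit_transform`` is inherited from ``TransformerMixin``. It assumes NumPy and scikit-learn are installed.

.. code-block:: python

    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin


    class SymbolEncoder(TransformerMixin, BaseEstimator):
        """Hypothetical encoder: lists of symbols -> binary presence matrix."""

        def __init__(self, min_count=1):
            # scikit-learn convention: store hyperparameters as-is, no validation here.
            self.min_count = min_count

        def fit(self, X, y=None):
            # Count in how many rows each symbol appears.
            counts = {}
            for row in X:
                for symbol in set(row):
                    counts[symbol] = counts.get(symbol, 0) + 1
            # Learned state ends with "_"; sorting assumes mutually comparable symbols.
            self.symbols_ = sorted(s for s, n in counts.items() if n >= self.min_count)
            self.index_ = {s: j for j, s in enumerate(self.symbols_)}
            return self

        def transform(self, X):
            out = np.zeros((len(X), len(self.symbols_)), dtype=np.int8)
            for i, row in enumerate(X):
                for symbol in row:
                    j = self.index_.get(symbol)  # symbols unseen during fit are ignored
                    if j is not None:
                        out[i, j] = 1
            return out


    X = [["a", "b"], ["b", "c"], ["a", "b", "d"]]
    encoder = SymbolEncoder(min_count=2)
    X_bin = encoder.fit_transform(X)  # fit_transform is inherited from TransformerMixin
    print(encoder.symbols_)  # → ['a', 'b']
    print(X_bin.tolist())    # → [[1, 1], [0, 1], [1, 1]]

Because it derives from ``BaseEstimator``, such an encoder also gets ``get_params``/``set_params`` for free, which is what lets tools like ``clone`` and ``GridSearchCV`` work with it.
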


Pipelines
---------
*scikit-mine* models are designed to be integrated into `scikit-learn pipelines <https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_.
This makes it possible to build "symbolic classifiers" that use scikit-mine pattern encoding schemes
to serve predictions. See the tutorials section.
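
A "symbolic classifier" could be sketched as follows, under stated assumptions: the ``PairPatternEncoder`` below is a *hypothetical* stand-in for a scikit-mine pattern encoding scheme (it simply keeps symbol pairs that occur in at least ``min_support`` rows), and ``BernoulliNB`` is an arbitrary choice of scikit-learn classifier for binary features. The point is only that a pattern-based transformer exposing ``fit``/``transform`` can be chained with any scikit-learn estimator in a ``Pipeline``.

.. code-block:: python

    from itertools import combinations

    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.naive_bayes import BernoulliNB
    from sklearn.pipeline import Pipeline


    class PairPatternEncoder(TransformerMixin, BaseEstimator):
        """Hypothetical pattern encoder: one binary feature per frequent symbol pair."""

        def __init__(self, min_support=2):
            self.min_support = min_support

        def fit(self, X, y=None):
            counts = {}
            for row in X:
                for pair in combinations(sorted(set(row)), 2):
                    counts[pair] = counts.get(pair, 0) + 1
            # The mined "patterns": pairs present in at least `min_support` rows.
            self.patterns_ = sorted(p for p, n in counts.items() if n >= self.min_support)
            return self

        def transform(self, X):
            # Feature j is 1 when the row contains both symbols of pattern j.
            out = np.zeros((len(X), len(self.patterns_)), dtype=np.int8)
            for i, row in enumerate(X):
                items = set(row)
                for j, (a, b) in enumerate(self.patterns_):
                    out[i, j] = a in items and b in items
            return out


    # Toy labelled transactions (made-up data).
    X = [["a", "b", "c"], ["a", "b"], ["a", "b", "d"],
         ["c", "d"], ["c", "d", "e"], ["d", "c"]]
    y = [1, 1, 1, 0, 0, 0]

    clf = Pipeline([
        ("patterns", PairPatternEncoder(min_support=2)),  # symbols -> pattern features
        ("model", BernoulliNB()),                          # pattern features -> label
    ])
    clf.fit(X, y)
    print(clf.named_steps["patterns"].patterns_)       # → [('a', 'b'), ('c', 'd')]
    print(clf.predict([["a", "b", "e"], ["c", "d"]]))  # → [1 0]

Here the classifier only sees which mined patterns each transaction contains, so its predictions can be read back in terms of those patterns.
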


Other implementation details
----------------------------
We use `joblib <https://joblib.readthedocs.io/en/latest/>`_ by default to parallelise our code.
We also set the *prefer* parameter when instantiating `joblib.Parallel <https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html>`_,
so users don't have to choose manually between threads and processes for optimal execution.
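
The effect of ``prefer`` can be seen in a minimal sketch like the one below; the ``support`` helper, the toy data and the choice of ``"threads"`` are illustrative assumptions, not scikit-mine's actual code. ``prefer`` is only a soft hint: joblib uses it to select its default thread-based or process-based backend, and it is ignored when a backend is chosen explicitly (with the ``backend`` argument, or via the ``parallel_config`` context manager in recent joblib releases).

.. code-block:: python

    from joblib import Parallel, delayed

    # Toy data (made up): rows are sets of symbols, candidates are patterns to score.
    transactions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c"}]
    candidates = [frozenset({"a", "b"}), frozenset({"b", "c"}), frozenset({"a", "b", "c"})]


    def support(pattern, rows):
        """Number of rows containing every symbol of `pattern` (hypothetical helper)."""
        return sum(pattern <= row for row in rows)


    # prefer="threads" only hints at the default backend; for a toy workload this
    # small, threads avoid the start-up and pickling costs of worker processes.
    supports = Parallel(n_jobs=2, prefer="threads")(
        delayed(support)(pattern, transactions) for pattern in candidates
    )
    print(supports)  # → [2, 2, 1]

With ``prefer="processes"``, the same call would run in separate worker processes, which sidesteps the GIL for pure-Python work at the cost of start-up and serialisation overhead.
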

Finally, we also leverage `Cython <https://cython.org/>`_ code where performance matters.