sklearn-porter


Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
It's recommended for resource-constrained embedded systems and critical applications where performance matters most.

Machine learning algorithms

Algorithm Programming language
Classifier Java * JS C Go PHP Ruby
svm.SVC , ✓ ᴵ
svm.NuSVC , ✓ ᴵ
svm.LinearSVC , ✓ ᴵ
tree.DecisionTreeClassifier , ✓ ᴱ, ✓ ᴵ , ✓ ᴱ , ✓ ᴱ , ✓ ᴱ , ✓ ᴱ , ✓ ᴱ
ensemble.RandomForestClassifier ✓ ᴱ, ✓ ᴵ ✓ ᴱ ✓ ᴱ ✓ ᴱ ✓ ᴱ ✓ ᴱ
ensemble.ExtraTreesClassifier ✓ ᴱ, ✓ ᴵ ✓ ᴱ ✓ ᴱ ✓ ᴱ ✓ ᴱ
ensemble.AdaBoostClassifier ✓ ᴱ, ✓ ᴵ ✓ ᴱ, ✓ ᴵ ✓ ᴱ
neighbors.KNeighborsClassifier , ✓ ᴵ , ✓ ᴵ
naive_bayes.GaussianNB , ✓ ᴵ
naive_bayes.BernoulliNB , ✓ ᴵ
neural_network.MLPClassifier , ✓ ᴵ , ✓ ᴵ
Regressor
neural_network.MLPRegressor

✓ = is full-featured, ᴱ = with embedded model data, ᴵ = with imported model data, * = default language

Installation

$ pip install sklearn-porter

If you want the latest changes, you can install the module from the master branch:

$ pip uninstall -y sklearn-porter
$ pip install --no-cache-dir https://github.com/nok/sklearn-porter/zipball/master

Usage

Export

The following example demonstrates how you can transpile a decision tree estimator to Java:

from sklearn.datasets import load_iris
from sklearn import tree
from sklearn_porter import Porter

# load data and train the classifier:
samples = load_iris()
X, y = samples.data, samples.target
clf = tree.DecisionTreeClassifier()
clf.fit(X, y)

# export:
porter = Porter(clf, language='java')
output = porter.export(embed_data=True)
print(output)

The exported result matches the official human-readable version of the decision tree.
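To illustrate what "embedded model data" means for a decision tree, here is a minimal sketch that turns a small tree into nested if/else statements. It is a toy generator, not sklearn-porter's actual implementation; the dictionary layout, the function name `to_source`, and the thresholds are hypothetical and chosen only for illustration.

```python
# Illustrative only: a toy code generator, not sklearn-porter's implementation.
# A node is either a leaf {"label": int} or a split
# {"feature": int, "threshold": float, "left": node, "right": node}.

def to_source(node, depth=1):
    """Render a tree as nested if/else statements (Java-like syntax)."""
    pad = "    " * depth
    if "label" in node:
        return f"{pad}return {node['label']};\n"
    return (
        f"{pad}if (features[{node['feature']}] <= {node['threshold']}) {{\n"
        + to_source(node["left"], depth + 1)
        + f"{pad}}} else {{\n"
        + to_source(node["right"], depth + 1)
        + f"{pad}}}\n"
    )

# Hypothetical two-split tree; the thresholds are made up for this example.
toy_tree = {
    "feature": 2, "threshold": 2.5,
    "left": {"label": 0},
    "right": {
        "feature": 3, "threshold": 1.75,
        "left": {"label": 1},
        "right": {"label": 2},
    },
}

source = "int predict(double[] features) {\n" + to_source(toy_tree) + "}\n"
print(source)
```

Because every threshold and label ends up as a literal in the generated source, the result needs no model file at runtime, which is the idea behind the ᴱ marker in the table above.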

Integrity

You should always verify that the transpiled estimator reproduces the predictions of the original estimator by computing the integrity score:

# ...
porter = Porter(clf, language='java')

# accuracy:
integrity = porter.integrity_score(X)
print(integrity)  # 1.0
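The score can be read as an accuracy between two sets of predictions. A minimal sketch, assuming the integrity score is the fraction of samples for which the original and the transpiled estimator return the same label; the helper `agreement` is hypothetical and not part of sklearn-porter:

```python
def agreement(original_preds, transpiled_preds):
    """Fraction of samples on which both prediction lists agree."""
    if len(original_preds) != len(transpiled_preds):
        raise ValueError("prediction lists must have the same length")
    if not original_preds:
        raise ValueError("at least one prediction is required")
    matches = sum(a == b for a, b in zip(original_preds, transpiled_preds))
    return matches / len(original_preds)

# Made-up labels for illustration: one of four predictions differs.
print(agreement([0, 1, 2, 1], [0, 1, 2, 2]))  # 0.75
```

Under this reading, any value below 1.0 means that at least one sample is classified differently by the transpiled code.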

Prediction

You can compute the prediction(s) in the target programming language:

# ...
porter = Porter(clf, language='java')

# prediction(s):
Y_java = porter.predict(X)
y_java = porter.predict(X[0])
y_java = porter.predict([1., 2., 3., 4.])

Notebooks

You can run and test all notebooks by starting a Jupyter notebook server locally:

$ make open.examples
$ make stop.examples 

Command-line interface

In general you can use the porter on the command line:

$ porter --input <PICKLE_FILE> [--output <DEST_DIR>]
         [--class_name <CLASS_NAME>] [--method_name <METHOD_NAME>]
         [--export] [--checksum] [--data] [--pipe]
         [--c] [--java] [--js] [--go] [--php] [--ruby]
         [--help] [--version]

The following example shows how you can save a trained estimator to the pickle format:

# ...
import joblib

# save the trained estimator:
joblib.dump(clf, 'estimator.pkl', compress=0)

After that the estimator can be transpiled to JavaScript by using the following command:

$ porter -i estimator.pkl --js

The target programming language is changeable on the fly:

$ porter -i estimator.pkl --c
$ porter -i estimator.pkl --java
$ porter -i estimator.pkl --php
$ porter -i estimator.pkl --go
$ porter -i estimator.pkl --ruby

For further processing the argument --pipe can be used to pass the result:

$ porter -i estimator.pkl --js --pipe > estimator.js

For instance the result can be minified by using UglifyJS:

$ porter -i estimator.pkl --js --pipe | uglifyjs --compress -o estimator.min.js
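The same pipeline can be driven from Python. A minimal sketch, assuming the `porter` executable is on the PATH and accepts the flags shown in the synopsis above; the helper `build_command` is hypothetical and not part of sklearn-porter:

```python
import os
import shutil
import subprocess

def build_command(pickle_file, language="js"):
    """Build the argument list for `porter --input <file> --<language> --pipe`."""
    return ["porter", "--input", pickle_file, f"--{language}", "--pipe"]

cmd = build_command("estimator.pkl", language="js")
print(cmd)

# Only run the command if the CLI is installed and the pickle file exists.
if shutil.which("porter") and os.path.exists("estimator.pkl"):
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    with open("estimator.js", "w") as handle:
        handle.write(result.stdout)
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` raises an exception if the command exits with a non-zero status.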

Development

Environment

You have to install required modules for broader development:

$ make install.environment  # conda environment (optional)
$ make install.requirements.development  # pip requirements

Independently, the following compilers and interpreters are required to cover all tests:

Name Version Command
GCC >=4.2 gcc --version
Java >=1.6 java -version
PHP >=5.6 php --version
Ruby >=2.4.1 ruby --version
Go >=1.7.4 go version
Node.js >=6 node --version
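Before running the full test suite it can help to check which of these tools are installed. A minimal sketch using only the standard library; it checks for the presence of each executable from the "Command" column, not for the minimum version, and the helper `find_tools` is hypothetical:

```python
import shutil

# Executables taken from the "Command" column of the table above.
REQUIRED_TOOLS = ["gcc", "java", "php", "ruby", "go", "node"]

def find_tools(names):
    """Map each executable name to its resolved path, or None if missing."""
    return {name: shutil.which(name) for name in names}

for name, path in find_tools(REQUIRED_TOOLS).items():
    print(f"{name:5s} {'found at ' + path if path else 'MISSING'}")
```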

Testing

The tests cover module functions as well as matching predictions of transpiled estimators. Start all tests with:

$ make test 

The test files follow the naming pattern '[Algorithm][Language]Test.py', so you can run a subset of the tests by filtering on the file name:

$ pytest tests -v -o python_files='RandomForest*Test.py'
$ pytest tests -v -o python_files='*JavaTest.py'
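The `python_files` option takes glob-style patterns. A small sketch with the standard-library `fnmatch` module approximates which files such a pattern selects; the file names are hypothetical examples that follow the naming pattern, and the helper `select` is not part of the project:

```python
from fnmatch import fnmatch

# Hypothetical file names that follow '[Algorithm][Language]Test.py'.
test_files = [
    "RandomForestClassifierJavaTest.py",
    "RandomForestClassifierJSTest.py",
    "DecisionTreeClassifierJavaTest.py",
    "DecisionTreeClassifierCTest.py",
]

def select(pattern, names):
    """Return the names matched by a glob-style pattern."""
    return [name for name in names if fnmatch(name, pattern)]

print(select("RandomForest*Test.py", test_files))
print(select("*JavaTest.py", test_files))
```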

While you are developing new features or fixes, you can reduce the test duration by changing the number of tests:

$ N_RANDOM_FEATURE_SETS=5 N_EXISTING_FEATURE_SETS=10 \
  pytest tests -v -o python_files='*JavaTest.py'
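Environment variables like these are typically read with `os.environ`. A minimal sketch of that pattern; the helper `env_int` and the default values are placeholders and do not reflect the project's actual code or defaults:

```python
import os

def env_int(name, default):
    """Read a positive integer from the environment, falling back to a default."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = int(raw)
    if value < 1:
        raise ValueError(f"{name} must be >= 1, got {value}")
    return value

# The defaults below are placeholders, not the project's real defaults.
n_random = env_int("N_RANDOM_FEATURE_SETS", 30)
n_existing = env_int("N_EXISTING_FEATURE_SETS", 30)
print(n_random, n_existing)
```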

Quality

Code quality is checked with Pylint, and it's highly recommended to run it before submitting changes. Start the linter with:

$ make lint

Citation

If you use this implementation in your work, please add a reference/citation to the paper. You can use the following BibTeX entry:

@unpublished{skpodamo,
  author = {Darius Morawiec},
  title = {sklearn-porter},
  note = {Transpile trained scikit-learn estimators to C, Java, JavaScript and others},
  url = {https://github.com/nok/sklearn-porter}
}

License

The module is Open Source Software released under the MIT license.

Questions?

Don't be shy and feel free to contact me on Twitter or Gitter (https://gitter.im/nok/sklearn-porter).