
sklearn-porter

Join the chat at https://gitter.im/nok/sklearn-porter

Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
It's recommended for resource-constrained embedded systems and critical applications where performance matters most.

Machine learning algorithms

Target programming languages (support varies by estimator): C, Java*, JavaScript, Go, PHP, Ruby

Classification:
- sklearn.svm.SVC
- sklearn.svm.NuSVC
- sklearn.svm.LinearSVC
- sklearn.tree.DecisionTreeClassifier
- sklearn.ensemble.RandomForestClassifier
- sklearn.ensemble.ExtraTreesClassifier
- sklearn.ensemble.AdaBoostClassifier
- sklearn.neighbors.KNeighborsClassifier
- sklearn.neural_network.MLPClassifier
- sklearn.naive_bayes.GaussianNB
- sklearn.naive_bayes.BernoulliNB

Regression:
- sklearn.neural_network.MLPRegressor

* = default language

Installation

pip install sklearn-porter

If you want the latest bleeding-edge changes, you can install the module from the master (development) branch:

pip uninstall -y sklearn-porter
pip install --no-cache-dir https://github.com/nok/sklearn-porter/zipball/master

Minimum requirements

- python>=2.7.3
- scikit-learn>=0.14.1

To transpile a multilayer perceptron (MLPClassifier or MLPRegressor), you need a newer version of scikit-learn:

- python>=2.7.3
- scikit-learn>=0.18.0
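If you are unsure which scikit-learn version is installed, a small check like the following can help before you transpile a multilayer perceptron. This is a minimal sketch that uses only the standard library. The helper names parse_version and meets_minimum are hypothetical and not part of sklearn-porter. The helpers compare only the leading numeric components, so pre-release suffixes such as rc1 are ignored.

```python
def parse_version(version):
    """Turn a version string like '0.18.1' into a tuple such as (0, 18, 1).

    Simplification: only leading digits of each component are kept, and
    parsing stops at the first component without digits (e.g. 'dev0').
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that '0.18' and '0.18.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

For example, `meets_minimum(sklearn.__version__, '0.18.0')` returns True when the installed release satisfies the requirement above.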

Usage

Export

The following example shows how you can port a decision tree model to Java:

from sklearn.datasets import load_iris
from sklearn import tree
from sklearn_porter import Porter

# Load data and train the classifier:
samples = load_iris()
X, y = samples.data, samples.target
clf = tree.DecisionTreeClassifier()
clf.fit(X, y)

# Export:
porter = Porter(clf, language='java')
output = porter.export()
print(output)

The exported code implements the same decision rules as the official human-readable visualization of this decision tree.
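The following sketch gives an intuition for what the export does. It is heavily simplified and is not sklearn-porter's actual implementation, which renders language templates. It assumes only the parallel-array layout that scikit-learn exposes on a fitted tree via clf.tree_ (children_left, children_right, feature, threshold, value) and emits one Java-style if/else per split. The toy tree is hand-made and its numbers are purely illustrative.

```python
TREE_LEAF = -1  # scikit-learn marks missing children with -1

# Hand-made toy tree using the parallel-array layout of a fitted clf.tree_.
# All numbers are illustrative; "value" is simplified to one list of
# per-class counts per node.
toy_tree = {
    "children_left": [1, TREE_LEAF, 3, TREE_LEAF, TREE_LEAF],
    "children_right": [2, TREE_LEAF, 4, TREE_LEAF, TREE_LEAF],
    "feature": [2, -2, 3, -2, -2],
    "threshold": [2.45, -2.0, 1.75, -2.0, -2.0],
    "value": [[50, 50, 50], [50, 0, 0], [0, 50, 50], [0, 49, 5], [0, 1, 45]],
}


def emit_node(tree, node, depth):
    """Return Java-style source for the subtree rooted at `node`."""
    indent = "    " * depth
    if tree["children_left"][node] == TREE_LEAF:
        counts = tree["value"][node]
        # A leaf predicts the class with the highest count.
        return "{}return {};\n".format(indent, counts.index(max(counts)))
    # scikit-learn sends samples with feature <= threshold to the left child.
    code = "{}if (features[{}] <= {}) {{\n".format(
        indent, tree["feature"][node], tree["threshold"][node])
    code += emit_node(tree, tree["children_left"][node], depth + 1)
    code += indent + "} else {\n"
    code += emit_node(tree, tree["children_right"][node], depth + 1)
    code += indent + "}\n"
    return code


def port_to_java(tree):
    """Wrap the generated if/else cascade in a Java predict method."""
    return ("public static int predict(double[] features) {\n"
            + emit_node(tree, 0, 1)
            + "}\n")


print(port_to_java(toy_tree))
```

With a real model you would read the same arrays from clf.tree_. For a single-output classifier, the per-class values of node i are in clf.tree_.value[i][0].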

Prediction

Run the prediction(s) in the target programming language directly:

# ...
porter = Porter(clf, language='java')

# Prediction(s):
Y_preds = porter.predict(X)
y_pred = porter.predict(X[0])
y_pred = porter.predict([1., 2., 3., 4.])
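As shown above, predict accepts either a whole feature matrix or a single feature vector. One simple way to support both call styles is to check whether the first element is itself a sequence. The sketch below assumes the ported program classifies one sample per call. The names predict_any and predict_one are hypothetical and not sklearn-porter's API.

```python
def is_single_sample(X):
    """Return True if X is one flat feature vector such as [1., 2., 3., 4.]."""
    return len(X) > 0 and not hasattr(X[0], "__len__")


def predict_any(predict_one, X):
    """Return one label for a single sample, or a list of labels for a batch.

    predict_one is a hypothetical callable that classifies a single feature
    vector, for example by running the transpiled program once.
    """
    if is_single_sample(X):
        return predict_one(X)
    return [predict_one(sample) for sample in X]


def threshold_rule(sample):
    """Stand-in classifier for the demo (not a ported model)."""
    return int(sum(sample) > 5.0)


print(predict_any(threshold_rule, [1., 2., 3., 4.]))                      # 1
print(predict_any(threshold_rule, [[1., 2., 3., 4.], [0., 1., 0., 1.]]))  # [1, 0]
```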

Accuracy

Always compute the accuracy of the ported estimator with respect to the original estimator:

# ...
porter = Porter(clf, language='java')

# Accuracy:
accuracy = porter.predict_test(X)
print(accuracy) # 1.0
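This accuracy measures how well the ported estimator reproduces the original one, so 1.0 means identical predictions. The sketch below assumes it is computed as the share of samples on which both estimators predict the same label; the exact implementation inside predict_test may differ. The helper name prediction_agreement is hypothetical.

```python
def prediction_agreement(original_preds, ported_preds):
    """Fraction of samples where the original and the ported estimator agree."""
    if len(original_preds) != len(ported_preds):
        raise ValueError("both prediction sequences must have the same length")
    if len(original_preds) == 0:
        raise ValueError("at least one prediction is required")
    matches = sum(1 for a, b in zip(original_preds, ported_preds) if a == b)
    return matches / float(len(original_preds))


print(prediction_agreement([0, 1, 2, 2], [0, 1, 2, 1]))  # 0.75
```

In practice you would pass clf.predict(X) and porter.predict(X). A value below 1.0 points to samples where the ported code diverges from the original model.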

Command-line interface

This example shows how you can port a model from the command line. First, you have to save the model in the pickle format:

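import joblib  # older scikit-learn releases: from sklearn.externals import joblib

# ...

# Save the trained estimator in the pickle format:
joblib.dump(clf, 'estimator.pkl')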

After that, the model can be transpiled with the following command:

python -m sklearn_porter --input <PICKLE_FILE> [--output <DEST_DIR>] [--pipe] [--c] [--java] [--js] [--go] [--php] [--ruby]
python -m sklearn_porter -i <PICKLE_FILE> [-o <DEST_DIR>] [-p] [--c] [--java] [--js] [--go] [--php] [--ruby]

For instance, the following command transpiles the estimator to JavaScript:

python -m sklearn_porter -i estimator.pkl --js

You can switch the target programming language by changing the language flag:

python -m sklearn_porter -i estimator.pkl --c
python -m sklearn_porter -i estimator.pkl --go
python -m sklearn_porter -i estimator.pkl --php
python -m sklearn_porter -i estimator.pkl --java
python -m sklearn_porter -i estimator.pkl --ruby
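If you need the same pickle in several languages, you can also drive the command-line interface from Python. The sketch below only combines the flags documented above. The helpers build_command and transpile_all are hypothetical, and the pickle is assumed to be saved as estimator.pkl.

```python
import subprocess
import sys

# Language flags as documented for the sklearn_porter command-line interface.
LANGUAGE_FLAGS = {
    "c": "--c",
    "java": "--java",
    "js": "--js",
    "go": "--go",
    "php": "--php",
    "ruby": "--ruby",
}


def build_command(pickle_file, language, output_dir=None, pipe=False):
    """Assemble the argument list for one `python -m sklearn_porter` call."""
    cmd = [sys.executable, "-m", "sklearn_porter", "-i", pickle_file,
           LANGUAGE_FLAGS[language]]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if pipe:
        cmd.append("-p")
    return cmd


def transpile_all(pickle_file, output_dir=None):
    """Transpile one pickled estimator into each language listed above.

    check_call raises CalledProcessError if a run exits with a non-zero status.
    """
    for language in sorted(LANGUAGE_FLAGS):
        subprocess.check_call(build_command(pickle_file, language, output_dir))
```

Calling transpile_all('estimator.pkl') runs the commands shown above one after another, using the current Python interpreter. Not every estimator supports every language, so a failing run may simply mean the combination is unsupported.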

With the --pipe parameter, the transpiled estimator is printed to stdout, so it can be redirected or processed further:

python -m sklearn_porter -i estimator.pkl --js --pipe > estimator.js

For instance, the generated JavaScript code can be minified with UglifyJS:

python -m sklearn_porter -i estimator.pkl --js --pipe | uglifyjs --compress -o estimator.min.js 

Use the --help parameter to show further information:

python -m sklearn_porter --help
python -m sklearn_porter -h

Development

Environment

Install the required environment modules by executing the script environment.sh:

bash ./recipes/environment.sh
conda env create -c conda-forge -n sklearn-porter python=2 -f environment.yml
source activate sklearn-porter

The following compilers or interpreters are required to cover all tests:

Testing

The tests cover module functions as well as matching predictions of transpiled models. Run all tests by executing the script test.sh:

bash ./recipes/test.sh
python -m unittest discover -vp '*Test.py'

The test files have a specific pattern: '[Algorithm][Language]Test.py':

python -m unittest discover -vp 'RandomForest*Test.py'
python -m unittest discover -vp '*JavaTest.py'
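unittest discover matches the -p pattern against test file names using shell-style wildcards, following the same rules as Python's fnmatch module. The following sketch shows which files the two patterns above would select. The file names are hypothetical examples of the '[Algorithm][Language]Test.py' pattern, not a listing of the repository.

```python
from fnmatch import fnmatch

# Hypothetical file names following the '[Algorithm][Language]Test.py' pattern.
test_files = [
    "RandomForestClassifierJavaTest.py",
    "RandomForestClassifierJSTest.py",
    "DecisionTreeClassifierJavaTest.py",
    "SVCGoTest.py",
]


def select_tests(files, pattern):
    """Keep only the file names that match the shell-style pattern."""
    return [name for name in files if fnmatch(name, pattern)]


print(select_tests(test_files, "RandomForest*Test.py"))
# ['RandomForestClassifierJavaTest.py', 'RandomForestClassifierJSTest.py']
print(select_tests(test_files, "*JavaTest.py"))
# ['RandomForestClassifierJavaTest.py', 'DecisionTreeClassifierJavaTest.py']
```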

While you are developing new features or fixes, you can shorten the test run by reducing the number of random and existing feature sets that each transpiled model is tested against:

N_RANDOM_FEATURE_SETS=15 N_EXISTING_FEATURE_SETS=30 python -m unittest discover -vp '*Test.py'
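Both settings are passed as environment variables. How the test suite reads them internally is not shown here. The sketch below shows one common approach under that assumption; the read_count helper is hypothetical, and its fallback value is a placeholder.

```python
import os


def read_count(name, default):
    """Read a positive integer from the environment variable `name`.

    Falls back to `default` when the variable is unset or empty.
    """
    raw = os.environ.get(name, "").strip()
    if not raw:
        return default
    value = int(raw)
    if value <= 0:
        raise ValueError("{} must be a positive integer, got {!r}".format(name, raw))
    return value


# The fallback of 30 is a placeholder, not the test suite's real default.
n_random = read_count("N_RANDOM_FEATURE_SETS", 30)
n_existing = read_count("N_EXISTING_FEATURE_SETS", 30)
print(n_random, n_existing)
```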

Quality

It's strongly recommended to check the code quality. For that I use Pylint, which you can run by executing the script lint.sh:

bash ./recipes/lint.sh
find ./sklearn_porter -name '*.py' -exec pylint {} \;

License

The module is Open Source Software released under the MIT license.

Questions?

Don't be shy and feel free to contact me on Twitter or Gitter.