Java library and command-line application for converting Scikit-Learn pipelines to PMML.




The Python side of operations

Validating Python installation:

import sklearn, sklearn.externals.joblib, sklearn_pandas, sklearn2pmml
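If the import statement above fails, it can help to see which package is missing before reading the traceback. The following is a minimal sketch using only the Python standard library; the helper name `missing_modules` and the `REQUIRED` list are illustrative and not part of JPMML-SkLearn or sklearn2pmml. It checks top-level packages only, not submodules.

```python
import importlib.util

# Top-level packages named in the import statement above.
REQUIRED = ["sklearn", "sklearn_pandas", "sklearn2pmml"]

def missing_modules(names):
    """Return the subset of `names` that cannot be located on the import path."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package is absent or a module has a broken spec.
            found = False
        if not found:
            missing.append(name)
    return missing

print("Missing packages:", missing_modules(REQUIRED) or "none")
```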


The JPMML-SkLearn side of operations

  • Java 1.8 or newer.
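The Java requirement can be checked from Python as well. This is a rough sketch, assuming a `java` executable is on the PATH; the function names `parse_java_major` and `installed_java_major` are hypothetical helpers, not part of any library. It accounts for the legacy `1.8` numbering scheme, where the major version is the second number.

```python
import re
import subprocess

def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output.

    Handles both the legacy "1.8.0_292" scheme (major = 8) and the
    newer "11.0.2" / "17" scheme (major = first number).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)
    return int(first)

def installed_java_major():
    # `java -version` writes its report to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr)
```

Calling `installed_java_major() >= 8` would then indicate whether the installed Java meets the requirement.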


Enter the project root directory and build using Apache Maven:

mvn clean install

The build produces an executable uber-JAR file target/jpmml-sklearn-executable-1.6-SNAPSHOT.jar.
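Because the version in the JAR file name changes between releases, a script that calls the converter may prefer to look the file up rather than hard-code it. A small sketch, assuming the `jpmml-sklearn-executable-*.jar` naming pattern shown above holds for other versions; `find_executable_jar` is an illustrative helper name.

```python
from pathlib import Path

def find_executable_jar(project_root="."):
    """Return the most recently modified executable JAR under <project_root>/target, or None."""
    candidates = sorted(
        Path(project_root, "target").glob("jpmml-sklearn-executable-*.jar"),
        key=lambda p: p.stat().st_mtime,
    )
    return candidates[-1] if candidates else None
```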


A typical workflow can be summarized as follows:

  1. Use Python to train a model.
  2. Serialize the model in pickle data format to a file in a local filesystem.
  3. Use the JPMML-SkLearn command-line converter application to turn the pickle file into a PMML file.

The Python side of operations

Loading data to a pandas.DataFrame object:

import pandas

df = pandas.read_csv("Iris.csv")

iris_X = df[df.columns.difference(["Species"])]
iris_y = df["Species"]

First, creating a sklearn_pandas.DataFrameMapper object, which performs column-oriented feature engineering and selection work:

from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import StandardScaler
from sklearn2pmml.decoration import ContinuousDomain

column_preprocessor = DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), StandardScaler()])
])

Second, creating Transformer and Selector objects, which perform table-oriented feature engineering and selection work:

from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline
from sklearn2pmml import SelectorProxy

table_preprocessor = Pipeline([
    ("pca", PCA(n_components = 3)),
    ("selector", SelectorProxy(SelectKBest(k = 2)))
])

Please note that stateless Scikit-Learn selector objects need to be wrapped into an sklearn2pmml.SelectorProxy object.

Third, creating an Estimator object:

from sklearn.tree import DecisionTreeClassifier

classifier = DecisionTreeClassifier(min_samples_leaf = 5)

Combining the above objects into a sklearn2pmml.pipeline.PMMLPipeline object, and running the experiment:

from sklearn2pmml.pipeline import PMMLPipeline

pipeline = PMMLPipeline([
    ("columns", column_preprocessor),
    ("table", table_preprocessor),
    ("classifier", classifier)
])
pipeline.fit(iris_X, iris_y)

Embedding model verification data:

pipeline.verify(iris_X.sample(n = 15))

Storing the fitted PMMLPipeline object in pickle data format:

from sklearn.externals import joblib

joblib.dump(pipeline, "pipeline.pkl.z", compress = 9)

Please see the test script file for more classification (binary and multi-class) and regression workflows.

The JPMML-SkLearn side of operations

Converting the pipeline pickle file pipeline.pkl.z to a PMML file pipeline.pmml:

java -jar target/jpmml-sklearn-executable-1.6-SNAPSHOT.jar --pkl-input pipeline.pkl.z --pmml-output pipeline.pmml
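The same conversion can be launched from a Python script, which keeps the whole workflow in one place. The sketch below only wraps the command line shown above with the standard `subprocess` module; `build_convert_command` and `convert` are hypothetical helper names, and the JAR path is copied from the build step, so it may need adjusting to the version actually built.

```python
import subprocess

# JAR path copied from the build step; adjust to the version actually built.
JAR = "target/jpmml-sklearn-executable-1.6-SNAPSHOT.jar"

def build_convert_command(pkl_input, pmml_output, jar=JAR):
    """Assemble the converter command line as an argument list."""
    return ["java", "-jar", jar, "--pkl-input", pkl_input, "--pmml-output", pmml_output]

def convert(pkl_input, pmml_output, jar=JAR):
    # check=True raises CalledProcessError if the converter exits with a non-zero status.
    subprocess.run(build_convert_command(pkl_input, pmml_output, jar), check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.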

Getting help:

java -jar target/jpmml-sklearn-executable-1.6-SNAPSHOT.jar --help





JPMML-SkLearn is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.

If you would like to use JPMML-SkLearn in a proprietary software project, then it is possible to enter into a licensing agreement which makes JPMML-SkLearn available under the terms and conditions of the BSD 3-Clause License instead.

Additional information

JPMML-SkLearn is developed and maintained by Openscoring Ltd, Estonia.

Interested in using Java PMML API software in your company? Please contact

