JPMML-SkLearn

Java library and command-line application for converting Scikit-Learn pipelines to PMML.


  • Functionality:
    • Three times more supported Python packages, transformers and estimators than all the competitors combined!
    • Thorough collection, analysis and encoding of feature information:
      • Names.
      • Data and operational types.
      • Valid, invalid and missing value spaces.
      • Descriptive statistics.
    • Pipeline extensions:
      • Pruning.
      • Decision engineering (prediction post-processing).
      • Model verification.
    • Conversion options.
  • Extensibility:
    • Rich Java APIs for developing custom converters.
    • Automatic discovery and registration of custom converters based on META-INF/ resource files.
    • Direct interfacing with other JPMML conversion libraries such as JPMML-H2O, JPMML-LightGBM, JPMML-StatsModels and JPMML-XGBoost.
  • Production quality:
    • Complete test coverage.
    • Fully compliant with the JPMML-Evaluator library.

Supported packages



Category Encoders

Examples: extensions/ and extensions/



Examples: extensions/




Examples: N/A


Examples: extensions/


Examples: extensions/

  • pycaret.internal.pipeline.Pipeline
  • pycaret.internal.preprocess.transformers.CleanColumnNames
  • pycaret.internal.preprocess.transformers.FixImbalancer
  • pycaret.internal.preprocess.transformers.RareCategoryGrouping
  • pycaret.internal.preprocess.transformers.RemoveMulticollinearity
  • pycaret.internal.preprocess.transformers.TransformerWrapper
  • pycaret.internal.preprocess.transformers.TransformerWrapperWithInverse

Examples: extensions/

  • sklego.meta.EstimatorTransformer
    • Predict functions apply, decision_function, predict and predict_proba.
  • sklego.pipeline.DebugPipeline
  • sklego.preprocessing.IdentityTransformer

Examples: and extensions/

  • Helpers:
    • sklearn2pmml.EstimatorProxy
    • sklearn2pmml.SelectorProxy
    • sklearn2pmml.h2o.H2OEstimatorProxy
  • Feature specification and decoration:
    • sklearn2pmml.decoration.Alias
    • sklearn2pmml.decoration.CategoricalDomain
    • sklearn2pmml.decoration.ContinuousDomain
    • sklearn2pmml.decoration.ContinuousDomainEraser
    • sklearn2pmml.decoration.DateDomain
    • sklearn2pmml.decoration.DateTimeDomain
    • sklearn2pmml.decoration.DiscreteDomainEraser
    • sklearn2pmml.decoration.MultiAlias
    • sklearn2pmml.decoration.MultiDomain
    • sklearn2pmml.decoration.OrdinalDomain
  • Ensemble methods:
    • sklearn2pmml.ensemble.EstimatorChain
    • sklearn2pmml.ensemble.GBDTLMRegressor
      • The GBDT side: All Scikit-Learn decision tree ensemble regressors, LGBMRegressor, XGBRegressor, XGBRFRegressor.
      • The LM side: A Scikit-Learn linear regressor (eg. ElasticNet, LinearRegression, SGDRegressor).
    • sklearn2pmml.ensemble.GBDTLRClassifier
      • The GBDT side: All Scikit-Learn decision tree ensemble classifiers, LGBMClassifier, XGBClassifier, XGBRFClassifier.
      • The LR side: A Scikit-Learn binary linear classifier (eg. LinearSVC, LogisticRegression, SGDClassifier).
    • sklearn2pmml.ensemble.SelectFirstClassifier
    • sklearn2pmml.ensemble.SelectFirstRegressor
  • Feature selection:
    • sklearn2pmml.feature_selection.SelectUnique
  • Linear models:
    • sklearn2pmml.statsmodels.StatsModelsClassifier
    • sklearn2pmml.statsmodels.StatsModelsRegressor
  • Neural networks:
    • sklearn2pmml.neural_network.MLPTransformer
  • Pipeline:
    • sklearn2pmml.pipeline.PMMLPipeline
  • Postprocessing:
    • sklearn2pmml.postprocessing.BusinessDecisionTransformer
  • Preprocessing:
    • sklearn2pmml.preprocessing.Aggregator
    • sklearn2pmml.preprocessing.BSplineTransformer
    • sklearn2pmml.preprocessing.CastTransformer
    • sklearn2pmml.preprocessing.ConcatTransformer
    • sklearn2pmml.preprocessing.CutTransformer
    • sklearn2pmml.preprocessing.DataFrameConstructor
    • sklearn2pmml.preprocessing.DateTimeFormatter
    • sklearn2pmml.preprocessing.DaysSinceYearTransformer
    • sklearn2pmml.preprocessing.ExpressionTransformer
      • Ternary conditional expression <expression_true> if <condition> else <expression_false>.
      • Array indexing expressions X[<column index>] and X[<column name>].
      • String concatenation expressions.
      • String slicing expressions <str>[<start>:<stop>].
      • Arithmetic operators +, -, *, / and %.
      • Identity comparison operators is None and is not None.
      • Comparison operators in <list>, not in <list>, <=, <, ==, !=, > and >=.
      • Logical operators and, or and not.
      • Numpy constant numpy.NaN.
      • Numpy function numpy.where.
      • Numpy universal functions (too numerous to list).
      • Pandas constants pandas.NA and pandas.NaT.
      • Pandas functions pandas.isna, pandas.isnull, pandas.notna and pandas.notnull.
      • Scipy functions scipy.special.expit and scipy.special.logit.
      • String functions startswith(<prefix>), endswith(<suffix>), lower, upper and strip.
      • String length function len(<str>).
    • sklearn2pmml.preprocessing.FilterLookupTransformer
    • sklearn2pmml.preprocessing.LookupTransformer
    • sklearn2pmml.preprocessing.MatchesTransformer
    • sklearn2pmml.preprocessing.MultiLookupTransformer
    • sklearn2pmml.preprocessing.NumberFormatter
    • sklearn2pmml.preprocessing.PMMLLabelBinarizer
    • sklearn2pmml.preprocessing.PMMLLabelEncoder
    • sklearn2pmml.preprocessing.PowerFunctionTransformer
    • sklearn2pmml.preprocessing.ReplaceTransformer
    • sklearn2pmml.preprocessing.SecondsSinceMidnightTransformer
    • sklearn2pmml.preprocessing.SecondsSinceYearTransformer
    • sklearn2pmml.preprocessing.StringNormalizer
    • sklearn2pmml.preprocessing.SubstringTransformer
    • sklearn2pmml.preprocessing.WordCountTransformer
    • sklearn2pmml.preprocessing.h2o.H2OFrameConstructor
    • sklearn2pmml.util.Reshaper
    • sklearn2pmml.util.Slicer
  • Rule sets:
    • sklearn2pmml.ruleset.RuleSetClassifier
  • Decision trees:
    • sklearn2pmml.tree.chaid.CHAIDClassifier
    • sklearn2pmml.tree.chaid.CHAIDRegressor
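
The expression forms listed under sklearn2pmml.preprocessing.ExpressionTransformer above are ordinary Python expressions over a row named X. The following is a minimal plain-Python sketch of what a few of them compute on a single row. It does not import sklearn2pmml. The evaluate helper and the sample row are hypothetical, and binding the row to the name X via eval is an assumption made for illustration, not a description of the library's internals.

```python
# Plain-Python illustration only: sklearn2pmml is not imported here.
def evaluate(expr, row):
    """Evaluate an expression string with the row bound to the name X (hypothetical helper)."""
    # Only len() is exposed as a built-in; everything else comes from the row itself
    return eval(expr, {"__builtins__": {"len": len}}, {"X": row})

# Hypothetical sample row, keyed by column name
row = {"Sepal.Length": 5.1, "Species": " Setosa ", "Note": None}

# Ternary conditional with column-name indexing and a comparison operator
size = evaluate("'long' if X['Sepal.Length'] >= 5.0 else 'short'", row)

# String functions followed by string slicing
code = evaluate("X['Species'].strip().lower()[0:3]", row)

# Identity comparison against None
has_note = evaluate("X['Note'] is not None", row)

print(size, code, has_note)
```

In an actual pipeline the same expression strings would be passed to ExpressionTransformer, which is then responsible for translating them to PMML.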


  • sklearn_pandas.CategoricalImputer
  • sklearn_pandas.DataFrameMapper



Examples: extensions/

  • tpot.builtins.stacking_estimator.StackingEstimator

Examples:, extensions/ and extensions/


The Python side of operations

Validating Python installation:

import joblib, sklearn, sklearn_pandas, sklearn2pmml
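
If it helps to see which versions are installed, or to check without raising an ImportError on the first missing package, the check can be sketched with the standard library as follows. The distribution names are assumed to be the PyPI names of the four packages (they differ from the import names in places, e.g. scikit-learn versus sklearn), and the installed_versions helper is hypothetical.

```python
from importlib import metadata

# Assumed PyPI distribution names for the packages imported above
REQUIRED = ["joblib", "scikit-learn", "sklearn-pandas", "sklearn2pmml"]

def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

report = installed_versions(REQUIRED)
for name, version in report.items():
    print(name, version if version is not None else "NOT INSTALLED")
```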


The JPMML-SkLearn side of operations

  • Java 1.8 or newer.


Enter the project root directory and build using Apache Maven:

mvn clean install

The build produces a library JAR file pmml-sklearn/target/pmml-sklearn-1.7-SNAPSHOT.jar, and an executable uber-JAR file pmml-sklearn-example/target/pmml-sklearn-example-executable-1.7-SNAPSHOT.jar.


A typical workflow can be summarized as follows:

  1. Use Python to train a model.
  2. Serialize the model in pickle data format to a file in a local filesystem.
  3. Use the JPMML-SkLearn command-line converter application to turn the pickle file into a PMML file.

The Python side of operations

Loading data to a pandas.DataFrame object:

import pandas

df = pandas.read_csv("Iris.csv")

iris_X = df[df.columns.difference(["Species"])]
iris_y = df["Species"]

First, creating a sklearn_pandas.DataFrameMapper object, which performs column-oriented feature engineering and selection work:

from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import StandardScaler
from sklearn2pmml.decoration import ContinuousDomain

column_preprocessor = DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), StandardScaler()])
])

Second, creating Transformer and Selector objects, which perform table-oriented feature engineering and selection work:

from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline
from sklearn2pmml import SelectorProxy

table_preprocessor = Pipeline([
    ("pca", PCA(n_components = 3)),
    ("selector", SelectorProxy(SelectKBest(k = 2)))
])

Please note that stateless Scikit-Learn selector objects need to be wrapped into an sklearn2pmml.SelectorProxy object.

Third, creating an Estimator object:

from sklearn.tree import DecisionTreeClassifier

classifier = DecisionTreeClassifier(min_samples_leaf = 5)

Combining the above objects into a sklearn2pmml.pipeline.PMMLPipeline object, and running the experiment:

from sklearn2pmml.pipeline import PMMLPipeline

pipeline = PMMLPipeline([
    ("columns", column_preprocessor),
    ("table", table_preprocessor),
    ("classifier", classifier)
])
pipeline.fit(iris_X, iris_y)

Recording feature importance information in a pickle data format-compatible manner:

classifier.pmml_feature_importances_ = classifier.feature_importances_

Embedding model verification data:

pipeline.verify(iris_X.sample(n = 15))

Storing the fitted PMMLPipeline object in pickle data format:

import joblib

joblib.dump(pipeline, "pipeline.pkl.z", compress = 9)

Please see the test script file for more classification (binary and multi-class) and regression workflows.

The JPMML-SkLearn side of operations

Converting the pipeline pickle file pipeline.pkl.z to a PMML file pipeline.pmml:

java -jar pmml-sklearn-example/target/pmml-sklearn-example-executable-1.7-SNAPSHOT.jar --pkl-input pipeline.pkl.z --pmml-output pipeline.pmml
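
When the conversion needs to run at the end of a Python training script, the same command can be launched with the standard subprocess module. This is a minimal sketch, assuming the java executable is on the PATH and the uber-JAR sits at the build path shown above. The build_command and convert helpers are hypothetical names, and only the --pkl-input and --pmml-output options from the command above are used.

```python
import subprocess

# Path of the executable uber-JAR produced by the Maven build (relative to the project root)
JAR = "pmml-sklearn-example/target/pmml-sklearn-example-executable-1.7-SNAPSHOT.jar"

def build_command(pkl_path, pmml_path, jar_path=JAR):
    """Return the converter command line as an argument list (hypothetical helper)."""
    return ["java", "-jar", jar_path, "--pkl-input", pkl_path, "--pmml-output", pmml_path]

def convert(pkl_path, pmml_path, jar_path=JAR):
    """Run the converter; check=True raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(pkl_path, pmml_path, jar_path), check=True)

command = build_command("pipeline.pkl.z", "pipeline.pmml")
print(" ".join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file paths contain spaces. Calling convert("pipeline.pkl.z", "pipeline.pmml") would then perform the actual conversion.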

Getting help:

java -jar pmml-sklearn-example/target/pmml-sklearn-example-executable-1.7-SNAPSHOT.jar --help



Slightly outdated:


JPMML-SkLearn is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.

If you would like to use JPMML-SkLearn in a proprietary software project, then it is possible to enter into a licensing agreement which makes JPMML-SkLearn available under the terms and conditions of the BSD 3-Clause License instead.

Additional information

JPMML-SkLearn is developed and maintained by Openscoring Ltd, Estonia.

Interested in using Java PMML API software in your company? Please contact

