JPMML-Evaluator-Spark

PMML evaluator library for the Apache Spark cluster computing system (https://spark.apache.org/).

Features

  • Full support for PMML specification versions 3.0 through 4.3. The evaluation is handled by the JPMML-Evaluator library.

Prerequisites

  • Apache Spark version 2.0.X, 2.1.X, 2.2.X, 2.3.X or 2.4.X.

Installation

The JPMML-Evaluator-Spark library JAR file (together with the accompanying Java source and Javadoc JAR files) is released via the Maven Central Repository.

The current version is 1.2.2 (16 January, 2019).

<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>jpmml-evaluator-spark</artifactId>
	<version>1.2.2</version>
</dependency>

A note about building and packaging JPMML-Evaluator-Spark applications: the JPMML-Evaluator library depends on versions of the JPMML-Model and Google Guava libraries that conflict with the versions bundled with Apache Spark and/or Apache Hadoop. This conflict can be resolved by relocating the dependencies of the JPMML-Evaluator library to a different namespace using the Apache Maven Shade Plugin.
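The relocation described above could look roughly like the following fragment of an application's pom.xml. This is a sketch, not the project's official build configuration: the `org.shaded` prefix is an arbitrary choice, and the relocated package prefixes (`org.dmg.pmml` and `org.jpmml` for the JPMML libraries, `com.google.common` for Google Guava) are assumptions based on the packages those libraries normally use.

```xml
<plugin>
	<groupId>org.apache.maven.plugins</groupId>
	<artifactId>maven-shade-plugin</artifactId>
	<!-- Pin a plugin version appropriate for your build -->
	<executions>
		<execution>
			<phase>package</phase>
			<goals>
				<goal>shade</goal>
			</goals>
			<configuration>
				<relocations>
					<!-- Assumed package prefixes; the "org.shaded" target namespace is arbitrary -->
					<relocation>
						<pattern>org.dmg.pmml</pattern>
						<shadedPattern>org.shaded.dmg.pmml</shadedPattern>
					</relocation>
					<relocation>
						<pattern>org.jpmml</pattern>
						<shadedPattern>org.shaded.jpmml</shadedPattern>
					</relocation>
					<relocation>
						<pattern>com.google.common</pattern>
						<shadedPattern>org.shaded.com.google.common</shadedPattern>
					</relocation>
				</relocations>
			</configuration>
		</execution>
	</executions>
</plugin>
```

The resulting uber-JAR carries its own relocated copies of these classes, so they should no longer clash with the copies on the Apache Spark classpath.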

Usage

Building a generic transformer based on a PMML byte stream:

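import java.io.InputStream;

import org.apache.spark.ml.Transformer;
import org.jpmml.evaluator.Evaluator;
import org.jpmml.evaluator.EvaluatorBuilder;
import org.jpmml.evaluator.LoadingModelEvaluatorBuilder;
import org.jpmml.evaluator.spark.TransformerBuilder;
import org.jpmml.evaluator.visitors.DefaultVisitorBattery;

InputStream pmmlIs = ...;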

EvaluatorBuilder evaluatorBuilder = new LoadingModelEvaluatorBuilder()
	.setLocatable(false)
	.setVisitors(new DefaultVisitorBattery())
	.load(pmmlIs);

Evaluator evaluator = evaluatorBuilder.build();

// Performing a self-check (doubles as a warm-up)
evaluator.verify();

TransformerBuilder pmmlTransformerBuilder = new TransformerBuilder(evaluator)
	.withTargetCols()
	.withOutputCols()
	.exploded(false);

Transformer pmmlTransformer = pmmlTransformerBuilder.build();

Building an Apache Spark ML-style regressor when the PMML document is known to contain a regression model (e.g. the auto-mpg dataset):

TransformerBuilder pmmlTransformerBuilder = new TransformerBuilder(evaluator)
	.withLabelCol("MPG") // Double column
	.exploded(true);

Building an Apache Spark ML-style classifier when the PMML document is known to contain a classification model (e.g. the iris-species dataset):

TransformerBuilder pmmlTransformerBuilder = new TransformerBuilder(evaluator)
	.withLabelCol("Species") // String column
	.withProbabilityCol("Species_probability", Arrays.asList("setosa", "versicolor", "virginica")) // Vector column
	.exploded(true);
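The probability column is a vector, so its elements are identified by position rather than by name. The sketch below uses plain Java (no Apache Spark or JPMML classes) to illustrate the assumed convention that element i of the vector holds the probability of the i-th label in the list passed to withProbabilityCol. The class and method names (ProbabilityVectorSketch, toVector) are hypothetical and are not part of the library.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ProbabilityVectorSketch {

	// Lays out per-class probabilities in the order given by the label list.
	// Assumption made for this sketch only: a label that is missing from the map contributes 0.0.
	static double[] toVector(Map<String, Double> probabilities, List<String> labels){
		double[] values = new double[labels.size()];

		for(int i = 0; i < labels.size(); i++){
			values[i] = probabilities.getOrDefault(labels.get(i), 0d);
		}

		return values;
	}

	public static void main(String... args){
		// The insertion order of the map deliberately differs from the label order
		Map<String, Double> probabilities = new LinkedHashMap<>();
		probabilities.put("virginica", 0.1);
		probabilities.put("setosa", 0.7);
		probabilities.put("versicolor", 0.2);

		List<String> labels = Arrays.asList("setosa", "versicolor", "virginica");

		// Prints "[0.7, 0.2, 0.1]"
		System.out.println(Arrays.toString(toVector(probabilities, labels)));
	}
}
```

Under this assumption, reordering the label list reorders the vector elements, so downstream code that indexes into the vector should use the same list.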

Scoring data:

Dataset<?> inputDs = ...;

Dataset<?> resultDs = pmmlTransformer.transform(inputDs);

In the default (non-exploded) mode, the transformation appends a single intermediary struct column named "pmml" to the data frame; this column contains all the requested result columns as nested fields:

root
 |-- Sepal_Length: double (nullable = true)
 |-- Sepal_Width: double (nullable = true)
 |-- Petal_Length: double (nullable = true)
 |-- Petal_Width: double (nullable = true)
 |-- pmml: struct (nullable = true)
 |    |-- Species: string (nullable = false)
 |    |-- Species_probability: vector (nullable = false)

In the exploded mode, the transformation appends all the requested result columns directly to the data frame as top-level columns:

root
 |-- Sepal_Length: double (nullable = true)
 |-- Sepal_Width: double (nullable = true)
 |-- Petal_Length: double (nullable = true)
 |-- Petal_Width: double (nullable = true)
 |-- Species: string (nullable = false)
 |-- Species_probability: vector (nullable = false)
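The relationship between the two layouts can be illustrated with a small plain-Java sketch that models a row as an ordered map. It is only a conceptual model of what "exploding" the struct column means, not the library's implementation; the class and method names (ExplodeSketch, explode) are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

final class ExplodeSketch {

	// Replaces the nested struct column with its fields, promoted to top-level columns.
	// All other columns are copied unchanged, and the column order is preserved.
	@SuppressWarnings("unchecked")
	static Map<String, Object> explode(Map<String, Object> row, String structCol){
		Map<String, Object> result = new LinkedHashMap<>();

		for(Map.Entry<String, Object> entry : row.entrySet()){

			if(entry.getKey().equals(structCol)){
				result.putAll((Map<String, Object>)entry.getValue());
			} else {
				result.put(entry.getKey(), entry.getValue());
			}
		}

		return result;
	}

	public static void main(String... args){
		// Nested results, as in the default mode
		Map<String, Object> pmml = new LinkedHashMap<>();
		pmml.put("Species", "setosa");
		pmml.put("Species_probability", Arrays.asList(1.0, 0.0, 0.0));

		Map<String, Object> row = new LinkedHashMap<>();
		row.put("Sepal_Length", 5.1);
		row.put("Sepal_Width", 3.5);
		row.put("Petal_Length", 1.4);
		row.put("Petal_Width", 0.2);
		row.put("pmml", pmml);

		// Prints "[Sepal_Length, Sepal_Width, Petal_Length, Petal_Width, Species, Species_probability]"
		System.out.println(explode(row, "pmml").keySet());
	}
}
```

The exploded layout is convenient when downstream Apache Spark ML stages expect label and probability columns at the top level, whereas the nested layout keeps all PMML results grouped under one column name.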

License

JPMML-Evaluator-Spark is dual-licensed under the GNU Affero General Public License (AGPL) version 3.0, and a commercial license.

Additional information

JPMML-Evaluator-Spark is developed and maintained by Openscoring Ltd, Estonia.

Interested in using JPMML software in your application? Please contact info@openscoring.io.
