PMML evaluator library for the Apache Pig platform (http://pig.apache.org/).
- Full support for PMML specification versions 3.0 through 4.2. The evaluation is handled by the [JPMML-Evaluator](https://github.com/jpmml/jpmml-evaluator) library.
- Apache Pig version 0.8.0 or newer.
A working JPMML-Pig setup consists of a library JAR file and a number of model JAR files. The library JAR is centered around the utility class
org.jpmml.pig.PMMLUtil, which provides Pig-compliant utility methods for handling the most common PMML evaluation scenarios. A model JAR file contains one or more model launcher classes and a PMML resource.
The main responsibility of a model launcher class is to formalize the "public interface" of a PMML resource. A model launcher class must extend the abstract Pig user-defined function (UDF) class
org.apache.pig.EvalFunc and provide concrete implementations for the following methods:
#exec(Tuple). Handled either by the method
#outputSchema(Schema). Handled by the method
All in all, a typical model launcher class can be implemented in 5 to 10 lines of boilerplate-esque Java source code.
The example model JAR file contains a DecisionTree model for the "iris" dataset. This model is exposed in two ways. First, the model launcher class
org.jpmml.pig.DecisionTreeIris defines a custom function that returns the PMML target field ("Species") together with four output fields ("Predicted_Species", "Probability_setosa", "Probability_versicolor", "Probability_virginica") as a tuple. Second, the model launcher class
org.jpmml.pig.DecisionTreeIris_Species defines a custom function that returns the PMML target field ("Species") as a string.
Enter the project root directory and build using [Apache Maven](http://maven.apache.org/):
```
mvn clean install
```
The build produces two JAR files:
- pmml-pig/target/pmml-pig-runtime-1.0-SNAPSHOT.jar - Library uber-JAR file. It contains the classes of the library JAR file pmml-pig/target/pmml-pig-1.0-SNAPSHOT.jar, plus all the classes of its transitive dependencies.
- pmml-pig-example/target/pmml-pig-example-1.0-SNAPSHOT.jar - Example model JAR file.
Add the library uber-JAR file to the Pig classpath:
Add the example model JAR file to the Pig classpath:
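Both JAR files can be made available with Pig's `REGISTER` statement, either in a script or in the Grunt shell. The sketch below assumes that Pig is started from the project root directory, so that the relative paths produced by the build resolve; adjust the paths if the JAR files live elsewhere.

```pig
-- Library uber-JAR file (library classes plus transitive dependencies)
REGISTER pmml-pig/target/pmml-pig-runtime-1.0-SNAPSHOT.jar;

-- Example model JAR file (model launcher classes plus the PMML resource)
REGISTER pmml-pig-example/target/pmml-pig-example-1.0-SNAPSHOT.jar;
```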
Loading the "iris" data set:
```
data = LOAD '/tmp/iris.csv' USING PigStorage(',') AS (Sepal_Length:double, Sepal_Width:double, Petal_Length:double, Petal_Width:double);
```
Evaluating this data set using a UDF:
```
result = FOREACH data GENERATE org.jpmml.pig.DecisionTreeIris(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width);

DESCRIBE result;
DUMP result;
```
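The second model launcher class of the example model JAR file, org.jpmml.pig.DecisionTreeIris_Species, can be invoked in the same way when only the predicted target field is needed. This is a minimal sketch, assuming that the `data` relation has been loaded as shown earlier and that both JAR files are on the Pig classpath; the alias `species` is an illustrative name.

```pig
-- Returns only the PMML target field ("Species") as a string
species = FOREACH data GENERATE org.jpmml.pig.DecisionTreeIris_Species(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width);

DESCRIBE species;
DUMP species;
```

Choosing between the two functions is a matter of whether downstream steps need the probability output fields or just the predicted class.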
JPMML-Pig is dual-licensed under the [GNU Affero General Public License (AGPL) version 3.0](http://www.gnu.org/licenses/agpl-3.0.html) and a commercial license.