PMML evaluator library for the Cascading application framework (http://www.cascading.org/)

JPMML-Cascading

PMML evaluator library for the Cascading application framework.

Features

  • Full support for PMML specification versions 3.0 through 4.2. The evaluation is handled by the JPMML-Evaluator library.

Prerequisites

  • Cascading application framework version 2.2.0 or greater.

Installation

Library

The JPMML-Cascading library JAR file is released via the Maven Central Repository.

The current version is 1.2.2 (19 February, 2016).

<dependency>
    <groupId>org.jpmml</groupId>
    <artifactId>pmml-cascading</artifactId>
    <version>1.2.2</version>
</dependency>

Example Hadoop job

Enter the project root directory and build using Apache Maven:

mvn clean install

The build produces two JAR files:

  • pmml-cascading/target/pmml-cascading-1.2-SNAPSHOT.jar - Library JAR file.
  • pmml-cascading-example/target/example-1.2-SNAPSHOT-job.jar - Example Hadoop job JAR file.

Usage

Library

Constructing an instance of the Cascading planner class org.jpmml.cascading.PMMLPlanner based on a PMML document in the local filesystem:

File pmmlFile = ...;
Evaluator evaluator = PMMLPlannerUtil.createEvaluator(pmmlFile);
PMMLPlanner pmmlPlanner = new PMMLPlanner(evaluator);

Building a simple flow for scoring data:

FlowDef flowDef = ...;

flowDef = flowDef.addSource("input", ...);
flowDef = flowDef.addSink("output", ...);

pmmlPlanner.setHeadName("input");
pmmlPlanner.setTailName("output");

flowDef = flowDef.addAssemblyPlanner(pmmlPlanner);

Please see the example application for the full picture.

Example Hadoop job

The example Hadoop job JAR file contains a single executable class org.jpmml.cascading.Main.

This class expects three command-line arguments:

  1. The path of the model PMML file in the local filesystem.
  2. The path of the Cascading source CSV resource in the Hadoop filesystem.
  3. The path of the Cascading sink CSV resource in the Hadoop filesystem.

For example:

hadoop jar example-1.2-SNAPSHOT-job.jar /tmp/cascading/model.pmml file:///tmp/cascading/input.csv file:///tmp/cascading/output
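
The three positional arguments could be checked before any flow is built. The following is a minimal sketch of such a check, not the actual implementation of org.jpmml.cascading.Main. The class name JobArguments, its fields, and the usage message are hypothetical and exist only for illustration; the sketch uses the Java standard library alone.

```java
import java.io.File;

// Hypothetical helper, not part of JPMML-Cascading: validates the three
// positional arguments described above and keeps them in named fields.
final class JobArguments {

    final File pmmlFile;      // model PMML file in the local filesystem
    final String sourcePath;  // Cascading source CSV resource in the Hadoop filesystem
    final String sinkPath;    // Cascading sink CSV resource in the Hadoop filesystem

    private JobArguments(File pmmlFile, String sourcePath, String sinkPath) {
        this.pmmlFile = pmmlFile;
        this.sourcePath = sourcePath;
        this.sinkPath = sinkPath;
    }

    // Rejects anything other than exactly three non-empty arguments.
    static JobArguments parse(String... args) {
        if (args == null || args.length != 3) {
            throw new IllegalArgumentException(
                "Expected 3 arguments: <PMML file> <source CSV> <sink CSV>");
        }
        for (String arg : args) {
            if (arg == null || arg.isEmpty()) {
                throw new IllegalArgumentException("Arguments must not be empty");
            }
        }
        return new JobArguments(new File(args[0]), args[1], args[2]);
    }

    public static void main(String[] args) {
        JobArguments parsed = parse(args);
        System.out.println("PMML file: " + parsed.pmmlFile);
        System.out.println("Source:    " + parsed.sourcePath);
        System.out.println("Sink:      " + parsed.sinkPath);
    }
}
```

Only the first argument is wrapped in java.io.File, because only the PMML file is read from the local filesystem. The source and sink paths stay as plain strings, since they may carry a filesystem scheme such as file:// that Hadoop resolves.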

License

JPMML-Cascading is dual-licensed under the GNU Affero General Public License (AGPL) version 3.0 and a commercial license.

Additional information

Please contact info@openscoring.io