JPMML-SparkML-XGBoost

JPMML-SparkML plugin for converting XGBoost4J-Spark models to PMML.

Prerequisites

Installation

Enter the project root directory and build using Apache Maven:

mvn clean install

The build installs the JPMML-SparkML-XGBoost library into the local Maven repository under the coordinates org.jpmml:jpmml-sparkml-xgboost:1.0-SNAPSHOT.
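For projects built with sbt rather than Maven, the installed artifact could be referenced roughly as follows. This is a sketch, not part of the project's documented setup: it assumes an sbt build, and the Spark, JPMML-SparkML and XGBoost4J-Spark dependencies still have to be declared separately.

```scala
// build.sbt fragment (assumption: the project is built with sbt)

// Resolve artifacts that "mvn clean install" placed in the local Maven repository
resolvers += Resolver.mavenLocal

// Coordinates as reported by the build above
libraryDependencies += "org.jpmml" % "jpmml-sparkml-xgboost" % "1.0-SNAPSHOT"
```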

Usage

The JPMML-SparkML-XGBoost library extends the JPMML-SparkML library with support for ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel and ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel prediction model classes.

Launch the Spark shell with the XGBoost-extended JPMML-SparkML-Package JAR file; use the --packages option to include the XGBoost4J-Spark runtime dependency:

spark-shell --packages ml.dmlc:xgboost4j-spark:0.7 --jars jpmml-sparkml-package-1.1-SNAPSHOT.jar

Fitting and exporting an example pipeline model:

import ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.RFormula
import org.jpmml.sparkml.ConverterUtil

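// Load the Iris dataset; column types are inferred from the CSV data
val df = spark.read.option("header", "true").option("inferSchema", "true").csv("Iris.csv")

// Use Species as the label and all remaining columns as features
val formula = new RFormula().setFormula("Species ~ .")
// Configure a three-class classification objective
var estimator = new XGBoostEstimator(Map("objective" -> "multi:softmax", "num_class" -> 3))
// Set the number of boosting rounds
estimator = estimator.set(estimator.round, 11)

val pipeline = new Pipeline().setStages(Array(formula, estimator))
val pipelineModel = pipeline.fit(df)

// Convert the fitted pipeline to PMML and print the resulting XML document
val pmmlBytes = ConverterUtil.toPMMLByteArray(df.schema, pipelineModel)
println(new String(pmmlBytes, "UTF-8"))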
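The example above prints the PMML document to the console. To keep it for later scoring, the byte array can be written to a file instead. The following is a minimal sketch using only the Java standard library; the helper name savePmml is hypothetical and not part of JPMML-SparkML, and the stand-in bytes only make the snippet runnable outside a Spark session.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical helper (not part of JPMML-SparkML): writes PMML bytes to a file
// and returns the path that was written.
def savePmml(pmmlBytes: Array[Byte], target: Path): Path =
  Files.write(target, pmmlBytes)

// Stand-in content so that the snippet runs on its own; in the Spark shell
// session, pass the pmmlBytes value from the example above instead.
val exampleBytes = "<PMML/>".getBytes(StandardCharsets.UTF_8)

// A temporary file is used here; any writable path would do.
val written = savePmml(exampleBytes, Files.createTempFile("XGBoostIris", ".pmml"))
println(s"PMML written to $written")
```

Because the converter already returns UTF-8 encoded bytes, they can be written as-is without decoding them into a string first.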

License

JPMML-SparkML-XGBoost is licensed under the GNU Affero General Public License (AGPL) version 3.0. Other licenses are available on request.

Additional information

Please contact info@openscoring.io