JPMML-R

Java library and command-line application for converting R models to PMML.

Features

  • Fast and memory-efficient:
    • Can produce a 5 GB Random Forest PMML file in less than 1 minute on a desktop PC
  • Supported model and transformation types:
    • caret package:
      • preProcess - Transformation methods "range", "center", "scale" and "medianImpute"
      • train.formula ("formula interface") - Selected JPMML-R model types
      • train ("matrix interface") - Selected JPMML-R model types
    • earth package:
      • earth - Multivariate Adaptive Regression Spline (MARS) regression
    • gbm package:
      • gbm - Gradient Boosting Machine (GBM) regression and classification
    • IsolationForest package:
      • iForest - Isolation Forest (IF) anomaly detection
    • party package:
      • ctree - Conditional Inference Tree (CIT) classification
    • pls package:
      • mvr - Multivariate Regression (MVR) regression
    • randomForest package:
      • randomForest.formula ("formula interface") - Random Forest (RF) regression and classification
      • randomForest ("matrix interface") - Random Forest regression and classification
    • ranger package:
      • ranger - Random Forest regression and classification
    • r2pmml package:
      • scorecard - Scorecard regression
    • stats package:
      • glm - Generalized linear (GLM) regression and classification
      • kmeans - K-Means clustering
      • lm - Linear (LM) regression
    • xgboost package:
      • xgb.Booster - XGBoost (XGB) regression and classification
  • Production quality:
    • Complete test coverage.
    • Fully compliant with the JPMML-Evaluator library.

Prerequisites

  • Java 1.7 or newer.
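
To check the prerequisite before building, something like the following sketch may help. The helper names java_major and installed_java_version are hypothetical and not part of JPMML-R. The sketch assumes the version string follows either the legacy "1.x" scheme or the plain scheme used since Java 9.

```shell
# Extract the major Java version from a version string such as
# "1.7.0_80" (legacy scheme) or "11.0.2" (scheme used since Java 9).
java_major() {
  version="$1"
  case "$version" in
    1.*) version="${version#1.}" ;;   # legacy "1.x" scheme: drop the "1." prefix
  esac
  # Keep only the leading run of digits.
  echo "${version%%[!0-9]*}"
}

# Read the quoted version string that the `java` launcher prints on stderr.
installed_java_version() {
  java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1
}

# Example use (commented out so that sourcing this file has no side effects):
# [ "$(java_major "$(installed_java_version)")" -ge 7 ] || echo "Java 1.7 or newer is required" >&2
```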

Installation

Enter the project root directory and build using Apache Maven:

mvn clean install

The build produces an executable uber-JAR file target/converter-executable-1.2-SNAPSHOT.jar.
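
Because the version in the JAR file name changes between releases, a script may prefer to locate the file by pattern rather than hard-code it. The following is a minimal sketch; the function name find_converter_jar is hypothetical, and the file name pattern is an assumption generalized from the file name above.

```shell
# Print the path of the converter uber-JAR found in the given build directory
# (default: target), or fail if the build has not produced one yet.
find_converter_jar() {
  build_dir="${1:-target}"
  for jar in "$build_dir"/converter-executable-*.jar; do
    if [ -f "$jar" ]; then
      echo "$jar"
      return 0
    fi
  done
  echo "No converter-executable-*.jar in $build_dir; run 'mvn clean install' first" >&2
  return 1
}
```

If several matching files exist, this sketch simply returns the first one in glob order.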

Usage

A typical workflow can be summarized as follows:

  1. Use R to train a model.
  2. Serialize the model in RDS data format to a file in a local filesystem.
  3. Use the JPMML-R command-line converter application to convert the RDS file into a PMML file.

The R side of operations

The following R script trains a Random Forest (RF) model and saves it in RDS data format to a file rf.rds:

library("randomForest")

rf = randomForest(Species ~ ., data = iris)

saveRDS(rf, "rf.rds")

The JPMML-R side of operations

Converting the RDS file rf.rds to a PMML file rf.pmml:

java -jar target/converter-executable-1.2-SNAPSHOT.jar --rds-input rf.rds --pmml-output rf.pmml

Getting help:

java -jar target/converter-executable-1.2-SNAPSHOT.jar --help

The conversion of large files (1 GB and beyond) can be sped up by increasing the JVM heap size using the -Xms and -Xmx options:

java -Xms4G -Xmx8G -jar target/converter-executable-1.2-SNAPSHOT.jar --rds-input rf.rds --pmml-output rf.pmml
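
When several models need converting, the same command can be wrapped in a loop. The sketch below is one possible approach, not part of JPMML-R itself: the function name convert_all and the variables JPMML_R_JAR and JAVA_CMD are hypothetical, and the heap sizes are simply copied from the example above. Only the --rds-input and --pmml-output options shown in this README are used.

```shell
# Convert every *.rds file in a directory to a .pmml file next to it.
# JPMML_R_JAR (path to the uber-JAR) and JAVA_CMD (Java launcher) are
# hypothetical override variables introduced for this sketch.
convert_all() {
  dir="$1"
  jar="${JPMML_R_JAR:-target/converter-executable-1.2-SNAPSHOT.jar}"
  for rds in "$dir"/*.rds; do
    [ -e "$rds" ] || continue            # no matches: the glob stays literal, skip it
    pmml="${rds%.rds}.pmml"              # rf.rds -> rf.pmml
    "${JAVA_CMD:-java}" -Xms4G -Xmx8G -jar "$jar" \
      --rds-input "$rds" --pmml-output "$pmml" || return 1
  done
}
```

Calling convert_all models would then convert models/rf.rds to models/rf.pmml, and so on, stopping at the first conversion that fails.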

License

JPMML-R is licensed under the GNU Affero General Public License (AGPL) version 3.0. Other licenses are available on request.

Additional information

For additional information, please contact info@openscoring.io.