JPMML-R

Java library and command-line application for converting R models to PMML.

Features

  • Fast and memory-efficient:
    • Can produce a 5 GB Random Forest PMML file in less than 1 minute on a desktop PC
  • Supported model and transformation types:
    • ada package:
      • ada - Stochastic Boosting (SB) classification
    • adabag package:
      • bagging - Bagging classification
      • boosting - Boosting classification
    • caret package:
      • preProcess - Transformation methods "range", "center", "scale" and "medianImpute"
      • train - Selected JPMML-R model types
    • caretEnsemble package:
      • caretEnsemble - Ensemble regression and classification
    • CHAID package:
      • party - CHi-squared Automated Interaction Detection (CHAID) classification
    • earth package:
      • earth - Multivariate Adaptive Regression Spline (MARS) regression
    • elmNNRcpp package:
      • elm - Extreme Learning Machine (ELM) regression
    • evtree package:
      • party - Evolutionary Learning of Trees (EvTree) regression and classification
    • e1071 package:
      • naiveBayes - Naive Bayes (NB) classification
      • svm - Support Vector Machine (SVM) regression, classification and anomaly detection
    • gbm package:
      • gbm - Gradient Boosting Machine (GBM) regression and classification
    • glmnet package:
      • glmnet (elnet, fishnet, lognet and multnet subtypes) - Generalized Linear Model with lasso or elasticnet regularization (GLMNet) regression and classification
      • cv.glmnet - Cross-validated GLMNet regression and classification
    • IsolationForest package:
      • iForest - Isolation Forest (IF) anomaly detection
    • mlr package:
      • WrappedModel - Selected JPMML-R model types
    • neuralnet package:
      • nn - Neural Network (NN) regression
    • nnet package:
      • multinom - Multinomial log-linear classification
      • nnet.formula - Neural Network (NNet) regression and classification
    • party package:
      • ctree - Conditional Inference Tree (CIT) classification
    • partykit package:
      • party - Recursive Partytioning (Party) regression and classification
    • pls package:
      • mvr - Multivariate Regression (MVR) regression
    • randomForest package:
      • randomForest - Random Forest (RF) regression and classification
    • ranger package:
      • ranger - Random Forest (RF) regression and classification
    • rms package:
      • lrm - Binary Logistic Regression (LR) classification
      • ols - Ordinary Least Squares (OLS) regression
    • rpart package:
      • rpart - Recursive Partitioning (RPart) regression and classification
    • r2pmml package:
      • scorecard - Scorecard regression
    • stats package:
      • glm - Generalized Linear Model (GLM) regression and classification
      • kmeans - K-Means clustering
      • lm - Linear Model (LM) regression
    • xgboost package:
      • xgb.Booster - XGBoost (XGB) regression and classification
  • Data pre-processing using model formulae:
    • Interaction terms
    • base::I(..) function terms:
      • Logical operators &, | and !
      • Relational operators ==, !=, <, <=, >= and >
      • Arithmetic operators +, -, *, / and %
      • Exponentiation operators ^ and **
      • The is.na function
      • Arithmetic functions abs, ceiling, exp, floor, log, log10, round and sqrt
    • base::cut() and base::ifelse() function terms
    • plyr::revalue() and plyr::mapvalues() function terms
  • Production quality:
    • Complete test coverage.
    • Fully compliant with the JPMML-Evaluator library.

Prerequisites

  • Java 1.8 or newer.
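
A minimal sketch of checking this prerequisite from a shell script. The helper names `java_major` and `detect_java_major` are hypothetical, and the parsing assumes that `java -version` prints a quoted version string such as "1.8.0_292" or "17.0.2" on its first line; that is the usual format, but it is not something this README specifies.

```shell
#!/bin/sh

# Extract the major Java version from a version string.
# Legacy scheme: "1.8.0_292" -> 8. Modern scheme: "17.0.2" -> 17.
java_major() {
  case "$1" in
    1.*)
      rest=${1#1.}            # drop the leading "1."
      echo "${rest%%.*}"      # keep everything before the next dot
      ;;
    *)
      echo "${1%%[.-]*}"      # keep everything before the first "." or "-"
      ;;
  esac
}

# Read the version string reported by the java launcher found on PATH.
# Note that `java -version` writes to stderr, hence the 2>&1 redirect.
detect_java_major() {
  version=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  java_major "$version"
}

# Example guard (commented out so that sourcing this file has no side effects):
# if [ "$(detect_java_major)" -lt 8 ]; then
#   echo "Java 1.8 or newer is required" >&2
#   exit 1
# fi
```

Splitting the parsing from the detection keeps the version logic usable on any version string, not only on the output of the locally installed launcher.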

Installation

Enter the project root directory and build using Apache Maven:

mvn clean install

The build produces a library JAR file pmml-rexp/target/pmml-rexp-1.5-SNAPSHOT.jar and an executable uber-JAR file pmml-rexp-example/target/pmml-rexp-example-executable-1.5-SNAPSHOT.jar.

Usage

A typical workflow can be summarized as follows:

  1. Use R to train a model.
  2. Serialize the model in RDS data format to a file on the local filesystem.
  3. Use the JPMML-R command-line converter application to convert the RDS file to a PMML file.

The R side of operations

The following R script trains a Random Forest (RF) model and saves it in RDS data format to a file rf.rds:

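# Load the randomForest package
library("randomForest")

# Train a classifier that predicts Species from all other columns of the built-in iris dataset
rf = randomForest(Species ~ ., data = iris)

# Save the model in RDS data format, explicitly requesting serialization format version 2
saveRDS(rf, "rf.rds", version = 2)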

The JPMML-R side of operations

Converting the RDS file rf.rds to a PMML file rf.pmml:

java -jar pmml-rexp-example/target/pmml-rexp-example-executable-1.5-SNAPSHOT.jar --rds-input rf.rds --pmml-output rf.pmml
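
When several models have been saved, the same command can be applied to each RDS file in turn. The following is a minimal sketch and not part of JPMML-R itself: the function names `pmml_name_for` and `convert_all` and the variable `JPMML_R_JAR` are hypothetical, and the script assumes that model files use the `.rds` extension. Only the `--rds-input` and `--pmml-output` options shown above are used.

```shell
#!/bin/sh

# Path to the executable uber-JAR produced by the build (relative to the project root).
JPMML_R_JAR="pmml-rexp-example/target/pmml-rexp-example-executable-1.5-SNAPSHOT.jar"

# Map an RDS file name to a PMML file name, e.g. models/rf.rds -> models/rf.pmml
pmml_name_for() {
  echo "${1%.rds}.pmml"
}

# Convert every *.rds file in the given directory (default: current directory).
# Stops at the first failed conversion and returns a non-zero status.
convert_all() {
  dir=${1:-.}
  for rds in "$dir"/*.rds; do
    [ -e "$rds" ] || continue   # the glob matched nothing
    java -jar "$JPMML_R_JAR" \
      --rds-input "$rds" \
      --pmml-output "$(pmml_name_for "$rds")" || return 1
  done
}

# Example invocation (commented out so that sourcing this file has no side effects):
# convert_all models
```

Each PMML file is written next to its RDS file, so existing files with the same name will be overwritten by the converter.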

Getting help:

java -jar pmml-rexp-example/target/pmml-rexp-example-executable-1.5-SNAPSHOT.jar --help

The conversion of large files (1 GB and beyond) can be sped up by increasing the JVM heap size using -Xms and -Xmx options:

java -Xms4G -Xmx8G -jar pmml-rexp-example/target/pmml-rexp-example-executable-1.5-SNAPSHOT.jar --rds-input rf.rds --pmml-output rf.pmml
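
The heap options can also be chosen per file. The sketch below is an illustration only: the function name `heap_opts_for` is hypothetical, the heap values are copied from the command above, and it assumes that the size of the RDS input file is a reasonable proxy for "large", which this README does not state explicitly.

```shell
#!/bin/sh

# Print JVM heap options if the file is at least `threshold` bytes large,
# and print nothing otherwise.
heap_opts_for() {
  file=$1
  threshold=${2:-1073741824}            # default: 1 GB (2^30 bytes)
  size=$(wc -c < "$file" | tr -d ' ')   # file size in bytes; tr strips padding added by some wc implementations
  if [ "$size" -ge "$threshold" ]; then
    echo "-Xms4G -Xmx8G"
  fi
}

# Example invocation (commented out so that sourcing this file has no side effects).
# The command substitution is deliberately left unquoted so that the two options
# are passed to java as separate arguments, or as none at all for small files:
# java $(heap_opts_for rf.rds) -jar pmml-rexp-example/target/pmml-rexp-example-executable-1.5-SNAPSHOT.jar --rds-input rf.rds --pmml-output rf.pmml
```

Suitable heap values depend on the model and on the memory available on the machine, so the fixed 4G/8G pair should be treated as a starting point.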

Documentation

Up-to-date:

Slightly outdated:

License

JPMML-R is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.

If you would like to use JPMML-R in a proprietary software project, then it is possible to enter into a licensing agreement which makes JPMML-R available under the terms and conditions of the BSD 3-Clause License instead.

Additional information

JPMML-R is developed and maintained by Openscoring Ltd, Estonia.

Interested in using Java PMML API software in your company? Please contact info@openscoring.io