jBPM prediction service demo
Demo project for the prediction service API in jBPM.
First, we will go through the steps needed to set up the demo, and then we will look at some implementation details of how the prediction API works. This will show you how to create your own machine learning (ML) based prediction services and how to integrate them with jBPM.
Download and install jBPM from here.
This repository contains two example prediction service implementations as Maven modules, and a REST client to populate the project with tasks, which are used to train the predictive model. Start by downloading or cloning the repository:
$ git clone email@example.com:ruivieira/jbpm-recommendation-demo.git
For this demo, two random forest-based services will be used: one using the SMILE library and another using a Predictive Model Markup Language (PMML) model.
The services, located respectively in services/jbpm-recommendation-smile-random-forest and services/jbpm-recommendation-pmml-random-forest, can be built with (using SMILE as an example):
$ cd services/jbpm-recommendation-smile-random-forest
$ mvn clean install -T1C -DskipTests -Dgwt.compiler.skip=true \
    -Dfindbugs.skip=true -Drevapi.skip=true -Denforcer.skip=true \
    -Dcheckstyle.skip=true
The resulting JAR files can then be included in the Workbench's kie-server.war, located in the standalone/deployments directory of your jBPM server installation. To do this, create a WEB-INF/lib directory, copy the compiled JARs into it and run
$ zip -r kie-server.war WEB-INF
The PMML-based service expects to find the PMML model in META-INF, so after copying the PMML file into META-INF, it should also be included in the WAR with
$ zip -r kie-server.war META-INF
jBPM will search for a prediction service with an identifier specified by a Java property named org.jbpm.task.prediction.service. Since the SMILE-based random forest service in our demo has the identifier SMILERandomForest, we can set this value before starting the Workbench, for instance as an environment variable:
$ export JAVA_OPTS="-Dorg.jbpm.task.prediction.service=SMILERandomForest"
For the purpose of this documentation, we will illustrate the steps using the SMILE-based service. The PMML-based service can be used instead by setting the above environment variable as
$ export JAVA_OPTS="-Dorg.jbpm.task.prediction.service=PMMLRandomForest"
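The idea of "select a service by identifier, otherwise fall back to a default" can be sketched in plain Java. This is a minimal illustration under stated assumptions, not jBPM's actual discovery code: `IdentifiedService` and `PredictionServiceSelector` are hypothetical names, and only the property name `org.jbpm.task.prediction.service` comes from the text. A `-D` flag in `JAVA_OPTS` becomes a JVM system property, which is what `System.getProperty` reads.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in: only the identifier matters for selection.
interface IdentifiedService {
    String getIdentifier();
}

// Hypothetical selector; jBPM's real lookup mechanism is not shown here.
final class PredictionServiceSelector {
    // Property name as given in the text above.
    static final String PROPERTY = "org.jbpm.task.prediction.service";

    private PredictionServiceSelector() {
    }

    // Returns the service whose identifier equals the property value, or the
    // fallback (e.g. a no-op service) if the property is unset or nothing matches.
    static IdentifiedService select(List<IdentifiedService> available, IdentifiedService fallback) {
        String wanted = System.getProperty(PROPERTY);
        if (wanted == null) {
            return fallback;
        }
        Optional<IdentifiedService> match = available.stream()
                .filter(service -> wanted.equals(service.getIdentifier()))
                .findFirst();
        return match.orElse(fallback);
    }
}
```

Falling back rather than failing mirrors the behaviour described later for the default no-op service, where jBPM behaves as if no prediction service were present.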
Installing the project
Start the Workbench (WB) by running
Once the WB has completed the startup, go to http://localhost:8080/business-central/ and log in using the default admin credentials wbadmin/wbadmin. After choosing the default workspace (or creating your own), select "Import project" and use the project git URL:
The project consists of a single Human Task, which can be inspected using the WB. The task is generic and simple enough to demonstrate how jBPM's prediction API works.
For the purposes of the demonstration, this task will be used to model a simple purchasing task where the purchase of a laptop of a certain brand is requested and must eventually be manually approved. The task's inputs are:
- a String with the brand's name
- a Float representing the laptop's price
- a String representing the user requesting the purchase
The task provides as outputs:
- a Boolean specifying whether the purchase was approved or not
Batch creation of tasks
This repository contains a REST client (under client) which allows adding Human Tasks in batch, in order to have enough data points to train the model and obtain meaningful predictions.
NOTE: Before running the REST client, make sure that the Workbench is running and the demo project is deployed and also running.
The class org.jbpm.recommendation.demo.RESTClient performs this task and can be executed from the client directory with:
$ mvn exec:java -Dexec.mainClass="org.jbpm.recommendation.demo.RESTClient"
The client will then simulate the creation and completion of human tasks, during which the model will be trained.
The tasks' completion will adhere to the following logic:
- The purchase of a laptop of brand Lenovo requested by user Mary will be approved if the price is around $1500
- The purchase of a laptop of brand Apple requested by user Mary will be approved if the price is around $2500
- The purchase of a laptop of brand Lenovo requested by user Mary will be rejected if the price is around $2500
The prices for Lenovo and Apple laptops are drawn from normal distributions with respective means of 1500 and 2500 (pictured below). Although the prediction service is not aware of the deterministic rules we've used to set the task outcome, it will train the model based on the data it receives.
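To make the data-generating process concrete, here is a small simulation in the spirit of the rules above. It is a hypothetical sketch, not the RESTClient's code: the means (1500 and 2500) come from the text, while the standard deviation of 100 and the reading of "around" as "within three standard deviations of the brand's mean" are assumptions. The user is ignored because all three listed rules concern Mary, and an Apple laptop priced around $1500, which the listed rules do not cover, is rejected here by assumption.

```java
import java.util.Random;

// Hypothetical re-creation of the data the RESTClient generates.
final class PurchaseSimulator {
    // Means taken from the text; the standard deviation and tolerance are assumptions.
    static final double LENOVO_MEAN = 1500.0;
    static final double APPLE_MEAN = 2500.0;
    static final double ASSUMED_STD_DEV = 100.0;
    static final double ASSUMED_TOLERANCE = 3 * ASSUMED_STD_DEV;

    private final Random random;

    PurchaseSimulator(long seed) {
        this.random = new Random(seed);
    }

    // Draws a price from a normal distribution with the given mean
    // and standard deviation ASSUMED_STD_DEV.
    double drawPrice(double mean) {
        return mean + ASSUMED_STD_DEV * random.nextGaussian();
    }

    // Deterministic labelling rule: a purchase is approved when the price is
    // "around" the typical price for its brand (within ASSUMED_TOLERANCE).
    static boolean approve(String brand, double price) {
        double expected;
        switch (brand) {
            case "Lenovo":
                expected = LENOVO_MEAN;
                break;
            case "Apple":
                expected = APPLE_MEAN;
                break;
            default:
                throw new IllegalArgumentException("Unknown brand: " + brand);
        }
        return Math.abs(price - expected) <= ASSUMED_TOLERANCE;
    }
}
```

The prediction service only ever sees the resulting (inputs, approved) pairs, never the `approve` rule itself, which is the point made in the paragraph above.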
In the following sections, we will explain the internal workings of a prediction service, how to test this project in the Workbench and how to create your own prediction service.
jBPM offers an API which allows predictive models to be trained with Human Task (HT) data, and allows HTs to incorporate the model's predictions as outputs or even to be completed by them.
This is achieved by connecting the HT handling to a prediction service. A prediction service is simply any third-party class which implements jBPM's prediction service interface.
This interface consists of three methods:
- getIdentifier() - this method simply returns a unique (String) identifier for your prediction service
- predict(Task task, Map<String, Object> inputData) - this method takes the task information and the task's inputs, as a map, from which we will derive the model's inputs. The method returns a PredictionOutcome instance, which we will look at in closer detail later on
- train(Task task, Map<String, Object> inputData, Map<String, Object> outputData) - this method, similarly to predict, takes the task information and the task's inputs, but now we also need to provide the task's outputs, as a map, for training
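The shape of the interface can be illustrated with a self-contained sketch. To keep it compilable without jBPM on the classpath, `TaskInfo` and `Outcome` are hypothetical stand-ins for jBPM's `Task` and `PredictionOutcome`, and `PredictionServiceSketch` only mirrors the three methods listed above. `MajorityService` is a deliberately trivial toy model: it predicts the most frequent value of a hypothetical `"approved"` output seen during training, with confidence equal to that value's relative frequency, and uses an arbitrary fixed threshold of 1.0.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-in for jBPM's Task type (only what this sketch needs).
final class TaskInfo {
    final String name;

    TaskInfo(String name) {
        this.name = name;
    }
}

// Hypothetical stand-in for PredictionOutcome.
final class Outcome {
    final Map<String, Object> values;
    final double confidence;
    final double confidenceThreshold;

    Outcome(Map<String, Object> values, double confidence, double confidenceThreshold) {
        this.values = values;
        this.confidence = confidence;
        this.confidenceThreshold = confidenceThreshold;
    }
}

// Mirrors the three methods described above, using the stand-in types.
interface PredictionServiceSketch {
    String getIdentifier();

    Outcome predict(TaskInfo task, Map<String, Object> inputData);

    void train(TaskInfo task, Map<String, Object> inputData, Map<String, Object> outputData);
}

// Toy model: predicts the most frequent "approved" value seen during training.
final class MajorityService implements PredictionServiceSketch {
    private int approved = 0;
    private int total = 0;

    @Override
    public String getIdentifier() {
        return "MajorityBaseline";
    }

    @Override
    public Outcome predict(TaskInfo task, Map<String, Object> inputData) {
        if (total == 0) {
            // Nothing learned yet: an empty outcome means "no prediction".
            return new Outcome(Collections.<String, Object>emptyMap(), 0.0, 1.0);
        }
        boolean approve = approved * 2 >= total;
        int votes = approve ? approved : total - approved;
        return new Outcome(Collections.<String, Object>singletonMap("approved", approve),
                (double) votes / total, 1.0);
    }

    @Override
    public void train(TaskInfo task, Map<String, Object> inputData, Map<String, Object> outputData) {
        // Only outputs with a Boolean "approved" value are counted.
        Object label = outputData.get("approved");
        if (label instanceof Boolean) {
            total++;
            if ((Boolean) label) {
                approved++;
            }
        }
    }
}
```

A real service would replace the counters with an actual model (such as the demo's random forest), but the division of work stays the same: `train` receives inputs and outputs, `predict` receives only inputs.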
By default, if no other prediction service is specified, jBPM will use a no-op service as defined in
org.jbpm.services.task.prediction.NoOpPredictionService. This service returns an empty prediction and performs no training. jBPM processes will behave as if no prediction service is present.
It is important to note that the prediction service makes no assumptions about which features will be used for model training and prediction. The API exposes the task information, inputs and outputs, but it is up to the developer/data scientist to select which inputs and outputs will be used for training, or if pre-processing is necessary, for instance.
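As an example of such a pre-processing choice, the sketch below selects three task inputs and encodes them as a numeric feature vector: one-hot blocks for the brand and the user, followed by the raw price. The input keys `"brand"`, `"user"` and `"price"` are assumptions (the demo's actual variable names are not given here), and `FeatureExtractor` is a hypothetical helper, not part of jBPM.

```java
import java.util.List;
import java.util.Map;

// Hypothetical pre-processing step: selects some task inputs and encodes them
// as a numeric feature vector. The keys "brand", "user" and "price" are assumptions.
final class FeatureExtractor {
    private final List<String> brands;
    private final List<String> users;

    FeatureExtractor(List<String> brands, List<String> users) {
        this.brands = brands;
        this.users = users;
    }

    // Layout: [one-hot brand..., one-hot user..., price].
    // Missing or unknown categories leave their one-hot block at zero.
    double[] extract(Map<String, Object> inputData) {
        double[] features = new double[brands.size() + users.size() + 1];
        Object brand = inputData.get("brand");
        int brandIndex = brand == null ? -1 : brands.indexOf(brand);
        if (brandIndex >= 0) {
            features[brandIndex] = 1.0;
        }
        Object user = inputData.get("user");
        int userIndex = user == null ? -1 : users.indexOf(user);
        if (userIndex >= 0) {
            features[brands.size() + userIndex] = 1.0;
        }
        Object price = inputData.get("price");
        if (!(price instanceof Number)) {
            throw new IllegalArgumentException("Missing or non-numeric price: " + price);
        }
        features[features.length - 1] = ((Number) price).doubleValue();
        return features;
    }
}
```

One-hot encoding is just one option; a tree-based model could also consume category indices directly. The API leaves that choice to the service.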
PredictionOutcome is a class which encapsulates the model's prediction for a given Map<String, Object> inputData.
This class contains:
- a Map<String, Object> outcome containing the prediction outputs, where each entry represents an output attribute name and value. This map can be empty, which corresponds to the model not providing any prediction.
- a confidence value. The meaning of this field is left to the developer; as an example, it could represent a probability between 0.0 and 1.0. Its relevance is tied to the confidenceThreshold.
- a confidenceThreshold - this value represents the confidence cutoff after which an action can be taken by the HT item handler.
As an example, let's assume our confidence represents a prediction probability between 0.0 and 1.0. If the confidenceThreshold is 0.7, then for confidence > 0.7 the HT outputs would be set to the outcome and the task would be closed automatically. If confidence < 0.7, the HT would present the prediction outcome as suggested values, but the task would not be closed and would still need human interaction. If the outcome is empty, the HT lifecycle would proceed as if no prediction was made.
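The three cases in this example can be written as a small decision function. This is a hypothetical helper (`PredictionPolicy` and `PredictionAction` are illustrative names, not jBPM classes) that encodes the rules as stated: empty outcome means no action, confidence above the threshold means automatic completion, otherwise the outcome is only suggested. The text does not say what happens when the confidence exactly equals the threshold; this sketch assumes a strict "above" comparison, which treats that case as a suggestion.

```java
import java.util.Map;

// The three reactions described above.
enum PredictionAction {
    COMPLETE_AUTOMATICALLY, // outputs set from the outcome and task closed
    SUGGEST_ONLY,           // outcome shown as suggested values, human still decides
    IGNORE                  // no prediction: normal HT lifecycle
}

final class PredictionPolicy {
    private PredictionPolicy() {
    }

    // Hypothetical helper; assumes a strict "above the threshold" comparison.
    static PredictionAction decide(Map<String, Object> outcome, double confidence, double confidenceThreshold) {
        if (outcome.isEmpty()) {
            return PredictionAction.IGNORE;
        }
        return confidence > confidenceThreshold
                ? PredictionAction.COMPLETE_AUTOMATICALLY
                : PredictionAction.SUGGEST_ONLY;
    }
}
```

With a threshold of 0.7, `decide(Map.of("approved", true), 0.9, 0.7)` yields `COMPLETE_AUTOMATICALLY` and a confidence of 0.5 yields `SUGGEST_ONLY`.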
The initial step is then, as described above, a call to the prediction service's predict method.
In the scenario where the prediction's confidence is above the threshold, the task is automatically completed. If the confidence is not above the threshold, however, when the task is eventually completed, both its inputs and its outputs will be used to further train the model by calling the prediction service's train method.
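The order of calls over a task's life can be summarised in a short sketch. Everything here is hypothetical (`TaskLifecycleSketch`, its nested `Service`, `Prediction` and `TaskState` types are simplified stand-ins, not jBPM's handler or API): `predict` is called when the task is created; a confident prediction closes the task, otherwise the outcome becomes suggestions; and only tasks completed by a human are passed to `train`, as described above.

```java
import java.util.Map;

// Hypothetical sketch of when predict and train are called during a task's life.
final class TaskLifecycleSketch {
    // Minimal stand-in for a prediction service (not jBPM's real interface).
    interface Service {
        Prediction predict(Map<String, Object> inputs);

        void train(Map<String, Object> inputs, Map<String, Object> outputs);
    }

    static final class Prediction {
        final Map<String, Object> outcome;
        final double confidence;
        final double threshold;

        Prediction(Map<String, Object> outcome, double confidence, double threshold) {
            this.outcome = outcome;
            this.confidence = confidence;
            this.threshold = threshold;
        }
    }

    // State of a task right after creation.
    static final class TaskState {
        final boolean completed;           // closed without human interaction
        final Map<String, Object> outputs; // final outputs if completed, suggestions otherwise

        TaskState(boolean completed, Map<String, Object> outputs) {
            this.completed = completed;
            this.outputs = outputs;
        }
    }

    private TaskLifecycleSketch() {
    }

    // Step 1: on task creation, ask the service for a prediction.
    static TaskState onTaskCreated(Service service, Map<String, Object> inputs) {
        Prediction prediction = service.predict(inputs);
        boolean confident = !prediction.outcome.isEmpty() && prediction.confidence > prediction.threshold;
        return new TaskState(confident, prediction.outcome);
    }

    // Step 2: a task that was not closed automatically is eventually completed
    // by a human; its inputs and outputs are then fed back for training.
    static void onTaskCompletedByHuman(Service service, Map<String, Object> inputs, Map<String, Object> outputs) {
        service.train(inputs, outputs);
    }
}
```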
As we've seen previously, when creating and completing a batch of tasks we are simultaneously training the predictive model. The service implementation is based on a random forest model, a popular ensemble learning method.
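A random forest classifies by letting many decision trees vote. One common way to turn those votes into a confidence value is the fraction of trees that agree with the majority; whether the demo's SMILE-based service computes its confidence exactly this way is an assumption, not something stated here. In the sketch below, `ForestVote` is a hypothetical helper and the "trees" are plain functions standing in for trained decision trees.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustration of majority voting in an ensemble; the "trees" are plain functions here.
final class ForestVote {
    private ForestVote() {
    }

    // Applies every tree to the input and returns (majority label, fraction of votes).
    static <T, L> Map.Entry<L, Double> classify(List<Function<T, L>> trees, T input) {
        List<L> votes = new ArrayList<>();
        for (Function<T, L> tree : trees) {
            votes.add(tree.apply(input));
        }
        return majority(votes);
    }

    // Counts votes per label; ties are broken arbitrarily (HashMap iteration order).
    static <L> Map.Entry<L, Double> majority(List<L> votes) {
        if (votes.isEmpty()) {
            throw new IllegalArgumentException("No votes");
        }
        Map<L, Integer> counts = new HashMap<>();
        for (L vote : votes) {
            counts.merge(vote, 1, Integer::sum);
        }
        L best = null;
        int bestCount = 0;
        for (Map.Entry<L, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return new AbstractMap.SimpleImmutableEntry<>(best, Double.valueOf((double) bestCount / votes.size()));
    }
}
```

With three trees voting "approve", "reject" and "approve", the result is "approve" with a confidence of 2/3.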
When running the RESTClient, 1200 tasks will be created and completed to allow for a reasonably sized training dataset. The prediction service initially has a confidence threshold of 1.0, and after a sufficiently large number of observations (arbitrarily chosen as 1200) has been used for training, the confidence threshold drops to 0.75. This is simply to demonstrate the two possible actions, i.e. predicting without completing the task, and completing the task automatically. This also allows us to avoid any cold start problems.
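The threshold schedule described in this paragraph fits in a few lines. `ConfidenceThresholdSchedule` is a hypothetical class, not the demo's code; the values 1200, 1.0 and 0.75 come from the text. Note why 1.0 works as a "never complete" setting: if the confidence is a probability and tasks are only completed when the confidence is strictly above the threshold, nothing can exceed 1.0, so during warm-up the service can only make suggestions.

```java
// Hypothetical sketch of the threshold schedule described above.
final class ConfidenceThresholdSchedule {
    static final int WARM_UP_OBSERVATIONS = 1200; // from the text
    static final double INITIAL_THRESHOLD = 1.0;  // from the text
    static final double TRAINED_THRESHOLD = 0.75; // from the text

    private int observations = 0;

    // Called once per training example (e.g. from the service's train method).
    void recordObservation() {
        observations++;
    }

    // Threshold stays at 1.0 until enough observations have been seen.
    double currentThreshold() {
        return observations >= WARM_UP_OBSERVATIONS ? TRAINED_THRESHOLD : INITIAL_THRESHOLD;
    }
}
```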
Once the model has been trained with the tasks from the RESTClient, we can create a new Human Task.
If we create a HT requesting the purchase of an
Apple laptop from
John with the price $2500, we should expect it to be approved.
In fact, when claiming the task, we can see that the prediction service recommends approving the purchase with a "confidence" of 91%.
If we now create a task for the request of a Lenovo laptop from Mary with the price $1437, we would expect it to be approved. This is indeed the case: the form is filled in by the prediction service with an approved status and a "confidence" of 86.5%.
We can also see, as expected, what happens when
John tries to order a
Lenovo for $2700. The prediction service fills the form as "not approved" with a "confidence" of 71%.
In this service, the confidence threshold is set as
0.95 and as such the task was not closed automatically.
The second example implementation is the PMML-based prediction service. PMML is a predictive model interchange standard, which allows for a wide variety of models to be reused in different platforms and programming languages.
The service included in this demo consists of a pre-trained model (trained on a dataset similar to the one generated by the RESTClient) which is executed by a PMML engine. For this demo, the engine used was jpmml-evaluator, the de facto reference implementation of the PMML specification.
There are two main differences when comparing this service to the SMILE-based one:
- The model does not need a training phase. It has already been trained and serialised into the PMML format, which means that we can start using predictions from jBPM straight away.
- The train API method is a no-op in this case. Whenever the service's train method is called, the data will not be used for training in this example (only the predict method is needed for a "read-only" model), as we can see from the figure below.
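The "read-only" pattern can be sketched independently of PMML. In this hypothetical example, `ReadOnlyModelService` wraps a model trained elsewhere, represented as a plain `Function` from inputs to outputs (in the demo, the PMML engine plays that role; no jpmml-evaluator calls are shown here). `predict` delegates to the model, and `train` deliberately does nothing, so predictions are unaffected by completed tasks.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical read-only service: the model was trained elsewhere (e.g. exported
// to PMML) and is only evaluated here. Not the demo's actual implementation.
final class ReadOnlyModelService {
    private final Function<Map<String, Object>, Map<String, Object>> preTrainedModel;

    ReadOnlyModelService(Function<Map<String, Object>, Map<String, Object>> preTrainedModel) {
        this.preTrainedModel = preTrainedModel;
    }

    // Predictions come straight from the pre-trained model.
    Map<String, Object> predict(Map<String, Object> inputData) {
        return preTrainedModel.apply(inputData);
    }

    // Intentionally a no-op: completed tasks do not change the model.
    void train(Map<String, Object> inputData, Map<String, Object> outputData) {
        // nothing to do for a read-only model
    }
}
```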
A demonstration video is available here (description in subtitles).