# Training and Tracking Machine Learning Models in Java with Tribuo

This notebook is a companion to the JavaOne 2022 talk "Training and Tracking Machine Learning Models in Java with Tribuo". It covers how to train, evaluate and examine models in [Tribuo](https://tribuo.org), an open-source Java ML library. Tribuo is Apache 2.0 licensed and available on GitHub (https://github.com/oracle/tribuo).

In this notebook we'll build a simple model, evaluate its performance, and examine the provenance information collected about the training and evaluation procedures. We'll then import a model trained in scikit-learn on the same task, export the Tribuo model in ONNX format, and evaluate both the imported scikit-learn model and the exported Tribuo model.

## Setup

First we need to load some jars and import the relevant packages from Tribuo and the JDK. The jars can be built from the Tribuo source tree with `mvn package`. You'll also need MNIST in IDX format if you don't already have it.

MNIST training set:

`wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz`

`wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz`

MNIST test set:

`wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz`

`wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz`

```
%jars tribuo-classification-experiments-4.3.0-jar-with-dependencies.jar
%jars tribuo-json-4.3.0-jar-with-dependencies.jar
%jars tribuo-onnx-4.3.0-jar-with-dependencies.jar
```

```java
import java.nio.file.*;

import ai.onnxruntime.*;
import com.fasterxml.jackson.databind.*;
import com.oracle.labs.mlrg.olcut.provenance.primitives.*;
import com.oracle.labs.mlrg.olcut.provenance.ProvenanceUtil;
import com.oracle.labs.mlrg.olcut.util.Pair;

import org.tribuo.*;
import org.tribuo.classification.*;
import org.tribuo.classification.evaluation.*;
import org.tribuo.classification.sgd.linear.*;
import org.tribuo.classification.sgd.objectives.LogMulticlass;
import org.tribuo.datasource.IDXDataSource;
import org.tribuo.math.optimisers.*;
import org.tribuo.interop.onnx.*;
import org.tribuo.util.Util;
```

## Training models with Tribuo

Before we can train a model we need to load some data. We'll pull in MNIST in the original IDX format using Tribuo's built-in `IDXDataSource`.

```java
var labelFactory = new LabelFactory();
var labelEvaluator = new LabelEvaluator();
var mnistTrainSource = new IDXDataSource<>(Paths.get("train-images-idx3-ubyte.gz"),Paths.get("train-labels-idx1-ubyte.gz"),labelFactory);
var mnistTestSource = new IDXDataSource<>(Paths.get("t10k-images-idx3-ubyte.gz"),Paths.get("t10k-labels-idx1-ubyte.gz"),labelFactory);
var mnistTrain = new MutableDataset<>(mnistTrainSource);
var mnistTest = new MutableDataset<>(mnistTestSource);

System.out.println(String.format("MNIST train size = %d, number of features = %d, number of classes = %d",mnistTrain.size(),mnistTrain.getFeatureMap().size(),mnistTrain.getOutputInfo().size()));
System.out.println(String.format("MNIST test size = %d, number of features = %d, number of classes = %d",mnistTest.size(),mnistTest.getFeatureMap().size(),mnistTest.getOutputInfo().size()));
```

```
MNIST train size = 60000, number of features = 717, number of classes = 10
MNIST test size = 10000, number of features = 668, number of classes = 10
```

MNIST consists of 28x28 pixel images of handwritten digits, giving 784 possible features, though 67 of them are blank in every training image (so Tribuo ignores them, leaving the 717 features reported above). A sample of the digits is shown below.
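
To make the feature count concrete, here is a minimal sketch. It assumes each image is flattened row-major into pixel index `row * 28 + col` (the order in which the IDX format stores pixels) and that only non-zero pixels are kept, as the 717-feature count above suggests. The `BlankPixels` class and its methods are hypothetical; they simply count which pixels are non-zero in at least one image.

```java
import java.util.BitSet;

// Hypothetical sketch: count how many of the 28x28 pixels are non-zero in at least
// one image. Assumes row-major flattening and that zero-valued pixels are not stored,
// so a pixel that is blank in every image never becomes a feature.
class BlankPixels {
    static final int SIDE = 28;

    // Flattened (row-major) index of the pixel at (row, col).
    static int pixelIndex(int row, int col) {
        return row * SIDE + col;
    }

    // Number of distinct pixels that are non-zero in at least one image.
    static int countObservedPixels(int[][][] images) {
        BitSet seen = new BitSet(SIDE * SIDE);
        for (int[][] image : images) {
            for (int row = 0; row < SIDE; row++) {
                for (int col = 0; col < SIDE; col++) {
                    if (image[row][col] != 0) {
                        seen.set(pixelIndex(row, col));
                    }
                }
            }
        }
        return seen.cardinality();
    }

    public static void main(String[] args) {
        // Two toy images: one lights the top-left pixel, the other lights row 14.
        int[][][] images = new int[2][SIDE][SIDE];
        images[0][0][0] = 255;
        for (int col = 0; col < SIDE; col++) {
            images[1][14][col] = 128;
        }
        int observed = countObservedPixels(images);
        int total = SIDE * SIDE;
        System.out.println("observed = " + observed + ", always blank = " + (total - observed));
    }
}
```

Under those assumptions, running the same count over the full training set should give the 717 observed pixels (784 - 67) reported above.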

<img src="mnist-3.0.1.png" alt="MNIST Digits" style="width: 400px;"/>

Now we can define the model we're going to train. We'll use a simple (multinomial) logistic regression, trained by stochastic gradient descent with the AdaGrad optimiser.
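
AdaGrad keeps a running sum of squared gradients for each parameter and divides the learning rate by its square root, so parameters that have seen large or frequent gradients take smaller steps. The sketch below applies that rule to a one-dimensional toy problem, minimising `(w - 3)^2`. It is not Tribuo's `AdaGrad` class; it only borrows the hyperparameter names (`initialLearningRate`, `epsilon`, `initialValue`) that appear in the model provenance later on, and the exact placement of `epsilon` in the update is an assumption.

```java
// Minimal sketch of the AdaGrad update rule on a one-dimensional toy problem.
// Not Tribuo's implementation; hyperparameter names mirror the provenance only.
class AdaGradSketch {
    // Minimises f(w) = (w - target)^2 starting from w = 0 and returns the final w.
    static double minimise(double target, double initialLearningRate,
                           double epsilon, double initialValue, int steps) {
        double w = 0.0;
        double gradSquaredSum = initialValue; // running sum of squared gradients
        for (int t = 0; t < steps; t++) {
            double grad = 2.0 * (w - target); // df/dw
            gradSquaredSum += grad * grad;
            // The effective step size shrinks as squared gradients accumulate.
            w -= initialLearningRate * grad / (Math.sqrt(gradSquaredSum) + epsilon);
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println("w after 1 step     = " + minimise(3.0, 0.1, 1e-6, 0.0, 1));
        System.out.println("w after 10k steps  = " + minimise(3.0, 0.1, 1e-6, 0.0, 10_000));
    }
}
```

On the first step the gradient is -6, so the accumulator becomes 36 and `w` moves by roughly `0.1 * 6 / 6 = 0.1`; later steps get smaller as the accumulator grows.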

```java
var lrTrainer = new LinearSGDTrainer( // Linear model trained with SGD
                                     new LogMulticlass(),  // Loss function
                                     new AdaGrad(0.1),     // Gradient optimiser
                                     3,                    // Number of training epochs
                                     30000,                // Logging interval
                                     Trainer.DEFAULT_SEED  // RNG seed
                                    );
```
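
`LogMulticlass` is the multiclass log loss, so this trainer fits a multinomial logistic regression. As a hedged sketch of the maths (not Tribuo's code), the loss for one example applies a softmax to the per-class scores and takes the negative log probability of the true class; its gradient with respect to the scores is the predicted probabilities minus a one-hot vector for the true class. In a linear model, the gradient for class `k`'s weights is that score gradient entry times the feature vector. `SoftmaxSketch` is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical sketch of softmax cross-entropy (multiclass log loss) for one example.
class SoftmaxSketch {
    // Converts raw class scores into probabilities; subtracting the max keeps exp() stable.
    static double[] softmax(double[] scores) {
        double max = Arrays.stream(scores).max().orElse(0.0);
        double[] probs = new double[scores.length];
        double sum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            probs[i] = Math.exp(scores[i] - max);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    // Log loss of the true class: -log p(trueClass).
    static double logLoss(double[] scores, int trueClass) {
        return -Math.log(softmax(scores)[trueClass]);
    }

    // Gradient of the log loss with respect to the scores: p - onehot(trueClass).
    static double[] scoreGradient(double[] scores, int trueClass) {
        double[] grad = softmax(scores);
        grad[trueClass] -= 1.0;
        return grad;
    }

    public static void main(String[] args) {
        double[] scores = {2.0, 1.0, 0.1};
        System.out.println("probabilities = " + Arrays.toString(softmax(scores)));
        System.out.println("loss (true class 0) = " + logLoss(scores, 0));
        System.out.println("gradient = " + Arrays.toString(scoreGradient(scores, 0)));
    }
}
```

The gradient entries always sum to zero, and the gradient vanishes only when the model assigns probability 1 to the true class.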

Training the model is simple: we pass the training dataset to the trainer's `train` method, here along with a map of extra provenance information to record with the model.

```java
var lrStartTime = System.currentTimeMillis();
var lrModel = lrTrainer.train(mnistTrain, Map.of("Extra", new StringProvenance("Extra","Some runtime information")));
var lrEndTime = System.currentTimeMillis();

System.out.println("Training logistic regression took " + Util.formatDuration(lrStartTime,lrEndTime));
```

```
Training logistic regression took (00:00:06:652)
```

Evaluation is similarly straightforward: we pass the model and the test dataset to the classification evaluator (a `LabelEvaluator`).

```java
lrStartTime = System.currentTimeMillis();
var mnistEval = labelEvaluator.evaluate(lrModel,mnistTest);
lrEndTime = System.currentTimeMillis();

System.out.println("Scoring logistic regression took " + Util.formatDuration(lrStartTime,lrEndTime));
System.out.println("Evaluation:\n"+mnistEval.toString());
System.out.println("\nConfusion Matrix:");
System.out.println(mnistEval.getConfusionMatrix().toString());
```

```
Scoring logistic regression took (00:00:00:366)
Evaluation:
Class                           n          tp          fn          fp      recall        prec          f1
0                             980         944          36          41       0.963       0.958       0.961
1                           1,135       1,121          14          75       0.988       0.937       0.962
2                           1,032         906         126          88       0.878       0.911       0.894
3                           1,010         781         229          44       0.773       0.947       0.851
4                             982         888          94          77       0.904       0.920       0.912
5                             892         770         122         149       0.863       0.838       0.850
6                             958         923          35         111       0.963       0.893       0.927
7                           1,028         954          74         112       0.928       0.89
```

The evaluation exposes accessors for all the metrics it computes, and its `toString()` method displays a subset of them in a human-readable format. There are also methods to render the results as an HTML table. The confusion matrix (which shows, for each ground-truth label, how many examples were predicted as each label) also has accessors, and again can be converted into HTML.
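
The per-class numbers in the table follow from the `tp`/`fn`/`fp` counts using the standard definitions, and those counts can in turn be read off a confusion matrix. The hypothetical `ClassMetrics` helper below sketches both steps. It assumes a confusion matrix laid out with ground-truth labels as rows and predicted labels as columns, which may not match the orientation Tribuo uses when printing one.

```java
// Hypothetical helper: per-class counts from a confusion matrix, then the standard
// recall, precision and F1 definitions. Assumes confusion[truth][predicted].
class ClassMetrics {
    // Returns {tp, fn, fp} for class k.
    static long[] counts(long[][] confusion, int k) {
        long tp = confusion[k][k];
        long rowSum = 0; // all examples whose true label is k
        long colSum = 0; // all examples predicted as k
        for (int j = 0; j < confusion.length; j++) {
            rowSum += confusion[k][j];
            colSum += confusion[j][k];
        }
        return new long[] {tp, rowSum - tp, colSum - tp};
    }

    static double recall(long tp, long fn) {
        return tp / (double) (tp + fn);
    }

    static double precision(long tp, long fp) {
        return tp / (double) (tp + fp);
    }

    // Harmonic mean of precision and recall, written directly in terms of counts.
    static double f1(long tp, long fn, long fp) {
        return 2.0 * tp / (2.0 * tp + fn + fp);
    }

    public static void main(String[] args) {
        // Class 3 counts from the logistic regression evaluation above.
        long tp = 781, fn = 229, fp = 44;
        System.out.printf("recall = %.3f, precision = %.3f, f1 = %.3f%n",
                recall(tp, fn), precision(tp, fp), f1(tp, fn, fp));
    }
}
```

Feeding in the class 3 counts (tp = 781, fn = 229, fp = 44) reproduces the 0.773, 0.947 and 0.851 shown in the table, and `tp + fn` gives that row's `n` of 1,010.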

Now let's inspect the information Tribuo captured as part of the training procedure. We'll convert it into a moderately human-readable JSON format, though there are many other options for displaying the provenance and for accessing it programmatically.

```java
ObjectMapper objMapper = new ObjectMapper();
objMapper = objMapper.enable(SerializationFeature.INDENT_OUTPUT);

// Convert the model's provenance into nested maps, then serialise those as indented JSON.
String jsonModelProvenance = objMapper.writeValueAsString(ProvenanceUtil.convertToMap(lrModel.getProvenance()));
System.out.println(jsonModelProvenance);
```

```
{
  "instance-values" : {
    "Extra" : "Some runtime information"
  },
  "tribuo-version" : "4.3.0",
  "java-version" : "19-panama",
  "trainer" : {
    "seed" : "12345",
    "tribuo-version" : "4.3.0",
    "minibatchSize" : "1",
    "train-invocation-count" : "0",
    "is-sequence" : "false",
    "shuffle" : "true",
    "epochs" : "3",
    "optimiser" : {
      "epsilon" : "1.0E-6",
      "initialLearningRate" : "0.1",
      "initialValue" : "0.0",
      "host-short-name" : "StochasticGradientOptimiser",
      "class-name" : "org.tribuo.math.optimisers.AdaGrad"
    },
    "host-short-name" : "Trainer",
    "class-name" : "org.tribuo.classification.sgd.linear.LinearSGDTrainer",
    "loggingInterval" : "30000",
    "objective" : {
      "host-short-name" : "LabelObjective",
      "class-name" : "org.tribuo.classification.sgd.objectives.LogMulticlass"
    }
  },
  "os-arch" : "x86_64",
  "trained-at" : "2022-10-19T10:24:53.429354-07:00",
  "os-name" : "Mac OS X",
  "dataset" : {
    "nu
```

The provenance has captured the training algorithm and its parameters, along with information about the dataset, including paths to the various files, timestamps and hashes.
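
When working with provenance programmatically it can help to treat the converted form as nested maps, which is the structure the JSON above reflects. The sketch below is hypothetical: the `lookup` helper is not an OLCUT or Tribuo API, and it assumes nested provenance objects appear as `Map` values. It builds a small map by hand from a few of the values printed above and walks a dotted path through it.

```java
import java.util.Map;

// Hypothetical helper: walk a dotted path through provenance that has been converted
// into nested maps. Assumes nested provenance objects appear as Map values.
class ProvenanceLookup {
    // Returns the value at the path, or null if any step is missing or is not a map.
    static Object lookup(Map<String, ?> root, String dottedPath) {
        Object current = root;
        for (String key : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            Map<?, ?> map = (Map<?, ?>) current;
            if (!map.containsKey(key)) {
                return null;
            }
            current = map.get(key);
        }
        return current;
    }

    // A hand-built subset of the provenance printed above.
    static Map<String, Object> sampleProvenance() {
        Map<String, Object> optimiser = Map.of(
                "initialLearningRate", "0.1",
                "epsilon", "1.0E-6",
                "class-name", "org.tribuo.math.optimisers.AdaGrad");
        Map<String, Object> trainer = Map.of(
                "epochs", "3",
                "seed", "12345",
                "optimiser", optimiser);
        return Map.of("tribuo-version", "4.3.0", "trainer", trainer);
    }

    public static void main(String[] args) {
        Map<String, Object> prov = sampleProvenance();
        System.out.println("learning rate = " + lookup(prov, "trainer.optimiser.initialLearningRate"));
        System.out.println("epochs = " + lookup(prov, "trainer.epochs"));
    }
}
```

Note that the recorded values are strings, so numeric hyperparameters would need parsing before being compared across runs.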

## Working with external models

Unfortunately, people don't do all their machine learning in Java, so we need to interact with external systems. Tribuo can load models trained in other systems via its TensorFlow and XGBoost support, along with any model that can be exported into the [ONNX format](https://onnx.ai).

We can also export many Tribuo models in ONNX format for use elsewhere.

We're going to import an ONNX model, compare it to the Tribuo model we just trained, and finally export our Tribuo model in ONNX so it can be deployed in other systems.

Much of the complexity in working with external models is ensuring that Tribuo's view of the input data matches the external model's view. Tribuo's features are named, so each named feature must be assigned the index that the external model expects: if we have a `price_per_unit` feature in Tribuo, we need to make sure it is presented to the external model at the right index (e.g., `9`). Similarly, the output dimensions of a Tribuo model are named, so we need to map from the indices of the output tensor to Tribuo's output dimension names.
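
The two mappings can be sketched in plain Java: place each named feature value at the index the external model expects, then map the index of the highest-scoring output back to its name. This is only conceptually similar to what the feature mapping and the `DenseTransformer`/`LabelTransformer` pair handle for us in the cells that follow, not their implementation. The class, the `quantity` feature, and the `cheap`/`expensive` outputs are hypothetical.

```java
import java.util.Map;

// Hypothetical sketch of the two mappings needed to run an external model:
// named features -> dense input indices, and output indices -> named outputs.
class ExternalModelMapping {
    // Places each named feature value at the index the external model expects.
    // Features absent from the example stay 0; features unknown to the mapping are skipped.
    static float[] toDense(Map<String, Double> features, Map<String, Integer> featureMapping, int width) {
        float[] dense = new float[width];
        for (Map.Entry<String, Double> e : features.entrySet()) {
            Integer idx = featureMapping.get(e.getKey());
            if (idx != null) {
                dense[idx] = e.getValue().floatValue();
            }
        }
        return dense;
    }

    // Returns the name of the output whose index has the highest score.
    static String argmaxOutput(float[] scores, Map<String, Integer> outputMapping) {
        String best = null;
        float bestScore = Float.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> e : outputMapping.entrySet()) {
            float score = scores[e.getValue()];
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> featureMapping = Map.of("price_per_unit", 9, "quantity", 0);
        float[] input = toDense(Map.of("price_per_unit", 4.5, "quantity", 2.0), featureMapping, 10);
        System.out.println("input[9] = " + input[9] + ", input[0] = " + input[0]);

        Map<String, Integer> outputMapping = Map.of("cheap", 0, "expensive", 1);
        System.out.println("prediction = " + argmaxOutput(new float[] {0.2f, 0.8f}, outputMapping));
    }
}
```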

We're going to demonstrate this mapping by using a model trained in scikit-learn on MNIST, where the features are given the opposite index (e.g., feature `783` is supplied as feature `0` and feature `0` is supplied as feature `783`).
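
Before handing a hand-built mapping to an external model it is worth checking that it is a permutation, i.e. that every index the model expects is used exactly once. This sketch rebuilds the same transposed mapping the notebook constructs (names formatted with `%03d`, index `783 - i`) and checks it. The `TransposedMapping` class and its `isPermutation` helper are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: build the transposed MNIST feature mapping and validate it.
class TransposedMapping {
    static final int NUM_PIXELS = 784;

    // Feature "000" -> index 783, "001" -> 782, ..., "783" -> 0.
    static Map<String, Integer> build() {
        Map<String, Integer> mapping = new HashMap<>();
        for (int i = 0; i < NUM_PIXELS; i++) {
            mapping.put(String.format("%03d", i), NUM_PIXELS - 1 - i);
        }
        return mapping;
    }

    // True if every index in [0, width) is used by exactly one feature name.
    static boolean isPermutation(Map<String, Integer> mapping, int width) {
        if (mapping.size() != width) {
            return false;
        }
        Set<Integer> seen = new HashSet<>();
        for (int idx : mapping.values()) {
            if (idx < 0 || idx >= width || !seen.add(idx)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Integer> mapping = build();
        System.out.println("\"000\" -> " + mapping.get("000") + ", \"783\" -> " + mapping.get("783"));
        System.out.println("valid permutation: " + isPermutation(mapping, NUM_PIXELS));
    }
}
```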

First we set up the ONNX support classes, the path to the scikit-learn model, and the ONNX Runtime environment and session options:

```java
var denseTransformer = new DenseTransformer();
var labelTransformer = new LabelTransformer();
var onnxSklPath = Paths.get("..","..","..","tutorials","external-models","skl_lr_mnist.onnx");
var ortEnv = OrtEnvironment.getEnvironment();
var sessionOpts = new OrtSession.SessionOptions();
```

Then we compute the inverted mapping in Tribuo, and load in the scikit-learn model:

```java
Map<String, Integer> sklFeatMapping = new HashMap<>();
for (int i = 0; i < 784; i++) {
    // This MNIST model has the feature indices transposed to test a non-trivial mapping.
    int id = (783 - i);
    sklFeatMapping.put(String.format("%03d", i), id);
}
Map<Label, Integer> sklOutMapping = new HashMap<>();
for (Label l : mnistTrain.getOutputInfo().getDomain()) {
    sklOutMapping.put(l, Integer.parseInt(l.getLabel()));
}
Model<Label> sklModel = ONNXExternalModel.createOnnxModel(labelFactory, sklFeatMapping, sklOutMapping,
                    denseTransformer, labelTransformer, sessionOpts, onnxSklPath, "float_input");
```

The `sklModel` is a Tribuo model like any other, so we can evaluate it in the same way we evaluated the logistic regression earlier:

```java
var sklEvaluation = labelEvaluator.evaluate(sklModel,mnistTest);
sklEvaluation.toString();
```

```
Class                           n          tp          fn          fp      recall        prec          f1
0                             980         963          17          46       0.983       0.954       0.968
1                           1,135       1,112          23          37       0.980       0.968       0.974
2                           1,032         926         106          70       0.897       0.930       0.913
3                           1,010         916          94          98       0.907       0.903       0.905
4                             982         910          72          64       0.927       0.934       0.930
5                             892         776         116          83       0.870       0.903       0.886
6                             958         910          48          55       0.950       0.943       0.946
7                           1,028         951          77          70       0.925       0.931       0.928
8                             974         869 
```

Due to differences in the default hyperparameters, the scikit-learn model performs a little differently from the one trained in Tribuo. Such is the way of ML systems: results are hard to replicate across platforms. With a bit of tweaking to the Tribuo model's hyperparameters we could get the same performance, but that's an exercise for another time.

Now let's export the Tribuo model into ONNX format and load it back in using the same external model support.

```java
var lrModelPath = Paths.get(".","javaone-lr-mnist.onnx");
lrModel.saveONNXModel("org.tribuo.javaone.lr", // namespace for the model
                      0,                       // model version number
                      lrModelPath              // path to save the model
                      );
```

Again we'll need to define the mapping between feature names and indices for our ONNX model, but fortunately Tribuo builds all that information anyway, so we simply copy it into the right data structures:

```java
Map<String, Integer> mnistFeatureMap = new HashMap<>();
for (VariableInfo f : lrModel.getFeatureIDMap()){
    VariableIDInfo id = (VariableIDInfo) f;
    mnistFeatureMap.put(id.getName(),id.getID());
}
Map<Label, Integer> mnistOutputMap = new HashMap<>();
for (Pair<Integer,Label> l : lrModel.getOutputIDInfo()) {
    mnistOutputMap.put(l.getB(), l.getA());
}
```

Loading the model is the same as before, just with the correct mappings and path.

```java
ONNXExternalModel<Label> onnxLR = ONNXExternalModel.createOnnxModel(labelFactory, mnistFeatureMap, mnistOutputMap,
                    denseTransformer, labelTransformer, sessionOpts, lrModelPath, "input");
```

And we can evaluate it the same way.

```java
var onnxEvaluation = labelEvaluator.evaluate(onnxLR,mnistTest);
onnxEvaluation.toString();
```

```
Class                           n          tp          fn          fp      recall        prec          f1
0                             980         944          36          41       0.963       0.958       0.961
1                           1,135       1,121          14          75       0.988       0.937       0.962
2                           1,032         906         126          88       0.878       0.911       0.894
3                           1,010         781         229          44       0.773       0.947       0.851
4                             982         888          94          77       0.904       0.920       0.912
5                             892         770         122         149       0.863       0.838       0.850
6                             958         923          35         111       0.963       0.893       0.927
7                           1,028         954          74         112       0.928       0.895       0.911
8                             974         853 
```

## Conclusion

We saw how to train and evaluate models in Tribuo, inspected the provenance captured by the training procedure, and then discussed how to import models into Tribuo and export Tribuo models in ONNX format.