# In Java

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google/yggdrasil-decision-forests/blob/main/documentation/public/docs/tutorial/java_standalone.ipynb)

## Setup

In [None]:
pip install ydf -U

## How can I use the Java Standalone export?

YDF models can be integrated in two ways:

-   **Direct Code Generation:** Call `model.to_standalone_java()` to generate
    the source code. This option is simple and great for experimentation.

-   **Build Rule Integration:** For production use, save your model (e.g., in
    Google3) and use a *java_ydf_standalone_model* Blaze rule. This option
    automatically calls *to_standalone_java* during compilation, simplifying
    model updates and option testing. Note that this build rule is currently not
    available in the open-source build / Bazel.

Both methods are demonstrated in this tutorial.

## Import libraries

In [None]:
import pandas as pd
import ydf

## Training a small model

First, we train a small YDF model on the Adult dataset.

In [None]:
# Download a classification dataset and load it as a Pandas DataFrame.
ds_path = "https://raw.githubusercontent.com/google/yggdrasil-decision-forests/main/yggdrasil_decision_forests/test_data/dataset"
train_ds = pd.read_csv(f"{ds_path}/adult_train.csv")

model = ydf.GradientBoostedTreesLearner(label="income", num_trees=2).train(
    train_ds
)
# Note: Only train 2 trees to make the generated code smaller.

model.describe()

Train model on 22792 examples
Model trained in 0:00:00.025254


## Direct Code Generation

Let's generate the model `.java` file and the model data `.bin` file.
The `.java` file contains the following symbols:

-   **`Instance` class:** An input example.
-   **`Predict` method:** A thread-safe method that consumes an *Instance*
    and returns a label class / probability (for classification) or value
    (for regression).
-   **`Label` enum:** The label values. In this case, this is a binary
    classification model with two labels `Label.LT_50K` and `Label.GT_50K`.
-   **Categorical enums:** An enum class for each of the categorical input
    features, e.g., *FeatureWorkclass*, *FeatureEducation*.

The model data is stored in a separate `.bin` file, which needs to be in the classpath when running the model.

In [None]:
# Generate the Java code and binary data
java_model_files = model.to_standalone_java(export_dir=".")

# Print the content of the Java file
print(java_model_files["YdfModel.java"].decode())

Save the contents of `java_model_files["YdfModelData.bin"]` in the classpath.
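
As a minimal sketch of this step, the helper below writes each entry of such a dictionary to a directory. It assumes the returned object maps file names to `bytes`, as the `.decode()` call above suggests. The helper name `write_model_files` and the target directory are hypothetical, and the stand-in dictionary only replaces the real `java_model_files` to keep the snippet self-contained.

```python
import pathlib
import tempfile


def write_model_files(files: dict[str, bytes], out_dir: str) -> list[pathlib.Path]:
    """Writes each {filename: bytes} entry into out_dir and returns the paths."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, content in files.items():
        path = out / name
        path.write_bytes(content)
        written.append(path)
    return written


# Stand-in for the dictionary returned by model.to_standalone_java().
example_files = {
    "YdfModel.java": b"// generated source",
    "YdfModelData.bin": b"\x00\x01",
}
# Hypothetical target; in a real project, point this at a classpath directory.
target_dir = pathlib.Path(tempfile.mkdtemp()) / "resources"
paths = write_model_files(example_files, str(target_dir))
print([p.name for p in paths])
```

In a real project, the `.java` file would typically go with your sources and the `.bin` file with your resources, so you may prefer to call the helper once per destination.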

```java
import ydf_model.YdfModel;
import ydf_model.YdfModel.Instance;
import ydf_model.YdfModel.Label;
import ydf_model.YdfModel.FeatureWorkclass;
import ydf_model.YdfModel.FeatureEducation;
import ydf_model.YdfModel.FeatureMaritalStatus;
import ydf_model.YdfModel.FeatureOccupation;
import ydf_model.YdfModel.FeatureRelationship;
import ydf_model.YdfModel.FeatureRace;
import ydf_model.YdfModel.FeatureSex;
import ydf_model.YdfModel.FeatureNativeCountry;

public class Predictor {

    public static void main(String[] args) {
        try {
            YdfModel model = new YdfModel(); // Loads data from YdfModelData.bin in classpath

            Instance instance = new Instance();
            instance.age = 39;
            instance.workclass = FeatureWorkclass.STATE_GOV;
            instance.fnlwgt = 77516;
            instance.education = FeatureEducation.BACHELORS;
            instance.education_num = 13;
            instance.marital_status = FeatureMaritalStatus.NEVER_MARRIED;
            instance.occupation = FeatureOccupation.ADM_CLERICAL;
            instance.relationship = FeatureRelationship.NOT_IN_FAMILY;
            instance.race = FeatureRace.WHITE;
            instance.sex = FeatureSex.MALE;
            instance.capital_gain = 2174;
            instance.capital_loss = 0;
            instance.hours_per_week = 40;
            instance.native_country = FeatureNativeCountry.UNITED_STATES;

            Label prediction = model.Predict(instance);

            if (prediction == Label.LT_50K) {
                System.out.println("Prediction: <=50K");
            } else if (prediction == Label.GT_50K) {
                System.out.println("Prediction: >50K");
            }

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```

By default, `Predict` returns a class for classification models. Alternatively,
the method can return a probability (or probabilities for multi-class models)
or scores (e.g., logits), depending on the `classification_output` argument.
For example:

-   `model.to_standalone_java(classification_output='PROBABILITY')`: Returns a
    probability (`float`) or probabilities (an array of `float`).
-   `model.to_standalone_java(classification_output='SCORE')`: Returns scores.
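
To relate the two outputs: for a binary gradient boosted trees classifier trained with the usual log-loss, the score is a logit and the probability is its sigmoid. The sketch below illustrates that relationship only. `sigmoid` and `softmax` are hypothetical helper names, not part of the YDF API, and the assumption that the exported score is a raw logit should be checked against your model.

```python
import math


def sigmoid(logit: float) -> float:
    """Maps a binary-classification logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def softmax(logits: list[float]) -> list[float]:
    """Maps per-class logits to probabilities that sum to 1."""
    shift = max(logits)  # Subtract the max for numerical stability.
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(sigmoid(0.0))  # 0.5
print(softmax([0.0, 0.0]))  # [0.5, 0.5]
```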

Categorical feature values are created from the corresponding enum class, e.g., `FeatureRelationship.NOT_IN_FAMILY`.

If you look at the content of the `Predict` method, you will see a for-loop
over the trees and a while-loop over the nodes. This is called the "routing"
algorithm, and it is a simple and generally efficient way to generate
predictions with a decision forest.
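
The routing idea can be sketched in a few lines of Python. This is an illustration, not the generated code: the flat node layout, the `Node` tuple, and the toy two-tree forest below are assumptions made for the example, and it handles only numerical "greater or equal" conditions.

```python
from typing import NamedTuple


class Node(NamedTuple):
    feature: int      # Index of the tested feature; -1 marks a leaf.
    threshold: float  # Split threshold (unused for leaves).
    left: int         # Index of the child taken when the condition is false.
    right: int        # Index of the child taken when the condition is true.
    value: float      # Leaf output (unused for internal nodes).


def predict(forest: list[list[Node]], x: list[float], bias: float = 0.0) -> float:
    """Sums the leaf values reached by routing x through each tree."""
    score = bias
    for tree in forest:                  # For-loop over the trees.
        node = tree[0]                   # Start at the root.
        while node.feature != -1:        # While-loop over the nodes.
            go_right = x[node.feature] >= node.threshold
            node = tree[node.right if go_right else node.left]
        score += node.value
    return score


# Toy forest: each tree has one split and two leaves.
forest = [
    [Node(0, 30.0, 1, 2, 0.0), Node(-1, 0.0, 0, 0, -0.5), Node(-1, 0.0, 0, 0, 0.5)],
    [Node(1, 40.0, 1, 2, 0.0), Node(-1, 0.0, 0, 0, -0.25), Node(-1, 0.0, 0, 0, 0.25)],
]
print(predict(forest, [39.0, 40.0]))  # 0.75
```

Each example visits one node per tree level, so the cost of a prediction grows with the number of trees and their depth rather than with the total number of nodes.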

## Build Rule Integration

Instead of manually saving the result of `model.to_standalone_java()` to a file,
you can use the `java_ydf_standalone_model` build rule. The steps are:


1\.

Save the model with `model.save(...)` in a new directory in your source code
(e.g., in Google3).

```python
model.save("my_project/ydf_model_data")
```

2\.

Create a BUILD file with a `filegroup` in the model directory:

*File: my_project/ydf_model_data/BUILD*

```python
filegroup(name = "ydf_model_data", srcs = glob(["**"]))
```

3\.

In your library's `BUILD`, create a `java_ydf_standalone_model` build rule.

*File: my_project/BUILD*

```python
load("//third_party/yggdrasil_decision_forests/serving/embed:embed.bzl", "java_ydf_standalone_model")

java_ydf_standalone_model(
  name = "ydf_model", # Rule name, .java filename, generated .bin filename.
  package_name = "ydf_model", # Name of the Java package where this rule is defined.
  data = "//my_project/ydf_model_data",
  # Compilation options here.
  classification_output = "PROBABILITY",
  constraints = ["android"], # Add this if building for android.
)
```

4\.

In your `java_library`, add `:ydf_model` as a dependency.

*File: my_project/BUILD*

```python
java_library(
  name = "main",
  srcs = ["MyClass.java"],
  deps = [":ydf_model"],
)
```

5\.

In your Java code, import and call the model as shown in the example above.