# Forest Inference Library (FIL)
The Forest Inference Library (FIL) loads saved forest models trained with XGBoost or LightGBM and performs inference on them. It supports both classification and regression. In this notebook, we'll begin by fitting a model with XGBoost and saving it. We'll then load the saved model into FIL and use it to run inference on new data.

FIL works the same way with LightGBM models.

The model accepts both NumPy arrays and cuDF DataFrames. To convert your dataset to cuDF format, please read the cuDF documentation at https://docs.rapids.ai/api/cudf/stable.

For additional information on the Forest Inference Library, please refer to the documentation at https://docs.rapids.ai/api/cuml/stable/api.html#forest-inferencing

In [None]:
import cupy
import os

from cuml.testing.utils import array_equal
from cuml.internals.import_utils import has_xgboost

from cuml.datasets import make_classification
from cuml.metrics import accuracy_score
from cuml.model_selection import train_test_split
    
from cuml import ForestInference

### Check for xgboost
Check whether xgboost is installed; if it is not, raise an error.

In [None]:
if has_xgboost():
    import xgboost as xgb
else:
    # note the trailing space so the two string literals join cleanly
    raise ImportError("Please install xgboost using the conda package, "
                      "e.g.: conda install -c conda-forge xgboost")

## Define parameters

In [None]:
# synthetic data size
n_rows = 10000
n_columns = 100
n_categories = 2
random_state = cupy.random.RandomState(43210)

# fraction of data used for model training
train_size = 0.8

# trained model output filename
model_path = 'xgb.model'

# num of iterations for which xgboost is trained
num_rounds = 100

# maximum tree depth in each training round
max_depth = 20

## Generate data

In [None]:
# create the dataset
X, y = make_classification(
    n_samples=n_rows,
    n_features=n_columns,
    n_informative=int(n_columns/5),
    n_classes=n_categories,
    random_state=42
)

# convert the dataset to float32
X = X.astype('float32')
y = y.astype('float32')

# split the dataset into training and validation splits
X_train, X_validation, y_train, y_validation = train_test_split(X, y, train_size=train_size)

## Train helper function
Defines a simple function that trains the XGBoost model and returns the trained model.

For additional information on the xgboost parameters, please refer to the documentation at:
https://xgboost.readthedocs.io/en/latest/parameter.html

In [None]:
def train_xgboost_model(
    X_train, 
    y_train,
    model_path='xgb.model',
    num_rounds=100, 
    max_depth=20
):
    
    # set the xgboost model parameters
    params = {
        'verbosity': 0, 
        'eval_metric':'error',
        'objective':'binary:logistic',
        'max_depth': max_depth,
        'tree_method': 'gpu_hist'
    }
    
    # convert training data into DMatrix
    dtrain = xgb.DMatrix(X_train, label=y_train)
    
    # train the xgboost model
    trained_model = xgb.train(params, dtrain, num_rounds)

    # save the trained xgboost model
    trained_model.save_model(model_path)

    return trained_model

## Predict helper function
Uses the trained xgboost model to perform prediction and return the labels.

In [None]:
def predict_xgboost_model(X_validation, y_validation, xgb_model):

    # predict using the xgboost model
    dvalidation = xgb.DMatrix(X_validation, label=y_validation)
    predictions = xgb_model.predict(dvalidation)

    # convert the predicted values from xgboost into class labels
    predictions = cupy.around(predictions)
    
    return predictions

## Train and Predict the model
Invoke the function to train the model and get predictions so that we can validate them.

In [None]:
%%time
# train the xgboost model
xgboost_model = train_xgboost_model(
    X_train, 
    y_train, 
    model_path,
    num_rounds,
    max_depth
)

In [None]:
%%time
# test the xgboost model
trained_model_preds = predict_xgboost_model(
    X_validation,
    y_validation,
    xgboost_model
)

## Load Forest Inference Library (FIL)

The load function of the ForestInference class accepts the following parameters:

        filename : str
            Path to saved model file in a treelite-compatible format
            (See https://treelite.readthedocs.io/en/latest/treelite-api.html)
        output_class : bool
            If True, return a 1 or 0 depending on whether the raw prediction
            exceeds the threshold. If False, just return the raw prediction.
        threshold : float
            Cutoff value above which a prediction is set to 1.0.
            Only used if the model is classification and output_class is True.
        algo : str
            Name of the algo (from the algo_t enum):
            'NAIVE' - simple inference using shared memory
            'TREE_REORG' - similar to naive but trees rearranged to be more
                           coalescing-friendly
            'BATCH_TREE_REORG' - similar to TREE_REORG but predicting
                                 multiple rows per thread block
        model_type : str
            Format of saved treelite model to load.
            Can be 'xgboost', 'lightgbm'
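
To make the interaction between `output_class` and `threshold` concrete, here is a minimal plain-Python sketch of the behavior described above. It is not FIL's implementation: `apply_threshold` is a hypothetical helper, and it assumes that "exceeds the threshold" means strictly greater than.

```python
def apply_threshold(raw_predictions, threshold=0.5, output_class=True):
    """Hypothetical helper mirroring the documented output_class/threshold behavior."""
    if not output_class:
        # raw scores are passed through unchanged
        return list(raw_predictions)
    # assumption: a score must be strictly above the threshold to map to 1.0
    return [1.0 if p > threshold else 0.0 for p in raw_predictions]


raw = [0.10, 0.49, 0.50, 0.51, 0.93]
labels = apply_threshold(raw, threshold=0.5)
print(labels)  # [0.0, 0.0, 0.0, 1.0, 1.0]
```

Under this assumption a score exactly equal to the threshold is labeled 0.0, so the choice of threshold matters for borderline predictions.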

## Load the saved model
Use FIL to load the saved xgboost model.

In [None]:
fil_model = ForestInference.load(
    filename=model_path,
    algo='BATCH_TREE_REORG',
    output_class=True,
    threshold=0.50,
    model_type='xgboost'
)

## Predict using FIL

In [None]:
%%time
# perform prediction on the model loaded from path
fil_preds = fil_model.predict(X_validation)

## Evaluate results

Verify that the predictions from the original model and the FIL model match.

In [None]:
print("The shape of predictions obtained from xgboost : ", (trained_model_preds).shape)
print("The shape of predictions obtained from FIL : ", (fil_preds).shape)
print("Are the predictions for xgboost and FIL the same : ",  array_equal(trained_model_preds, fil_preds))
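
A single boolean does not say how far apart two sets of predictions are when they differ. As a rough sketch, the fraction of mismatched labels can be computed as below. `mismatch_fraction` is a hypothetical helper written for illustration, not part of cuML, and it works on plain Python sequences, so GPU arrays would first need to be copied to the host (for example with `.tolist()`).

```python
def mismatch_fraction(a, b):
    """Hypothetical helper: fraction of positions where two label sequences differ."""
    a, b = list(a), list(b)
    if len(a) != len(b):
        raise ValueError("prediction sequences must have the same length")
    if not a:
        # nothing to compare, so nothing disagrees
        return 0.0
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


# toy labels for illustration only, not output from the models above
xgb_labels = [0.0, 1.0, 1.0, 0.0]
fil_labels = [0.0, 1.0, 0.0, 0.0]
print(mismatch_fraction(xgb_labels, fil_labels))  # 0.25
```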

# Distributed FIL with Dask

Now let's demonstrate how we can use FIL and Dask together to run parallel inference on multiple GPUs while leveraging our trained model.

Below we will:
1. **create a Dask cluster** with n_GPU workers,

2. **generate synthetic data** and partition it evenly among the workers,
    
3. **load FIL model** on each worker,

4. and **run parallel FIL .predict()** on each worker

*Optional Kernel Restart*

## Dask Imports

In [None]:
from dask_cuda import LocalCUDACluster
from distributed import Client, wait, get_worker

import dask.dataframe
import dask.array
import dask_cudf

from cuml import ForestInference
import time

## Create a LocalCUDACluster

Note that we'll be partitioning the data equally among the workers. 

In [None]:
cluster = LocalCUDACluster()
client = Client(cluster)

workers = client.has_what().keys()
n_workers = len(workers)
n_partitions = n_workers

## Define size of synthetic data

In [None]:
rows = 1_000_000
cols = 100

## Generate synthetic query/inference data

Next we will generate data on the CPU as a dask array, wrap it in a dask.dataframe, and then convert it into a dask_cudf dataframe, which moves it into GPU memory so that it can be used in the downstream FIL predict.

In [None]:
x = dask.array.random.random(
    size=(rows, cols), 
    chunks=(rows//n_partitions, cols)
).astype('float32')
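
The `chunks` argument uses integer division, so the partitions are exactly equal only when `rows` is divisible by the number of workers. The sketch below shows the resulting chunk sizes in plain Python. `chunk_sizes` is a hypothetical helper, and it assumes that leftover rows end up in one final, smaller chunk, which is how a fixed block size is normally expanded.

```python
def chunk_sizes(n_rows, chunk_rows):
    """Hypothetical helper: rows per chunk when an axis is cut into blocks of chunk_rows."""
    if chunk_rows <= 0:
        raise ValueError("chunk_rows must be positive")
    full, remainder = divmod(n_rows, chunk_rows)
    sizes = [chunk_rows] * full
    if remainder:
        # assumption: leftover rows form one extra, smaller chunk
        sizes.append(remainder)
    return sizes


rows = 1_000_000
for n_partitions in (2, 3, 4):
    sizes = chunk_sizes(rows, rows // n_partitions)
    print(n_partitions, "workers ->", len(sizes), "chunks:", sizes)
```

With three workers this gives four chunks (three of 333,333 rows and one of a single row), so the number of partitions can exceed the number of workers when the division is not exact.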

In [None]:
df = dask_cudf.from_dask_dataframe(
    dask.dataframe.from_array(x)
)

## Persist data in GPU memory
We can optionally persist our generated data (see [Persist documentation](https://docs.dask.org/en/latest/dataframe-best-practices.html?highlight=persist#persist-intelligently)), so that our lazy dataframe starts to be executed and saved in memory.

In [None]:
df = df.persist()
wait(df)

## Pre-load FIL model on each worker

Before we run inference on our distributed dataset, let's first load the FIL model (trained by XGBoost above) onto each worker.

Here we'll leverage the worker's local storage, which persists after the function/task completes.

For more see the [Dask worker documentation on storage](https://distributed.dask.org/en/latest/worker.html#storing-data).

In [None]:
def worker_init(dask_worker, model_file='xgb.model'):
    # store the loaded model in the worker's local storage for later tasks
    dask_worker.data["fil_model"] = ForestInference.load(
        filename=model_file,
        algo='BATCH_TREE_REORG',
        output_class=True,
        threshold=0.50,
        model_type='xgboost'
    )

In [None]:
%%time
client.run(worker_init)

## Distributed FIL Predict on persisted data

In [None]:
def predict(input_df):
    # look up the model that was pre-loaded on the worker running this task
    worker = get_worker()
    return worker.data["fil_model"].predict(input_df)

Let's map the predict call to each of our partitions (i.e., the dask_cudf.dataframe chunks that we distributed among the workers).

In [None]:
distributed_predictions = df.map_partitions(predict, meta="float")
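
The reason for pre-loading is that the model is loaded once per worker and then reused for every partition that worker processes. The plain-Python stand-in below illustrates only that pattern. It does not use Dask or FIL: `StubWorker`, `load_model`, and the toy "model" are all hypothetical names invented for this sketch.

```python
class StubWorker:
    """Hypothetical stand-in for a Dask worker; it only mimics the `data` mapping."""

    def __init__(self):
        self.data = {}


load_calls = 0


def load_model():
    """Hypothetical loader standing in for ForestInference.load; counts its calls."""
    global load_calls
    load_calls += 1
    # toy "model": label a row 1.0 when the mean of its features exceeds 0.5
    return lambda rows: [1.0 if sum(r) / len(r) > 0.5 else 0.0 for r in rows]


def worker_init(worker):
    # runs once per worker and stores the model in worker-local storage
    worker.data["fil_model"] = load_model()


def predict(worker, partition):
    # reuses the stored model instead of loading it again
    return worker.data["fil_model"](partition)


worker = StubWorker()
worker_init(worker)

partitions = [
    [[0.9, 0.8], [0.1, 0.2]],
    [[0.6, 0.7]],
    [[0.3, 0.1], [0.2, 0.9]],
]
results = [predict(worker, p) for p in partitions]
print(results)     # [[1.0, 0.0], [1.0], [0.0, 1.0]]
print(load_calls)  # 1
```

Three partitions are scored but the loader runs only once, which is the saving the pre-load step is meant to provide.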

In [None]:
tic = time.perf_counter()
distributed_predictions.compute()
toc = time.perf_counter()

fil_inference_time = toc-tic

## Summarize the performance

In [None]:
total_samples = len(df)
print(f' {total_samples:,} inferences in {fil_inference_time:.5f} seconds'
      f' -- {int(total_samples/fil_inference_time):,} inferences per second ')