# Using Forest Inference Library (FIL) with multiple GPUs

See [Forest Inference Library demo](./forest_inference_demo.ipynb) for a basic introduction to Forest Inference Library (FIL). In this notebook, we will show how to use FIL to run inference with tree models using multiple GPUs.

We will use FIL and Dask together.

Below we will:

1. **Create a Dask cluster** with multiple workers, where each worker is assigned a single GPU;
2. **Generate synthetic data** and partition it evenly among the workers;
3. **Load the FIL model** on each worker; and
4. **Run FIL `.predict()` in parallel** on each worker.

*Optional Kernel Restart*

```python
import IPython
IPython.Application.instance().kernel.do_shutdown(restart=True)
```

## Dask imports

In [None]:
from dask_cuda import LocalCUDACluster
from distributed import Client, wait, get_worker

import dask.dataframe
import dask.array
import dask_cudf

from cuml import ForestInference
import time

## Create a LocalCUDACluster

In [None]:
cluster = LocalCUDACluster()
client = Client(cluster)

workers = client.has_what().keys()
n_workers = len(workers)
n_partitions = n_workers

## Define size of synthetic data

In [None]:
rows = 1_000_000
cols = 100

## Generate synthetic query/inference data

We will generate the data on the CPU as a Dask array, convert it into a `dask.dataframe`, and then move it into GPU memory with `to_backend("cudf")`. The result is a `dask_cudf` DataFrame that FIL can consume.

In [None]:
x = dask.array.random.random(
    size=(rows, cols),
    chunks=(rows//n_partitions, cols)
).astype("float32")

df = dask.dataframe.from_array(x).to_backend("cudf")
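
The chunk length `rows // n_partitions` only splits the rows evenly when `rows` is a multiple of `n_partitions`. The sketch below uses plain Python (no Dask or GPU needed) to show the row counts that result from a fixed chunk length. The helper name `chunk_sizes` is hypothetical and is not part of Dask; it only mirrors the arithmetic of cutting an axis into fixed-length chunks with a shorter remainder at the end.

```python
def chunk_sizes(rows: int, n_partitions: int) -> list:
    """Row counts per chunk when the chunk length is rows // n_partitions."""
    if n_partitions <= 0:
        raise ValueError("n_partitions must be positive")
    chunk = rows // n_partitions
    if chunk == 0:
        raise ValueError("more partitions requested than rows available")
    # Full-length chunks first, then whatever rows are left over.
    full, remainder = divmod(rows, chunk)
    sizes = [chunk] * full
    if remainder:
        sizes.append(remainder)
    return sizes


print(chunk_sizes(1_000_000, 4))  # [250000, 250000, 250000, 250000]
print(chunk_sizes(1_000_000, 3))  # [333333, 333333, 333333, 1]
```

With three workers, the one million rows would end up in four chunks rather than three, the last holding a single row. Choosing a row count that is a multiple of the worker count avoids the extra chunk.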

## Persist data in GPU memory

We can optionally persist the generated data (see [Persist documentation](https://docs.dask.org/en/latest/dataframe-best-practices.html?highlight=persist#persist-intelligently)), so that the lazy dataframe is computed now and its partitions are kept in GPU memory on the workers.

In [None]:
df = df.persist()
wait(df)

## Pre-load FIL model on each worker

Before we run inference on our distributed dataset, let's load the tree model onto each worker. **Make sure to run [Forest Inference Library demo](./forest_inference_demo.ipynb) first to obtain the `xgb.model` file.**

Here we'll use each worker's local storage, which persists after the function or task completes.

For more see the [Dask worker documentation on storage](https://distributed.dask.org/en/latest/worker.html#storing-data).

In [None]:
def worker_init(dask_worker, model_file="xgb.model"):
    dask_worker.data["fil_model"] = ForestInference.load(
        model_file,
        layout="depth_first",
        is_classifier=True,
        model_type="xgboost_ubj",
    )

In [None]:
%%time
client.run(worker_init)
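
The value of pre-loading is that the expensive load happens once per worker, while every later task on that worker reuses the cached object. The sketch below illustrates this pattern in plain Python. `StubWorker` and `fake_load` are hypothetical stand-ins introduced only for this illustration; they are not Dask or cuML APIs. In the real cluster, `client.run` passes the worker object to any function that declares a `dask_worker` argument.

```python
class StubWorker:
    """Hypothetical stand-in for a Dask worker; only the `data` mapping is modelled."""

    def __init__(self, name):
        self.name = name
        self.data = {}


load_calls = []


def fake_load(model_file):
    """Placeholder for an expensive model load; records every call."""
    load_calls.append(model_file)
    return {"source": model_file}


def worker_init(dask_worker, model_file="xgb.model"):
    # Store the loaded object under a fixed key in the worker's local storage.
    dask_worker.data["fil_model"] = fake_load(model_file)


def predict(worker, batch):
    # Look the model up instead of loading it again.
    model = worker.data["fil_model"]
    return [model["source"]] * len(batch)


workers = [StubWorker(f"gpu-{i}") for i in range(2)]
for w in workers:
    worker_init(w)

# Several batches per worker, but no additional loads.
for w in workers:
    for batch in ([1, 2], [3]):
        predict(w, batch)

print(len(load_calls))  # 2
```

The load count equals the number of workers regardless of how many batches each worker processes.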

## Distributed FIL Predict on persisted data

In [None]:
def predict(input_df):
    worker = get_worker()
    return worker.data["fil_model"].predict(input_df, threshold=0.50)

Let's map the `predict` call onto each of our partitions (i.e., the `dask_cudf` DataFrame chunks that we distributed among the workers).

In [None]:
distributed_predictions = df.map_partitions(predict, meta=("predict", int))
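
Conceptually, mapping over partitions applies the same function to each chunk independently and keeps the results in partition order. The plain-Python sketch below illustrates this with nested lists. The helper `map_over_partitions` and the mean-based labelling rule are hypothetical placeholders for Dask's scheduling and for the FIL model; they are not the library's implementation.

```python
def map_over_partitions(func, partitions):
    """Apply func to each partition independently, preserving partition order."""
    return [func(part) for part in partitions]


def predict(partition):
    # Placeholder rule standing in for a model: label 1 when the row mean exceeds 0.5.
    return [int(sum(row) / len(row) > 0.5) for row in partition]


# Two partitions holding two rows and one row respectively.
partitions = [
    [[0.1, 0.2], [0.9, 0.8]],
    [[0.6, 0.7]],
]

per_partition = map_over_partitions(predict, partitions)
predictions = [label for part in per_partition for label in part]

print(per_partition)  # [[0, 1], [1]]
print(predictions)    # [0, 1, 1]
```

Each partition yields one integer label per input row, which is the output shape that `meta=("predict", int)` describes to Dask ahead of time.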

In [None]:
tic = time.perf_counter()
distributed_predictions.compute()
toc = time.perf_counter()

fil_inference_time = toc - tic

## Summarize the performance

In [None]:
total_samples = len(df)
print(f" {total_samples:,} inferences in {fil_inference_time:.5f} seconds"
      f" -- {int(total_samples/fil_inference_time):,} inferences per second ")
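
A single timed run can include one-off costs, so it may be worth repeating the measurement and reporting the fastest run. The sketch below shows one way to do that with the standard library only. The helpers `best_elapsed` and `throughput` are hypothetical names introduced for this illustration, and the summation workload is a placeholder for the `distributed_predictions.compute()` call.

```python
import time


def best_elapsed(func, repeats=3):
    """Smallest wall-clock time, in seconds, over several calls to func()."""
    timings = []
    for _ in range(repeats):
        tic = time.perf_counter()
        func()
        timings.append(time.perf_counter() - tic)
    return min(timings)


def throughput(n_samples, seconds):
    """Samples processed per second."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return n_samples / seconds


# Placeholder workload; in the notebook this would be the compute() call.
elapsed = best_elapsed(lambda: sum(range(100_000)), repeats=3)
print(f"best of 3 runs: {elapsed:.5f} seconds")

print(throughput(1_000_000, 2.0))  # 500000.0
```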