# Deploying with Triton Inference Server

*This notebook is adapted from [Real-time Serving for XGBoost, Scikit-Learn RandomForest, LightGBM, and More](https://developer.nvidia.com/blog/real-time-serving-for-xgboost-scikit-learn-randomforest-lightgbm-and-more/).*

Now that we have a model to work with, let's deploy it for real-time serving using Triton. To do so, we first serialize the models into the directory structure that Triton expects, then add configuration files that tell Triton exactly how we want to use them.
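
As a rough illustration of the layout described above, the sketch below creates a `<repository>/<model name>/<version>/` directory and writes a `config.pbtxt` beside the version directory. It assumes the model is served with Triton's FIL backend, which this notebook does not state explicitly. The helper `write_model_config`, its `num_features` and `num_classes` arguments, the `max_batch_size` value, and the parameter keys are illustrative assumptions to check against the FIL backend documentation for your Triton version; other keys may be required.

```python
import os
import tempfile


def write_model_config(repo_dir, model_name, num_features, num_classes=2):
    """Hypothetical helper: create <repo_dir>/<model_name>/1/ and write config.pbtxt."""
    model_dir = os.path.join(repo_dir, model_name)
    # The serialized model file (e.g. xgboost.json) belongs in version directory "1".
    os.makedirs(os.path.join(model_dir, '1'), exist_ok=True)

    # Assumed FIL-backend configuration; input/output names match the client code
    # used later in this notebook ('input__0' and 'output__0', both FP32).
    config_text = f"""backend: "fil"
max_batch_size: 32768
input [
  {{
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ {num_features} ]
  }}
]
output [
  {{
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ {num_classes} ]
  }}
]
instance_group [
  {{
    kind: KIND_GPU
  }}
]
parameters [
  {{
    key: "model_type"
    value: {{ string_value: "xgboost_json" }}
  }},
  {{
    key: "predict_proba"
    value: {{ string_value: "true" }}
  }},
  {{
    key: "output_class"
    value: {{ string_value: "true" }}
  }}
]
"""
    config_path = os.path.join(model_dir, 'config.pbtxt')
    with open(config_path, 'w') as config_file:
        config_file.write(config_text)
    return config_path


# Demo in a throwaway directory; num_features=10 is a placeholder, not the real feature count.
with tempfile.TemporaryDirectory() as demo_repo:
    demo_path = write_model_config(demo_repo, 'model', num_features=10)
    print(os.path.relpath(demo_path, demo_repo))
```

In a real deployment, `repo_dir` would be the model repository that the Triton server is started with, and `num_features` would be the number of columns in the training data.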

## Starting the server
With valid models and configuration files in place, we can now start the server. Below, we do so, then use the Python client to wait until it is fully online, and finally check the logs to make sure no unexpected warnings or errors occurred while loading the models. Find the host for the PyTorch container in [Setup PyTorch and Triton Containers].

In [None]:
!pip install tritonclient[all]

In [2]:
import time
import tritonclient.grpc as triton_grpc
from tritonclient import utils as triton_utils
HOST = '172.18.0.3'
PORT = 8001
TIMEOUT = 15

In [3]:
client = triton_grpc.InferenceServerClient(url=f'{HOST}:{PORT}')

In [4]:
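# Wait for the server to come online; give up after TIMEOUT seconds
server_start = time.time()
while True:
    try:
        if client.is_server_ready():
            break
    except triton_utils.InferenceServerException:
        pass  # server not reachable yet; retry
    # Check the deadline outside the try block so it also applies after an exception
    if time.time() - server_start > TIMEOUT:
        raise TimeoutError(f'Triton server not ready after {TIMEOUT} seconds')
    time.sleep(1)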
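
The polling pattern used here can be factored into a small reusable helper. The sketch below is one possible version, not part of the Triton client: `wait_until` and its `interval` and `retry_on` parameters are hypothetical names. It polls a callable until it returns a truthy value or a deadline passes, and it checks the deadline even when the callable raises.

```python
import time


def wait_until(predicate, timeout, interval=1.0, retry_on=(Exception,)):
    """Poll predicate() until it is truthy or `timeout` seconds have elapsed.

    Returns True if the predicate succeeded, False if the deadline passed.
    Exceptions listed in `retry_on` are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if predicate():
                return True
        except retry_on:
            pass  # not ready yet; fall through to the deadline check
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With the client from this notebook, a call might look like `wait_until(client.is_server_ready, TIMEOUT, retry_on=(triton_utils.InferenceServerException,))`. The same helper could wrap `lambda: client.is_model_ready('model')` to wait for a specific model rather than the server as a whole.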

## Submitting inference requests
With our models now deployed on a running Triton server, let's confirm that the deployed model returns the same results as the model run locally. Note that we will occasionally see slight divergences due to floating-point rounding during parallel execution, but otherwise the results should match.

In [5]:
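import pandas as pd


def convert_to_numpy(df):
    """Encode categorical columns as integer codes and return a float32 array."""
    df = df.copy()
    cat_cols = df.select_dtypes('category').columns
    for col in cat_cols:
        df[col] = df[col].cat.codes
    for col in df.columns:
        df[col] = pd.to_numeric(df[col], downcast='float')
    # Cast explicitly: the request below declares 'FP32', so the array must be float32
    return df.to_numpy(dtype='float32')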

In [6]:
X_test = pd.read_pickle('data/X_test.pkl')
np_data = convert_to_numpy(X_test[0:5])

In [7]:
def triton_predict(model_name, arr):
    # Describe the input tensor and attach the data
    triton_input = triton_grpc.InferInput('input__0', arr.shape, 'FP32')
    triton_input.set_data_from_numpy(arr)
    # Name the output tensor we want returned
    triton_output = triton_grpc.InferRequestedOutput('output__0')
    response = client.infer(model_name, model_version='1', inputs=[triton_input], outputs=[triton_output])
    return response.as_numpy('output__0')

In [8]:
triton_result = triton_predict('model', np_data)

In [9]:
print("Result computed on Triton: ")
print(triton_result)

Result computed on Triton: 
[[0.98703283 0.01296717]
 [0.9928937  0.00710629]
 [0.9917474  0.00825262]
 [0.95682836 0.04317162]
 [0.99205303 0.00794696]]


In [11]:
import xgboost as xgb
model = xgb.XGBClassifier()
model.load_model("/workspace/volume1/model_repository/model/1/xgboost.json")
local_result = model.predict_proba(X_test[0:5])
print("\nResult computed locally: ")
print(local_result)


Result computed locally: 
[[0.98703283 0.01296715]
 [0.99289376 0.00710626]
 [0.9917474  0.00825259]
 [0.95682836 0.04317162]
 [0.99205303 0.00794695]]
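
The two printouts above agree except in the last one or two digits. Rather than comparing them by eye, we can check the agreement numerically. The sketch below copies the printed values into arrays so that it runs on its own; in the notebook, `triton_result` and `local_result` would be used directly. The tolerance `atol=1e-6` is an assumption chosen for float32 probabilities, not a value prescribed by Triton or XGBoost.

```python
import numpy as np

# Values copied from the outputs printed above
triton_result = np.array([[0.98703283, 0.01296717],
                          [0.9928937, 0.00710629],
                          [0.9917474, 0.00825262],
                          [0.95682836, 0.04317162],
                          [0.99205303, 0.00794696]], dtype=np.float32)
local_result = np.array([[0.98703283, 0.01296715],
                         [0.99289376, 0.00710626],
                         [0.9917474, 0.00825259],
                         [0.95682836, 0.04317162],
                         [0.99205303, 0.00794695]], dtype=np.float32)

# Largest element-wise discrepancy between the two result sets
max_abs_diff = np.abs(triton_result - local_result).max()
print(f'Largest absolute difference: {max_abs_diff:.2e}')

# Raises AssertionError if any element differs by more than the assumed tolerance
np.testing.assert_allclose(triton_result, local_result, rtol=0, atol=1e-6)
```

A tolerance-based check like this avoids false alarms from rounding while still catching real mismatches, such as a wrong model version or a misordered feature column.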
