
Deploying NVIDIA Triton Server on GKE

This repository compiles prescriptive guidance and a reference architecture for deploying NVIDIA Triton Inference Server on Google Kubernetes Engine (GKE).

Environment setup

This section outlines the steps to configure the Google Cloud environment required to run the workflow demonstrated in this repo:

(Figure: solution architecture)

Environment Requirements

  • All services should be provisioned in the same project and the same compute region
  • NVIDIA Triton Inference Server is deployed to a dedicated GPU node pool on a GKE cluster
  • Anthos Service Mesh is used to manage, observe and secure communication to Triton Inference Server
  • All external traffic to Triton is routed through the Istio Ingress Gateway, enabling fine-grained traffic management and progressive deployments (a minimal routing sketch follows this list)
  • Managed Prometheus is used to monitor the Triton Inference Server pods
  • A Cloud Storage bucket, located in the same region as the GKE cluster, serves as the model repository hosted by the Triton server
  • A Docker repository in Google Artifact Registry manages the container images required to run the steps of the workflow
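
To make the routing concrete, below is a minimal sketch of the kind of Istio VirtualService that exposes Triton through the Ingress Gateway. It is illustrative only: the resource names (triton-vs, triton-gateway, triton-server) are assumptions, and the actual manifests are created by this repo's provisioning steps.

kubectl apply -n $TRITON_NAMESPACE -f - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: triton-vs                 # hypothetical name
spec:
  hosts:
  - "*"
  gateways:
  - triton-gateway                # hypothetical Gateway bound to the Ingress Gateway
  http:
  - route:
    - destination:
        host: triton-server       # hypothetical Service fronting the Triton pods
        port:
          number: 8000            # Triton's HTTP/REST port
EOF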

Google Cloud Build jobs running Terraform are used to provision the environment. The setup proceeds as follows:

  • Enable the required APIs (a representative example follows this list)
  • Run Terraform to provision the required resources
  • Deploy Ingress Gateway
  • Deploy NVIDIA GPU drivers
  • Configure and deploy Triton Inference Server
  • Run health check to validate the Triton deployment
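
The first step above enables the Google Cloud APIs that the environment depends on. For reference, a representative subset can also be enabled manually from Cloud Shell; this list is illustrative, and the authoritative set is defined in this repo's setup configuration:

gcloud services enable \
  container.googleapis.com \
  compute.googleapis.com \
  cloudbuild.googleapis.com \
  artifactregistry.googleapis.com \
  storage.googleapis.com \
  mesh.googleapis.com \
  monitoring.googleapis.com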

A few things to note:

  1. You need to be a project owner to set up the environment.
  2. You will be using Cloud Shell to start and monitor the setup process.

Click on the link below to navigate to Cloud Shell and clone the repo.

Open in Cloud Shell

To set up the environment, execute the following steps.

Provision infrastructure

Select a Google Cloud project

In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.

NOTE: You need to be a project owner in order to set up the environment.

Clone the repo and grant permissions

  • Clone the GitHub repo. Skip this step if you launched through the Cloud Shell link above.
git clone https://github.com/jarokaz/triton-on-gke-sandbox
  • Set your project:
export PROJECT_ID=<YOUR_PROJECT_ID>
gcloud config set project $PROJECT_ID
  • Retrieve the email for your project's Cloud Build service account:
CLOUDBUILD_SA="$(gcloud projects describe $PROJECT_ID \
    --format 'value(projectNumber)')@cloudbuild.gserviceaccount.com"
  • Grant the required access to your Cloud Build service account:
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member serviceAccount:$CLOUDBUILD_SA --role roles/owner
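
Optionally, verify that the binding took effect. This check is illustrative; it simply prints the matching member if the roles/owner binding is present:

gcloud projects get-iam-policy $PROJECT_ID \
    --flatten="bindings[].members" \
    --filter="bindings.role:roles/owner AND bindings.members:$CLOUDBUILD_SA" \
    --format="value(bindings.members)"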

Run environment provisioning job

Environment provisioning is done with a Cloud Build job that runs Terraform scripts and the environment setup steps. The Terraform configuration supports a number of configurable inputs; refer to /env-setup/variables.tf for the full list and the default settings. You only need to set a small number of required parameters. Set the following environment variables to reflect your environment:

  • PROJECT_ID - your project ID
  • REGION - the region for the GKE cluster and network
  • ZONE - the zone for your GKE cluster
  • NETWORK_NAME - the name for the network
  • SUBNET_NAME - the name for the subnet
  • GCS_BUCKET_NAME - the name of the model repository GCS bucket
  • GKE_CLUSTER_NAME - the name of your cluster
  • TRITON_SA_NAME - the name of the service account used as Triton's workload identity
  • TRITON_NAMESPACE - the name of the namespace where the solution's components are deployed
  • MACHINE_TYPE - The machine type for the Triton GPU node pool (default: n1-standard-4)
  • ACCELERATOR_TYPE - Type of accelerator (GPUs) for the Triton node pool (default: nvidia-tesla-t4)
  • ACCELERATOR_COUNT - Number of accelerator(s) (GPUs) for the Triton node pool (default: 1)
export PROJECT_ID=jk-mlops-dev
export REGION=us-central1
export ZONE=us-central1-a
export NETWORK_NAME=jk-gke-network
export SUBNET_NAME=jk-gke-subnet
export GCS_BUCKET_NAME=jk-triton-repository
export GKE_CLUSTER_NAME=jk-ft-gke
export TRITON_SA_NAME=triton-sa
export TRITON_NAMESPACE=triton
export MACHINE_TYPE=n1-standard-4
export ACCELERATOR_TYPE=nvidia-tesla-t4
export ACCELERATOR_COUNT=1

By default, the Terraform configuration uses Cloud Storage for the Terraform state. Set the following environment variables to the GCS location for the state.

export TF_STATE_BUCKET=jk-mlops-dev-tf-state
export TF_STATE_PREFIX=jax-to-ft-demo 

Create a Cloud Storage bucket to store the Terraform state:

gcloud storage buckets create gs://$TF_STATE_BUCKET --location=$REGION
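
Optionally, enable object versioning on the state bucket so earlier Terraform states remain recoverable. This is a common precaution rather than something the setup requires:

gcloud storage buckets update gs://$TF_STATE_BUCKET --versioning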

Start provisioning by submitting a Cloud Build job that runs Terraform to provision the resources, deploys Triton Inference Server, and finalizes the setup:

gcloud builds submit \
  --region $REGION \
  --config cloudbuild.provision.yaml \
  --substitutions _TF_STATE_BUCKET=$TF_STATE_BUCKET,_TF_STATE_PREFIX=$TF_STATE_PREFIX,_REGION=$REGION,_ZONE=$ZONE,_NETWORK_NAME=$NETWORK_NAME,_SUBNET_NAME=$SUBNET_NAME,_GCS_BUCKET_NAME=$GCS_BUCKET_NAME,_GKE_CLUSTER_NAME=$GKE_CLUSTER_NAME,_TRITON_SA_NAME=$TRITON_SA_NAME,_TRITON_NAMESPACE=$TRITON_NAMESPACE,_MACHINE_TYPE=$MACHINE_TYPE,_ACCELERATOR_TYPE=$ACCELERATOR_TYPE,_ACCELERATOR_COUNT=$ACCELERATOR_COUNT \
  --timeout "2h" \
  --machine-type=e2-highcpu-32 \
  --quiet
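
The job can take a while to complete. Besides the console link, you can follow progress from the command line; the snippet below assumes the job you just submitted is the most recent build in the region:

# Stream logs for the most recent build in the region.
BUILD_ID=$(gcloud builds list --region $REGION --limit 1 --format 'value(id)')
gcloud builds log $BUILD_ID --region $REGION --stream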

Navigate to the Cloud Build logs using the link displayed in Cloud Shell, or go to the Cloud Build page in the Cloud Console. You should see a page similar to the following when the environment provisioning job completes successfully:

(Figure: Cloud Build page showing the completed provisioning job)

Invoking the sample model on Triton

You can now invoke the sample model using the NVIDIA Triton Inference Server SDK container image.

Start by configuring access to the cluster.

gcloud container clusters get-credentials ${GKE_CLUSTER_NAME} --project ${PROJECT_ID} --zone ${ZONE} 
kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user "$(gcloud config get-value account)"
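
Before invoking the model, you can confirm that the Triton deployment is up. Exact pod and service names depend on what the setup created:

kubectl get pods,services -n $TRITON_NAMESPACE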

Get the Istio Ingress Gateway IP address used to access the Triton server:

ISTIO_GATEWAY_IP_ADDRESS=$(kubectl get services -n $TRITON_NAMESPACE \
   -o=jsonpath='{.items[?(@.metadata.name=="istio-ingressgateway")].status.loadBalancer.ingress[0].ip}')
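
As a quick smoke test, probe Triton's readiness endpoint through the gateway. This assumes the Ingress Gateway forwards plain HTTP on port 80 to Triton's HTTP port; a ready server returns HTTP 200:

# Expect "200" when the server is ready.
curl -s -o /dev/null -w '%{http_code}\n' http://$ISTIO_GATEWAY_IP_ADDRESS/v2/health/ready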

Run the Triton SDK client container locally

docker run -it --rm --net=host  \
-e ISTIO_GATEWAY_IP_ADDRESS=${ISTIO_GATEWAY_IP_ADDRESS} \
nvcr.io/nvidia/tritonserver:22.01-py3-sdk

After the container starts, execute the following command from the container's command line:

/workspace/install/bin/image_client -u  $ISTIO_GATEWAY_IP_ADDRESS -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
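
The SDK image also ships perf_analyzer, which can give a rough sense of serving latency and throughput. The invocation below is illustrative, using default settings against the same gateway address:

/workspace/install/bin/perf_analyzer -u $ISTIO_GATEWAY_IP_ADDRESS -m densenet_onnx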

Clean up

To clean up the environment, run a Cloud Build job that uses Terraform to destroy the provisioned resources:

gcloud builds submit \
  --region $REGION \
  --config cloudbuild.destroy.yaml \
  --substitutions _TF_STATE_BUCKET=$TF_STATE_BUCKET,_TF_STATE_PREFIX=$TF_STATE_PREFIX,_REGION=$REGION,_ZONE=$ZONE,_NETWORK_NAME=$NETWORK_NAME,_SUBNET_NAME=$SUBNET_NAME,_GCS_BUCKET_NAME=$GCS_BUCKET_NAME,_GKE_CLUSTER_NAME=$GKE_CLUSTER_NAME,_TRITON_SA_NAME=$TRITON_SA_NAME,_TRITON_NAMESPACE=$TRITON_NAMESPACE \
  --timeout "2h" \
  --machine-type=e2-highcpu-32 \
  --quiet
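
When the destroy job completes, you can spot-check that the main resources are gone; for example, the cluster should no longer be listed:

gcloud container clusters list --filter="name=$GKE_CLUSTER_NAME"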
