This repository demonstrates how to use NVIDIA Triton to deploy and serve ML models. It provides a set of services packaged in Docker containers that work together to serve a model capable of distinguishing between bees and ants.
This is the companion repository to my blog post: *Beyond FastAPI: Using Nvidia Triton for serving ML models*.
Before getting started, ensure that you have the following prerequisites:
- Docker: You need to have Docker installed on your system to run the services. You can download and install Docker from the official Docker website: https://www.docker.com/get-docker
1. `triton`

   The `triton` service runs the NVIDIA Triton Inference Server and serves the ML model. It handles inference requests and returns predictions based on the input data.
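   Triton loads models from a model repository with a fixed directory layout. As a rough sketch (the model name `bee_vs_ant`, tensor names, batch size, and input shape below are illustrative assumptions; the repository's own `triton` folder is the reference), a TorchScript model repository looks like:

   ```
   model_repository/
   └── bee_vs_ant/            # model name (assumed)
       ├── config.pbtxt       # model configuration
       └── 1/                 # version directory
           └── model.pt       # TorchScript model file
   ```

   with a `config.pbtxt` along these lines:

   ```
   name: "bee_vs_ant"
   platform: "pytorch_libtorch"
   max_batch_size: 8           # assumed
   input [
     {
       name: "INPUT__0"        # libtorch backend naming convention
       data_type: TYPE_FP32
       dims: [ 3, 224, 224 ]   # assumed input size
     }
   ]
   output [
     {
       name: "OUTPUT__0"
       data_type: TYPE_FP32
       dims: [ 2 ]             # two classes: bee / ant
     }
   ]
   ```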
2. `fastapi`

   The `fastapi` service is a Python backend that communicates with the `triton` service using gRPC. It acts as an intermediary between the client and the `triton` service: clients send REST API requests to this service to predict whether an image contains a bee or an ant.
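   As a minimal sketch of the kind of gRPC call the `fastapi` service makes, using the official `tritonclient` package (the model and tensor names follow the assumptions above; `triton` is the service's Compose hostname, and 8001 is Triton's default gRPC port):

   ```python
   import numpy as np
   import tritonclient.grpc as grpcclient

   # "triton" resolves to the triton service on the Compose network
   client = grpcclient.InferenceServerClient(url="triton:8001")

   # A preprocessed image batch; shape and dtype must match config.pbtxt
   image = np.random.rand(1, 3, 224, 224).astype(np.float32)

   inputs = [grpcclient.InferInput("INPUT__0", list(image.shape), "FP32")]
   inputs[0].set_data_from_numpy(image)
   outputs = [grpcclient.InferRequestedOutput("OUTPUT__0")]

   result = client.infer(model_name="bee_vs_ant", inputs=inputs, outputs=outputs)
   scores = result.as_numpy("OUTPUT__0")  # e.g. scores for [bee, ant]
   print(scores)
   ```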
3. `gradio`

   The `gradio` service provides a user-friendly interface built with Gradio, which allows users to perform ML inference by uploading images. This interface sends API requests to the `fastapi` service, which in turn communicates with the `triton` server to obtain the predictions.
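   The core of such an interface fits in a few lines. This is a hypothetical sketch rather than the repository's actual code: the endpoint path `/predict` and the `fastapi` port 8000 are assumptions (only the Gradio port 8080 comes from this README):

   ```python
   import gradio as gr
   import requests

   # Hypothetical endpoint; check the fastapi service for the real path and port
   FASTAPI_URL = "http://fastapi:8000/predict"

   def classify(image_path):
       # Forward the uploaded image to the fastapi service; return its JSON prediction
       with open(image_path, "rb") as f:
           response = requests.post(FASTAPI_URL, files={"file": f})
       response.raise_for_status()
       return response.json()

   demo = gr.Interface(
       fn=classify,
       inputs=gr.Image(type="filepath"),
       outputs="json",
       title="Bee vs Ant",
   )

   demo.launch(server_name="0.0.0.0", server_port=8080)
   ```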
To get started, follow these steps:
- Clone this repository:

  ```bash
  git clone https://github.com/jayeshmahapatra/triton-fastapi-docker.git
  ```

- Navigate to the `triton-fastapi-docker` directory:

  ```bash
  cd triton-fastapi-docker
  ```

- Build and start the services using Docker Compose:

  ```bash
  docker-compose up --build
  ```

  This command will start all three services: `triton`, `fastapi`, and `gradio`.

- Once the services are up and running, you can access the Gradio interface by visiting `localhost:8080` in your web browser. The interface allows you to upload images and receive predictions on whether they contain bees or ants.

- (Optional) Using GPU: if you want to deploy using a GPU, uncomment lines 14-20 in `compose.yaml` so that the `triton` service has access to the GPU (a sketch of what such a block looks like follows this list).
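GPU access in a Compose file typically uses Docker Compose's device-reservation syntax. A representative sketch, not necessarily the exact contents of lines 14-20 in this repo's `compose.yaml`:

```yaml
services:
  triton:
    # ...
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1              # or: count: all
              capabilities: [gpu]
```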
The `notebooks` directory contains a Jupyter notebook that guides you through creating the `bee_vs_ant` TorchScript model, which distinguishes between bees and ants. It was used to create the model served by the `triton` service. If you want to train your own model, you can use the notebook as a reference.
The `data` directory can be used to keep training data when using the notebooks to train.

For exact instructions, refer to the README in the `notebooks` folder.
Contributions are welcome! If you have any suggestions, bug fixes, or improvements, feel free to open an issue or submit a pull request.
This repository is licensed under the MIT License, except for the contents of the `notebooks` folder, which are licensed under the BSD 3-Clause License. See the LICENSE file for more information.