# FastAPI Server for Huggingface Inference Endpoints

A custom FastAPI server packaged as a Docker image for deployment on Hugging Face Inference Endpoints. This repository serves as a minimal working example of deploying a custom language model on Hugging Face Inference Endpoints using Docker and FastAPI. The server exposes a `/health` endpoint for health checks and a `/predict` endpoint for next-token predictions.
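As a rough sketch, the two endpoints might look like the following. The request and response field names (`inputs`, `next_token`) are illustrative assumptions, not necessarily the schema this repository uses:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictRequest(BaseModel):
    inputs: str  # hypothetical field name for the prompt text

@app.get("/health")
def health():
    # Inference Endpoints polls a health route to decide when the
    # container is ready to receive traffic.
    return {"status": "ok"}

@app.post("/predict")
def predict(request: PredictRequest):
    # The real server would run the loaded model here and return the
    # predicted next token (see the loader sketch below).
    return {"next_token": "..."}
```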

To try the server quickly, you can use a small language model such as `cerebras/Cerebras-GPT-111M`.

The loader assumes that the model and tokenizer files are available in the `/repository` directory, which is mounted by HF Inference Endpoints (https://huggingface.co/docs/inference-endpoints/en/guides/custom_container). During deployment, HF makes the files of the repository you select under Model Repository available at `/repository`.
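A minimal loader along these lines, assuming a standard `transformers` causal language model (the actual loader in this repository may differ):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "/repository"  # mounted by HF Inference Endpoints at deploy time

# Both calls read the files HF copies from the selected Model Repository.
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)
model.eval()

def predict_next_token(text: str) -> str:
    # Greedy next-token prediction: take the argmax over the logits
    # at the final position of the sequence.
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    next_id = int(logits[0, -1].argmax())
    return tokenizer.decode([next_id])
```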

## Docker

You can build the Docker image by running the following in the root of the repo:

```
docker build . -t server
```
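To test the image locally, you can mount a directory containing the model files at `/repository`, mimicking what HF does at deployment. The path and port below are assumptions; adjust the port to whatever the server actually listens on:

```
docker run --rm -p 8000:8000 -v /path/to/local/model:/repository server
```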

A prebuilt image is also available on Docker Hub for your convenience. You can pull it by running:

```
docker pull schrammsm/fastapi_hf_inference:1.0.0
```
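Once a container is running locally, you can exercise both endpoints with `curl`. The `/predict` payload shape here is the same illustrative assumption used in the sketches above:

```
curl http://localhost:8000/health
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "The quick brown"}'
```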