The microservices-base project is a base template for creating a microservices-based architecture. Services communicate with each other using protobuf. The services use Node.js by default, but the template is flexible enough that they can be replaced with services written in other languages such as Go or Python.
- Docker
- Kubernetes
- Skaffold (for simplifying development by automating building, pushing and deploying)
- Minikube (for creating a local k8s cluster)
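A quick way to confirm these tools are installed is to look each one up on the PATH. The following is a minimal sketch, not part of this repository: the executable names (`docker`, `kubectl`, `skaffold`, `minikube`) are assumed defaults, and the helper `missing_tools` is hypothetical.

```python
import shutil

# Assumed executable names for the tools listed above.
REQUIRED_TOOLS = ["docker", "kubectl", "skaffold", "minikube"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools were found on the PATH.")
```

An empty result only means the executables exist; it does not check versions or whether the Docker daemon is running.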
This is the preferred local development setup, especially when using a Mac or any virtualized Docker environment, because running llama-service directly on the host speeds up local AI inference significantly. Docker runs every service except llama-service; those services instead point to the llama-service running on your localhost:8080.
Make sure to install llama.cpp first: `brew install llama.cpp`
- Clone the repository
- Download the AI model into the `src/llama-service/models` folder: `curl -L "https://huggingface.co/unsloth/Llama-3.1-8B-Instruct-GGUF/resolve/main/Llama-3.1-8B-Instruct-Q4_K_M.gguf" -o src/llama-service/models/model.gguf`
- Start the llama service locally: `llama-server -m src/llama-service/models/model.gguf`
- Start the services: `docker compose -f docker-compose.llama-local.yml up`
- Open the frontend: http://localhost:3000
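Because the Docker services in this setup depend on a llama-service running on localhost:8080, it can help to confirm that something is listening on that port before starting them. The sketch below only performs a plain TCP connection check; the helper name `port_is_open` is hypothetical, and it does not verify that the listener is actually llama-server.

```python
import socket


def port_is_open(host="localhost", port=8080, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is accepting connections on localhost:8080.")
    else:
        print("Nothing is listening on localhost:8080; start llama-server first.")
```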
When developing locally, we recommend using docker-compose. It is easier to set up and supports running one-off containers, which helps with tasks such as adding npm packages that skaffold + minikube does not support.
- Clone the repository
- Download the AI model into the `src/llama-service/models` folder: `curl -L "https://huggingface.co/unsloth/Llama-3.1-8B-Instruct-GGUF/resolve/main/Llama-3.1-8B-Instruct-Q4_K_M.gguf" -o src/llama-service/models/model.gguf`
- Start the services: `docker compose up`
- Open the frontend: http://localhost:3000
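If the download was interrupted or returned an error page, the saved file will not be a valid model. One cheap sanity check, sketched here under the assumption that the model was saved to `src/llama-service/models/model.gguf`, is to confirm the file starts with the four-byte GGUF magic. The helper `looks_like_gguf` is hypothetical and does not validate the rest of the file.

```python
from pathlib import Path

# GGUF files begin with these four ASCII bytes.
GGUF_MAGIC = b"GGUF"


def looks_like_gguf(path):
    """Return True if `path` is a file whose first four bytes are the GGUF magic."""
    file_path = Path(path)
    if not file_path.is_file():
        return False
    with file_path.open("rb") as handle:
        return handle.read(4) == GGUF_MAGIC


if __name__ == "__main__":
    # Assumed download location, taken from the curl command above.
    model_path = "src/llama-service/models/model.gguf"
    if looks_like_gguf(model_path):
        print(model_path + " starts with the GGUF magic bytes.")
    else:
        print(model_path + " is missing or is not a GGUF file.")
```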
Only use local k8s (via minikube) when you want to test a local k8s cluster that mimics production. If you're planning to do local development, please use docker-compose. Local k8s is more of a sanity check to make sure the k8s configuration works locally before deploying to production.
Local k8s requires that you download the model and serve it locally before starting the cluster. This speeds up local builds, since llama-service needs to download the model every time before starting.
- Clone the repository
- Download the AI model once: `curl -L "https://huggingface.co/unsloth/Llama-3.1-8B-Instruct-GGUF/resolve/main/Llama-3.1-8B-Instruct-Q4_K_M.gguf" -o src/llama-service/models/model.gguf`
- Start a local python server to serve the AI model: `cd src/llama-service/models && python3 -m http.server 8000`
- Start the k8s cluster: `minikube start`
- Run the development environment: `skaffold dev`
- In another terminal, open the tunnel: `minikube tunnel`
- Open the frontend: http://localhost:3000
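Before starting the cluster, it may be worth checking that the local python server is actually serving the model. The sketch below sends a HEAD request and reads the reported size. The URL `http://localhost:8000/model.gguf` is an assumption derived from the server command and file name used in this README, and the helper `served_size` is hypothetical.

```python
import urllib.error
import urllib.request


def served_size(url="http://localhost:8000/model.gguf", timeout=5.0):
    """Return the Content-Length reported for `url`.

    Returns None if the server is unreachable, responds with an error,
    or does not report a length.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            length = response.headers.get("Content-Length")
            return int(length) if length is not None else None
    except (urllib.error.URLError, OSError):
        return None


if __name__ == "__main__":
    size = served_size()
    if size is None:
        print("The model is not being served on localhost:8000.")
    else:
        print("The model is being served (" + str(size) + " bytes).")
```

A HEAD request is used so the multi-gigabyte model is not downloaded just to check that it is available.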