External inference service for Unison, providing LLM integration with multiple providers.
- Multi-provider support: OpenAI, Ollama (local), Azure OpenAI
- Intent-driven: Handles `inference.request` and `inference.response` intents
- Provider abstraction: Easy switching between providers via configuration
- Cost-aware: Designed to work with Policy service for cost/risk checks
- Observability: Structured JSON logging and Prometheus metrics
### OpenAI

- Environment: `OPENAI_API_KEY`, `OPENAI_BASE_URL` (optional)
- Models: `gpt-4`, `gpt-3.5-turbo`, etc.
### Ollama

- Environment: `OLLAMA_BASE_URL` (default: `http://ollama:11434`)
- Models: `llama3.2`, `mistral`, `codellama`, etc.
- No API keys required
### Azure OpenAI

- Environment: `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_API_VERSION`
- Models: Your Azure deployment names
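The provider abstraction can be sketched roughly as follows; the class and function names here are illustrative assumptions, not the service's actual API, and the completion call is stubbed out rather than hitting a real backend:

```python
from abc import ABC, abstractmethod


class Provider(ABC):
    """Common interface each backend implements (names are illustrative)."""

    @abstractmethod
    def complete(self, prompt: str, model: str, max_tokens: int,
                 temperature: float) -> str:
        """Return the model's completion for the prompt."""


class OllamaProvider(Provider):
    def __init__(self, base_url: str = "http://ollama:11434"):
        self.base_url = base_url

    def complete(self, prompt, model, max_tokens, temperature):
        # A real implementation would POST to the Ollama HTTP API at
        # self.base_url; stubbed here to keep the sketch self-contained.
        return f"[{model}@{self.base_url}] completion for: {prompt[:30]}"


# Registry keyed by the configuration names from the table below.
PROVIDERS = {"ollama": OllamaProvider}


def get_provider(name: str) -> Provider:
    """Resolve a provider by its configuration name."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown provider: {name}")
```

Switching providers then reduces to changing one configuration value, since every backend satisfies the same `complete` signature.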
| Variable | Default | Description |
|---|---|---|
| `UNISON_INFERENCE_PROVIDER` | `ollama` | Default provider (`openai`/`ollama`/`azure`) |
| `UNISON_INFERENCE_MODEL` | `llama3.2` | Default model name |
| `OPENAI_API_KEY` | - | OpenAI API key |
| `OPENAI_BASE_URL` | `https://api.openai.com/v1` | OpenAI base URL |
| `OLLAMA_BASE_URL` | `http://ollama:11434` | Ollama API URL |
| `AZURE_OPENAI_ENDPOINT` | - | Azure OpenAI endpoint |
| `AZURE_OPENAI_API_KEY` | - | Azure OpenAI API key |
| `AZURE_OPENAI_API_VERSION` | `2024-02-15-preview` | Azure API version |
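The table above maps to environment lookups along these lines; the variable names and defaults come from the table, while the `load_config` helper itself is an illustrative sketch:

```python
import os


def load_config() -> dict:
    """Resolve inference settings from the environment, falling back to
    the documented defaults. The helper is illustrative, not the
    service's actual loader."""
    return {
        "provider": os.environ.get("UNISON_INFERENCE_PROVIDER", "ollama"),
        "model": os.environ.get("UNISON_INFERENCE_MODEL", "llama3.2"),
        "openai_api_key": os.environ.get("OPENAI_API_KEY"),
        "openai_base_url": os.environ.get(
            "OPENAI_BASE_URL", "https://api.openai.com/v1"),
        "ollama_base_url": os.environ.get(
            "OLLAMA_BASE_URL", "http://ollama:11434"),
        "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
        "azure_api_key": os.environ.get("AZURE_OPENAI_API_KEY"),
        "azure_api_version": os.environ.get(
            "AZURE_OPENAI_API_VERSION", "2024-02-15-preview"),
    }
```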
Handles inference requests as intents.
Request:

```json
{
  "intent": "summarize.doc",
  "prompt": "Summarize this document...",
  "provider": "ollama",
  "model": "llama3.2",
  "max_tokens": 1000,
  "temperature": 0.7
}
```

Response:
```json
{
  "ok": true,
  "intent": "summarize.doc",
  "provider": "ollama",
  "model": "llama3.2",
  "result": "Document summary...",
  "event_id": "uuid",
  "timestamp": 1698673200
}
```

Service health check.
Readiness check including provider availability.
Prometheus metrics.
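A minimal client for the request/response contract above might look like this. The `/intent` route and `localhost` host are assumptions (only port 8087 appears in the Docker run command); the payload fields match the documented contract:

```python
import json
import urllib.request

# Hypothetical endpoint; adjust the host and route to your deployment.
INFERENCE_URL = "http://localhost:8087/intent"


def build_request(intent: str, prompt: str, provider: str = "ollama",
                  model: str = "llama3.2", max_tokens: int = 1000,
                  temperature: float = 0.7) -> dict:
    """Build a request body matching the documented intent contract."""
    return {
        "intent": intent,
        "prompt": prompt,
        "provider": provider,
        "model": model,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def post_intent(body: dict, url: str = INFERENCE_URL,
                timeout: float = 30.0) -> dict:
    """POST the JSON body and return the decoded response dict."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    body = build_request("summarize.doc", "Summarize this document...")
    result = post_intent(body)
    if result.get("ok"):
        print(result["result"])
```

On success the response carries `ok: true` plus the generated `result`; a caller should check `ok` before using the payload.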
```bash
# Install dependencies
pip install -r requirements.txt

# Run locally
python src/server.py

# Run with Docker
docker build -t unison-inference .
docker run -p 8087:8087 unison-inference
```

The inference service integrates with:
- Orchestrator: Registers inference intents and routes requests
- Policy: Cost/risk evaluation for external API calls
- Context: Stores inference history and results
- Storage: Persists prompts and responses
- `summarize.doc`: Summarize documents or text
- `analyze.code`: Analyze or generate code
- `translate.text`: Translate between languages
- `generate.idea`: Brainstorm ideas or suggestions
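These intents could be wired to prompt templates along the following lines. Only the intent names come from the list above; the templates and the `render_prompt` helper are hypothetical:

```python
# Hypothetical prompt prefixes per supported intent; the real service
# may template prompts differently.
INTENT_PROMPTS = {
    "summarize.doc": "Summarize the following text:\n\n",
    "analyze.code": "Analyze the following code:\n\n",
    "translate.text": "Translate the following text:\n\n",
    "generate.idea": "Brainstorm ideas about:\n\n",
}


def render_prompt(intent: str, payload: str) -> str:
    """Prefix the payload with the template for its intent."""
    try:
        return INTENT_PROMPTS[intent] + payload
    except KeyError:
        raise ValueError(f"unsupported intent: {intent}")
```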