Demo code for the Open Responses API using Hugging Face Inference Providers and local models via Ollama.
Video Tutorial: Open Responses API Overview
Install uv if you don't already have it:

```bash
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or with Homebrew
brew install uv
```

Set up the project:

```bash
cd demo
uv sync
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

Create your environment file and edit it with your settings:

```bash
cp .env.example .env
```
```bash
# Required for HF demo
HF_TOKEN=hf_your_token_here

# Optional: Ollama settings (defaults shown)
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama3.2
```

Get your HF token at https://huggingface.co/settings/tokens (enable the "Make calls to Inference Providers" permission).
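For reference, here is a minimal sketch of how the demos might read these settings; it assumes they use the python-dotenv package, so check the scripts for the actual loading code:

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # reads key=value pairs from .env into the environment

HF_TOKEN = os.getenv("HF_TOKEN")  # required for the HF demo
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3.2")

if HF_TOKEN is None:
    raise SystemExit("HF_TOKEN is not set; add it to .env")
```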
Hugging Face hosted models:

```bash
python open_responses_demo.py
```

Local models with Ollama:

```bash
python ollama_demo.py
```

Or run with uv directly:

```bash
uv run python open_responses_demo.py
uv run python ollama_demo.py
```

`open_responses_demo.py` uses the Open Responses API with Hugging Face's hosted models:
| Demo | Description |
|---|---|
| Basic Call | Simple request/response |
| Streaming | Event-based streaming with semantic events |
| Tool Calling | Let the model call functions |
| Reasoning | View raw reasoning traces |
| Multi-turn | Continue conversations |
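As a rough sketch of what the basic and streaming demos do, assuming (per the HF Responses API docs linked at the end) that the router exposes the Responses endpoint at https://router.huggingface.co/v1 and that the openai SDK is used as the client:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",  # HF Inference Providers router
    api_key=os.getenv("HF_TOKEN"),
)

# Basic call: simple request/response
response = client.responses.create(
    model="moonshotai/Kimi-K2-Instruct-0905",
    input="Say hello in one sentence.",
)
print(response.output_text)

# Streaming: iterate over semantic events instead of raw chunks
stream = client.responses.create(
    model="moonshotai/Kimi-K2-Instruct-0905",
    input="Count to five.",
    stream=True,
)
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```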
`ollama_demo.py` tests the Open Responses API with locally running models via Ollama:
| Demo | Description |
|---|---|
| Basic Call | Simple request/response |
| Streaming | Event-based streaming with semantic events |
| Tool Calling | Let the model call functions |
| Reasoning | View raw reasoning traces |
| Multi-turn | Continue conversations |
Note: If Ollama doesn't yet support the Open Responses API, the script shows the API format for when support is added. Ollama is part of the Open Responses initiative.
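A hedged sketch of the request format the script targets; it assumes Ollama serves an OpenAI-compatible endpoint at `OLLAMA_HOST/v1`, and whether `responses.create` actually succeeds depends on Ollama's Open Responses support (see the note above):

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url=os.getenv("OLLAMA_HOST", "http://localhost:11434") + "/v1",
    api_key="ollama",  # Ollama ignores the key, but the SDK requires one
)

response = client.responses.create(
    model=os.getenv("OLLAMA_MODEL", "llama3.2"),
    input="Why is the sky blue?",
)
print(response.output_text)
```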
`ollama_tools_reasoning.py` is a focused demo of tool calling and reasoning with gpt-oss:20b:
| Demo | Description |
|---|---|
| Basic Tool Call | Single tool calling example |
| Multiple Tools | Model chooses from several tools |
| Tool Loop | Complete request → tool → response cycle |
| Reasoning Basic | Step-by-step reasoning |
| Reasoning Streaming | Stream reasoning tokens live |
| Tools + Reasoning | Combine both capabilities |
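To illustrate the Tool Loop row above, here is a minimal sketch of a complete request → tool → response cycle in Responses API format. The `get_weather` tool and its stub implementation are hypothetical, and the endpoint is assumed to be Ollama's OpenAI-compatible one:

```python
import json

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# Hypothetical tool the model can call (Responses API function-tool format)
tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def get_weather(city: str) -> str:
    return f"Sunny and 22°C in {city}"  # stub; a real tool would hit an API

input_items = [{"role": "user", "content": "What's the weather in Paris?"}]
response = client.responses.create(model="gpt-oss:20b", input=input_items, tools=tools)

# Execute any function calls and feed the results back for the final answer
for item in response.output:
    if item.type == "function_call":
        result = get_weather(**json.loads(item.arguments))
        input_items += [item, {
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": result,
        }]

final = client.responses.create(model="gpt-oss:20b", input=input_items, tools=tools)
print(final.output_text)
```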
Download from https://ollama.com/download or:

```bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh
```

Pull a model:

```bash
ollama pull llama3.2
```

Other good options:

```bash
ollama pull mistral
ollama pull qwen2.5
ollama pull llama3.3
```

Make sure Ollama is running. If not, run:
```bash
ollama serve
```

Then run the demo:

```bash
python ollama_demo.py
```

Specify a provider using `model:provider` syntax:

```python
# Default routing
model = "moonshotai/Kimi-K2-Instruct-0905"

# Specific provider
model = "moonshotai/Kimi-K2-Instruct-0905:groq"
model = "Qwen/Qwen2.5-72B-Instruct:together"
model = "meta-llama/Llama-3.3-70B-Instruct:fireworks"
```

Browse available models: https://huggingface.co/inference/models
To use a different local model:

```bash
# Set in .env or use defaults
OLLAMA_MODEL=llama3.2
OLLAMA_MODEL=mistral
OLLAMA_MODEL=qwen2.5:14b
```

List installed models with `ollama list`.
```
demo/
├── .env.example              # Environment template
├── .env                      # Your config (gitignored)
├── open_responses_demo.py    # HF hosted demo (Open Responses API)
├── ollama_demo.py            # Ollama demo (Open Responses API)
├── ollama_tools_reasoning.py # Ollama tools & reasoning deep dive
├── pyproject.toml            # UV project config
├── requirements.txt          # Pip fallback
└── README.md
```
- Open Responses Spec: https://openresponses.org
- HF Responses API Docs: https://huggingface.co/docs/inference-providers/guides/responses-api
- HF Blog Post: https://huggingface.co/blog/open-responses
- Ollama: https://ollama.com
- GitHub: https://github.com/openresponses/openresponses