
Open Responses API Demo

Demo code for the Open Responses API using Hugging Face Inference Providers and local models via Ollama.

Video Tutorial: Open Responses API Overview

Quick Setup (using UV)

1. Install UV (if not already installed)

# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or with Homebrew
brew install uv

2. Create venv and install dependencies

cd demo
uv sync
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

3. Configure environment variables

cp .env.example .env

Edit .env with your settings:

# Required for HF demo
HF_TOKEN=hf_your_token_here

# Optional: Ollama settings (defaults shown)
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama3.2

Get your HF token at: https://huggingface.co/settings/tokens (Enable "Make calls to Inference Providers" permission)
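
The demo scripts read these values at startup. A minimal sketch of the pattern, assuming python-dotenv is among the synced dependencies (the scripts' actual loading code may differ):

import os
from dotenv import load_dotenv

# Load variables from .env into the process environment.
load_dotenv()

hf_token = os.environ["HF_TOKEN"]                                 # required for the HF demo
ollama_host = os.getenv("OLLAMA_HOST", "http://localhost:11434")  # optional, defaults shown
ollama_model = os.getenv("OLLAMA_MODEL", "llama3.2")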

4. Run the demos

Hugging Face hosted models:

python open_responses_demo.py

Local models with Ollama:

python ollama_demo.py

Or run with UV directly:

uv run python open_responses_demo.py
uv run python ollama_demo.py

Demo Scripts

open_responses_demo.py - HF Inference Providers

Uses the Open Responses API with Hugging Face's hosted models.
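
The basic call is a single request/response round trip. A minimal sketch, assuming the standard OpenAI Python client pointed at Hugging Face's OpenAI-compatible router (the exact base URL and client wiring in the script may differ):

import os
from openai import OpenAI

# Point the OpenAI client at the HF router (assumed endpoint).
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

response = client.responses.create(
    model="moonshotai/Kimi-K2-Instruct-0905",
    input="Explain the Open Responses API in one sentence.",
)
print(response.output_text)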

Demo          Description
Basic Call    Simple request/response
Streaming     Event-based streaming with semantic events
Tool Calling  Let the model call functions
Reasoning     View raw reasoning traces
Multi-turn    Continue conversations
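
The streaming demo in the table above consumes typed semantic events rather than raw chunks, and multi-turn continuation hangs off a response ID. A hedged sketch reusing the client from the previous example (whether the router persists state for previous_response_id is an assumption):

# Streaming: iterate over semantic events instead of raw deltas.
stream = client.responses.create(
    model="moonshotai/Kimi-K2-Instruct-0905",
    input="Write a haiku about local models.",
    stream=True,
)
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
    elif event.type == "response.completed":
        response = event.response

# Multi-turn: continue the conversation by referencing the prior response.
followup = client.responses.create(
    model="moonshotai/Kimi-K2-Instruct-0905",
    previous_response_id=response.id,
    input="Now translate it to French.",
)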

ollama_demo.py - Local Models

Test the Open Responses API with locally-running models via Ollama.

Demo          Description
Basic Call    Simple request/response
Streaming     Event-based streaming with semantic events
Tool Calling  Let the model call functions
Reasoning     View raw reasoning traces
Multi-turn    Continue conversations

Note: If your Ollama version doesn't yet support the Open Responses API, the script instead shows the request format that will work once support is added. Ollama is part of the Open Responses initiative.
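
Subject to that caveat, the request shape matches the HF demo, just pointed at the local server. A sketch assuming Ollama's OpenAI-compatible endpoint at OLLAMA_HOST gains Responses support (the api_key is a placeholder; Ollama ignores it):

from openai import OpenAI

# Ollama serves an OpenAI-compatible API under /v1 on localhost:11434.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.responses.create(
    model="llama3.2",
    input="Why is the sky blue? Answer in one sentence.",
)
print(response.output_text)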

ollama_tools_reasoning.py - Tools & Reasoning Deep Dive

Focused demo on tool calling and reasoning with gpt-oss:20b.

Demo                 Description
Basic Tool Call      Single tool-calling example
Multiple Tools       Model chooses from several tools
Tool Loop            Complete request → tool → response cycle
Reasoning Basic      Step-by-step reasoning
Reasoning Streaming  Stream reasoning tokens live
Tools + Reasoning    Combine both capabilities
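
The Tool Loop row above is the full request → tool → response cycle. A hedged sketch of that cycle in Responses API terms (get_weather is a hypothetical local function, and Responses-style tool calling on Ollama is assumed, per the note in the previous section):

import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

def get_weather(city: str) -> dict:
    # Hypothetical stand-in for a real tool.
    return {"city": city, "forecast": "sunny", "temp_c": 21}

# Responses-style tool definitions are flat (name at the top level).
tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

response = client.responses.create(
    model="gpt-oss:20b",
    input="What's the weather in Tokyo?",
    tools=tools,
)

# Tool calls arrive as typed items in the output list.
for item in response.output:
    if item.type == "function_call":
        result = get_weather(**json.loads(item.arguments))
        # Feed the tool result back to close the loop.
        final = client.responses.create(
            model="gpt-oss:20b",
            previous_response_id=response.id,
            input=[{
                "type": "function_call_output",
                "call_id": item.call_id,
                "output": json.dumps(result),
            }],
            tools=tools,
        )
        print(final.output_text)

Reasoning traces, where the model emits them, are expected to appear as their own typed items in the same output list (and as separate event types when streaming); that is what the reasoning demos inspect.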

Ollama Setup

1. Install Ollama

Download from https://ollama.com/download or:

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

2. Pull a model

ollama pull llama3.2

Other good options:

ollama pull mistral
ollama pull qwen2.5
ollama pull llama3.3

3. Ollama starts automatically

If not, run:

ollama serve
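
A quick way to confirm the server is up before running the demo (the root endpoint replies with a short status string):

import urllib.request

# Ollama's root endpoint returns "Ollama is running" when the server is up.
with urllib.request.urlopen("http://localhost:11434") as resp:
    print(resp.read().decode())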

4. Run the local demo

python ollama_demo.py

Available Models

Hugging Face Inference Providers

Specify a provider using model:provider syntax:

# Default routing
model = "moonshotai/Kimi-K2-Instruct-0905"

# Specific provider
model = "moonshotai/Kimi-K2-Instruct-0905:groq"
model = "Qwen/Qwen2.5-72B-Instruct:together"
model = "meta-llama/Llama-3.3-70B-Instruct:fireworks"

Browse available models: https://huggingface.co/inference/models

Ollama Local Models

# Set in .env or use defaults
OLLAMA_MODEL=llama3.2
OLLAMA_MODEL=mistral
OLLAMA_MODEL=qwen2.5:14b

List installed models: ollama list


File Structure

demo/
├── .env.example              # Environment template
├── .env                      # Your config (gitignored)
├── open_responses_demo.py    # HF hosted demo (Open Responses API)
├── ollama_demo.py            # Ollama demo (Open Responses API)
├── ollama_tools_reasoning.py # Ollama tools & reasoning deep dive
├── pyproject.toml            # UV project config
├── requirements.txt          # Pip fallback
└── README.md
