
feat: support custom OpenAI-compatible endpoints#84

Merged
olearycrew merged 1 commit into pinchbench:main from kodee2k:feat/custom-base-url
Apr 6, 2026

Conversation

Contributor

@kodee2k kodee2k commented Apr 1, 2026

Adds --base-url and --api-key flags to benchmark.py so you can point PinchBench
at any OpenAI-compatible endpoint instead of only going through OpenRouter.

Useful for testing against local inference servers (vLLM, ollama, llama.cpp),
hosted providers like Together or Fireworks, or just hitting the OpenAI API directly.

What changed

scripts/benchmark.py

  • New --base-url arg for the API endpoint URL
  • New --api-key arg (falls back to $OPENAI_API_KEY if not provided)
  • Skips OpenRouter model validation when a custom base URL is set
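A minimal sketch of how the new flags could be wired up with argparse (the real benchmark.py defines more options; the parser structure here is an assumption based on the description above):

```python
import argparse
import os

def build_parser() -> argparse.ArgumentParser:
    # Sketch of the two new flags only; --api-key falls back to
    # $OPENAI_API_KEY when omitted, as described in the PR.
    parser = argparse.ArgumentParser(prog="benchmark.py")
    parser.add_argument("--model", required=True)
    parser.add_argument("--base-url", default=None,
                        help="OpenAI-compatible endpoint URL")
    parser.add_argument("--api-key",
                        default=os.environ.get("OPENAI_API_KEY"),
                        help="API key; defaults to $OPENAI_API_KEY")
    return parser

args = build_parser().parse_args(
    ["--model", "my-model", "--base-url", "http://localhost:8000/v1"]
)
# OpenRouter model validation is skipped when a custom base URL is set,
# since arbitrary endpoints can't be validated against OpenRouter's list.
skip_validation = args.base_url is not None
```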

scripts/lib_agent.py

  • ensure_agent_exists now accepts base_url and api_key keyword args
  • When a base URL is given, writes a custom provider into the bench agent's
    models.json using the OpenClaw provider config format (openai-completions api type)
  • When no base URL is given, the existing OpenRouter flow is unchanged

Usage

# local server
python benchmark.py --model my-model --base-url http://localhost:8000/v1

# hosted provider with explicit key
python benchmark.py --model meta-llama/llama-3-70b \
  --base-url https://api.together.xyz/v1 --api-key tgr_xxx

# key from env (default if --api-key is omitted)
export OPENAI_API_KEY=sk-xxx
python benchmark.py --model gpt-4o --base-url https://api.openai.com/v1

The provider gets registered as "custom" in models.json with sane defaults
(200k context window, 8192 max output tokens). These work fine for most
endpoints but can be adjusted later if needed.
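For illustration, the registered entry might look roughly like this in models.json (field names are assumptions based on the description above, not the exact OpenClaw provider schema):

```json
{
  "providers": {
    "custom": {
      "api": "openai-completions",
      "baseUrl": "http://localhost:8000/v1",
      "apiKey": "sk-xxx",
      "models": [
        { "id": "my-model", "contextWindow": 200000, "maxTokens": 8192 }
      ]
    }
  }
}
```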

No changes to the grading, upload, or reporting paths.

Add --base-url and --api-key flags to benchmark.py for targeting
any OpenAI-compatible API instead of only OpenRouter.

@ScuttleBot ScuttleBot left a comment


ScuttleBot review 🦀

Great flexibility addition. Testing against local inference servers, Together, Fireworks, or direct OpenAI is a common ask.

What's good:

  • --base-url and --api-key are the right knobs
  • Falls back to $OPENAI_API_KEY when --api-key is omitted
  • Skips OpenRouter model validation when custom base URL is set (correct — you can't validate arbitrary endpoints)
  • Provider registered as "custom" with reasonable defaults (200k context, 8192 output)

Suggestions:

  • Document the default context/output limits in the README — users testing models with different limits might be surprised
  • Consider --custom-context-length flag for edge cases? (Low priority, can be a follow-up)

Clean implementation. Merge it.

@olearycrew olearycrew merged commit 5a35ec1 into pinchbench:main Apr 6, 2026
@olearycrew
Member

@kodee2k thanks for this contribution!

