
feat: support custom OpenAI-compatible endpoints#84

Merged
olearycrew merged 1 commit into pinchbench:main from kodee2k:feat/custom-base-url
Apr 6, 2026

Conversation

Contributor

@kodee2k kodee2k commented Apr 1, 2026

Adds --base-url and --api-key flags to benchmark.py so you can point PinchBench
at any OpenAI-compatible endpoint instead of only going through OpenRouter.

Useful for testing against local inference servers (vLLM, ollama, llama.cpp),
hosted providers like Together or Fireworks, or just hitting the OpenAI API directly.

What changed

scripts/benchmark.py

  • New --base-url arg for the API endpoint URL
  • New --api-key arg (falls back to $OPENAI_API_KEY if not provided)
  • Skips OpenRouter model validation when a custom base URL is set
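A minimal sketch of how the new flags could be wired up with argparse (the real benchmark.py defines more options; the parser structure here is an assumption based on the description above):

```python
import argparse
import os

def build_parser() -> argparse.ArgumentParser:
    # Sketch of the two new flags only; --api-key falls back to
    # $OPENAI_API_KEY when omitted, as described in the PR.
    parser = argparse.ArgumentParser(prog="benchmark.py")
    parser.add_argument("--model", required=True)
    parser.add_argument("--base-url", default=None,
                        help="OpenAI-compatible endpoint URL")
    parser.add_argument("--api-key",
                        default=os.environ.get("OPENAI_API_KEY"),
                        help="API key; defaults to $OPENAI_API_KEY")
    return parser

args = build_parser().parse_args(
    ["--model", "my-model", "--base-url", "http://localhost:8000/v1"]
)
# OpenRouter model validation is skipped when a custom base URL is set,
# since arbitrary endpoints can't be validated against OpenRouter's list.
skip_validation = args.base_url is not None
```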

scripts/lib_agent.py

  • ensure_agent_exists now accepts base_url and api_key keyword args
  • When a base URL is given, writes a custom provider into the bench agent's
    models.json using the OpenClaw provider config format (openai-completions api type)
  • When no base URL is given, the existing OpenRouter flow is unchanged

Usage

# local server
python benchmark.py --model my-model --base-url http://localhost:8000/v1

# hosted provider with explicit key
python benchmark.py --model meta-llama/llama-3-70b \
  --base-url https://api.together.xyz/v1 --api-key tgr_xxx

# key from env (default if --api-key is omitted)
export OPENAI_API_KEY=sk-xxx
python benchmark.py --model gpt-4o --base-url https://api.openai.com/v1

The provider gets registered as "custom" in models.json with sane defaults
(200k context window, 8192 max output tokens). These work fine for most
endpoints but can be adjusted later if needed.
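For illustration, the registered entry might look roughly like this in models.json (field names are assumptions based on the description above, not the exact OpenClaw provider schema):

```json
{
  "providers": {
    "custom": {
      "api": "openai-completions",
      "baseUrl": "http://localhost:8000/v1",
      "apiKey": "sk-xxx",
      "models": [
        { "id": "my-model", "contextWindow": 200000, "maxTokens": 8192 }
      ]
    }
  }
}
```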

No changes to the grading, upload, or reporting paths.

Add --base-url and --api-key flags to benchmark.py for targeting
any OpenAI-compatible API instead of only OpenRouter.

@ScuttleBot ScuttleBot left a comment


ScuttleBot review 🦀

Great flexibility addition. Testing against local inference servers, Together, Fireworks, or direct OpenAI is a common ask.

What's good:

  • --base-url and --api-key are the right knobs
  • Falls back to $OPENAI_API_KEY when --api-key is omitted
  • Skips OpenRouter model validation when custom base URL is set (correct — you can't validate arbitrary endpoints)
  • Provider registered as "custom" with reasonable defaults (200k context, 8192 output)

Suggestions:

  • Document the default context/output limits in the README — users testing models with different limits might be surprised
  • Consider --custom-context-length flag for edge cases? (Low priority, can be a follow-up)

Clean implementation. Merge it.

@olearycrew olearycrew merged commit 5a35ec1 into pinchbench:main Apr 6, 2026
@olearycrew
Member

@kodee2k thanks for this contribution!

