[Feature]: Support Local AI via OpenAI-Compatible Tool Calling #1720
Description
Before submitting
- I searched existing issues and did not find a duplicate.
- I am describing a concrete problem or use case, not just a vague idea.
Area
apps/desktop
Problem or use case
Right now, T3 assumes a hosted provider (OpenAI or similar) for:
• Chat completions
• Tool/function calling
• Agent orchestration
However, many developers are now running local inference servers (especially with GPUs like RTX 6000 / 5090) using:
• OpenAI-compatible APIs
• Custom models (coding models, distilled models, etc.)
• Cost-sensitive or privacy-sensitive workloads
The issue:
• Local endpoints often support tool calling, but T3 cannot reliably plug into them
• Developers must build custom adapters or bypass T3 tooling entirely
• This breaks the otherwise clean T3 developer experience
Proposed solution
Add a “Custom OpenAI-Compatible Provider” option with:
- Configurable Base URL
Allow overriding:
```ts
{
  baseURL: "http://localhost:8000/v1",
  apiKey: "optional-or-local"
}
```
- Tool Calling Compatibility Layer
Ensure T3:
• Sends tools / functions in standard OpenAI format
• Supports:
  • tool_choice: "auto"
  • structured tool responses
• Accepts responses from:
  • vLLM
  • Ollama (via adapters like LiteLLM)
  • other OpenAI-compatible backends
- Model-Agnostic Execution
Allow specifying:
```ts
model: "local/qwen-coder" // or any arbitrary string
```
No provider validation blocking execution.
- Streaming + Tool Invocation Support
Ensure compatibility with:
• streaming responses
• partial tool calls
• multi-step agent loops
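Streaming is where most OpenAI-compatibility bugs surface: in the OpenAI streaming format, a single tool call arrives as a sequence of deltas that the client must stitch together by index before executing anything. A minimal sketch of that accumulation step (chunk shapes assumed from the OpenAI streaming format, not taken from T3's codebase):

```typescript
// Shape of the tool-call fragments carried in streamed chunks
// (choices[0].delta.tool_calls in the OpenAI streaming format).
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

// Concatenate partial name/argument fragments into complete tool calls,
// grouped by the `index` field each delta carries.
function accumulateToolCalls(deltas: ToolCallDelta[]) {
  const calls: { id: string; name: string; arguments: string }[] = [];
  for (const d of deltas) {
    const call = (calls[d.index] ??= { id: "", name: "", arguments: "" });
    if (d.id) call.id = d.id;
    if (d.function?.name) call.name += d.function.name;
    if (d.function?.arguments) call.arguments += d.function.arguments;
  }
  return calls;
}

// Three chunks that together form one complete tool call.
const merged = accumulateToolCalls([
  { index: 0, id: "call_1", function: { name: "get_weather" } },
  { index: 0, function: { arguments: '{"city":' } },
  { index: 0, function: { arguments: '"Paris"}' } },
]);
```

Only after the final chunk arrives are `arguments` guaranteed to be parseable JSON, which is why multi-step agent loops have to buffer deltas rather than eagerly invoking tools.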
Example Use Case
A developer runs:
```sh
vllm serve Qwen/Qwen3-Coder \
  --port 8000 \
  --api-key local
```
Then inside T3:
```ts
const ai = createAI({
  provider: "openai-compatible",
  baseURL: "http://127.0.0.1:8000/v1",
  apiKey: "local",
});
```
T3 should:
• Send tool definitions
• Receive tool calls
• Execute tools normally
• Continue agent loop
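The four steps above form the standard tool-calling loop, and it can be sketched in provider-agnostic form. All names here (`runAgentLoop`, `complete`, the 8-step cap) are hypothetical illustration, not T3's actual API; `complete` stands in for a POST to `{baseURL}/chat/completions`:

```typescript
interface ToolCall {
  id: string;
  function: { name: string; arguments: string };
}

interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
  tool_call_id?: string;   // set on "tool" result messages
  tool_calls?: ToolCall[]; // set on "assistant" messages that request tools
}

// One model turn: text plus any tool calls the model requested.
interface AssistantTurn {
  content: string;
  tool_calls?: ToolCall[];
}

// Drive the loop: send history, execute requested tools, append results,
// repeat until the model answers without requesting a tool.
async function runAgentLoop(
  complete: (messages: Message[]) => Promise<AssistantTurn>,
  tools: Record<string, (args: unknown) => Promise<string>>,
  userMessage: string,
): Promise<string> {
  const messages: Message[] = [{ role: "user", content: userMessage }];
  for (let step = 0; step < 8; step++) { // arbitrary cap against runaway loops
    const turn = await complete(messages);
    messages.push({ role: "assistant", content: turn.content, tool_calls: turn.tool_calls });
    if (!turn.tool_calls?.length) return turn.content; // no tool call: final answer
    for (const call of turn.tool_calls) {
      const tool = tools[call.function.name];
      if (!tool) throw new Error(`unknown tool: ${call.function.name}`);
      const result = await tool(JSON.parse(call.function.arguments));
      messages.push({ role: "tool", tool_call_id: call.id, content: result });
    }
  }
  throw new Error("agent loop exceeded step limit");
}
```

Because nothing in the loop inspects the provider, any backend that returns the standard `tool_calls` shape can drive it.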
Why this matters
- Cost Reduction
Developers running local GPUs can:
• Avoid per-token costs
• Use T3 as the orchestration layer
- Privacy & Compliance
• Sensitive workloads never leave local infrastructure
• Important for healthcare, finance, and internal tooling
- Performance
• Local inference = lower latency in many setups
• Especially for iterative coding workflows
- Ecosystem Alignment
The ecosystem is rapidly standardizing on OpenAI-compatible APIs as the universal interface.
Supporting this makes T3:
• Future-proof
• Compatible with emerging infra
Smallest useful scope
Add support for a custom OpenAI-compatible base URL in T3, allowing tool calling to work against non-OpenAI endpoints.
Requirements
• Allow overriding:
  ```ts
  {
    baseURL: "http://localhost:8000/v1",
    apiKey: "any-string"
  }
  ```
• Do not enforce provider/model validation (accept arbitrary model names)
• Pass through existing OpenAI request format unchanged:
  • messages
  • tools
  • tool_choice
• Accept standard OpenAI-style responses:
  • tool_calls
  • choices[].message
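These two requirements amount to reading one well-known shape. A sketch of the non-streaming case, with the types assumed from the standard OpenAI response format (illustrative only, not T3 code):

```typescript
// Minimal slice of the standard chat-completions response body.
interface ChatResponse {
  choices: {
    message: {
      content: string | null;
      tool_calls?: { id: string; function: { name: string; arguments: string } }[];
    };
  }[];
}

// Pull tool calls out of choices[].message, regardless of which
// OpenAI-compatible backend produced the response.
function extractToolCalls(body: ChatResponse) {
  return body.choices[0]?.message.tool_calls ?? [];
}

// The kind of response a local vLLM or LiteLLM server might return.
const sample: ChatResponse = {
  choices: [
    {
      message: {
        content: null,
        tool_calls: [
          { id: "call_1", function: { name: "get_weather", arguments: '{"city":"Paris"}' } },
        ],
      },
    },
  ],
};
```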
Why this is enough
With just this:
• I can plug T3 directly into:
  • vLLM
  • Ollama (via LiteLLM)
  • any OpenAI-compatible server
• Tool calling works without any additional abstraction
• No new UI, presets, or provider logic required
---
What this explicitly does NOT require
• No provider-specific integrations
• No UI changes
• No model capability detection
• No special handling for local models
Alternatives considered
No response
Risks or tradeoffs
No response
Examples or references
No response
Contribution
- I would be open to helping implement this.