shiftgate ⚡

shiftgate is an intelligent routing layer that automatically selects the right LoRA adapter for each task in your local agent loop.

shiftgate does not manage weights. It stores adapter metadata only — no downloading, caching, or loading LoRA files. You start Ollama or vLLM with your models and adapters loaded; shiftgate embeds each query, picks the best task cluster, and tells the backend which adapter to use.

shiftgate run requires a running inference backend. Routing-only commands (shiftgate route, shiftgate init) work without one. To generate text, Ollama (localhost:11434) or vLLM (localhost:8000) must already be running with your adapters loaded.

Instead of hardcoding which adapter to use, shiftgate matches your query against a catalog of task clusters using cosine similarity — then routes to the best-fit LoRA adapter on that backend.
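The matching step can be pictured with a small sketch. This is not shiftgate's own code: the three-dimensional vectors and the cluster names are made up for illustration, whereas the real router compares embeddings produced by its routing model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as no match
    return dot / (norm_a * norm_b)


def best_task(query_vec, centroids):
    """Return (task_name, score) for the centroid most similar to the query."""
    scored = {name: cosine_similarity(query_vec, vec) for name, vec in centroids.items()}
    name = max(scored, key=scored.get)
    return name, scored[name]


# Hypothetical toy centroids; real ones live in the task catalog.
centroids = {
    "python-codegen": [0.9, 0.1, 0.0],
    "sql-generation": [0.1, 0.9, 0.0],
}
task, score = best_task([0.8, 0.2, 0.1], centroids)
print(task, round(score, 3))
```

The query vector points mostly in the same direction as the first centroid, so that cluster wins by a wide margin.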

Quickstart

Requires Python 3.10+ and a running Ollama or vLLM instance for inference.

1. Install

uv tool install shiftgate
# or: pip install shiftgate

2. Start your backend

vLLM (example — load adapters with --lora-modules):

python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-8B \
    --enable-lora \
    --lora-modules python-lora=/path/to/python-lora

Ollama (example — create a model that bundles base + adapter, then serve):

ollama create python-lora-ollama -f my-python-lora.Modelfile
ollama serve

3. Initialise shiftgate

Creates ~/.shiftgate/ and computes task embeddings (one-time model download for routing):

shiftgate init

4. Register adapters

Pick the option that matches your setup (see Bring Your Own Models for details):

# Option 1 — adapter already loaded in vLLM
shiftgate adapter add python-lora --runtime python-lora --tags python --base meta-llama/Meta-Llama-3-8B

# Option 2 — adapter already loaded in Ollama
shiftgate adapter add python-lora --runtime python-lora-ollama --tags python --base llama3

# Option 3 — metadata-only (catalogue a HuggingFace repo; no weights downloaded)
shiftgate adapter add teknium/python-lora --tags python --base llama3

5. Run a query

# Route only — shows the decision, no inference
shiftgate route "write a python sorting function"

# Route + run through your backend
shiftgate run "write a python sorting function"

Essential commands: init · adapter add · route · run · doctor · serve

Use as an OpenAI-compatible proxy

shiftgate serve exposes the router as a drop-in OpenAI endpoint. Any client that speaks OpenAI can point at it and get auto-routing for free — just pass model="auto".

# Start the proxy (defaults to http://127.0.0.1:9000)
shiftgate serve

# Use it from any OpenAI client
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="auto",  # ← shiftgate picks the right adapter
    messages=[{"role": "user", "content": "write a sql query"}],
)
print(response.choices[0].message.content)

When model="auto", shiftgate routes the request to the best adapter and rewrites model to that adapter's backend name before forwarding upstream. The response carries an X-Shiftgate-Route: <adapter_id> (<score>) header so you can see what was chosen. Passing any other model id bypasses routing and forwards verbatim. Streaming (stream: true) is piped straight through via SSE.
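If you want to log which adapter handled a request, the header value can be split with a few lines of Python. This is a sketch that assumes the value has exactly the shape described above, "<adapter_id> (<score>)"; the helper name parse_route_header is hypothetical, and the score is kept as text because its numeric format is not specified here.

```python
import re

# Assumed shape, taken from the description above: "<adapter_id> (<score>)".
_ROUTE_RE = re.compile(r"^\s*(?P<adapter>.+?)\s*\((?P<score>[^)]*)\)\s*$")


def parse_route_header(value):
    """Split an X-Shiftgate-Route value into (adapter_id, score_text).

    Returns None when the value does not match the assumed shape.
    """
    match = _ROUTE_RE.match(value)
    if match is None:
        return None
    return match.group("adapter"), match.group("score").strip()


print(parse_route_header("python-lora (0.91)"))
```

With the openai Python client (v1 and later), response headers should be reachable through client.chat.completions.with_raw_response.create(...), whose result exposes .headers alongside .parse().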

shiftgate serve --port 9000 --host 127.0.0.1 --backend auto   # backend: auto | ollama | vllm | cerebras

Bind defaults to 127.0.0.1 (localhost only). Pass --host 0.0.0.0 to expose it on your network.

Drop-in for Cursor / Aider / LangChain

Point each tool's OpenAI base URL at the proxy and use model="auto":

# Cursor → Settings → Models → Override OpenAI Base URL
http://localhost:9000/v1

# Aider
aider --openai-api-base http://localhost:9000/v1 --openai-api-key not-needed --model auto

# LangChain
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:9000/v1",
    api_key="not-needed",
    model="auto",
)

Example

shiftgate run "write a python sorting function"

╭────────────────────────── Routing Decision ──────────────────────────╮
│  Query          "write a python sorting function"                    │
│  Matched Task   Python Code Generation  ████████████████░░  91.2%    │
│  Adapter        python-lora  [meta-llama/Meta-Llama-3-8B]            │
│  Backend        vllm                                                 │
╰──────────────────────────────────────────────────────────────────────╯

Running via vllm…

────────────────────────────────── Response ──────────────────────────────────
def sort_array(arr):
    """Return a sorted copy using Python's Timsort."""
    return sorted(arr)
───────────────────────────────────────────────────────────────────────────────
Inference: 6204 ms · Total: 6246 ms

Use shiftgate route "<query>" --explain to see the full decision tree — top task matches, similarity scores, and why an adapter was chosen.

Verify your setup

Run a full health check anytime something feels off:

shiftgate doctor

shiftgate doctor checks:

Check	What it tells you
Embedder	Whether the routing embedding model loads and produces vectors
Backend	Whether Ollama (`localhost:11434`) or vLLM (`localhost:8000`) is reachable
Task embeddings	Whether all task clusters have computed centroids (`shiftgate init`)
Adapter runtime availability	For each registered adapter: linked status and whether it is loaded in the backend
Unlinked task clusters	Task clusters with no adapter wired — routing will match the task but cannot run inference

Runtime adapter verification runs automatically when you register a backend-loaded adapter:

shiftgate adapter add python-lora --runtime python-lora --tags python --base llama3
#   Backend: vllm ✓ verified        ← adapter found in the running backend
#   Backend: vllm ⚠ runtime 'python-lora' not loaded — did you pass --lora-modules?
#   Backend: not running (verification skipped)

Backend detection is automatic. shiftgate run, shiftgate status, and shiftgate doctor probe Ollama first, then vLLM. No config file required.
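The probe order can be sketched as follows. This is an illustration only: the ports come from the text above, but the TCP check and the function names are assumptions, and shiftgate may well probe an HTTP endpoint instead.

```python
import socket

# Order matters: Ollama is tried before vLLM, as described above.
DEFAULT_BACKENDS = [("ollama", "localhost", 11434), ("vllm", "localhost", 8000)]


def port_is_open(host, port, timeout=0.5):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def detect_backend(backends=DEFAULT_BACKENDS, probe=port_is_open):
    """Return the name of the first reachable backend, or None."""
    for name, host, port in backends:
        if probe(host, port):
            return name
    return None


print(detect_backend())  # prints a backend name, or None if nothing is listening
```

Passing the probe as a parameter keeps the ordering logic testable without a live backend.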

Architecture

User query
    │
    ▼
┌──────────────────────────────────────────────────┐
│                   shiftgate CLI                  │
│  shiftgate route / shiftgate run                 │
└────────────────────┬─────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────────────┐
│                    Router                        │
│                                                  │
│  1. Embed query  (fastembed BAAI/bge-small-en)   │
│  2. Cosine similarity vs task centroids          │
│  3. top-K tasks → walk preferred_adapters list   │
│  4. Return RoutingTrace                          │
└──────────┬───────────────────────┬───────────────┘
           │                       │
           ▼                       ▼
┌─────────────────┐   ┌────────────────────────────┐
│  Task Registry  │   │     Adapter Registry       │
│  ~/.shiftgate/  │   │  ~/.shiftgate/adapters.json│
│  tasks.json     │   │                            │
│  (10 defaults)  │   │  Add via:                  │
└─────────────────┘   │  shiftgate adapter add     │
                      └────────────┬───────────────┘
                                   │
                                   ▼
              ┌────────────────────────────────┐
              │        BackendRouter           │
              │                                │
              │  Ollama  (localhost:11434)     │
              │  vLLM    (localhost:8000)      │
              │  Auto-detected at runtime      │
              └────────────────────────────────┘
                                   │
                                   ▼
              ┌────────────────────────────────┐
              │       Feedback Loop            │
              │  ~/.shiftgate/traces.jsonl     │
              │  shiftgate feedback accept     │
              │  shiftgate feedback stats      │
              └────────────────────────────────┘

How routing works

When a backend is active, shiftgate filters candidate adapters to only those actually loaded on that backend. Switch from vLLM to Cerebras and shiftgate automatically picks Cerebras-compatible adapters — no re-registration needed. (When you run shiftgate route with no backend running, no filtering is applied, so you still see the full routing preview.)
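A minimal sketch of that selection logic, under stated assumptions: the function name, the data shapes, and the choice to fall through to the next-best task when nothing is loaded for the top one are all hypothetical, not taken from shiftgate's source.

```python
def select_adapter(ranked_tasks, preferred_adapters, loaded=None):
    """Pick the first usable adapter.

    ranked_tasks: task names, best match first (the top-K list).
    preferred_adapters: task name -> ordered list of adapter ids.
    loaded: set of adapter ids available on the active backend,
            or None when no backend is running (no filtering).
    """
    for task in ranked_tasks:
        for adapter_id in preferred_adapters.get(task, []):
            if loaded is None or adapter_id in loaded:
                return task, adapter_id
    return None


# Hypothetical catalog for illustration.
ranked = ["python-codegen", "sql-generation"]
preferred = {
    "python-codegen": ["python-lora", "general-lora"],
    "sql-generation": ["sql-lora"],
}

print(select_adapter(ranked, preferred))                       # no backend: no filtering
print(select_adapter(ranked, preferred, loaded={"sql-lora"}))  # only sql-lora is loaded
```

With no backend, the top task's first preferred adapter is returned; with only sql-lora loaded, the walk skips the unloaded Python adapters and lands on the second task.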

Bring Your Own Models

shiftgate is a routing layer. You load weights into Ollama or vLLM first, then register what you loaded so shiftgate can route to it.

You can also catalogue adapters you have not loaded yet (Option 3) — useful for shiftgate route, but shiftgate run will not produce output until the adapter is available in a running backend.

Option 1 — Adapter already loaded in vLLM

Start vLLM with your adapters:

python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-8B \
    --enable-lora \
    --lora-modules sql-lora=/path/to/sql-lora

Register using the --lora-modules key as --runtime:

shiftgate adapter add sql-lora --runtime sql-lora --tags sql --base meta-llama/Meta-Llama-3-8B

shiftgate sends "model": "<runtime_name>" in each /v1/chat/completions request.
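For reference, an equivalent request can be built by hand with the standard library. This sketch only constructs the request; the helper name build_chat_request is an assumption, and the base URL is the vLLM default mentioned earlier.

```python
import json
import urllib.request


def build_chat_request(runtime_name, prompt, base_url="http://localhost:8000"):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": runtime_name,  # the --lora-modules key, e.g. "sql-lora"
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("sql-lora", "write a sql query")
print(request.full_url)
print(json.loads(request.data)["model"])
```

Once vLLM is running with the adapter loaded, urllib.request.urlopen(request) would send it.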

Option 2 — Adapter already loaded in Ollama

Create a Modelfile that bundles your base model and adapter:

# my-sql-lora.Modelfile
FROM llama3
ADAPTER /path/to/sql-lora.safetensors

ollama create sql-lora-ollama -f my-sql-lora.Modelfile
ollama serve

Register using the Ollama model name as --runtime:

shiftgate adapter add sql-lora --runtime sql-lora-ollama --tags sql --base llama3

shiftgate passes runtime_name (or falls back to id) as the Ollama model name.
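The fallback amounts to a one-line rule. The sketch below uses a hypothetical AdapterRecord class as a stand-in; it is not shiftgate's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdapterRecord:
    """Hypothetical stand-in for a registered adapter."""
    id: str
    runtime_name: Optional[str] = None


def backend_model_name(adapter: AdapterRecord) -> str:
    """Use runtime_name when set, otherwise fall back to the adapter id."""
    return adapter.runtime_name or adapter.id


print(backend_model_name(AdapterRecord(id="sql-lora", runtime_name="sql-lora-ollama")))
print(backend_model_name(AdapterRecord(id="sql-lora")))
```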

Option 3 — Metadata-only registration

Catalogue an adapter without downloading weights — metadata only:

shiftgate adapter add teknium/sql-lora --tags sql --base llama3

You can also record a local path for your own reference (shiftgate still does not load the file):

shiftgate adapter add sql-lora --local /models/sql-lora --tags sql --base llama3

Useful for exploring routing decisions before your backend is set up. To run inference, load the adapter in vLLM or Ollama and re-register with --runtime.

Option 4 — Cerebras (cloud)

shiftgate also supports Cerebras as a cloud fallback. It uses Cerebras' OpenAI-compatible API and authenticates with a bearer token from the CEREBRAS_API_KEY environment variable (or the --cerebras-key global flag).

export CEREBRAS_API_KEY=csk-...
shiftgate adapter add llama3.1-8b --runtime llama3.1-8b --tags general --base llama3.1
shiftgate run "write a python sorting function"

shiftgate auto-detects backends in the order Ollama → vLLM → Cerebras, so local backends always win and Cerebras is used only when no local backend is running.

Honest status: shiftgate routes to Cerebras' base-model inference today. When Cerebras Multi-LoRA goes public, register your adapter with --runtime <cerebras-lora-id> and it just works — no shiftgate update needed.

How to contribute adapters

1. Fork this repo.
2. Publish your adapter to HuggingFace and open a PR that documents it in a Community Adapters section (or add it to your local registry with shiftgate adapter add).

The adapter registry ships empty by design; adapters are user-managed via ~/.shiftgate/adapters.json.

To add a task cluster that better matches your domain, run shiftgate task add interactively or edit ~/.shiftgate/tasks.json and add validation_examples that represent real queries your users ask. Run shiftgate init to recompute centroids.
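To see why representative examples matter, it helps to picture how a centroid might be derived. The sketch below assumes a centroid is the normalised mean of the example embeddings; that is a common choice, but it is an assumption here, not a statement about shiftgate's implementation, and the two-dimensional vectors are toy values.

```python
import math


def centroid(vectors):
    """Element-wise mean of equal-length vectors, scaled to unit length."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return mean if norm == 0.0 else [x / norm for x in mean]


# Toy embeddings standing in for two validation examples.
example_vectors = [[1.0, 0.0], [0.0, 1.0]]
print(centroid(example_vectors))
```

Under this assumption, examples that are unlike real user queries pull the centroid away from where those queries land, which lowers similarity scores for the cluster.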

`~/.shiftgate/` layout

~/.shiftgate/
├── adapters.json          # your registered adapters
├── tasks.json             # task clusters (copied from defaults on first init)
├── traces.jsonl           # append-only routing trace log
└── embeddings_cache.npy   # cached centroids — delete to force re-embedding
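Because traces.jsonl is a plain JSON Lines file, it is easy to analyse with your own scripts. The sketch below computes a per-adapter acceptance rate; the field names "adapter_id" and "accepted" are hypothetical, so check a real trace record before relying on them.

```python
import json
from collections import defaultdict


def acceptance_rates(lines):
    """Compute accepted/total per adapter from JSONL lines.

    Assumes hypothetical "adapter_id" and "accepted" fields; records
    without a boolean "accepted" value are skipped.
    """
    counts = defaultdict(lambda: [0, 0])  # adapter_id -> [accepted, total]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if not isinstance(record.get("accepted"), bool):
            continue
        entry = counts[record.get("adapter_id", "unknown")]
        entry[0] += int(record["accepted"])
        entry[1] += 1
    return {adapter: acc / total for adapter, (acc, total) in counts.items()}


# Made-up sample records for illustration.
sample = [
    '{"adapter_id": "python-lora", "accepted": true}',
    '{"adapter_id": "python-lora", "accepted": false}',
    '{"adapter_id": "sql-lora", "accepted": true}',
    '{"adapter_id": "sql-lora"}',
]
print(acceptance_rates(sample))
```

To run it against the real log, pass an open file handle for ~/.shiftgate/traces.jsonl instead of the sample list.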

Roadmap

Version	Focus
v0.1	Single base model, multi-adapter routing ← current
v0.2	Feedback loop + adapter scoring (auto-demote bad adapters)
v0.3	Multi-model routing (route to different base models per task)
v1.0	Community registry + web UI

Development

# Clone and install in editable mode with all dev dependencies
git clone https://github.com/shiftgate-ai/shiftgate
cd shiftgate
uv sync --extra dev   # creates .venv, installs shiftgate + dev deps

# Run tests (no GPU needed — tests use synthetic embeddings)
uv run pytest

# Run the demo inside the venv
uv run shiftgate demo

Note: uv sync reads pyproject.toml and resolves a locked environment.
There is no need to run pip install manually. Activate the venv with
.venv/Scripts/activate (Windows) or source .venv/bin/activate (macOS/Linux)
if you want the shiftgate command on your PATH without the uv run prefix.

Releases and Publishing

Releases are managed through a CI release workflow (e.g. GitHub Actions).
No manual PyPI API token management is required for normal releases.

The recommended flow:

1. Bump the version in pyproject.toml (version = "x.y.z").
2. Open a PR, get it reviewed and merged.
3. Tag the commit: git tag vx.y.z && git push origin vx.y.z.
4. The CI workflow builds the wheel with uv build and publishes to PyPI using Trusted Publishing (OIDC), so no stored API token is needed.

For a one-off manual publish (maintainers only):

uv build                    # produces dist/shiftgate-x.y.z.tar.gz and dist/shiftgate-x.y.z-py3-none-any.whl
uv publish                  # authenticates via OIDC or a scoped PyPI token

Project layout

shiftgate/
├── cli.py               # Typer CLI — all user commands
├── registry/
│   ├── schemas.py       # Pydantic models: AdapterEntry, TaskCluster, RoutingTrace
│   ├── adapter_registry.py
│   └── task_registry.py
├── router/
│   ├── embedder.py      # fastembed wrapper (CPU, singleton)
│   ├── matcher.py       # cosine similarity, top-K, adapter selection
│   └── router.py        # orchestrates embed → match → trace
├── runtime/
│   └── backend.py       # OllamaBackend, VLLMBackend, BackendRouter
├── feedback/
│   └── loop.py          # trace persistence, accept/reject, scoring
└── utils/
    └── display.py       # Rich panels, tables, animations

All commands

Command	Description
`shiftgate init`	First-time setup: initialise `~/.shiftgate/`, compute task embeddings
`shiftgate route "<query>"`	Route a query and show the decision — no inference
`shiftgate route "<query>" --explain`	Full decision tree: task scores, candidates, selection reason
`shiftgate run "<query>"`	Route + run via Ollama or vLLM
`shiftgate serve [--port 9000] [--host …] [--backend …]`	Run an OpenAI-compatible auto-routing proxy
`shiftgate doctor`	Full health check: embedder, backend, adapters, task embeddings
`shiftgate adapter add <hf_repo> [--tags …] [--base …]`	Register adapter from HuggingFace (metadata only)
`shiftgate adapter add <id> --local <path> [--tags …]`	Register a local adapter path
`shiftgate adapter add <id> --runtime <name> [--tags …]`	Register a backend-loaded adapter by its runtime name
`shiftgate adapter list`	Table of all registered adapters
`shiftgate adapter remove <id>`	Remove an adapter
`shiftgate task list`	Table of all task clusters
`shiftgate task add`	Interactively add a new task cluster
`shiftgate feedback accept`	Mark last routing as good
`shiftgate feedback reject`	Mark last routing as bad
`shiftgate feedback stats`	Adapter acceptance rate table
`shiftgate status`	Backend connectivity + registry summary
`shiftgate demo`	Animated demo with fake routing traces

References

LORAUTER — Effective LoRA Adapter Routing using Task Representations (Dhasade et al., EPFL, 2026). shiftgate's task-level semantic routing is inspired by this work; it is not a reimplementation of the paper's full algorithm.

License

MIT. See LICENSE.
