Modelplane v0.1.0
The first release of Modelplane, the open source control plane for AI inference.
Open-weight models are becoming the default, and inference almost always outgrows a single cluster. The in-cluster stack (vLLM, SGLang, llm-d, Dynamo, LeaderWorkerSet, DRA) is strong, but the fleet above it has been left to proprietary stacks: placement, routing, capacity, and caching. Modelplane does for a fleet of inference clusters what Kubernetes does for one.
Platform teams describe the fleet. ML teams describe a model. Modelplane fills in everything between, continuously reconciling the fleet toward the state you declare. It's built on Crossplane, Apache 2.0 licensed, and runs in your own environment across cloud, neocloud, and on-premise.
In this release:
- Provisioning. Create GPU clusters and node pools, or bring your own, with the serving stack installed on each.
- Fleet scheduling. A two-level scheduler pins each replica to a cluster and pool whose hardware fits, then DRA binds GPUs to pods.
- Routing. One unified, OpenAI-compatible gateway that balances load across replicas wherever they run, with fallback to managed providers.
- Caching. Stage model weights once per cluster on shared storage.
- Universal serving. Any model, any container-based engine, any accelerator, from a single GPU to multi-node and prefill/decode disaggregation.
Install:
xpkg.upbound.io/modelplane/modelplane:v0.1.0
Get started: go from an empty control plane to a live endpoint served across regions in about 45 minutes at https://v0-1.docs.modelplane.ai/getting-started/
The API is v1alpha1 and will evolve. We're building Modelplane in the open with the inference community, and plan to donate it to a neutral open source foundation later this year.