Releases: modelplaneai/modelplane
Modelplane v0.1.0
Modelplane v0.1.0
The first release of Modelplane, the open source control plane for AI inference.
Open-weight models are becoming the default, and inference almost always outgrows a single cluster. The in-cluster stack (vLLM, SGLang, llm-d, Dynamo, LeaderWorkerSet, DRA) is strong, but the fleet above it has been left to proprietary stacks: placement, routing, capacity, and caching. Modelplane does for a fleet of inference clusters what Kubernetes does for one.
Platform teams describe the fleet. ML teams describe a model. Modelplane fills in everything between, continuously reconciling the fleet toward the state you declare. It's built on Crossplane, Apache 2.0 licensed, and runs in your own environment across cloud, neocloud, and on-premise.
In this release:
- Provisioning. Create GPU clusters and node pools, or bring your own, with the serving stack installed on each.
- Fleet scheduling. A two-level scheduler pins each replica to a cluster and pool whose hardware fits, then DRA binds GPUs to pods.
- Routing. One unified, OpenAI-compatible gateway that balances load across replicas wherever they run, with fallback to managed providers.
- Caching. Stage model weights once per cluster on shared storage.
- Universal serving. Any model, any container-based engine, any accelerator, from a single GPU to multi-node and prefill/decode disaggregation.
Install:
xpkg.upbound.io/modelplane/modelplane:v0.1.0
Get started: go from an empty control plane to a live endpoint served across regions in about 45 minutes at https://v0-1.docs.modelplane.ai/getting-started/
The API is v1alpha1 and will evolve. We're building Modelplane in the open with the inference community, and plan to donate it to a neutral open source foundation later this year.
v0.1.0-rc.2
What's Changed
- docs: Getting Started with Modelplane by @tr0njavolta in #169
- Allow fieldRef env in ModelDeployment engine templates by @dennis-upbound in #196
- Make the NVIDIA DRA driver work on GKE by @dennis-upbound in #203
- Propagate ModelCache authSecret to inference clusters by @negz in #205
- Set an explicit 5s readiness probe timeout on the engine container by @negz in #207
- Replace the composition function skill with test guidance in CONTRIBUTING by @negz in #191
- Constrain ModelDeployment placement to its ModelCache footprint by @negz in #189
- Support EFA fabric on EKS GPU node pools by @negz in #198
- Stop function images from shipping a second Python and the whole repo by @negz in #195
- docs: Restructure Get Started by @tr0njavolta in #200
- docs: GKE tab in part 2 by @tr0njavolta in #211
- Docs site: restyle, app-shell layout, and docs.modelplane.ai by @bassam in #214
- Increase Nix log-lines in CI to show full failure output by @negz in #216
- Index the docs site for search with its own Algolia app by @bassam in #217
- Repin crossplane-cli from negz/cli fork to crossplane/cli main by @negz in #219
- Make the docs AI-forward: Markdown, llms.txt, and an MCP server by @bassam in #222
- docs: Navigation and code block copy by @tr0njavolta in #223
- Rework the docs example manifests by @negz in #213
- Enable the Filestore CSI driver addon on GKE clusters by @negz in #228
- Orphan EKS and GKE in-cluster resources to unblock teardown by @negz in #227
Full Changelog: v0.1.0-rc.1...v0.1.0-rc.2
v0.1.0-rc.1
What's Changed
- Update with manifests path by @tr0njavolta in #167
- Auto-provision EFS RWX storage for ModelCache on EKS by @dennis-upbound in #164
- Disable the HTTPRoute request timeout for model traffic by @dennis-upbound in #174
- Pin dependencies to exact versions by @negz in #170
- Make PrefillDecode actually disaggregate by @dennis-upbound in #175
- Build function images without cross-compilation or emulation by @negz in #181
- Support backing EKS GPU node groups with Capacity Blocks by @negz in #113
- Move ModelCache storage provisioning into the cluster XRs by @negz in #176
- Build function image wheels on the build host, not the target arch by @negz in #185
- Autoscale EKS GPU node pools with the cluster autoscaler by @negz in #183
- Free disk space before building and pushing the package by @negz in #188
- Build the Crossplane CLI for one platform and cache CI builds by @negz in #190
Full Changelog: v0.1.0-rc.0...v0.1.0-rc.1
v0.1.0-rc.0
The first release candidate for Modelplane v0.1.0, and the first published Modelplane package.
At this point Modelplane can provision GKE or EKS clusters, or run on existing ones via a supplied kubeconfig. It serves models with any OpenAI-compatible engine (vLLM tested), running either single-node or multi-node across a LeaderWorkerSet gang, and schedules replicas across a multi-cluster GPU fleet using Dynamic Resource Allocation to bind GPUs. A ModelDeployment scales via spec.replicas, and a ModelService routes traffic across replicas behind one unified OpenAI-compatible endpoint. It can stage models onto a per-cluster cache ahead of serving, and supports disaggregated prefill/decode serving.
Install with the Crossplane CLI:
crossplane xpkg install configuration xpkg.upbound.io/modelplane/modelplane:v0.1.0-rc.0
See the getting started guide to deploy Modelplane and serve a model.