@justinsb justinsb commented Sep 9, 2025

This example demonstrates how we can serve models from inside the cluster,
without needing to bake them into the container images,
or rely on pulling them from services like huggingface.

We may also want to support storing models in GCS or S3 in the future,
but this example focuses on storing models without cloud dependencies.

We may also want to investigate serving models from container images,
particularly given the upcoming support for mounting container images
as volumes, but this approach works today and allows for more
dynamic model loading (e.g. loading new models without restarting pods).
Moreover, a container image server is backed by a blob server,
as introduced here.

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: justinsb
Once this PR has been reviewed and has the lgtm label, please assign idvoretskyi for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from kow3ns September 9, 2025 13:29
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 9, 2025
@k8s-ci-robot k8s-ci-robot requested a review from soltysh September 9, 2025 13:29
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Sep 9, 2025
@justinsb justinsb force-pushed the modelcloud_example branch 2 times, most recently from 287ae85 to 97a98d7 Compare September 9, 2025 15:21

justinsb commented Sep 9, 2025

Heavily inspired by @seans3's work in the vllm-deployment example! And now with a README (with similar inspiration).

Looks like we aren't checking copyright headers, so I will look into adding that in a separate PR.

@janetkuo

/assign

on AI-conformant kubernetes clusters.

We (aspirationally) aim to demonstrate the capabilities of the AI-conformance
profile. Where we cannot achieve production-grade inference, we hope to

nit: remove "profile" everywhere. We were advised not to use the term "profile" for Kubernetes AI conformance, since there was historically an effort to define subsets (not supersets) of Kubernetes Conformance under that term.

def get_image_prefix():
"""Constructs the image prefix for a container image."""
project_id = get_gcp_project()
return f"gcr.io/{project_id}/"

nit: adopt the same change as in kubernetes-sigs/agent-sandbox#13, e.g. supporting an IMAGE_PREFIX env var.
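The IMAGE_PREFIX override suggested above might look something like the following sketch. Note `get_gcp_project` is stubbed here purely for illustration (in the actual script it presumably resolves the active GCP project via gcloud), and the env-var handling is an assumption, not the agent-sandbox implementation:

```python
import os

def get_gcp_project():
    """Stub for illustration; the real helper resolves the active GCP project."""
    return "my-project"

def get_image_prefix():
    """Constructs the image prefix, preferring an IMAGE_PREFIX env override."""
    prefix = os.environ.get("IMAGE_PREFIX")
    if prefix:
        # Normalize to a trailing slash so callers can append image names directly.
        return prefix if prefix.endswith("/") else prefix + "/"
    project_id = get_gcp_project()
    return f"gcr.io/{project_id}/"
```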

# gemma3-6cf4765df9-c4nmt gemma3 DEBUG 09-08 14:57:56 [__init__.py:99] CUDA platform is not available because: NVML Shared Library Not Found


# FROM vllm/vllm-openai:v0.10.0

nit: remove the commented-out part if it's not needed


1. `blob-server`, a statefulset with a persistent volume to hold the model blobs (files)

1. `gemma3`, a deployment running vLLM, with a frontend go process that will download the model from `blob-server`.

Do we need to merge this with https://github.com/kubernetes/examples/tree/master/AI/vllm-deployment? The other one doesn't provide persistent model storage.
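As a quick sanity check of the two components described above, one might confirm the blob-server is reachable before the gemma3 frontend pulls from it. This is only a sketch: the port (8080) and the path layout under `/models/` are assumptions, not taken from the manifests in this PR.

```shell
# Forward the in-cluster blob-server port to localhost (8080 is an assumption).
kubectl port-forward statefulset/blob-server 8080:8080 &
PF_PID=$!
sleep 2
# Probe an arbitrary blob path; the /models/ layout is hypothetical.
curl -fsS -o /dev/null http://localhost:8080/models/gemma3/config.json \
  && echo "blob-server reachable"
kill "$PF_PID"
```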


```bash
kubectl delete deployment gemma3
kubectl delete statefulset blob-server
```

Need to delete the PVC as well for full cleanup
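A fuller cleanup, per the comment above, might look like this. StatefulSet deletion does not remove PVCs created from `volumeClaimTemplates`; the claim name below follows the usual `<template>-<statefulset>-<ordinal>` naming convention and is an assumption, since the template name is not shown here:

```shell
kubectl delete deployment gemma3
kubectl delete statefulset blob-server
# PVCs from volumeClaimTemplates survive statefulset deletion and must be
# removed explicitly; the claim name is an assumption.
kubectl delete pvc data-blob-server-0
```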

selector:
matchLabels:
app: blob-server
#serviceName: blob-server

nit: remove the commented-out `serviceName` line given that it's not needed
