feat: make deploy-ollama script generic #80
Conversation
(force-pushed 26da3eb to 69ab1d2)
Very cool to see support for another provider! I believe vLLM should work without the SCC logic, so maybe moving that call to quickstart-scc.sh into an `if [ "${PROVIDER}" = "ollama" ]; then` block would be enough.
Thank you!
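The suggested guard might look something like this (a sketch only; `PROVIDER` and `SCRIPT_DIR` are assumed to be set earlier in deploy-quickstart.sh, and the helper names are hypothetical):

```shell
#!/usr/bin/env sh
# Sketch of the suggested guard: only call the SCC helper for ollama.
# PROVIDER and SCRIPT_DIR are assumed to be set by deploy-quickstart.sh.

needs_scc() {
  # Only the ollama container runs as uid 0 and needs the custom SCC.
  [ "$1" = "ollama" ]
}

apply_scc_if_needed() {
  if needs_scc "${PROVIDER}"; then
    "${SCRIPT_DIR}/quickstart-scc.sh" "${PROVIDER}"
  fi
}
```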
hack/deploy-quickstart.sh
Outdated
# OpenShift requires specific permissions in order for the container to run as uid 0
if kubectl api-resources --api-group=security.openshift.io | grep -iq 'SecurityContextConstraints'; then
  "${SCRIPT_DIR}/quickstart-scc.sh" "${PROVIDER}"
fi
Some good news here: I only know this SecurityContextConstraint to be required for ollama container deployments in particular.
It would be worth using the default restricted-v2 SCC (no extra ServiceAccount config needed) unless the chosen provider deployment type requires it.
Oh, nice.
So for ollama we need the SA, SCC, Role, and RoleBinding, and to set the security context in the Deployment.
For vLLM we can skip all of these and just set the annotation to restricted-v2.
Is this a correct understanding?
Yep! The annotation was just meant to convey which SecurityContextConstraint the ollama-sa ServiceAccount uses, but shouldn't be necessary for others. It can be omitted for vllm-sa, for example, because a newly created SA will use the restricted-v2 SCC by default.
So we can run hack/quickstart-scc.sh for just the ollama provider, and I expect we can safely skip it for other providers.
You are right 👍
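That split could be sketched as a small helper (hypothetical function name and values; the actual security context settings live in the deployment templates):

```shell
#!/usr/bin/env sh
# Hypothetical sketch of the per-provider security decision discussed above.
# ollama needs the custom SCC (it runs as uid 0); other providers fall back
# to the restricted-v2 SCC that any newly created SA gets by default.

security_context_for() {
  case "$1" in
    ollama)
      # custom SCC + SA allow the container to run as root
      printf '%s\n' 'runAsUser: 0'
      ;;
    *)
      # no extra SA/Role/RoleBinding needed; restricted-v2 applies by default
      printf '%s\n' 'runAsNonRoot: true'
      ;;
  esac
}
```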
(force-pushed febb625 to 39474a3)
As we add more providers in the future, we may want to organize the YAML resources as plain .yaml files under config/samples to keep the examples easy to understand, but this approach may be OK for now.
I'll see if @leseb and @VaishnaviHire have a preference on whether we should look into this approach now for vLLM, or in a later refactor.
README.md
Outdated
**vLLM Examples:**

This would require a secret "hf-token-secret" in namesapce "vllm-dist" for HuggingFace token (required for downloading models) to be created in advance.
It might be good to mention the format that the BASH_REMATCH commands match in utils.sh, or maybe prompt for the value if the secret doesn't already exist.
And a slight formatting change: namesapce -> namespace
Did some updates to fix the typo, and also added the "error out" handling for the missing-secret case.
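The "error out" check could be sketched like this (hypothetical function name; the secret name and namespace come from the README snippet above):

```shell
#!/usr/bin/env sh
# Hypothetical sketch of the missing-secret check added to the script:
# fail fast with a hint if the HuggingFace token secret does not exist.

require_secret() {
  ns="$1"
  name="$2"
  if ! kubectl get secret -n "${ns}" "${name}" >/dev/null 2>&1; then
    echo "ERROR: secret '${name}' not found in namespace '${ns}'." >&2
    echo "Create it first, for example:" >&2
    echo "  kubectl create secret generic ${name} -n ${ns} --from-literal=token=<HF_TOKEN>" >&2
    return 1
  fi
}

# Example: require_secret vllm-dist hf-token-secret
```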
(force-pushed 972e8dc to 49ffc9f)
- rename the old deploy-ollama.sh to deploy-quickstart.sh
- introduce a utils.sh to host common functions
- make deploy-quickstart.sh work for both ollama (the default provider) and vllm
- add a new --model flag so users can pass in a different model than the default llama3.2:1b
- add new --runtime-args and --runtime-env flags so users can provide customized config
- rename the old ollama-scc.sh to quickstart-scc.sh to support vllm when called via deploy-quickstart.sh
- update the documents to reflect the changes
- the default config for vllm is CPU-only; to run on GPU, set the env or args

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- formatting
- move "scc + security context" into utils.sh
  - only create it if the provider is ollama; vllm does not require uid 0 to run
  - can be extended later for other providers we will support
- remove the check on the SA before creating the SCC

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- ollama: create SA, Role, and RoleBinding; do not set the annotation in the Deployment
- vllm: do not create SA, Role, or RoleBinding, but set the restricted-v2 SCC in the Deployment
- both set a securityContext in the Deployment, based on the provider

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- move load_provider_config() to utils.sh to keep deploy-quickstart.sh clean
- add a check that the secret exists if it is set via --runtime-env
- fix typo
- apply "${var}" for code consistency
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(force-pushed 49ffc9f to b41c701)
I think that covers all the questions I had. Thanks!
change description:
- for security:
  - ollama: create SA, Role, and RoleBinding; do not set the annotation in the Deployment
  - vllm: do not create SA, Role, or RoleBinding, but set the restricted-v2 SCC in the Deployment
  - both set a securityContext in the Deployment, based on the provider
- rename the old deploy-ollama.sh to deploy-quickstart.sh
- introduce a utils.sh to host common functions
- make deploy-quickstart.sh work for both ollama (the default provider) and vllm
- add a new --model flag so users can pass in a different model than the default llama3.2:1b
- add new --runtime-args and --runtime-env flags so users can provide customized config
  - vllm: https://github.com/vllm-project/vllm/blame/main/docs/deployment/k8s.md#L77
  - ollama: keep the old behavior https://github.com/llamastack/llama-stack-k8s-operator/blob/main/hack/deploy-ollama.sh#L86 and make --keepalive an env variable
- rename the old ollama-scc.sh to quickstart-scc.sh to support vllm when called via deploy-quickstart.sh
- update the documents to reflect the changes
- the default config for vllm is CPU-only; to run on GPU, set the env or args
- add a readiness probe
  - ollama: https://github.com/ollama/ollama/blob/main/docs/api.md#version
  - vllm: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py#L414

to fix: llamastack#76
Approved-by: rhdedgar
(cherry picked from commit 466cb83)
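The new flags described above could be parsed roughly like this (a sketch only; the real parsing lives in hack/deploy-quickstart.sh, and only the flag names and the llama3.2:1b default come from the change description):

```shell
#!/usr/bin/env sh
# Sketch of parsing for the new flags; defaults follow the change description.

MODEL="llama3.2:1b"   # default model per the change description
RUNTIME_ARGS=""
RUNTIME_ENV=""

parse_flags() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --model)        MODEL="$2"; shift 2 ;;
      --runtime-args) RUNTIME_ARGS="$2"; shift 2 ;;
      --runtime-env)  RUNTIME_ENV="$2"; shift 2 ;;
      *)              shift ;;
    esac
  done
}

# Example: parse_flags --model "some-model" --runtime-env "HF_TOKEN=..."
```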