
@zdtsw (Contributor) commented Jun 25, 2025

change description:

to fix: #76

@zdtsw zdtsw force-pushed the feature-from-upstream-main branch 3 times, most recently from 26da3eb to 69ab1d2 Compare June 26, 2025 10:20
@mfleader mfleader requested a review from rhdedgar June 26, 2025 12:31
@rhdedgar (Collaborator) left a comment

Very cool to see support for another provider! I believe vLLM should work without the SCC logic, so maybe moving that call to quickstart-scc.sh into an if [ "${PROVIDER}" = "ollama" ]; then block would be enough.

Thank you!

Comment on lines 126 to 129
# OpenShift requires specific permissions in order for the container to run as uid 0
if kubectl api-resources --api-group=security.openshift.io | grep -iq 'SecurityContextConstraints'; then
"${SCRIPT_DIR}/quickstart-scc.sh" "${PROVIDER}"
fi
@rhdedgar (Collaborator):
Some good news here: I only know this SecurityContextConstraint to be required for ollama container deployments in particular.

It would be worth using the default restricted-v2 SCC (no extra ServiceAccount config needed) unless the chosen provider deployment type requires it.
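
A minimal sketch of the suggested gating, assuming PROVIDER and SCRIPT_DIR are set as in the snippet above:

#!/usr/bin/env bash
# Only the ollama image needs to run as uid 0, so only set up the SCC for it.
if [ "${PROVIDER}" = "ollama" ]; then
  # OpenShift-only: the SecurityContextConstraints API group exists only there
  if kubectl api-resources --api-group=security.openshift.io | grep -iq 'SecurityContextConstraints'; then
    "${SCRIPT_DIR}/quickstart-scc.sh" "${PROVIDER}"
  fi
fi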

@zdtsw (Contributor, Author):

Oh, nice.
For ollama: we need an SA, SCC, Role, and RoleBinding, and we set the security context in the Deployment.
For vllm: we can skip all of these and set the annotation to restricted-v2.
Is this the correct understanding?

@rhdedgar (Collaborator):

Yep! The annotation was just meant to convey which SecurityContextConstraint the ollama-sa ServiceAccount uses, but it shouldn't be necessary for others. It can be omitted for vllm-sa, for example, because a newly created SA will use the restricted-v2 SCC by default.

So for just the ollama provider we can run hack/quickstart-scc.sh, and I expect we can safely skip it for other providers.
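
For illustration, a minimal sketch of the ollama-only wiring described above: a dedicated ServiceAccount plus a Role that grants use of an SCC permitting uid 0, bound to that SA. The ollama-sa name comes from this discussion; the Role/RoleBinding names, the NAMESPACE placeholder, and the choice of the anyuid SCC are assumptions here, and the real manifests live in hack/quickstart-scc.sh.

# Sketch only; resource names other than ollama-sa are hypothetical.
cat <<EOF | kubectl apply -n "${NAMESPACE}" -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ollama-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ollama-scc-role
rules:
- apiGroups: ["security.openshift.io"]
  resources: ["securitycontextconstraints"]
  resourceNames: ["anyuid"]   # assumed: any SCC that permits uid 0
  verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ollama-scc-rb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ollama-scc-role
subjects:
- kind: ServiceAccount
  name: ollama-sa
  namespace: ${NAMESPACE}
EOF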

@zdtsw (Contributor, Author) commented Jun 27, 2025

> Very cool to see support for another provider! I believe vLLM should work without the SCC logic, so maybe moving that call to quickstart-scc.sh into an if [ "${PROVIDER}" = "ollama" ]; then block would be enough.
>
> Thank you!

You are right 👍
Unless it is needed (as in the ollama case), we should not grant an SCC for uid 0, which can be a security concern, especially since vllm does not run as root. I will fix this.
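
As a sketch of the resulting provider-conditional security context (values assumed; ollama's image runs as root, vllm's does not):

# Hypothetical helper logic; after this PR the real code lives in utils.sh.
if [ "${PROVIDER}" = "ollama" ]; then
  # ollama needs uid 0, which is why it needs the extra SCC on OpenShift
  SECURITY_CONTEXT='{"runAsUser": 0}'
else
  # vllm runs as non-root and fits the default restricted-v2 SCC
  SECURITY_CONTEXT='{"runAsNonRoot": true}'
fi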

@zdtsw zdtsw force-pushed the feature-from-upstream-main branch 3 times, most recently from febb625 to 39474a3 Compare June 27, 2025 10:46
@rhdedgar rhdedgar self-assigned this Jul 1, 2025
@rhdedgar (Collaborator) left a comment

As we add more providers in the future, we may want to organize the YAML resources as plain .yaml files in config/samples to make the examples easy to understand, but this approach may be OK for now.

I'll see if @leseb and @VaishnaviHire have a preference on if we should look into this approach now for vLLM, or in a later refactor.

README.md Outdated

**vLLM Examples:**

This would require a secret "hf-token-secret" in namesapce "vllm-dist" for HuggingFace token (required for downloading models) to be created in advance.
@rhdedgar (Collaborator):

It might be good to mention the format that the BASH_REMATCH commands match in utils.sh, or maybe prompt for the value if the secret doesn't already exist.

And a slight formatting change: namesapce → namespace

@zdtsw (Contributor, Author):

Did some updates to fix the typo, and also added the "error out" behavior for the missing-secret case.
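
For readers following along, a minimal sketch of pre-creating the secret and the kind of "error out" check described. The secret and namespace names come from the README excerpt; the token key name and the exact check in utils.sh are assumptions.

# Pre-create the HuggingFace token secret the vLLM deployment expects
kubectl create secret generic hf-token-secret \
  --from-literal=token="${HF_TOKEN}" \
  -n vllm-dist

# Fail fast if the secret is missing instead of deploying a broken vllm pod
if ! kubectl get secret hf-token-secret -n vllm-dist >/dev/null 2>&1; then
  echo "ERROR: secret 'hf-token-secret' not found in namespace 'vllm-dist'" >&2
  exit 1
fi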

@rhdedgar rhdedgar requested a review from VaishnaviHire July 1, 2025 20:49
@zdtsw zdtsw force-pushed the feature-from-upstream-main branch from 972e8dc to 49ffc9f Compare July 2, 2025 07:33
zdtsw added 4 commits July 2, 2025 18:06
- rename old deploy-ollama.sh to deploy-quickstart.sh
- introduce a utils.sh to host common functions
- make deploy-quickstart.sh work for both ollama (the default provider) and vllm
- add a new flag --model so the user can pass a different model than the default llama3.2:1b
- add new flags --runtime-args and --runtime-env so the user can provide customized config
- rename old ollama-scc.sh to quickstart-scc.sh to support vllm when called via deploy-quickstart.sh
- update documents to reflect the changes
- the default config for vllm is CPU-only; to run on GPU, set the appropriate env or args

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- formatting
- move "scc + security context" into utils.sh
  - only create it if the provider is ollama
  - vllm does not require uid 0 to run
  - can be extended later for other providers we will support
- remove the check on the SA before creating the SCC

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- ollama: create SA, Role, RoleBinding; do not set the annotation in the deployment
- vllm: do not create SA, Role, RoleBinding, but set the restricted-v2 SCC in the deployment
- both set a securityContext in the deployment, based on the provider

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- move load_provider_config() to utils.sh to keep deploy-quickstart.sh clean
- add a check that the secret exists if it is set within --runtime-env
- fix typo
- apply "${var}" for code consistency

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
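
Taken together, a hypothetical invocation exercising the new flags. The flag names come from the commit messages above; the provider-selection syntax and the example values are assumptions.

# Deploy vllm instead of the default ollama, overriding the model and
# passing provider-specific runtime configuration (syntax assumed).
./hack/deploy-quickstart.sh vllm \
  --model meta-llama/Llama-3.2-1B-Instruct \
  --runtime-env "HF_TOKEN_SECRET=hf-token-secret" \
  --runtime-args "--max-model-len=4096"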
@zdtsw zdtsw force-pushed the feature-from-upstream-main branch from 49ffc9f to b41c701 Compare July 2, 2025 16:07
@leseb leseb requested a review from rhdedgar July 11, 2025 13:16
@rhdedgar (Collaborator) left a comment

I think that covers all the questions I had. Thanks!

@rhdedgar rhdedgar removed the request for review from VaishnaviHire July 11, 2025 19:36
@mergify mergify bot merged commit 466cb83 into llamastack:main Jul 11, 2025
6 checks passed
VaishnaviHire pushed a commit to VaishnaviHire/llama-stack-k8s-operator that referenced this pull request Jul 16, 2025
change description:
- for security:
  - ollama: create SA, Role, RoleBinding; do not set the annotation in the deployment
  - vllm: do not create SA, Role, RoleBinding, but set the restricted-v2 SCC in the deployment
  - both set a securityContext in the deployment, based on the provider
- rename old deploy-ollama.sh to deploy-quickstart.sh
- introduce a utils.sh to host common functions
- make deploy-quickstart.sh work for both ollama (the default provider) and vllm
- add a new flag --model so the user can pass a different model than the default llama3.2:1b
- add new flags --runtime-args and --runtime-env so the user can provide customized config
  - vllm: https://github.com/vllm-project/vllm/blame/main/docs/deployment/k8s.md#L77
  - ollama: keep the old behavior (https://github.com/llamastack/llama-stack-k8s-operator/blob/main/hack/deploy-ollama.sh#L86); make --keepalive an env variable
- rename old ollama-scc.sh to quickstart-scc.sh to support vllm when called via deploy-quickstart.sh
- update documents to reflect the changes
- the default config for vllm is CPU-only; to run on GPU, set the appropriate env or args
- add readinessProbe
  - ollama: https://github.com/ollama/ollama/blob/main/docs/api.md#version
  - vllm: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py#L414

to fix: llamastack#76

Approved-by: rhdedgar
(cherry picked from commit 466cb83)
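
As a sketch, the per-provider readiness endpoints referenced in the commit message might be wired up like this. The paths follow the linked docs; the ports are the upstream defaults and are assumptions here, as are the variable names.

# ollama answers GET /api/version once it is up; vLLM's OpenAI-compatible
# server exposes a /health endpoint.
if [ "${PROVIDER}" = "ollama" ]; then
  READINESS_PATH="/api/version"
  READINESS_PORT=11434
else
  READINESS_PATH="/health"
  READINESS_PORT=8000
fi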

Successfully merging this pull request may close: Add quick start script for vLLM Deployment (#76).