feat: make deploy-ollama script generic #80
Conversation
(force-pushed 26da3eb to 69ab1d2)
Very cool to see support for another provider! I believe vLLM should work without the SCC logic, so maybe moving that call to quickstart-scc.sh into an `if [ "${PROVIDER}" = "ollama" ]; then` block would be enough.
Thank you!
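The suggested guard might look something like this (a sketch only; `PROVIDER` and `SCRIPT_DIR` are assumed to be set earlier in deploy-quickstart.sh, and the helper names are hypothetical):

```shell
#!/usr/bin/env sh
# Sketch of the suggested guard: only call the SCC helper for ollama.
# PROVIDER and SCRIPT_DIR are assumed to be set by deploy-quickstart.sh.

needs_scc() {
  # Only the ollama container runs as uid 0 and needs the custom SCC.
  [ "$1" = "ollama" ]
}

apply_scc_if_needed() {
  if needs_scc "${PROVIDER}"; then
    "${SCRIPT_DIR}/quickstart-scc.sh" "${PROVIDER}"
  fi
}
```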
hack/deploy-quickstart.sh
Outdated
# OpenShift requires specific permissions in order for the container to run as uid 0
if kubectl api-resources --api-group=security.openshift.io | grep -iq 'SecurityContextConstraints'; then
  "${SCRIPT_DIR}/quickstart-scc.sh" "${PROVIDER}"
fi
Some good news here: I only know this SecurityContextConstraint to be required for ollama container deployments in particular.
It would be worth using the default restricted-v2 SCC (no extra ServiceAccount config needed) unless the chosen provider deployment type requires it.
Oh, nice.
So for ollama we need the SA, SCC, Role, and RoleBinding, and to set the security context in the Deployment.
For vLLM we can skip all of these and just set the annotation to restricted-v2.
Is this a correct understanding?
Yep! The annotation was just meant to convey which SecurityContextConstraint the ollama-sa ServiceAccount uses, but shouldn't be necessary for others. It can be omitted for vllm-sa, for example, because a newly created SA will use the restricted-v2 SCC by default.
So we can run hack/quickstart-scc.sh for just the ollama provider, and I expect we can safely skip it for other providers.
You are right 👍
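That split could be sketched as a small helper (hypothetical function name and values; the actual security context settings live in the deployment templates):

```shell
#!/usr/bin/env sh
# Hypothetical sketch of the per-provider security decision discussed above.
# ollama needs the custom SCC (it runs as uid 0); other providers fall back
# to the restricted-v2 SCC that any newly created SA gets by default.

security_context_for() {
  case "$1" in
    ollama)
      # custom SCC + SA allow the container to run as root
      printf '%s\n' 'runAsUser: 0'
      ;;
    *)
      # no extra SA/Role/RoleBinding needed; restricted-v2 applies by default
      printf '%s\n' 'runAsNonRoot: true'
      ;;
  esac
}
```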
(force-pushed febb625 to 39474a3)
As we add more providers in the future, we may want to organize the YAML resources as plain .yaml files under config/samples to keep the examples easy to understand, but this approach may be OK for now.
I'll see if @leseb and @VaishnaviHire have a preference on whether we should look into this approach now for vLLM, or in a later refactor.
README.md
Outdated
**vLLM Examples:**

This would require a secret "hf-token-secret" in namesapce "vllm-dist" for HuggingFace token (required for downloading models) to be created in advance.
It might be good to mention the format that the BASH_REMATCH commands match in utils.sh, or maybe prompt for the value if the secret doesn't already exist.
And a slight formatting change: namesapce -> namespace
Did some updates to fix the typo, and also added the "error out" handling for the missing-secret case.
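The "error out" check could be sketched like this (hypothetical function name; the secret name and namespace come from the README snippet above):

```shell
#!/usr/bin/env sh
# Hypothetical sketch of the missing-secret check added to the script:
# fail fast with a hint if the HuggingFace token secret does not exist.

require_secret() {
  ns="$1"
  name="$2"
  if ! kubectl get secret -n "${ns}" "${name}" >/dev/null 2>&1; then
    echo "ERROR: secret '${name}' not found in namespace '${ns}'." >&2
    echo "Create it first, for example:" >&2
    echo "  kubectl create secret generic ${name} -n ${ns} --from-literal=token=<HF_TOKEN>" >&2
    return 1
  fi
}

# Example: require_secret vllm-dist hf-token-secret
```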
(force-pushed 972e8dc to 49ffc9f)
- rename the old deploy-ollama.sh to deploy-quickstart.sh
- introduce a utils.sh to host common functions
- make deploy-quickstart.sh work for both ollama (the default provider) and vllm
- add a new --model flag so users can pass in a different model than the default llama3.2:1b
- add new --runtime-args and --runtime-env flags so users can provide customized config
- rename the old ollama-scc.sh to quickstart-scc.sh to support vllm when called via deploy-quickstart.sh
- update the documents to reflect the changes
- the default config for vllm is CPU-only; to run on GPU, set the env or args

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- formatting
- move "scc + security context" into utils.sh
  - only create it if the provider is ollama; vllm does not require uid 0 to run
  - can be extended later for other providers we will support
- remove the check on the SA before creating the SCC

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- ollama: create SA, Role, and RoleBinding; do not set the annotation in the Deployment
- vllm: do not create SA, Role, or RoleBinding, but set the restricted-v2 SCC in the Deployment
- both set a securityContext in the Deployment, based on the provider

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- move load_provider_config() to utils.sh to keep deploy-quickstart.sh clean
- add a check that the secret exists if it is set via --runtime-env
- fix typo
- apply "${var}" for code consistency
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(force-pushed 49ffc9f to b41c701)
I think that covers all the questions I had. Thanks!
change description:
- for security:
  - ollama: create SA, Role, and RoleBinding; do not set the annotation in the Deployment
  - vllm: do not create SA, Role, or RoleBinding, but set the restricted-v2 SCC in the Deployment
  - both set a securityContext in the Deployment, based on the provider
- rename the old deploy-ollama.sh to deploy-quickstart.sh
- introduce a utils.sh to host common functions
- make deploy-quickstart.sh work for both ollama (the default provider) and vllm
- add a new --model flag so users can pass in a different model than the default llama3.2:1b
- add new --runtime-args and --runtime-env flags so users can provide customized config
  - vllm: https://github.com/vllm-project/vllm/blame/main/docs/deployment/k8s.md#L77
  - ollama: keep the old behavior https://github.com/llamastack/llama-stack-k8s-operator/blob/main/hack/deploy-ollama.sh#L86 and make --keepalive an env variable
- rename the old ollama-scc.sh to quickstart-scc.sh to support vllm when called via deploy-quickstart.sh
- update the documents to reflect the changes
- the default config for vllm is CPU-only; to run on GPU, set the env or args
- add a readiness probe
  - ollama: https://github.com/ollama/ollama/blob/main/docs/api.md#version
  - vllm: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py#L414

to fix: llamastack#76
Approved-by: rhdedgar
(cherry picked from commit 466cb83)
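The new flags described above could be parsed roughly like this (a sketch only; the real parsing lives in hack/deploy-quickstart.sh, and only the flag names and the llama3.2:1b default come from the change description):

```shell
#!/usr/bin/env sh
# Sketch of parsing for the new flags; defaults follow the change description.

MODEL="llama3.2:1b"   # default model per the change description
RUNTIME_ARGS=""
RUNTIME_ENV=""

parse_flags() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --model)        MODEL="$2"; shift 2 ;;
      --runtime-args) RUNTIME_ARGS="$2"; shift 2 ;;
      --runtime-env)  RUNTIME_ENV="$2"; shift 2 ;;
      *)              shift ;;
    esac
  done
}

# Example: parse_flags --model "some-model" --runtime-env "HF_TOKEN=..."
```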