Documentation to support the MultiModel deployment feature [DO NOT MERGE] #554
Conversation

**VipulMascarenhas** left a comment:
added some minor comments
> MultiModel inference and serving refers to efficiently hosting and managing multiple large language models simultaneously to serve inference requests using shared resources. The Data Science service has a prebuilt **vLLM service container** that makes deploying and serving multiple large language models on a **single GPU compute shape** easy, simplifying the deployment process and reducing operational complexity. This container comes with a preinstalled [**LiteLLM proxy server**](https://docs.litellm.ai/docs/simple_proxy), which routes requests to the appropriate model, ensuring seamless prediction.
>
> **Multi-Model Deployment is currently in beta and is only available through the CLI. At this time, only base service LLM models are supported; fine-tuned/registered models cannot be deployed.**
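To illustrate the routing behavior described above, a request to a multi-model endpoint would typically select its target via the `model` field in the payload, which the LiteLLM proxy uses to dispatch to the right backend. This is a minimal sketch only; the endpoint URL, auth header, and model name are placeholders, not part of this documentation:

```bash
# Hypothetical request to a MultiModel deployment endpoint. The LiteLLM proxy
# routes it to the model named in the "model" field of the payload.
# <deployment-url> and <model_name> are placeholders; auth is shown as a bearer
# token for brevity, while actual deployments may use OCI request signing.
curl -s "<deployment-url>/predict" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "<model_name>",
        "prompt": "Hello, world",
        "max_tokens": 100
      }'
```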
nit: use either "Multi-Model" or "MultiModel" throughout to be consistent in terminology.
> ```bash
> ads aqua deployment list_shapes
> ```
>
> ### Example
I think the Example section would be redundant here; it shows the same command as Usage.
> ##### CLI Output
>
> ```json
> ...
> ```
To keep it short, let's reduce the result list to a couple of items.
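For instance, a trimmed listing might look like the sketch below. This is illustrative only; the field names are assumptions, not actual `list_shapes` output:

```json
[
  {"name": "VM.GPU.A10.2", "gpu_count": 2},
  {"name": "BM.GPU4.8", "gpu_count": 8}
]
```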
> If no primary model is provided, the GPU allocation for A, B, C could be [2, 4, 2], [2, 2, 4], or [4, 2, 2] (assuming the shape has 8 GPUs in total).
> If B is the primary model, the GPU allocation is [2, 4, 2], as B always gets the maximum GPU count.
> `**kwargs`
I think we don't need to mention `**kwargs` here. We could rather show:

    compartment_id: [str]
        The compartment OCID to retrieve the models and available model deployment shapes.
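If we go that route, the parameter could be exercised from the CLI roughly like this. The flag name is assumed to mirror the parameter name, and the OCID is a placeholder:

```bash
# Assumed flag form; the compartment OCID is a placeholder.
ads aqua deployment list_shapes --compartment_id "ocid1.compartment.oc1..<ocid>"
```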
> ## List MultiModel Deployments
I think there is currently no difference between listing single and multi-model deployments. We can give a reference to the CLI tips like we did for the model list. We can still mention that MultiModel deployments have the tag `"aqua_multimodel": "true"` associated with them.
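To illustrate, MultiModel deployments could then be picked out of the regular list with a filter like the sketch below. It assumes the list command emits JSON and that the tag lands under a `freeform_tags` key; both are assumptions, not documented behavior:

```bash
# Hypothetical filter: keep only deployments carrying the MultiModel tag.
ads aqua deployment list | \
  jq '.[] | select(.freeform_tags.aqua_multimodel == "true")'
```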
> - [Create Model Evaluation](#create-model-evaluations)
> # Introduction to MultiModel Deployment and Serving
Would it be helpful to add a Prerequisites, Info or Limitations section where we can outline all the current limitations?
> ### Example
>
> ```bash
> ads aqua evaluation create --evaluation_source_id "ocid1.datasciencemodeldeployment.oc1.iad.<ocid>" --evaluation_name "test_evaluation" --dataset_path "oci://<bucket>@<namespace>/path/to/the/dataset.jsonl" --report_path "oci://<bucket>@<namespace>/report/path/" --model_parameters '{"model":"<model_name>","max_tokens": 500, "temperature": 0.7, "top_p": 1.0, "top_k": 50}' --shape_name "VM.Standard.E4.Flex" --block_storage_size 50 --metrics '[{"name": "bertscore", "args": {}}, {"name": "rouge", "args": {}}]'
> ```
Nit: For longer examples, could we split them across multiple lines for better readability?
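For example, the command quoted above could be reflowed with shell line continuations, which changes nothing about its behavior:

```bash
ads aqua evaluation create \
  --evaluation_source_id "ocid1.datasciencemodeldeployment.oc1.iad.<ocid>" \
  --evaluation_name "test_evaluation" \
  --dataset_path "oci://<bucket>@<namespace>/path/to/the/dataset.jsonl" \
  --report_path "oci://<bucket>@<namespace>/report/path/" \
  --model_parameters '{"model":"<model_name>","max_tokens": 500, "temperature": 0.7, "top_p": 1.0, "top_k": 50}' \
  --shape_name "VM.Standard.E4.Flex" \
  --block_storage_size 50 \
  --metrics '[{"name": "bertscore", "args": {}}, {"name": "rouge", "args": {}}]'
```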