3 changes: 3 additions & 0 deletions clients/python/llmengine/data_types.py
@@ -201,6 +201,9 @@ class GetLLMEndpointResponse(BaseModel):
source: LLMSource = Field(description="The source of the model, e.g. Hugging Face.")
"""The source of the model, e.g. Hugging Face."""

status: ModelEndpointStatus = Field(description="The status of the model.")
"""The status of the model (can be one of "READY", "UPDATE_PENDING", "UPDATE_IN_PROGRESS", "UPDATE_FAILED", "DELETE_IN_PROGRESS")."""

inference_framework: LLMInferenceFramework = Field(
description="The inference framework used by the model."
)
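For reference, downstream code might branch on the documented status strings. A minimal sketch, assuming the values listed in the docstring above (the helper names are hypothetical, not part of the client):

```
# Hypothetical helpers around the documented ModelEndpointStatus values.
TRANSIENT_STATUSES = {"UPDATE_PENDING", "UPDATE_IN_PROGRESS"}

def is_usable(status: str) -> bool:
    # Only a READY endpoint can serve inference requests.
    return status == "READY"

def may_become_ready(status: str) -> bool:
    # Pending/in-progress updates can still converge to READY;
    # UPDATE_FAILED and DELETE_IN_PROGRESS cannot.
    return status in TRANSIENT_STATUSES
```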
1 change: 1 addition & 0 deletions clients/python/llmengine/model.py
@@ -242,6 +242,7 @@ def get(
"name": "llama-2-7b.suffix.2023-07-18-12-00-00",
"model_name": null,
"source": "hugging_face",
"status": "READY",
"inference_framework": "text_generation_inference",
"inference_framework_tag": null,
"num_shards": null,
1 change: 1 addition & 0 deletions docs/api/data_types.md
@@ -35,6 +35,7 @@
- inference_framework
- id
- model_name
- status
- inference_framework_tag
- num_shards
- quantize
15 changes: 15 additions & 0 deletions docs/guides/endpoint_creation.md
@@ -0,0 +1,15 @@
When creating a model endpoint, you can periodically poll the endpoint's `status` field to
track its progress. In general, you'll need to wait after the model creation step for the
model endpoint to be ready and available for use. An example is provided below:

*Assuming the user has created a model named "llama-2-7b.suffix.2023-07-18-12-00-00"*
```
import time

from llmengine import Model

model_name = "llama-2-7b.suffix.2023-07-18-12-00-00"
response = Model.get(model_name)
while response.status != "READY":
    time.sleep(60)
    response = Model.get(model_name)
```

Once the endpoint status is "READY", you can use your newly created model for inference.
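Note that the loop above will poll forever if the endpoint lands in a failure state. A slightly more defensive sketch, assuming the same `Model.get` API and the status values documented on `GetLLMEndpointResponse` (the 30-minute timeout is an arbitrary illustrative choice):

```
import time

from llmengine import Model

model_name = "llama-2-7b.suffix.2023-07-18-12-00-00"
deadline = time.monotonic() + 30 * 60  # illustrative 30-minute budget

while True:
    response = Model.get(model_name)
    if response.status == "READY":
        break
    if response.status == "UPDATE_FAILED":
        raise RuntimeError(f"Endpoint {model_name} failed to become ready")
    if time.monotonic() > deadline:
        raise TimeoutError(f"Endpoint {model_name} not READY within 30 minutes")
    time.sleep(60)  # poll once per minute
```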