
Conversation

@yunfeng-scale (Contributor) commented Jul 26, 2023

LLM_ENGINE_BASE_PATH=xxx SCALE_API_KEY=yyy python

from llmengine import Model
Model.create(
    name="llama-7b-test",
    model="llama-7b",
    inference_framework_image_tag="0.9.3",
    checkpoint_path="s3://xxx/yyy.tar",
    public_inference=False,
)

@yixu34 (Member) left a comment


Probably fine? Think we should get @jenkspt @sam-scale @adlam-scale to get 👀 on the API change

     model_name: str
     source: LLMSource = LLMSource.HUGGING_FACE
-    inference_framework: LLMInferenceFramework = LLMInferenceFramework.DEEPSPEED
+    inference_framework: LLMInferenceFramework = LLMInferenceFramework.TEXT_GENERATION_INFERENCE
Member

Is this a breaking change? Seems like we're changing default functionality. Might be fine.

Contributor Author

I don't think people could build DeepSpeed models right now.
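To illustrate the effect of this default change, here is a hedged sketch: a create call that omits inference_framework, like the one in the PR description, should now come up on text-generation-inference instead of DeepSpeed. The Model.get check at the end is an assumption about what the client returns, not something verified in this thread.

from llmengine import Model

# Same placeholder call as in the PR description; inference_framework is
# omitted, so with this change it falls back to TEXT_GENERATION_INFERENCE
# instead of DEEPSPEED.
Model.create(
    name="llama-7b-test",
    model="llama-7b",
    inference_framework_image_tag="0.9.3",
    checkpoint_path="s3://xxx/yyy.tar",
    public_inference=False,
)

# Assumption: Model.get returns the endpoint's details, including which
# inference framework it was created with.
print(Model.get("llama-7b-test"))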

     metadata: Dict[str, Any]  # TODO: JSON type
     post_inference_hooks: Optional[List[str]]
-    endpoint_type: ModelEndpointType = ModelEndpointType.SYNC
+    endpoint_type: ModelEndpointType = ModelEndpointType.STREAMING
Member

Does this line up with the default on the server?

Contributor Author

no, but this lines up with the default endpoint type for TGI
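A hedged sketch of opting back into a sync endpoint, for anyone who needs the old behavior; endpoint_type is one of Model.create's parameters (visible in the signature in the traceback further down), but the exact value accepted here is an assumption:

from llmengine import Model

# Assumption: the client accepts "sync" for endpoint_type; with this PR the
# request default is STREAMING, which matches TGI's streaming support.
Model.create(
    name="llama-7b-test",
    model="llama-7b",
    inference_framework_image_tag="0.9.3",
    checkpoint_path="s3://xxx/yyy.tar",
    endpoint_type="sync",
)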

@adlam-scale

FYI @juliashuieh ran into trouble with the tar strategy because (I strongly suspect) of out-of-disk-space issues. The tar file is 80G, so unzipped (the original tar file is not deleted) it takes up 160G. Then there are safetensors that are created which take up another 80G, which subsequently exceeds the storage space of the pod (which is 200G). Would be nice to untar + delete, or increase the disk space.

@yunfeng-scale (Contributor Author)

> FYI @juliashuieh ran into trouble with the tar strategy because (I strongly suspect) of out-of-disk-space issues. The tar file is 80G, so unzipped (the original tar file is not deleted) it takes up 160G. Then there are safetensors that are created which take up another 80G, which subsequently exceeds the storage space of the pod (which is 200G). Would be nice to untar + delete, or increase the disk space.

Storage size can be specified in the API.
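For illustration, a hedged sketch of such a call; the storage parameter appears in Model.create's signature (see the traceback further down), though the exact quantity string used here is an assumption:

from llmengine import Model

# Illustrative only: request a larger disk so an 80G tar can be unpacked
# and converted in place. The "300Gi" Kubernetes-style quantity string is
# an assumption about the accepted format.
Model.create(
    name="llama-7b-test",
    model="llama-7b",
    inference_framework_image_tag="0.9.3",
    checkpoint_path="s3://xxx/yyy.tar",
    storage="300Gi",
)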

@yunfeng-scale (Contributor Author)

> FYI @juliashuieh ran into trouble with the tar strategy because (I strongly suspect) of out-of-disk-space issues. The tar file is 80G, so unzipped (the original tar file is not deleted) it takes up 160G. Then there are safetensors that are created which take up another 80G, which subsequently exceeds the storage space of the pod (which is 200G). Would be nice to untar + delete, or increase the disk space.

Also, I believe we do delete after untar, but we'd still need 160G of space to untar. Default storage is 96GB.

@jenkspt commented Jul 27, 2023

I get this is out of scope for this PR, but I'm not sure where to put it. Can we remove the requirement to tar our checkpoints?

@jenkspt commented Jul 27, 2023

Seems I can't use the default min_workers=0

BadRequestError                           Traceback (most recent call last)
Cell In[4], line 1
----> 1 out = Model.create(name='test-falcon-7b-deploy', model='falcon-7b', checkpoint_path='s3://scale-ml/users/penn/deploy/test-falcon-7b-deploy.tar', inference_framework_image_tag="0.9.3")

File ~/llm-engine/clients/python/llmengine/api_engine.py:27, in assert_self_hosted.<locals>.inner(*args, **kwargs)
     25 if SPELLBOOK_API_URL == LLM_ENGINE_BASE_PATH:
     26     raise ValueError("This feature is only available for self-hosted users.")
---> 27 return func(*args, **kwargs)

File ~/llm-engine/clients/python/llmengine/model.py:195, in Model.create(cls, name, model, inference_framework_image_tag, source, inference_framework, num_shards, quantize, checkpoint_path, cpus, memory, storage, gpus, min_workers, max_workers, per_worker, endpoint_type, gpu_type, high_priority, post_inference_hooks, default_callback_url, public_inference, labels)
    168             post_inference_hooks_strs.append(hook)
    170 request = CreateLLMEndpointRequest(
    171     name=name,
    172     model_name=model,
   (...)
    193     public_inference=public_inference,
    194 )
--> 195 response = cls.post_sync(
    196     resource_name="v1/llm/model-endpoints",
    197     data=request.dict(),
    198     timeout=DEFAULT_TIMEOUT,
    199 )
    200 return CreateLLMEndpointResponse.parse_obj(response)

File ~/llm-engine/clients/python/llmengine/api_engine.py:96, in APIEngine.post_sync(cls, resource_name, data, timeout)
     88 response = requests.post(
     89     os.path.join(LLM_ENGINE_BASE_PATH, resource_name),
     90     json=data,
   (...)
     93     auth=(api_key, ""),
     94 )
     95 if response.status_code != 200:
---> 96     raise parse_error(response.status_code, response.content)
     97 payload = response.json()
     98 return payload

BadRequestError: Requested min workers 0 too low

Using min_workers=1 works fine.
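For reference, the working variant of the call from the traceback above, with only min_workers changed:

from llmengine import Model

# Same call as in the traceback, but with min_workers=1, which avoids the
# "Requested min workers 0 too low" BadRequestError.
out = Model.create(
    name="test-falcon-7b-deploy",
    model="falcon-7b",
    checkpoint_path="s3://scale-ml/users/penn/deploy/test-falcon-7b-deploy.tar",
    inference_framework_image_tag="0.9.3",
    min_workers=1,
)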

@yunfeng-scale merged commit e044a96 into main Jul 28, 2023
@yunfeng-scale deleted the yunfeng-checkpoint-path branch July 28, 2023 17:42