
Conversation

@yunfeng-scale (Contributor) commented Jul 26, 2023

LLM_ENGINE_BASE_PATH=xxx SCALE_API_KEY=yyy python

from llmengine import Model
Model.create(
    name="llama-7b-test",
    model="llama-7b",
    inference_framework_image_tag="0.9.3",
    checkpoint_path="s3://xxx/yyy.tar",
    public_inference=False,
)

@yixu34 (Member) left a comment


Probably fine? Think we should get @jenkspt @sam-scale @adlam-scale to get 👀 on the API change

     model_name: str
     source: LLMSource = LLMSource.HUGGING_FACE
-    inference_framework: LLMInferenceFramework = LLMInferenceFramework.DEEPSPEED
+    inference_framework: LLMInferenceFramework = LLMInferenceFramework.TEXT_GENERATION_INFERENCE
Member

Is this a breaking change? Seems like we're changing default functionality. Might be fine.

Contributor Author

I don't think people could build DeepSpeed models right now.
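To illustrate the effect of this default change, here is a hedged sketch: a create call that omits inference_framework, like the one in the PR description, should now come up on text-generation-inference instead of DeepSpeed. The Model.get check at the end is an assumption about what the client returns, not something verified in this thread.

from llmengine import Model

# Same placeholder call as in the PR description; inference_framework is
# omitted, so with this change it falls back to TEXT_GENERATION_INFERENCE
# instead of DEEPSPEED.
Model.create(
    name="llama-7b-test",
    model="llama-7b",
    inference_framework_image_tag="0.9.3",
    checkpoint_path="s3://xxx/yyy.tar",
    public_inference=False,
)

# Assumption: Model.get returns the endpoint's details, including which
# inference framework it was created with.
print(Model.get("llama-7b-test"))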

     metadata: Dict[str, Any]  # TODO: JSON type
     post_inference_hooks: Optional[List[str]]
-    endpoint_type: ModelEndpointType = ModelEndpointType.SYNC
+    endpoint_type: ModelEndpointType = ModelEndpointType.STREAMING
Member

Does this line up with the default on the server?

Contributor Author

no, but this lines up with the default endpoint type for TGI
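A hedged sketch of opting back into a sync endpoint, for anyone who needs the old behavior; endpoint_type is one of Model.create's parameters (visible in the signature in the traceback further down), but the exact value accepted here is an assumption:

from llmengine import Model

# Assumption: the client accepts "sync" for endpoint_type; with this PR the
# request default is STREAMING, which matches TGI's streaming support.
Model.create(
    name="llama-7b-test",
    model="llama-7b",
    inference_framework_image_tag="0.9.3",
    checkpoint_path="s3://xxx/yyy.tar",
    endpoint_type="sync",
)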

@adlam-scale

FYI @juliashuieh ran into trouble with the tar strategy because (I strongly suspect) of out-of-disk-space issues. The tar file is 80G, so unzipped (the original tar file is not deleted) it takes up 160G. Then there are safetensors that are created which take up another 80G, which subsequently exceeds the storage space of the pod (which is 200G). Would be nice to untar + delete, or increase the disk space.

@yunfeng-scale (Contributor Author)

> FYI @juliashuieh ran into trouble with the tar strategy because (I strongly suspect) of out-of-disk-space issues. The tar file is 80G, so unzipped (the original tar file is not deleted) it takes up 160G. Then there are safetensors that are created which take up another 80G, which subsequently exceeds the storage space of the pod (which is 200G). Would be nice to untar + delete, or increase the disk space.

Storage size can be specified in the API.
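For illustration, a hedged sketch of such a call; the storage parameter appears in Model.create's signature (see the traceback further down), though the exact quantity string used here is an assumption:

from llmengine import Model

# Illustrative only: request a larger disk so an 80G tar can be unpacked
# and converted in place. The "300Gi" Kubernetes-style quantity string is
# an assumption about the accepted format.
Model.create(
    name="llama-7b-test",
    model="llama-7b",
    inference_framework_image_tag="0.9.3",
    checkpoint_path="s3://xxx/yyy.tar",
    storage="300Gi",
)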

@yunfeng-scale (Contributor Author)

> FYI @juliashuieh ran into trouble with the tar strategy because (I strongly suspect) of out-of-disk-space issues. The tar file is 80G, so unzipped (the original tar file is not deleted) it takes up 160G. Then there are safetensors that are created which take up another 80G, which subsequently exceeds the storage space of the pod (which is 200G). Would be nice to untar + delete, or increase the disk space.

Also, I believe we do delete after untar, but we'd still need 160G of space to untar. Default storage is 96GB.

@jenkspt commented Jul 27, 2023

I get this is out of scope for this PR, but I'm not sure where to put it. Can we remove the requirement to tar our checkpoints?

@jenkspt commented Jul 27, 2023

Seems I can't use the default min_workers=0

BadRequestError                           Traceback (most recent call last)
Cell In[4], line 1
----> 1 out = Model.create(name='test-falcon-7b-deploy', model='falcon-7b', checkpoint_path='s3://scale-ml/users/penn/deploy/test-falcon-7b-deploy.tar', inference_framework_image_tag="0.9.3")

File ~/llm-engine/clients/python/llmengine/api_engine.py:27, in assert_self_hosted.<locals>.inner(*args, **kwargs)
     25 if SPELLBOOK_API_URL == LLM_ENGINE_BASE_PATH:
     26     raise ValueError("This feature is only available for self-hosted users.")
---> 27 return func(*args, **kwargs)

File ~/llm-engine/clients/python/llmengine/model.py:195, in Model.create(cls, name, model, inference_framework_image_tag, source, inference_framework, num_shards, quantize, checkpoint_path, cpus, memory, storage, gpus, min_workers, max_workers, per_worker, endpoint_type, gpu_type, high_priority, post_inference_hooks, default_callback_url, public_inference, labels)
    168             post_inference_hooks_strs.append(hook)
    170 request = CreateLLMEndpointRequest(
    171     name=name,
    172     model_name=model,
   (...)
    193     public_inference=public_inference,
    194 )
--> 195 response = cls.post_sync(
    196     resource_name="v1/llm/model-endpoints",
    197     data=request.dict(),
    198     timeout=DEFAULT_TIMEOUT,
    199 )
    200 return CreateLLMEndpointResponse.parse_obj(response)

File ~/llm-engine/clients/python/llmengine/api_engine.py:96, in APIEngine.post_sync(cls, resource_name, data, timeout)
     88 response = requests.post(
     89     os.path.join(LLM_ENGINE_BASE_PATH, resource_name),
     90     json=data,
   (...)
     93     auth=(api_key, ""),
     94 )
     95 if response.status_code != 200:
---> 96     raise parse_error(response.status_code, response.content)
     97 payload = response.json()
     98 return payload

BadRequestError: Requested min workers 0 too low

Using min_workers=1 works fine.
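For reference, the working variant of the call from the traceback above, with only min_workers changed:

from llmengine import Model

# Same call as in the traceback, but with min_workers=1, which avoids the
# "Requested min workers 0 too low" BadRequestError.
out = Model.create(
    name="test-falcon-7b-deploy",
    model="falcon-7b",
    checkpoint_path="s3://scale-ml/users/penn/deploy/test-falcon-7b-deploy.tar",
    inference_framework_image_tag="0.9.3",
    min_workers=1,
)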

@yunfeng-scale merged commit e044a96 into main Jul 28, 2023
@yunfeng-scale deleted the yunfeng-checkpoint-path branch July 28, 2023 17:42