
Conversation

@ruizehung-scale ruizehung-scale (Contributor) commented Jul 19, 2023

Summary

  • Add the missing a100 and t4 devices to values_sample.yaml; their absence prevented llm-engine-cacher from starting successfully (a rough sketch of the kind of entries involved follows below).
  • Update the http-forwarder container's Python command path for the streaming endpoint.
  • Fix the TGI repository name and the forwarder config path.
  • Fix the forwarder AWS config mount path.
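
For illustration only, the device entries the cacher was missing might look roughly like the following; the exact key names, node-selector labels, and tolerations are assumptions and are not copied from this PR:

# Hypothetical sketch of the added a100 / t4 device entries; key names and
# label values are illustrative, not taken verbatim from values_sample.yaml.
imageCache:
  devices:
    - name: a100
      nodeSelector:
        k8s.amazonaws.com/accelerator: nvidia-ampere-a100
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"
    - name: t4
      nodeSelector:
        k8s.amazonaws.com/accelerator: nvidia-tesla-t4
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"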

Test Plan

import json

import requests

# Create a streaming llama-7b endpoint backed by text-generation-inference
data = {
    "name": "llama-7b",
    "model_name": "llama-7b",
    "source": "hugging_face",
    "inference_framework": "text_generation_inference",
    "inference_framework_image_tag": "0.9.1",
    "num_shards": 4,
    "endpoint_type": "streaming",
    "cpus": 32,
    "gpus": 4,
    "memory": "40Gi",
    "storage": "40Gi",
    "gpu_type": "nvidia-ampere-a10",
    "min_workers": 1,
    "max_workers": 12,
    "per_worker": 1,
    "labels": {"team": "infra", "product": "llm_model_zoo"},
    "metadata": {}
}
headers = {'Content-Type': 'application/json'}
response = requests.post("http://localhost:5000/v1/llm/model-endpoints", headers=headers, data=json.dumps(data), auth=('test_user_id', ''))
print(response.status_code)
print(response.json())
------------------------
200
{'endpoint_creation_task_id': 'e9a7a99b-6fe6-4e69-97e9-4b007ab2e7bb'}
# Send a synchronous completion request to the new endpoint
data = {
    "prompts": ["hi"],
    "max_new_tokens": 10,
    "temperature": 0.1
}

headers = {'Content-Type': 'application/json'}
response = requests.post("http://localhost:5000/v1/llm/completions-sync?model_endpoint_name=llama-7b", headers=headers, data=json.dumps(data), auth=('test-user-id', ''))
print(response.status_code)
print(response.json())
------------------------
200
{'status': 'SUCCESS', 'outputs': [{'text': 'hi hi hi2 hi2 hi2 hi2', 'num_completion_tokens': 10}], 'traceback': None}
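
Since the endpoint was created with endpoint_type "streaming" and this PR fixes the forwarder's streaming command path, a streaming call could in principle be exercised too. The sketch below is only an assumption: the completions-stream route, payload shape, and response framing are not verified against this version of the server.

# Hypothetical streaming request; the /v1/llm/completions-stream route and the
# payload fields are assumptions, not confirmed by this PR.
data = {
    "prompts": ["hi"],
    "max_new_tokens": 10,
    "temperature": 0.1
}
headers = {'Content-Type': 'application/json'}
with requests.post(
    "http://localhost:5000/v1/llm/completions-stream?model_endpoint_name=llama-7b",
    headers=headers,
    data=json.dumps(data),
    auth=('test_user_id', ''),
    stream=True,
) as response:
    print(response.status_code)
    # Print server-sent chunks line by line as they arrive
    for line in response.iter_lines():
        if line:
            print(line.decode())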

@ruizehung-scale ruizehung-scale marked this pull request as ready for review July 19, 2023 04:01
@ruizehung-scale ruizehung-scale marked this pull request as draft July 19, 2023 04:22
@ruizehung-scale ruizehung-scale changed the title from "Add missing a100 and t4 devices in values_sample.yaml" to "Can create LLM endpoint" Jul 19, 2023
  flavor=StreamingEnhancedRunnableImageFlavor(
      flavor=ModelBundleFlavorType.STREAMING_ENHANCED_RUNNABLE_IMAGE,
-     repository="text-generation-inference",  # TODO: let user choose repo
+     repository="ghcr.io/huggingface/text-generation-inference",  # TODO: let user choose repo
Contributor Author

@yunfeng-scale It turns out I need to update the TGI repo name in order to skip the image-existence check against the ECR repo, given the logic here:

and self.docker_repository.is_repo_name(request.flavor.repository)

Is this change reasonable? Should we propagate it back to hmi as well?
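
For context, a minimal sketch of the kind of guard being referenced is below; only is_repo_name appears in the quoted server code, and everything else (function names, the repo-name heuristic) is an assumption for illustration:

# Hypothetical sketch of the image-existence guard; not the actual server code.
def is_repo_name(repository: str) -> bool:
    # Assume bare ECR repo names contain no registry host or path separators.
    return "/" not in repository and ":" not in repository

def should_check_image_in_ecr(repository: str) -> bool:
    # A bare name like "text-generation-inference" is treated as an internal
    # ECR repository, so the requested image tag is checked for existence.
    # A fully qualified repository like
    # "ghcr.io/huggingface/text-generation-inference" is not a bare repo name,
    # so the existence check is skipped and the image is pulled as-is.
    return is_repo_name(repository)

print(should_check_image_in_ecr("text-generation-inference"))  # True
print(should_check_image_in_ecr("ghcr.io/huggingface/text-generation-inference"))  # False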

Contributor

We shouldn't hardcode this since it diverges between the internal and OSS code. Can you add this as a parameter?

Contributor

Yes, this change makes sense.

@ruizehung-scale ruizehung-scale marked this pull request as ready for review July 19, 2023 22:36
  volumeMounts:
    - name: config-volume
-     mountPath: /root/.aws/config
+     mountPath: /home/user/.aws/config
Contributor

Note: we may need to parameterize this entirely.
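
Purely as a hypothetical sketch of that parameterization (the value name below is invented for illustration and does not come from this PR):

# values.yaml (hypothetical value name)
forwarder:
  awsConfigMountPath: /home/user/.aws/config

# deployment template (hypothetical)
volumeMounts:
  - name: config-volume
    mountPath: {{ .Values.forwarder.awsConfigMountPath }}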

  - run-service
  - --config
- - /workspace/llm_engine/llm_engine/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
+ - /workspace/server/llm_engine_server/inference/configs/${FORWARDER_CONFIG_FILE_NAME}
Contributor

Note that this may need to be parameterized as well.
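
The same approach could apply here, again purely as a hypothetical illustration (the value name is invented, not from this PR):

# values.yaml (hypothetical value name)
forwarder:
  configDir: /workspace/server/llm_engine_server/inference/configs

# container args in the template (hypothetical)
- --config
- {{ .Values.forwarder.configDir }}/${FORWARDER_CONFIG_FILE_NAME}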

@song-william song-william (Contributor) left a comment

Discussed offline that we will parameterize these values in a later PR. Merging as is.

@song-william song-william merged commit 767cbc4 into main Jul 19, 2023
@song-william song-william deleted the fix-llm-engine-image-cache-startup-failure branch July 19, 2023 22:46
@song-william song-william changed the title from "Can create LLM endpoint" to "Can create LLM endpoints" Jul 19, 2023