Enable mllm api #107
Conversation
Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
Thank you @xuechendi for the work! Can we also add one of the models in the CI?
```python
)
parser.add_argument(
    "--image_path",
    default="my_app/test_data/twitter_graph.png",
```
Can we use a URL to this image here so it can work by default? Can you also update the help message if it can be a local path or a URL.
fixed
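For reference, a minimal sketch of what the fixed argument could look like (the URL here is a hypothetical placeholder, not the one actually used in the PR):

```python
parser.add_argument(
    "--image_path",
    default="https://example.com/twitter_graph.png",  # hypothetical default URL
    help="Image to query; can be a local file path or a URL",
)
```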
```python
os.environ['WORKDIR'] = os.getcwd()

parser = argparse.ArgumentParser(
    description="Example script to query with http requests", add_help=True
```
Update the description to include "image".
fixed
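A sketch of the updated description, assuming the script keeps its existing argparse setup (the exact wording is illustrative):

```python
parser = argparse.ArgumentParser(
    description="Example script to query with image and http requests",
    add_help=True,
)
```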
"--max_new_tokens", default=128, help="The maximum numbers of tokens to generate" | ||
) | ||
parser.add_argument( | ||
"--temperature", default=0.2, help="The value used to modulate the next token probabilities" |
Any reason for these default values?
I saw NVIDIA uses these settings in their Fuyu-8b demo on the NGC AI Playground.
inference/mllm_predictor.py
Outdated
```python
adapt_transformers_to_gaudi()
# get correct torch type for loading HF model
# torch_dtype = get_torch_dtype(infer_conf, hf_config)
```
Can we keep this `torch_dtype` and pass it to `from_pretrained`?
fixed
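A minimal sketch of keeping the dtype, assuming the project's `get_torch_dtype` helper (referenced in the commented-out line above) returns a torch dtype that `from_pretrained` accepts:

```python
# Resolve the torch dtype from the inference config instead of discarding it,
# then pass it through when loading the model (names assumed from the
# commented-out lines in this file).
torch_dtype = get_torch_dtype(infer_conf, hf_config)
model = FuyuForCausalLM.from_pretrained(
    model_desc.model_id_or_path, torch_dtype=torch_dtype
)
```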
inference/mllm_predictor.py
Outdated
```python
# torch_dtype = get_torch_dtype(infer_conf, hf_config)

# model = FuyuForCausalLM.from_pretrained(model_desc.model_id_or_path)
# processor = FuyuProcessor.from_pretrained(model_desc.model_id_or_path)
```
Remove the above commented code?
fixed
inference/models/deplot.yaml
Outdated
```yaml
precision: bf16
model_description:
  model_id_or_path: /mnt/nvme0n1/chendi/llm-on-ray/models/google/deplot
  tokenizer_name_or_path: /mnt/nvme0n1/chendi/llm-on-ray/models/google/deplot
```
Update these to ids on huggingface
inference/models/fuyu8b.yaml
Outdated
```yaml
precision: bf16
model_description:
  model_id_or_path: /mnt/nvme0n1/chendi/llm-on-ray/models/adept/fuyu-8b
  tokenizer_name_or_path: /mnt/nvme0n1/chendi/llm-on-ray/models/adept/fuyu-8b
```
Update these to ids on huggingface
fixed
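For reference, a sketch of the fixed YAML using the public Hugging Face ids (deplot.yaml would use google/deplot the same way):

```yaml
precision: bf16
model_description:
  model_id_or_path: adept/fuyu-8b
  tokenizer_name_or_path: adept/fuyu-8b
```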
inference/predictor_deployment.py
Outdated
```python
    prompts.append(prompt)
    images.extend(image)
else:
    prompt = self.process_tool.get_promipt(prompt)
```
get_promipt -> get_prompt
fixed
inference/predictor_deployment.py
Outdated
```python
generate_text = generate_result[0]
model_response = ModelResponse(
    generated_text=generate_text,
    num_input_tokens=self.predictor.input_length,
```
Why not return `GenerateResult` in `MllmPredictor.generate` like the other predictors?
The `generate_result` is a plain string when I use my test model `Fuyu-8b`, so it does not have `input_length` properties like the other predictors.
Just realized that `GenerateResult` is instantiated in transformer_predictor; in that case, I'll merge it here.
Fixed
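A rough sketch of the merged approach, assuming `MllmPredictor.generate` wraps the decoded string the way transformer_predictor does (the field names below are assumptions for illustration, not the project's actual `GenerateResult` signature):

```python
# Sketch: wrap the raw decoded string in GenerateResult so the deployment
# layer can treat all predictors uniformly (field names are assumed).
input_length = inputs["input_ids"].shape[-1]
return GenerateResult(
    text=decoded_text,
    input_length=input_length,
    generate_length=output_ids.shape[-1] - input_length,
)
```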
ui/start_ui.py
Outdated
```diff
@@ -674,10 +758,14 @@ def shutdown_deploy(self):
         serve.shutdown()

     def get_ray_cluster(self):
-        command = "conda activate " + self.conda_env_name + "; ray status"
+        command = "source ~/anaconda3/bin/activate; conda activate " + self.conda_env_name + "; ray status"
```
We can't assume this path works in other users' environment.
Sorry, forgot to remove that one.
fixed
Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
@carsonwang, I fixed the issues per your request.
Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
This looks mostly good. @KepingYan please take another look, especially at the UI changes.
```diff
 )
 parser.add_argument(
     "--top_p",
-    default=None,
+    default=0.7,
```
Can we revert the default value changes in this file? If these default values work for the mllm models, it makes sense to set them in image_query_http_requests.py as you have done. But this file is more general, so let's just use OpenAI's default values?
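A sketch of reverting to the OpenAI-style defaults in this general-purpose script (`None` lets the server apply its own defaults; the help texts are illustrative):

```python
parser.add_argument(
    "--temperature", default=None, help="Sampling temperature (server default if unset)"
)
parser.add_argument(
    "--top_p", default=None, help="Nucleus sampling probability (server default if unset)"
)
```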
```python
    stream=True,
    max_tokens=128,
    temperature=0.2,
    top_p=0.7,
```
Use args.xxx
```python
    stream=False,
    max_tokens=128,
    temperature=0.2,
    top_p=0.7,
```
Use args.xxx
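For both the streaming and non-streaming calls, a sketch of threading the parsed arguments through instead of hardcoding values (the max_new_tokens, temperature, and top_p flags appear earlier in the script; the model and streaming flag names here are assumptions):

```python
chat_completion = client.chat.completions.create(
    model=args.model_name,           # assumed flag name
    messages=messages,
    stream=args.streaming_response,  # assumed flag name
    max_tokens=args.max_new_tokens,
    temperature=args.temperature,
    top_p=args.top_p,
)
```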
```yaml
model_id_or_path: /mnt/nvme0n1/chendi/llm-on-ray/models/meta-llama/Llama-2-7b-chat-hf
tokenizer_name_or_path: /mnt/nvme0n1/chendi/llm-on-ray/models/meta-llama/Llama-2-7b-chat-hf
```
Please restore these default values.
inference/predictor_deployment.py
Outdated
```diff
@@ -166,9 +177,13 @@ async def openai_call(self, prompt, config, streaming_response=True):
         prompts.append(prompt)

+        if not streaming_response:
+            model_response = None
```
It seems this value is not used.
ui/start_ui.py
Outdated
```python
node_ip = self.ray_nodes[index]["NodeName"]
self.ssh_connect[index] = paramiko.SSHClient()
self.ssh_connect[index].load_system_host_keys()
self.ssh_connect[index].set_missing_host_key_policy(paramiko.AutoAddPolicy())
```
Please revert this.
ui/start_ui.py
Outdated
```python
endpoint_value = "http://127.0.0.1:8000/v1/chat/completions"
model_endpoint = gr.Text(
    label="Model Endpoint", value=endpoint_value, scale=1
)
```
Is it better for the default value to be None? We can set a placeholder to let users know that it can be a pre-deployed endpoint or the endpoint returned from the deployment module.
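A sketch of the suggested change, using gradio's `placeholder` argument (the placeholder text is illustrative):

```python
model_endpoint = gr.Text(
    label="Model Endpoint",
    value=None,
    placeholder="A pre-deployed endpoint, or the one returned from the deployment module",
    scale=1,
)
```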
ui/start_ui.py
Outdated
label="Model Endpoint", value=endpoint_value, scale=1 | ||
) | ||
model_name = gr.Text( | ||
label="Model Name", value="llama-2-7b-chat-hf", scale=1 |
Same here, because the default model may not have been deployed by users.
Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
@KepingYan @carsonwang, I completed all the code fixes per your review. Thanks!
Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
```python
    top_p=args.top_p,
)
print(chat_completion)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not_needed")
```
Can we remove the parameters here and have users set them via the environment variables OPENAI_BASE_URL and OPENAI_API_KEY, as described in the README?
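A sketch of the env-driven client; the openai v1 SDK reads OPENAI_BASE_URL and OPENAI_API_KEY from the environment when no arguments are passed:

```python
from openai import OpenAI

# With OPENAI_BASE_URL and OPENAI_API_KEY exported by the user (as described
# in the README), no explicit parameters are needed here.
client = OpenAI()
```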
Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
@KepingYan, I updated the 4 examples to use ENV variables to set the base URL and api_key.
Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
Thank you @xuechendi ! Merging this.
Purpose of the PR:
Enable MLLM model support in LLM-on-Ray: OpenAI-compatible API support and WebUI support.
Models:
Any Hugging Face MLLM model; tested with the two below (adept/fuyu-8b and google/deplot).
Config update:
[screenshot: updated MLLM model config]
UI example:
[screenshots: WebUI image query example]