
Enable mllm api #107

Merged · 15 commits merged into intel:main on Feb 28, 2024
Conversation

xuechendi (Contributor):

Purpose of the PR:
Enable MLLM model support in LLM-on-Ray: OpenAI-compatible API support and web UI support.

Models:
Any Hugging Face MLLM model; tested with the two shown below (google/deplot and adept/fuyu-8b).

Config update:
(screenshot)

UI example:
(screenshots)

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
@carsonwang (Contributor) left a comment:

Thank you @xuechendi for the work! Can we also add one of the models to the CI?

)
parser.add_argument(
    "--image_path",
    default="my_app/test_data/twitter_graph.png",
Reviewer (Contributor):

Can we use a URL for this image here so it works by default? Can you also update the help message to say it can be a local path or a URL?

xuechendi (PR author):

fixed

os.environ['WORKDIR'] = os.getcwd()

parser = argparse.ArgumentParser(
    description="Example script to query with http requests", add_help=True
Reviewer (Contributor):

Update the description to include "image".

xuechendi (PR author):

fixed
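A minimal sketch of what the two requested fixes might look like together; the URL and exact wording below are illustrative assumptions, not the values used in the PR:

import argparse

parser = argparse.ArgumentParser(
    description="Example script to query with text and image over http requests",
    add_help=True,
)
parser.add_argument(
    "--image_path",
    # a URL default so the example works out of the box; a local file path also works
    default="https://example.com/twitter_graph.png",
    help="Image to query with, either a local file path or a URL",
)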

"--max_new_tokens", default=128, help="The maximum numbers of tokens to generate"
)
parser.add_argument(
"--temperature", default=0.2, help="The value used to modulate the next token probabilities"
Reviewer (Contributor):

Any reason for these default values?

xuechendi (PR author):

I saw that NVIDIA uses these settings in their Fuyu-8b demo on the NGC AI Playground.


adapt_transformers_to_gaudi()
# get correct torch type for loading HF model
# torch_dtype = get_torch_dtype(infer_conf, hf_config)
Reviewer (Contributor):

Can we keep this torch_dtype and pass it to from_pretrained?

xuechendi (PR author):

fixed
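A rough sketch of the requested change, assuming the repo's get_torch_dtype helper and the Fuyu classes from transformers that appear in the surrounding diff:

from transformers import FuyuForCausalLM, FuyuProcessor

# resolve the torch dtype from the inference config rather than leaving it commented out,
# and pass it through so the checkpoint is loaded in the requested precision (e.g. bf16)
torch_dtype = get_torch_dtype(infer_conf, hf_config)
model = FuyuForCausalLM.from_pretrained(
    model_desc.model_id_or_path, torch_dtype=torch_dtype
)
processor = FuyuProcessor.from_pretrained(model_desc.model_id_or_path)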

# torch_dtype = get_torch_dtype(infer_conf, hf_config)

# model = FuyuForCausalLM.from_pretrained(model_desc.model_id_or_path)
# processor = FuyuProcessor.from_pretrained(model_desc.model_id_or_path)
Reviewer (Contributor):

Remove the above commented code?

xuechendi (PR author):

fixed

precision: bf16
model_description:
  model_id_or_path: /mnt/nvme0n1/chendi/llm-on-ray/models/google/deplot
  tokenizer_name_or_path: /mnt/nvme0n1/chendi/llm-on-ray/models/google/deplot
Reviewer (Contributor):

Update these to model ids on Hugging Face.

precision: bf16
model_description:
  model_id_or_path: /mnt/nvme0n1/chendi/llm-on-ray/models/adept/fuyu-8b
  tokenizer_name_or_path: /mnt/nvme0n1/chendi/llm-on-ray/models/adept/fuyu-8b
Reviewer (Contributor):

Update these to model ids on Hugging Face.

xuechendi (PR author):

fixed
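Presumably the updated entries just point at the hub ids, e.g. for the fuyu config:

precision: bf16
model_description:
  model_id_or_path: adept/fuyu-8b
  tokenizer_name_or_path: adept/fuyu-8b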

    prompts.append(prompt)
    images.extend(image)
else:
    prompt = self.process_tool.get_promipt(prompt)
Reviewer (Contributor):

get_promipt -> get_prompt

xuechendi (PR author):

fixed

generate_text = generate_result[0]
model_response = ModelResponse(
    generated_text=generate_text,
    num_input_tokens=self.predictor.input_length,
Reviewer (Contributor):

Why not return a GenerateResult from MllmPredictor.generate like the other predictors?

xuechendi (PR author):

The generate_result is a plain string when I use my test model Fuyu-8b, so it does not have an input_length property like the other predictors' results.

xuechendi (PR author):

Just realized that GenerateResult is instantiated in transformer_predictor; in that case, I'll merge it here.

xuechendi (PR author):

Fixed
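A rough sketch of the wrapping being discussed, assuming GenerateResult carries the generated text plus an input token count; the field names and helper below are illustrative, not the repo's actual signatures:

def generate(self, prompts, images, **config):
    # the MLLM pipeline returns a plain string for models like Fuyu-8b,
    # so wrap it in a GenerateResult to match what the other predictors return
    raw_text = self._generate_text(prompts, images, **config)  # hypothetical helper
    return GenerateResult(
        text=raw_text,
        input_length=self.input_length,
    )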

ui/start_ui.py Outdated
@@ -674,10 +758,14 @@ def shutdown_deploy(self):
        serve.shutdown()

    def get_ray_cluster(self):
-       command = "conda activate " + self.conda_env_name + "; ray status"
+       command = "source ~/anaconda3/bin/activate; conda activate " + self.conda_env_name + "; ray status"
Reviewer (Contributor):

We can't assume this path works in other users' environments.

xuechendi (PR author):

Sorry, I forgot to remove that one.

xuechendi (PR author):

fixed

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
@xuechendi (PR author):

@carsonwang, I fixed the issues per your request:

  1. Changed my example to use a URL.
  2. Updated the script description in my example.
  3. Removed the conda assumption and fixed the typo bug in prediction_deployment.py.
  4. Updated MLLM_predictor.py with IPEX support and torch_dtype, and added a GenerateResult wrapper class.

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
@carsonwang (Contributor) left a comment:

This looks mostly good. @KepingYan, please take another look, especially at the UI changes.

)
parser.add_argument(
    "--top_p",
-   default=None,
+   default=0.7,
Reviewer (Contributor):

Can we revert the default value changes in this file? If these default values work for the MLLM models, it makes sense to set them in image_query_http_requests.py as you have done. But this file is more general, so let's just use OpenAI's default values.
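(This presumably means restoring the original form, roughly:)

parser.add_argument(
    "--top_p",
    default=None,  # defer to OpenAI's / the server's default
)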

stream=True,
max_tokens=128,
temperature=0.2,
top_p=0.7,
Reviewer (Contributor):

Use args.xxx

stream=False,
max_tokens=128,
temperature=0.2,
top_p=0.7,
Reviewer (Contributor):

Use args.xxx
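A minimal sketch of what both "Use args.xxx" comments ask for, assuming the surrounding call is the OpenAI SDK's chat.completions.create and that the flag names match the argparse definitions quoted earlier:

chat_completion = client.chat.completions.create(
    model=args.model_name,  # flag name assumed for illustration
    messages=messages,
    stream=True,
    max_tokens=args.max_new_tokens,
    temperature=args.temperature,
    top_p=args.top_p,
)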

Comment on lines 13 to 14
model_id_or_path: /mnt/nvme0n1/chendi/llm-on-ray/models/meta-llama/Llama-2-7b-chat-hf
tokenizer_name_or_path: /mnt/nvme0n1/chendi/llm-on-ray/models/meta-llama/Llama-2-7b-chat-hf
Reviewer (Contributor):

Please restore these default values.

@@ -166,9 +177,13 @@ async def openai_call(self, prompt, config, streaming_response=True):
prompts.append(prompt)

if not streaming_response:
model_response = None
Reviewer (Contributor):

It seems this value is not used.

ui/start_ui.py Outdated
node_ip = self.ray_nodes[index]["NodeName"]
self.ssh_connect[index] = paramiko.SSHClient()
self.ssh_connect[index].load_system_host_keys()
self.ssh_connect[index].set_missing_host_key_policy(paramiko.AutoAddPolicy())
Reviewer (Contributor):

Please revert this.

ui/start_ui.py Outdated
Comment on lines 1148 to 1151
endpoint_value = "http://127.0.0.1:8000/v1/chat/completions"
model_endpoint = gr.Text(
    label="Model Endpoint", value=endpoint_value, scale=1
)
Reviewer (Contributor):

Is it better for the default value to be None? We can set a placeholder to let users know that it can be a pre-deployed endpoint or the endpoint returned from the deployment module.

ui/start_ui.py Outdated
label="Model Endpoint", value=endpoint_value, scale=1
)
model_name = gr.Text(
label="Model Name", value="llama-2-7b-chat-hf", scale=1
Reviewer (Contributor):

Same here, because the default model may not have been deployed by the user.
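A sketch of both suggestions, assuming gradio's placeholder argument; the placeholder wording is illustrative:

model_endpoint = gr.Text(
    label="Model Endpoint",
    value=None,
    placeholder="a pre-deployed endpoint, or the endpoint returned by the deployment tab",
    scale=1,
)
model_name = gr.Text(
    label="Model Name",
    value=None,
    placeholder="name of the deployed model, e.g. llama-2-7b-chat-hf",
    scale=1,
)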

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
@xuechendi (PR author):

@KepingYan @carsonwang, I completed all the code fixes per your review. Thanks!

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
top_p=args.top_p,
)
print(chat_completion)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not_needed")
@KepingYan (Contributor), Feb 23, 2024:

Can we remove the parameters here and have users set them via the environment variables OPENAI_BASE_URL and OPENAI_API_KEY, as described in the README?
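A minimal sketch of that pattern, relying on the OpenAI SDK's standard behavior of reading OPENAI_BASE_URL and OPENAI_API_KEY from the environment:

# users export these before running the example, e.g.
#   export OPENAI_BASE_URL=http://localhost:8000/v1
#   export OPENAI_API_KEY=not_needed
from openai import OpenAI

client = OpenAI()  # no base_url / api_key arguments needed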

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
@xuechendi (PR author):

@KepingYan, I updated the 4 examples to use environment variables for the base URL and api_key:

  1. langchain_sdk for image/text
  2. openai sdk for image/text

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
@carsonwang (Contributor) left a comment:

Thank you @xuechendi! Merging this.

@carsonwang merged commit 3067abb into intel:main on Feb 28, 2024 · 10 checks passed