
Enable mllm api #107

Merged · 15 commits merged into intel:main on Feb 28, 2024
Conversation

xuechendi (Contributor):

Purpose of the PR:
Enable MLLM model support in LLM-on-Ray: OpenAI-compatible API support and web UI support.

Models:
Any Hugging Face MLLM model; tested with the two shown below (google/deplot and adept/fuyu-8b).

Config update:
(screenshot)

UI example:
(screenshots)

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
@carsonwang (Contributor) left a comment:

Thank you @xuechendi for the work! Can we also add one of the models to the CI?

)
parser.add_argument(
    "--image_path",
    default="my_app/test_data/twitter_graph.png",
Reviewer (Contributor):

Can we use a URL for this image here so it works by default? Can you also update the help message to say it can be a local path or a URL?

xuechendi (PR author):

fixed

os.environ['WORKDIR'] = os.getcwd()

parser = argparse.ArgumentParser(
    description="Example script to query with http requests", add_help=True
Reviewer (Contributor):

Update the description to include "image".

xuechendi (PR author):

fixed
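A minimal sketch of what the two requested fixes might look like together; the URL and exact wording below are illustrative assumptions, not the values used in the PR:

import argparse

parser = argparse.ArgumentParser(
    description="Example script to query with text and image over http requests",
    add_help=True,
)
parser.add_argument(
    "--image_path",
    # a URL default so the example works out of the box; a local file path also works
    default="https://example.com/twitter_graph.png",
    help="Image to query with, either a local file path or a URL",
)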

"--max_new_tokens", default=128, help="The maximum numbers of tokens to generate"
)
parser.add_argument(
"--temperature", default=0.2, help="The value used to modulate the next token probabilities"
Reviewer (Contributor):

Any reason for these default values?

xuechendi (PR author):

I saw that NVIDIA uses these settings in their Fuyu-8b demo on the NGC AI Playground.


adapt_transformers_to_gaudi()
# get correct torch type for loading HF model
# torch_dtype = get_torch_dtype(infer_conf, hf_config)
Reviewer (Contributor):

Can we keep this torch_dtype and pass it to from_pretrained?

xuechendi (PR author):

fixed
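A rough sketch of the requested change, assuming the repo's get_torch_dtype helper and the Fuyu classes from transformers that appear in the surrounding diff:

from transformers import FuyuForCausalLM, FuyuProcessor

# resolve the torch dtype from the inference config rather than leaving it commented out,
# and pass it through so the checkpoint is loaded in the requested precision (e.g. bf16)
torch_dtype = get_torch_dtype(infer_conf, hf_config)
model = FuyuForCausalLM.from_pretrained(
    model_desc.model_id_or_path, torch_dtype=torch_dtype
)
processor = FuyuProcessor.from_pretrained(model_desc.model_id_or_path)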

# torch_dtype = get_torch_dtype(infer_conf, hf_config)

# model = FuyuForCausalLM.from_pretrained(model_desc.model_id_or_path)
# processor = FuyuProcessor.from_pretrained(model_desc.model_id_or_path)
Reviewer (Contributor):

Remove the above commented code?

xuechendi (PR author):

fixed

precision: bf16
model_description:
  model_id_or_path: /mnt/nvme0n1/chendi/llm-on-ray/models/google/deplot
  tokenizer_name_or_path: /mnt/nvme0n1/chendi/llm-on-ray/models/google/deplot
Reviewer (Contributor):

Update these to model ids on Hugging Face.

precision: bf16
model_description:
  model_id_or_path: /mnt/nvme0n1/chendi/llm-on-ray/models/adept/fuyu-8b
  tokenizer_name_or_path: /mnt/nvme0n1/chendi/llm-on-ray/models/adept/fuyu-8b
Reviewer (Contributor):

Update these to model ids on Hugging Face.

xuechendi (PR author):

fixed
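Presumably the updated entries just point at the hub ids, e.g. for the fuyu config:

precision: bf16
model_description:
  model_id_or_path: adept/fuyu-8b
  tokenizer_name_or_path: adept/fuyu-8b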

    prompts.append(prompt)
    images.extend(image)
else:
    prompt = self.process_tool.get_promipt(prompt)
Reviewer (Contributor):

get_promipt -> get_prompt

xuechendi (PR author):

fixed

generate_text = generate_result[0]
model_response = ModelResponse(
    generated_text=generate_text,
    num_input_tokens=self.predictor.input_length,
Reviewer (Contributor):

Why not return a GenerateResult from MllmPredictor.generate like the other predictors?

xuechendi (PR author):

The generate_result is a plain string when I use my test model Fuyu-8b, so it does not have an input_length property like the other predictors' results.

xuechendi (PR author):

Just realized that GenerateResult is instantiated in transformer_predictor; in that case, I'll merge it here.

xuechendi (PR author):

Fixed
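A rough sketch of the wrapping being discussed, assuming GenerateResult carries the generated text plus an input token count; the field names and helper below are illustrative, not the repo's actual signatures:

def generate(self, prompts, images, **config):
    # the MLLM pipeline returns a plain string for models like Fuyu-8b,
    # so wrap it in a GenerateResult to match what the other predictors return
    raw_text = self._generate_text(prompts, images, **config)  # hypothetical helper
    return GenerateResult(
        text=raw_text,
        input_length=self.input_length,
    )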

ui/start_ui.py Outdated
@@ -674,10 +758,14 @@ def shutdown_deploy(self):
        serve.shutdown()

    def get_ray_cluster(self):
-       command = "conda activate " + self.conda_env_name + "; ray status"
+       command = "source ~/anaconda3/bin/activate; conda activate " + self.conda_env_name + "; ray status"
Reviewer (Contributor):

We can't assume this path works in other users' environments.

xuechendi (PR author):

Sorry, I forgot to remove that one.

xuechendi (PR author):

fixed

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
@xuechendi (PR author):

@carsonwang, I fixed the issues per your request:

  1. Changed my example to use a URL.
  2. Updated the script description in my example.
  3. Removed the conda assumption and fixed the typo bug in prediction_deployment.py.
  4. Updated MLLM_predictor.py with IPEX support and torch_dtype, and added a GenerateResult wrapper class.

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
@carsonwang (Contributor) left a comment:

This looks mostly good. @KepingYan, please take another look, especially at the UI changes.

)
parser.add_argument(
    "--top_p",
-   default=None,
+   default=0.7,
Reviewer (Contributor):

Can we revert the default value changes in this file? If these default values work for the MLLM models, it makes sense to set them in image_query_http_requests.py as you have done. But this file is more general, so let's just use OpenAI's default values.
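(This presumably means restoring the original form, roughly:)

parser.add_argument(
    "--top_p",
    default=None,  # defer to OpenAI's / the server's default
)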

stream=True,
max_tokens=128,
temperature=0.2,
top_p=0.7,
Reviewer (Contributor):

Use args.xxx

stream=False,
max_tokens=128,
temperature=0.2,
top_p=0.7,
Reviewer (Contributor):

Use args.xxx
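A minimal sketch of what both "Use args.xxx" comments ask for, assuming the surrounding call is the OpenAI SDK's chat.completions.create and that the flag names match the argparse definitions quoted earlier:

chat_completion = client.chat.completions.create(
    model=args.model_name,  # flag name assumed for illustration
    messages=messages,
    stream=True,
    max_tokens=args.max_new_tokens,
    temperature=args.temperature,
    top_p=args.top_p,
)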

Comment on lines 13 to 14
model_id_or_path: /mnt/nvme0n1/chendi/llm-on-ray/models/meta-llama/Llama-2-7b-chat-hf
tokenizer_name_or_path: /mnt/nvme0n1/chendi/llm-on-ray/models/meta-llama/Llama-2-7b-chat-hf
Reviewer (Contributor):

Please restore these default values.

@@ -166,9 +177,13 @@ async def openai_call(self, prompt, config, streaming_response=True):
prompts.append(prompt)

if not streaming_response:
model_response = None
Reviewer (Contributor):

It seems this value is not used.

ui/start_ui.py Outdated
node_ip = self.ray_nodes[index]["NodeName"]
self.ssh_connect[index] = paramiko.SSHClient()
self.ssh_connect[index].load_system_host_keys()
self.ssh_connect[index].set_missing_host_key_policy(paramiko.AutoAddPolicy())
Reviewer (Contributor):

Please revert this.

ui/start_ui.py Outdated
Comment on lines 1148 to 1151
endpoint_value = "http://127.0.0.1:8000/v1/chat/completions"
model_endpoint = gr.Text(
    label="Model Endpoint", value=endpoint_value, scale=1
)
Reviewer (Contributor):

Is it better for the default value to be None? We can set a placeholder to let users know that it can be a pre-deployed endpoint or the endpoint returned from the deployment module.

ui/start_ui.py Outdated
label="Model Endpoint", value=endpoint_value, scale=1
)
model_name = gr.Text(
label="Model Name", value="llama-2-7b-chat-hf", scale=1
Reviewer (Contributor):

Same here, because the default model may not have been deployed by the user.
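A sketch of both suggestions, assuming gradio's placeholder argument; the placeholder wording is illustrative:

model_endpoint = gr.Text(
    label="Model Endpoint",
    value=None,
    placeholder="a pre-deployed endpoint, or the endpoint returned by the deployment tab",
    scale=1,
)
model_name = gr.Text(
    label="Model Name",
    value=None,
    placeholder="name of the deployed model, e.g. llama-2-7b-chat-hf",
    scale=1,
)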

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
@xuechendi (PR author):

@KepingYan @carsonwang, I completed all the code fixes per your review. Thanks!

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
top_p=args.top_p,
)
print(chat_completion)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not_needed")
@KepingYan (Contributor), Feb 23, 2024:

Can we remove the parameters here and have users set them via the environment variables OPENAI_BASE_URL and OPENAI_API_KEY, as described in the README?
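A minimal sketch of that pattern, relying on the OpenAI SDK's standard behavior of reading OPENAI_BASE_URL and OPENAI_API_KEY from the environment:

# users export these before running the example, e.g.
#   export OPENAI_BASE_URL=http://localhost:8000/v1
#   export OPENAI_API_KEY=not_needed
from openai import OpenAI

client = OpenAI()  # no base_url / api_key arguments needed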

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
@xuechendi (PR author):

@KepingYan, I updated the 4 examples to use environment variables for the base URL and api_key:

  1. langchain_sdk for image/text
  2. openai sdk for image/text

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
@carsonwang (Contributor) left a comment:

Thank you @xuechendi! Merging this.

@carsonwang merged commit 3067abb into intel:main on Feb 28, 2024 · 10 checks passed