
Commit

Merge remote-tracking branch 'origin/win-perf-opt' into win-perf-opt
sgwhat committed Apr 11, 2024
2 parents 964238a + 44a182c commit 8a220d5
Showing 11 changed files with 36 additions and 33 deletions.
40 changes: 20 additions & 20 deletions README.md
@@ -1,28 +1,28 @@
***The WebUI is ported from [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui) for running local LLM on Intel GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) using [BigDL-LLM](https://github.com/intel-analytics/bigdl).***
***The WebUI is ported from [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui) for running local LLM on Intel GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) using [IPEX-LLM](https://github.com/intel-analytics/ipex-llm).***

## Quick Start
To get started, please see the step-by-step [quickstart](https://bigdl.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html).
To get started, please see the step-by-step [quickstart](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html).

[<img src="https://llm-assets.readthedocs.io/en/latest/_images/webui_quickstart_chat.png" height="480px">](https://bigdl.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html)
[<img src="https://llm-assets.readthedocs.io/en/latest/_images/webui_quickstart_chat.png" height="480px">](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html)

## User Guide
For more information, see the user guide below.

### 1. Download and Unzip WebUI

Before starting all the steps, you need to download and unzip the text-generation-webui based on `BigDL-LLM` optimizations.
Before starting all the steps, you need to download and unzip the text-generation-webui based on `IPEX-LLM` optimizations.

```bash
https://github.com/intel-analytics/text-generation-webui/archive/refs/heads/bigdl-llm.zip
https://github.com/intel-analytics/text-generation-webui/archive/refs/heads/ipex-llm.zip
```
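If you prefer to script this step, here is a minimal Python sketch that downloads and extracts the archive (the local archive filename and extraction directory are assumptions):

```python
# Minimal sketch: download and unzip the WebUI branch archive (local names assumed).
import urllib.request
import zipfile

url = "https://github.com/intel-analytics/text-generation-webui/archive/refs/heads/ipex-llm.zip"
archive = "text-generation-webui-ipex-llm.zip"

urllib.request.urlretrieve(url, archive)  # download the branch archive
with zipfile.ZipFile(archive) as zf:
    zf.extractall(".")  # GitHub archives extract to a <repo>-<branch>/ directory
```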

### 2. Prepare the Environment on Windows

Please use a Python environment management tool (we recommend Conda) to create a Python environment and install the necessary libraries.

#### 2.1 Install BigDL-LLM
#### 2.1 Install IPEX-LLM

Please see [BigDL-LLm Installation on Windows](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#windows) for more details to install BigDL-LLM on your Client.
Please see [IPEX-LLM Installation on Windows](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#windows) for detailed instructions on installing IPEX-LLM on your client machine.
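Once installed, a quick import serves as a sanity check that the environment is set up correctly (a minimal sketch; the version attribute name is an assumption):

```python
# Sanity check: confirm IPEX-LLM is importable in the new environment.
import ipex_llm

print(getattr(ipex_llm, "__version__", "installed"))  # attribute name assumed
```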

#### 2.2 Install Other Required Dependencies

@@ -67,28 +67,28 @@ This share link expires in 72 hours. For free permanent hosting and GPU upgrades
##### 4.1.1 Download the Model
If you need to download a model, enter the Hugging Face model ID (username/model name), for instance: `Qwen/Qwen-7B-Chat`.

![Image text](https://github.com/intel-analytics/text-generation-webui/blob/8ebee0651dd56012c4a9e0ba6932efec4c7d1b2e/readme_folder/image.png)
![Image text](./readme_folder/image.png)

##### 4.1.2 Place the Model
After you have downloaded the model (or if you already have the model locally), please place the model in the `Text-Generation-WebUI/models` directory.
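Alternatively, you may fetch a model programmatically and place it straight into that directory; below is an illustrative sketch using `huggingface_hub` (the local path is an assumption based on the unzip location):

```python
# Illustrative sketch: download a model directly into the WebUI models directory.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Qwen/Qwen-7B-Chat",
    local_dir="Text-Generation-WebUI/models/Qwen-7B-Chat",  # assumed unzip location
)
```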

After completing the two steps above, you may click the `Model` button to select your model.

![Image text](https://github.com/intel-analytics/text-generation-webui/blob/8ebee0651dd56012c4a9e0ba6932efec4c7d1b2e/readme_folder/image1.png)
![Image text](./readme_folder/image1.png)


#### 4.2 Enable BigDL-LLM Optimizations
Text-Generation-WebUI supports multiple backends, including `BigDL-LLM`, `Transformers`, `llama.cpp`, etc (the default backend is `BigDL-LLM`). You may select the BigDL-LLM backend as below to enable low-bit optimizations.
#### 4.2 Enable IPEX-LLM Optimizations
Text-Generation-WebUI supports multiple backends, including `IPEX-LLM`, `Transformers`, `llama.cpp`, etc. (the default backend is `IPEX-LLM`). You may select the IPEX-LLM backend as shown below to enable low-bit optimizations.


Then select the device type that matches your hardware (the default device is `GPU`).

![Image text](https://github.com/intel-analytics/text-generation-webui/blob/8ebee0651dd56012c4a9e0ba6932efec4c7d1b2e/readme_folder/image2.png)
![Image text](./readme_folder/image2.png)
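If you are unsure whether the GPU option will work on your machine, you can check that PyTorch can see the Intel GPU (XPU) device; this is a hedged sketch that assumes `intel_extension_for_pytorch` is installed alongside IPEX-LLM:

```python
# Hedged check: is an Intel GPU (XPU) visible to PyTorch?
import torch
import intel_extension_for_pytorch  # noqa: F401  # registers the 'xpu' device

print(torch.xpu.is_available())  # True means the GPU device option should work
```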


#### 4.3 Load Model in Low Precision

One common use case of BigDL-LLM is to load a Hugging Face transformers model in low precision.
One common use case of IPEX-LLM is to load a Hugging Face transformers model in low precision.

Notes:

@@ -99,14 +99,14 @@ Notes:
- Please select the `optimize-model` and `use_cache` options to accelerate the model.


Now you may click the `Load` button to load the model with BigDL-LLM optimizations.
Now you may click the `Load` button to load the model with IPEX-LLM optimizations. If everything goes well, you will get a message as shown below.

![Image text](https://github.com/intel-analytics/text-generation-webui/blob/8ebee0651dd56012c4a9e0ba6932efec4c7d1b2e/readme_folder/image3.png)
![Image text](./readme_folder/image3.png)
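For reference, clicking `Load` is roughly equivalent to the following sketch, built from the options above and the `ipex_llm.transformers` API (the model path is an assumption, and this is not the WebUI's exact code path):

```python
# Rough sketch of low-precision loading with IPEX-LLM (not the WebUI's exact code).
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "models/Qwen-7B-Chat"  # assumed local path
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_4bit=True,      # symmetric int4 for regular fp16/bf16/fp32 models
    optimize_model=True,    # further IPEX-LLM optimizations on the low-bit model
    use_cache=True,         # reuse past key/values to speed up decoding
    trust_remote_code=True,
)
model = model.to("xpu")     # move to the Intel GPU when the device is `GPU`
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```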


#### 4.4 Run the Model on WebUI

After completing the steps of model preparation, enabling BigDL-LLM optimizations, and loading model, you may need to sepecify parameters in the `Parameters tab` according to the needs of your task.
After completing the steps of model preparation, enabling IPEX-LLM optimizations, and loading the model, you may need to specify parameters in the `Parameters` tab according to the needs of your task.

Notes:
* `max_new_tokens`: Maximum number of tokens to generate.
@@ -115,7 +115,7 @@ Notes:

* Please see [Parameters-Tab Wiki](https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab) for more details.
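These generation parameters are ultimately forwarded to the model's `generate` call; continuing the sketch from section 4.3 with illustrative values (not the WebUI's exact code path):

```python
# Illustrative continuation of the 4.3 sketch: generate with WebUI-style parameters.
import torch

prompt = "What is AI?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("xpu")

with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=128,  # maximum number of tokens to generate
        do_sample=True,      # enable sampling so temperature/top_p take effect
        temperature=0.7,
        top_p=0.9,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```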

Now you may do model inference on Text-Generation-WebUI with BigDL-LLM optimizations, including `Chat`, `Default` and `Notebook` Tabs.
Now you may do model inference on Text-Generation-WebUI with IPEX-LLM optimizations, including `Chat`, `Default` and `Notebook` Tabs.

##### 4.4.1 Chat Tab

@@ -128,7 +128,7 @@ Notes:

* Please see [Chat-Tab Wiki](https://github.com/oobabooga/text-generation-webui/wiki/01-%E2%80%90-Chat-Tab) for more details.

![Image text](https://github.com/intel-analytics/text-generation-webui/blob/8ebee0651dd56012c4a9e0ba6932efec4c7d1b2e/readme_folder/image4.png)
![Image text](./readme_folder/image4.png)

##### 4.4.2 Default Tab

@@ -138,7 +138,7 @@ This tab contains two main text boxes: Input, where you enter your prompt, and Output, where the model output appears.

Please see [Default-Tab Wiki](https://github.com/oobabooga/text-generation-webui/wiki/02-%E2%80%90-Default-and-Notebook-Tabs#default-tab) for more details.

![Image text](https://github.com/intel-analytics/text-generation-webui/blob/8ebee0651dd56012c4a9e0ba6932efec4c7d1b2e/readme_folder/image5.png)
![Image text](./readme_folder/image5.png)


##### 4.4.3 Notebook Tab
@@ -147,4 +147,4 @@ You may use the `Notebook tab` to do exactly what the `Default tab` does, with the difference that the output appears in the same text box as the input.

Please see [Notebook-Tab Wiki](https://github.com/oobabooga/text-generation-webui/wiki/02-%E2%80%90-Default-and-Notebook-Tabs#notebook-tab) for more details.

![Image text](https://github.com/intel-analytics/text-generation-webui/blob/8ebee0651dd56012c4a9e0ba6932efec4c7d1b2e/readme_folder/image6.png)
![Image text](./readme_folder/image6.png)
4 changes: 2 additions & 2 deletions modules/loaders.py
@@ -110,7 +110,7 @@
'no_use_fast',
'autogptq_info',
],
'BigDL-LLM': [
'IPEX-LLM': [
'load_in_4bit',
'load_in_low_bit',
'optimize_model',
@@ -210,7 +210,7 @@ def transformers_samplers():
'AutoAWQ': transformers_samplers(),
'QuIP#': transformers_samplers(),
'HQQ': transformers_samplers(),
'BigDL-LLM': transformers_samplers(),
'IPEX-LLM': transformers_samplers(),
'ExLlamav2': {
'temperature',
'top_p',
6 changes: 3 additions & 3 deletions modules/models.py
@@ -60,7 +60,7 @@ def load_model(model_name, loader=None):
shared.is_seq2seq = False
shared.model_name = model_name
load_func_map = {
'BigDL-LLM': bigdl_llm_loader,
'IPEX-LLM': ipex_llm_loader,
'Transformers': huggingface_loader,
'AutoGPTQ': AutoGPTQ_loader,
'GPTQ-for-LLaMa': GPTQ_loader,
@@ -321,9 +321,9 @@ def AutoAWQ_loader(model_name):

return model

def bigdl_llm_loader(model_name):
def ipex_llm_loader(model_name):

from bigdl.llm.transformers import AutoModelForCausalLM, AutoModel, AutoModelForSeq2SeqLM
from ipex_llm.transformers import AutoModelForCausalLM, AutoModel, AutoModelForSeq2SeqLM

path_to_model = Path(f'{shared.args.model_dir}/{model_name}')

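The rest of the loader is truncated in the diff view. As a hedged sketch, a loader of this shape typically selects a model class and forwards the low-bit options from `modules/shared.py`; the function below is an assumption-based illustration, not the repository's exact code:

```python
# Hedged illustration of how such a loader might continue (not the repo's exact code).
from pathlib import Path

def ipex_llm_loader_sketch(model_name, model_dir="models", seq2seq=False):
    from ipex_llm.transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM

    path_to_model = Path(model_dir) / model_name
    model_class = AutoModelForSeq2SeqLM if seq2seq else AutoModelForCausalLM
    model = model_class.from_pretrained(
        path_to_model,
        load_in_4bit=True,    # or load_in_low_bit='sym_int4', 'nf4', ...
        optimize_model=True,
        use_cache=True,
        trust_remote_code=True,
    )
    return model.to("xpu")    # assumes --device GPU; omit for CPU
```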
2 changes: 1 addition & 1 deletion modules/models_settings.py
@@ -162,7 +162,7 @@ def infer_loader(model_name, model_settings):
elif re.match(r'.*-hqq', model_name.lower()):
return 'HQQ'
else:
loader = 'BigDL-LLM'
loader = 'IPEX-LLM'

return loader

14 changes: 8 additions & 6 deletions modules/shared.py
@@ -1,4 +1,5 @@
import argparse
import copy
import os
import sys
from collections import OrderedDict
@@ -65,6 +66,7 @@
'default_extensions': ['gallery'],
}

default_settings = copy.deepcopy(settings)

# Parser copied from https://github.com/vladmandic/automatic
parser = argparse.ArgumentParser(description="Text generation web UI", conflict_handler='resolve', add_help=True, formatter_class=lambda prog: argparse.HelpFormatter(prog, max_help_position=55, indent_increment=2, width=200))
@@ -155,8 +157,8 @@
group.add_argument('--checkpoint', type=str, help='The path to the quantized checkpoint file. If not specified, it will be automatically detected.')
group.add_argument('--monkey-patch', action='store_true', help='Apply the monkey patch for using LoRAs with quantized models.')

# BigDL-LLM
group = parser.add_argument_group('BigDL-LLM')
# IPEX-LLM
group = parser.add_argument_group('IPEX-LLM')
group.add_argument('--device', type=str, default='GPU', help='the device type, it could be CPU or GPU')
group.add_argument('--load-in-4bit', action='store_true', default=False, help='boolean value, True means loading linear’s weight to symmetric int 4 if '\
'the model is a regular fp16/bf16/fp32 model, and to asymmetric int 4 if the model is a GPTQ model. Defaults to False.')
@@ -165,8 +167,8 @@
'nf4 means 4-bit NormalFloat, etc. Relevant low bit optimizations will be applied to the model.')
group.add_argument('--optimize-model', action='store_true', default=True, help='boolean value, Whether to further optimize the low_bit llm model.')
#group.add_argument('--modules-to-not-convert', type=str, default=None, help='list of str value, modules (nn.Module) that are skipped when conducting model optimizations.')
group.add_argument('--cpu-embedding', action='store_true', default=True, help='Whether to replace the Embedding layer, may need to set it to `True` when running BigDL-LLM on GPU on Windows. Default to be `False`')
#group.add_argument('--lightweight-bmm', action='store_true', help='Whether to replace the torch.bmm ops, may need to set it to `True` when running BigDL-LLM on GPU on Windows.')
group.add_argument('--cpu-embedding', action='store_true', default=True, help='Whether to replace the Embedding layer; may need to be `True` when running IPEX-LLM on GPU on Windows. Defaults to `True`.')
#group.add_argument('--lightweight-bmm', action='store_true', help='Whether to replace the torch.bmm ops, may need to set it to `True` when running IPEX-LLM on GPU on Windows.')
group.add_argument('--use-cache', action='store_true', default=True, help='If use_cache is True, past key values are used to speed up decoding if applicable to model.')
group.add_argument('--trust-remote-code', action='store_true', default=True, help='Set trust_remote_code=True while loading the model. Necessary for some models.')

@@ -274,8 +276,8 @@ def fix_loader_name(name):
return 'QuIP#'
elif name in ['hqq']:
return 'HQQ'
elif name in ['BigDL-LLM', 'bigdl-llm', 'bigdl']:
return 'BigDL-LLM'
elif name in ['IPEX-LLM', 'ipex-llm']:
return 'IPEX-LLM'


def add_extension(name, last=False):
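For illustration, the alias handling added above behaves like this self-contained re-statement (a sketch, not an import of the real module):

```python
# Self-contained re-statement of the loader-name normalization (illustrative).
def normalize_loader_name(name: str) -> str:
    if name in ('IPEX-LLM', 'ipex-llm'):
        return 'IPEX-LLM'
    return name

print(normalize_loader_name('ipex-llm'))  # -> IPEX-LLM
```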
1 change: 1 addition & 0 deletions modules/ui_model_menu.py
@@ -239,6 +239,7 @@ def load_model_wrapper(selected_model, loader, autoload=False):
if 'instruction_template' in settings:
output += '\n\nIt seems to be an instruction-following model with template "{}". In the chat tab, instruct or chat-instruct modes should be used.'.format(settings['instruction_template'])

output += '\n\n Starting warmup ...'
yield output
else:
yield f"Failed to load `{selected_model}`."
Binary file modified readme_folder/image.png
Binary file modified readme_folder/image1.png
Binary file modified readme_folder/image2.png
Binary file modified readme_folder/image3.png
2 changes: 1 addition & 1 deletion server.py
@@ -88,7 +88,7 @@ def create_interface():

# Force some events to be triggered on page load
shared.persistent_interface_state.update({
'loader': shared.args.loader or 'BigDL-LLM',
'loader': shared.args.loader or 'IPEX-LLM',
'mode': shared.settings['mode'],
'character_menu': shared.args.character or shared.settings['character'],
'instruction_template_str': shared.settings['instruction_template_str'],
