
Commit

Merge remote-tracking branch 'origin/win-perf-opt' into win-perf-opt
sgwhat committed Apr 11, 2024
2 parents 964238a + 44a182c commit 8a220d5
Showing 11 changed files with 36 additions and 33 deletions.
40 changes: 20 additions & 20 deletions README.md
@@ -1,28 +1,28 @@
***The WebUI is ported from [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui) for running local LLM on Intel GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) using [BigDL-LLM](https://github.com/intel-analytics/bigdl).***
***The WebUI is ported from [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui) for running local LLM on Intel GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) using [IPEX-LLM](https://github.com/intel-analytics/ipex-llm).***

## Quick Start
To get started, please see the step-by-step [quickstart](https://bigdl.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html).
To get started, please see the step-by-step [quickstart](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html).

[<img src="https://llm-assets.readthedocs.io/en/latest/_images/webui_quickstart_chat.png" height="480px">](https://bigdl.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html)
[<img src="https://llm-assets.readthedocs.io/en/latest/_images/webui_quickstart_chat.png" height="480px">](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html)

## User Guide
For more information, see the user guide below.

### 1. Download and Unzip WebUI

Before starting all the steps, you need to download and unzip the text-generation-webui based on `BigDL-LLM` optimizations.
Before starting all the steps, you need to download and unzip the text-generation-webui based on `IPEX-LLM` optimizations.

```bash
https://github.com/intel-analytics/text-generation-webui/archive/refs/heads/bigdl-llm.zip
https://github.com/intel-analytics/text-generation-webui/archive/refs/heads/ipex-llm.zip
```
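If you prefer to script this step, here is a minimal Python sketch that downloads and extracts the archive (the local archive filename and extraction directory are assumptions):

```python
# Minimal sketch: download and unzip the WebUI branch archive (local names assumed).
import urllib.request
import zipfile

url = "https://github.com/intel-analytics/text-generation-webui/archive/refs/heads/ipex-llm.zip"
archive = "text-generation-webui-ipex-llm.zip"

urllib.request.urlretrieve(url, archive)  # download the branch archive
with zipfile.ZipFile(archive) as zf:
    zf.extractall(".")  # GitHub archives extract to a <repo>-<branch>/ directory
```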

### 2. Prepare the Environment on Windows

Please use a Python environment management tool (we recommend Conda) to create a Python environment and install the necessary libraries.

#### 2.1 Install BigDL-LLM
#### 2.1 Install IPEX-LLM

Please see [BigDL-LLm Installation on Windows](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#windows) for more details to install BigDL-LLM on your Client.
Please see [IPEX-LLM Installation on Windows](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#windows) for detailed instructions on installing IPEX-LLM on your client machine.
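Once installed, a quick import serves as a sanity check that the environment is set up correctly (a minimal sketch; the version attribute name is an assumption):

```python
# Sanity check: confirm IPEX-LLM is importable in the new environment.
import ipex_llm

print(getattr(ipex_llm, "__version__", "installed"))  # attribute name assumed
```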

#### 2.2 Install Other Required Dependencies

@@ -67,28 +67,28 @@ This share link expires in 72 hours. For free permanent hosting and GPU upgrades
##### 4.1.1 Download the Model
If you need to download a model, enter the Hugging Face model ID (username/model name), for instance: `Qwen/Qwen-7B-Chat`.

![Image text](https://github.com/intel-analytics/text-generation-webui/blob/8ebee0651dd56012c4a9e0ba6932efec4c7d1b2e/readme_folder/image.png)
![Image text](./readme_folder/image.png)

##### 4.1.2 Place the Model
After you have downloaded the model (or if you already have the model locally), please place the model in the `Text-Generation-WebUI/models` directory.
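Alternatively, you may fetch a model programmatically and place it straight into that directory; below is an illustrative sketch using `huggingface_hub` (the local path is an assumption based on the unzip location):

```python
# Illustrative sketch: download a model directly into the WebUI models directory.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Qwen/Qwen-7B-Chat",
    local_dir="Text-Generation-WebUI/models/Qwen-7B-Chat",  # assumed unzip location
)
```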

After completing the two steps above, you may click the `Model` button to select your model.

![Image text](https://github.com/intel-analytics/text-generation-webui/blob/8ebee0651dd56012c4a9e0ba6932efec4c7d1b2e/readme_folder/image1.png)
![Image text](./readme_folder/image1.png)


#### 4.2 Enable BigDL-LLM Optimizations
Text-Generation-WebUI supports multiple backends, including `BigDL-LLM`, `Transformers`, `llama.cpp`, etc (the default backend is `BigDL-LLM`). You may select the BigDL-LLM backend as below to enable low-bit optimizations.
#### 4.2 Enable IPEX-LLM Optimizations
Text-Generation-WebUI supports multiple backends, including `IPEX-LLM`, `Transformers`, `llama.cpp`, etc. (the default backend is `IPEX-LLM`). You may select the IPEX-LLM backend as shown below to enable low-bit optimizations.


Then select the device type that matches your hardware (the default device is `GPU`).

![Image text](https://github.com/intel-analytics/text-generation-webui/blob/8ebee0651dd56012c4a9e0ba6932efec4c7d1b2e/readme_folder/image2.png)
![Image text](./readme_folder/image2.png)
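If you are unsure whether the GPU option will work on your machine, you can check that PyTorch can see the Intel GPU (XPU) device; this is a hedged sketch that assumes `intel_extension_for_pytorch` is installed alongside IPEX-LLM:

```python
# Hedged check: is an Intel GPU (XPU) visible to PyTorch?
import torch
import intel_extension_for_pytorch  # noqa: F401  # registers the 'xpu' device

print(torch.xpu.is_available())  # True means the GPU device option should work
```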


#### 4.3 Load Model in Low Precision

One common use case of BigDL-LLM is to load a Hugging Face transformers model in low precision.
One common use case of IPEX-LLM is to load a Hugging Face transformers model in low precision.

Notes:

@@ -99,14 +99,14 @@ Notes:
- Please select the `optimize-model` and `use_cache` options to accelerate the model.


Now you may click the `Load` button to load the model with BigDL-LLM optimizations.
Now you may click the `Load` button to load the model with IPEX-LLM optimizations. If everything goes well, you will get a message as shown below.

![Image text](https://github.com/intel-analytics/text-generation-webui/blob/8ebee0651dd56012c4a9e0ba6932efec4c7d1b2e/readme_folder/image3.png)
![Image text](./readme_folder/image3.png)
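For reference, clicking `Load` is roughly equivalent to the following sketch, built from the options above and the `ipex_llm.transformers` API (the model path is an assumption, and this is not the WebUI's exact code path):

```python
# Rough sketch of low-precision loading with IPEX-LLM (not the WebUI's exact code).
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "models/Qwen-7B-Chat"  # assumed local path
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_4bit=True,      # symmetric int4 for regular fp16/bf16/fp32 models
    optimize_model=True,    # further IPEX-LLM optimizations on the low-bit model
    use_cache=True,         # reuse past key/values to speed up decoding
    trust_remote_code=True,
)
model = model.to("xpu")     # move to the Intel GPU when the device is `GPU`
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```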


#### 4.4 Run the Model on WebUI

After completing the steps of model preparation, enabling BigDL-LLM optimizations, and loading model, you may need to sepecify parameters in the `Parameters tab` according to the needs of your task.
After completing the steps of model preparation, enabling IPEX-LLM optimizations, and loading the model, you may need to specify parameters in the `Parameters` tab according to the needs of your task.

Notes:
* `max_new_tokens`: Maximum number of tokens to generate.
@@ -115,7 +115,7 @@ Notes:

* Please see [Parameters-Tab Wiki](https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab) for more details.
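These generation parameters are ultimately forwarded to the model's `generate` call; continuing the sketch from section 4.3 with illustrative values (not the WebUI's exact code path):

```python
# Illustrative continuation of the 4.3 sketch: generate with WebUI-style parameters.
import torch

prompt = "What is AI?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("xpu")

with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=128,  # maximum number of tokens to generate
        do_sample=True,      # enable sampling so temperature/top_p take effect
        temperature=0.7,
        top_p=0.9,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```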

Now you may do model inference on Text-Generation-WebUI with BigDL-LLM optimizations, including `Chat`, `Default` and `Notebook` Tabs.
Now you may do model inference on Text-Generation-WebUI with IPEX-LLM optimizations, including `Chat`, `Default` and `Notebook` Tabs.

##### 4.4.1 Chat Tab

@@ -128,7 +128,7 @@ Notes:

* Please see [Chat-Tab Wiki](https://github.com/oobabooga/text-generation-webui/wiki/01-%E2%80%90-Chat-Tab) for more details.

![Image text](https://github.com/intel-analytics/text-generation-webui/blob/8ebee0651dd56012c4a9e0ba6932efec4c7d1b2e/readme_folder/image4.png)
![Image text](./readme_folder/image4.png)

##### 4.4.2 Default Tab

@@ -138,7 +138,7 @@ This tab contains two main text boxes: Input, where you enter your prompt, and Output, where the model output appears.

Please see [Default-Tab Wiki](https://github.com/oobabooga/text-generation-webui/wiki/02-%E2%80%90-Default-and-Notebook-Tabs#default-tab) for more details.

![Image text](https://github.com/intel-analytics/text-generation-webui/blob/8ebee0651dd56012c4a9e0ba6932efec4c7d1b2e/readme_folder/image5.png)
![Image text](./readme_folder/image5.png)


##### 4.4.3 Notebook Tab
@@ -147,4 +147,4 @@ You may use the `Notebook tab` to do exactly what the `Default tab` does, with the difference that the output appears in the same text box as the input.

Please see [Notebook-Tab Wiki](https://github.com/oobabooga/text-generation-webui/wiki/02-%E2%80%90-Default-and-Notebook-Tabs#notebook-tab) for more details.

![Image text](https://github.com/intel-analytics/text-generation-webui/blob/8ebee0651dd56012c4a9e0ba6932efec4c7d1b2e/readme_folder/image6.png)
![Image text](./readme_folder/image6.png)
4 changes: 2 additions & 2 deletions modules/loaders.py
@@ -110,7 +110,7 @@
'no_use_fast',
'autogptq_info',
],
'BigDL-LLM': [
'IPEX-LLM': [
'load_in_4bit',
'load_in_low_bit',
'optimize_model',
@@ -210,7 +210,7 @@ def transformers_samplers():
'AutoAWQ': transformers_samplers(),
'QuIP#': transformers_samplers(),
'HQQ': transformers_samplers(),
'BigDL-LLM': transformers_samplers(),
'IPEX-LLM': transformers_samplers(),
'ExLlamav2': {
'temperature',
'top_p',
6 changes: 3 additions & 3 deletions modules/models.py
@@ -60,7 +60,7 @@ def load_model(model_name, loader=None):
shared.is_seq2seq = False
shared.model_name = model_name
load_func_map = {
'BigDL-LLM': bigdl_llm_loader,
'IPEX-LLM': ipex_llm_loader,
'Transformers': huggingface_loader,
'AutoGPTQ': AutoGPTQ_loader,
'GPTQ-for-LLaMa': GPTQ_loader,
@@ -321,9 +321,9 @@ def AutoAWQ_loader(model_name):

return model

def bigdl_llm_loader(model_name):
def ipex_llm_loader(model_name):

from bigdl.llm.transformers import AutoModelForCausalLM, AutoModel, AutoModelForSeq2SeqLM
from ipex_llm.transformers import AutoModelForCausalLM, AutoModel, AutoModelForSeq2SeqLM

path_to_model = Path(f'{shared.args.model_dir}/{model_name}')

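The rest of the loader is truncated in the diff view. As a hedged sketch, a loader of this shape typically selects a model class and forwards the low-bit options from `modules/shared.py`; the function below is an assumption-based illustration, not the repository's exact code:

```python
# Hedged illustration of how such a loader might continue (not the repo's exact code).
from pathlib import Path

def ipex_llm_loader_sketch(model_name, model_dir="models", seq2seq=False):
    from ipex_llm.transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM

    path_to_model = Path(model_dir) / model_name
    model_class = AutoModelForSeq2SeqLM if seq2seq else AutoModelForCausalLM
    model = model_class.from_pretrained(
        path_to_model,
        load_in_4bit=True,    # or load_in_low_bit='sym_int4', 'nf4', ...
        optimize_model=True,
        use_cache=True,
        trust_remote_code=True,
    )
    return model.to("xpu")    # assumes --device GPU; omit for CPU
```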
2 changes: 1 addition & 1 deletion modules/models_settings.py
@@ -162,7 +162,7 @@ def infer_loader(model_name, model_settings):
elif re.match(r'.*-hqq', model_name.lower()):
return 'HQQ'
else:
loader = 'BigDL-LLM'
loader = 'IPEX-LLM'

return loader

14 changes: 8 additions & 6 deletions modules/shared.py
@@ -1,4 +1,5 @@
import argparse
import copy
import os
import sys
from collections import OrderedDict
@@ -65,6 +66,7 @@
'default_extensions': ['gallery'],
}

default_settings = copy.deepcopy(settings)

# Parser copied from https://github.com/vladmandic/automatic
parser = argparse.ArgumentParser(description="Text generation web UI", conflict_handler='resolve', add_help=True, formatter_class=lambda prog: argparse.HelpFormatter(prog, max_help_position=55, indent_increment=2, width=200))
@@ -155,8 +157,8 @@
group.add_argument('--checkpoint', type=str, help='The path to the quantized checkpoint file. If not specified, it will be automatically detected.')
group.add_argument('--monkey-patch', action='store_true', help='Apply the monkey patch for using LoRAs with quantized models.')

# BigDL-LLM
group = parser.add_argument_group('BigDL-LLM')
# IPEX-LLM
group = parser.add_argument_group('IPEX-LLM')
group.add_argument('--device', type=str, default='GPU', help='the device type, it could be CPU or GPU')
group.add_argument('--load-in-4bit', action='store_true', default=False, help='boolean value, True means loading linear’s weight to symmetric int 4 if '\
'the model is a regular fp16/bf16/fp32 model, and to asymmetric int 4 if the model is a GPTQ model. Defaults to False.')
@@ -165,8 +167,8 @@
'nf4 means 4-bit NormalFloat, etc. Relevant low bit optimizations will be applied to the model.')
group.add_argument('--optimize-model', action='store_true', default=True, help='boolean value, Whether to further optimize the low_bit llm model.')
#group.add_argument('--modules-to-not-convert', type=str, default=None, help='list of str value, modules (nn.Module) that are skipped when conducting model optimizations.')
group.add_argument('--cpu-embedding', action='store_true', default=True, help='Whether to replace the Embedding layer, may need to set it to `True` when running BigDL-LLM on GPU on Windows. Default to be `False`')
#group.add_argument('--lightweight-bmm', action='store_true', help='Whether to replace the torch.bmm ops, may need to set it to `True` when running BigDL-LLM on GPU on Windows.')
group.add_argument('--cpu-embedding', action='store_true', default=True, help='Whether to replace the Embedding layer; may need to be `True` when running IPEX-LLM on GPU on Windows. Defaults to `True`.')
#group.add_argument('--lightweight-bmm', action='store_true', help='Whether to replace the torch.bmm ops, may need to set it to `True` when running IPEX-LLM on GPU on Windows.')
group.add_argument('--use-cache', action='store_true', default=True, help='If use_cache is True, past key values are used to speed up decoding if applicable to model.')
group.add_argument('--trust-remote-code', action='store_true', default=True, help='Set trust_remote_code=True while loading the model. Necessary for some models.')

@@ -274,8 +276,8 @@ def fix_loader_name(name):
return 'QuIP#'
elif name in ['hqq']:
return 'HQQ'
elif name in ['BigDL-LLM', 'bigdl-llm', 'bigdl']:
return 'BigDL-LLM'
elif name in ['IPEX-LLM', 'ipex-llm']:
return 'IPEX-LLM'


def add_extension(name, last=False):
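For illustration, the alias handling added above behaves like this self-contained re-statement (a sketch, not an import of the real module):

```python
# Self-contained re-statement of the loader-name normalization (illustrative).
def normalize_loader_name(name: str) -> str:
    if name in ('IPEX-LLM', 'ipex-llm'):
        return 'IPEX-LLM'
    return name

print(normalize_loader_name('ipex-llm'))  # -> IPEX-LLM
```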
1 change: 1 addition & 0 deletions modules/ui_model_menu.py
@@ -239,6 +239,7 @@ def load_model_wrapper(selected_model, loader, autoload=False):
if 'instruction_template' in settings:
output += '\n\nIt seems to be an instruction-following model with template "{}". In the chat tab, instruct or chat-instruct modes should be used.'.format(settings['instruction_template'])

output += '\n\n Starting warmup ...'
yield output
else:
yield f"Failed to load `{selected_model}`."
Binary file modified readme_folder/image.png
Binary file modified readme_folder/image1.png
Binary file modified readme_folder/image2.png
Binary file modified readme_folder/image3.png
2 changes: 1 addition & 1 deletion server.py
@@ -88,7 +88,7 @@ def create_interface():

# Force some events to be triggered on page load
shared.persistent_interface_state.update({
'loader': shared.args.loader or 'BigDL-LLM',
'loader': shared.args.loader or 'IPEX-LLM',
'mode': shared.settings['mode'],
'character_menu': shared.args.character or shared.settings['character'],
'instruction_template_str': shared.settings['instruction_template_str'],
