[NeuralChat] Support GGUF model in NeuralChat (#1200)

* Support GGUF model in NeuralChat

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

1 parent 3068496, commit a53a33c
Showing 9 changed files with 209 additions and 20 deletions.
..._transformers/neural_chat/examples/deployment/codegen/backend/pc/gguf/README.md (64 additions, 0 deletions)
@@ -0,0 +1,64 @@
This README walks you through setting up the backend for a code-generating chatbot using the NeuralChat framework. You can deploy this chatbot on various platforms, including Intel Xeon Scalable Processors, Habana Gaudi processors (HPU), Intel Data Center and Client GPUs, and Nvidia Data Center and Client GPUs.

This example demonstrates how to deploy the code-generating chatbot on a laptop PC. To ensure smooth operation on a laptop, we use the [LLM runtime optimization](../../../../../../llm/runtime/graph/README.md) to accelerate the inference process.
# Setup Conda

First, install and configure the Conda environment:

1. Visit the [Miniconda download page](https://docs.conda.io/projects/miniconda/en/latest/) and download the installer suitable for your Windows system.
2. Locate the downloaded installer file (e.g., `Miniconda3-latest-Windows-x86_64.exe` for Miniconda) and double-click it to launch the installation.
3. Create a new Conda environment with `conda create -n myenv python=3.9.0`, as shown in the sketch below.
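For example, the environment can be created and activated like this (`myenv` is a placeholder name):

```bash
# Create a dedicated environment with the Python version used in this example
conda create -n myenv python=3.9.0
# Activate it before installing any dependencies
conda activate myenv
```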
# Install Visual C++ Build Tools

Visual C++ Build Tools is a package provided by Microsoft that includes the tools required to build C++ projects with Visual Studio without installing the full Visual Studio IDE. These tools are essential for compiling, linking, and building Intel Extension for Transformers.

To install the Visual C++ Build Tools, visit [Visual C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/). There you'll find download options and installation instructions for your specific requirements.
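After installation, you can sanity-check the toolchain from a "Developer Command Prompt for VS". This check is an assumption about a default installation, not a step from the original guide:

```shell
:: Run inside "Developer Command Prompt for VS"; prints the MSVC compiler version banner
cl
```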
# Install Intel Extension for Transformers

Install Intel Extension for Transformers from source to get the latest features of the LLM runtime:

```bash
git clone https://github.com/intel/intel-extension-for-transformers.git
cd intel-extension-for-transformers
pip install -r requirements.txt
pip install -e .
```
# Install Python dependencies

Install the remaining dependencies using pip:

```bash
pip install -r ../../../../../requirements_pc.txt
pip install transformers==4.35.2
```
# Configure the codegen.yaml

You can customize the configuration file `codegen.yaml` to match your environment. The table below summarizes the options used by this GGUF example:

| Item                   | Value                           |
| ---------------------- | ------------------------------- |
| host                   | 0.0.0.0                         |
| port                   | 8000                            |
| model_name_or_path     | "TheBloke/Llama-2-7B-Chat-GGUF" |
| tokenizer_name_or_path | "meta-llama/Llama-2-7b-chat-hf" |
| gguf_model_path        | "llama-2-7b-chat.Q4_0.gguf"     |
| device                 | "cpu"                           |
| tasks_list             | ['codegen']                     |
# Run the Code Generation Chatbot Server

To start the code-generating chatbot server, use the following command:

```shell
nohup python run_code_gen.py &
```
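Once the server is up, you can send it a request. As a minimal sketch, assuming the codegen task exposes the `/v1/code_generation` endpoint used by other NeuralChat codegen examples, a request might look like:

```shell
# Hypothetical request; the endpoint path and payload shape are assumptions,
# not confirmed by this commit.
curl -X POST http://127.0.0.1:8000/v1/code_generation \
     -H "Content-Type: application/json" \
     -d '{"prompt": "def print_hello_world():"}'
```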
...ion_for_transformers/neural_chat/examples/deployment/codegen/backend/pc/gguf/codegen.yaml (36 additions, 0 deletions)
@@ -0,0 +1,36 @@
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This is the parameter configuration file for NeuralChat Serving.

#################################################################################
#                              SERVER SETTING                                   #
#################################################################################
host: 0.0.0.0
port: 8000

# If you want to run "codellama/CodeLlama-7b-hf", please download it locally and pass the local path.
# model_name_or_path: "TheBloke/Magicoder-S-DS-6.7B-GGUF"
# tokenizer_name_or_path: "ise-uiuc/Magicoder-S-DS-6.7B"
# gguf_model_path: "magicoder-s-ds-6.7b.Q4_0.gguf"
model_name_or_path: "TheBloke/Llama-2-7B-Chat-GGUF"
tokenizer_name_or_path: "meta-llama/Llama-2-7b-chat-hf"
gguf_model_path: "llama-2-7b-chat.Q4_0.gguf"
device: "cpu"

# task choices = ['textchat', 'voicechat', 'retrieval', 'text2image', 'finetune', 'codegen']
tasks_list: ['codegen']
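The same GGUF checkpoint can also be loaded programmatically through the NeuralChat Python API, mirroring the nightly test added at the end of this commit. A minimal sketch, assuming `llama-2-7b-chat.Q4_0.gguf` has already been downloaded into the working directory:

```python
from intel_extension_for_transformers.neural_chat import build_chatbot, PipelineConfig
from intel_extension_for_transformers.neural_chat.config import LoadingModelConfig

# Point the loader at the local GGUF weights file.
loading_config = LoadingModelConfig(gguf_model_path="llama-2-7b-chat.Q4_0.gguf")
config = PipelineConfig(model_name_or_path="TheBloke/Llama-2-7B-Chat-GGUF",
                        tokenizer_name_or_path="meta-llama/Llama-2-7b-chat-hf",
                        loading_config=loading_config)
chatbot = build_chatbot(config=config)
print(chatbot.predict("Write a Python function that prints 'Hello, world!'"))
```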
..._for_transformers/neural_chat/examples/deployment/codegen/backend/pc/gguf/run_code_gen.py (26 additions, 0 deletions)
@@ -0,0 +1,26 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from intel_extension_for_transformers.neural_chat import NeuralChatServerExecutor

def main():
    # Launch the NeuralChat server using the codegen.yaml configuration in this directory.
    server_executor = NeuralChatServerExecutor()
    server_executor(config_file="./codegen.yaml", log_file="./codegen.log")

if __name__ == "__main__":
    main()
intel_extension_for_transformers/neural_chat/tests/nightly/models/test_gguf.py (45 additions, 0 deletions)
@@ -0,0 +1,45 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from intel_extension_for_transformers.neural_chat import build_chatbot, PipelineConfig
from intel_extension_for_transformers.neural_chat.config import LoadingModelConfig
from intel_extension_for_transformers.neural_chat.utils.common import get_device_type
import unittest

class TestLlama2GGUFModel(unittest.TestCase):
    def setUp(self):
        self.device = get_device_type()
        return super().setUp()

    def tearDown(self) -> None:
        return super().tearDown()

    def test_code_gen_with_gguf(self):
        if self.device == "hpu":
            self.skipTest("GGUF is not supported on HPU.")

        loading_config = LoadingModelConfig(gguf_model_path="llama-2-7b-chat.Q4_0.gguf")
        config = PipelineConfig(model_name_or_path="TheBloke/Llama-2-7B-Chat-GGUF",
                                tokenizer_name_or_path="meta-llama/Llama-2-7b-chat-hf",
                                loading_config=loading_config)
        chatbot = build_chatbot(config=config)
        result = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
        print(result)
        self.assertIn('Intel Xeon Scalable Processors', str(result))

if __name__ == "__main__":
    unittest.main()
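To run this test locally, the GGUF weights must first be present in the working directory. A sketch of one way to do this; the `huggingface-cli download` step is an assumption about locally available tooling, not part of this commit:

```shell
# Fetch the quantized weights the test expects (requires the huggingface_hub CLI)
huggingface-cli download TheBloke/Llama-2-7B-Chat-GGUF llama-2-7b-chat.Q4_0.gguf --local-dir .
# Run the test directly from its directory; it skips itself on HPU devices
python test_gguf.py
```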