
Add IPEX-LLM with GPU #24

Merged
19 commits merged into intel-analytics:ipex-llm-llm-gpu on May 23, 2024

Conversation

@ivy-lv11 (Collaborator) commented May 20, 2024

  • Description: ipex-llm is a PyTorch library for running LLMs on Intel CPUs and GPUs. This PR adds GPU support to the IpexLLM LLM integration.
  • Fixes # (issue): N/A
  • New Package?: No
  • Examples:
    • added a few more examples
    • added a `-d` option to all examples to choose the device
    • added a new Jupyter notebook doc for GPU
  • Tests: N/A
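As a rough illustration of what the GPU path enables, the sketch below loads a model on Intel GPU through the IpexLLM integration. The import path, the `from_model_id` signature, and the model name are assumptions based on the `device_map="xpu"` usage quoted later in this review, not the PR's exact code:

```python
# Hedged sketch (assumed API): initialize the IpexLLM integration on Intel GPU.
from llama_index.llms.ipex_llm import IpexLLM  # import path is an assumption

llm = IpexLLM.from_model_id(
    model_name="HuggingFaceH4/zephyr-7b-alpha",
    tokenizer_name="HuggingFaceH4/zephyr-7b-alpha",
    context_window=512,
    max_new_tokens=64,
    device_map="xpu",  # "cpu" or "xpu"; this PR adds the xpu path
)
response = llm.complete("What is IPEX-LLM?")
print(response)
```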

@shane-huang (Collaborator) reviewed the diff:

    @@ -33,11 +33,20 @@ def completion_to_prompt(completion):
        choices=["sym_int4", "asym_int4", "sym_int5", "asym_int5", "sym_int8"],

Update the choices to add GPU-related data types. For a full list of data types we can support, refer to the load_in_lowbit param API doc: https://ipex-llm.readthedocs.io/en/latest/doc/PythonAPI/LLM/transformers.html#automodelforcausallm
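For context, here is a minimal sketch of loading a model with one of these low-bit data types through ipex-llm's transformers-style API. The parameter is spelled `load_in_low_bit` in recent ipex-llm releases; verify the exact spelling and supported values against the linked API doc:

```python
# Hedged sketch: load a model in a low-bit data type with ipex-llm,
# then move it to Intel GPU. Check parameter names against the linked doc.
from ipex_llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-alpha",  # example model, an assumption here
    load_in_low_bit="fp8",            # e.g. sym_int4, fp4, fp8, nf4, ...
    trust_remote_code=True,
)
model = model.to("xpu")               # Intel GPU device
```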

@ivy-lv11 (Collaborator, Author) replied:

Various data types were tested, including fp4, fp8, fp16, bf16, nf3, nf4, fp8_e4m3, and fp8_e5m2. All generate normally.

The example [rag.py](./rag.py) shows how to use the RAG pipeline. Run the example as follows:

```bash
python rag.py -m <path_to_model> -q <question> -u <vector_db_username> -p <vector_db_password> -e <path_to_embedding_model> -n <num_token> -t <path_to_tokenizer> -x <device>
```

Collaborator review comment:

We'd better use the same option `-d` for device in all examples.

@shane-huang (Collaborator) commented May 23, 2024

  • Remove rag.py; we shall add it in the next PR.
  • Write a simple inference example basic.py, just like in the Jupyter notebook, and update example/README.
  • Update the Jupyter notebook, just like the embedding Jupyter notebook.
  • Unify the `-d`, `-q`, `-m` arguments across all examples.
  • Support only cpu/xpu and report an error for other devices (see the sketch after this list).
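A minimal sketch of how the examples could implement the unified arguments and the cpu/xpu restriction; the long option names beyond `-d`, `-q`, `-m` are assumptions:

```python
# Hedged sketch: uniform argument parsing for the examples. argparse's
# `choices` makes any device other than cpu/xpu an error automatically.
import argparse

parser = argparse.ArgumentParser(description="IpexLLM example")
parser.add_argument("-m", "--model-path", required=True,
                    help="path to the model")            # assumed long name
parser.add_argument("-q", "--query", default="What is IPEX-LLM?",
                    help="query to send to the LLM")
parser.add_argument("-d", "--device", default="cpu", choices=["cpu", "xpu"],
                    help="device to run on: cpu or xpu")
args = parser.parse_args()
```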

@shane-huang (Collaborator) commented:

Also fix the test errors.

@shane-huang (Collaborator) commented:

> Various data types were tested, including fp4, fp8, fp16, bf16, nf3, nf4, fp8_e4m3, and fp8_e5m2. All generate normally.

Add those options to the example choices (see the sketch below).
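One possible reading of this request, with the tested GPU data types appended to the existing choices; the option name `--low-bit` is hypothetical:

```python
# Hedged sketch: extend the examples' low-bit choices with the tested
# GPU data types. The option name "--low-bit" is hypothetical.
import argparse

parser = argparse.ArgumentParser(description="IpexLLM low-bit example")
parser.add_argument(
    "--low-bit",
    default="sym_int4",
    choices=[
        "sym_int4", "asym_int4", "sym_int5", "asym_int5", "sym_int8",        # existing
        "fp4", "fp8", "fp8_e4m3", "fp8_e5m2", "fp16", "bf16", "nf3", "nf4",  # tested above
    ],
    help="low-bit data type to load the model in",
)
```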

"\n",
"## `IpexLLM`\n",
"\n",
"Setting `device_map=\"xpu\"` when initializing `IpexLLM` will put the embedding model on Intel GPU and benefit from IPEX-LLM optimizations:\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change this line to use the descriptions in llm jupyter doc.
add the descriptions to explain prompts like in llm jupyter doc

python basic.py -m <path_to_model> -d <cpu_or_xpu> -q <query_to_LLM>
```

> Please note that in this example we'll use the [HuggingFaceH4/zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha) model for demonstration. It requires updating the `transformers` and `tokenizers` packages.
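For reference, updating these packages would typically look like the following; the PR does not pin exact versions:

```bash
pip install --upgrade transformers tokenizers
```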
Collaborator review comment:

The low-bit example also uses zephyr; put this update in the low-bit example as well.

@Oscilloscope98 (Collaborator) commented May 23, 2024

Why do we use the LangChain description here? :)

@ivy-lv11 merged commit 2aeb875 into intel-analytics:ipex-llm-llm-gpu on May 23, 2024.
3 of 6 checks passed.