Add IPEX-LLM with GPU #24
Conversation
@@ -33,11 +33,20 @@ def completion_to_prompt(completion):
choices=["sym_int4", "asym_int4", "sym_int5", "asym_int5", "sym_int8"],
Update the choices to add the GPU-related data types. For a full list of data types we can support, refer to the `load_in_low_bit` parameter in the API doc: https://ipex-llm.readthedocs.io/en/latest/doc/PythonAPI/LLM/transformers.html#automodelforcausallm
Various data types were tested, including fp4, fp8, fp16, bf16, nf3, nf4, fp8_e4m3, and fp8_e5m2. All generate normally.
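For illustration, the expanded argument might look like the following sketch (the flag name and the final choice list are assumptions based on the diff and the reply above):

```python
import argparse

parser = argparse.ArgumentParser(description="Load a model in a low-bit format.")
# Original int choices from the diff, extended with the GPU-related low-bit
# data types reported above as generating normally (assumed final list).
parser.add_argument(
    "--low-bit",
    choices=[
        "sym_int4", "asym_int4", "sym_int5", "asym_int5", "sym_int8",
        "fp4", "fp8", "fp8_e4m3", "fp8_e5m2", "fp16", "bf16", "nf3", "nf4",
    ],
    default="sym_int4",
    help="Value forwarded to load_in_low_bit when loading the model.",
)
args = parser.parse_args()
```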
The example [rag.py](./rag.py) shows how to use the RAG pipeline. Run the example as follows:

```bash
python rag.py -m <path_to_model> -q <question> -u <vector_db_username> -p <vector_db_password> -e <path_to_embedding_model> -n <num_token> -t <path_to_tokenizer> -x <device>
```
We'd better use the same `-d` option for device across all examples.
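A minimal sketch of what the uniform flag might look like in each example (the long-form name, default, and help text are assumptions):

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical shared device option, identical across rag.py, basic.py, etc.
parser.add_argument(
    "-d", "--device",
    choices=["cpu", "xpu"],
    default="cpu",
    help="Device to run on: Intel CPU ('cpu') or Intel GPU ('xpu').",
)
args = parser.parse_args()
```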
Outdated review thread on llama-index-integrations/llms/llama-index-llms-ipex-llm/examples/more_data_type.py (resolved).
Also fix the test errors.
Add those options to the example choices.
"\n", | ||
"## `IpexLLM`\n", | ||
"\n", | ||
"Setting `device_map=\"xpu\"` when initializing `IpexLLM` will put the embedding model on Intel GPU and benefit from IPEX-LLM optimizations:\n", |
Change this line to use the descriptions in the LLM Jupyter doc. Also add descriptions to explain prompts, as in the LLM Jupyter doc.
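For context, a minimal sketch of initializing the LLM on Intel GPU; apart from `device_map`, the constructor arguments here are assumptions modeled on llama-index's HuggingFace-style loaders, not the integration's confirmed API:

```python
from llama_index.llms.ipex_llm import IpexLLM

# Hypothetical initialization: parameter names other than device_map are
# assumptions; device_map="xpu" targets Intel GPU, "cpu" targets Intel CPU.
llm = IpexLLM.from_model_id(
    model_name="HuggingFaceH4/zephyr-7b-alpha",
    tokenizer_name="HuggingFaceH4/zephyr-7b-alpha",
    context_window=512,
    max_new_tokens=64,
    device_map="xpu",
)

print(llm.complete("What is IPEX-LLM?").text)
```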
Outdated review threads (resolved):
- llama-index-integrations/llms/llama-index-llms-ipex-llm/examples/basic.py
- llama-index-integrations/llms/llama-index-llms-ipex-llm/examples/low_bit.py (2 threads)
- llama-index-integrations/llms/llama-index-llms-ipex-llm/llama_index/llms/ipex_llm/base.py (3 threads)
- llama-index-integrations/llms/llama-index-llms-ipex-llm/examples/more_data_type.py
```bash
python basic.py -m <path_to_model> -d <cpu_or_xpu> -q <query_to_LLM>
```

> Please note that in this example we'll use the [HuggingFaceH4/zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha) model for demonstration. It requires updating the `transformers` and `tokenizers` packages.
low_bit also uses zephyr; put this update in the low-bit example as well.
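Since zephyr models expect a specific chat template, the examples' prompt helper presumably looks something like this sketch (the exact implementation in the examples may differ):

```python
def completion_to_prompt(completion: str) -> str:
    # Zephyr chat format: an empty system turn, the user's text, then the
    # assistant tag that cues the model to generate its reply.
    return f"<|system|>\n</s>\n<|user|>\n{completion}</s>\n<|assistant|>\n"
```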
Why do we use the langchain description here :)?
Merged 2aeb875 into intel-analytics:ipex-llm-llm-gpu.

`ipex-llm` is a PyTorch library for running LLMs on Intel CPU and GPU. This PR adds GPU support to the `IpexLLM` llm integration and adds a `-d` option to all examples for choosing the device.