openchat.openvino

Interactive demo

Download the model from https://huggingface.co/fakezeta/openchat-3.5-0106-openvino-int8 into the ir_model folder
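
If you prefer to script the download, here is a minimal sketch using huggingface_hub (an assumption: the README only says to place the files in ir_model, so any download method works):

from huggingface_hub import snapshot_download

# Fetch the INT8 IR model files into the local ./ir_model folder
snapshot_download(
    repo_id="fakezeta/openchat-3.5-0106-openvino-int8",
    local_dir="./ir_model",
)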

1. Run interactive Q&A demo with Gradio:

$ python3 demo/qa_gradio.py -m "./ir_model" -d GPU

The -d argument selects the default device; depending on your hardware, this is typically CPU or GPU. The device can also be changed at runtime with the dropdown menu widget.
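
To check which device names are valid on your machine before passing -d, a quick sketch (assuming a recent openvino package that exposes ov.Core):

import openvino as ov

# List the inference devices OpenVINO detects, e.g. ['CPU', 'GPU']
core = ov.Core()
print(core.available_devices)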

This sample shows how to implement the OpenChat model with the OpenVINO runtime.

Only qa_gradio.py and the related model.py have been adapted, since the goal was to benchmark iGPU performance against the CPU and other iGPU backends.

Gradio screenshot

The following is the original README from OpenVINO-dev-contest/llama2.openvino:

  • Please follow the license on Hugging Face and obtain approval from Meta before downloading Llama checkpoints.

  • Please note that this repository is only for functional testing and personal study.

Requirements

  • Linux, Windows
  • Python >= 3.9.0
  • CPU or GPU compatible with OpenVINO.
  • RAM: >= 16 GB
  • VRAM: >= 8 GB

Install the requirements

$ python3 -m venv openvino_env

$ source openvino_env/bin/activate

$ python3 -m pip install --upgrade pip

$ pip install wheel setuptools

$ pip install -r requirements.txt

Q&A Pipeline

1. Export IR model

from Transformers:

$ python3 export_ir.py -m 'meta-llama/Llama-2-7b-hf'

or from Optimum-Intel:

$ python3 export_op.py -m 'meta-llama/Llama-2-7b-hf'

or for a GPTQ model:

$ python3 export_op.py -m 'TheBloke/Llama-2-7B-Chat-GPTQ'
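
The export scripts themselves are not shown here, but export_op.py presumably wraps the Optimum-Intel API; a minimal sketch of that flow (model id and output path are illustrative):

from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
model.save_pretrained("./ir_model")

# Save the tokenizer next to the IR files so the pipelines can find it
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.save_pretrained("./ir_model")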

1.1. (Optional) Quantize the local IR model with int8 or int4 weights

$ python3 quantize.py -m 'ir_model' -p 'int4'

For more information on quantization configuration, please refer to the OpenVINO weight compression documentation.
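
For reference, weight compression in NNCF boils down to a single call; a sketch of what quantize.py likely does (the file names follow the usual Optimum-Intel layout and are an assumption):

import nncf
import openvino as ov

# Load the exported IR model
core = ov.Core()
model = core.read_model("ir_model/openvino_model.xml")

# int4 symmetric weight compression; other CompressWeightsMode values cover int8
compressed = nncf.compress_weights(model, mode=nncf.CompressWeightsMode.INT4_SYM)
ov.save_model(compressed, "ir_model_int4/openvino_model.xml")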

2. Run pipeline

Optimum-Intel OpenVINO pipeline:

$ python3 ir_pipeline/generate_op.py -m "./ir_model" -p "what is openvino ?" -d "CPU"

or Restructured pipeline:

$ python3 ir_pipeline/generate_ir.py -m "./ir_model" -p "what is openvino ?" -d "CPU"
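
Both scripts follow the standard Optimum-Intel generation flow; a minimal sketch of the equivalent code (assuming the IR model was exported as above):

from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model = OVModelForCausalLM.from_pretrained("./ir_model")
# model.to("GPU")  # optionally switch device, mirroring the -d argument
tokenizer = AutoTokenizer.from_pretrained("./ir_model")

inputs = tokenizer("what is openvino ?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))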

Interactive demo

1. Run interactive Q&A demo with Gradio:

$ python3 demo/qa_gradio.py -m "./ir_model" 

2. Or run the chatbot demo with Streamlit:

$ python3 export_op.py -m 'meta-llama/Llama-2-7b-chat-hf' -o './ir_model_chat'

$ streamlit run demo/chat_streamlit.py -- -m './ir_model_chat'
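
At its core, a Gradio Q&A demo is a thin wrapper around that generation call; a hypothetical sketch of the pattern qa_gradio.py follows (not the actual script):

import gradio as gr
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model = OVModelForCausalLM.from_pretrained("./ir_model")
tokenizer = AutoTokenizer.from_pretrained("./ir_model")

def answer(question: str) -> str:
    # Tokenize the question, generate, and decode the reply
    inputs = tokenizer(question, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

gr.Interface(fn=answer, inputs="text", outputs="text").launch()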
