llama_openvino

This sample shows how to implement a LLaMA-based model with the OpenVINO runtime.

Please note that this repository is intended only as a functional test; you can quantize the model to further optimize its performance.

How to run it?

  1. Install the requirements:

    $ pip install -r requirements.txt

  2. Export the ONNX model from the HuggingFace pipeline (a sketch of a comparable export is shown after these steps):

    $ python export.py -m huggingface_model_path -o onnx_model_path

    For example: python export.py -m "xxx/llama-7b-hf" -o "./llama.onnx"

    Please follow the license on HuggingFace and obtain Meta's approval before downloading LLaMA checkpoints.

  3. Convert the ONNX model to OpenVINO IR in FP16 (a Python alternative is sketched below):

    $ mo -m onnx_model_path --compress_to_fp16

  4. Run the restructured pipeline (a minimal decoding loop is sketched below):

    $ python generate.py -m openvino_model_path -t tokenizer_path -p prompt_sentence
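
For reference, here is a minimal sketch of the kind of HuggingFace-to-ONNX export that step 2 performs. It is not the repository's exact export.py: the model path, input/output names, and dynamic-axis settings are illustrative assumptions, and it requires the torch and transformers packages.

    # Hedged sketch of a HuggingFace-to-ONNX export (not the repository's exact script).
    import torch
    from transformers import AutoModelForCausalLM

    model_path = "xxx/llama-7b-hf"        # assumption: an approved LLaMA checkpoint
    model = AutoModelForCausalLM.from_pretrained(model_path)
    model.eval()
    model.config.return_dict = False      # export plain tensors instead of a ModelOutput
    model.config.use_cache = False        # keep the graph to a single logits output

    # Dummy input used only to trace the graph; batch and sequence axes are dynamic.
    dummy_input_ids = torch.ones((1, 8), dtype=torch.int64)

    torch.onnx.export(
        model,
        (dummy_input_ids,),
        "llama.onnx",
        input_names=["input_ids"],
        output_names=["logits"],
        dynamic_axes={"input_ids": {0: "batch", 1: "sequence"},
                      "logits": {0: "batch", 1: "sequence"}},
        opset_version=14,
    )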
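The IR conversion in step 3 can also be run from Python instead of the mo command line. This sketch assumes a recent OpenVINO release that ships the mo Python API; the file names are placeholders.

    # Hedged sketch: ONNX -> OpenVINO IR conversion from Python (recent OpenVINO releases).
    from openvino.tools.mo import convert_model
    from openvino.runtime import serialize

    # Equivalent to: mo -m llama.onnx --compress_to_fp16
    ov_model = convert_model("llama.onnx", compress_to_fp16=True)
    serialize(ov_model, "llama.xml")      # writes llama.xml + llama.bin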
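Finally, a minimal sketch of the kind of greedy-decoding loop generate.py implements. It assumes the IR's input is named input_ids and its first output is the logits tensor, and the tokenizer path, prompt, and token budget are placeholders. Like the repository itself, this is a functional test only: there is no KV-cache, so the full sequence is re-run through the model at every step.

    # Hedged sketch of a greedy-decoding loop over the OpenVINO IR (no KV-cache).
    import numpy as np
    from openvino.runtime import Core
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("xxx/llama-7b-hf")  # assumption: tokenizer_path
    compiled = Core().compile_model("llama.xml", "CPU")
    logits_port = compiled.output(0)

    input_ids = tokenizer("What is OpenVINO?", return_tensors="np").input_ids

    for _ in range(32):                   # max_new_tokens, illustrative
        # Re-run the whole sequence; logits shape is (batch, sequence, vocab).
        logits = compiled([input_ids])[logits_port]
        next_token = int(np.argmax(logits[0, -1]))
        if next_token == tokenizer.eos_token_id:
            break
        input_ids = np.concatenate([input_ids, [[next_token]]], axis=1)

    print(tokenizer.decode(input_ids[0], skip_special_tokens=True))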
