From e508d4725f4e5419b14723bb40d1ec579a9f19ba Mon Sep 17 00:00:00 2001
From: Emma Ning <43255631+EmmaNingMS@users.noreply.github.com>
Date: Thu, 16 May 2024 17:48:44 -0700
Subject: [PATCH] Update README.md for web example

---
 js/chat/README.md | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/js/chat/README.md b/js/chat/README.md
index 0b9e2d7f3..862da91b6 100644
--- a/js/chat/README.md
+++ b/js/chat/README.md
@@ -1,11 +1,11 @@
-# Local Chat using Phi3, ONNX Runtime Web and WebGPU
+# Local Chatbot in the Browser using Phi3, ONNX Runtime Web and WebGPU
 
 This repository contains an example of running [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) in your browser using [ONNX Runtime Web](https://github.com/microsoft/onnxruntime) with WebGPU.
 
 You can try out the live demo [here](https://guschmue.github.io/ort-webgpu/chat/index.html).
 
-We keep this example simple and use the onnxruntime-web api directly without a
-higher level framework like [transformers.js](https://github.com/xenova/transformers.js).
+We keep this example simple and use the onnxruntime-web API directly. ONNX Runtime Web also powers
+higher-level frameworks like [transformers.js](https://github.com/xenova/transformers.js).
 
 ## Getting Started
 
@@ -42,13 +42,30 @@ Point your browser to http://localhost:8080/.
 
 ### The Phi3 ONNX Model
 
-The model used in this example is hosted on [Hugging Face](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx-web). It is slightly different than the ONNX model for CUDA or CPU:
-1. The model output 'logits' is kept as float32 (even for float16 models) since Javascript does not support float16.
-2. Our WebGPU implementation uses the custom Multiheaded Attention operator instread of Group Query Attention.
-3. Phi3 is larger then 2GB and we need to use external data files. To keep them cacheable in the browser, both model.onnx and model.onnx.data are kept under 2GB.
+The model used in this example is hosted on [Hugging Face](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx-web). It is an ONNX version optimized for the web and slightly different from the ONNX model for CUDA or CPU:
+1. The model output 'logits' is kept as float32 (even for float16 models) since JavaScript does not support float16.
+2. Our WebGPU implementation uses the custom Multi-Head Attention operator instead of Group Query Attention.
+3. Phi3 is larger than 2GB, so we need to use external data files. To keep them cacheable in the browser, both model.onnx and model.onnx.data are kept under 2GB.
 
-The model was created using the [ONNX genai model builder](https://github.com/microsoft/onnxruntime-genai/tree/main/src/python/py/models).
-
-If you like to create the model yourself, you can use [Olive](https://github.com/microsoft/Olive/).
-An example how to create the model for ONNX Runtime Web with Olive can be found [here](https://github.com/microsoft/Olive/tree/main/examples/phi3).
+If you would like to optimize your fine-tuned PyTorch Phi-3-mini model, you can use [Olive](https://github.com/microsoft/Olive/), which supports float data type conversion, and the [ONNX genai model builder](https://github.com/microsoft/onnxruntime-genai/tree/main/src/python/py/models) toolkit.
+An example of how to optimize the Phi-3-mini model for ONNX Runtime Web with Olive can be found [here](https://github.com/microsoft/Olive/tree/main/examples/phi3).
 
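+For reference, below is a minimal sketch of loading the model directly with the onnxruntime-web API. It is an illustration rather than this app's actual code, and assumes the `externalData` session option available in recent onnxruntime-web releases:
+
+```js
+// Minimal sketch: create an ONNX Runtime Web session on WebGPU from the two
+// sub-2GB files described above (model.onnx + model.onnx.data).
+import * as ort from 'onnxruntime-web/webgpu';
+
+const session = await ort.InferenceSession.create('model.onnx', {
+  executionProviders: ['webgpu'],
+  // The weights live in an external data file so that each file stays
+  // individually cacheable in the browser (both under 2GB).
+  externalData: [{ path: 'model.onnx.data', data: 'model.onnx.data' }],
+});
+
+// The generation loop (omitted here) feeds input_ids, attention_mask and the
+// past key/value tensors listed in session.inputNames, then samples the next
+// token from the 'logits' output, which is float32 as noted in point 1.
+console.log(session.inputNames, session.outputNames);
+```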