Merge pull request #64 from liltom-eth/llama2-wrapper
[DOCUMENT] update readme
liltom-eth committed Sep 2, 2023
2 parents 8a0fe14 + 00b8dc6 commit d86e2c4

## Usage

### Start Chat UI

Run the chatbot with the web UI:

On first run, the default model is downloaded automatically: `Start downloading model to: ./models/llama-2-7b-chat.ggmlv3.q4_0.bin`

You can also customize `MODEL_PATH`, `BACKEND_TYPE`, and model configs in the `.env` file to run different Llama 2 models on different backends (llama.cpp, transformers, gptq).
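For example, a minimal `.env` might look like the sketch below. Only `MODEL_PATH` and `BACKEND_TYPE` are named above; the exact accepted values and quoting style are assumptions, so treat the files in `./env_examples/` as the authoritative samples.

```
# hypothetical .env sketch; values are illustrative assumptions
MODEL_PATH = "./models/llama-2-7b-chat.ggmlv3.q4_0.bin"
BACKEND_TYPE = "llama.cpp"
```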

### Start Code Llama UI

We provide a code completion / filling UI for Code Llama.

The base model **Code Llama** and the extended model **Code Llama — Python** are not fine-tuned to follow instructions. They should be prompted so that the expected answer is the natural continuation of the prompt, which makes these two models best suited for code completion and code filling.
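For reference, fill-in-the-middle prompting for the base models can be sketched as follows. The `<PRE>`/`<SUF>`/`<MID>` sentinel tokens follow the Code Llama release; whether this repo builds prompts exactly this way is an assumption.

```python
def infill_prompt(prefix: str, suffix: str) -> str:
    # Fill-in-the-middle sketch: the model generates the code between
    # `prefix` and `suffix`; the sentinel-token spelling is an assumption.
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"

prompt = infill_prompt("def add(a, b):\n    return ", "\n")
```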

Here is an example running Code Llama code completion on the llama.cpp backend:

```
python code_completion.py --model_path ./models/codellama-7b.ggmlv3.Q4_0.bin
```

![code_llama_playground](https://i.imgur.com/FgMUiT6.gif)

**Code Llama — Instruct** is trained on "natural language instruction" inputs paired with expected outputs. This improves the model's ability to follow human instructions in prompts, so the instruct models can be used in a chatbot-like app.
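As a hedged illustration, an instruction prompt for the instruct models might be assembled like this. The `[INST]`/`<<SYS>>` template matches the Llama 2 chat format; whether this repo formats prompts exactly this way is an assumption.

```python
def chat_prompt(user_message: str,
                system_prompt: str = "You are a helpful coding assistant.") -> str:
    # Llama-2 chat template sketch: the system prompt is wrapped in <<SYS>>
    # tags and the user turn in [INST] ... [/INST].
    return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"

prompt = chat_prompt("Write a function that reverses a string.")
```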

Example: run Code Llama chat on the gptq backend:

```
python app.py --backend_type gptq --model_path ./models/CodeLlama-7B-Instruct-GPTQ/ --share True
```

![code_llama_chat](https://i.imgur.com/lQLfemB.gif)

### Use llama2-wrapper for Your App


## Tips

### Env Examples

There are some examples in `./env_examples/` folder.

| Model Setup | Example .env |
| ------------------------------------------------------ | --------------------------- |
| Llama-2-7b-chat-hf 8-bit (transformers backend) | .env.7b_8bit_example |
| Llama-2-7b-Chat-GPTQ 4-bit (gptq transformers backend) | .env.7b_gptq_example |
| Llama-2-7B-Chat-GGML 4-bit (llama.cpp backend) | .env.7b_ggmlv3_q4_0_example |
| Llama-2-13b-chat-hf (transformers backend) | .env.13b_example |
| ... | ... |
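As a sketch of how such a file is consumed (the project may well use a library like python-dotenv; this minimal parser is only illustrative):

```python
def parse_env(text: str) -> dict:
    """Parse KEY = "value" lines from a .env-style string (illustrative only)."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env

cfg = parse_env(
    'MODEL_PATH = "./models/llama-2-7b-chat.ggmlv3.q4_0.bin"\n'
    'BACKEND_TYPE = "llama.cpp"'
)
```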

### Run on Nvidia GPU

Running requires around 14 GB of GPU VRAM for Llama-2-7b and 28 GB for Llama-2-13b.
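These figures are consistent with 16-bit weights at roughly 2 bytes per parameter (activation and KV-cache overhead not included):

```python
def fp16_weight_gb(n_params_billions: float) -> float:
    # ~2 bytes per parameter for (b)float16 weights; overhead not included
    return n_params_billions * 2

# 7B -> ~14 GB of weights; 13B -> ~26 GB of weights alone,
# so the 28 GB quoted above leaves some headroom for overhead.
```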
