diff --git a/docs/prebuilt_models.rst b/docs/prebuilt_models.rst
index 363798adae..367b3a18ea 100644
--- a/docs/prebuilt_models.rst
+++ b/docs/prebuilt_models.rst
@@ -25,15 +25,17 @@ Prebuilt Models for CLI
:header-rows: 1
* - Model code
- - Model Series
+ - Original Model
- Quantization Mode
- Hugging Face repo
- * - `Llama-2-7b-q4f16_1`
- - `Llama `__
+ * - `Llama-2-{7, 13, 70}b-chat-hf-q4f16_1`
+ - `Llama-2 `__
- * Weight storage data type: int4
* Running data type: float16
* Symmetric quantization
- - `link `__
+ - * `7B link `__
+ * `13B link `__
+ * `70B link `__
* - `vicuna-v1-7b-q3f16_0`
- `Vicuna `__
- * Weight storage data type: int3
@@ -46,24 +48,58 @@ Prebuilt Models for CLI
* Running data type: float16
* Symmetric quantization
- `link `__
- * - `rwkv-raven-1b5-q8f16_0`
+ * - `rwkv-raven-{1b5, 3b, 7b}-q8f16_0`
- `RWKV `__
- * Weight storage data type: uint8
* Running data type: float16
* Symmetric quantization
- - `link `__
- * - `rwkv-raven-3b-q8f16_0`
- - `RWKV `__
- - * Weight storage data type: uint8
- * Running data type: float16
+ - * `1b5 link `__
+ * `3b link `__
+ * `7b link `__
+ * - `WizardLM-13B-V1.2-{q4f16_1, q4f32_1}`
+ - `WizardLM `__
+ - * Weight storage data type: int4
+ * Running data type: float{16, 32}
* Symmetric quantization
- - `link `__
- * - `rwkv-raven-7b-q8f16_0`
- - `RWKV `__
- - * Weight storage data type: uint8
+ - * `q4f16_1 link `__
+ * `q4f32_1 link `__
+ * - `WizardCoder-15B-V1.0-{q4f16_1, q4f32_1}`
+ - `WizardCoder `__
+ - * Weight storage data type: int4
+ * Running data type: float{16, 32}
+ * Symmetric quantization
+ - * `q4f16_1 link `__
+ * `q4f32_1 link `__
+ * - `WizardMath-{7, 13, 70}B-V1.0-q4f16_1`
+ - `WizardMath `__
+ - * Weight storage data type: int4
* Running data type: float16
* Symmetric quantization
- - `link `__
+ - * `7B link `__
+ * `13B link `__
+ * `70B link `__
+ * - `llama2-7b-chat-uncensored-{q4f16_1, q4f32_1}`
+ - `georgesung `__
+ - * Weight storage data type: int4
+ * Running data type: float{16, 32}
+ * Symmetric quantization
+ - * `q4f16_1 link `__
+ * `q4f32_1 link `__
+ * - `Llama2-Chinese-7b-Chat-{q4f16_1, q4f32_1}`
+ - `FlagAlpha `__
+ - * Weight storage data type: int4
+ * Running data type: float{16, 32}
+ * Symmetric quantization
+ - * `q4f16_1 link `__
+ * `q4f32_1 link `__
+ * - `GOAT-7B-Community-{q4f16_1, q4f32_1}`
+ - `GOAT-AI `__
+ - * Weight storage data type: int4
+ * Running data type: float{16, 32}
+ * Symmetric quantization
+ - * `q4f16_1 link `__
+ * `q4f32_1 link `__
+
To download and run a model with the CLI, follow the instructions below:
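As a rough sketch of what the invocation looks like (assuming the ``mlc_chat_cli`` binary is installed and the model weights have already been downloaded per the instructions on this page — treat the exact flag as illustrative):

```shell
# Hypothetical example: run one of the prebuilt models from the table above
# by passing its model code as the local id. Assumes mlc_chat_cli is on PATH
# and the corresponding weights are present locally.
mlc_chat_cli --local-id Llama-2-7b-chat-hf-q4f16_1
```

Any model code from the first column of the table can be substituted for the ``--local-id`` value.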
@@ -179,6 +215,11 @@ For example, if you compile `OpenLLaMA-7B `__
* `WizardLM `__
* `YuLan-Chat `__
+ * `WizardMath `__
+ * `FlagAlpha Llama-2 Chinese `__
* - ``gpt-neox``
- `GPT-NeoX `__
- `Relax Code `__