diff --git a/docs/prebuilt_models.rst b/docs/prebuilt_models.rst
index 363798adae..367b3a18ea 100644
--- a/docs/prebuilt_models.rst
+++ b/docs/prebuilt_models.rst
@@ -25,15 +25,17 @@ Prebuilt Models for CLI
    :header-rows: 1
 
    * - Model code
-     - Model Series
+     - Original Model
      - Quantization Mode
      - Hugging Face repo
-   * - `Llama-2-7b-q4f16_1`
-     - `Llama `__
+   * - `Llama-2-{7, 13, 70}b-chat-hf-q4f16_1`
+     - `Llama-2 `__
      - * Weight storage data type: int4
        * Running data type: float16
        * Symmetric quantization
-     - `link `__
+     - * `7B link `__
+       * `13B link `__
+       * `70B link `__
    * - `vicuna-v1-7b-q3f16_0`
      - `Vicuna `__
      - * Weight storage data type: int3
@@ -46,24 +48,58 @@ Prebuilt Models for CLI
        * Running data type: float16
        * Symmetric quantization
      - `link `__
-   * - `rwkv-raven-1b5-q8f16_0`
+   * - `rwkv-raven-{1b5, 3b, 7b}-q8f16_0`
      - `RWKV `__
      - * Weight storage data type: uint8
        * Running data type: float16
        * Symmetric quantization
-     - `link `__
-   * - `rwkv-raven-3b-q8f16_0`
-     - `RWKV `__
-     - * Weight storage data type: uint8
-       * Running data type: float16
+     - * `1b5 link `__
+       * `3b link `__
+       * `7b link `__
+   * - `WizardLM-13B-V1.2-{q4f16_1, q4f32_1}`
+     - `WizardLM `__
+     - * Weight storage data type: int4
+       * Running data type: float{16, 32}
        * Symmetric quantization
-     - `link `__
-   * - `rwkv-raven-7b-q8f16_0`
-     - `RWKV `__
-     - * Weight storage data type: uint8
+     - * `q4f16_1 link `__
+       * `q4f32_1 link `__
+   * - `WizardCoder-15B-V1.0-{q4f16_1, q4f32_1}`
+     - `WizardCoder `__
+     - * Weight storage data type: int4
+       * Running data type: float{16, 32}
+       * Symmetric quantization
+     - * `q4f16_1 link `__
+       * `q4f32_1 link `__
+   * - `WizardMath-{7, 13, 70}B-V1.0-q4f16_1`
+     - `WizardMath `__
+     - * Weight storage data type: int4
        * Running data type: float16
        * Symmetric quantization
-     - `link `__
+     - * `7B link `__
+       * `13B link `__
+       * `70B link `__
+   * - `llama2-7b-chat-uncensored-{q4f16_1, q4f32_1}`
+     - `georgesung `__
+     - * Weight storage data type: int4
+       * Running data type: float{16, 32}
+       * Symmetric quantization
+     - * `q4f16_1 link `__
+       * `q4f32_1 link `__
+   * - `Llama2-Chinese-7b-Chat-{q4f16_1, q4f32_1}`
+     - `FlagAlpha `__
+     - * Weight storage data type: int4
+       * Running data type: float{16, 32}
+       * Symmetric quantization
+     - * `q4f16_1 link `__
+       * `q4f32_1 link `__
+   * - `GOAT-7B-Community-{q4f16_1, q4f32_1}`
+     - `GOAT-AI `__
+     - * Weight storage data type: int4
+       * Running data type: float{16, 32}
+       * Symmetric quantization
+     - * `q4f16_1 link `__
+       * `q4f32_1 link `__
+
 
 To download and run one model with CLI, follow the instructions below:
 
@@ -179,6 +215,11 @@ For example, if you compile `OpenLLaMA-7B `__
        * `WizardLM `__
        * `YuLan-Chat `__
+       * `WizardMath `__
+       * `FlagAlpha Llama-2 Chinese `__
    * - ``gpt-neox``
      - `GPT-NeoX `__
      - `Relax Code `__