modelscope · Jintao-Huang · Jan 12, 2024
diff --git a/README.md b/README.md
diff --git a/README_CN.md b/README_CN.md
diff --git a/docs/source/LLM/LLM推理文档.md b/docs/source/LLM/LLM推理文档.md
@@ -413,21 +413,21 @@ CUDA_VISIBLE_DEVICES=0 swift app-ui --model_type qwen-7b-chat
 import os
 os.environ['CUDA_VISIBLE_DEVICES'] = '0'
 
-from swift.llm import InferArguments, ModelType, app_ui_main
+from swift.llm import AppUIArguments, ModelType, app_ui_main
 
-infer_args = InferArguments(model_type=ModelType.qwen_7b_chat)
-app_ui_main(infer_args)
+app_ui_args = AppUIArguments(model_type=ModelType.qwen_7b_chat)
+app_ui_main(app_ui_args)
 ```
 
 Using bnb quantization:
 ```python
 import os
 os.environ['CUDA_VISIBLE_DEVICES'] = '0'
 
-from swift.llm import InferArguments, ModelType, app_ui_main
+from swift.llm import AppUIArguments, ModelType, app_ui_main
 
-infer_args = InferArguments(model_type=ModelType.qwen_7b_chat, quantization_bit=4)
-app_ui_main(infer_args)
+app_ui_args = AppUIArguments(model_type=ModelType.qwen_7b_chat, quantization_bit=4)
+app_ui_main(app_ui_args)
 ```
 
 ### qwen-7b
@@ -441,10 +441,10 @@ CUDA_VISIBLE_DEVICES=0 swift app-ui --model_type qwen-7b
 import os
 os.environ['CUDA_VISIBLE_DEVICES'] = '0'
 
-from swift.llm import InferArguments, ModelType, app_ui_main
+from swift.llm import AppUIArguments, ModelType, app_ui_main
 
-infer_args = InferArguments(model_type=ModelType.qwen_7b)
-app_ui_main(infer_args)
+app_ui_args = AppUIArguments(model_type=ModelType.qwen_7b)
+app_ui_main(app_ui_args)
 ```
 
 ### 微调后模型

diff --git a/docs/source/LLM/命令行参数.md b/docs/source/LLM/命令行参数.md
@@ -1,10 +1,12 @@
 # 命令行参数
 ## 目录
-- [sft 命令行参数](#sft-命令行参数)
-- [merge-lora infer app-ui 命令行参数](#merge-lora-infer-app-ui-命令行参数)
-- [deploy 命令行参数](#deploy-命令行参数)
+- [SFT 参数](#SFT-参数)
+- [DPO 参数](#DPO-参数)
+- [merge-lora infer 参数](#merge-lora-infer-参数)
+- [app-ui 参数](#app-ui-参数)
+- [deploy 参数](#deploy-参数)
 
-## sft 命令行参数
+## SFT 参数
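 - `--model_type`: The model type to use. Defaults to `None`. If `model_type` is `None` and `model_id_or_path` is not specified either, an exception is raised. If `model_id_or_path` is specified, `model_type` is inferred from `model_id_or_path` and `MODEL_MAPPING`. `model_type` and `model_id_or_path` cannot both be specified. The available `model_type` values are listed in `MODEL_MAPPING.keys()`.
 - `--model_id_or_path`: The `model_id` of the model on the ModelScope Hub, case-insensitive. Defaults to `None`. If `--model_id_or_path` has not been registered, an exception is raised. You can specify the model either through `model_type` or through `model_id_or_path`.
 - `--model_revision`: The version of the corresponding `model_id` on the ModelScope Hub. Defaults to `None`. If `model_revision` is `None`, the revision registered in `MODEL_MAPPING` is used. Otherwise the `model_revision` passed on the command line is always used.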
@@ -92,15 +94,15 @@
 - `--repetition_penalty`: Defaults to `1.05`. This parameter only takes effect when `predict_with_generate` is set to True.
 - `--num_beams`: Defaults to `1`. This parameter only takes effect when `predict_with_generate` is set to True.
 
-## DPO参数
+## DPO 参数
 
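-DPO parameters inherit the SFT parameters above and add the following:
+dpo parameters inherit the sft parameters and add the following:
 
-- `--ref_model_type` Reference model type. The available `model_type` values are listed in `MODEL_MAPPING.keys()`.
-- `--max_prompt_length` Maximum prompt length. This parameter is passed to DPOTrainer so that the prompt length does not exceed this value. Default is 1024.
+- `--ref_model_type`: Type of the reference model. The available `model_type` values are listed in `MODEL_MAPPING.keys()`.
+- `--max_prompt_length`: Maximum prompt length. This parameter is passed to DPOTrainer so that the prompt length does not exceed this value. Default is `1024`.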
 
 
-## merge-lora infer app-ui 命令行参数
+## merge-lora infer 参数
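 - `--model_type`: Defaults to `None`. For details, see the SFT parameters section.
 - `--model_id_or_path`: Defaults to `None`. For details, see the SFT parameters section. We recommend specifying the model through model_type.
 - `--model_revision`: Defaults to `None`. For details, see the SFT parameters section. If `model_id_or_path` is None or a local model directory, this parameter has no effect.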
@@ -142,14 +144,23 @@ DPO参数继承了上面的SFT参数, 除此之外增加了以下参数:
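 - `--save_safetensors`: Whether to save as a `safetensors` file or a `bin` file. Defaults to `True`.
 - `--overwrite_generation_config`: Whether to save the generation_config used for evaluation as a `generation_config.json` file. Defaults to `None`: if `ckpt_dir` is specified it is set to `True`, otherwise to `False`. The generation_config file saved during training will be overwritten.
 - `--verbose`: If set to False, inference uses tqdm-style progress output. If set to True, the query, response, and label of each inference are printed. Defaults to `None`, which selects automatically: False when `len(val_dataset) >= 100`, otherwise True. This parameter only takes effect when evaluating on a dataset.
-- `--share`: Passed to gradio's `demo.queue().launch(...)` function. This parameter only takes effect when using `app-ui`.
 - `--gpu_memory_utilization`: A parameter for initializing the vllm engine's `EngineArgs`. Defaults to `0.9`. This parameter only takes effect when using vllm.
 - `--tensor_parallel_size`: A parameter for initializing the vllm engine's `EngineArgs`. Defaults to `1`. This parameter only takes effect when using vllm.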
 
 
-## deploy 命令行参数
+## app-ui 参数
+
+app-ui parameters inherit the infer parameters and add the following:
+
+- `--server_name`: Defaults to `'127.0.0.1'`. Passed to gradio's `demo.queue().launch(...)` function.
+- `--server_port`: Defaults to `7860`. Passed to gradio's `demo.queue().launch(...)` function.
+- `--share`: Defaults to `False`. Passed to gradio's `demo.queue().launch(...)` function.
+
+## deploy 参数
+
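+deploy parameters inherit the infer parameters and add the following:
+
 - `--host`: Defaults to `'127.0.0.1'`.
 - `--port`: Defaults to `8000`.
 - `--ssl_keyfile`: Defaults to `None`.
 - `--ssl_certfile`: Defaults to `None`.
-- Other parameters are inherited from the infer command-line parameters.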
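
The inheritance described in the app-ui and deploy parameter sections above can be sketched with plain dataclasses. This is a minimal illustration, not the library's implementation: the `*Sketch` class names are hypothetical stand-ins, only two of the infer parameters are shown, and the defaults are the ones listed in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InferArgumentsSketch:
    """Hypothetical stand-in for the shared infer parameters (only two shown)."""
    model_type: Optional[str] = None
    ckpt_dir: Optional[str] = None


@dataclass
class AppUIArgumentsSketch(InferArgumentsSketch):
    """Infer parameters plus the three app-ui additions, with the documented defaults."""
    server_name: str = '127.0.0.1'
    server_port: int = 7860
    share: bool = False


@dataclass
class DeployArgumentsSketch(InferArgumentsSketch):
    """Infer parameters plus the four deploy additions, with the documented defaults."""
    host: str = '127.0.0.1'
    port: int = 8000
    ssl_keyfile: Optional[str] = None
    ssl_certfile: Optional[str] = None


# Every infer field is still accepted; only the UI/server fields are new.
app_ui_args = AppUIArgumentsSketch(model_type='qwen-7b-chat', server_port=7861)
deploy_args = DeployArgumentsSketch(model_type='qwen-7b-chat')
print(app_ui_args)
print(deploy_args)
```

Splitting the classes this way keeps UI-only options such as `share` out of the plain infer parameters, which matches the removal of `--share` from the infer list in this hunk.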
diff --git a/docs/source/LLM/支持的模型和数据集.md b/docs/source/LLM/支持的模型和数据集.md
@@ -55,6 +55,8 @@
 |yi-34b-chat|[01ai/Yi-34B-Chat](https://modelscope.cn/models/01ai/Yi-34B-Chat/summary)|q_proj, k_proj, v_proj|yi|&#x2714;|&#x2714;||
 |deepseek-7b|[deepseek-ai/deepseek-llm-7b-base](https://modelscope.cn/models/deepseek-ai/deepseek-llm-7b-base/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2714;|&#x2714;||
 |deepseek-7b-chat|[deepseek-ai/deepseek-llm-7b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-llm-7b-chat/summary)|q_proj, k_proj, v_proj|deepseek|&#x2714;|&#x2714;||
+|deepseek-moe-16b|[deepseek-ai/deepseek-moe-16b-base](https://modelscope.cn/models/deepseek-ai/deepseek-moe-16b-base/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2714;|&#x2718;||
+|deepseek-moe-16b-chat|[deepseek-ai/deepseek-moe-16b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-moe-16b-chat/summary)|q_proj, k_proj, v_proj|deepseek|&#x2714;|&#x2718;||
 |deepseek-67b|[deepseek-ai/deepseek-llm-67b-base](https://modelscope.cn/models/deepseek-ai/deepseek-llm-67b-base/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2714;|&#x2714;||
 |deepseek-67b-chat|[deepseek-ai/deepseek-llm-67b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-llm-67b-chat/summary)|q_proj, k_proj, v_proj|deepseek|&#x2714;|&#x2714;||
 |openbuddy-llama2-13b-chat|[OpenBuddy/openbuddy-llama2-13b-v8.1-fp16](https://modelscope.cn/models/OpenBuddy/openbuddy-llama2-13b-v8.1-fp16/summary)|q_proj, k_proj, v_proj|openbuddy|&#x2714;|&#x2714;||
@@ -64,10 +66,10 @@
 |openbuddy-zephyr-7b-chat|[OpenBuddy/openbuddy-zephyr-7b-v14.1](https://modelscope.cn/models/OpenBuddy/openbuddy-zephyr-7b-v14.1/summary)|q_proj, k_proj, v_proj|openbuddy|&#x2714;|&#x2714;|transformers>=4.34|
 |openbuddy-deepseek-67b-chat|[OpenBuddy/openbuddy-deepseek-67b-v15.2](https://modelscope.cn/models/OpenBuddy/openbuddy-deepseek-67b-v15.2/summary)|q_proj, k_proj, v_proj|openbuddy|&#x2714;|&#x2714;||
 |mistral-7b|[AI-ModelScope/Mistral-7B-v0.1](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-v0.1/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2714;|&#x2714;|transformers>=4.34|
-|mistral-7b-chat|[AI-ModelScope/Mistral-7B-Instruct-v0.1](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.1/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;|transformers>=4.34|
-|mistral-7b-chat-v2|[AI-ModelScope/Mistral-7B-Instruct-v0.2](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.2/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;|transformers>=4.34|
-|mixtral-7b-moe|[AI-ModelScope/Mixtral-8x7B-v0.1](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-v0.1/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2714;|&#x2714;|transformers>=4.36|
-|mixtral-7b-moe-chat|[AI-ModelScope/Mixtral-8x7B-Instruct-v0.1](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-Instruct-v0.1/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;|transformers>=4.36|
+|mistral-7b-instruct|[AI-ModelScope/Mistral-7B-Instruct-v0.1](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.1/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;|transformers>=4.34|
+|mistral-7b-instruct-v2|[AI-ModelScope/Mistral-7B-Instruct-v0.2](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.2/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;|transformers>=4.34|
+|mixtral-moe-7b|[AI-ModelScope/Mixtral-8x7B-v0.1](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-v0.1/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2714;|&#x2714;|transformers>=4.36|
+|mixtral-moe-7b-instruct|[AI-ModelScope/Mixtral-8x7B-Instruct-v0.1](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-Instruct-v0.1/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;|transformers>=4.36|
 |baichuan-7b|[baichuan-inc/baichuan-7B](https://modelscope.cn/models/baichuan-inc/baichuan-7B/summary)|W_pack|default-generation|&#x2718;|&#x2714;|transformers<4.34|
 |baichuan-13b|[baichuan-inc/Baichuan-13B-Base](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Base/summary)|W_pack|default-generation|&#x2718;|&#x2714;|transformers<4.34|
 |baichuan-13b-chat|[baichuan-inc/Baichuan-13B-Chat](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Chat/summary)|W_pack|baichuan|&#x2718;|&#x2714;|transformers<4.34|
@@ -104,11 +106,11 @@
 |tongyi-finance-14b-chat-int4|[TongyiFinance/Tongyi-Finance-14B-Chat-Int4](https://modelscope.cn/models/TongyiFinance/Tongyi-Finance-14B-Chat-Int4/summary)|c_attn|qwen|&#x2714;|&#x2718;|auto_gptq>=0.5|
 |codefuse-codellama-34b-chat|[codefuse-ai/CodeFuse-CodeLlama-34B](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B/summary)|q_proj, k_proj, v_proj|codefuse-codellama|&#x2714;|&#x2714;||
 |deepseek-coder-1_3b|[deepseek-ai/deepseek-coder-1.3b-base](https://modelscope.cn/models/deepseek-ai/deepseek-coder-1.3b-base/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2714;|&#x2714;||
-|deepseek-coder-1_3b-chat|[deepseek-ai/deepseek-coder-1.3b-instruct](https://modelscope.cn/models/deepseek-ai/deepseek-coder-1.3b-instruct/summary)|q_proj, k_proj, v_proj|deepseek-coder|&#x2714;|&#x2714;||
+|deepseek-coder-1_3b-instruct|[deepseek-ai/deepseek-coder-1.3b-instruct](https://modelscope.cn/models/deepseek-ai/deepseek-coder-1.3b-instruct/summary)|q_proj, k_proj, v_proj|deepseek-coder|&#x2714;|&#x2714;||
 |deepseek-coder-6_7b|[deepseek-ai/deepseek-coder-6.7b-base](https://modelscope.cn/models/deepseek-ai/deepseek-coder-6.7b-base/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2714;|&#x2714;||
-|deepseek-coder-6_7b-chat|[deepseek-ai/deepseek-coder-6.7b-instruct](https://modelscope.cn/models/deepseek-ai/deepseek-coder-6.7b-instruct/summary)|q_proj, k_proj, v_proj|deepseek-coder|&#x2714;|&#x2714;||
+|deepseek-coder-6_7b-instruct|[deepseek-ai/deepseek-coder-6.7b-instruct](https://modelscope.cn/models/deepseek-ai/deepseek-coder-6.7b-instruct/summary)|q_proj, k_proj, v_proj|deepseek-coder|&#x2714;|&#x2714;||
 |deepseek-coder-33b|[deepseek-ai/deepseek-coder-33b-base](https://modelscope.cn/models/deepseek-ai/deepseek-coder-33b-base/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2714;|&#x2714;||
-|deepseek-coder-33b-chat|[deepseek-ai/deepseek-coder-33b-instruct](https://modelscope.cn/models/deepseek-ai/deepseek-coder-33b-instruct/summary)|q_proj, k_proj, v_proj|deepseek-coder|&#x2714;|&#x2714;||
+|deepseek-coder-33b-instruct|[deepseek-ai/deepseek-coder-33b-instruct](https://modelscope.cn/models/deepseek-ai/deepseek-coder-33b-instruct/summary)|q_proj, k_proj, v_proj|deepseek-coder|&#x2714;|&#x2714;||
 |phi2-3b|[AI-ModelScope/phi-2](https://modelscope.cn/models/AI-ModelScope/phi-2/summary)|Wqkv|default-generation|&#x2714;|&#x2714;||
 |cogagent-chat|[ZhipuAI/cogagent-chat](https://modelscope.cn/models/ZhipuAI/cogagent-chat/summary)|vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense|cogagent|&#x2718;|&#x2718;||
 |cogagent-vqa|[ZhipuAI/cogagent-vqa](https://modelscope.cn/models/ZhipuAI/cogagent-vqa/summary)|vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense|cogagent|&#x2718;|&#x2718;||
@@ -172,5 +174,8 @@
 |ner-jave-zh|[damo/zh_ner-JAVE](https://modelscope.cn/datasets/damo/zh_ner-JAVE/summary)|1266|0|118.3±45.5, min=44, max=223|chat, ner|
 |coco-en|[modelscope/coco_2014_caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary)|414113|40504|298.8±2.8, min=294, max=351|chat, multi-modal, vision|
 |🔥coco-mini-en|[modelscope/coco_2014_caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary)|20000|200|298.8±2.8, min=294, max=339|chat, multi-modal, vision|
+|capcha-images|[AI-ModelScope/captcha-images](https://modelscope.cn/datasets/AI-ModelScope/captcha-images/summary)|6000|2000|29.0±0.0, min=29, max=29|chat, multi-modal, vision|
 |aishell1-zh|[speech_asr/speech_asr_aishell1_trainsets](https://modelscope.cn/datasets/speech_asr/speech_asr_aishell1_trainsets/summary)|134424|7176|152.2±36.8, min=63, max=419|chat, multi-modal, audio|
 |🔥aishell1-mini-zh|[speech_asr/speech_asr_aishell1_trainsets](https://modelscope.cn/datasets/speech_asr/speech_asr_aishell1_trainsets/summary)|14326|200|152.0±35.5, min=74, max=359|chat, multi-modal, audio|
+|stack-exchange-paired|[AI-ModelScope/stack-exchange-paired](https://modelscope.cn/datasets/AI-ModelScope/stack-exchange-paired/summary)|4483004|0|534.5±594.6, min=31, max=56588|hfrl, dpo, pairwise|
+|hh-rlhf|[AI-ModelScope/hh-rlhf](https://modelscope.cn/datasets/AI-ModelScope/hh-rlhf/summary)|42537|2312|163.4±117.7, min=27, max=964|hfrl, dpo, pairwise|
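
Several `model_type` names are renamed in the table rows above, so scripts that still pass the old names may need updating. A minimal sketch of a lookup that maps old names to new ones follows; the mapping is copied from the changed rows in this diff, while `RENAMED_MODEL_TYPES` and `migrate_model_type` are hypothetical helpers and not part of swift.

```python
# Old -> new model_type names, copied from the table rows changed in this diff.
RENAMED_MODEL_TYPES = {
    'mistral-7b-chat': 'mistral-7b-instruct',
    'mistral-7b-chat-v2': 'mistral-7b-instruct-v2',
    'mixtral-7b-moe': 'mixtral-moe-7b',
    'mixtral-7b-moe-chat': 'mixtral-moe-7b-instruct',
    'deepseek-coder-1_3b-chat': 'deepseek-coder-1_3b-instruct',
    'deepseek-coder-6_7b-chat': 'deepseek-coder-6_7b-instruct',
    'deepseek-coder-33b-chat': 'deepseek-coder-33b-instruct',
}


def migrate_model_type(model_type: str) -> str:
    """Return the new name for a renamed model_type, or the input unchanged."""
    return RENAMED_MODEL_TYPES.get(model_type, model_type)


print(migrate_model_type('mistral-7b-chat'))
print(migrate_model_type('qwen-7b-chat'))
```

Names that were not renamed pass through untouched, so the helper can be applied to any existing `--model_type` value.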
diff --git a/docs/source/LLM/自我认知微调最佳实践.md b/docs/source/LLM/自我认知微调最佳实践.md
@@ -260,14 +260,14 @@ CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'qwen-7b-chat/vx-xxx/checkpoint-xx
 import os
 os.environ['CUDA_VISIBLE_DEVICES'] = '0'
 
-from swift.llm import InferArguments, merge_lora_main, app_ui_main
+from swift.llm import AppUIArguments, merge_lora_main, app_ui_main
 
 best_model_checkpoint = 'qwen-7b-chat/vx-xxx/checkpoint-xxx'
-infer_args = InferArguments(
+app_ui_args = AppUIArguments(
     ckpt_dir=best_model_checkpoint,
     eval_human=True)
-# merge_lora_main(infer_args)
-result = app_ui_main(infer_args)
+# merge_lora_main(app_ui_args)
+result = app_ui_main(app_ui_args)
 ```
 
 Using the CLI:

diff --git a/examples/pytorch/llm/app.py b/examples/pytorch/llm/app.py
@@ -3,14 +3,14 @@
 # os.environ['CUDA_VISIBLE_DEVICES'] = '0'
 import custom
 
-from swift.llm import InferArguments, ModelType, app_ui_main
+from swift.llm import AppUIArguments, ModelType, app_ui_main
 
 if __name__ == '__main__':
     # Please refer to the `infer.sh` for setting the parameters.
     # text-generation
-    # args = InferArguments(model_type=ModelType.chatglm3_6b_base)
+    # args = AppUIArguments(model_type=ModelType.chatglm3_6b_base)
     # or chat
-    args = InferArguments(model_type=ModelType.qwen_7b_chat_int4)
+    args = AppUIArguments(model_type=ModelType.qwen_7b_chat_int4)
     # or load from ckpt dir
-    # args = InferArguments(ckpt_dir='xxx/vx_xxx/checkpoint-xxx')
+    # args = AppUIArguments(ckpt_dir='xxx/vx_xxx/checkpoint-xxx')
     app_ui_main(args)
diff --git a/examples/pytorch/llm/scripts/deepseek_moe_16b_chat/lora/infer.sh b/examples/pytorch/llm/scripts/deepseek_moe_16b_chat/lora/infer.sh
@@ -0,0 +1,12 @@
+# Experimental environment: A100
+CUDA_VISIBLE_DEVICES=0 \
+swift infer \
+    --ckpt_dir "output/deepseek-moe-16b-chat/vx_xxx/checkpoint-xxx" \
+    --load_dataset_config true \
+    --max_length 4096 \
+    --use_flash_attn true \
+    --max_new_tokens 2048 \
+    --temperature 0.1 \
+    --top_p 0.7 \
+    --repetition_penalty 1.05 \
+    --do_sample true \
diff --git a/examples/pytorch/llm/scripts/deepseek_moe_16b_chat/lora/sft.sh b/examples/pytorch/llm/scripts/deepseek_moe_16b_chat/lora/sft.sh
@@ -0,0 +1,12 @@
+# Experimental environment: A100
+# 52GB GPU memory
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+    --model_type deepseek-moe-16b-chat \
+    --dataset damo-agent-mini-zh \
+    --train_dataset_sample 20000 \
+    --max_length 4096 \
+    --gradient_checkpointing true \
+    --eval_steps 100 \
+    --use_flash_attn true \
+    --output_dir output \
diff --git a/...ipts/mistral_7b_chat/lora_ddp_ds/infer.sh → .../mistral_7b_instruct/lora_ddp_ds/infer.sh b/...ipts/mistral_7b_chat/lora_ddp_ds/infer.sh → .../mistral_7b_instruct/lora_ddp_ds/infer.sh
@@ -3,7 +3,7 @@
 PYTHONPATH=../../.. \
 CUDA_VISIBLE_DEVICES=0 \
 python llm_infer.py \
-    --ckpt_dir "output/mistral-7b-chat/vx_xxx/checkpoint-xxx" \
+    --ckpt_dir "output/mistral-7b-instruct/vx_xxx/checkpoint-xxx" \
     --load_dataset_config true \
     --max_length 4096 \
     --max_new_tokens 2048 \

diff --git a/...cripts/mistral_7b_chat/lora_ddp_ds/sft.sh → ...ts/mistral_7b_instruct/lora_ddp_ds/sft.sh b/...cripts/mistral_7b_chat/lora_ddp_ds/sft.sh → ...ts/mistral_7b_instruct/lora_ddp_ds/sft.sh
@@ -37,7 +37,7 @@ torchrun \
     --save_total_limit 2 \
     --logging_steps 10 \
     --push_to_hub false \
-    --hub_model_id mistral-7b-chat-lora \
+    --hub_model_id mistral-7b-instruct-lora \
     --hub_private_repo true \
     --hub_token 'your-sdk-token' \
     --deepspeed_config_path 'ds_config/zero2.json' \

diff --git a/...ipts/mistral_7b_chat/lora_mp_ddp/infer.sh → .../mistral_7b_instruct/lora_mp_ddp/infer.sh b/...ipts/mistral_7b_chat/lora_mp_ddp/infer.sh → .../mistral_7b_instruct/lora_mp_ddp/infer.sh
@@ -3,7 +3,7 @@
 PYTHONPATH=../../.. \
 CUDA_VISIBLE_DEVICES=0 \
 python llm_infer.py \
-    --ckpt_dir "output/mistral-7b-chat/vx_xxx/checkpoint-xxx" \
+    --ckpt_dir "output/mistral-7b-instruct/vx_xxx/checkpoint-xxx" \
     --load_dataset_config true \
     --max_length 4096 \
     --max_new_tokens 2048 \

diff --git a/...cripts/mistral_7b_chat/lora_mp_ddp/sft.sh → ...ts/mistral_7b_instruct/lora_mp_ddp/sft.sh b/...cripts/mistral_7b_chat/lora_mp_ddp/sft.sh → ...ts/mistral_7b_instruct/lora_mp_ddp/sft.sh
@@ -37,6 +37,6 @@ torchrun \
     --save_total_limit 2 \
     --logging_steps 10 \
     --push_to_hub false \
-    --hub_model_id mistral-7b-chat-lora \
+    --hub_model_id mistral-7b-instruct-lora \
     --hub_private_repo true \
     --hub_token 'your-sdk-token' \
diff --git a/.../llm/scripts/mixtral_7b_moe/lora/infer.sh → .../llm/scripts/mixtral_moe_7b/lora/infer.sh b/.../llm/scripts/mixtral_7b_moe/lora/infer.sh → .../llm/scripts/mixtral_moe_7b/lora/infer.sh
@@ -3,7 +3,7 @@
 PYTHONPATH=../../.. \
 CUDA_VISIBLE_DEVICES=0,1 \
 python llm_infer.py \
-    --ckpt_dir "output/mixtral-7b-moe/vx_xxx/checkpoint-xxx" \
+    --ckpt_dir "output/mixtral-moe-7b/vx_xxx/checkpoint-xxx" \
     --load_dataset_config true \
     --max_length 2048 \
     --use_flash_attn true \

diff --git a/...ch/llm/scripts/mixtral_7b_moe/lora/sft.sh → ...ch/llm/scripts/mixtral_moe_7b/lora/sft.sh b/...ch/llm/scripts/mixtral_7b_moe/lora/sft.sh → ...ch/llm/scripts/mixtral_moe_7b/lora/sft.sh
diff --git a/...scripts/mixtral_7b_moe_chat/lora/infer.sh → ...pts/mixtral_moe_7b_instruct/lora/infer.sh b/...scripts/mixtral_7b_moe_chat/lora/infer.sh → ...pts/mixtral_moe_7b_instruct/lora/infer.sh
@@ -3,7 +3,7 @@
 PYTHONPATH=../../.. \
 CUDA_VISIBLE_DEVICES=0,1 \
 python llm_infer.py \
-    --ckpt_dir "output/mixtral-7b-moe-chat/vx_xxx/checkpoint-xxx" \
+    --ckpt_dir "output/mixtral-moe-7b-instruct/vx_xxx/checkpoint-xxx" \
     --load_dataset_config true \
     --max_length 2048 \
     --use_flash_attn true \

diff --git a/...m/scripts/mixtral_7b_moe_chat/lora/sft.sh → ...ripts/mixtral_moe_7b_instruct/lora/sft.sh b/...m/scripts/mixtral_7b_moe_chat/lora/sft.sh → ...ripts/mixtral_moe_7b_instruct/lora/sft.sh