Merged
6 changes: 2 additions & 4 deletions docs/source/GetStarted/快速开始.md
@@ -7,10 +7,8 @@ SWIFT is an integrated framework combining model training, inference deployment, evaluation, and quantization
- Task types: in addition to general generative tasks, training for classification tasks is supported
- Lightweight fine-tuning: supports many lightweight fine-tuning methods such as LoRA, QLoRA, DoRA, ReFT, LLaMAPro, Adapter, SCEdit, GaLore, and Liger-Kernel
- Training stages: covers all stages of pre-training, fine-tuning, and human alignment
-- Training parallelism: covers single-node single-GPU, single-node multi-GPU device_map, distributed data parallel (DDP), multi-node multi-GPU, DeepSpeed, FSDP, PAI DLC, etc., with additional support for training models on the Megatron architecture
-  - Additional support for [TorchAcc](https://github.imc.re/AlibabaPAI/torchacc) training acceleration
-  - Additional support for sequence parallelism based on [XTuner](https://github.com/InternLM/xtuner)
-- Inference deployment: supports inference deployment with multiple inference frameworks such as PyTorch, vLLM, and LmDeploy, directly usable in Docker images or Kubernetes environments
+- Training parallelism: covers single-node single-GPU, single-node multi-GPU device_map, distributed data parallel (DDP), multi-node multi-GPU, DeepSpeed, FSDP, PAI DLC, etc.
+- Inference deployment: supports inference deployment with multiple inference frameworks such as PyTorch, vLLM, and LmDeploy
- Evaluation: supports plain-text and multimodal evaluation built on the EvalScope framework, with support for custom evaluations
- Export: supports quantization methods such as awq, gptq, and bnb, as well as merge operations for lora and llamapro
- Web UI: supports UI-based operation built on the Gradio framework, including deploying a single-model application to a space or demo environment
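The lightweight fine-tuning path above is driven through the `swift` CLI. As a minimal sketch (flag names assume the ms-swift 3.x interface, e.g. `--train_type` and `--lora_rank`; the model and dataset below are placeholders to swap for your own):

```bash
# Minimal LoRA fine-tune sketch; flags assume the ms-swift 3.x CLI.
swift sft \
    --model Qwen/Qwen2-7B-Instruct \
    --dataset AI-ModelScope/alpaca-gpt4-data-en \
    --train_type lora \
    --lora_rank 8 \
    --lora_alpha 32 \
    --num_train_epochs 1 \
    --output_dir output
```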
10 changes: 1 addition & 9 deletions docs/source/Instruction/命令行参数.md
@@ -239,14 +239,6 @@ Vera uses the three parameters `target_modules`, `target_regex`, and `modules_to_save`.

- use_liger: use liger-kernel for training.

-### TorchAcc parameters
-
-- model_layer_cls_name: class name of the Decoder layer
-- metric_warmup_step: number of warmup steps for TorchAcc, default 1
-- fsdp_num: number of FSDP instances, default 1
-- acc_steps: interval, in steps, at which accuracy is evaluated during training, default 1
-
-
### LMDeploy parameters
The parameter meanings can be found in the [lmdeploy documentation](https://lmdeploy.readthedocs.io/en/latest/api/pipeline.html#turbomindengineconfig)

@@ -281,7 +273,7 @@ Vera uses the three parameters `target_modules`, `target_regex`, and `modules_to_save`.
## Integration parameters

### Training parameters
-In addition to the [basic parameters](#基本参数), [Seq2SeqTrainer parameters](#Seq2SeqTrainer参数), [tuner parameters](#tuner参数), and [torchacc parameters](#torchacc参数), the training parameters include the following:
+In addition to the [basic parameters](#基本参数), [Seq2SeqTrainer parameters](#Seq2SeqTrainer参数), and [tuner parameters](#tuner参数), the training parameters include the following:

- add_version: adds an extra `'<version>-<timestamp>'` directory under output_dir to prevent weights from being overwritten, default True
- resume_only_model: if resume_from_checkpoint is set, resume only the model weights, default False
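A rough sketch of `resume_only_model` in use (both flags are documented above; the checkpoint path is a hypothetical placeholder for your own output directory):

```bash
# Resume a previous run but load only the model weights,
# skipping optimizer and scheduler state.
# The checkpoint path is a placeholder; point it at your own run.
swift sft \
    --model Qwen/Qwen2-7B-Instruct \
    --dataset AI-ModelScope/alpaca-gpt4-data-en \
    --train_type lora \
    --resume_from_checkpoint output/v0-20240101-000000/checkpoint-100 \
    --resume_only_model true
```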
1 change: 0 additions & 1 deletion docs/source/Instruction/预训练及微调.md
@@ -30,7 +30,6 @@ The Megatron example is not yet officially supported; support is expected within this iteration.
- packing: concatenates multiple sequences into one so that each training sample gets as close as possible to the configured max_length, improving GPU utilization; see [here](https://github.com/modelscope/swift/blob/main/examples/train/packing/train.sh)
- Streaming training: reads data continuously, reducing memory usage when the dataset is large; see [here](https://github.com/modelscope/swift/blob/main/examples/train/streaming/train.sh)
- lazy tokenize: suited to loading a fixed dataset in one pass while parsing images at training time; see [here](https://github.com/modelscope/swift/blob/main/examples/train/lazy_tokenize/train.sh)
-- torchacc: suited to speeding up training when packing to a fixed length; see [here](https://github.com/modelscope/swift/blob/main/examples/train/torchacc)
- Agent training: see [here](https://github.com/modelscope/swift/blob/main/examples/train/agent)


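A sketch combining packing with streaming (the `--packing` and `--streaming` flags follow the linked train.sh examples; model and dataset are placeholders, and with a streamed dataset the run length is bounded by `--max_steps` rather than epochs):

```bash
# Packing + streaming sketch: pack samples up to max_length
# while reading the dataset as a stream.
swift sft \
    --model Qwen/Qwen2-7B-Instruct \
    --dataset AI-ModelScope/alpaca-gpt4-data-en \
    --train_type lora \
    --packing true \
    --streaming true \
    --max_length 4096 \
    --max_steps 1000
```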
6 changes: 2 additions & 4 deletions docs/source_en/GetStarted/Quick-start.md
@@ -7,10 +7,8 @@ SWIFT is an integrated framework that encompasses model training, inference depl
- Task Types: Besides general generative tasks, it supports training for classification tasks.
- Lightweight Fine-tuning: Supports various lightweight fine-tuning methods such as LoRA, QLoRA, DoRA, ReFT, LLaMAPro, Adapter, SCEdit, GaLore, and Liger-Kernel.
- Training stages: Covering the entire stages of pre-training, fine-tuning, and human alignment.
-- Training Parallelism: Covers single machine single card, single machine multiple card device mapping, distributed data parallelism (DDP), multi-machine multi-card, DeepSpeed, FSDP, PAI DLC, and supports training for models based on the Megatron architecture.
-  - Extra support for [TorchAcc](https://github.imc.re/AlibabaPAI/torchacc) training acceleration.
-  - Extra support for sequence parallelism based on [XTuner](https://github.com/InternLM/xtuner).
-- Inference Deployment: Supports inference deployment on multiple frameworks such as PyTorch, vLLM, LmDeploy, which can be directly applied in Docker images or Kubernetes environments.
+- Training Parallelism: Covers single machine single card, single machine multiple card device mapping, distributed data parallelism (DDP), multi-machine multi-card, DeepSpeed, FSDP, PAI DLC.
+- Inference Deployment: Supports inference deployment on multiple frameworks such as PyTorch, vLLM, LmDeploy.
- Evaluation: Supports pure text and multi-modal evaluation capabilities based on the EvalScope framework, and allows for customized evaluation.
- Export: Supports quantization methods like awq, gptq, bnb, and operations for merging lora and llamapro.
- User Interface: Supports interface operations based on the Gradio framework and allows for the deployment of single model applications in space or demo environments.
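As a sketch of the inference-deployment path (assuming the `swift deploy` entry point with the `--infer_backend` switch; the model and port are placeholders):

```bash
# Serve a model behind an OpenAI-compatible endpoint using the vLLM backend;
# swap --infer_backend for pt or lmdeploy as needed.
swift deploy \
    --model Qwen/Qwen2-7B-Instruct \
    --infer_backend vllm \
    --port 8000
```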
9 changes: 1 addition & 8 deletions docs/source_en/Instruction/Command-line-parameters.md
@@ -244,13 +244,6 @@ The following parameters are effective when `train_type` is set to `reft`.

- use_liger: Use liger-kernel for training.

-### TorchAcc Arguments
-
-- model_layer_cls_name: Class name of Decoder layer.
-- metric_warmup_step: Warmup steps for TorchAcc, default is 1.
-- fsdp_num: Number of FSDP, default is 1.
-- acc_steps: Number of steps for evaluating accuracy during training, default is 1.
-
### LMDeploy Arguments

Parameter meanings can be found in the [lmdeploy documentation](https://lmdeploy.readthedocs.io/en/latest/api/pipeline.html#turbomindengineconfig).
@@ -286,7 +279,7 @@ Parameter meanings can be found in the [vllm documentation](https://docs.vllm.ai

### Training Arguments

-Training arguments include the [base arguments](#base-arguments), [Seq2SeqTrainer arguments](#Seq2SeqTrainer-arguments), [tuner arguments](#tuner-arguments), [torchacc arguments](#torchacc-arguments), and also include the following parts:
+Training arguments include the [base arguments](#base-arguments), [Seq2SeqTrainer arguments](#Seq2SeqTrainer-arguments), [tuner arguments](#tuner-arguments), and also include the following parts:

- add_version: Add directory to output_dir with `'<version>-<timestamp>'` to prevent weight overwrite, default is True.
- resume_only_model: If resume_from_checkpoint, only resume model weights, default is False.
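A small sketch of how `add_version` shapes the output directory (behavior inferred from the description above; the exact version and timestamp format is illustrative):

```bash
# With add_version true (the default), weights land in a subdirectory like
# output/v0-20240101-120000/ so reruns never overwrite each other;
# pass false to write directly into --output_dir.
swift sft \
    --model Qwen/Qwen2-7B-Instruct \
    --dataset AI-ModelScope/alpaca-gpt4-data-en \
    --train_type lora \
    --output_dir output \
    --add_version false
```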
1 change: 0 additions & 1 deletion docs/source_en/Instruction/Pre-training-and-Fine-tuning.md
@@ -30,7 +30,6 @@ Additionally, other technologies and examples supported by SWIFT include:
- **Packing**: This combines multiple sequences into one, helping each sample to approach the set max_length during training, improving GPU utilization. See [here](https://github.com/modelscope/swift/blob/main/examples/train/packing/train.sh).
- **Streaming Training**: This method continuously reads data, reducing memory usage when handling large datasets. Check [here](https://github.com/modelscope/swift/blob/main/examples/train/streaming/train.sh) for details.
- **Lazy Tokenization**: Suitable for scenarios where a fixed amount of data is read in at once, and images are parsed during training. Refer to [here](https://github.com/modelscope/swift/blob/main/examples/train/lazy_tokenize/train.sh).
-- **torchacc**: This aids in speeding up training when packing to fixed lengths. More information can be found [here](https://github.com/modelscope/swift/blob/main/examples/train/torchacc).
- **Agent Training**: For more details, see [here](https://github.com/modelscope/swift/blob/main/examples/train/agent).
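A rough lazy-tokenization sketch (the `--lazy_tokenize` flag follows the linked example script; the multimodal model and caption dataset are placeholders):

```bash
# Lazy tokenization sketch: samples are tokenized, and images decoded,
# on the fly during training instead of being pre-processed up front.
swift sft \
    --model Qwen/Qwen2-VL-7B-Instruct \
    --dataset AI-ModelScope/coco_2014_caption \
    --train_type lora \
    --lazy_tokenize true
```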

**Tips**:
35 changes: 0 additions & 35 deletions examples/train/torchacc/baichuan2_13b_chat/acc_lora_dp_sft.sh

This file was deleted.

35 changes: 0 additions & 35 deletions examples/train/torchacc/baichuan2_13b_chat/acc_lora_fsdp_sft.sh

This file was deleted.

29 changes: 0 additions & 29 deletions examples/train/torchacc/baichuan2_13b_chat/swift_lora_sft.sh

This file was deleted.

36 changes: 0 additions & 36 deletions examples/train/torchacc/chatglm3_6b/acc_lora_dp_sft.sh

This file was deleted.

36 changes: 0 additions & 36 deletions examples/train/torchacc/chatglm3_6b/acc_lora_fsdp_sft.sh

This file was deleted.

28 changes: 0 additions & 28 deletions examples/train/torchacc/chatglm3_6b/swift_lora_sft.sh

This file was deleted.

36 changes: 0 additions & 36 deletions examples/train/torchacc/llama2_13b_chat/acc_lora_dp_sft.sh

This file was deleted.

37 changes: 0 additions & 37 deletions examples/train/torchacc/llama2_13b_chat/acc_lora_fsdp_sft.sh

This file was deleted.

28 changes: 0 additions & 28 deletions examples/train/torchacc/llama2_13b_chat/swift_lora_sft.sh

This file was deleted.
