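Using the versions listed above, I am training on the sample data from the example at https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/text_classification/multi_class#readme
In CPU mode, the default command trains with only a single thread. To speed up training, I looked for an option and found the enable_auto_parallel parameter. When enable_auto_parallel is set to True, training fails at startup with an error saying that get_rank_by_dim_and_process_id cannot be found.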
Traceback (most recent call last):
  File "train.py", line 230, in <module>
    main()
  File "train.py", line 166, in main
    trainer = Trainer(
  File "/home/user/.local/lib/python3.8/site-packages/paddlenlp/trainer/trainer.py", line 388, in __init__
    self.print_config()
  File "/home/user/.local/lib/python3.8/site-packages/paddlenlp/trainer/trainer.py", line 3058, in print_config
    v = getattr(args, a)
  File "/home/user/.local/lib/python3.8/site-packages/paddlenlp/trainer/training_args.py", line 1524, in data_parallel_rank
    return mesh.get_rank_by_dim_and_process_id("dp", dist.get_rank())
AttributeError: 'ProcessMesh' object has no attribute 'get_rank_by_dim_and_process_id'
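The traceback suggests that the installed Paddle build has a `ProcessMesh` class without the method that PaddleNLP calls, which may point to a version mismatch between the two packages. A minimal sketch for checking this directly is shown below. It is an assumption, not a confirmed diagnosis: the helper `attr_available` is hypothetical, and the sketch assumes `ProcessMesh` can be imported from `paddle.distributed` and that the method is defined on the class itself.

```python
import importlib


def attr_available(module_name, class_name, attr):
    """Report whether a class exposes an attribute.

    Returns True or False when the class can be found, and None when the
    module cannot be imported or does not contain the class.
    (Hypothetical helper written for this check; not part of Paddle.)
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    cls = getattr(module, class_name, None)
    if cls is None:
        return None
    return hasattr(cls, attr)


if __name__ == "__main__":
    # Assumption: ProcessMesh is exported from paddle.distributed.
    result = attr_available(
        "paddle.distributed",
        "ProcessMesh",
        "get_rank_by_dim_and_process_id",
    )
    if result is None:
        print("paddle.distributed.ProcessMesh could not be imported")
    elif result:
        print("ProcessMesh provides get_rank_by_dim_and_process_id")
    else:
        print("ProcessMesh lacks get_rank_by_dim_and_process_id")
```

If the last message is printed, comparing the installed `paddlepaddle` and `paddlenlp` versions against each other would be a reasonable next step.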
Software environment
Duplicate issues
Error description
Steps to reproduce & code
Enable the enable_auto_parallel parameter when training:
python3 train.py \
    --do_train \
    --do_eval \
    --do_export \
    --model_name_or_path ernie-3.0-tiny-medium-v2-zh \
    --output_dir checkpoint \
    --device cpu \
    --num_train_epochs 100 \
    --early_stopping True \
    --early_stopping_patience 5 \
    --learning_rate 3e-5 \
    --max_length 128 \
    --per_device_eval_batch_size 32 \
    --per_device_train_batch_size 32 \
    --metric_for_best_model accuracy \
    --load_best_model_at_end \
    --logging_steps 5 \
    --evaluation_strategy epoch \
    --save_strategy epoch \
    --save_total_limit 3 \
    --enable_auto_parallel True