[Bug]: get_rank_by_dim_and_process_id function is not implemented #8428

Closed
1 task done
jazzly opened this issue May 13, 2024 · 2 comments
Labels: bug, stale

Comments

jazzly commented May 13, 2024

Software environment

- paddlepaddle: 2.6.1
- paddlepaddle-gpu: 
- paddlenlp: 2.8.0

Duplicate issues

  • I have searched the existing issues

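Error description

I used the versions above to train on the example data from https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/text_classification/multi_class#readme
In CPU mode, the default command trains with a single thread only. To speed up training, I looked through the options and found the enable_auto_parallel parameter. With enable_auto_parallel set to True, training fails at startup with an error saying that get_rank_by_dim_and_process_id cannot be found.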

Traceback (most recent call last):
  File "train.py", line 230, in <module>
    main()
  File "train.py", line 166, in main
    trainer = Trainer(
  File "/home/user/.local/lib/python3.8/site-packages/paddlenlp/trainer/trainer.py", line 388, in __init__
    self.print_config()
  File "/home/user/.local/lib/python3.8/site-packages/paddlenlp/trainer/trainer.py", line 3058, in print_config
    v = getattr(args, a)
  File "/home/user/.local/lib/python3.8/site-packages/paddlenlp/trainer/training_args.py", line 1524, in data_parallel_rank
    return mesh.get_rank_by_dim_and_process_id("dp", dist.get_rank())
AttributeError: 'ProcessMesh' object has no attribute 'get_rank_by_dim_and_process_id'
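
One possible reading of the traceback above is that the installed paddlepaddle build does not provide the method that this paddlenlp version calls on ProcessMesh; this is an assumption, not something confirmed in this thread. A minimal sketch of a capability check that would surface such a mismatch with a clearer message follows. OldMesh and NewMesh are hypothetical stand-ins so the snippet runs without Paddle installed, and has_mesh_method and safe_data_parallel_rank are illustrative helpers, not part of either library.

```python
METHOD_NAME = "get_rank_by_dim_and_process_id"


def has_mesh_method(mesh, name=METHOD_NAME):
    """Return True if the mesh object exposes a callable attribute `name`."""
    return callable(getattr(mesh, name, None))


class OldMesh:
    """Hypothetical stand-in for a ProcessMesh that lacks the method."""


class NewMesh:
    """Hypothetical stand-in for a ProcessMesh that has the method."""

    def get_rank_by_dim_and_process_id(self, dim, process_id):
        # Placeholder return value; this is not Paddle's real logic.
        return 0


def safe_data_parallel_rank(mesh, process_id, dim="dp"):
    """Call the method only if it exists; otherwise fail with a clear hint."""
    if not has_mesh_method(mesh):
        raise RuntimeError(
            f"{type(mesh).__name__} has no {METHOD_NAME}; the installed "
            "framework version may not match what the trainer expects."
        )
    return mesh.get_rank_by_dim_and_process_id(dim, process_id)
```

Running the same has_mesh_method check against a real ProcessMesh instance in the affected environment would show whether the attribute is missing there, before the trainer reaches print_config.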

Steps to reproduce & code

Enable the enable_auto_parallel parameter when training:
python3 train.py \
    --do_train \
    --do_eval \
    --do_export \
    --model_name_or_path ernie-3.0-tiny-medium-v2-zh \
    --output_dir checkpoint \
    --device cpu \
    --num_train_epochs 100 \
    --early_stopping True \
    --early_stopping_patience 5 \
    --learning_rate 3e-5 \
    --max_length 128 \
    --per_device_eval_batch_size 32 \
    --per_device_train_batch_size 32 \
    --metric_for_best_model accuracy \
    --load_best_model_at_end \
    --logging_steps 5 \
    --evaluation_strategy epoch \
    --save_strategy epoch \
    --save_total_limit 3 \
    --enable_auto_parallel True


This issue is stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the stale label Jul 13, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as not planned on Jul 28, 2024