[Bug] Error on LLaMA-2-Chat + LocalRunner + Debug #179

Closed
Leymore opened this issue Aug 9, 2023 · 0 comments · Fixed by #238
Leymore (Collaborator) commented Aug 9, 2023

Describe the bug

configs/eval_llama2_7b_chat.py

from mmengine.config import read_base

with read_base():
    from .datasets.piqa.piqa_ppl import piqa_datasets
    from .datasets.siqa.siqa_gen import siqa_datasets
    from .models.llama2_7b_chat import models


datasets = [*piqa_datasets, *siqa_datasets]

Running the command

python3 run.py configs/eval_llama2_7b_chat.py --debug

gives the following output:

Traceback (most recent call last):
  File "/cpfs01/user/zhoufengzhe/anaconda3/envs/pjeval-deploy/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 122, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/cpfs01/user/zhoufengzhe/repos/pjeval/opencompass/models/llama2.py", line 36, in __init__
    self._load_model(path=path,
  File "/cpfs01/user/zhoufengzhe/repos/pjeval/opencompass/models/llama2.py", line 50, in _load_model
    self.generator = Llama.build(path, tokenizer_path, max_seq_len,
  File "/cpfs01/user/zhoufengzhe/llama/llama/generation.py", line 62, in build
    torch.distributed.init_process_group("nccl")
  File "/cpfs01/user/zhoufengzhe/anaconda3/envs/pjeval-deploy/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 754, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/cpfs01/user/zhoufengzhe/anaconda3/envs/pjeval-deploy/lib/python3.8/site-packages/torch/distributed/rendezvous.py", line 236, in _env_rendezvous_handler
    rank = int(_get_env_or_raise("RANK"))
  File "/cpfs01/user/zhoufengzhe/anaconda3/envs/pjeval-deploy/lib/python3.8/site-packages/torch/distributed/rendezvous.py", line 221, in _get_env_or_raise
    raise _env_error(env_var)
ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run.py", line 339, in <module>
    main()
  File "run.py", line 214, in main
    exec_infer_runner(tasks, args, cfg)
  File "run.py", line 309, in exec_infer_runner
    runner(tasks)
  File "/cpfs01/user/zhoufengzhe/repos/pjeval/opencompass/runners/base.py", line 38, in __call__
    status = self.launch(tasks)
  File "/cpfs01/user/zhoufengzhe/repos/pjeval/opencompass/runners/local.py", line 57, in launch
    task.run()
  File "/cpfs01/user/zhoufengzhe/repos/pjeval/opencompass/tasks/openicl_infer.py", line 60, in run
    self.model = build_model_from_cfg(model_cfg)
  File "/cpfs01/user/zhoufengzhe/repos/pjeval/opencompass/utils/build.py", line 22, in build_model_from_cfg
    return MODELS.build(model_cfg)
  File "/cpfs01/user/zhoufengzhe/anaconda3/envs/pjeval-deploy/lib/python3.8/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/cpfs01/user/zhoufengzhe/anaconda3/envs/pjeval-deploy/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 144, in build_from_cfg
    raise type(e)(
ValueError: class `Llama2Chat` in opencompass/models/llama2.py: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set
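
The first exception comes from the env:// rendezvous in torch.distributed, which reads RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT from the environment. As a rough sketch of a single-process workaround (not the fix that was merged for this issue), the missing variables could be detected and filled in before the model is built. The helper names `missing_dist_env` and `fill_single_process_defaults` and the default values, including the port, are assumptions made for illustration; they are not part of OpenCompass or llama.

```python
import os

# Variables read by torch.distributed's env:// rendezvous.
REQUIRED_DIST_ENV = ("RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT")

# Assumed defaults for one process on one machine; the port is arbitrary.
SINGLE_PROCESS_DEFAULTS = {
    "RANK": "0",
    "WORLD_SIZE": "1",
    "MASTER_ADDR": "127.0.0.1",
    "MASTER_PORT": "29500",
}


def missing_dist_env(env=None):
    """Return the rendezvous variables that are not set in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_DIST_ENV if name not in env]


def fill_single_process_defaults(env=None):
    """Set each missing rendezvous variable to its single-process default."""
    env = os.environ if env is None else env
    for name, value in SINGLE_PROCESS_DEFAULTS.items():
        # setdefault keeps any value a launcher has already exported.
        env.setdefault(name, value)
    return env
```

Calling `fill_single_process_defaults()` before the model is constructed would only help a one-GPU debug run; a multi-GPU model would still need a real launcher to start one process per rank.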

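This is caused by LocalRunner calling task.run() directly in debug mode instead of going through the usual "build the command, then run it in a subprocess" path, so torch.distributed is initialized in a process where RANK was never set.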

https://github.com/InternLM/opencompass/blob/e6194df29ef174737f09498169513554b2737dd2/opencompass/runners/local.py#L54-L58
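
The difference between the two launch paths can be illustrated with a small, self-contained sketch. It does not reproduce the OpenCompass runner: `run_in_subprocess` and `TASK_CODE` are hypothetical names, and the child process here only prints the RANK it sees. The point is that a command run in a subprocess can be given the distributed environment it needs, whereas a direct in-process call only sees whatever the parent shell exported.

```python
import os
import subprocess
import sys


def run_in_subprocess(code, extra_env):
    """Run `code` in a child Python process with `extra_env` added to its environment."""
    env = dict(os.environ, **extra_env)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Hypothetical stand-in for a task: it only reports the RANK it can see.
TASK_CODE = "import os; print(os.environ.get('RANK', 'unset'))"

# The child receives RANK and WORLD_SIZE even if the parent never exported them.
rank_in_child = run_in_subprocess(TASK_CODE, {"RANK": "0", "WORLD_SIZE": "1"})
```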

Environment

Nah

Other information

No response

Leymore assigned Leymore and unassigned yingfhu Aug 10, 2023
gaotongxiao self-assigned this Aug 21, 2023
gaotongxiao linked a pull request Aug 21, 2023 that will close this issue