Run 'inference.py' and 'model parallel group is not initialized' #86

Open
ildartregulov opened this issue Apr 20, 2023 · 7 comments


ildartregulov commented Apr 20, 2023

~/GPT/pyllama_data/pyllama$ python inference.py --ckpt_dir $CKPT_DIR --tokenizer_path $TOKENIZER_PATH
Traceback (most recent call last):
  File "/home/ildar/GPT/pyllama_data/pyllama/inference.py", line 82, in <module>
    run(
  File "/home/ildar/GPT/pyllama_data/pyllama/inference.py", line 50, in run
    generator = load(
  File "/home/ildar/GPT/pyllama_data/pyllama/inference.py", line 33, in load
    model = Transformer(model_args)
  File "/home/ildar/GPT/pyllama_data/pyllama/llama/model_parallel.py", line 217, in __init__
    self.tok_embeddings = ParallelEmbedding(
  File "/home/ildar/.local/lib/python3.10/site-packages/fairscale/nn/model_parallel/layers.py", line 186, in __init__
    world_size = get_model_parallel_world_size()
  File "/home/ildar/.local/lib/python3.10/site-packages/fairscale/nn/model_parallel/initialize.py", line 152, in get_model_parallel_world_size
    return torch.distributed.get_world_size(group=get_model_parallel_group())
  File "/home/ildar/.local/lib/python3.10/site-packages/fairscale/nn/model_parallel/initialize.py", line 128, in get_model_parallel_group
    assert _MODEL_PARALLEL_GROUP is not None, "model parallel group is not initialized"
AssertionError: model parallel group is not initialized

I'm using two NVIDIA 1080 Ti cards and trying to run the 7B model.
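
For context: the assertion fires because fairscale's model-parallel layers look up a process group that only exists after initialize_model_parallel() has been called, and, judging by the traceback, inference.py reaches the Transformer constructor before any such call. A minimal sketch of the failing pattern (the dimensions here are illustrative, not the real 7B sizes):

    import torch
    from fairscale.nn.model_parallel.layers import ParallelEmbedding

    # Constructing any fairscale model-parallel layer before
    # torch.distributed.init_process_group() and initialize_model_parallel()
    # have run raises the same assertion as in the traceback above.
    emb = ParallelEmbedding(32000, 4096)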


wangshuaiwu commented Apr 26, 2023

Me too. I was able to run it before, but when I tried it again today, this problem appeared.

@JunLiangZ

Have you solved it? I have the same problem.


wangshuaiwu commented May 15, 2023

My solution was to compare it against the official code and adapt mine to match. Here is the official repo:
https://github.com/facebookresearch/llama
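
For reference, the official repo initializes torch.distributed and fairscale's model-parallel group before constructing the model, roughly like this (a sketch of the pattern, assuming a torchrun launch that sets LOCAL_RANK and WORLD_SIZE; details vary between versions):

    import os
    import torch
    from fairscale.nn.model_parallel.initialize import initialize_model_parallel

    def setup_model_parallel():
        # torchrun populates these; the defaults are a single-process fallback
        local_rank = int(os.environ.get("LOCAL_RANK", "0"))
        world_size = int(os.environ.get("WORLD_SIZE", "1"))

        torch.distributed.init_process_group("nccl")
        initialize_model_parallel(world_size)
        torch.cuda.set_device(local_rank)

        torch.manual_seed(1)  # every rank must use the same seed
        return local_rank, world_size

Calling something like this before Transformer(model_args) is what makes the ParallelEmbedding constructor's group lookup succeed.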


JunLiangZ commented May 16, 2023 via email

@wangshuaiwu

The 7B model ran, but its answers were really strange; you could try running a larger model.


JunLiangZ commented May 16, 2023 via email


tbaggu commented Feb 21, 2024

Check the env var PYLLAMA_META_MP; if it's not set, it should work without model parallel.
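
Spelling that suggestion out (my reading; the exact gating inside pyllama is an assumption on my part):

    import os

    # Assumption: pyllama takes the fairscale model-parallel path only when
    # PYLLAMA_META_MP is set. Clearing it before loading should fall back to
    # the plain single-process model and skip the distributed init entirely.
    os.environ.pop("PYLLAMA_META_MP", None)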
