
feat(model parallelism): moving the labels to the same device as the logits for gpt2 and bart #22591

Merged · 2 commits merged into huggingface:main on Apr 5, 2023

Conversation

kausmeows (Contributor)

What does this PR do?

As suggested in #22561, this moves the labels to the same device as the logits they are compared to, for the BART and GPT-2 models.

This follows the approach referenced in #22535.

lm_logits = self.lm_head(outputs[0])
lm_logits = lm_logits + self.final_logits_bias.to(lm_logits.device)

masked_lm_loss = None
if labels is not None:
    labels = labels.to(lm_logits.device)
    loss_fct = CrossEntropyLoss()
    masked_lm_loss = loss_fct(lm_logits.view(-1, self.config.vocab_size), labels.view(-1))
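For context, the pattern the diff above applies can be sketched as a self-contained example (plain PyTorch with hypothetical names, not the actual modeling code):

```python
import torch
from torch.nn import CrossEntropyLoss

def language_modeling_loss(lm_logits, labels):
    # The key line from this PR: move labels to the logits' device so that
    # model-parallel runs (where lm_head sits on a different GPU than the
    # input batch) do not raise a cross-device error inside the loss.
    labels = labels.to(lm_logits.device)
    loss_fct = CrossEntropyLoss()
    vocab_size = lm_logits.size(-1)
    return loss_fct(lm_logits.view(-1, vocab_size), labels.view(-1))

# Toy shapes: (batch, seq_len, vocab) logits and (batch, seq_len) labels.
lm_logits = torch.randn(2, 5, 100)
labels = torch.randint(0, 100, (2, 5))
loss = language_modeling_loss(lm_logits, labels)
print(loss.item())
```

On a single device the `.to()` call is a no-op, so the change costs nothing in the common case.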

cc @sgugger, could you review this?

HuggingFaceDocBuilderDev commented Apr 5, 2023

The documentation is not available anymore as the PR was closed or merged.

sgugger (Collaborator) commented Apr 5, 2023

Thanks a lot for your PR! Could you apply make fix-copies so that the models copied from BART or GPT-2 are auto-updated?

kausmeows (Contributor, Author)

> Thanks a lot for your PR! Could you apply make fix-copies so that the models copied from BART or GPT-2 are auto-updated?

Hi, just did that!

sgugger (Collaborator) left a review comment


Thanks a lot!

@sgugger sgugger merged commit 1564189 into huggingface:main Apr 5, 2023
4 checks passed
kausmeows (Contributor, Author)

> Thanks a lot!

All good! ✨

@kausmeows kausmeows deleted the kaus branch April 5, 2023 18:40
innat commented Apr 5, 2023

Hi @kaustubh-s1, will this change fix model parallelism for GPT-2? I've just tried it and got:

 File "/opt/conda/envs/gpt_neox/lib/python3.9/site-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)

P.S. My setup is almost the same as this one; the only differences are below:

import torch
from transformers import AutoModelForCausalLM

def get_parallel_model(model_name):
    # Shard the model across available GPUs and load in half precision.
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map='auto',
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True
    )

    # setattr(model, 'model_parallel', True)
    # setattr(model, 'is_parallelizable', True)

    setattr(model, 'gradient_checkpointing', True)
    return model
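The layer_norm traceback above suggests an activation crossed a GPU shard boundary without being moved. A small debugging sketch (plain PyTorch, not from this PR) that lists which device each parameter landed on can help locate that boundary after loading with device_map='auto':

```python
import torch
from torch import nn

def parameter_devices(model):
    # Group parameter names by the device they were placed on; on a model
    # sharded across GPUs this typically shows keys like 'cuda:0' and 'cuda:1'.
    devices = {}
    for name, param in model.named_parameters():
        devices.setdefault(str(param.device), []).append(name)
    return devices

# Tiny stand-in model for illustration; a real run would inspect the model
# returned by get_parallel_model above.
model = nn.Sequential(nn.Linear(4, 4), nn.LayerNorm(4))
print(parameter_devices(model))
```

Once the shard boundary is known, the fix is the same shape as this PR: move the offending tensor to the device of the weights it is about to meet.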

kausmeows (Contributor, Author)

> Hi, @kaustubh-s1, does this change will fix model parallel for gpt2? I've just tried but got […]

Hi @innat. It should, I'd guess, but I don't have a multi-GPU setup so I can't say for sure. I just followed the steps in #22535 to move the labels to the same device as the logits. Theoretically, it should work.

novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023