
How to load the largest models? #53

Closed
jeffhernandez1995 opened this issue Jun 5, 2024 · 6 comments

@jeffhernandez1995

Loading lmms-lab/llava-next-72b and lmms-lab/llava-next-110b with device_map='auto' does not seem to work and results in NotImplementedError: Cannot copy out of meta tensor; no data!, even though I am trying to load the model on 8×40GB GPUs. Is there a minimal example for running inference with the largest models?
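
For reference, a minimal sketch of the failing call (assuming the builder's load_pretrained_model and get_model_name_from_path from llava.mm_utils; the surrounding evaluation code is omitted):

    from llava.model.builder import load_pretrained_model
    from llava.mm_utils import get_model_name_from_path

    model_path = "lmms-lab/llava-next-72b"
    model_name = get_model_name_from_path(model_path)

    # device_map="auto" should shard the checkpoint across all visible GPUs,
    # but this is the call that ends in
    # NotImplementedError: Cannot copy out of meta tensor; no data!
    tokenizer, model, image_processor, context_len = load_pretrained_model(
        model_path=model_path,
        model_base=None,
        model_name=model_name,
        device_map="auto",
    )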

@claudius-kienle

What worked for me was to manually set the device_map like the following:

    device = "cuda"
    device_map = {
        "model.vision_tower": "cuda:1",
        "model.vision_resampler": "cuda:1",
        "model.mm_projector": "cuda:1",
        "model.norm": "cuda:1",
        "model.image_newline": "cuda:1",
        "model.embed_tokens": "cuda:1",
        "lm_head": "cuda:1",
    }
    for i in range(0, 40):
        device_map["model.layers.%d" % i] = "cuda:1"
    for i in range(40, 81):
        device_map["model.layers.%d" % i] = "cuda:2"

This loads half of the Qwen LLM on GPU 1 and the other half on GPU 2.
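
If you would rather not hard-code the layer count, a sketch that derives the same split from the checkpoint config (assumes the llava package is imported so AutoConfig recognizes the custom llava_qwen model type):

    from transformers import AutoConfig

    num_layers = AutoConfig.from_pretrained("lmms-lab/llava-next-72b").num_hidden_layers

    # Non-layer modules stay together on cuda:1; the transformer layers are
    # split evenly between cuda:1 and cuda:2.
    device_map = {
        "model.vision_tower": "cuda:1",
        "model.vision_resampler": "cuda:1",
        "model.mm_projector": "cuda:1",
        "model.norm": "cuda:1",
        "model.image_newline": "cuda:1",
        "model.embed_tokens": "cuda:1",
        "lm_head": "cuda:1",
    }
    for i in range(num_layers):
        device_map["model.layers.%d" % i] = "cuda:1" if i < num_layers // 2 else "cuda:2"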

@jeffhernandez1995
Author

Thank you for your answer. I am doing this:

if '72b' in model_pth:
    device = 'cuda'
    # Manual three-way split: non-layer modules plus the first third of the
    # layers on cuda:0, the remaining layers split across cuda:1 and cuda:2.
    device_map = {
        "model.vision_tower": "cuda:0",
        "model.vision_resampler": "cuda:0",
        "model.mm_projector": "cuda:0",
        "model.norm": "cuda:0",
        "model.image_newline": "cuda:0",
        "model.embed_tokens": "cuda:0",
        "lm_head": "cuda:0",
    }
    for i in range(0, 27):
        device_map["model.layers.%d" % i] = "cuda:0"
    for i in range(27, 54):
        device_map["model.layers.%d" % i] = "cuda:1"
    for i in range(54, 81):
        device_map["model.layers.%d" % i] = "cuda:2"
else:
    device = 'auto'
    device_map = 'auto'
self.tokenizer, self.model, self.image_processor, self.context_len = load_pretrained_model(
    model_path=model_pth,
    model_base=None,
    model_name=model_name,
    # device=device,
    device_map=device_map,
    **llava_model_args
)

But it seems that the code ignores this and still tries to load the whole model onto a single GPU.

@jeffhernandez1995
Author

I managed to make it run by calling the class directly instead of using the load_pretrained_model function:

from transformers import AutoTokenizer
from llava.model.language_model.llava_qwen import LlavaQwenForCausalLM

tokenizer = AutoTokenizer.from_pretrained(model_pth, use_fast=False)

# Let accelerate shard the checkpoint across the visible GPUs directly
model = LlavaQwenForCausalLM.from_pretrained(
    model_pth,
    low_cpu_mem_usage=True,
    attn_implementation="flash_attention_2",
    torch_dtype=TORCH_TYPE,  # e.g. torch.float16 or torch.bfloat16
    device_map='auto',
)
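
With device_map='auto', the placement accelerate picked is recorded in model.hf_device_map, and inputs need to go to whichever GPU holds the embeddings. A hedged sketch of the follow-up steps (the vision-tower part mirrors what load_pretrained_model normally does; treat the attribute names as assumptions if your checkout differs):

    # Placement chosen by accelerate, keyed by module name.
    print(model.hf_device_map)

    # Text inputs belong on the device that holds the token embeddings
    # (fall back to the first GPU if the map was built at a coarser level).
    input_device = model.hf_device_map.get("model.embed_tokens", "cuda:0")

    # The image processor comes from the vision tower, as in load_pretrained_model.
    vision_tower = model.get_vision_tower()
    if not vision_tower.is_loaded:
        vision_tower.load_model()
    image_processor = vision_tower.image_processor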

@jacktkk commented Aug 20, 2024

> What worked for me was to manually set the device_map like the following

Hi, when the model is split across devices like this, which device should the input data be placed on? I'm getting the following error:

Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0!

@ChipsICU

> What worked for me was to manually set the device_map like the following
>
> Hi, when the model is split across devices like this, which device should the input data be placed on? I'm getting the following error:
>
> Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0!

For me, the default loading code works; just remember not to set CUDA_VISIBLE_DEVICES.
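
If you keep the manual split from earlier in the thread instead, a hedged sketch of the input placement (input_ids, images, and image_sizes are assumed to be prepared already; with that map, model.embed_tokens and the vision tower both sit on cuda:1):

    import torch

    # Both text ids and image tensors start on the GPU that holds the embeddings
    # and the vision tower; accelerate's hooks then move hidden states between
    # cuda:1 and cuda:2 as the layers execute.
    input_ids = input_ids.to("cuda:1")
    images = images.to("cuda:1", dtype=torch.float16)  # match the dtype the model was loaded in

    with torch.inference_mode():
        output_ids = model.generate(
            input_ids,
            images=images,
            image_sizes=image_sizes,
            max_new_tokens=128,
        )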

@zshyang commented Oct 5, 2024

> What worked for me was to manually set the device_map like the following
>
> Hi, when the model is split across devices like this, which device should the input data be placed on? I'm getting the following error:
>
> Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0!

I have the same issue. Did your question get resolved in the end?
