
How to load the largest models? #53

Closed
jeffhernandez1995 opened this issue Jun 5, 2024 · 6 comments

@jeffhernandez1995

Loading lmms-lab/llava-next-72b and lmms-lab/llava-next-110b with device_map='auto' does not seem to work and results in NotImplementedError: Cannot copy out of meta tensor; no data!, even though I am trying to load the model on 8×40GB GPUs. Is there a minimal example for running inference with the largest models?
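
For reference, a minimal sketch of the failing call (assuming the builder's load_pretrained_model and get_model_name_from_path from llava.mm_utils; the surrounding evaluation code is omitted):

    from llava.model.builder import load_pretrained_model
    from llava.mm_utils import get_model_name_from_path

    model_path = "lmms-lab/llava-next-72b"
    model_name = get_model_name_from_path(model_path)

    # device_map="auto" should shard the checkpoint across all visible GPUs,
    # but this is the call that ends in
    # NotImplementedError: Cannot copy out of meta tensor; no data!
    tokenizer, model, image_processor, context_len = load_pretrained_model(
        model_path=model_path,
        model_base=None,
        model_name=model_name,
        device_map="auto",
    )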

@claudius-kienle

What worked for me was to manually set the device_map like the following:

    device = "cuda"
    device_map = {
        "model.vision_tower": "cuda:1",
        "model.vision_resampler": "cuda:1",
        "model.mm_projector": "cuda:1",
        "model.norm": "cuda:1",
        "model.image_newline": "cuda:1",
        "model.embed_tokens": "cuda:1",
        "lm_head": "cuda:1",
    }
    for i in range(0, 40):
        device_map["model.layers.%d" % i] = "cuda:1"
    for i in range(40, 81):
        device_map["model.layers.%d" % i] = "cuda:2"

This loads half of the Qwen LLM on GPU 1 and the other half on GPU 2.
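
If you would rather not hard-code the layer count, a sketch that derives the same split from the checkpoint config (assumes the llava package is imported so AutoConfig recognizes the custom llava_qwen model type):

    from transformers import AutoConfig

    num_layers = AutoConfig.from_pretrained("lmms-lab/llava-next-72b").num_hidden_layers

    # Non-layer modules stay together on cuda:1; the transformer layers are
    # split evenly between cuda:1 and cuda:2.
    device_map = {
        "model.vision_tower": "cuda:1",
        "model.vision_resampler": "cuda:1",
        "model.mm_projector": "cuda:1",
        "model.norm": "cuda:1",
        "model.image_newline": "cuda:1",
        "model.embed_tokens": "cuda:1",
        "lm_head": "cuda:1",
    }
    for i in range(num_layers):
        device_map["model.layers.%d" % i] = "cuda:1" if i < num_layers // 2 else "cuda:2"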

@jeffhernandez1995
Author

Thank you for your answer. I am doing this:

if '72b' in model_pth:
    device = 'cuda'
    # Manual three-way split: non-layer modules plus the first third of the
    # layers on cuda:0, the remaining layers split across cuda:1 and cuda:2.
    device_map = {
        "model.vision_tower": "cuda:0",
        "model.vision_resampler": "cuda:0",
        "model.mm_projector": "cuda:0",
        "model.norm": "cuda:0",
        "model.image_newline": "cuda:0",
        "model.embed_tokens": "cuda:0",
        "lm_head": "cuda:0",
    }
    for i in range(0, 27):
        device_map["model.layers.%d" % i] = "cuda:0"
    for i in range(27, 54):
        device_map["model.layers.%d" % i] = "cuda:1"
    for i in range(54, 81):
        device_map["model.layers.%d" % i] = "cuda:2"
else:
    device = 'auto'
    device_map = 'auto'
self.tokenizer, self.model, self.image_processor, self.context_len = load_pretrained_model(
    model_path=model_pth,
    model_base=None,
    model_name=model_name,
    # device=device,
    device_map=device_map,
    **llava_model_args
)

But it seems that the code ignores this and still tries to load the whole model onto a single GPU.

@jeffhernandez1995
Author

I managed to make it run by calling the class directly instead of using the load_pretrained_model function:

from transformers import AutoTokenizer
from llava.model.language_model.llava_qwen import LlavaQwenForCausalLM

tokenizer = AutoTokenizer.from_pretrained(model_pth, use_fast=False)

# Let accelerate shard the checkpoint across the visible GPUs directly
model = LlavaQwenForCausalLM.from_pretrained(
    model_pth,
    low_cpu_mem_usage=True,
    attn_implementation="flash_attention_2",
    torch_dtype=TORCH_TYPE,  # e.g. torch.float16 or torch.bfloat16
    device_map='auto',
)
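
With device_map='auto', the placement accelerate picked is recorded in model.hf_device_map, and inputs need to go to whichever GPU holds the embeddings. A hedged sketch of the follow-up steps (the vision-tower part mirrors what load_pretrained_model normally does; treat the attribute names as assumptions if your checkout differs):

    # Placement chosen by accelerate, keyed by module name.
    print(model.hf_device_map)

    # Text inputs belong on the device that holds the token embeddings
    # (fall back to the first GPU if the map was built at a coarser level).
    input_device = model.hf_device_map.get("model.embed_tokens", "cuda:0")

    # The image processor comes from the vision tower, as in load_pretrained_model.
    vision_tower = model.get_vision_tower()
    if not vision_tower.is_loaded:
        vision_tower.load_model()
    image_processor = vision_tower.image_processor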

@jacktkk commented Aug 20, 2024

> What worked for me was to manually set the device_map like the following

Hi, when the model is split across devices like this, which device should the input data be placed on? I'm getting the following error:

Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0!

@ChipsICU

> What worked for me was to manually set the device_map like the following
>
> Hi, when the model is split across devices like this, which device should the input data be placed on? I'm getting the following error:
>
> Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0!

For me, the default loading code works; just remember not to set CUDA_VISIBLE_DEVICES.
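
If you keep the manual split from earlier in the thread instead, a hedged sketch of the input placement (input_ids, images, and image_sizes are assumed to be prepared already; with that map, model.embed_tokens and the vision tower both sit on cuda:1):

    import torch

    # Both text ids and image tensors start on the GPU that holds the embeddings
    # and the vision tower; accelerate's hooks then move hidden states between
    # cuda:1 and cuda:2 as the layers execute.
    input_ids = input_ids.to("cuda:1")
    images = images.to("cuda:1", dtype=torch.float16)  # match the dtype the model was loaded in

    with torch.inference_mode():
        output_ids = model.generate(
            input_ids,
            images=images,
            image_sizes=image_sizes,
            max_new_tokens=128,
        )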

@zshyang commented Oct 5, 2024

> What worked for me was to manually set the device_map like the following
>
> Hi, when the model is split across devices like this, which device should the input data be placed on? I'm getting the following error:
>
> Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0!

I have the same issue. Did your question get resolved in the end?
