How to load the largest models? #53
Comments
What worked for me was to manually set the device_map, like the following:

```python
device = "cuda"
device_map = {
    "model.vision_tower": "cuda:1",
    "model.vision_resampler": "cuda:1",
    "model.mm_projector": "cuda:1",
    "model.norm": "cuda:1",
    "model.image_newline": "cuda:1",
    "model.embed_tokens": "cuda:1",
    "lm_head": "cuda:1",
}
for i in range(0, 40):
    device_map["model.layers.%d" % i] = "cuda:1"
for i in range(40, 81):
    device_map["model.layers.%d" % i] = "cuda:2"
```

This loads half of the Qwen LLM on gpu1 and the other half on gpu2.
Thank you for your answer. I am doing this:

But it seems that the code ignores it and still tries to load the whole model onto a single GPU.
I managed to make it run by directly calling the model class instead of using the loading function.
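In case it helps others, here is a hedged sketch of what calling the class directly can look like; the `LlavaQwenForCausalLM` class and its module path are assumptions that may differ between versions of the codebase:

```python
# Sketch: bypass the loader helper and call from_pretrained on the model
# class itself, so device_map is handed straight to transformers/accelerate.
import torch
from llava.model.language_model.llava_qwen import LlavaQwenForCausalLM  # assumed path

model = LlavaQwenForCausalLM.from_pretrained(
    "lmms-lab/llava-next-72b",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map=device_map,  # manual map from the comment above
)
model.eval()
```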
Hi, in the case where the model is split across devices, which device should the input data be placed on? I'm getting the following error: `Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0!`
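One common cause (a sketch of a fix, not confirmed in this thread): with a sharded model, accelerate moves intermediate activations between GPUs for you, but your input tensors still have to start on the device that holds the first layers, e.g.:

```python
# Sketch: place inputs on the same device as the input embeddings.
# The variable names (input_ids, images) are placeholders.
input_device = model.get_input_embeddings().weight.device
input_ids = input_ids.to(input_device)
images = images.to(input_device)
```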
For me, the default code works fine; just remember not to set CUDA_VISIBLE_DEVICES.
I have the same issue. Did your question get resolved in the end?
Loading `lmms-lab/llava-next-72b` and `lmms-lab/llava-next-110b` with `device_map='auto'` does not seem to work and results in `NotImplementedError: Cannot copy out of meta tensor; no data!`, even though I am trying to load the model on 8×40GB GPUs. Is there a minimal example of running inference on the largest models?
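For completeness, a hedged alternative to the manual map above is to keep `device_map="auto"` but cap each GPU's share with `max_memory`, so accelerate spreads the weights over all eight GPUs; the model class, the 38GiB value, and whether this avoids the meta-tensor error reported here are all assumptions:

```python
# Sketch: let accelerate build the map, but bound each GPU's share.
# The headroom value and the model class/path are assumptions.
import torch
from llava.model.language_model.llava_qwen import LlavaQwenForCausalLM  # assumed path

model = LlavaQwenForCausalLM.from_pretrained(
    "lmms-lab/llava-next-72b",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto",
    max_memory={i: "38GiB" for i in range(8)},  # 8 x 40GB GPUs, leave headroom
)
```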