
Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! #176

Open
gody7334 opened this issue Apr 26, 2024 · 3 comments

Comments

@gody7334

gody7334 commented Apr 26, 2024

Notebook to reproduce: please use a GPU runtime and set up the accelerate config.

When I use a quantized model, I get this error:

```
Traceback (most recent call last):
  File "/content/./lighteval/run_evals_accelerate.py", line 82, in <module>
    main(args)
  File "/usr/local/lib/python3.10/dist-packages/lighteval/logging/hierarchical_logger.py", line 166, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lighteval/main_accelerate.py", line 111, in main
    evaluation_tracker = evaluate(
  File "/usr/local/lib/python3.10/dist-packages/lighteval/evaluator.py", line 86, in evaluate
    full_resps = lm.greedy_until(requests, override_bs=override_bs)
  File "/usr/local/lib/python3.10/dist-packages/lighteval/models/base_model.py", line 594, in greedy_until
    cur_reponses = self._generate(
  File "/usr/local/lib/python3.10/dist-packages/lighteval/models/base_model.py", line 617, in _generate
    outputs = self.model.generate(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1576, in generate
    result = self._greedy_search(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2494, in _greedy_search
    outputs = self(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py", line 1158, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py", line 987, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/sparse.py", line 163, in forward
    return F.embedding(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 2237, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 46, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1075, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 681, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', './lighteval/run_evals_accelerate.py', '--model_args', 'pretrained=TheBloke/Mistral-7B-Instruct-v0.2-GPTQ', '--tasks', 'leaderboard|gsm8k|0|0', '--override_batch_size', '1', '--output_dir=./evals/']' returned non-zero exit status 1.
```

Did I do anything wrong?
Thanks for your help!
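The error suggests the embedding weights are still on the CPU while the input ids are on `cuda:0`. A quick way to diagnose this (a minimal sketch, not lighteval code; the toy model only stands in for the real loaded model) is to group a model's parameters by device:

```python
import torch
from torch import nn
from collections import defaultdict

def params_by_device(model: nn.Module) -> dict:
    """Group parameter names by the device their tensors live on."""
    devices = defaultdict(list)
    for name, param in model.named_parameters():
        devices[str(param.device)].append(name)
    return dict(devices)

# Demo with a tiny stand-in model; the real check would run on the lighteval model.
# After a correct GPU load, every group should be a cuda device, not "cpu".
toy = nn.Sequential(nn.Embedding(10, 4), nn.Linear(4, 2))
print(params_by_device(toy))
```

If any entry maps to `"cpu"` after loading, generation with CUDA inputs will fail exactly as in the traceback above.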

@gody7334
Author

gody7334 commented Apr 30, 2024

I can resolve the issue for GPTQ models by adding `.cuda()` in `main_accelerate.py` (line 77), after `load_model()`:

```python
with htrack_block("Model loading"):
    with accelerator.main_process_first() if accelerator is not None else nullcontext():
        model, model_info = load_model(config=model_config, env_config=env_config)
        # for i in model.model.named_parameters():
        #     print(f"{i[0]} -> {i[1].device}")
        # import ipdb; from pprint import pprint as pp; ipdb.set_trace();
        model.model.cuda()
        evaluation_tracker.general_config_logger.log_model_info(model_info)
```

But I don't think this is the proper way to resolve the issue. If anyone can have a look at why the quantized model stays on the CPU instead of being moved to the GPU during the load_model procedure, that would be very helpful.

Models that pass with the above fix:

  • TechxGenus/Meta-Llama-3-8B-GPTQ
  • TheBloke/Mistral-7B-Instruct-v0.2-GPTQ

Model that doesn't pass:

  • 01-ai/Yi-6B-Chat-4bits
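An alternative to calling `.cuda()` after the fact could be to request GPU placement at load time. With transformers, a checkpoint can be dispatched onto available GPUs via the `device_map` argument of `from_pretrained` (a sketch, not the lighteval loading path; the model name is taken from the command above, and it needs a GPU plus network access to actually run):

```python
# device_map="auto" lets accelerate dispatch the weights across available devices at load time
LOAD_KWARGS = {"device_map": "auto"}

def load_quantized(model_name: str):
    """Load a checkpoint, letting accelerate place the weights (requires transformers + accelerate)."""
    from transformers import AutoModelForCausalLM  # imported lazily, only when actually loading

    return AutoModelForCausalLM.from_pretrained(model_name, **LOAD_KWARGS)

# Usage (not run here: needs a GPU and downloads the checkpoint):
# model = load_quantized("TheBloke/Mistral-7B-Instruct-v0.2-GPTQ")
```

Whether lighteval's own model config already exposes such an option is a separate question; this only shows the underlying transformers mechanism.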

@NathanHB
Member

Hi! Thanks for your interest in lighteval, and sorry for the delayed answer! Just to be sure: does the model that does not pass usually work, but stop working when adding your fix?
As for the reason, I would guess that GPTQ models are loaded differently and there is a bug where they are simply not loaded to the GPU.
Also, how many GPUs do you have available in your Colab notebook?

@gody7334
Author

Hi NathanHB,
Thanks for the reply.

  • No, I don't have any problem with normal models, as the added code simply pushes the model onto the GPU.
  • I only have a single GPU; Colab is a single-GPU (T4) setup.
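For reference, the GPU count visible to PyTorch can be checked directly (a trivial sketch; on a single-T4 Colab runtime this would report one CUDA device):

```python
import torch

def gpu_summary() -> str:
    """Report how many CUDA devices PyTorch can see."""
    if not torch.cuda.is_available():
        return "no CUDA devices"
    return f"{torch.cuda.device_count()} CUDA device(s) available"

print(gpu_summary())
```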
