
RuntimeError: Inplace update to inference tensor outside InferenceMode is not allowed when generating using 13B on two GPUs #180

Closed
Pliman opened this issue Mar 11, 2023 · 1 comment

Pliman commented Mar 11, 2023

Hello, I downloaded the code. When I run with MP=1 and the 7B model, using the command
torchrun --nproc_per_node 1 example.py --ckpt_dir D:/llama/7B --tokenizer_path D:/llama/tokenizer.model
it works well.

But when I switch to MP=2 and the 13B model, with the command
torchrun --nproc_per_node 2 example.py --ckpt_dir D:/llama/13B --tokenizer_path D:/llama/tokenizer.model
the model loads correctly onto the 2 GPUs, but during generation there is an error:
Traceback (most recent call last):
  File "example.py", line 165, in <module>
    fire.Fire(main)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fire\core.py", line 480, in _Fire
    target=component.__name__)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fire\core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "example.py", line 160, in main
    [prompt], max_gen_len=max_gen_len, temperature=temperature, top_p=top_p, top_k=top_k, repetition_penalty=repetition_penalty, token_callback=callback,
  File "D:\LLaMA\llama\llama\generation.py", line 46, in generate
    logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\LLaMA\llama\llama\model.py", line 225, in forward
    h = self.tok_embeddings(tokens)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fairscale\nn\model_parallel\layers.py", line 214, in forward
    output = gather_from_model_parallel_region(output_parallel)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fairscale\nn\model_parallel\mappings.py", line 156, in gather_from_model_parallel_region
    return _GatherFromModelParallelRegion.apply(input_)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fairscale\nn\model_parallel\mappings.py", line 131, in forward
    return _gather(input_)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fairscale\nn\model_parallel\mappings.py", line 82, in _gather
    torch.distributed.all_gather(tensor_list, input_, group=group)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\torch\distributed\distributed_c10d.py", line 2282, in all_gather
    work.wait()
RuntimeError: Inplace update to inference tensor outside InferenceMode is not allowed. You can make a clone to get a normal tensor before doing inplace update. See https://github.com/pytorch/rfcs/pull/17 for more details.

I found the same error on Stack Overflow:
https://stackoverflow.com/questions/71223747/pytorch-error-when-modifying-unpacked-tensor

I changed
logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos)
to
temp = tokens[:, prev_pos:cur_pos].clone()
logits = self.model.forward(temp, prev_pos)
but the problem remains.

Environment:
Intel Core i7-8700K
32 GB of RAM
2 Tesla P40 GPUs (24 GB of video memory each)
Windows 11 22H2

conda version: 4.5.11
Python version: 3.7
torch version: 1.13.1+cu117
CUDA version: 11.7


Pliman commented Mar 12, 2023

I resolved it by removing "@torch.inference_mode()" in model.py (line 222).

According to a search engine:
"The problem is caused by trying to update an inference tensor (a tensor created in InferenceMode) outside of InferenceMode. This is not allowed because InferenceMode reduces the overhead of Autograd by disabling some mechanisms, such as version counting and metadata tracking, on inference tensors. You can make a clone of the inference tensor to get a normal tensor before doing inplace update."

I tried cloning tokens in
h = self.tok_embeddings(tokens)
but it doesn't resolve the problem, so I had to remove inference_mode. It's slower, but acceptable.
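For anyone who would rather keep the autograd savings than drop the decorator entirely, a possible alternative (sketched below with a toy module and hypothetical names, not the actual llama code) is to replace @torch.inference_mode() with @torch.no_grad(): no_grad still disables gradient tracking, but its outputs are ordinary tensors rather than inference tensors, so the in-place update inside all_gather should no longer be rejected.

```python
import torch
from torch import nn

class ToyTransformer(nn.Module):
    """Stand-in for llama's Transformer, only to show the decorator swap."""

    def __init__(self):
        super().__init__()
        self.tok_embeddings = nn.Embedding(32, 8)

    # @torch.inference_mode()  # original decorator that triggers the error
    @torch.no_grad()           # alternative: no autograd, no inference tensors
    def forward(self, tokens: torch.Tensor):
        return self.tok_embeddings(tokens)

out = ToyTransformer()(torch.tensor([[1, 2, 3]]))
print(out.is_inference())      # False, so later in-place updates are allowed
out.add_(1.0)                  # would raise if forward used @torch.inference_mode()
```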
