
RuntimeError: Inplace update to inference tensor outside InferenceMode is not allowed when generating using 13B on two GPUs #180

Closed
Pliman opened this issue Mar 11, 2023 · 1 comment

Pliman commented Mar 11, 2023

Hello, I downloaded the code. When I run with MP=1 and the 7B model, using the command
torchrun --nproc_per_node 1 example.py --ckpt_dir D:/llama/7B --tokenizer_path D:/llama/tokenizer.model
it works well.

But when I switch to MP=2 and the 13B model, with the command
torchrun --nproc_per_node 2 example.py --ckpt_dir D:/llama/13B --tokenizer_path D:/llama/tokenizer.model
the model loads correctly onto the 2 GPUs, but during generation there is an error:
Traceback (most recent call last):
  File "example.py", line 165, in <module>
    fire.Fire(main)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fire\core.py", line 480, in _Fire
    target=component.__name__)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fire\core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "example.py", line 160, in main
    [prompt], max_gen_len=max_gen_len, temperature=temperature, top_p=top_p, top_k=top_k, repetition_penalty=repetition_penalty, token_callback=callback,
  File "D:\LLaMA\llama\llama\generation.py", line 46, in generate
    logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\LLaMA\llama\llama\model.py", line 225, in forward
    h = self.tok_embeddings(tokens)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fairscale\nn\model_parallel\layers.py", line 214, in forward
    output = gather_from_model_parallel_region(output_parallel)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fairscale\nn\model_parallel\mappings.py", line 156, in gather_from_model_parallel_region
    return _GatherFromModelParallelRegion.apply(input_)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fairscale\nn\model_parallel\mappings.py", line 131, in forward
    return _gather(input_)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fairscale\nn\model_parallel\mappings.py", line 82, in _gather
    torch.distributed.all_gather(tensor_list, input_, group=group)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\torch\distributed\distributed_c10d.py", line 2282, in all_gather
    work.wait()
RuntimeError: Inplace update to inference tensor outside InferenceMode is not allowed. You can make a clone to get a normal tensor before doing inplace update. See https://github.com/pytorch/rfcs/pull/17 for more details.

I found the same error on Stack Overflow:
https://stackoverflow.com/questions/71223747/pytorch-error-when-modifying-unpacked-tensor

I changed
logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos)
to
temp = tokens[:, prev_pos:cur_pos].clone()
logits = self.model.forward(temp, prev_pos)
but the problem remains.

Environment:
Intel Core i7-8700K
32 GB of RAM
2 Tesla P40 GPUs (24 GB of video memory each)
Windows 11 22H2

conda version: 4.5.11
Python version: 3.7
torch version: 1.13.1+cu117
CUDA version: 11.7


Pliman commented Mar 12, 2023

I resolved it by removing "@torch.inference_mode()" in model.py (line 222).

According to a search engine:
"The problem is caused by trying to update an inference tensor (a tensor created in InferenceMode) outside of InferenceMode. This is not allowed because InferenceMode reduces the overhead of Autograd by disabling some mechanisms, such as version counting and metadata tracking, on inference tensors. You can make a clone of the inference tensor to get a normal tensor before doing inplace update."

I tried cloning tokens in
h = self.tok_embeddings(tokens)
but it doesn't resolve the problem, so I had to remove inference_mode. It's slower, but acceptable.
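For anyone who would rather keep the autograd savings than drop the decorator entirely, a possible alternative (sketched below with a toy module and hypothetical names, not the actual llama code) is to replace @torch.inference_mode() with @torch.no_grad(): no_grad still disables gradient tracking, but its outputs are ordinary tensors rather than inference tensors, so the in-place update inside all_gather should no longer be rejected.

```python
import torch
from torch import nn

class ToyTransformer(nn.Module):
    """Stand-in for llama's Transformer, only to show the decorator swap."""

    def __init__(self):
        super().__init__()
        self.tok_embeddings = nn.Embedding(32, 8)

    # @torch.inference_mode()  # original decorator that triggers the error
    @torch.no_grad()           # alternative: no autograd, no inference tensors
    def forward(self, tokens: torch.Tensor):
        return self.tok_embeddings(tokens)

out = ToyTransformer()(torch.tensor([[1, 2, 3]]))
print(out.is_inference())      # False, so later in-place updates are allowed
out.add_(1.0)                  # would raise if forward used @torch.inference_mode()
```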
