
[BUG] "output tensor must have the same type as input tensor" error when i tried to finetune localy #669

Open
hichambht32 opened this issue Jun 6, 2024 · 0 comments

Comments

@hichambht32

Hello everyone. I have 4 RTX 3080 GPUs with 10 GiB each, and I'm trying to fine-tune Mistral 7B v2.0 locally. I've tried to optimize as much as I can (Accelerate with DeepSpeed, 4-bit quantization, LoRA, and so on), but now I'm getting an error saying the input and output tensors are not of the same type.
My CSV dataset has one column called Text, which contains question-answer pairs.
Can you suggest a fix?
This is the error:
```
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.2301530838012695 seconds
Parameter Offload: Total persistent parameters: 20189184 in 417 params
INFO | 2024-06-06 17:33:31 | autotrain.trainers.common:on_train_begin:231 - Starting to train...
0%| | 0/20 [00:00<?, ?it/s]
(myenv) rag@PC-RAG:~/finetune$ ERROR | 2024-06-06 17:34:26 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/autotrain/trainers/common.py", line 117, in wrapper
    return func(*args, **kwargs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/autotrain/trainers/clm/main.py", line 28, in train
    train_sft(config)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/autotrain/trainers/clm/train_clm_sft.py", line 98, in train
    trainer.train()
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 361, in train
    output = super().train(*args, **kwargs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
    return inner_training_loop(
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/transformers/trainer.py", line 2203, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/transformers/trainer.py", line 3147, in training_step
    self.accelerator.backward(loss)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/accelerate/accelerator.py", line 2007, in backward
    self.deepspeed_engine_wrapped.backward(loss, **kwargs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/accelerate/utils/deepspeed.py", line 175, in backward
    self.engine.step()
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2169, in step
    self._take_model_step(lr_kwargs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2075, in _take_model_step
    self.optimizer.step()
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 2060, in step
    self._post_step(timer_names)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 1986, in _post_step
    self.persistent_parameters[0].all_gather(self.persistent_parameters)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1121, in all_gather
    return self._all_gather(param_list, async_op=async_op, hierarchy=hierarchy)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1465, in _all_gather
    self._allgather_params_coalesced(all_gather_nonquantize_list, hierarchy, quantize=False)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1769, in _allgather_params_coalesced
    h = dist.all_gather_into_tensor(allgather_params[param_idx],
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/comm/comm.py", line 117, in log_wrapper
    return func(*args, **kwargs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/comm/comm.py", line 305, in all_gather_into_tensor
    return cdb.all_gather_into_tensor(output_tensor=output_tensor, input_tensor=tensor, group=group, async_op=async_op)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
    return fn(*args, **kwargs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/comm/torch.py", line 213, in all_gather_into_tensor
    return self.all_gather_function(output_tensor=output_tensor,
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 75, in wrapper
    return func(*args, **kwargs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2948, in all_gather_into_tensor
    work = group._allgather_base(output_tensor, input_tensor, opts)
TypeError: output tensor must have the same type as input tensor

ERROR | 2024-06-06 17:34:26 | autotrain.trainers.common:wrapper:121 - output tensor must have the same type as input tensor
```
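For context on the traceback above: ZeRO-3's `_allgather_params_coalesced` copies every gathered parameter into one flat buffer of a single dtype, so the `TypeError` typically means the model holds parameters of mixed dtypes. With bitsandbytes 4-bit quantization the frozen base weights are stored as `torch.uint8` while the LoRA adapters remain fp16/bf16, which is a plausible culprit for this setup. A minimal, hypothetical sketch of auditing for mixed dtypes before launching training (the parameter names and dtypes below are illustrative, not from the actual model; in a real run you would collect them from `[(n, str(p.dtype)) for n, p in model.named_parameters()]`):

```python
from collections import defaultdict

def audit_param_dtypes(named_dtypes):
    """Group parameter names by dtype.

    ZeRO-3 all-gathers parameters into a single flat buffer of one dtype,
    so more than one group here is a red flag for the
    "output tensor must have the same type as input tensor" error.
    """
    groups = defaultdict(list)
    for name, dtype in named_dtypes:
        groups[dtype].append(name)
    return dict(groups)

# Hypothetical dtypes for a 4-bit-quantized base model with fp16 LoRA adapters:
params = [
    ("layers.0.q_proj.weight", "uint8"),           # bitsandbytes 4-bit storage
    ("layers.0.q_proj.lora_A.weight", "float16"),  # trainable LoRA adapter
    ("layers.0.q_proj.lora_B.weight", "float16"),
]

print(audit_param_dtypes(params))
```

If the audit shows mixed dtypes alongside ZeRO stage 3, common workarounds are dropping the 4-bit quantization under ZeRO-3 (stage 3 already shards parameters across GPUs) or switching the DeepSpeed config to ZeRO stage 2, which does not all-gather parameters this way.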

@hichambht32 hichambht32 changed the title "output tensor must have the same type as input tensor" error when i tried to finetune localy [BUG] "output tensor must have the same type as input tensor" error when i tried to finetune localy Jun 6, 2024