expected scalar type Half but found Float #23

Closed
scarydemon2 opened this issue Mar 31, 2023 · 5 comments

@scarydemon2

─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /export/home/gth/alpaca_lora/uniform_finetune.py:294 in <module> │
│ │
│ 291 │ args = parser.parse_args() │
│ 292 │ print(args) │
│ 293 │ │
│ ❱ 294 │ train(args) │
│ 295 │
│ │
│ /export/home/gth/alpaca_lora/uniform_finetune.py:263 in train │
│ │
│ 260 │ if torch.__version__ >= "2" and sys.platform != "win32": │
│ 261 │ │ model = torch.compile(model) │
│ 262 │ │
│ ❱ 263 │ trainer.train() │
│ 264 │ │
│ 265 │ model.save_pretrained(output_dir) │
│ 266 │
│ │
│ /home/admin/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:1644 in train │
│ │
│ 1641 │ │ inner_training_loop = find_executable_batch_size( │
│ 1642 │ │ │ self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size │
│ 1643 │ │ ) │
│ ❱ 1644 │ │ return inner_training_loop( │
│ 1645 │ │ │ args=args, │
│ 1646 │ │ │ resume_from_checkpoint=resume_from_checkpoint, │
│ 1647 │ │ │ trial=trial, │
│ │
│ /home/admin/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:1909 in │
│ _inner_training_loop │
│ │
│ 1906 │ │ │ │ ): │
│ 1907 │ │ │ │ │ # Avoid unnecessary DDP synchronization since there will be no backw │
│ 1908 │ │ │ │ │ with model.no_sync(): │
│ ❱ 1909 │ │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1910 │ │ │ │ else: │
│ 1911 │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1912 │
│ │
│ /home/admin/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:2667 in training_step │
│ │
│ 2664 │ │ │ loss = loss / self.args.gradient_accumulation_steps │
│ 2665 │ │ │
│ 2666 │ │ if self.do_grad_scaling: │
│ ❱ 2667 │ │ │ self.scaler.scale(loss).backward() │
│ 2668 │ │ elif self.use_apex: │
│ 2669 │ │ │ with amp.scale_loss(loss, self.optimizer) as scaled_loss: │
│ 2670 │ │ │ │ scaled_loss.backward() │
│ │
│ /home/admin/anaconda3/lib/python3.9/site-packages/torch/_tensor.py:488 in backward │
│ │
│ 485 │ │ │ │ create_graph=create_graph, │
│ 486 │ │ │ │ inputs=inputs, │
│ 487 │ │ │ ) │
│ ❱ 488 │ │ torch.autograd.backward( │
│ 489 │ │ │ self, gradient, retain_graph, create_graph, inputs=inputs │
│ 490 │ │ ) │
│ 491 │
│ │
│ /home/admin/anaconda3/lib/python3.9/site-packages/torch/autograd/__init__.py:197 in backward │
│ │
│ 194 │ # The reason we repeat same the comment below is that │
│ 195 │ # some Python versions print out the first line of a multi-line function │
│ 196 │ # calls in the traceback and some print out the last line │
│ ❱ 197 │ Variable.execution_engine.run_backward( # Calls into the C++ engine to run the bac │
│ 198 │ │ tensors, grad_tensors_, retain_graph, create_graph, inputs, │
│ 199 │ │ allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to ru │
│ 200 │
│ │
│ /home/admin/anaconda3/lib/python3.9/site-packages/torch/autograd/function.py:267 in apply │
│ │
│ 264 │ │ │ │ │ │ │ "Function is not allowed. You should only implement one " │
│ 265 │ │ │ │ │ │ │ "of them.") │
│ 266 │ │ user_fn = vjp_fn if vjp_fn is not Function.vjp else backward_fn │
│ ❱ 267 │ │ return user_fn(self, *args) │
│ 268 │ │
│ 269 │ def apply_jvp(self, *args): │
│ 270 │ │ # _forward_cls is defined by derived class │
│ │
│ /home/admin/anaconda3/lib/python3.9/site-packages/torch/utils/checkpoint.py:157 in backward │
│ │
│ 154 │ │ │ raise RuntimeError( │
│ 155 │ │ │ │ "none of output has requires_grad=True," │
│ 156 │ │ │ │ " this checkpoint() is not necessary") │
│ ❱ 157 │ │ torch.autograd.backward(outputs_with_grad, args_with_grad) │
│ 158 │ │ grads = tuple(inp.grad if isinstance(inp, torch.Tensor) else None │
│ 159 │ │ │ │ │ for inp in detached_inputs) │
│ 160 │
│ │
│ /home/admin/anaconda3/lib/python3.9/site-packages/torch/autograd/__init__.py:197 in backward │
│ │
│ 194 │ # The reason we repeat same the comment below is that │
│ 195 │ # some Python versions print out the first line of a multi-line function │
│ 196 │ # calls in the traceback and some print out the last line │
│ ❱ 197 │ Variable.execution_engine.run_backward( # Calls into the C++ engine to run the bac │
│ 198 │ │ tensors, grad_tensors_, retain_graph, create_graph, inputs, │
│ 199 │ │ allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to ru │
│ 200 │
│ │
│ /home/admin/anaconda3/lib/python3.9/site-packages/torch/autograd/function.py:267 in apply │
│ │
│ 264 │ │ │ │ │ │ │ "Function is not allowed. You should only implement one " │
│ 265 │ │ │ │ │ │ │ "of them.") │
│ 266 │ │ user_fn = vjp_fn if vjp_fn is not Function.vjp else backward_fn │
│ ❱ 267 │ │ return user_fn(self, *args) │
│ 268 │ │
│ 269 │ def apply_jvp(self, *args): │
│ 270 │ │ # _forward_cls is defined by derived class │
│ │
│ /home/admin/anaconda3/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py:456 in │
│ backward │
│ │
│ 453 │ │ │ │
│ 454 │ │ │ elif state.CB is not None: │
│ 455 │ │ │ │ CB = state.CB.to(ctx.dtype_A, copy=True).mul_(state.SCB.unsqueeze(1).mul │
│ ❱ 456 │ │ │ │ grad_A = torch.matmul(grad_output, CB).view(ctx.grad_shape).to(ctx.dtype │
│ 457 │ │ │ elif state.CxB is not None: │
│ 458 │ │ │ │ │
│ 459 │ │ │ │ if state.tile_indices is None: │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: expected scalar type Half but found Float
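
(For context: the RuntimeError is raised in bitsandbytes' backward pass for the 8-bit matmul, where grad_output and the dequantized weight CB end up with different dtypes (one fp16, one fp32), so torch.matmul refuses to mix them. A setup that usually keeps dtypes consistent when LoRA-tuning an 8-bit model is to let peft cast the non-quantized parts before attaching the adapters. A minimal sketch, assuming peft's prepare_model_for_int8_training is available; the hyperparameters are illustrative and this is not the repository's exact code:)

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

# Load the base model with 8-bit weights (bitsandbytes handles the quantized matmuls).
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloomz-7b1-mt",
    load_in_8bit=True,
    device_map="auto",
)

# Cast layer norms and the lm_head to fp32 and enable input gradients, which keeps
# forward and backward dtypes consistent under mixed precision.
model = prepare_model_for_int8_training(model)

# Attach LoRA adapters on the attention projection, matching --lora_target_modules.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)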

@scarydemon2
Author

WORLD_SIZE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=3192 uniform_finetune.py \
    --data belle1m \
    --model_type bloom \
    --model_name_or_path bigscience/bloomz-7b1-mt \
    --lora_target_modules query_key_value \
    --per_gpu_train_batch_size 4 \
    --learning_rate 3e-4 \
    --epochs 1

@scarydemon2
Author

What should I do?

@PhoebusSi
Owner

This may be caused by an environment issue; we will identify the cause and provide a solution shortly.
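
(For reference, a quick way to capture the library versions involved in the traceback when reporting environment issues; this is a generic check, not part of the repository:)

import importlib.metadata as metadata

# Print the installed versions of the packages that appear in the traceback above.
for pkg in ("torch", "transformers", "peft", "bitsandbytes", "accelerate"):
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "not installed")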

@PhoebusSi
Owner

We were unable to reproduce this bug. The code runs correctly on an A100 with the following command:

CUDA_VISIBLE_DEVICES=1,2,3,4  python3 -m torch.distributed.launch --nproc_per_node 4 --nnodes=1  \
    uniform_finetune.py   --model_type bloom --model_name_or_path bigscience/bloomz-7b1-mt \
    --data alpaca --lora_target_modules query_key_value \
    --per_gpu_train_batch_size 4 --learning_rate 3e-4 --epochs 1  

or

CUDA_VISIBLE_DEVICES=0 python3 uniform_finetune.py \
    --data alpaca \
    --model_type bloom \
    --model_name_or_path bigscience/bloomz-7b1-mt \
    --lora_target_modules query_key_value \
    --per_gpu_train_batch_size 4 \
    --learning_rate 3e-4 \
    --epochs 1

@PhoebusSi
Owner

This seems to be caused by an older Python version. Try Python 3.9 or above.
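
(If upgrading Python does not help, a workaround sometimes reported for this exact error in LoRA fine-tuning threads is to run training under autocast, so that fp16 base weights and fp32 adapter tensors are cast consistently. A hedged sketch of a possible change around the trainer.train() call in uniform_finetune.py, not a confirmed fix:)

import torch

# Run the training loop under autocast so mixed fp16/fp32 tensors are reconciled
# in each forward pass (and therefore in the backward pass as well).
with torch.autocast("cuda"):
    trainer.train()  # `trainer` is the Trainer already built in uniform_finetune.py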
