expected scalar type Half but found Float #23

Closed
scarydemon2 opened this issue Mar 31, 2023 · 5 comments

@scarydemon2

─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /export/home/gth/alpaca_lora/uniform_finetune.py:294 in <module> │
│ │
│ 291 │ args = parser.parse_args() │
│ 292 │ print(args) │
│ 293 │ │
│ ❱ 294 │ train(args) │
│ 295 │
│ │
│ /export/home/gth/alpaca_lora/uniform_finetune.py:263 in train │
│ │
│ 260 │ if torch.__version__ >= "2" and sys.platform != "win32": │
│ 261 │ │ model = torch.compile(model) │
│ 262 │ │
│ ❱ 263 │ trainer.train() │
│ 264 │ │
│ 265 │ model.save_pretrained(output_dir) │
│ 266 │
│ │
│ /home/admin/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:1644 in train │
│ │
│ 1641 │ │ inner_training_loop = find_executable_batch_size( │
│ 1642 │ │ │ self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size │
│ 1643 │ │ ) │
│ ❱ 1644 │ │ return inner_training_loop( │
│ 1645 │ │ │ args=args, │
│ 1646 │ │ │ resume_from_checkpoint=resume_from_checkpoint, │
│ 1647 │ │ │ trial=trial, │
│ │
│ /home/admin/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:1909 in │
│ _inner_training_loop │
│ │
│ 1906 │ │ │ │ ): │
│ 1907 │ │ │ │ │ # Avoid unnecessary DDP synchronization since there will be no backw │
│ 1908 │ │ │ │ │ with model.no_sync(): │
│ ❱ 1909 │ │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1910 │ │ │ │ else: │
│ 1911 │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1912 │
│ │
│ /home/admin/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:2667 in training_step │
│ │
│ 2664 │ │ │ loss = loss / self.args.gradient_accumulation_steps │
│ 2665 │ │ │
│ 2666 │ │ if self.do_grad_scaling: │
│ ❱ 2667 │ │ │ self.scaler.scale(loss).backward() │
│ 2668 │ │ elif self.use_apex: │
│ 2669 │ │ │ with amp.scale_loss(loss, self.optimizer) as scaled_loss: │
│ 2670 │ │ │ │ scaled_loss.backward() │
│ │
│ /home/admin/anaconda3/lib/python3.9/site-packages/torch/_tensor.py:488 in backward │
│ │
│ 485 │ │ │ │ create_graph=create_graph, │
│ 486 │ │ │ │ inputs=inputs, │
│ 487 │ │ │ ) │
│ ❱ 488 │ │ torch.autograd.backward( │
│ 489 │ │ │ self, gradient, retain_graph, create_graph, inputs=inputs │
│ 490 │ │ ) │
│ 491 │
│ │
│ /home/admin/anaconda3/lib/python3.9/site-packages/torch/autograd/__init__.py:197 in backward │
│ │
│ 194 │ # The reason we repeat same the comment below is that │
│ 195 │ # some Python versions print out the first line of a multi-line function │
│ 196 │ # calls in the traceback and some print out the last line │
│ ❱ 197 │ Variable.execution_engine.run_backward( # Calls into the C++ engine to run the bac │
│ 198 │ │ tensors, grad_tensors_, retain_graph, create_graph, inputs, │
│ 199 │ │ allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to ru │
│ 200 │
│ │
│ /home/admin/anaconda3/lib/python3.9/site-packages/torch/autograd/function.py:267 in apply │
│ │
│ 264 │ │ │ │ │ │ │ "Function is not allowed. You should only implement one " │
│ 265 │ │ │ │ │ │ │ "of them.") │
│ 266 │ │ user_fn = vjp_fn if vjp_fn is not Function.vjp else backward_fn │
│ ❱ 267 │ │ return user_fn(self, *args) │
│ 268 │ │
│ 269 │ def apply_jvp(self, *args): │
│ 270 │ │ # _forward_cls is defined by derived class │
│ │
│ /home/admin/anaconda3/lib/python3.9/site-packages/torch/utils/checkpoint.py:157 in backward │
│ │
│ 154 │ │ │ raise RuntimeError( │
│ 155 │ │ │ │ "none of output has requires_grad=True," │
│ 156 │ │ │ │ " this checkpoint() is not necessary") │
│ ❱ 157 │ │ torch.autograd.backward(outputs_with_grad, args_with_grad) │
│ 158 │ │ grads = tuple(inp.grad if isinstance(inp, torch.Tensor) else None │
│ 159 │ │ │ │ │ for inp in detached_inputs) │
│ 160 │
│ │
│ /home/admin/anaconda3/lib/python3.9/site-packages/torch/autograd/__init__.py:197 in backward │
│ │
│ 194 │ # The reason we repeat same the comment below is that │
│ 195 │ # some Python versions print out the first line of a multi-line function │
│ 196 │ # calls in the traceback and some print out the last line │
│ ❱ 197 │ Variable.execution_engine.run_backward( # Calls into the C++ engine to run the bac │
│ 198 │ │ tensors, grad_tensors_, retain_graph, create_graph, inputs, │
│ 199 │ │ allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to ru │
│ 200 │
│ │
│ /home/admin/anaconda3/lib/python3.9/site-packages/torch/autograd/function.py:267 in apply │
│ │
│ 264 │ │ │ │ │ │ │ "Function is not allowed. You should only implement one " │
│ 265 │ │ │ │ │ │ │ "of them.") │
│ 266 │ │ user_fn = vjp_fn if vjp_fn is not Function.vjp else backward_fn │
│ ❱ 267 │ │ return user_fn(self, *args) │
│ 268 │ │
│ 269 │ def apply_jvp(self, *args): │
│ 270 │ │ # _forward_cls is defined by derived class │
│ │
│ /home/admin/anaconda3/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py:456 in │
│ backward │
│ │
│ 453 │ │ │ │
│ 454 │ │ │ elif state.CB is not None: │
│ 455 │ │ │ │ CB = state.CB.to(ctx.dtype_A, copy=True).mul_(state.SCB.unsqueeze(1).mul │
│ ❱ 456 │ │ │ │ grad_A = torch.matmul(grad_output, CB).view(ctx.grad_shape).to(ctx.dtype │
│ 457 │ │ │ elif state.CxB is not None: │
│ 458 │ │ │ │ │
│ 459 │ │ │ │ if state.tile_indices is None: │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: expected scalar type Half but found Float
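
(For context: the RuntimeError is raised in bitsandbytes' backward pass for the 8-bit matmul, where grad_output and the dequantized weight CB end up with different dtypes (one fp16, one fp32), so torch.matmul refuses to mix them. A setup that usually keeps dtypes consistent when LoRA-tuning an 8-bit model is to let peft cast the non-quantized parts before attaching the adapters. A minimal sketch, assuming peft's prepare_model_for_int8_training is available; the hyperparameters are illustrative and this is not the repository's exact code:)

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

# Load the base model with 8-bit weights (bitsandbytes handles the quantized matmuls).
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloomz-7b1-mt",
    load_in_8bit=True,
    device_map="auto",
)

# Cast layer norms and the lm_head to fp32 and enable input gradients, which keeps
# forward and backward dtypes consistent under mixed precision.
model = prepare_model_for_int8_training(model)

# Attach LoRA adapters on the attention projection, matching --lora_target_modules.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)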

@scarydemon2
Author

WORLD_SIZE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=3192 uniform_finetune.py \
    --data belle1m \
    --model_type bloom \
    --model_name_or_path bigscience/bloomz-7b1-mt \
    --lora_target_modules query_key_value \
    --per_gpu_train_batch_size 4 \
    --learning_rate 3e-4 \
    --epochs 1

@scarydemon2
Author

What should I do?

@PhoebusSi
Owner

This may be caused by an environment issue; we will identify the cause and provide a solution shortly.
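
(For reference, a quick way to capture the library versions involved in the traceback when reporting environment issues; this is a generic check, not part of the repository:)

import importlib.metadata as metadata

# Print the installed versions of the packages that appear in the traceback above.
for pkg in ("torch", "transformers", "peft", "bitsandbytes", "accelerate"):
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "not installed")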

@PhoebusSi
Owner

We were unable to reproduce this bug. The code runs correctly on an A100 with the following command:

CUDA_VISIBLE_DEVICES=1,2,3,4  python3 -m torch.distributed.launch --nproc_per_node 4 --nnodes=1  \
    uniform_finetune.py   --model_type bloom --model_name_or_path bigscience/bloomz-7b1-mt \
    --data alpaca --lora_target_modules query_key_value \
    --per_gpu_train_batch_size 4 --learning_rate 3e-4 --epochs 1  

or

CUDA_VISIBLE_DEVICES=0 python3 uniform_finetune.py \
    --data alpaca \
    --model_type bloom \
    --model_name_or_path bigscience/bloomz-7b1-mt \
    --lora_target_modules query_key_value \
    --per_gpu_train_batch_size 4 \
    --learning_rate 3e-4 \
    --epochs 1

@PhoebusSi
Owner

This seems to be caused by an older Python version. Try Python 3.9 or above.
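
(If upgrading Python does not help, a workaround sometimes reported for this exact error in LoRA fine-tuning threads is to run training under autocast, so that fp16 base weights and fp32 adapter tensors are cast consistently. A hedged sketch of a possible change around the trainer.train() call in uniform_finetune.py, not a confirmed fix:)

import torch

# Run the training loop under autocast so mixed fp16/fp32 tensors are reconciled
# in each forward pass (and therefore in the backward pass as well).
with torch.autocast("cuda"):
    trainer.train()  # `trainer` is the Trainer already built in uniform_finetune.py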
