when `per_device_eval_batch_size` > 1 and launch by deepspeed, RuntimeError: Tensors must be contiguous #385

jiahuanluo · 2023-08-04T03:14:36Z

RuntimeError: Tensors must be contiguous occurs when per_device_eval_batch_size > 1
Command:
deepspeed --include localhost:0,1,2,3,4,5,6,7 --master_port $MASTER_PORT src/train_bash.py \
    --stage sft \
    --model_name_or_path THUDM/chatglm2-6b \
    --checkpoint_dir ${CHECKPOINT} \
    --do_predict \
    --dataset dev_data \
    --overwrite_cache \
    --finetuning_type lora \
    --output_dir ${CHECKPOINT}/predict \
    --overwrite_cache \
    --per_device_eval_batch_size 4 \
    --max_source_length 1024 \
    --max_target_length 128 \
    --max_samples 1000 \
    --predict_with_generate \
    --plot_loss \
    --fp16
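
For context, here is a minimal sketch of what "contiguous" means in the error message, assuming the message comes from a PyTorch distributed collective, which generally requires contiguous input tensors. The thread does not identify which tensor in the prediction loop is non-contiguous, so the tensors below are purely illustrative.

```python
import torch

# A freshly created tensor is laid out row by row in memory (contiguous).
x = torch.arange(6).reshape(2, 3)

# Transposing returns a view with swapped strides; no data is copied,
# so the view is no longer contiguous in memory.
y = x.t()

print(x.is_contiguous())  # True
print(y.is_contiguous())  # False

# .contiguous() copies the data into a fresh row-major layout
# while keeping the same values.
z = y.contiguous()
print(z.is_contiguous())  # True
print(torch.equal(y, z))  # True
```

Calling `.contiguous()` on a tensor before it is gathered across processes is the usual remedy for this class of error, though whether that applies here depends on where the tensor originates.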


kuailehaha · 2023-08-08T15:03:55Z

I get the same error with multi-GPU inference in both the ChatGLM and LLaMA efficient_tuning projects, using either deepspeed or accelerate:
accelerate launch \
    ./LLaMA-Efficient-Tuning/src/train_bash.py \
    --max_samples 50 \
    --model_name_or_path "Llama-2-13B-fp16/" \
    --do_predict \
    --dataset alpaca_zh \
    --dataset_dir "LLaMA-Efficient-Tuning/data" \
    --finetuning_type lora \
    --output_dir Efficient_Tuning/llama2-13b \
    --per_device_eval_batch_size 4 \
    --predict_with_generate \
    --fp16
Waiting for a fix.

hiyouga · 2023-08-08T16:04:19Z

At the moment, do_predict only supports a single GPU.
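
As a rough sketch of the single-GPU workaround this implies, the helper below builds a one-process command that reuses the flags from the original report and restricts the visible devices with `CUDA_VISIBLE_DEVICES`. The helper name `single_gpu_predict_command`, the placeholder checkpoint path, and the assumption that `src/train_bash.py` can be started directly with `python` rather than a distributed launcher are illustrative and not confirmed in this thread.

```python
import os
import shlex


def single_gpu_predict_command(checkpoint_dir, gpu_id=0, eval_batch_size=4):
    """Return (env, argv) for a one-process, one-GPU do_predict run.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    env = dict(os.environ)
    # Expose a single GPU so only one device is used for prediction.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    argv = [
        "python", "src/train_bash.py",
        "--stage", "sft",
        "--model_name_or_path", "THUDM/chatglm2-6b",
        "--checkpoint_dir", checkpoint_dir,
        "--do_predict",
        "--dataset", "dev_data",
        "--finetuning_type", "lora",
        "--output_dir", os.path.join(checkpoint_dir, "predict"),
        "--per_device_eval_batch_size", str(eval_batch_size),
        "--predict_with_generate",
        "--fp16",
    ]
    return env, argv


# "path/to/checkpoint" is a placeholder, not a path from the report.
env, argv = single_gpu_predict_command("path/to/checkpoint")
print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], shlex.join(argv))
```

The returned pair can be passed to `subprocess.run(argv, env=env)` once the paths match the local setup.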

jiahuanluo · 2023-08-09T02:42:37Z

This used to work in LLaMA-Efficient-Tuning; the more the code is updated, the more bugs it has.

kuailehaha · 2023-08-09T14:32:11Z

That's right, the June version could run multi-GPU with accelerate launch.

hiyouga · 2023-08-09T14:43:24Z

@kuailehaha Pull the latest code and try again.

hiyouga added the "pending" label (This problem is yet to be addressed.) on Aug 4, 2023

hiyouga added the "solved" label (This problem has been already solved.) and removed the "pending" label on Aug 9, 2023

hiyouga closed this as completed Aug 9, 2023
