This repository has been archived by the owner on Oct 12, 2023. It is now read-only.

RuntimeError: Tensors must be contiguous when per_device_eval_batch_size > 1 and launched with deepspeed #385

Closed
jiahuanluo opened this issue Aug 4, 2023 · 5 comments
Labels
solved This problem has been already solved.

Comments

@jiahuanluo
Contributor

"RuntimeError: Tensors must be contiguous" occurs when per_device_eval_batch_size > 1.
Command:
deepspeed --include localhost:0,1,2,3,4,5,6,7 --master_port $MASTER_PORT src/train_bash.py \
    --stage sft \
    --model_name_or_path THUDM/chatglm2-6b \
    --checkpoint_dir ${CHECKPOINT} \
    --do_predict \
    --dataset dev_data \
    --overwrite_cache \
    --finetuning_type lora \
    --output_dir ${CHECKPOINT}/predict \
    --overwrite_cache \
    --per_device_eval_batch_size 4 \
    --max_source_length 1024 \
    --max_target_length 128 \
    --max_samples 1000 \
    --predict_with_generate \
    --plot_loss \
    --fp16
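
For readers unfamiliar with the message: PyTorch's distributed collectives generally require tensors stored in one dense block of memory, and views such as transposes or strided slices do not qualify. The sketch below only illustrates what "contiguous" means in PyTorch. It is not taken from this repository, and the thread does not say which tensor triggers the error here.

```python
import torch

# A freshly created tensor is laid out densely in memory.
x = torch.arange(6).reshape(2, 3)
print(x.is_contiguous())

# Transposing returns a view with swapped strides:
# same storage, but no longer contiguous.
y = x.t()
print(y.is_contiguous())

# .contiguous() copies the data into a new dense layout
# while keeping the same values.
z = y.contiguous()
print(z.is_contiguous(), torch.equal(y, z))
```

Calling .contiguous() on a tensor before it is gathered across processes is a common way to avoid this class of error, but whether that applies to this code path is an assumption.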

@hiyouga hiyouga added the pending This problem is yet to be addressed. label Aug 4, 2023
@kuailehaha

kuailehaha commented Aug 8, 2023

I get the same error in both ChatGLM-Efficient-Tuning and LLaMA-Efficient-Tuning when running multi-GPU inference with deepspeed or accelerate:
accelerate launch \
    ./LLaMA-Efficient-Tuning/src/train_bash.py \
    --max_samples 50 \
    --model_name_or_path "Llama-2-13B-fp16/" \
    --do_predict \
    --dataset alpaca_zh \
    --dataset_dir "LLaMA-Efficient-Tuning/data" \
    --finetuning_type lora \
    --output_dir Efficient_Tuning/llama2-13b \
    --per_device_eval_batch_size 4 \
    --predict_with_generate \
    --fp16
Waiting for a solution.

@hiyouga
Owner

hiyouga commented Aug 8, 2023

Currently, do_predict only supports a single GPU.
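
Since the reply above says do_predict currently supports only a single GPU, one workaround is to run the same prediction job as a single process pinned to one device. The helper below is a hypothetical sketch, not part of the repository: the function name `single_gpu_predict_command` and its parameters are illustrative, the flags are copied from the command in the original report, and it assumes that src/train_bash.py can be started directly with the Python interpreter instead of the deepspeed launcher.

```python
import os
import sys


def single_gpu_predict_command(checkpoint_dir, gpu_id=0, eval_batch_size=4):
    """Build (env, argv) for a single-process predict run (hypothetical helper)."""
    # Expose exactly one GPU to the process so no multi-GPU gather takes place.
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)

    # Flags mirror the command from the report; only the launcher differs.
    argv = [
        sys.executable, "src/train_bash.py",
        "--stage", "sft",
        "--model_name_or_path", "THUDM/chatglm2-6b",
        "--checkpoint_dir", checkpoint_dir,
        "--do_predict",
        "--dataset", "dev_data",
        "--overwrite_cache",
        "--finetuning_type", "lora",
        "--output_dir", f"{checkpoint_dir}/predict",
        "--per_device_eval_batch_size", str(eval_batch_size),
        "--max_source_length", "1024",
        "--max_target_length", "128",
        "--max_samples", "1000",
        "--predict_with_generate",
        "--fp16",
    ]
    return env, argv


if __name__ == "__main__":
    env, argv = single_gpu_predict_command("path/to/checkpoint")
    print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], " ".join(argv))
```

The returned pair can be passed to `subprocess.run(argv, env=env, check=True)` from the repository root. Building the command as a list keeps each flag and value separate, which avoids shell-quoting problems with paths.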

@jiahuanluo
Contributor Author

This used to work in LLaMA-Efficient-Tuning; the more the code is updated, the more bugs it has.

@hiyouga hiyouga added solved This problem has been already solved. and removed pending This problem is yet to be addressed. labels Aug 9, 2023
@hiyouga hiyouga closed this as completed Aug 9, 2023
@kuailehaha

That's right. The June version could run multi-GPU with accelerate launch.

@hiyouga
Owner

hiyouga commented Aug 9, 2023

@kuailehaha Pull the latest code and try again.
