Reproduction

CUDA_VISIBLE_DEVICES=0 python src/train.py \
    --stage sft \
    --do_train True \
    --model_name_or_path /xxx/Llama-2-7b-hf \
    --finetuning_type lora \
    --template default \
    --dataset alpaca_gpt4_en \
    --cutoff_len 1024 \
    --learning_rate 0.0001 \
    --num_train_epochs 3.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 10 \
    --save_steps 10000 \
    --warmup_ratio 0.01 \
    --val_size 0.1 \
    --per_device_eval_batch_size 16 \
    --evaluation_strategy steps \
    --eval_steps 5000 \
    --optim adamw_torch \
    --report_to wandb \
    --output_dir saves/llama2-7b-lora-baseline-qv-r8-rotate-noreplace/ \
    --fp16 True \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target q_proj,v_proj \
    --plot_loss True \
    --load_best_model_at_end
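For reference, here is a minimal sketch of roughly what the LoRA-related flags above correspond to in a standalone PEFT setup. This is not LLaMA-Factory's internal code; it assumes that --lora_rank, --lora_alpha, --lora_dropout, and --lora_target are forwarded to an equivalent peft.LoraConfig.

```python
# Minimal sketch (assumption: LLaMA-Factory maps the CLI flags above to an
# equivalent peft.LoraConfig; this is not its internal implementation).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("/xxx/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=8,                                  # --lora_rank 8
    lora_alpha=16,                        # --lora_alpha 16
    lora_dropout=0.0,                     # --lora_dropout 0
    target_modules=["q_proj", "v_proj"],  # --lora_target q_proj,v_proj
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```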
Expected behavior

max_grad_norm is supposed to be used for gradient clipping, but in my tests it does not seem to take effect: even with max_grad_norm set, gradient norm values larger than max_grad_norm still show up.
What is recorded here is the norm before clipping.
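That matches the behavior of PyTorch's clipping utility: torch.nn.utils.clip_grad_norm_ rescales the gradients in place so that their total norm is at most max_norm, but its return value is the total norm measured before clipping, and that returned value is typically what ends up in the training logs. A minimal, standalone sketch (not LLaMA-Factory code) illustrating the difference:

```python
import torch

# Toy model with deliberately inflated gradients.
model = torch.nn.Linear(16, 16)
loss = (model(torch.randn(4, 16)) ** 2).sum() * 100.0
loss.backward()

max_grad_norm = 1.0

# clip_grad_norm_ scales the gradients in place so their total norm is at most
# max_grad_norm, but it RETURNS the norm computed BEFORE clipping.
pre_clip_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)

# Recompute the total norm after clipping to confirm it respects the limit.
post_clip_norm = torch.norm(
    torch.stack([p.grad.detach().norm(2) for p in model.parameters() if p.grad is not None])
)

print(f"pre-clip norm (the logged value):  {pre_clip_norm.item():.3f}")   # can exceed 1.0
print(f"post-clip norm (actual gradients): {post_clip_norm.item():.3f}")  # <= 1.0
```

So seeing logged gradient norms above max_grad_norm does not mean clipping was skipped; the clipped gradients are what the optimizer actually uses.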