max_grad_norm does not take effect under SFT #3996

Closed
1 task done
HuangOwen opened this issue May 30, 2024 · 1 comment
Labels
solved This problem has been already solved

Comments

@HuangOwen

Reminder

  • I have read the README and searched the existing issues.

Reproduction

CUDA_VISIBLE_DEVICES=0 python src/train.py \
    --stage sft \
    --do_train True \
    --model_name_or_path /xxx/Llama-2-7b-hf \
    --finetuning_type lora \
    --template default \
    --dataset alpaca_gpt4_en \
    --cutoff_len 1024 \
    --learning_rate 0.0001 \
    --num_train_epochs 3.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 10 \
    --save_steps 10000 \
    --warmup_ratio 0.01 \
    --val_size 0.1 \
    --per_device_eval_batch_size 16 \
    --evaluation_strategy steps \
    --eval_steps 5000 \
    --optim adamw_torch \
    --report_to wandb \
    --output_dir saves/llama2-7b-lora-baseline-qv-r8-rotate-noreplace/ \
    --fp16 True \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target q_proj,v_proj \
    --plot_loss True \
    --load_best_model_at_end

Expected behavior

max_grad_norm should be applied for gradient clipping, but in my tests it does not seem to take effect: even with max_grad_norm set, the logged gradient norm values still exceed max_grad_norm.

System Info

[screenshot of system info]

Others

No response

@hiyouga
Owner

hiyouga commented Jun 3, 2024

The value recorded here is the norm before clipping.
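
For context, a minimal sketch of why the reported value can exceed max_grad_norm even though clipping is applied. This is my own illustration of PyTorch's clip_grad_norm_ behavior, which Trainer-style loops commonly use for the logged grad_norm; it is not LLaMA-Factory's exact code:

# Illustration only (assumed logging behavior, not LLaMA-Factory code):
# torch.nn.utils.clip_grad_norm_ clips gradients in place but returns the
# total norm measured BEFORE clipping, which is typically what gets logged.
import torch

model = torch.nn.Linear(16, 16)
loss = model(torch.randn(4, 16)).pow(2).sum()
loss.backward()

pre_clip_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# Recompute the total norm after clipping to show it is actually bounded.
post_clip_norm = torch.norm(
    torch.stack([p.grad.detach().norm(2) for p in model.parameters() if p.grad is not None])
)
print(f"reported (pre-clip) norm: {pre_clip_norm.item():.4f}")  # can exceed max_grad_norm
print(f"actual post-clip norm:    {post_clip_norm.item():.4f}")  # bounded by max_grad_norm

In other words, clipping is still applied to the gradients used in the optimizer step; only the reported norm reflects the pre-clip magnitude.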

hiyouga added the "solved" label on Jun 3, 2024
hiyouga closed this as completed on Jun 3, 2024