GradVac gradient update #36
Comments
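Here the parameter group k is a tunable hyperparameter. As in the paper's experiments, k can be whole_model or all_layer. We implemented the whole_model version; we will later revise our implementation to expose k as a hyperparameter.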
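To make the difference between the two settings concrete, here is a minimal sketch of how the EMA similarity target could be tracked per task pair and per parameter group. It is not the repository's implementation: the helper names (`split_into_groups`, `update_ema_targets`), the flattened-gradient layout, and the `layer_sizes` argument are all hypothetical, and only the EMA target is shown, not the gradient adjustment that GradVac applies afterwards.

```python
import numpy as np

def split_into_groups(flat_grad, layer_sizes, mode):
    """Split a flattened gradient into parameter groups (hypothetical helper).

    mode="whole_model": a single group holding every parameter.
    mode="all_layer":   one group per layer, sized by layer_sizes.
    """
    if mode == "whole_model":
        return [flat_grad]
    if mode == "all_layer":
        bounds = np.cumsum(layer_sizes)[:-1]
        return np.split(flat_grad, bounds)
    raise ValueError(f"unknown mode: {mode}")

def cosine(a, b, eps=1e-8):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def update_ema_targets(grads, layer_sizes, mode, ema, beta=0.5):
    """Move ema[i, j, k] toward the current similarity of tasks i, j in group k."""
    groups = [split_into_groups(g, layer_sizes, mode) for g in grads]
    num_tasks = len(grads)
    for i in range(num_tasks):
        for j in range(num_tasks):
            if i == j:
                continue
            for k in range(ema.shape[2]):
                phi = cosine(groups[i][k], groups[j][k])
                ema[i, j, k] = (1 - beta) * ema[i, j, k] + beta * phi
    return ema

# Two tasks, two layers of two parameters each (toy values, not real data).
layer_sizes = [2, 2]
grads = [np.array([1.0, 0.0, 1.0, 0.0]), np.array([1.0, 0.0, -1.0, 0.0])]

# beta=1.0 makes the EMA equal to the current similarity, for easy inspection.
ema_layers = update_ema_targets(grads, layer_sizes, "all_layer", np.zeros((2, 2, 2)), beta=1.0)
ema_whole = update_ema_targets(grads, layer_sizes, "whole_model", np.zeros((2, 2, 1)), beta=1.0)
```

In this toy case the two tasks agree in the first layer and conflict in the second, so the all_layer targets differ per layer while the whole_model target averages the two effects out to roughly zero. That is the distinction the issue asks about.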
Thanks for the reply! Looking forward to the fine-grained version!
@MartinPR307 The GradVac implementation has been updated, via
Closed as no further updates.
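Hello! The GradVac paper notes that the gradient similarities in different layers of the network eventually converge to different values, so it sets different target values for different tasks and for different layers.
The paper describes this as follows: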
To incorporate these three factors, we exploit an exponential moving average (EMA) variable for tasks i, j and parameter
group k (e.g. the k-th layer) as:
However, your GradVac implementation still only sets different target values for different task pairs. Is this reasonable?
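The equation that the quoted passage leads into is not reproduced in this post. For reference, a standard EMA over a per-group similarity would take the form below. The notation is assumed here rather than copied from the post: $\phi_{ijk}^{(t)}$ is the cosine similarity between the gradients of tasks $i$ and $j$ restricted to parameter group $k$ at step $t$, $\hat{\phi}_{ijk}^{(t)}$ is its running target, and $\beta$ is the decay rate.

$$
\hat{\phi}_{ijk}^{(t)} = (1 - \beta)\,\hat{\phi}_{ijk}^{(t-1)} + \beta\,\phi_{ijk}^{(t)}
$$

Under this reading, a whole_model implementation corresponds to a single group, so the index $k$ takes only one value and the target depends only on the task pair $(i, j)$.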