GradVac gradient update #36

MartinPR307 · 2023-04-17T10:23:14Z

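Hello! The GradVac paper notes that the gradient similarities at different layers of the network eventually converge to different values, so it sets different target values for different tasks and for different layers.
The paper describes this as follows: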
To incorporate these three factors, we exploit an exponential moving average (EMA) variable for tasks i, j and parameter
group k (e.g. the k-th layer) as:

However, your GradVac implementation still only sets different target values across tasks, not across layers. Is that reasonable?

Baijiong-Lin · 2023-04-17T10:42:06Z

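The parameter group k here is a tunable hyperparameter. As in the paper's experiments, k can be whole_model or all_layer. What we implemented is the whole_model version. We will revise our implementation later to add k as a hyperparameter.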
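To make the role of the parameter group k concrete, here is a minimal pure-Python sketch of keeping one EMA similarity target per (task i, task j, group k). It is not the repository's code: the names `cosine` and `update_ema_targets`, the flat-list gradient representation, the zero initialization, and the value of `beta` are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two flat gradient slices."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # small epsilon avoids division by zero

def update_ema_targets(targets, grads, groups, beta=0.01):
    """Update one EMA target per ordered task pair (i, j) and group k.

    targets: dict keyed by (i, j, k); missing keys start at 0.0 (assumption).
    grads:   list with one flattened gradient (list of floats) per task.
    groups:  list of (start, end) index ranges into the flattened gradient.
    """
    num_tasks = len(grads)
    for i in range(num_tasks):
        for j in range(num_tasks):
            if i == j:
                continue
            for k, (start, end) in enumerate(groups):
                phi = cosine(grads[i][start:end], grads[j][start:end])
                old = targets.get((i, j, k), 0.0)
                # Standard EMA step: blend the old target with the new similarity.
                targets[(i, j, k)] = (1 - beta) * old + beta * phi
    return targets

# Two tasks that agree on the first half of the parameters and conflict on the second.
grads = [[1.0, 0.0, 1.0, 0.0], [1.0, 0.0, -1.0, 0.0]]
whole = update_ema_targets({}, grads, [(0, 4)], beta=0.5)
split = update_ema_targets({}, grads, [(0, 2), (2, 4)], beta=0.5)
```

With a single group the agreement and the conflict cancel out, so the target stays at zero. With two groups each half gets its own target, which is the finer granularity asked about in this issue.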

MartinPR307 · 2023-04-17T10:57:45Z

Thanks for the reply! Looking forward to the fine-grained version!

Baijiong-Lin · 2023-06-19T09:47:39Z

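@MartinPR307 The GradVac implementation has been updated. With --GradVac_group_type you can choose whole_model, all_layer, or all_matrix, which correspond to the parameter-group settings described in the paper.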
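As a rough illustration of how the three group types could partition a flattened gradient, here is a hedged sketch. It is not the code from the referenced commit: `build_groups` and the `(layer_name, param_name, numel)` tuples are hypothetical, and it assumes all_layer means one group per layer while all_matrix means one group per parameter tensor.

```python
def build_groups(param_specs, group_type):
    """Return (start, end) ranges into a flattened gradient vector.

    param_specs: list of (layer_name, param_name, numel) in parameter order.
    group_type:  "whole_model", "all_layer", or "all_matrix".
    """
    if group_type == "whole_model":
        # One group spanning every parameter in the model.
        sizes = [sum(numel for _, _, numel in param_specs)]
    elif group_type == "all_matrix":
        # One group per parameter tensor (weights and biases kept separate).
        sizes = [numel for _, _, numel in param_specs]
    elif group_type == "all_layer":
        # One group per layer: merge consecutive tensors sharing a layer name.
        sizes, last_layer = [], None
        for layer, _, numel in param_specs:
            if layer == last_layer:
                sizes[-1] += numel
            else:
                sizes.append(numel)
                last_layer = layer
    else:
        raise ValueError(f"unknown group_type: {group_type}")

    # Convert group sizes into contiguous index ranges.
    groups, start = [], 0
    for size in sizes:
        groups.append((start, start + size))
        start += size
    return groups

# Hypothetical two-layer model: fc1 (6 weights, 2 biases), fc2 (4 weights, 2 biases).
specs = [("fc1", "weight", 6), ("fc1", "bias", 2),
         ("fc2", "weight", 4), ("fc2", "bias", 2)]
whole_model = build_groups(specs, "whole_model")
all_layer = build_groups(specs, "all_layer")
all_matrix = build_groups(specs, "all_matrix")
```

Finer grouping means more similarity targets to track: one per task pair for whole_model, versus one per task pair per layer or per tensor for the other two settings.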

Baijiong-Lin · 2023-06-21T05:17:36Z

Closed as no further updates.

Baijiong-Lin added a commit that referenced this issue Jun 19, 2023

update GradVac (#36)

13f5fc7

Baijiong-Lin closed this as completed Jun 21, 2023
