I am not sure about this, but it seems there is a slight mismatch between the implementation of FedNova and its description in the paper. It seems that the current implementation computes the cumulative sum of gradients without multiplying by the weights a_{i}, even when etamu is not zero (see below):
```python
# update accumalated local updates
if 'cum_grad' not in param_state:
    param_state['cum_grad'] = torch.clone(d_p).detach()
    param_state['cum_grad'].mul_(local_lr)
else:
    param_state['cum_grad'].add_(local_lr, d_p)

p.data.add_(-local_lr, d_p)
```
A line probably needs to be added that multiplies d_p by a_i (a quantity that would also have to be tracked) before updating param_state['cum_grad'].
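For concreteness, the change I have in mind would look roughly like the sketch below; a_i here is a hypothetical per-step weight that the optimizer would additionally have to track, and it does not exist in the current code:

```python
# Hypothetical sketch of the change described above (not the repo's code).
# a_i is assumed to be a per-step scalar weight tracked elsewhere in the
# optimizer state; it is not present in the current implementation.
a_i = param_state.get('a_i', 1.0)
if 'cum_grad' not in param_state:
    param_state['cum_grad'] = torch.clone(d_p).detach()
    param_state['cum_grad'].mul_(a_i * local_lr)
else:
    param_state['cum_grad'].add_(a_i * local_lr, d_p)
```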
Is what I said correct, or is there anything I am missing?
Thanks in advance for your help
Hi.
d_p starts out as the gradient, but by the end of the step it is a composite update that folds in several factors: the gradient, momentum, the variance (correction) term, the proximal term, and so on.
So param_state['cum_grad'] is the accumulation of the step updates d_p, not an accumulation of raw gradients.
Through an equivalent mathematical transformation, you will see that this accumulated update param_state['cum_grad'] equals the weighted sum of a_i[j] * grad_j over the local steps j. See page 6 of the paper for the details.
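Here is a minimal numerical sketch of that equivalence for plain local SGD with momentum rho and no proximal or variance-reduction terms (the variable names below are illustrative, not taken from the repo). Accumulating the per-step updates d_p gives the same result as weighting the raw gradients by a_i[j] = (1 - rho**(tau - j)) / (1 - rho), which is consistent with the a_i vector the paper describes for momentum SGD:

```python
import torch

rho, local_lr, tau = 0.9, 0.1, 5              # momentum, local lr, local steps
grads = [torch.randn(3) for _ in range(tau)]  # raw per-step gradients

# Accumulate the actual step updates d_p (here just the momentum buffer),
# the way param_state['cum_grad'] does.
buf = torch.zeros(3)
cum_grad = torch.zeros(3)
for g in grads:
    buf = rho * buf + g                       # d_p after the momentum update
    cum_grad += local_lr * buf

# Weight the raw gradients by a_i directly: a_i[j] = (1 - rho**(tau - j)) / (1 - rho)
weighted = sum(
    (1 - rho ** (tau - j)) / (1 - rho) * local_lr * g for j, g in enumerate(grads)
)

print(torch.allclose(cum_grad, weighted))     # True, up to float error
```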