I am not sure about this, but it seems there is a slight mismatch between the implementation of FedNova and its description in the paper. It seems that the current implementation computes the cumulative sum of gradients without multiplying by the weights a_{i}, even when etamu is not zero (see below):
```python
# update accumalated local updates
if 'cum_grad' not in param_state:
    param_state['cum_grad'] = torch.clone(d_p).detach()
    param_state['cum_grad'].mul_(local_lr)
else:
    param_state['cum_grad'].add_(local_lr, d_p)

p.data.add_(-local_lr, d_p)
```
A line probably needs to be added that multiplies d_p by a_i (a quantity that would also have to be tracked) before updating param_state['cum_grad'].
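For concreteness, the change I have in mind would look roughly like the sketch below; a_i here is a hypothetical per-step weight that the optimizer would additionally have to track, and it does not exist in the current code:

```python
# Hypothetical sketch of the change described above (not the repo's code).
# a_i is assumed to be a per-step scalar weight tracked elsewhere in the
# optimizer state; it is not present in the current implementation.
a_i = param_state.get('a_i', 1.0)
if 'cum_grad' not in param_state:
    param_state['cum_grad'] = torch.clone(d_p).detach()
    param_state['cum_grad'].mul_(a_i * local_lr)
else:
    param_state['cum_grad'].add_(a_i * local_lr, d_p)
```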
Is what I said correct, or is there anything I am missing?
Thanks in advance for your help
Hi.
d_p starts out as the gradient, but by the end of the step it is a composite update that folds in several factors: the gradient, momentum, the variance (correction) term, the proximal term, and so on.
So param_state['cum_grad'] is the accumulation of the step updates d_p, not an accumulation of raw gradients.
Through an equivalent mathematical transformation, you will see that this accumulated update param_state['cum_grad'] equals the weighted sum of a_i[j] * grad_j over the local steps j. See page 6 of the paper for the details.
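Here is a minimal numerical sketch of that equivalence for plain local SGD with momentum rho and no proximal or variance-reduction terms (the variable names below are illustrative, not taken from the repo). Accumulating the per-step updates d_p gives the same result as weighting the raw gradients by a_i[j] = (1 - rho**(tau - j)) / (1 - rho), which is consistent with the a_i vector the paper describes for momentum SGD:

```python
import torch

rho, local_lr, tau = 0.9, 0.1, 5              # momentum, local lr, local steps
grads = [torch.randn(3) for _ in range(tau)]  # raw per-step gradients

# Accumulate the actual step updates d_p (here just the momentum buffer),
# the way param_state['cum_grad'] does.
buf = torch.zeros(3)
cum_grad = torch.zeros(3)
for g in grads:
    buf = rho * buf + g                       # d_p after the momentum update
    cum_grad += local_lr * buf

# Weight the raw gradients by a_i directly: a_i[j] = (1 - rho**(tau - j)) / (1 - rho)
weighted = sum(
    (1 - rho ** (tau - j)) / (1 - rho) * local_lr * g for j, g in enumerate(grads)
)

print(torch.allclose(cum_grad, weighted))     # True, up to float error
```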