However, it seems that, when tested with total client number = client number per round, FedAVG distributed's device sampling makes the local training on a client, which should be isolated, pick up information from other clients' local datasets. (Theoretically, the issue will persist in cases where total client number != client number per round.)
In the design, the local trainer ID is decoupled from the local dataset, i.e., you need to update the dataset for each trainer at each round with a given client index before doing the local training. This can be beneficial when the total client number is large; when total client number = client number per round, the device sampling does nothing more than permute the client_indexes. However, doing so can cause the issue above.
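To make the decoupling concrete, here is a minimal stand-in for the server's sampling step (the function name and signature are my own invention, not FedML's actual code; FedML seeds by round index so all processes agree on the draw):

```python
import random

def sample_clients(round_idx, total_clients, clients_per_round):
    # Hypothetical stand-in for the server-side device sampling step.
    # Seeding with the round index keeps every process in agreement
    # about which client indexes were drawn this round.
    random.seed(round_idx)
    return random.sample(range(total_clients), clients_per_round)

# When total_clients == clients_per_round, "sampling" reduces to a
# permutation: every client participates, only the order changes, so
# trainer i may be handed a different client's dataset each round.
```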
As we can see from FedML/fedml_api/distributed/fedavg/FedAvgServerManager.py:
In each communication round, client_indexes is permuted, and at line 59 we can see that the receiver ID (trainer ID) is not tied to a specific dataset (determined by client_indexes[receiver_id]), because the order of client_indexes changes every round. Admittedly, after each round's syncing, all clients start from the same weights, so the weights are invariant to which local dataset a trainer holds (they are identical across clients). The optimizer's history, however, differs across clients, and this dissociation between trainer and local dataset causes optimizer history accumulated on one dataset to be applied to another. The result is that each local client ends up with partial training information about the global dataset, unfairly favoring the results.
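The mixing effect can be simulated in a few lines. This is illustrative only: I substitute a deterministic rotation for the random permutation so the outcome is easy to check, but the conclusion is the same for a shuffled client_indexes:

```python
def datasets_seen_by_trainer(total_clients=4, rounds=3):
    # Track which client datasets each trainer's resident optimizer
    # state (momenta, Adam moments, ...) has been applied to.
    seen = {t: set() for t in range(total_clients)}
    for r in range(rounds):
        # Deterministic rotation standing in for the per-round permutation.
        client_indexes = [(t + r) % total_clients for t in range(total_clients)]
        for trainer_id in range(total_clients):
            # The trainer reuses its optimizer state on whatever
            # dataset it is assigned this round.
            seen[trainer_id].add(client_indexes[trainer_id])
    return seen
```

After three rounds, every trainer's optimizer state has touched three different clients' datasets, instead of exactly one as isolation would require.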
This can be verified by setting client_indexes to a fixed list in the case total client number = client number per round, which yields significantly worse results than permuting it. Theoretically, the performance should be the same in this case, since the participating clients are identical in every round.
In realistic settings, sharing these optimizer histories would involve significant data traffic overhead (doubling the traffic volume).
@luke-avionics This design is specialized for the client sampling strategy with a large total client number (total client number >>> client number per round). For example, only 100 users out of 1 million are active in each round.
As for the case "client number per round = total client number", I think it won't be a problem when we use naive local SGD. Even with local Adam, the optimizer state you mentioned also doesn't seem to be a problem: statistically, it doesn't matter which physical worker computes which part of the dataset. I confirmed this with experiments. That's why I reuse the client sampling code for settings that do not need sampling.
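The SGD-vs-Adam distinction can be made concrete with a toy scalar example (my own sketch, not FedML code): a plain SGD step is a pure function of the current gradient, while an Adam-style step depends on moment buffers that stay resident on the trainer between rounds:

```python
def sgd_step(w, g, lr=0.1):
    # Plain SGD: the update depends only on the current gradient,
    # so the trainer carries no history between rounds.
    return w - lr * g

class AdamLike:
    # Minimal scalar Adam-style optimizer. The moment buffers m and v
    # are exactly the per-trainer state discussed above.
    def __init__(self, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        self.m = 0.0  # first-moment estimate (state kept across rounds)
        self.v = 0.0  # second-moment estimate (state kept across rounds)
        self.t = 0    # step counter

    def step(self, w, g):
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * g
        self.v = self.b2 * self.v + (1 - self.b2) * g * g
        m_hat = self.m / (1 - self.b1 ** self.t)
        v_hat = self.v / (1 - self.b2 ** self.t)
        return w - self.lr * m_hat / (v_hat ** 0.5 + self.eps)
```

If the trainer's dataset changes between rounds, `m` and `v` built on one client's data are applied to another's, which is the crux of the disagreement.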
However, I agree that both our arguments are empirical, without theoretical guarantees. So let me modify it and fix the client ID, which is safer and avoids confusion.
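One possible shape for that fix, sketched against a hypothetical sampler function of my own naming (not the actual patch): skip shuffling whenever every client participates, so the trainer-to-dataset mapping stays fixed across rounds.

```python
import random

def sample_clients_fixed(round_idx, total_clients, clients_per_round):
    # Hypothetical fix: when every client participates, keep the
    # trainer-to-dataset mapping stable across rounds so no optimizer
    # state crosses dataset boundaries.
    if clients_per_round >= total_clients:
        return list(range(total_clients))
    # Otherwise, sample as before (seeded so all processes agree).
    random.seed(round_idx)
    return random.sample(range(total_clients), clients_per_round)
```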
Thank you very much for proposing this issue!
chaoyanghe changed the title from "FedAVG distributed's device sampling make the local training on specific client have the information from other local dataset." to "Client Sampling Strategy" on Oct 17, 2020