
Client Sampling Strategy #45

Closed
luke-avionics opened this issue Oct 17, 2020 · 2 comments
Labels
good first issue Good for newcomers

Comments

@luke-avionics

Hi, this is indeed GREAT work!!

However, it seems that, when tested with total client number = client number per round, FedAvg distributed's device sampling makes the local training on a client, which should be isolated, carry information from other clients' local datasets. (Theoretically, the issue will also persist when total client number != client number per round.)

In the design, the local trainer ID is separated from the local dataset, i.e., you need to update the dataset for each trainer at each round with a given client index before doing the local training. This may be beneficial when the total client number is large, but when total client number = client number per round, the device sampling does nothing more than permute client_indexes. Doing so, however, causes the issue above.

As we can see in FedML/fedml_api/distributed/fedavg/FedAvgServerManager.py:

In each communication round, client_indexes is permuted. And at line 59 we can see that the receiver ID (trainer ID) is not necessarily linked to a specific dataset (determined by client_indexes[receiver_id]), because the order of client_indexes differs from round to round. Although all clients start each round with the same synced weights, so the weights are invariant to the trainer-to-dataset mapping, the optimizer's history differs across clients. This dissociation between trainer and local dataset means that optimizer history accumulated on one dataset gets applied to another. As a result, each local client ends up with partial training information about the global dataset, unfairly favoring the results.
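The behaviour described above can be sketched roughly as follows (function and variable names are assumptions for illustration, not the actual FedML code). Each round produces a fresh permutation, so trainer i is handed a different dataset index from one round to the next while its local optimizer state persists:

```python
import random

def sample_client_indexes(round_idx, client_num_in_total, client_num_per_round):
    """Sketch of round-seeded client sampling (names are assumptions).

    Even when client_num_in_total == client_num_per_round, the result is a
    fresh permutation each round rather than a fixed identity mapping.
    """
    rng = random.Random(round_idx)  # deterministic per round
    return rng.sample(range(client_num_in_total), client_num_per_round)

# Trainer i receives dataset client_indexes[i]; since the permutation
# changes per round, trainer 0's optimizer history from round 0 can be
# applied to a different local dataset in round 1.
for round_idx in range(3):
    client_indexes = sample_client_indexes(round_idx, 4, 4)
    print(round_idx, client_indexes)
```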

This can be verified by setting client_indexes to a fixed list in the case total client number = client number per round, which yields significantly worse results than permuting it. Theoretically the performance should be the same in this case, since the participating clients are identical in every round.
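A minimal sketch of that fix, under the same assumed names as above: pin the trainer-to-dataset mapping whenever every client participates, so each optimizer's history stays attached to a single local dataset, and only sample when a true subset is needed:

```python
import random

def sample_client_indexes_fixed(round_idx, client_num_in_total, client_num_per_round):
    """Hypothetical fixed sampling sketch (not the actual FedML code)."""
    if client_num_in_total == client_num_per_round:
        # Full participation: keep trainer i permanently paired with
        # dataset i instead of re-permuting each round.
        return list(range(client_num_per_round))
    # Genuine subsampling: draw a round-seeded subset as before.
    rng = random.Random(round_idx)
    return rng.sample(range(client_num_in_total), client_num_per_round)
```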

In realistic settings, sharing this optimizer history across clients would incur significant communication overhead (roughly doubling the traffic volume).

[image attachment: screenshot of FedAvgServerManager.py]

@chaoyanghe
Member

chaoyanghe commented Oct 17, 2020

@luke-avionics This design is specialized for client sampling with a large total client number (total client number >>> client number per round), for example when only 100 users among 1 million are active in each round.

As for the case "client number per round = total client number", I think it won't be a problem when we use naive local SGD. Even if we use local Adam, the optimizer state you mentioned also seems not to be a problem: statistically, it doesn't matter which physical worker computes which part of the dataset, and I confirmed this with experiments. That's why I reuse the client sampling code for settings that do not need sampling.
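The distinction both arguments hinge on, stateless local SGD versus a stateful optimizer such as Adam, can be illustrated with a toy one-parameter sketch (purely illustrative, not the FedML implementation). SGD's update depends only on the current gradient, so the trainer-to-dataset mapping in earlier rounds is irrelevant; Adam's update also depends on moment estimates accumulated from past gradients, which encode the dataset a trainer previously saw:

```python
def sgd_step(w, grad, lr=0.1):
    # Stateless: the update uses only the current gradient.
    return w - lr * grad

class AdamState:
    """Minimal scalar Adam sketch: m and v accumulate gradient history."""

    def __init__(self):
        self.m = 0.0  # first-moment estimate
        self.v = 0.0  # second-moment estimate
        self.t = 0    # step counter

    def step(self, w, grad, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
        self.t += 1
        self.m = b1 * self.m + (1 - b1) * grad
        self.v = b2 * self.v + (1 - b2) * grad * grad
        m_hat = self.m / (1 - b1 ** self.t)  # bias correction
        v_hat = self.v / (1 - b2 ** self.t)
        return w - lr * m_hat / (v_hat ** 0.5 + eps)
```

If a trainer's AdamState was built from gradients of dataset A, applying it to dataset B in the next round mixes information from both, which is the dissociation described in this issue.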

However, I agree that both our arguments are empirical, without a theoretical guarantee. So let me modify it and fix the client ID, which is safer and avoids confusion.

Thank you very much for proposing this issue!

@chaoyanghe chaoyanghe changed the title FedAVG distributed's device sampling make the local training on specific client have the information from other local dataset. Client Sampling Strategy Oct 17, 2020
@chaoyanghe chaoyanghe added the good first issue Good for newcomers label Oct 17, 2020
@chaoyanghe
Member

@luke-avionics I've updated the code. Please try again. Thanks for using our library for your research.
