However, it seems that, when tested with total client number = client number per round, FedAVG distributed's device sampling makes the local training on a client, which should be isolated, pick up information from other clients' local datasets. (Theoretically, the issue will persist in cases where total client number != client number per round.)
In the design, the local trainer ID is decoupled from the local dataset, i.e., you need to update the dataset for each trainer at each round with a given client index before doing the local training. This can be beneficial when the total client number is large; when total client number = client number per round, the device sampling does nothing more than permute the client_indexes. However, doing so can cause the issue above.
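To make the decoupling concrete, here is a minimal stand-in for the server's sampling step (the function name and signature are my own invention, not FedML's actual code; FedML seeds by round index so all processes agree on the draw):

```python
import random

def sample_clients(round_idx, total_clients, clients_per_round):
    # Hypothetical stand-in for the server-side device sampling step.
    # Seeding with the round index keeps every process in agreement
    # about which client indexes were drawn this round.
    random.seed(round_idx)
    return random.sample(range(total_clients), clients_per_round)

# When total_clients == clients_per_round, "sampling" reduces to a
# permutation: every client participates, only the order changes, so
# trainer i may be handed a different client's dataset each round.
```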
As we can see from FedML/fedml_api/distributed/fedavg/FedAvgServerManager.py:
In each communication round, client_indexes is permuted, and at line 59 we can see that the receiver ID (trainer ID) is not tied to a specific dataset (determined by client_indexes[receiver_id]), because the order of client_indexes changes every round. Admittedly, after each round's syncing, all clients start from the same weights, so the weights are invariant to which local dataset a trainer holds (they are identical across clients). The optimizer's history, however, differs across clients, and this dissociation between trainer and local dataset causes optimizer history accumulated on one dataset to be applied to another. The result is that each local client ends up with partial training information about the global dataset, unfairly favoring the results.
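The mixing effect can be simulated in a few lines. This is illustrative only: I substitute a deterministic rotation for the random permutation so the outcome is easy to check, but the conclusion is the same for a shuffled client_indexes:

```python
def datasets_seen_by_trainer(total_clients=4, rounds=3):
    # Track which client datasets each trainer's resident optimizer
    # state (momenta, Adam moments, ...) has been applied to.
    seen = {t: set() for t in range(total_clients)}
    for r in range(rounds):
        # Deterministic rotation standing in for the per-round permutation.
        client_indexes = [(t + r) % total_clients for t in range(total_clients)]
        for trainer_id in range(total_clients):
            # The trainer reuses its optimizer state on whatever
            # dataset it is assigned this round.
            seen[trainer_id].add(client_indexes[trainer_id])
    return seen
```

After three rounds, every trainer's optimizer state has touched three different clients' datasets, instead of exactly one as isolation would require.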
This can be verified by setting client_indexes to a fixed list in the case total client number = client number per round, which yields significantly worse results than permuting it. Theoretically, the performance should be the same in this case, since the participating clients are identical in every round.
In realistic settings, sharing these optimizer histories would involve significant data traffic overhead (doubling the traffic volume).
@luke-avionics This design is specialized for the client sampling strategy with a large total client number (total client number >>> client number per round). For example, only 100 users out of 1 million are active in each round.
As for the case "client number per round = total client number", I think it won't be a problem when we use naive local SGD. Even with local Adam, the optimizer state you mentioned also doesn't seem to be a problem: statistically, it doesn't matter which physical worker computes which part of the dataset. I confirmed this with experiments. That's why I reuse the client sampling code for settings that do not need sampling.
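The SGD-vs-Adam distinction can be made concrete with a toy scalar example (my own sketch, not FedML code): a plain SGD step is a pure function of the current gradient, while an Adam-style step depends on moment buffers that stay resident on the trainer between rounds:

```python
def sgd_step(w, g, lr=0.1):
    # Plain SGD: the update depends only on the current gradient,
    # so the trainer carries no history between rounds.
    return w - lr * g

class AdamLike:
    # Minimal scalar Adam-style optimizer. The moment buffers m and v
    # are exactly the per-trainer state discussed above.
    def __init__(self, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        self.m = 0.0  # first-moment estimate (state kept across rounds)
        self.v = 0.0  # second-moment estimate (state kept across rounds)
        self.t = 0    # step counter

    def step(self, w, g):
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * g
        self.v = self.b2 * self.v + (1 - self.b2) * g * g
        m_hat = self.m / (1 - self.b1 ** self.t)
        v_hat = self.v / (1 - self.b2 ** self.t)
        return w - self.lr * m_hat / (v_hat ** 0.5 + self.eps)
```

If the trainer's dataset changes between rounds, `m` and `v` built on one client's data are applied to another's, which is the crux of the disagreement.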
However, I agree that both our arguments are empirical, without theoretical guarantees. So let me modify it and fix the client ID, which is safer and avoids confusion.
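One possible shape for that fix, sketched against a hypothetical sampler function of my own naming (not the actual patch): skip shuffling whenever every client participates, so the trainer-to-dataset mapping stays fixed across rounds.

```python
import random

def sample_clients_fixed(round_idx, total_clients, clients_per_round):
    # Hypothetical fix: when every client participates, keep the
    # trainer-to-dataset mapping stable across rounds so no optimizer
    # state crosses dataset boundaries.
    if clients_per_round >= total_clients:
        return list(range(total_clients))
    # Otherwise, sample as before (seeded so all processes agree).
    random.seed(round_idx)
    return random.sample(range(total_clients), clients_per_round)
```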
Thank you very much for proposing this issue!
chaoyanghe changed the title from "FedAVG distributed's device sampling make the local training on specific client have the information from other local dataset." to "Client Sampling Strategy" on Oct 17, 2020