
Could you add more servers in FedAvg for faster training speed? #59

Open
wizard1203 opened this issue Nov 3, 2020 · 10 comments
@wizard1203
Contributor

Could you add more servers in FedAvg for faster training speed?
As BytePS does.

@chaoyanghe
Member

BytePS is for data center-based distributed training, while FedML (e.g., FedAvg) is edge-based distributed training. The particular assumptions of FL include:

  1. heterogeneous data distribution across devices (non-I.I.D.)
  2. resource-constrained edge devices (limited memory, computation, and communication)
  3. label deficiency (data points are harder to label because of privacy)
  4. security and privacy concerns

So what do you mean by "adding more servers in FedAvg"?
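As an aside for readers, assumption 1 (non-I.I.D. data) is commonly simulated in FL experiments with Dirichlet label partitioning; here is a minimal numpy sketch (function and parameter names are illustrative, not FedML's actual API):

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha=0.5, seed=0):
    """Split sample indices across clients with Dirichlet label skew.

    Smaller alpha -> more heterogeneous (more non-I.I.D.) client datasets.
    """
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        # Draw this class's share for each client from Dirichlet(alpha).
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

labels = np.array([0, 1] * 50)  # toy dataset: 100 samples, 2 classes
parts = dirichlet_partition(labels, num_clients=4, alpha=0.1)
```

With a small alpha, most clients end up holding samples from mostly one class, which is the label-skew setting FL papers typically mean by "non-I.I.D.".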

@wizard1203
Contributor Author

> So what do you mean by "adding more servers in FedAvg"?

I mean adding more parameter servers to improve communication efficiency. This may only be suitable in a cluster environment, not a true federated learning environment with resource-constrained edge devices. However, it can still accelerate training when doing research.

@prosopher
Contributor

prosopher commented Nov 4, 2020

FedML supports multiple parameter servers for communication efficiency via hierarchical FL and decentralized FL.
In hierarchical FL, group parameter servers split the total client set into multiple client subsets.
In decentralized FL, each client acts as a parameter server.

Please refer to the following links for details.
https://github.com/FedML-AI/FedML/tree/master/fedml_experiments/standalone/hierarchical_fl
https://github.com/FedML-AI/FedML/tree/master/fedml_experiments/standalone/decentralized
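The hierarchical scheme described above (group servers averaging their own clients, then a cloud server averaging the groups) can be sketched in a few lines of numpy; all names here are illustrative, not FedML's actual API:

```python
import numpy as np

def fedavg(weights, sizes):
    """Weighted average of model parameter vectors (FedAvg aggregation)."""
    sizes = np.asarray(sizes, dtype=float)
    return np.average(np.stack(weights), axis=0, weights=sizes)

def hierarchical_round(groups):
    """groups: list of (client_models, client_sizes), one entry per group server.

    Each group server averages its own clients, then the cloud server
    averages the group models, weighted by total group data size.
    """
    group_models, group_sizes = [], []
    for client_models, client_sizes in groups:
        group_models.append(fedavg(client_models, client_sizes))
        group_sizes.append(sum(client_sizes))
    return fedavg(group_models, group_sizes)
```

With data-size weights, the two-level average is mathematically identical to one flat FedAvg over all clients; the benefit is that each client only communicates with its nearby group server instead of a single global one.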

@chaoyanghe
Member

chaoyanghe commented Nov 4, 2020

@wizard1203 Thanks for your suggestion. As for acceleration, FedML is the only research-oriented FL framework that supports cross-machine, multi-GPU distributed training. To accelerate further, we can definitely borrow many techniques from traditional distributed training (a very mature area that now receives much less research attention). I elaborate on a few here:

  1. AllReduce-based GPU-GPU communication using InfiniBand. However, this is not a real FL setting; as you said, it is only useful for evaluating algorithms or models that are not sensitive to the training speed.
  2. Hybrid parallelism (model parallelism + data parallelism) + pipelining
  3. Gradient bucketing in backpropagation (as introduced in the PyTorch DDP paper, VLDB 2020)
  4. Low-bit precision (e.g., half precision)
  5. Pruning
    ...
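Item 3 (bucketing gradients during backpropagation, from the PyTorch DDP paper at VLDB 2020) fuses many small gradient tensors into larger buckets so that fewer, larger AllReduce calls are issued. A simplified sketch of just the packing step (PyTorch's real default bucket size is 25 MB; everything else here is illustrative):

```python
import numpy as np

def build_buckets(grads, bucket_cap_bytes=25 * 2**20):
    """Greedily group gradient arrays into buckets of at most bucket_cap_bytes.

    Real DDP fills buckets in reverse parameter order as gradients become
    ready during backprop; this sketch only shows the fusion idea.
    """
    buckets, current, current_bytes = [], [], 0
    for g in grads:
        if current and current_bytes + g.nbytes > bucket_cap_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(g)
        current_bytes += g.nbytes
    if current:
        buckets.append(current)
    return buckets

# 100 small gradient tensors (4 KB each) packed under a 16 KB cap:
grads = [np.zeros(1000, dtype=np.float32) for _ in range(100)]
buckets = build_buckets(grads, bucket_cap_bytes=16_000)
# 100 per-tensor AllReduce calls collapse to 25 fused calls (4 tensors each)
```

The win is that each communication call has fixed launch overhead, so issuing one call per bucket instead of one per tensor amortizes that overhead across many gradients.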

As @prosopher pointed out, you can design any topology you like; our topology configuration is very flexible. In the distributed computing setting, you can refer to the following algorithms with different topologies:
https://github.com/FedML-AI/FedML/tree/master/fedml_experiments/distributed

In addition, I have to point out that "adding more parameter servers to improve the communication efficiency" is conceptually a bit confusing. We cannot say that using more computational resources improves communication efficiency; normally, the relationship between computation and communication is a trade-off. Using more parallel computation does not change the communication itself, and it does not necessarily speed up training, since cross-machine communication may dominate the training time. But I agree with your idea of using traditional distributed computing techniques to accelerate FL research. Thanks.
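The computation/communication trade-off described here can be made concrete with a toy round-time model (the cost breakdown and all numbers are hypothetical):

```python
def round_time_s(model_mb, client_bw_mbps, client_compute_s,
                 aggregation_s, num_servers=1):
    """Toy per-round time for synchronous FedAvg.

    Extra servers only shrink the server-side aggregation term; each
    client still uploads and downloads the full model, so once the
    communication term dominates, adding servers barely helps.
    """
    comm_s = 2 * model_mb * 8 / client_bw_mbps  # upload + download
    return client_compute_s + comm_s + aggregation_s / num_servers

# a 100 MB model over a 10 Mbps edge link: communication alone is 160 s
one_server = round_time_s(100, 10, 5.0, 2.0, num_servers=1)     # 167.0 s
eight_servers = round_time_s(100, 10, 5.0, 2.0, num_servers=8)  # 165.25 s
```

In this toy setting, going from one server to eight saves under 2 seconds of a 167-second round, because the per-client transfer of the full model is untouched.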

@wizard1203
Contributor Author

> FedML supports multiple parameter servers for communication efficiency via hierarchical FL and decentralized FL.

@prosopher Thanks for this, I will read them carefully.

@chaoyanghe
Member

chaoyanghe commented Nov 4, 2020

> FedML supports multiple parameter servers for communication efficiency via hierarchical FL and decentralized FL.

@prosopher Thanks. But I guess he was discussing the distributed computing setting, not the standalone version.

@wizard1203
Contributor Author

@chaoyanghe Thanks for your detailed explanation. Maybe I can try to implement it myself, and when I finish it I would like to push it to your master branch.

@chaoyanghe
Member

> Maybe I can try to complete it by myself, and when I finish it I would like to push it to your master branch.

Thanks. Looking forward to your contribution.

@chaoyanghe
Member

@wizard1203 Do you mean modifying based on this code?
https://github.com/FedML-AI/FedML/tree/master/fedml_experiments/distributed/fedavg

@wizard1203
Contributor Author

@chaoyanghe No, it may need to be based on the code in fedml_core. In any case, I may only try it many days from now; in fact, there are some other algorithms I want to implement more urgently than this.

@fedml-dimitris added the enhancement (New feature or request) and question (Further information is requested) labels on Oct 24, 2023
4 participants