[booster] implemented the cluster module #3191
Merged
📌 Checklist before creating the PR
[doc/gemini/tensor/...]: A concise description
🚨 Issue number
Fixed #3051
📝 What does this PR do?
This PR implements a `cluster` module that helps coordinate and manage distributed information in the environment. `DistCoordinator` is a singleton class that provides utility functions for users and other modules. The two managers will be used by features such as hybrid parallelism, Gemini, and auto parallelism to manage process groups, device meshes, etc. @YuliangLiu0306 will implement `DeviceMeshManager` in a separate PR.
💥 Checklist before requesting a review
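The `DistCoordinator` described above is a singleton. As a rough sketch of that pattern only (not Colossal-AI's actual implementation), the class below uses a metaclass so that every construction returns one shared instance. The names `SingletonMeta` and `SketchCoordinator`, their methods, and reading the rank and world size from the `RANK`/`WORLD_SIZE` environment variables (which launchers such as `torchrun` export) are all assumptions made for illustration.

```python
import os
import threading


class SingletonMeta(type):
    """Metaclass that hands back the same instance on every construction."""

    _instances = {}
    _lock = threading.Lock()

    def __call__(cls, *args, **kwargs):
        # Double-checked creation under a lock, so concurrent callers
        # still end up sharing a single instance.
        if cls not in cls._instances:
            with cls._lock:
                if cls not in cls._instances:
                    cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class SketchCoordinator(metaclass=SingletonMeta):
    """Hypothetical stand-in for a distributed coordinator.

    Assumption: rank and world size come from the RANK / WORLD_SIZE
    environment variables; the defaults describe a single-process run.
    """

    def __init__(self):
        # Runs only once, because the metaclass caches the first instance.
        self.rank = int(os.environ.get("RANK", "0"))
        self.world_size = int(os.environ.get("WORLD_SIZE", "1"))

    def is_master(self):
        # By convention, rank 0 acts as the master process.
        return self.rank == 0

    def print_on_master(self, msg):
        # Only rank 0 prints, which avoids duplicated logs across processes.
        if self.is_master():
            print(msg)


coord = SketchCoordinator()
coord.print_on_master(f"world size: {coord.world_size}")
```

Calling `SketchCoordinator()` from any module returns the same object, so every caller shares one view of the distributed environment. Putting the singleton logic in a metaclass keeps it out of the coordinator's own `__init__`; a plain module-level instance would be a simpler alternative.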
⭐️ Do you enjoy contributing to Colossal-AI?
Tell us more if you don't enjoy contributing to Colossal-AI.