Add pipeline parallelism for Grok #87
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
paxml/tasks/lm/model_params.py
Outdated
@@ -834,8 +834,8 @@ def task(self) -> pax_fiddle.Config[tasks_lib.SingleTask]:
assert self.NUM_STAGES is not None
assert self.NUM_LAYERS % (self.NUM_STAGES * self.CIRCULAR_REPEAT) == 0
assert self.NUM_MICROBATCHES is not None or self.MICROBATCH_SIZE is not None
assert self.ICI_MESH_SHAPE is not None and len(self.ICI_MESH_SHAPE) == 4
assert self.DCN_MESH_SHAPE is not None and len(self.DCN_MESH_SHAPE) == 4
assert self.ICI_MESH_SHAPE is not None and len(self.ICI_MESH_SHAPE) >= 4
Can we put this `>= 4` assert behind the USE_EXPERT_PARALLEL flag?
Someone using regular PP with an incorrect mesh should get stopped by these assertions.
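A minimal sketch of what this request could look like, not the code that landed in the PR. The helper name `check_mesh_shapes` and its `use_expert_parallel` argument are hypothetical stand-ins for the class attributes in `model_params.py`, and applying the same rule to `DCN_MESH_SHAPE` is an assumption, since the diff above only shows the ICI line changing.

```python
from typing import Optional, Sequence


def check_mesh_shapes(
    ici_mesh_shape: Optional[Sequence[int]],
    dcn_mesh_shape: Optional[Sequence[int]],
    use_expert_parallel: bool = False,  # hypothetical stand-in for USE_EXPERT_PARALLEL
) -> None:
    """Validate mesh ranks, relaxing `== 4` to `>= 4` only for expert parallelism."""
    for name, shape in (
        ("ICI_MESH_SHAPE", ici_mesh_shape),
        ("DCN_MESH_SHAPE", dcn_mesh_shape),  # assumption: DCN follows the same rule
    ):
        assert shape is not None, f"{name} must be set"
        if use_expert_parallel:
            # Expert parallelism may add mesh axes, so only a lower bound is enforced.
            assert len(shape) >= 4, f"{name} needs at least 4 axes, got {len(shape)}"
        else:
            # Regular pipeline parallelism keeps the strict 4-axis check.
            assert len(shape) == 4, f"{name} needs exactly 4 axes, got {len(shape)}"
```

With this shape, a regular PP config that passes a 5-axis mesh still fails fast, while a config that sets the flag can use the extra axis.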
cc @hx89
Sounds good, I've updated the PR.
No description provided.