[autoparallel] add pooling metainfo #1968
Merged
Cypher30 merged 76 commits into hpcaitech:main from Cypher30:feature/metainfo_for_auto_parallel on Nov 18, 2022
Conversation
Merge ColossalAI
Daily merge
…r30/ColossalAI into feature/metainfo_for_auto_parallel
YuliangLiu0306 approved these changes on Nov 18, 2022
What’s New?

In this PR, I implement the metainfo generator for pooling operations, including `AdaptiveAvgPool` and `MaxPool`. I also found an interesting point while aligning the estimated memory cost with the measured one: `_split` in `comm_spec.py` is actually triggered twice when you meet a sharding spec like `S01`. It splits the tensor along the two device-mesh dimensions one after the other, producing an intermediate piece of memory that can be confusing when you measure memory at runtime.

For example, suppose you have an input of shape `[4, 128, 64, 64]` with `dtype=float32`, which takes 8192 KB of memory, and you want to shard it on a device mesh of shape `(2, 2)` with sharding spec `RS01RR`. Shape consistency will first call `_split` along one mesh dimension, producing a tensor of shape `[4, 64, 64, 64]`; this consumes 4096 KB of extra memory, because splitting the tensor on dimension 1 creates a non-contiguous tensor that must be materialized. The second split then produces a tensor of shape `[4, 32, 64, 64]` to meet our requirement, allocating another 2048 KB, after which the earlier 4096 KB intermediate is discarded. As a result, you observe a memory peak of 4096 KB while the actual memory allocated is only 2048 KB. This was not discovered in the previous op patches because their outputs are much larger than their inputs: since we measure the memory peak and memory allocated over the whole forward phase, the output a previous op produces is much bigger than the peak that `_split` produces, which hid this tricky little case.
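The memory arithmetic above can be sketched with a small numpy example. This is not ColossalAI code (the real `_split` lives in `comm_spec.py` and operates on device-mesh shards); it only reproduces the shapes and dtype from the description to show why each dim-1 split forces a fresh contiguous buffer:

```python
import numpy as np

KB = 1024

# Full input from the example: [4, 128, 64, 64], float32 -> 8192 KB.
x = np.zeros((4, 128, 64, 64), dtype=np.float32)
print(x.nbytes // KB)  # 8192

# First split along dim 1 (one device-mesh axis of the (2, 2) mesh).
# A dim-1 slice of a C-contiguous 4-D array is a non-contiguous view,
# so materializing it allocates a fresh 4096 KB buffer.
first = x[:, :64]
print(first.flags["C_CONTIGUOUS"])  # False
first_buf = np.ascontiguousarray(first)
print(first_buf.nbytes // KB)  # 4096

# Second split along dim 1 (the other mesh axis, completing S01):
# another 2048 KB buffer, after which first_buf can be discarded.
second_buf = np.ascontiguousarray(first_buf[:, :32])
print(second_buf.nbytes // KB)  # 2048
```

While the second buffer is being materialized, the 4096 KB intermediate is still alive, which matches the observed peak of 4096 KB against 2048 KB of memory actually retained.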