[autoparallel] add pooling metainfo #1968
Merged
Cypher30 merged 76 commits into hpcaitech:main from Cypher30:feature/metainfo_for_auto_parallel on Nov 18, 2022
Conversation
Merge ColossalAI
Daily merge
…r30/ColossalAI into feature/metainfo_for_auto_parallel
YuliangLiu0306 approved these changes on Nov 18, 2022
What’s New?

In this PR, I implement the metainfo generator for pooling operations, including `AdaptiveAvgPool` and `MaxPool`. I also found an interesting point while aligning the estimated memory cost with the measured one: `_split` in `comm_spec.py` is actually triggered twice when you meet a sharding spec like `S01`. It splits the tensor along the two device-mesh dimensions one after the other, producing an intermediate piece of memory that can be confusing when you measure memory at runtime.

For example, suppose you have an input of shape `[4, 128, 64, 64]` with `dtype=float32`, which takes 8192 KB of memory, and you want to shard it on a device mesh of shape `(2, 2)` with sharding spec `RS01RR`. Shape consistency will first call `_split` along one mesh dimension, producing a tensor of shape `[4, 64, 64, 64]`; this consumes 4096 KB of extra memory, because splitting the tensor on dimension 1 creates a non-contiguous tensor that must be materialized. The second split then produces a tensor of shape `[4, 32, 64, 64]` to meet our requirement, allocating another 2048 KB, after which the earlier 4096 KB intermediate is discarded. As a result, you observe a memory peak of 4096 KB while the actual memory allocated is only 2048 KB. This was not discovered in the previous op patches because their outputs are much larger than their inputs: since we measure the memory peak and memory allocated over the whole forward phase, the output a previous op produces is much bigger than the peak that `_split` produces, which hid this tricky little case.
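The memory arithmetic above can be sketched with a small numpy example. This is not ColossalAI code (the real `_split` lives in `comm_spec.py` and operates on device-mesh shards); it only reproduces the shapes and dtype from the description to show why each dim-1 split forces a fresh contiguous buffer:

```python
import numpy as np

KB = 1024

# Full input from the example: [4, 128, 64, 64], float32 -> 8192 KB.
x = np.zeros((4, 128, 64, 64), dtype=np.float32)
print(x.nbytes // KB)  # 8192

# First split along dim 1 (one device-mesh axis of the (2, 2) mesh).
# A dim-1 slice of a C-contiguous 4-D array is a non-contiguous view,
# so materializing it allocates a fresh 4096 KB buffer.
first = x[:, :64]
print(first.flags["C_CONTIGUOUS"])  # False
first_buf = np.ascontiguousarray(first)
print(first_buf.nbytes // KB)  # 4096

# Second split along dim 1 (the other mesh axis, completing S01):
# another 2048 KB buffer, after which first_buf can be discarded.
second_buf = np.ascontiguousarray(first_buf[:, :32])
print(second_buf.nbytes // KB)  # 2048
```

While the second buffer is being materialized, the 4096 KB intermediate is still alive, which matches the observed peak of 4096 KB against 2048 KB of memory actually retained.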