
The performance of model parallelism (MP) is not good #124

Closed
feifeibear opened this issue Jan 6, 2022 · 7 comments
Labels: enhancement (New feature or request)

Comments

@feifeibear
Contributor

Hello developers,

I found that the performance of the MP provided here is not good. I compared it with PatrickStar and DeepSpeed; could you check it with me? See PR #115.
BTW: I strongly recommend adding TFLOPS as a performance indicator.

Platform: one SuperPod node with 8×A100 GPUs and 1 TB of CPU memory. BS = batch size, pstar = PatrickStar, deeps = DeepSpeed.
Entries show throughput, computed as batch / elapsed time. The Xd-Xmp configurations use Colossal-AI.

| Model Scale | Global BS | 1d-4mp | 1d-8mp | 2d-4mp | 2d-8mp | 3d-4mp | 2.5d-4mp | pstar | deeps | deeps-mp4 | deeps-mp8 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4B | 8 | 7.61 | 7.62 | 9.89 | 8.47 | failed | 10.31 | 8.78 | 1.15 | 1.26 | 1.26 |
| 4B | 16 | OOM | OOM | OOM | OOM | OOM | OOM | 16.67 | 2.26 | 2.42 | 2.36 |
| 4B | 128 | OOM | OOM | OOM | OOM | OOM | OOM | 28.39 | 12.51 | 10.80 | OOM |
| 10B | 2 | OOM | 3.62 | OOM | failed | OOM | OOM | - | - | 0.15 | 0.15 |
| 10B | 4 | OOM | 4.66 | OOM | OOM | OOM | OOM | - | - | 0.30 | 0.30 |
| 10B | 128 | OOM | OOM | OOM | OOM | OOM | OOM | 13.43 | OOM | 6.31 | 5.73 |
  1. As you can see, Colossal-AI's computing efficiency is the lowest of the three solutions at single-node scale. It is very competitive at the same batch size, but the batch sizes it can reach severely limit its overall performance.
  2. 2.5d-4mp is superior for the 4B model at BS 8, but 1d-8mp generalizes better across settings.
  3. At single-node scale, heterogeneous training (as in PatrickStar and DeepSpeed) may be a better solution than a complex MP strategy.
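
For the TFLOPS suggestion above, a minimal sketch of how it could be derived from the table, assuming the throughput values are samples per second, a sequence length of 1024 (not reported here), and the common 6 × params × tokens approximation for forward-plus-backward FLOPs:

```python
# Sketch only (not from the benchmark scripts): estimate per-GPU TFLOPS from a
# throughput entry in the table above. Assumptions not stated in this thread:
# throughput is samples/second, sequence length is 1024, and training FLOPs per
# token ~= 6 * parameter count (forward + backward, no activation checkpointing).

def estimated_tflops_per_gpu(samples_per_sec: float,
                             num_params: float,
                             seq_len: int = 1024,   # assumed
                             num_gpus: int = 8) -> float:
    tokens_per_sec = samples_per_sec * seq_len
    flops_per_sec = 6 * num_params * tokens_per_sec
    return flops_per_sec / num_gpus / 1e12

# Example: the 4B model at global BS 128 with PatrickStar (28.39 in the table).
print(f"{estimated_tflops_per_gpu(28.39, 4e9):.1f} TFLOPS per GPU")
```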
feifeibear changed the title from "[FEATURE] The performance of model parallelism (MP) is not good" to "The performance of model parallelism (MP) is not good" on Jan 6, 2022
@kurisusnowdeng
Member

kurisusnowdeng commented Jan 6, 2022

Hi @feifeibear, thank you so much for your effort. We would appreciate it if you could also share the configurations you used to test the same models with DeepSpeed and PatrickStar. We would like to evaluate and improve performance at a similar node scale as well as at larger scales.
BTW, did you try 3d-8mp? 3D parallelism requires the MP size to be a perfect cube.
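
For illustration, a small sketch of that constraint (the helper name is ours, not from the codebase):

```python
# Sketch: 3D tensor parallelism needs the MP degree to be a perfect cube,
# e.g. 8 = 2**3 is valid while 4 is not.
def is_valid_3d_mp(mp_size: int) -> bool:
    root = round(mp_size ** (1 / 3))
    return root ** 3 == mp_size

print(is_valid_3d_mp(4))  # False, which may explain the failed 3d-4mp run above
print(is_valid_3d_mp(8))  # True, so 3d-8mp is a valid configuration
```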

@feifeibear
Contributor Author

The DeepSpeed benchmark script:
https://github.com/feifeibear/DeepSpeedZeRO3Benchmark
The PatrickStar benchmark script:
https://github.com/Tencent/PatrickStar/blob/master/examples/run_transformers.sh
Benchmarking is very easy:

export SUFFIX="colossal_compare"
env GPU_NUM=8 MODEL_TYPE="GPT" MODEL_NAME=GPT3_10B BS=2 CPU_EBD=0 AMM=1 MSC=1 CACHE=1 SP=0 CS=288 HYB=1 TILING=0 ACT_OFFLOAD=0 SUFFIX=${SUFFIX} bash run_transformers.sh

@feifeibear
Contributor Author

I have uploaded the logs of DeepSpeed and PatrickStar to Baidu WangPan...
Note that for DeepSpeed, the reported SamplesPerSec is not equal to 'Throughput' here; you have to calculate it yourself as batch / elapsed time.

link: https://pan.baidu.com/s/1vEHl0hPuxDb7HjOlpuW-YA?pwd=1mfd
code: 1mfd
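
For concreteness, a minimal sketch of that calculation; the batch size and iteration time below are placeholders to be taken from the uploaded logs:

```python
# Sketch: recompute the table's throughput metric (batch / elapse) from a
# DeepSpeed run instead of using its reported SamplesPerSec. The values below
# are placeholders; take them from the uploaded logs.
global_batch_size = 128       # matches the "global BS" column
seconds_per_iteration = 10.0  # hypothetical per-step wall-clock time from the log
throughput = global_batch_size / seconds_per_iteration
print(f"throughput = {throughput:.2f}")
```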

@kurisusnowdeng
Member

@feifeibear Thank you!

@github-actions
Contributor

This issue is stale because it has been open for 14 days with no activity.

github-actions bot added the stale label on Jan 22, 2022
binmakeswell added the enhancement (New feature or request) label and removed the stale label on Apr 13, 2022
@binmakeswell
Member

Thanks for your report; detailed tests with stable code will come soon.

@binmakeswell
Member

We have updated a lot. This issue was closed due to inactivity. Thanks.
