[lazyinit] add correctness verification #3147

ver217 · 2023-03-16T09:25:00Z

📌 Checklist before creating the PR

I have created an issue for this PR for traceability
The title follows the standard format: [doc/gemini/tensor/...]: A concise description
I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with keywords like "fixed" to automatically close the linked issue upon merge.

e.g. fixed #1234, closed #1234, resolved #1234

Closes #3134

📝 What does this PR do?

Summarize your work here.
If you have any plots/diagrams/screenshots/tables, please attach them here.

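This PR adds correctness verification for lazy initialization on models from many model libraries (torchvision, diffusers, timm, transformers, torchaudio and torchrec).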

Known issue: some parameters (and buffers) of some models are not lazily initialized and remain eager. They are counted in the "non-lazy numel" column of the report below.
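The idea behind the verification can be sketched in plain Python: build the same model eagerly and lazily from the same seed, materialize the lazy one, and require every parameter to match exactly. Everything here (`ToyLazyParam`, `SPEC`, `check_lazy_init`) is hypothetical and is not Colossal-AI's API; lists of floats stand in for tensors and Python's `random` stands in for torch's RNG.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Init = Callable[[int], List[float]]

def uniform_init(bound: float) -> Init:
    # Draws from the global RNG when called, the way torch.nn.init.uniform_ draws from torch's RNG.
    return lambda n: [random.uniform(-bound, bound) for _ in range(n)]

class ToyLazyParam:
    """Hypothetical lazy parameter: records its init op and runs it only on materialize()."""

    def __init__(self, numel: int, init: Init):
        self.numel = numel
        self.init = init
        self.data: Optional[List[float]] = None

    def materialize(self) -> List[float]:
        if self.data is None:  # run the deferred init exactly once
            self.data = self.init(self.numel)
        return self.data

# Hypothetical model: (param name, numel, init op) in construction order.
SPEC: List[Tuple[str, int, Init]] = [
    ("fc1.weight", 6, uniform_init(0.5)),
    ("fc1.bias", 2, uniform_init(0.1)),
    ("fc2.weight", 4, uniform_init(0.5)),
]

def build_eager(seed: int) -> Dict[str, List[float]]:
    random.seed(seed)
    return {name: init(n) for name, n, init in SPEC}  # init runs immediately

def build_lazy(seed: int) -> Dict[str, List[float]]:
    random.seed(seed)
    lazy = {name: ToyLazyParam(n, init) for name, n, init in SPEC}  # no RNG draws yet
    # Materialize in construction order so the RNG stream is consumed as in eager mode.
    return {name: p.materialize() for name, p in lazy.items()}

def check_lazy_init(seed: int = 42) -> bool:
    eager, lazy = build_eager(seed), build_lazy(seed)
    return eager.keys() == lazy.keys() and all(eager[k] == lazy[k] for k in eager)

print(check_lazy_init())  # True
```

The exact-equality check only passes because materialization replays the deferred init ops in construction order from the same seed; materializing in a different order would consume the RNG stream differently and the values would no longer match.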

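Here is a report. For each model class, "param lazy rate" and "buffer lazy rate" give the number of lazily initialized parameters/buffers over the total, and "non-lazy numel" is the total element count of the tensors that remained eager.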

Torchvision

model class	param lazy rate	buffer lazy rate	non-lazy numel
AlexNet	16/16	0/0	0.000 M
DenseNet	364/364	363/363	0.000 M
EfficientNet	213/213	147/147	0.000 M
GoogLeNet	187/187	177/177	0.000 M
Inception3	292/292	288/288	0.000 M
MobileNetV2	158/158	156/156	0.000 M
MobileNetV3	142/142	102/102	0.000 M
MNASNet	158/158	156/156	0.000 M
ResNet	62/62	60/60	0.000 M
RegNet	215/215	213/213	0.000 M
ResNet	161/161	159/159	0.000 M
ShuffleNetV2	170/170	168/168	0.000 M
SqueezeNet	52/52	0/0	0.000 M
VGG	22/22	0/0	0.000 M
ResNet	161/161	159/159	0.000 M
VisionTransformer	152/152	0/0	0.000 M
ConvNeXt	344/344	0/0	0.000 M
SwinTransformer	173/173	0/12	0.027 M
EfficientNet	452/452	330/330	0.000 M
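The three columns can be derived from simple per-tensor bookkeeping. Below is a minimal sketch assuming a hypothetical `TensorRecord` for each parameter and buffer (not a Colossal-AI type). Note that "non-lazy numel" has to count eager buffers as well as eager parameters: SwinTransformer above reports all parameters as lazy yet a non-zero non-lazy numel.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

@dataclass
class TensorRecord:
    # Hypothetical per-tensor record collected after building a model under lazy init.
    name: str
    numel: int
    is_buffer: bool
    is_lazy: bool

def lazy_report(records: Iterable[TensorRecord], scale: float = 1e6) -> Tuple[str, str, str]:
    """Return (param lazy rate, buffer lazy rate, non-lazy numel) formatted like the tables.

    `scale` is an assumption: the report does not state the divisor behind "M".
    """
    records = list(records)
    params = [r for r in records if not r.is_buffer]
    buffers = [r for r in records if r.is_buffer]
    lazy_params = sum(r.is_lazy for r in params)
    lazy_buffers = sum(r.is_lazy for r in buffers)
    # Eager parameters and eager buffers both contribute to the non-lazy total.
    eager_numel = sum(r.numel for r in records if not r.is_lazy)
    return (f"{lazy_params}/{len(params)}",
            f"{lazy_buffers}/{len(buffers)}",
            f"{eager_numel / scale:.3f} M")

records = [
    TensorRecord("conv.weight", 864, is_buffer=False, is_lazy=True),
    TensorRecord("bn.running_mean", 32, is_buffer=True, is_lazy=True),
    TensorRecord("index_buffer", 5000, is_buffer=True, is_lazy=False),
]
print(lazy_report(records))  # ('1/1', '1/2', '0.005 M')
```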

Diffusers

model class	param lazy rate	buffer lazy rate
AutoencoderKL	92/92	0/0
VQModel	93/93	0/0
CLIPModel	398/398	2/2
CLIPTextModel	196/196	1/1
CLIPVisionModel	199/199	1/1
UNet2DModel	432/432	0/0

Timm

model class	param lazy rate	buffer lazy rate	non-lazy numel
ResNet	263/263	213/213	0.000 M
Beit	199/199	24/24	0.000 M
Cait	476/476	0/0	0.000 M
ConvMixer	262/262	195/195	0.000 M
EfficientNet	649/649	471/471	0.000 M
MlpMixer	150/150	0/0	0.000 M
VisionTransformer	152/152	0/0	0.000 M
VisionTransformerDistilled	155/155	0/0	0.000 M
Beit	199/199	24/24	0.000 M
CoaT	152/152	0/0	0.000 M
VisionTransformer	176/176	0/0	0.000 M
NormFreeNet	128/185	0/0	20.765 M
EfficientFormer	181/181	99/100	0.002 M
VovNet	93/93	69/69	0.000 M
MlpMixer	102/150	0/0	7.633 M
MlpMixer	306/306	0/0	0.000 M
MobileNetV3	138/138	102/102	0.000 M
HighResolutionNet	279/279	273/273	0.000 M
InceptionV3	284/284	282/282	0.000 M
MlpMixer	150/150	0/0	0.000 M
NormFreeNet	243/347	0/0	40.431 M
NormFreeNet	174/228	0/0	3.946 M
RegNet	293/293	198/198	0.000 M
ResNet	118/118	108/108	0.000 M
TNT	351/351	0/0	0.000 M
ResNet	161/161	159/159	0.000 M
ConViT	180/180	0/0	0.000 M
NormFreeNet	176/233	0/0	44.327 M
ConvNeXt	344/344	0/0	0.000 M
VGG	22/22	0/0	0.000 M
DPN	217/217	216/216	0.000 M
DenseNet	364/364	363/363	0.000 M
ReXNetV1	227/227	186/186	0.000 M
SwinTransformer	329/329	11/35	0.055 M

Transformers

model class	param lazy rate	buffer lazy rate	non-lazy numel
AlbertModel	24/25	2/2	3.662 M
AlbertForPreTraining	30/34	2/2	7.381 M
AlbertForMaskedLM	26/30	2/2	7.381 M
AlbertForSequenceClassification	26/27	2/2	3.662 M
AlbertForTokenClassification	24/25	2/2	3.662 M
AlbertForQuestionAnswering	24/25	2/2	3.662 M
AlbertForMultipleChoice	26/27	2/2	3.662 M
BertModel	38/39	2/2	3.726 M
BertForPreTraining	44/48	2/2	7.510 M
BertLMHeadModel	40/44	2/2	7.510 M
BertForMaskedLM	40/44	2/2	7.510 M
BertForSequenceClassification	40/41	2/2	3.726 M
BertForTokenClassification	38/39	2/2	3.726 M
BertForNextSentencePrediction	40/41	2/2	3.726 M
BertForMultipleChoice	40/41	2/2	3.726 M
GPT2Model	28/28	4/4	0.000 M
GPT2LMHeadModel	28/29	4/4	36.809 M
GPT2DoubleHeadsModel	30/31	4/4	36.809 M
GPT2ForTokenClassification	30/30	4/4	0.000 M
GPT2ForSequenceClassification	29/29	4/4	0.000 M
OPTModel	35/36	0/0	6.137 M
OPTForCausalLM	35/37	0/0	12.273 M
T5Model	47/47	0/0	0.000 M
T5ForConditionalGeneration	47/48	0/0	3.922 M
T5EncoderModel	19/19	0/0	0.000 M
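Shared (tied) parameters are one case lazy initialization has to get right: if two modules hold the same lazy tensor, materialization must yield one shared tensor rather than two independent copies. In the table above, the LM-head variants report one more parameter than their base models (e.g. GPT2Model 28/28 vs GPT2LMHeadModel 28/29). This is consistent with Hugging Face tying the output projection to the input embedding by default, although the report does not say which tensor stayed eager. A toy sketch of identity-preserving materialization, with hypothetical names and lists standing in for tensors:

```python
from typing import Dict, List, Optional

class ToyLazyTensor:
    """Hypothetical lazy tensor: the deferred init runs at most once."""

    def __init__(self, numel: int, fill: float):
        self.numel, self.fill = numel, fill
        self.data: Optional[List[float]] = None

    def materialize(self) -> List[float]:
        if self.data is None:
            self.data = [self.fill] * self.numel
        return self.data

def materialize_all(named: Dict[str, ToyLazyTensor]) -> Dict[str, List[float]]:
    # Go through the lazy object itself, so names bound to the same ToyLazyTensor
    # end up bound to the same list and the tie survives materialization.
    return {name: t.materialize() for name, t in named.items()}

def unique_count(named: Dict[str, ToyLazyTensor]) -> int:
    # Count distinct objects rather than names, so a tied weight is counted once.
    return len({id(t) for t in named.values()})

wte = ToyLazyTensor(numel=8, fill=0.02)
named = {"wte.weight": wte, "h.0.weight": ToyLazyTensor(4, 0.1), "lm_head.weight": wte}
out = materialize_all(named)
print(out["lm_head.weight"] is out["wte.weight"], len(named), unique_count(named))  # True 3 2
```

A per-name materialization that allocated a fresh tensor for every name would silently untie the two weights, which is why the sketch routes every name through its shared lazy object.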

Torchaudio

model class	param lazy rate	buffer lazy rate	non-lazy numel
Conformer	120/120	12/12	0.000 M
ConvTasNet	343/343	0/0	0.000 M
DeepSpeech	18/18	0/0	0.000 M
Emformer	64/64	0/0	0.000 M
Wav2Letter	24/24	0/0	0.000 M
Wav2Letter	22/22	0/0	0.000 M
WaveRNN	36/36	15/15	0.000 M
Tacotron2	60/60	24/24	0.000 M
Wav2Vec2Model	×	×	×

Torchrec

DeepFM

model class	param lazy rate	buffer lazy rate
DenseArch	4/4	0/0
FMInteractionArch	2/2	0/0
OverArch	2/2	0/0
SimpleDeepFMNN	8/10	0/0
SparseArch	0/2	0/0

DLRM

model class	param lazy rate	buffer lazy rate
DLRM	8/10	0/0
DenseArch	4/4	0/0
InteractionArch	0/0	0/0
OverArch	4/4	0/0
SparseArch	0/2	0/0

💥 Checklist before requesting a review

I have linked my PR to an issue (instruction)
My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
I have performed a self-review of my code
I have added thorough tests.
I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

🌝 Yes, I do.
🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

ver217 · 2023-03-16T09:42:17Z

The torch version in CI is 1.11, which is incompatible with the meta-tensor support lazy init relies on, so I ran the tests on my local machine.
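Because CI runs torch 1.11, the lazy-init tests have to be skipped or gated on the installed torch version there (the last commit in this PR, "[tests] skip lazy init test", skips them). A minimal stdlib-only sketch of a version gate; the function names are hypothetical, and `MIN_TORCH = (1, 12)` is an assumption since the PR only states that 1.11 is too old:

```python
import re
from typing import Tuple

MIN_TORCH = (1, 12)  # assumption: the PR only says 1.11 is too old; the exact minimum is not stated

def parse_version(v: str) -> Tuple[int, int]:
    """Parse (major, minor) from strings such as '1.11.0+cu113' or '2.0.0a0'."""
    m = re.match(r"(\d+)\.(\d+)", v)
    if m is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return int(m.group(1)), int(m.group(2))

def should_skip_lazy_init_tests(torch_version: str) -> bool:
    # Compare integer tuples: a plain string comparison would rank "1.9" above "1.11".
    return parse_version(torch_version) < MIN_TORCH

print(should_skip_lazy_init_tests("1.11.0+cu113"))  # True
print(should_skip_lazy_init_tests("1.13.1"))  # False
```

In a pytest suite, the result could feed `pytest.mark.skipif(should_skip_lazy_init_tests(torch.__version__), reason=...)` so the tests run locally on a newer torch but are skipped in CI.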

ver217 added 13 commits March 15, 2023 14:53

[lazyinit] fix shared module

7235c1e

[tests] add lazy init test utils

6e5a1cc

[tests] add torchvision for lazy init

8bf6e82

[lazyinit] fix pre op fn

af0ef0e

[lazyinit] handle legacy constructor

430fd2c

[tests] refactor lazy init test models

8d19462

[tests] refactor lazy init test utils

a559faf

[lazyinit] fix ops don't support meta

bc995a4

[tests] lazy init test timm models

90ebff1

[lazyinit] fix set data

a1c9998

[lazyinit] handle apex layers

dc2e0b1

[tests] lazy init test transformers models

b741af4

[tests] lazy init test torchaudio models

26ea428

ver217 added the Run Build and Test and lazyinit labels Mar 16, 2023

ver217 requested a review from FrankLeeeee March 16, 2023 09:25

[lazyinit] fix import path

9990077

ver217 requested a review from YuliangLiu0306 March 16, 2023 09:44

ver217 added 2 commits March 17, 2023 10:50

Merge branch 'main' into test/lazy-init

63eb817

[tests] lazy init test torchrec models

d325792

FrankLeeeee approved these changes Mar 17, 2023


ver217 added 3 commits March 17, 2023 11:38

[tests] update torch version in CI

67c8ed5

[tests] revert torch version in CI

ec01bfb

[tests] skip lazy init test

f698198

ver217 merged commit 6ae8ed0 into hpcaitech:main Mar 17, 2023

ver217 deleted the test/lazy-init branch March 17, 2023 05:49
