
Auto convert moe param groups #5354

Merged (9 commits) on Apr 5, 2024

Conversation

jeffra (Contributor) commented Apr 2, 2024

When using frameworks like HF Accelerate with MoE models in HF, there's an issue when DeepSpeed creates the optimizer: we have no way to automatically create the compatible MoE param groups. This PR detects the case where no client optimizer is set and model_parameters are passed to DeepSpeed, and either verifies that those parameters are already MoE compatible or converts them into MoE-compatible param groups automatically.

This was never an issue previously since (1) MoE hasn't really been tested outside MDS (Megatron-DeepSpeed) and (2) MDS manually converts the weight-decay param groups to be MoE compatible before deepspeed.initialize.
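
To make the two paths concrete, here is a minimal sketch of the manual conversion MDS performs versus the automatic path this PR enables. It assumes `model` is a model containing DeepSpeed MoE layers and `ds_config` is a DeepSpeed config dict; both names are placeholders, and the exact behavior of the helper may differ from this sketch.

```python
import torch
import deepspeed
from deepspeed.moe.utils import split_params_into_different_moe_groups_for_optimizer

# Manual route (what MDS does): build the usual weight-decay param groups,
# split expert parameters out into groups tagged 'moe': True, and only then
# hand the optimizer to deepspeed.initialize.
param_groups = [{
    'params': [p for p in model.parameters() if p.requires_grad],
    'weight_decay': 0.01,
}]
param_groups = split_params_into_different_moe_groups_for_optimizer(param_groups)
optimizer = torch.optim.AdamW(param_groups, lr=1e-4)
engine, optimizer, _, _ = deepspeed.initialize(model=model,
                                               optimizer=optimizer,
                                               config=ds_config)

# Automatic route this PR enables: pass model_parameters with no client
# optimizer, and DeepSpeed converts them into MoE-compatible groups itself.
engine, optimizer, _, _ = deepspeed.initialize(model=model,
                                               model_parameters=model.parameters(),
                                               config=ds_config)
```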

The error that is triggered if the param groups are not MoE compatible comes from here:
https://github.com/microsoft/DeepSpeed/blob/cc897ecf15fdac5437fa4a2743154dc6c1749da4/deepspeed/runtime/zero/stage_1_and_2.py#L610-L612

```python
assert any(
    [self.is_moe_group(group) for group in self.optimizer.param_groups]
), "The model has moe layers, but None of the param groups are marked as MoE. Create a param group with 'moe' key set to True before creating optimizer"
```

Tagging @tohtana and @ykim362 to help review

@loadams enabled auto-merge April 5, 2024 16:25
@loadams added this pull request to the merge queue Apr 5, 2024
Merged via the queue into microsoft:master with commit 42a8eaa Apr 5, 2024
12 checks passed
rraminen pushed a commit to ROCm/DeepSpeed that referenced this pull request May 9, 2024