Add script for merging expert models via weight averaging #36

Open
mrcabbage972 opened this issue May 3, 2023 · 7 comments

@mrcabbage972 (Collaborator) commented May 3, 2023

We would like to add a script that creates a merged model by averaging the weights of expert models.

The script would take as input:

  1. A list of expert models from the MDEL HF repo.
  2. The name of the output model.

The averaged model would be uploaded to the MDEL HF repo. Its model card should contain the names of the experts it was created from.
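
A rough sketch of what such a script could look like, assuming all experts share the same architecture and tokenizer; the repo names, the output name, and the commented-out upload step are illustrative:

```python
# Sketch: merge N expert models by element-wise averaging of their weights.
# Assumes all experts share the same architecture (identical state_dict keys/shapes).
from transformers import AutoModelForCausalLM, AutoTokenizer

expert_names = [  # illustrative repo names
    "Multi-Domain-Expert-Layers/expert-a",
    "Multi-Domain-Expert-Layers/expert-b",
]
output_name = "mdel-merged-avg"  # illustrative output name

# Use the first expert as the container for the merged weights.
merged = AutoModelForCausalLM.from_pretrained(expert_names[0])
orig_dtypes = {k: v.dtype for k, v in merged.state_dict().items()}
avg_state = {k: v.clone().float() for k, v in merged.state_dict().items()}

# Accumulate the remaining experts, then divide by N.
for name in expert_names[1:]:
    expert = AutoModelForCausalLM.from_pretrained(name)
    for k, v in expert.state_dict().items():
        avg_state[k] += v.float()
    del expert

avg_state = {k: (v / len(expert_names)).to(orig_dtypes[k]) for k, v in avg_state.items()}
merged.load_state_dict(avg_state)

merged.save_pretrained(output_name)
AutoTokenizer.from_pretrained(expert_names[0]).save_pretrained(output_name)
# merged.push_to_hub(output_name)  # upload to the MDEL HF org; model card should list the experts
```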

@kenhktsui (Collaborator)

@mrcabbage972 I am interested in helping! We could use lm-evaluation-harness to benchmark the merged model.
The seed LM, EleutherAI/pythia-1b-deduped, will be a great baseline.
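
For reference, a rough sketch of the comparison using the harness's Python entry point (`lm_eval.evaluator.simple_evaluate`); the argument names may differ by harness version, and the merged-model path and task list are placeholders:

```python
# Sketch: benchmark the merged model against the seed LM with lm-evaluation-harness.
# Argument names may differ by harness version; the task list is a placeholder.
from lm_eval import evaluator

for model_path in ["EleutherAI/pythia-1b-deduped", "path/to/merged-model"]:
    results = evaluator.simple_evaluate(
        model="hf-causal",
        model_args=f"pretrained={model_path}",
        tasks=["lambada_openai", "hellaswag"],
    )
    print(model_path, results["results"])
```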

@mrcabbage972 (Collaborator, Author)

@kenhktsui Great, please assign the ticket to yourself!

Regarding lm-evaluation-harness, can you please create a separate issue for that and add the details (e.g. on which tasks we are going to test)?

kenhktsui self-assigned this May 3, 2023
@kenhktsui (Collaborator)

@mrcabbage972 I have added the evaluation ticket.

For the merge, let's align on terminology, since there are different implementations, so that we can assign separate tickets to different contributors:

  • c-BTM, which takes a weighted combination of the experts' next-token prediction logits
  • element-wise averaging/blending of model parameters
  • mixture-of-experts
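
To make the distinction concrete (notation mine, just to disambiguate the tickets): element-wise averaging acts on the parameters once, offline, while c-BTM mixes the experts' output distributions at inference time:

$$\theta_{\text{merged}} = \frac{1}{N}\sum_{i=1}^{N}\theta_i, \qquad p_{\text{c-BTM}}(x_t \mid x_{<t}) = \sum_{i=1}^{N} w_i\, p_i(x_t \mid x_{<t}), \quad \sum_i w_i = 1.$$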

@mrcabbage972 (Collaborator, Author)

@kenhktsui Let's keep this ticket as element-wise averaging.
I created a separate one for c-BTM.

@kenhktsui (Collaborator)

@mrcabbage972 I think this ticket has already been done by Concedo and TeH_Venom. I would like to work on the c-BTM ticket.

kenhktsui removed their assignment May 7, 2023
@mrcabbage972 (Collaborator, Author) commented May 8, 2023

@kenhktsui The version of Concedo's script that I saw only merges two experts; we need a solution that merges N.

To close the ticket, I think what is needed is a PR that:

  1. Adds the script to the repo
  2. Extends it to support merging of N experts
  3. Adds a section to the README with usage instructions (a possible CLI sketch is below)

If you prefer to focus on the c-BTM ticket, I can take this one.
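
For point 3, a hypothetical command-line interface the README could document; the script name `merge_experts.py` and the flag names are illustrative, not existing files in the repo:

```python
# Hypothetical CLI skeleton for the merge script (merge_experts.py).
# Example invocation the README section could show:
#   python merge_experts.py --experts org/expert-a org/expert-b org/expert-c \
#       --output-name mdel-merged-avg
import argparse

def parse_args():
    parser = argparse.ArgumentParser(
        description="Merge N expert models by element-wise weight averaging."
    )
    parser.add_argument("--experts", nargs="+", required=True,
                        help="HF repo names of the expert models to average (N >= 2).")
    parser.add_argument("--output-name", required=True,
                        help="Name under which to save/upload the merged model.")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(f"Merging {len(args.experts)} experts into {args.output_name}")
```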

@mrcabbage972 (Collaborator, Author)

We may be able to load the models layer by layer to keep peak memory low when merging many experts.
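
One possible way to do this, assuming each expert checkpoint is available as a single safetensors file (paths are illustrative; sharded checkpoints would need the same loop per shard):

```python
# Sketch: per-tensor ("layer by layer") averaging so that only one tensor per
# expert is resident at a time, instead of N full models.
import contextlib
from safetensors import safe_open
from safetensors.torch import save_file

expert_files = ["expert-a/model.safetensors", "expert-b/model.safetensors"]  # illustrative

merged = {}
with contextlib.ExitStack() as stack:
    readers = [stack.enter_context(safe_open(f, framework="pt", device="cpu"))
               for f in expert_files]
    for key in readers[0].keys():
        first = readers[0].get_tensor(key)
        acc = first.float()
        for r in readers[1:]:
            acc += r.get_tensor(key).float()
        # Keep the original dtype; the merged state dict (one model's worth of
        # memory) is accumulated here and written out once at the end.
        merged[key] = (acc / len(readers)).to(first.dtype)

save_file(merged, "merged/model.safetensors")
```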
