Add script for merging expert models via weight averaging #36

Open
mrcabbage972 opened this issue May 3, 2023 · 7 comments

@mrcabbage972 (Collaborator) commented May 3, 2023

We would like to add a script that creates a merged model by averaging the weights of expert models.

The script would take as input:

  1. A list of expert models from the MDEL HF repo.
  2. The name of the output model.

The averaged model would be uploaded to the MDEL HF repo. Its model card should contain the names of the experts it was created from.
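
A rough sketch of what such a script could look like, assuming all experts share the same architecture and tokenizer; the repo names, the output name, and the commented-out upload step are illustrative:

```python
# Sketch: merge N expert models by element-wise averaging of their weights.
# Assumes all experts share the same architecture (identical state_dict keys/shapes).
from transformers import AutoModelForCausalLM, AutoTokenizer

expert_names = [  # illustrative repo names
    "Multi-Domain-Expert-Layers/expert-a",
    "Multi-Domain-Expert-Layers/expert-b",
]
output_name = "mdel-merged-avg"  # illustrative output name

# Use the first expert as the container for the merged weights.
merged = AutoModelForCausalLM.from_pretrained(expert_names[0])
orig_dtypes = {k: v.dtype for k, v in merged.state_dict().items()}
avg_state = {k: v.clone().float() for k, v in merged.state_dict().items()}

# Accumulate the remaining experts, then divide by N.
for name in expert_names[1:]:
    expert = AutoModelForCausalLM.from_pretrained(name)
    for k, v in expert.state_dict().items():
        avg_state[k] += v.float()
    del expert

avg_state = {k: (v / len(expert_names)).to(orig_dtypes[k]) for k, v in avg_state.items()}
merged.load_state_dict(avg_state)

merged.save_pretrained(output_name)
AutoTokenizer.from_pretrained(expert_names[0]).save_pretrained(output_name)
# merged.push_to_hub(output_name)  # upload to the MDEL HF org; model card should list the experts
```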

@kenhktsui (Collaborator)

@mrcabbage972 I am interested in helping! We could use lm-evaluation-harness to benchmark the merged model.
The seed LM, EleutherAI/pythia-1b-deduped, will be a great baseline.
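
For reference, a rough sketch of the comparison using the harness's Python entry point (`lm_eval.evaluator.simple_evaluate`); the argument names may differ by harness version, and the merged-model path and task list are placeholders:

```python
# Sketch: benchmark the merged model against the seed LM with lm-evaluation-harness.
# Argument names may differ by harness version; the task list is a placeholder.
from lm_eval import evaluator

for model_path in ["EleutherAI/pythia-1b-deduped", "path/to/merged-model"]:
    results = evaluator.simple_evaluate(
        model="hf-causal",
        model_args=f"pretrained={model_path}",
        tasks=["lambada_openai", "hellaswag"],
    )
    print(model_path, results["results"])
```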

@mrcabbage972 (Collaborator, Author)

@kenhktsui Great, please assign the ticket to yourself!

Regarding lm-evaluation-harness, can you please create a separate issue for that and add the details (e.g. on which tasks we are going to test)?

kenhktsui self-assigned this May 3, 2023
@kenhktsui (Collaborator)

@mrcabbage972 I have added the evaluation ticket.

For the merge, let's align on terminology, since there are different implementations, so that we can assign separate tickets to different contributors:

  • c-BTM, which takes a weighted combination of the experts' next-token prediction logits
  • element-wise averaging/blending of model parameters
  • mixture-of-experts
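
To make the distinction concrete (notation mine, just to disambiguate the tickets): element-wise averaging acts on the parameters once, offline, while c-BTM mixes the experts' output distributions at inference time:

$$\theta_{\text{merged}} = \frac{1}{N}\sum_{i=1}^{N}\theta_i, \qquad p_{\text{c-BTM}}(x_t \mid x_{<t}) = \sum_{i=1}^{N} w_i\, p_i(x_t \mid x_{<t}), \quad \sum_i w_i = 1.$$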

@mrcabbage972 (Collaborator, Author)

@kenhktsui Let's keep this ticket as element-wise averaging.
I created a separate one for c-BTM.

@kenhktsui (Collaborator)

@mrcabbage972 I think this ticket has already been done by Concedo and TeH_Venom. I would like to work on the c-BTM ticket.

kenhktsui removed their assignment May 7, 2023
@mrcabbage972 (Collaborator, Author) commented May 8, 2023

@kenhktsui The version of Concedo's script that I saw only merges two experts; we need a solution that merges N.

To close the ticket, I think what is needed is a PR that:

  1. Adds the script to the repo
  2. Extends it to support merging of N experts
  3. Adds a section to the README with usage instructions (a possible CLI sketch is below)

If you prefer to focus on the c-BTM ticket, I can take this one.
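
For point 3, a hypothetical command-line interface the README could document; the script name `merge_experts.py` and the flag names are illustrative, not existing files in the repo:

```python
# Hypothetical CLI skeleton for the merge script (merge_experts.py).
# Example invocation the README section could show:
#   python merge_experts.py --experts org/expert-a org/expert-b org/expert-c \
#       --output-name mdel-merged-avg
import argparse

def parse_args():
    parser = argparse.ArgumentParser(
        description="Merge N expert models by element-wise weight averaging."
    )
    parser.add_argument("--experts", nargs="+", required=True,
                        help="HF repo names of the expert models to average (N >= 2).")
    parser.add_argument("--output-name", required=True,
                        help="Name under which to save/upload the merged model.")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(f"Merging {len(args.experts)} experts into {args.output_name}")
```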

@mrcabbage972 (Collaborator, Author)

We may be able to load the models layer by layer to keep peak memory low when merging many experts.
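
One possible way to do this, assuming each expert checkpoint is available as a single safetensors file (paths are illustrative; sharded checkpoints would need the same loop per shard):

```python
# Sketch: per-tensor ("layer by layer") averaging so that only one tensor per
# expert is resident at a time, instead of N full models.
import contextlib
from safetensors import safe_open
from safetensors.torch import save_file

expert_files = ["expert-a/model.safetensors", "expert-b/model.safetensors"]  # illustrative

merged = {}
with contextlib.ExitStack() as stack:
    readers = [stack.enter_context(safe_open(f, framework="pt", device="cpu"))
               for f in expert_files]
    for key in readers[0].keys():
        first = readers[0].get_tensor(key)
        acc = first.float()
        for r in readers[1:]:
            acc += r.get_tensor(key).float()
        # Keep the original dtype; the merged state dict (one model's worth of
        # memory) is accumulated here and written out once at the end.
        merged[key] = (acc / len(readers)).to(first.dtype)

save_file(merged, "merged/model.safetensors")
```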
