Add script for merging expert models via weight averaging #36
@mrcabbage972 I am interested in helping! We could use lm-evaluation-harness to benchmark the merged model.
@kenhktsui Great, please assign the ticket to yourself! Regarding lm-evaluation-harness, can you please create a separate issue for that and add the details (e.g., which tasks we are going to test on)?
@mrcabbage972 I have added the evaluation ticket. For the merge, let's align on and define the terminology, since I see there are different implementations; that way we can assign different tickets to different contributors:
@kenhktsui Let's keep this ticket scoped to element-wise averaging.
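For reference, element-wise averaging of $N$ experts can be written as follows, where $\theta^{(i)}_k$ (notation introduced here for illustration) is parameter tensor $k$ of expert $i$:

```math
\theta^{\text{merged}}_{k} = \frac{1}{N} \sum_{i=1}^{N} \theta^{(i)}_{k} \qquad \text{for every parameter tensor } k
```

The mean is taken entry by entry, so this only works when all experts share the same architecture and tensor shapes.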
@mrcabbage972 I think this ticket has been done by Concedo and TeH_Venom. I would like to work on the c-BTM ticket.
@kenhktsui The version of Concedo's script that I saw only merges two experts; we need a solution that merges N. To close the ticket, I think what is needed is a PR that:
If you prefer to focus on the c-BTM ticket, I can take this one.
We may be able to load the models layer by layer.
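A minimal sketch of the layer-by-layer idea, assuming each expert can be read one parameter at a time. The helper name `stream_average` and the "loader" interface (a callable mapping a parameter name to that expert's tensor) are hypothetical; the code works with any tensor-like type supporting `+` and `/`, such as `torch.Tensor`.

```python
from typing import Callable, Iterable, Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")  # any tensor-like type supporting + and / (e.g. torch.Tensor, float)


def stream_average(
    param_names: Iterable[str],
    loaders: Sequence[Callable[[str], T]],
) -> Iterator[Tuple[str, T]]:
    """Yield (name, averaged tensor) pairs, one parameter at a time.

    Each loader (hypothetical interface) returns one expert's tensor for the
    given parameter name, so whole expert models never need to be loaded at once.
    The caller should write out each yielded tensor before requesting the next.
    """
    if not loaders:
        raise ValueError("need at least one expert")
    n = len(loaders)
    for name in param_names:
        total = loaders[0](name)
        for load in loaders[1:]:
            total = total + load(name)  # element-wise sum across experts
        yield name, total / n  # element-wise mean
```

With safetensors checkpoints, each loader could be the `get_tensor` method of a handle opened with `safetensors.safe_open(path, framework="pt", device="cpu")`, with `handle.keys()` supplying the parameter names; sharded checkpoints would also need their index file to map each name to its shard. For fp16/bf16 weights it may be safer to upcast to float32 before summing; that step is left out here to keep the helper framework-agnostic.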
We would like to create a script for creating a merged model by averaging expert weights.
The script would take as input:
The averaged model would be uploaded to the MDEL HF repo. Its model card should contain the names of the experts it was created from.
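The core of such a script could look like the sketch below, assuming each expert has already been loaded as a state dict (parameter name to tensor) and all experts share one architecture. The function names `average_state_dicts` and `build_model_card` and the card wording are hypothetical; because the script's exact inputs are not listed here, the sketch covers only the N-way averaging and the model-card text.

```python
from typing import Any, Dict, Mapping, Sequence


def average_state_dicts(state_dicts: Sequence[Mapping[str, Any]]) -> Dict[str, Any]:
    """Element-wise average of N expert state dicts with identical parameter names."""
    if not state_dicts:
        raise ValueError("need at least one expert state dict")
    keys = set(state_dicts[0])
    for i, sd in enumerate(state_dicts[1:], start=1):
        if set(sd) != keys:
            raise ValueError(f"expert {i} has different parameter names than expert 0")
    n = len(state_dicts)
    merged: Dict[str, Any] = {}
    for name in state_dicts[0]:  # preserve the first expert's parameter order
        total = state_dicts[0][name]
        for sd in state_dicts[1:]:
            total = total + sd[name]  # element-wise sum
        merged[name] = total / n  # element-wise mean
    return merged


def build_model_card(model_name: str, expert_names: Sequence[str]) -> str:
    """Markdown model card recording which experts were averaged."""
    lines = [
        f"# {model_name}",
        "",
        f"This model was created by element-wise averaging the weights of "
        f"{len(expert_names)} expert models:",
        "",
    ]
    lines += [f"- {name}" for name in expert_names]
    return "\n".join(lines) + "\n"
```

Loading the experts (for example via `AutoModelForCausalLM.from_pretrained(...).state_dict()`), loading the result back with `load_state_dict`, and uploading it (for example with `huggingface_hub`'s `HfApi.upload_folder`) are left out; the target repo id is not specified in this issue.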