Expert merging: c-BTM #40

Open
mrcabbage972 opened this issue May 4, 2023 · 3 comments
Assignees
Labels
enhancement New feature or request Merging

Comments

@mrcabbage972
Collaborator

We would like to create a script that builds a merged model using the C-BTM method.

The script would take as input:

  • A list of expert models from the [MDEL HF repo](https://huggingface.co/Multi-Domain-Expert-Layers)
  • The name of the output model

The averaged model would be uploaded to the MDEL HF repo. Its model card should list the names of the experts it was created from.
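
A minimal sketch of what such a script could do, assuming "averaged" means a uniform parameter average over experts that share one architecture. The function names `average_state_dicts` and `build_model_card` are hypothetical, and parameters are shown as flat lists of floats to keep the sketch dependency-free; in practice they would be tensors loaded from each expert's checkpoint. Uploading to the HF repo is left out.

```python
def average_state_dicts(state_dicts):
    """Uniformly average parameters across experts.

    Each state dict maps a parameter name to a flat list of floats
    (an assumption for this sketch; real checkpoints hold tensors).
    """
    if not state_dicts:
        raise ValueError("need at least one expert")
    names = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != names:
            raise ValueError("experts must share the same parameter names")
    n = len(state_dicts)
    merged = {}
    for name in names:
        params = [sd[name] for sd in state_dicts]
        if len({len(p) for p in params}) != 1:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise mean across the experts.
        merged[name] = [sum(values) / n for values in zip(*params)]
    return merged


def build_model_card(output_name, expert_names):
    """Return model card text that lists the source experts."""
    lines = [f"# {output_name}", "", "Merged from the following experts:", ""]
    lines += [f"- {name}" for name in expert_names]
    return "\n".join(lines) + "\n"
```

For example, `average_state_dicts([{"w": [1.0, 3.0]}, {"w": [3.0, 5.0]}])` gives `{"w": [2.0, 4.0]}`.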

@mrcabbage972 mrcabbage972 added the enhancement New feature or request label May 4, 2023
@NourFahmy NourFahmy self-assigned this May 5, 2023
@kenhktsui kenhktsui self-assigned this May 7, 2023
@kenhktsui
Collaborator

I would like to work on this too.
@NourFahmy There are two steps that we could split between us 😃

  • Clustering the original training data used for the different experts
  • Inference code that weights the next-token prediction logits according to the proximity of the input to each cluster centroid
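
The inference step could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the cBTM repo: the function names, the squared Euclidean distance, the `temperature` and `top_k` parameters, and the choice to mix per-expert probabilities (rather than raw logits) are all assumptions of the sketch.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def expert_weights(context_embedding, centroids, temperature=0.1, top_k=None):
    """Weight each expert by how close the context is to its cluster centroid.

    Weights are a softmax over negative squared distances; if top_k is set,
    only the top_k nearest experts receive non-zero weight.
    """
    scores = [-squared_distance(context_embedding, c) / temperature
              for c in centroids]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order if top_k is None else order[:top_k])
    # Subtract the max score before exponentiating for numerical stability.
    max_score = max(scores[i] for i in keep)
    exps = [math.exp(s - max_score) if i in keep else 0.0
            for i, s in enumerate(scores)]
    total = sum(exps)
    return [e / total for e in exps]


def mix_next_token_probs(expert_probs, weights):
    """Weighted sum of each expert's next-token distribution."""
    vocab_size = len(expert_probs[0])
    return [sum(w * probs[v] for w, probs in zip(weights, expert_probs))
            for v in range(vocab_size)]
```

With `top_k=1` this reduces to routing the input to the single nearest expert; larger values blend several experts at the cost of running each of them.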

@mrcabbage972
Collaborator Author

mrcabbage972 commented May 9, 2023

@NourFahmy @kenhktsui
Check out Minho's adaptation of the clustering step from the cBTM repo.

This was referenced May 10, 2023
@NourFahmy
Collaborator

Hi @kenhktsui - happy to take on inference, to support clustering where needed, and to fill any gaps left by Minho's work.

I've put up a PR here

I've made the following assumptions, which I can easily change:

  • both the embedded context & current token will be passed to the script
  • only the conditional probability of the token at time t given the context is needed, as per formula 2 in the paper

Please let me know if anything else is needed!

cc: @mrcabbage972
