Expert merging: c-BTM #40

mrcabbage972 · 2023-05-04T16:14:17Z

We would like to create a script for creating a merged model by using the C-BTM method.

The script would take as input:

- A list of expert models from the [MDEL HF repo](https://huggingface.co/Multi-Domain-Expert-Layers).
- The name of the output model.

The averaged model would be uploaded to the MDEL HF repo. Its model card should contain the names of the experts it was created from.
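A minimal sketch of the averaging step, assuming the merge is a plain (optionally weighted) parameter average and that every expert shares the same architecture. Plain NumPy dictionaries stand in for model state dicts here, and the names `average_state_dicts` and `model_card_text` are hypothetical, not part of any existing MDEL script. Loading the experts from the Hub and uploading the result are left out.

```python
import numpy as np


def average_state_dicts(state_dicts, weights=None):
    """Average parameter dictionaries key by key.

    state_dicts: list of {parameter_name: array}, one per expert
                 (stand-ins for real model state dicts).
    weights:     optional per-expert weights; uniform if omitted.
    """
    if not state_dicts:
        raise ValueError("need at least one expert")
    n = len(state_dicts)
    if weights is None:
        weights = [1.0] * n
    if len(weights) != n:
        raise ValueError("need exactly one weight per expert")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("experts must have identical parameter names")

    merged = {}
    for key in keys:
        # Normalised weighted sum of this parameter across all experts.
        merged[key] = sum(
            (w / total) * np.asarray(sd[key], dtype=np.float64)
            for w, sd in zip(weights, state_dicts)
        )
    return merged


def model_card_text(output_name, expert_names):
    """Build a minimal model card that lists the source experts."""
    lines = [f"# {output_name}", "", "Merged from the following experts:"]
    lines += [f"- {name}" for name in expert_names]
    return "\n".join(lines)
```

With real checkpoints the same loop would run over the tensors of each expert's state dict; keeping the expert names next to the merged weights makes it straightforward to fill in the model card.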


kenhktsui · 2023-05-07T09:44:52Z

I would like to work on this too.
@NourFahmy There are two steps that we could split between us 😃

- Clustering of the original training data used for the different experts
- Inference code (that weights the next-token prediction logits according to the proximity of the input to each cluster centroid)
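A rough sketch of the inference step, under some assumptions: the context has already been embedded in the same space as the cluster centroids, proximity is squared Euclidean distance, only the `top_k` nearest experts are kept, and the experts are mixed at the level of next-token probabilities rather than raw logits. The function names, `temperature` and `top_k` are illustrative choices, not taken from the cBTM repo.

```python
import numpy as np


def expert_weights(context_emb, centroids, temperature=0.1, top_k=2):
    """Weight each expert by how close the context is to its centroid.

    context_emb: array of shape (dim,)
    centroids:   array of shape (num_experts, dim)
    Returns an array of shape (num_experts,) that sums to 1, with
    zeros for every expert outside the top_k nearest.
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    centroids = np.asarray(centroids, dtype=np.float64)
    context_emb = np.asarray(context_emb, dtype=np.float64)

    sq_dist = np.sum((centroids - context_emb) ** 2, axis=1)
    scores = -sq_dist / temperature

    # Keep only the top_k highest scores (nearest centroids).
    keep = np.argsort(scores)[-top_k:]
    masked = np.full_like(scores, -np.inf)
    masked[keep] = scores[keep]

    # Numerically stable softmax over the kept experts.
    masked -= masked[keep].max()
    w = np.exp(masked)
    return w / w.sum()


def mix_next_token_probs(expert_logits, weights):
    """Combine per-expert next-token distributions.

    expert_logits: array of shape (num_experts, vocab_size)
    weights:       array of shape (num_experts,), summing to 1
    Returns the mixed distribution of shape (vocab_size,).
    """
    logits = np.asarray(expert_logits, dtype=np.float64)
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    return np.asarray(weights) @ probs
```

Experts whose weight is zero do not need a forward pass at all, which is the practical benefit of the top-k cut-off.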

mrcabbage972 · 2023-05-09T02:01:23Z

@NourFahmy @kenhktsui
Check out Minho's adaptation of the clustering step from the cBTM repo.

NourFahmy · 2023-05-10T02:21:18Z

Hi @kenhktsui - happy to take on inference and support where need be on clustering, and to fill any gaps from Minho's efforts.

I've put up a PR here

I've made the following assumptions, which I can easily change if they are wrong:

- Both the embedded context and the current token will be passed to the script.
- Only the conditional probability of the token at time t given the context is needed, as per formula 2 in the paper.
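For reference, the conditional probability in question can be written as a mixture over experts. This is a sketch of how the c-BTM formulation is usually stated rather than a quotation from the paper, and the notation is assumed here: $h_{x_{<t}}$ is the embedded context, $h_{c_j}$ the centroid of cluster $j$, $T$ a temperature, and $k$ the number of experts kept active.

$$
P(X_t \mid x_{<t}) = \sum_{j=1}^{n} P(X_t \mid x_{<t}, D = j)\, P(D = j \mid x_{<t}),
\qquad
P(D = j \mid x_{<t}) \propto \operatorname{top-}k\!\left[\exp\!\left(-\frac{\operatorname{dist}(h_{x_{<t}}, h_{c_j})^{2}}{T}\right)\right]
$$

The first factor comes from expert $j$'s forward pass, and the second depends only on the embedded context and the centroids, which is why those two inputs are enough for the script.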

Kindly let me know if anything else is needed!

mrcabbage972 added the enhancement New feature or request label May 4, 2023

NourFahmy self-assigned this May 5, 2023

kenhktsui self-assigned this May 7, 2023

This was referenced May 10, 2023

Add files via upload NourFahmy/MDEL#1 (Open)

c-btm inference #50 (Merged)

mrcabbage972 added the Merging label May 12, 2023
