<a href="https://colab.research.google.com/github/palindromeRice/Moe_demo/blob/main/Mixture_of_Experts_Demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Mixture of Experts: Simplified Prompt Routing

This notebook illustrates the core idea behind **Mixture of Experts (MoE)** with a minimal, embedding-based routing system.

We define expert models and route user prompts to the most relevant one based on semantic similarity.


In this step, we install all the libraries we'll need:
- `transformers`: for using pre-trained language models.
- `sentence-transformers`: for converting text into numerical vectors (called embeddings) that capture its meaning.



In [2]:
# Install required libraries
!pip install -q transformers sentence-transformers



We import the libraries we just installed and set up our embedding model.

This model captures the meaning of both the user's prompt and the expert descriptions by converting each into a vector: a list of numbers arranged so that texts with similar meanings get similar vectors.
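
To make the idea of comparing vectors concrete, here is a minimal sketch that computes cosine similarity by hand. The three-dimensional vectors and the `cosine_similarity` helper are made up for illustration; real `all-MiniLM-L6-v2` embeddings have 384 dimensions and come from `embedder.encode`.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" (not produced by any real model).
prompt_vec = [1.0, 0.0, 1.0]
code_expert_vec = [1.0, 0.0, 0.5]
qa_expert_vec = [0.0, 1.0, 0.0]

code_score = cosine_similarity(prompt_vec, code_expert_vec)
qa_score = cosine_similarity(prompt_vec, qa_expert_vec)
print(f"code expert: {code_score:.4f}")
print(f"qa expert:   {qa_score:.4f}")
```

A score near 1 means the two vectors point in almost the same direction, and a score of 0 means they are orthogonal, so the toy prompt is much closer to the code expert than to the QA expert.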


In [5]:
import warnings
warnings.filterwarnings('ignore')

In [7]:
import torch
from transformers import pipeline
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer('all-MiniLM-L6-v2')


Here, we define a list of "experts." Each expert has:
- A short description of what they do
- The model they'll use
- The kind of task they handle (like text generation or classification)

We also prepare a cache to store expert embeddings so we don't have to recompute them every time.
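
The caching idea can be sketched without any model at all. In the sketch below, `fake_embed` is a hypothetical stand-in for an expensive embedding call, and the cache is keyed by description text, which is an assumption of this example rather than how the notebook's cache is organized.

```python
# Hypothetical stand-in for an expensive embedding call; it counts its invocations.
call_count = 0

def fake_embed(text):
    global call_count
    call_count += 1
    # Made-up "embedding": text length and number of spaces.
    return [float(len(text)), float(text.count(" "))]

_cache = {}

def get_embedding(description):
    """Return the cached vector for a description, computing it only once."""
    if description not in _cache:
        _cache[description] = fake_embed(description)
    return _cache[description]

first = get_embedding("Code generation expert")
second = get_embedding("Code generation expert")  # served from the cache
print(call_count)
```

The second lookup returns the stored object without calling `fake_embed` again, which is the saving the cache is meant to provide.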


In [4]:
expert_profiles = {
    "codet5-small": {
        "description": "Code generation expert: generates syntactically correct Python functions and code snippets from natural language instructions.",
        "model_name": "Salesforce/codet5-small",
        "task": "text2text-generation"
    },
    "t5-commongen": {
        "description": "Text transformation expert: rephrases or refines general text without specializing in code.",
        "model_name": "mrm8488/t5-small-finetuned-common_gen",
        "task": "text2text-generation"
    },
    "qa": {
        "description": "Question answering expert: extracts answers from context given a question.",
        "model_name": "distilbert-base-uncased-distilled-squad",
        "task": "question-answering"
    },
    "classification": {
        "description": "Text classification expert: analyzes sentiment and categorizes text.",
        "model_name": "distilbert-base-uncased-finetuned-sst-2-english",
        "task": "sentiment-analysis"
    },
    "tiny-gpt2": {
        "description": "Open-ended text generation expert: generates creative text not specific to code.",
        "model_name": "sshleifer/tiny-gpt2",
        "task": "text-generation"
    }
}

_expert_embedding_cache = {}


Now we define two functions:
1. One to get and store the expert descriptions as embeddings.
2. Another to match a user prompt with the most relevant expert(s), based on how similar the meanings are.

We'll use this to figure out which expert is best for answering a prompt.
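
The selection step on its own (threshold filter, fallback, then top-k) can be sketched in plain Python. The `select_experts` helper is a hypothetical name used only here; the scores are the similarity values printed by this notebook's example run.

```python
def select_experts(scores, threshold=0.3, top_k=3):
    """Keep experts at or above the threshold, then return the top_k best names.

    Falls back to all experts when none meets the threshold.
    """
    kept = {name: s for name, s in scores.items() if s >= threshold}
    if not kept:
        kept = scores
    ranked = sorted(kept.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Similarity scores copied from the notebook's example output.
scores = {
    "codet5-small": 0.3539,
    "t5-commongen": 0.2246,
    "qa": 0.1086,
    "classification": 0.0846,
    "tiny-gpt2": 0.1466,
}

strict = select_experts(scores, threshold=0.3, top_k=3)
fallback = select_experts(scores, threshold=0.5, top_k=2)
print(strict)
print(fallback)
```

With a threshold of 0.3 only one expert qualifies, so fewer than `top_k` names come back. With a threshold of 0.5 nothing qualifies, and the fallback ranks all experts instead.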


In [8]:
def get_cached_expert_embeddings(profiles):
    """Embed every expert description once and reuse the result on later calls."""
    global _expert_embedding_cache
    # The cache is filled only while empty, so later changes to `profiles` are not picked up.
    if not _expert_embedding_cache:
        expert_names = list(profiles.keys())
        expert_descriptions = [profiles[expert]["description"] for expert in expert_names]
        # Encode all descriptions in one batch.
        embeddings = embedder.encode(expert_descriptions, convert_to_tensor=True)
        _expert_embedding_cache = {name: emb for name, emb in zip(expert_names, embeddings)}
        print("Expert embeddings computed and cached.")
    return _expert_embedding_cache

def route_prompt_to_experts(prompt, profiles, threshold=0.3, top_k=3):
    """Return up to top_k expert names whose descriptions best match the prompt."""
    expert_embeddings = get_cached_expert_embeddings(profiles)
    expert_names = list(expert_embeddings.keys())
    embeddings_tensor = torch.stack([expert_embeddings[name] for name in expert_names])
    # Embed the prompt and compare it with every expert description.
    prompt_embedding = embedder.encode(prompt, convert_to_tensor=True)
    cosine_scores = util.cos_sim(prompt_embedding, embeddings_tensor)[0]
    expert_scores = {name: score.item() for name, score in zip(expert_names, cosine_scores)}
    for name, score in expert_scores.items():
        print(f"Expert '{name}' similarity: {score:.4f}")
    # Keep only experts at or above the threshold; fall back to all of them if none qualifies.
    filtered_experts = {name: score for name, score in expert_scores.items() if score >= threshold}
    if not filtered_experts:
        print("No expert met the threshold. Using all expert scores for selection.")
        filtered_experts = expert_scores
    # Rank by similarity, highest first, and keep the top_k names.
    sorted_experts = sorted(filtered_experts.items(), key=lambda item: item[1], reverse=True)
    selected_experts = [name for name, _ in sorted_experts[:top_k]]
    print(f"Selected experts: {selected_experts}")
    return selected_experts


In this final step, we test the router on an example prompt.

We’ll see how similar this prompt is to each expert's description and choose the most relevant one based on that.


In [9]:
sample_prompt = "Write a Python function to reverse a string."
selected_experts = route_prompt_to_experts(sample_prompt, expert_profiles, threshold=0.3, top_k=1)
print("Final selected expert(s):", selected_experts)


Expert embeddings computed and cached.
Expert 'codet5-small' similarity: 0.3539
Expert 't5-commongen' similarity: 0.2246
Expert 'qa' similarity: 0.1086
Expert 'classification' similarity: 0.0846
Expert 'tiny-gpt2' similarity: 0.1466
Selected experts: ['codet5-small']
Final selected expert(s): ['codet5-small']
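
The notebook stops at selecting an expert. As a hedged sketch of a possible next step, the selected name can be looked up in the profiles to get the task and model that `transformers.pipeline` would need. The `pipeline_kwargs_for` helper is hypothetical, and the model is not actually loaded here because that would download weights.

```python
# Trimmed copy of two entries from expert_profiles, repeated so this block runs on its own.
profiles = {
    "codet5-small": {
        "model_name": "Salesforce/codet5-small",
        "task": "text2text-generation",
    },
    "qa": {
        "model_name": "distilbert-base-uncased-distilled-squad",
        "task": "question-answering",
    },
}

def pipeline_kwargs_for(expert_name, profiles):
    """Hypothetical helper: build keyword arguments for transformers.pipeline."""
    profile = profiles[expert_name]
    return {"task": profile["task"], "model": profile["model_name"]}

kwargs = pipeline_kwargs_for("codet5-small", profiles)
print(kwargs)
# Loading the expert would then look like: generator = pipeline(**kwargs)
# It is left commented out because it downloads the model weights.
```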
