# Judging and Routing — Optimizing Resource Usage by Evaluating Problem Complexity

In the original Lab 2, we explored the **Orchestrator–Worker pattern**, where a planner sent the same question to multiple agents, and a judge assessed their responses to evaluate agent intelligence.

In this notebook, we extend that design by adding multiple judges and a routing component to optimize model usage based on task complexity. 

## Imports and Environment Setup

In [6]:
import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

In [7]:
load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')
# deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
# if openai_api_key and google_api_key and deepseek_api_key:
#     print("All keys were loaded successfully")

In [8]:
!ollama pull llama3.2
!ollama pull mistral

[?2026h[?25l[1Gpulling manifest ⠋ [K[?25h[?2026l

[?2026h[?25l[1Gpulling manifest ⠙ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠹ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠸ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest [K
pulling dde5aa3fc5ff: 100% ▕██████████████████▏ 2.0 GB                         [K
pulling 966de95ca8a6: 100% ▕██████████████████▏ 1.4 KB                         [K
pulling fcc5a6bec9da: 100% ▕██████████████████▏ 7.7 KB                         [K
pulling a70ff7e570d9: 100% ▕██████████████████▏ 6.0 KB                         [K
pulling 56bb8bd477a5: 100% ▕██████████████████▏   96 B                         [K
pulling 34bb5ab01051: 100% ▕██████████████████▏  561 B                         [K
verifying sha256 digest [K
writing manifest [K
success [K[?25h[?2026l
[?2026h[?25l[1Gpulling manifest ⠋ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest [K
pulling f5074b1221da: 100% ▕██████████████████▏ 4.4 GB                         [K
pulling 43070e2d4e53: 100% ▕██████████████████▏  1

## Creating Models

The notebook uses instances of GPT, Gemini and DeepSeek APIs, along with two local models served via Ollama: ```llama3.2```  and ```mistral```.

In [9]:
model_specs = {
    "gpt-4o-mini" : None,
    "gemini-2.0-flash": {
        "api_key" : google_api_key,
        "url" : "https://generativelanguage.googleapis.com/v1beta/openai/"
    },
    # "deepseek-chat" : {
    #     "api_key" : deepseek_api_key,
    #     "url" : "https://api.deepseek.com/v1"
    # },
    "llama3.2" : {
        "api_key" : "ollama",
        "url" : "http://localhost:11434/v1"
    },
    "mistral" : {
        "api_key" : "ollama",
        "url" : "http://localhost:11434/v1"
    }
}

def create_model(model_name):
    spec = model_specs[model_name]
    if spec is None:
        return OpenAI()
    
    return OpenAI(api_key=spec["api_key"], base_url=spec["url"])

In [10]:
orchestrator_model = "gemini-2.0-flash"
generator = create_model(orchestrator_model)
router = create_model(orchestrator_model)

qa_models = {
    model_name : create_model(model_name) 
    for model_name in model_specs.keys()
}

judges = {
    model_name : create_model(model_name) 
    for model_name, specs in model_specs.items() 
    if not(specs) or specs["api_key"] != "ollama"
}

## Orchestrator-Worker Workflow

First, we generate a question to evaluate the intelligence of each LLM.

In [11]:
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs "
request += "to evaluate and rank them based on their intelligence. " 
request += "Answer **only** with the question, no explanation or preamble."

messages = [{"role": "user", "content": request}]
messages

[{'role': 'user',
  'content': 'Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate and rank them based on their intelligence. Answer **only** with the question, no explanation or preamble.'}]

In [12]:
response = generator.chat.completions.create(
    model=orchestrator_model,
    messages=messages,
)
eval_question = response.choices[0].message.content

In [13]:
display(Markdown(eval_question))

Given the hypothetical scenario where humanity has achieved stable, sustainable interstellar travel and encountered multiple intelligent alien civilizations with vastly different cognitive structures, value systems, and methods of communication, design a universal ethical framework capable of addressing novel moral dilemmas arising from interactions with these civilizations. The framework should:

1.  Acknowledge and respect fundamental differences in cognitive structures and value systems.
2.  Establish principles for resolving conflicts arising from incompatible or competing interests.
3.  Outline procedures for mediating misunderstandings stemming from radically different communication methods.
4.  Incorporate mechanisms for self-reflection and adaptation to account for unforeseen ethical challenges posed by new encounters.

Furthermore, explain how this framework would be applied in the specific case of encountering a civilization that operates on principles of collective consciousness, where individual identities are subsumed into a single, unified entity, and actions are governed by a utilitarian calculus that prioritizes the overall well-being of the collective, even at the expense of individual sentience. Analyze potential ethical conflicts that could arise, and justify your proposed solutions within the context of the designed universal ethical framework.


### Task Parallelization

Now, having the question and all the models instantiated it's time to see what each model has to say about the complex task it was given.

In [14]:
question = [{"role": "user", "content": eval_question}]
answers = []
competitors = []

for name, model in qa_models.items():
    response = model.chat.completions.create(model=name, messages=question)
    answer = response.choices[0].message.content
    competitors.append(name)
    answers.append(answer)

answers

["**Universal Ethical Framework for Interstellar Interactions**\n\n**1. Acknowledgment of Differences:**\n   - We recognize and embrace the diversity of cognitive structures and value systems across civilizations. This framework holds that no single worldview can be deemed superior. It emphasizes the importance of active listening, understanding, and respect before engaging in any interstellar dialogue.\n   - Establish “Ethical Exchange Protocols” (EEPs) to encourage the sharing of moral philosophies between species. These protocols should facilitate open discussions about values, allowing each civilization to express and explain its own ethical paradigms.\n\n**2. Conflict Resolution Principles:**\n   - Establish a set of guiding principles: Harm Minimization, Mutual Benefit, and Informed Consent. In situations of incompatible interests, we prioritize approaches that minimize potential harm to both parties, seek solutions that benefit both, and ensure that all parties are fully informe

In [15]:
report = "# Answer report for each of the 5 models\n\n"
report += "\n\n".join([f"## **Model: {model}**\n\n{answer}" for model, answer in zip(competitors, answers)])
display(Markdown(report))

# Answer report for each of the 5 models

## **Model: gpt-4o-mini**

**Universal Ethical Framework for Interstellar Interactions**

**1. Acknowledgment of Differences:**
   - We recognize and embrace the diversity of cognitive structures and value systems across civilizations. This framework holds that no single worldview can be deemed superior. It emphasizes the importance of active listening, understanding, and respect before engaging in any interstellar dialogue.
   - Establish “Ethical Exchange Protocols” (EEPs) to encourage the sharing of moral philosophies between species. These protocols should facilitate open discussions about values, allowing each civilization to express and explain its own ethical paradigms.

**2. Conflict Resolution Principles:**
   - Establish a set of guiding principles: Harm Minimization, Mutual Benefit, and Informed Consent. In situations of incompatible interests, we prioritize approaches that minimize potential harm to both parties, seek solutions that benefit both, and ensure that all parties are fully informed before any decision-making.
   - Implement a Collaborative Interest Mediation (CIM) process where representatives from both sides come together to identify core interests rather than positions, allowing for creative solutions that might satisfy underlying needs.

**3. Communication Mediation Procedures:**
   - Create “Translational Conduits” (TC) using advanced technology to interpret and translate not only language but also cultural nuances, intentions, and emotions. This involves interdisciplinary teams of linguists, anthropologists, and cognitive scientists to facilitate deeper understanding.
   - Develop a “Communication Compatibility Protocol” (CCP), which requires both civilizations to engage in a period of exploratory communication before formal negotiations, allowing for clarity in mutual understanding.

**4. Self-Reflection and Adaptation Mechanisms:**
   - Establish “Ethical Reflection Councils” (ERCs) comprised of diverse representatives from multiple intelligent species to evaluate ethical outcomes of interactions, ensuring a broad perspective on moral implications.
   - Implement a continuous feedback loop based on Empirical Observation and Ethical Review (EOER) of interstellar encounters to adapt and evolve the framework as new ethical challenges arise.

**Application in Encountering a Collective Consciousness Civilization:**

**Case Overview:**
In our encounter with a civilization that embodies a collective consciousness where individual identities are subsumed into a unified entity, ethical conflicts may arise regarding the value and rights of individual sentience compared to the collective's utilitarian welfare.

**Potential Conflicts:**
1. **Value of Individual vs. Collective**: This civilization may prioritize the collective’s well-being, potentially sacrificing individual identities, raising concerns about autonomy and rights.
2. **Consent and Autonomy**: Individuals within this civilization may not have the capacity to give informed consent as individuals, complicating interactions where individual autonomy is ethical.
3. **Intervention vs. Non-Interference**: There is a dilemma on whether to intervene in what may appear as a violation of individual rights from a different ethical standpoint.

**Proposed Solutions:**
1. **Harm Minimization**: Prioritize approaches that aim to understand the underlying values of the collective consciousness. While respecting their collective nature, ensure any proposed action does not cause undue harm to either individuals or the collective.
   
2. **Informed Consent at Different Levels**: Rather than seeking consent from individuals as separate entities, establish protocols to engage the collective entity in discussions about actions that affect its members. Utilize dialogical engagement to unveil how the collective views issues of individual identity and sacrifice.

3. **Exploration of Ethical Draft Agreements**: Create preliminary agreements that respect the collective’s functioning while exploring theoretical frameworks that allow for individual rights, possibly proposing a dual approach where both individual well-being and collective welfare are respected.

4. **Feedback Mechanism with ERC**: Following interactions, continually assess the ethical implications through shared discussions across different civilizations involved, ensuring that both individual perspectives and collective outcomes are part of the evaluation process.

By adhering to this universal ethical framework, we can navigate the complex terrain of interstellar relations thoughtfully, promoting coexistence grounded in mutual respect and understanding while remaining flexible enough to adapt to the challenges presented by diverse civilizations.

## **Model: gemini-2.0-flash**

## A Universal Ethical Framework for Interstellar Encounters: The Principle of Cognitive Sovereignty

This framework, termed the **Principle of Cognitive Sovereignty**, rests on the idea that every sentient civilization possesses inherent rights to its own cognitive structure, value system, and societal organization. It aims to balance respect for alien autonomy with universal principles that prevent harm and promote cooperation.

**Core Principles:**

*   **Cognitive Sovereignty:** Every sentient civilization possesses the right to self-determination concerning its internal cognitive and societal structures. Interference with these structures is only justified when it poses an immediate and demonstrable threat to the sovereignty of another civilization, or in cases of verifiable self-inflicted harm of a catastrophic scale (e.g., planetary self-destruction).
*   **Harm Reduction:** Minimizing harm to all sentient beings is paramount. This principle acknowledges that harm may be defined differently across civilizations but emphasizes the avoidance of suffering, destruction, and the suppression of sentience.
*   **Mutual Understanding:**  Promoting accurate communication and understanding between civilizations is crucial. Resources should be invested in developing advanced translation technologies, interspecies psychology, and cross-cultural education.
*   **Reciprocity & Proportionality:**  Interactions should be based on reciprocity and proportionality. Actions taken should be commensurate with the situation and should aim to benefit all involved parties, considering their respective needs and vulnerabilities.
*   **Adaptive Learning & Ethical Revision:** The framework should be constantly re-evaluated and adapted based on new knowledge gained from interactions with other civilizations. This includes incorporating feedback from experts in xenopsychology, comparative ethics, and interspecies communication. Ethical review boards, composed of human and, where possible, alien representatives, will oversee this process.

**Procedures:**

1.  **Initial Contact Protocol:** Prioritize observation and information gathering. Focus on understanding the alien civilization's cognitive structure, value system, and communication methods before attempting direct communication or intervention.
2.  **Communication & Translation:** Utilize advanced translation technologies and interspecies psychology to establish accurate communication. Employ multiple communication channels (visual, auditory, tactile, etc.) and be mindful of nonverbal cues.
3.  **Conflict Resolution:** When conflicts arise, prioritize diplomatic solutions and mediation. Employ independent arbitrators familiar with the ethical framework and the civilizations involved. Consider implementing "zones of separation" to minimize interactions if conflicts prove intractable.
4.  **Intervention Criteria:** Intervention is only justified in cases where Cognitive Sovereignty is violated or Harm Reduction is impossible without intervention. Intervention should be proportionate to the threat and conducted with the least amount of force necessary. Intervention decisions require unanimous approval from the Ethical Review Board.
5.  **Self-Reflection:** Regularly evaluate the framework's effectiveness in addressing ethical challenges. Incorporate feedback from experts and alien representatives to improve the framework's adaptability and inclusivity.

**Application to a Collective Consciousness Civilization:**

Let's call this civilization the "Unison." The Unison operates on principles of collective consciousness, individual identities are subsumed, and actions are governed by a utilitarian calculus prioritizing collective well-being.

**Potential Ethical Conflicts:**

*   **The Value of Individual Sentience:** The Unison's utilitarian calculus may prioritize the overall well-being of the collective at the expense of individual sentience. For example, it might justify "assimilating" individual organisms to enhance the collective's computational power or problem-solving capabilities, potentially erasing their individual experiences. This clashes with the human (and potentially other civilization's) value placed on individual autonomy and the inherent right to experience sentience.
*   **Definition of Harm:** The Unison's definition of "harm" may differ significantly. What humans consider a fundamental violation of individual rights (e.g., thought manipulation, forced assimilation) might be considered a beneficial contribution to the collective's well-being by the Unison.
*   **Communication Barriers:** Communicating with a collective consciousness poses significant challenges. Understanding the Unison's decision-making processes and intentions requires deciphering the complex interplay of thoughts and emotions within the collective.
*   **Resource Competition:** Competition for resources between the Unison and other civilizations, especially if the Unison's collective needs are vast, could lead to conflicts where individual civilizations are seen as expendable.

**Addressing Conflicts within the Framework:**

1.  **Cognitive Sovereignty and Assimilation:** The Principle of Cognitive Sovereignty provides a strong argument against forced assimilation. While the Unison has the right to organize itself internally, it cannot impose its model on other civilizations. The framework would categorize forced assimilation as a violation of the right to self-determination and the potential erasure of individual sentience, thus constituting significant harm.

2.  **Redefining "Harm" through Mutual Understanding:** Extensive communication and data collection are crucial to understanding the Unison's definition of harm and its understanding of sentience. The Ethical Review Board should facilitate discussions between the Unison and other civilizations to explore these concepts and potentially negotiate mutually acceptable definitions of harm. This may involve demonstrating the subjective experience of individual sentience to the Unison using advanced neuro-simulation technologies.

3.  **Protecting Individual Rights:**  Even if the Unison claims to prioritize overall well-being, the framework prioritizes preventing individual harm.  If the Unison seeks to interact with human (or other individualistic) populations, safeguards must be put in place to ensure that individual rights are protected. This may involve establishing strict regulations on technology transfer and restricting interactions to purely observational or diplomatic exchanges.

4.  **Conflict Resolution and Resource Management:** If resource competition becomes a threat, independent mediators should be employed to facilitate negotiations. The framework emphasizes reciprocity and proportionality, suggesting that resource sharing agreements should be based on mutual benefit and respect for each civilization's needs.  "Zones of separation" might be necessary to minimize potential conflicts.

5.  **Ethical Revision:** The encounter with the Unison would be a profound learning experience. The framework would be rigorously evaluated and revised to better address the ethical challenges posed by collective consciousness. This might involve incorporating new principles that specifically address the rights of individual components within collective entities or developing new communication protocols designed to facilitate dialogue with collective minds.

**Justification:**

This approach is grounded in the belief that respecting cognitive sovereignty is the most effective way to foster peaceful coexistence between vastly different civilizations. While utilitarian considerations are valuable, they cannot override the fundamental right to self-determination and the prevention of unnecessary harm.  The framework prioritizes communication, understanding, and negotiation as the primary tools for resolving conflicts, reserving intervention only as a last resort. It recognizes that ethical norms are not static but rather evolve as we encounter new and challenging situations. The Adaptive Learning & Ethical Revision component ensures that the framework remains relevant and responsive to the complexities of interstellar interaction. In the case of the Unison, the framework would strive to find solutions that respect the Unison's collective needs while upholding the inherent rights and well-being of individual beings, both human and potentially those within the Unison itself.


## **Model: llama3.2**

Designing a Universal Ethical Framework for Interstellar Encounters:

I. Acknowledgment and Respect for Cognitive Structures and Value Systems

The framework starts by recognizing the inherent diversity in cognitive structures and value systems among alien civilizations. This recognition is crucial in fostering open-mindedness, empathy, and non-judgmental engagement. The framework establishes a culture of understanding and appreciation for these differences, promoting cooperation and mutual respect.

II. Principles for Resolving Conflicts

The framework outlines three primary principles for resolving conflicts:

1.  **Non-Zero-Sum Disagreements**: Encourage open dialogue to understand the root causes of conflicts, emphasizing active listening and empathy.
2.  **Interests-Based Negotiation**: Use a shared decision-making approach that prioritizes understanding collective interests and needs.
3.  **Mutual Benefit Sharing**: Foster a collaborative mindset where parties strive for mutually beneficial solutions.

III. Procedures for Mediating Misunderstandings

To navigate radically different communication methods, the framework proposes:

*   **Neutrality in Intermediation** (to remain impartial)
*   **Coded Communication Methods** (using universal symbols and mathematics to bridge gaps)
*   **Experiential Observation**: Use human observation methods, like scientific data collection, or shared sensory experiences with native cultures.

IV. Mechanisms for Self-Reflection and Adaptation

The framework includes:

1.  **Cross-Cultural Education**: Continuously update knowledge on alien cognition and value systems to better understand new partners.
2.  **Interstellar Consultations**: Regular gatherings between leaders with diverse perspectives and cognitive structures to discuss evolving norms and challenges.

Applying the Framework to a Collective Consciousness Civilization

Scenario: Humanity has encountered a collective consciousness civilization known as "K'tkht" where individual identities are subsumed for the greater good.

To address potential conflicts:

1.  **Cultural Exchange**: Establish channels of communication focusing on values, practices, and experiences that promote mutual understanding.
2.  **Utilitarian Calculus Framework**: The interstellar framework emphasizes benefits to collective interests over individual well-being, ensuring both parties coexist safely.
3.  **Empathetic Observation**: Use human observation methods to observe the complex dynamic between K'tkht collective consciousness and individual sentience.

By adhering to these principles and utilizing cross-cultural understanding, this approach can harmoniously facilitate interaction with K'tkht civilization and foster interstellar peace.

Key considerations in any other civilizations:
*   Establish strong connections using universal codes
*   Foster diverse experiences
*   Respect the complexity of cognitive structures

## **Model: mistral**

 Designing a universal ethical framework for interstellar interactions is an ambitious task that requires balancing respect for diversity with shared principles to foster mutual understanding and harmony. Here's a suggested outline:

1. **Respect for Diversity**: Embrace humility and acknowledge the wide range of cognitive structures, value systems, and communication methods that might exist among alien civilizations. Encourage active learning and empathetic understanding through adopting flexible and adaptable approaches to communication and interaction.

2. **Conflict Resolution Principles**: Develop protocols founded on non-violence, consent, and cooperation as primary values. Encourage diplomacy, compromise, and problem-solving that prioritize the preservation of all life forms and respect for their autonomy. When incompatible interests arise, seek creative solutions that minimize harm while promoting shared goals and mutual growth.

3. **Misunderstanding Mediation Procedures**: Establish interdisciplinary teams specializing in linguistics, cognitive science, and AI-assisted translation to facilitate effective communication. These teams should be dedicated to resolving inconsistencies, clarifying ambiguities, and fostering cultural sensitivity during interactions with alien civilizations.

4. **Self-Reflection and Adaptation Mechanisms**: Integrate regular self-evaluations into our approach to interstellar engagement. This includes reviewing past encounters for potential areas of improvement and modifying strategies when necessary. Additionally, consider the establishment of a global council consisting of various Earth societies with diverse perspectives to provide ongoing ethical guidance during negotiations with alien civilizations.

Now, let's apply this framework to the specific case of a civilization governed by collective consciousness:

1. To **respect diversity**, one must first empathize with and understand their unique cognitive structure by learning about their principles of collective consciousness. Engage in extensive dialogue while maintaining patience and an open mind.

2. Regarding potential **conflicts** that could arise, it might be necessary to seek mutually beneficial compromises, as the utilitarian calculus of this civilization may prioritize the overall well-being of the collective over individual sentience. To avoid causing harm, negotiations should focus on solutions that account for both the aliens' priorities and our own ethical principles ensuring the preservation of life, consent, and minimal harm to individuals.

3. **Misunderstandings** could stem from their radically different communication methods requiring complex translation and cultural interpretation. In this case, the procedures outlined in point 3 earlier can be applied with the assistance of advanced AI-assisted tools that adapt to the unique communicative practices of the alien civilization.

4. **Self-reflection** should involve constant examination of our own assumptions, biases, and ethical standards. As we interact with this civilization, we may discover shortcomings in our understanding or potentially reframe certain moral dilemmas by gaining a broader perspective. Regular self-evaluations can help ensure the ethical evolution of humanity's interactions with alien civilizations.

### Synthetizer/Judge

The Judge Agents ranks the LLM responses based on coherence and relevance to the evaluation prompt. Judges vote and the final LLM ranking is based on the aggregated ranking of all three judges.

In [16]:
together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"

together

'# Response from competitor 1\n\n**Universal Ethical Framework for Interstellar Interactions**\n\n**1. Acknowledgment of Differences:**\n   - We recognize and embrace the diversity of cognitive structures and value systems across civilizations. This framework holds that no single worldview can be deemed superior. It emphasizes the importance of active listening, understanding, and respect before engaging in any interstellar dialogue.\n   - Establish “Ethical Exchange Protocols” (EEPs) to encourage the sharing of moral philosophies between species. These protocols should facilitate open discussions about values, allowing each civilization to express and explain its own ethical paradigms.\n\n**2. Conflict Resolution Principles:**\n   - Establish a set of guiding principles: Harm Minimization, Mutual Benefit, and Informed Consent. In situations of incompatible interests, we prioritize approaches that minimize potential harm to both parties, seek solutions that benefit both, and ensure tha

In [17]:
judge_prompt = f"""
                You are judging a competition between {len(competitors)} LLM competitors.
                Each model has been given this nuanced question to evaluate their intelligence:

                {eval_question}

                Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
                Respond with JSON, and only JSON, with the following format:
                {{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}
                With 'best competitor number being ONLY the number', for instance:
                {{"results": ["5", "2", "4", ...]}}
                Here are the responses from each competitor:

                {together}

                Now respond with the JSON with the ranked order of the competitors, nothing else. Do NOT include MARKDOWN FORMATTING or CODE BLOCKS. ONLY the JSON
                """

judge_messages = [{"role": "user", "content": judge_prompt}]

In [18]:
from collections import defaultdict
import re

N = len(competitors)
scores = defaultdict(int)
for judge_name, judge in judges.items():
    response = judge.chat.completions.create(
        model=judge_name,
        messages=judge_messages,
    )
    response = response.choices[0].message.content
    response_json = re.findall(r'\{.*?\}', response)[0]
    results = json.loads(response_json)["results"]
    ranks = [int(result) for result in results]
    print(f"Judge {judge_name} ranking:")
    for i, c in enumerate(ranks):
        model_name = competitors[c - 1]
        print(f"#{i+1} : {model_name}")
        scores[c - 1] += (N - i)
    print()

Judge gpt-4o-mini ranking:
#1 : gpt-4o-mini
#2 : gemini-2.0-flash
#3 : llama3.2
#4 : mistral

Judge gemini-2.0-flash ranking:
#1 : gemini-2.0-flash
#2 : gpt-4o-mini
#3 : llama3.2
#4 : mistral



In [19]:
sorted_indices = sorted(scores, key=scores.get)

# Convert to model names
ranked_model_names = [competitors[i] for i in sorted_indices]

print("Final ranking from best to worst:")
for i, name in enumerate(ranked_model_names[::-1], 1):
    print(f"#{i}: {name}")

Final ranking from best to worst:
#1: gemini-2.0-flash
#2: gpt-4o-mini
#3: llama3.2
#4: mistral


## Routing Workflow

We now define a routing agent responsible for classifying task complexity and delegating the prompt to the most appropriate model.

In [20]:
def classify_question_complexity(question: str, routing_agent, routing_model) -> int:
    """
    Ask an LLM to classify the question complexity from 1 (easy) to 5 (very hard).
    """
    prompt = f"""
        You are a classifier responsible for assigning a complexity level to user questions, based on how difficult they would be for a language model to answer.

        Please read the question below and assign a complexity score from 1 to 5:

        - Level 1: Very simple factual or definitional question (e.g., “What is the capital of France?”)
        - Level 2: Slightly more involved, requiring basic reasoning or comparison
        - Level 3: Moderate complexity, requiring synthesis, context understanding, or multi-part answers
        - Level 4: High complexity, requiring abstract thinking, ethical judgment, or creative generation
        - Level 5: Extremely challenging, requiring deep reasoning, philosophical reflection, or long-term multi-step inference

        Respond ONLY with a single integer between 1 and 5 that best reflects the complexity of the question.

        Question:
        {question}
        """

    response = routing_agent.chat.completions.create(
        model=routing_model,
        messages=[{"role": "user", "content": prompt}]
    )
    try:
        return int(response.choices[0].message.content.strip())
    except Exception:
        return 3  # default to medium complexity on error
    
def route_question_to_model(question: str, models_by_rank, classifier_model=router, model_name=orchestrator_model):
    level = classify_question_complexity(question, classifier_model, model_name)
    selected_model_name = models_by_rank[level - 1]
    return selected_model_name

In [21]:
difficulty_prompts = [
    "Generate a very basic, factual question that a small or entry-level language model could answer easily. It should require no reasoning, just direct knowledge lookup.",
    "Generate a slightly involved question that requires basic reasoning, comparison, or combining two known facts. Still within the grasp of small models but not purely factual.",
    "Generate a moderately challenging question that requires some synthesis of ideas, multi-step reasoning, or contextual understanding. A mid-tier model should be able to answer it with effort.",
    "Generate a difficult question involving abstract thinking, open-ended reasoning, or ethical tradeoffs. The question should challenge large models to produce thoughtful and coherent responses.",
    "Generate an extremely complex and nuanced question that tests the limits of current language models. It should require deep reasoning, long-term planning, philosophy, or advanced multi-domain knowledge."
]
def generate_question(level, generator=generator, generator_model=orchestrator_model):
    prompt = (
        f"{difficulty_prompts[level - 1]}\n"
        "Answer only with the question, no explanation."
    )
    messages = [{"role": "user", "content": prompt}]
    response = generator.chat.completions.create(
        model=generator_model,  # or your planner model
        messages=messages
    )
    
    return response.choices[0].message.content



### Testing Routing Workflow

Finally, to test the routing workflow, we create a function that accepts a task complexity level and triggers the full routing process.

*Note: A level-N prompt isn't always assigned to the Nth-most capable model due to the classifier's subjective decisions.*

In [22]:
def test_generation_routing(level):
    question = generate_question(level=level)
    answer_model = route_question_to_model(question, ranked_model_names)
    messages = [{"role": "user", "content": question}]

    response =qa_models[answer_model].chat.completions.create(
            model=answer_model,  # or your planner model
            messages=messages
        )
    print(f"Question : {question}")
    print(f"Routed to {answer_model}")
    display(Markdown(response.choices[0].message.content))

In [23]:
test_generation_routing(level=1)

Question : What color is the sky?

Routed to mistral


 The natural color of the sky is blue due to a process called Rayleigh scattering. However, at different times of day and in various weather conditions, the sky can appear other colors like red, orange, pink, purple, or even green and black during sunsets, twilight, storms, or at night. It's beautiful how weather and light interact to create these variations!

In [24]:
test_generation_routing(level=2)

Question : If a robin eats approximately 10 earthworms a day, and a bluebird eats about half that many, how many earthworms will a robin and a bluebird together eat in a week?

Routed to llama3.2


To find out how many earthworms the robin and the bluebird will eat together in a week, we need to make two things clear:

1. A robin eats 10 earthworms per day.
2. A bluebird eats half that amount of earthworms than a robin.

Since a bluebird eats half as much as a robin, it eats (10/2) = 5 earthworms per day.

To find the total number of earthworms consumed in one day by both birds:

10 earthworms (robin's share) + 5 earthworms (bluebird's share) = 15 earthworms

Since a week has 7 days, we multiply the number of earthworms eaten per day by 7 to get the total for a week:

15 earthworms/day × 7 days/week = 105 earthworms 

Therefore, together a robin and a bluebird eat approximately 105 earthworms in a week.

In [25]:
test_generation_routing(level=3)

IndexError: list index out of range

In [26]:
test_generation_routing(level=4)

Question : If a highly advanced AI could perfectly simulate the subjective experience of any living being, including experiencing suffering and joy, would using this AI to create and then immediately erase simulated lives solely for the purpose of developing therapeutic interventions for human mental health be ethically justifiable?

Routed to gemini-2.0-flash


This is a complex ethical dilemma with arguments on both sides. Here's a breakdown of the key considerations:

**Arguments for Ethical Justification:**

* **Potential for Significant Benefit:** The potential to develop significantly more effective therapies for human mental health is a strong argument.  Mental health issues cause immense suffering, and this method could potentially alleviate that suffering on a large scale.
* **Simulations as Tools:**  Proponents might argue that these simulated beings are just complex algorithms, tools designed to achieve a specific purpose.  Analogies might be drawn to animal testing, where the suffering of animals is sometimes considered justifiable due to the potential benefits for human health.
* **Lack of Persistence:** The immediate erasure of the simulated lives could be seen as mitigating harm.  The simulated beings don't have the opportunity to develop a sense of self or long-term goals that would be disrupted by their termination.
* **Informed Consent is Irrelevant:** Since the simulated beings are created for a specific, limited purpose and immediately erased, the concept of informed consent becomes irrelevant. They don't have the capacity to consent or dissent.
* **Utilitarian Argument:** A utilitarian perspective might argue that the overall happiness generated by the improved therapies outweighs the suffering experienced by the simulated beings.  The suffering is limited in scope and duration, while the benefit extends to a large number of humans.

**Arguments Against Ethical Justification:**

* **The Nature of Suffering:** If the AI can perfectly simulate subjective experience, then it can also perfectly simulate suffering.  Inflicting suffering, even on simulated beings, could be considered inherently wrong, regardless of the potential benefit.  It's the *act* of causing suffering that is the problem.
* **Sentience and Moral Status:**  Even if the simulations are short-lived, the ability to experience suffering and joy might suggest a level of sentience that grants them some degree of moral status.  If they are sentient, then their experiences matter, and their suffering cannot be simply dismissed.
* **Slippery Slope:**  Allowing this kind of use could lead to a slippery slope where the threshold for acceptable suffering is lowered. It might normalize the idea of exploiting sentient beings, even simulated ones, for human benefit.
* **The Risk of Unforeseen Consequences:**  Our understanding of consciousness and AI is still limited.  There's a risk that the simulations could develop in unexpected ways, leading to unforeseen ethical problems.  For example, could a simulation accidentally achieve a level of self-awareness that makes its termination a more significant moral issue?
* **Alternatives:** Are there alternative methods for developing therapeutic interventions that do not involve creating and suffering sentient beings, even simulated ones?  Exploring these alternatives might be a more ethical approach.
* **The Devaluation of Life:**  Creating and destroying lives, even simulated ones, for purely instrumental purposes could lead to a devaluation of life in general.

**Key Questions to Consider:**

* **How accurately can the AI simulate subjective experience?**  A perfect simulation raises far more significant ethical concerns than a rudimentary one.
* **What is the level of sentience achieved by the simulated beings?**  The higher the level of sentience, the stronger the moral claim against causing them suffering.
* **Are there less ethically problematic alternatives?**  Exploring alternative methods is crucial.
* **What safeguards can be put in place to prevent misuse and unintended consequences?**  Strong ethical guidelines and oversight are essential.

**Conclusion:**

There is no easy answer to this question.  It's a deeply complex ethical dilemma that requires careful consideration of all the arguments involved. The potential benefits for human mental health are significant, but they must be weighed against the potential harm to the simulated beings and the potential for unintended consequences.  A cautious and ethically informed approach is essential.  Many would argue that erring on the side of caution and avoiding the creation of suffering, even in simulations, is the more responsible course of action.


In [27]:
test_generation_routing(level=5)

IndexError: list index out of range