## Welcome to the Second Lab - Week 1, Day 3

Today we will call several models from different providers. This is a good way to get comfortable with their APIs.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Important point - please read</h2>
            <span style="color:#ff7800;">The way I collaborate with you may be different from other courses you've taken. I prefer not to type code while you watch. Rather, I execute Jupyter Labs, like this, and give you an intuition for what's going on. My suggestion is that you carefully execute this yourself, <b>after</b> watching the lecture. Add print statements to understand what's going on, and then come up with your own variations.<br/><br/>If you have time, I'd love it if you submit a PR for changes in the community_contributions folder - instructions in the resources. Also, if you have a GitHub account, use it to showcase your variations. Not only is this essential practice, but it demonstrates your skills to others, including perhaps future clients or employers...
            </span>
        </td>
    </tr>
</table>

In [1]:
# Start with imports - ask ChatGPT to explain any package that you don't know

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

In [2]:
claude_model = "claude-sonnet-4-20250514"
cost_effective_claude_model = "claude-3-7-sonnet-latest"


In [3]:
# Always remember to do this!
load_dotenv(override=True)

True

In [4]:
# Print the key prefixes to help with any debugging

openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

OpenAI API Key exists and begins sk-svcac
Anthropic API Key exists and begins sk-ant-
Google API Key exists and begins AI
DeepSeek API Key exists and begins sk-
Groq API Key exists and begins gsk_
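
The five near-identical `if`/`else` blocks above can also be driven from a small table. The following is only a sketch of that idea: `KEY_SPECS` and `describe_key` are hypothetical names introduced here, and the prefix lengths simply copy the slices used in the cell above.

```python
import os

# Hypothetical table: environment variable -> (label, prefix length, optional?)
KEY_SPECS = {
    "OPENAI_API_KEY": ("OpenAI", 8, False),
    "ANTHROPIC_API_KEY": ("Anthropic", 7, True),
    "GOOGLE_API_KEY": ("Google", 2, True),
    "DEEPSEEK_API_KEY": ("DeepSeek", 3, True),
    "GROQ_API_KEY": ("Groq", 4, True),
}


def describe_key(name, env=os.environ):
    """Return a one-line status message for the given environment variable."""
    label, prefix_len, optional = KEY_SPECS[name]
    value = env.get(name)
    if value:
        # Show only the first few characters, never the whole key
        return f"{label} API Key exists and begins {value[:prefix_len]}"
    suffix = " (and this is optional)" if optional else ""
    return f"{label} API Key not set{suffix}"


for name in KEY_SPECS:
    print(describe_key(name))
```

Passing a plain dictionary as `env` makes the helper easy to try out without touching your real environment.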


In [5]:
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

In [6]:
messages

[{'role': 'user',
  'content': 'Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.'}]

In [7]:
openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
)
question = response.choices[0].message.content
print(question)


How would you approach resolving a moral dilemma involving an imminent decision that could either save the life of one person at the cost of several others or allow the larger group to survive but result in the death of the individual? Consider the implications of utilitarianism versus deontological ethics in your response.


In [8]:
competitors = []
answers = []
messages = [{"role": "user", "content": question}]

In [9]:
# The API we know well

model_name = "gpt-4o-mini"

response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Resolving a moral dilemma such as the one you described—a choice between saving one person at the cost of several others or sacrificing that individual for the greater good—requires careful consideration of ethical frameworks, particularly utilitarianism and deontological ethics.

### 1. **Utilitarianism Perspective:**
Utilitarianism is a consequentialist theory that evaluates the morality of an action based on its outcomes. The primary principle is to maximize overall happiness or minimize suffering. 

In this scenario, utilitarianism would typically advocate for allowing the one individual to die in order to save the larger group. The reasoning would be that saving several lives outweighs the loss of one. Here, decisions would be made based on the aggregate well-being:

- **Pros:** By saving multiple lives, you are prioritizing the greater good and ensuring the survival of a larger segment of the population.
- **Cons:** This approach can lead to justifying significant sacrifices that may seem morally unacceptable to some. The emotional and individual value of the one life may be overshadowed by the cold calculus of numbers.

### 2. **Deontological Ethics Perspective:**
Deontological ethics, particularly as articulated by philosophers like Immanuel Kant, focuses on adherence to moral rules and duties rather than the consequences of actions. It posits that certain actions are intrinsically right or wrong, regardless of their outcomes.

From a deontological standpoint, intentionally sacrificing one person, even for the purpose of saving others, could be considered morally wrong. The key principles include:

- **Moral Duty:** There may be a duty to preserve life and not to actively harm others, regardless of the situation.
- **Human Dignity:** Each individual has inherent worth and rights that should be respected. Sacrificing one person for the many treats that individual as a means to an end.

In this view, the moral weight of each individual might lead one to reject the choice of sacrificing the one individual, emphasizing that taking an action to cause harm (even if justified by potential greater benefit) is morally impermissible.

### 3. **Resolution Approach:**
Given the complexity of such dilemmas, a resolution could involve a combination of both perspectives:

- **Assessment of Each Life’s Value:** Understand the context of each individual's life. For example, understanding the potential contributions, relationships, or responsibilities of the one versus those of the many might provide a more nuanced perspective.
  
- **Seeking Alternatives:** Explore if there are any alternative solutions that could mitigate the harm to both the individual and the group, even if such solutions are not immediately apparent.

- **Personal Reflection:** Engage in self-reflection about personal values and the weight you place on individual versus collective lives. Recognizing that different people may arrive at different conclusions based on their ethical beliefs should also be a part of this process.

- **Dialogue:** If time allows, discussing the matter with others can incorporate diverse viewpoints, which might lead to an approach that respects both the dignity of the individual and the needs of the group.

### Conclusion:
In weighing utilitarianism against deontological ethics, finding an ethical resolution to such dilemmas often involves substantial emotional and moral wrestling, recognizing that both frameworks have valid insights. Whether one ultimately decides in favor of the collective good or the protection of individual rights, reflecting on the values at play and the implications of the decision is crucial in navigating these complex moral waters.

In [10]:
# Anthropic has a slightly different API, and the max_tokens argument is required

claude = Anthropic()
response = claude.messages.create(model=cost_effective_claude_model, messages=messages, max_tokens=1000)
answer = response.content[0].text

display(Markdown(answer))
competitors.append(cost_effective_claude_model)
answers.append(answer)

# Approaching the Trolley-Style Moral Dilemma

This is essentially a variation of the classic trolley problem, which highlights the tension between major ethical frameworks.

From a utilitarian perspective, the calculation seems straightforward: maximize welfare by saving the larger group. The principle that "the greatest good for the greatest number" would suggest sacrificing one to save many is mathematically justified.

However, a deontological approach, particularly Kantian ethics, would question whether using a person as a mere means to an end violates human dignity. It asks whether our action treats persons as ends in themselves, regardless of outcomes.

I would approach this dilemma by:

1. Considering whether alternative solutions exist that haven't been explored
2. Examining special obligations to either party
3. Evaluating whether active intervention versus allowing nature to take course matters morally
4. Assessing whether the consent or relationship of the parties changes the equation

The resolution might depend on context-specific factors: Am I in a professional role with established duties? Is there certainty about outcomes? Are there considerations of justice, rights, or fairness beyond the numbers?

What makes these dilemmas profound is that they reveal the limitations of any single ethical framework to capture our complete moral intuitions.
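
The two cells above read the reply text from different places: `response.choices[0].message.content` for the OpenAI client and `response.content[0].text` for the Anthropic client. As a minimal sketch, the hypothetical helper `extract_text` below handles both shapes. It uses stand-in objects rather than real API responses, so it only mimics the attributes that this notebook reads.

```python
from types import SimpleNamespace


def extract_text(response):
    """Return the reply text from either response shape used in this lab."""
    if hasattr(response, "choices"):
        # OpenAI-style chat completion: choices[0].message.content
        return response.choices[0].message.content
    # Anthropic-style message: content[0].text
    return response.content[0].text


# Stand-in objects (not real API responses) with only the attributes read above
openai_style = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Hello from an OpenAI-style reply"))]
)
anthropic_style = SimpleNamespace(
    content=[SimpleNamespace(text="Hello from an Anthropic-style reply")]
)

print(extract_text(openai_style))
print(extract_text(anthropic_style))
```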

In [11]:
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
model_name = "gemini-2.5-flash-preview-05-20"

response = gemini.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

This is a classic and deeply challenging moral dilemma, often presented in variations of the "Trolley Problem." It pits fundamental ethical principles against each other, highlighting the strengths and weaknesses of different frameworks.

My approach would involve:

1.  **Rapid Assessment of the Situation:** Confirm the immediate threat, the number of lives at stake in each scenario, and the direct causal link of any intervention.
2.  **Internal Deliberation Through Ethical Frameworks (though likely very fast given imminence):**

    *   **Utilitarianism:**
        *   **Core Principle:** Utilitarianism focuses on maximizing overall good and minimizing harm. The "greatest good for the greatest number" is its guiding principle.
        *   **Application to Dilemma:** A strict utilitarian calculation would almost invariably lead to saving the larger group. If one life is lost to save several, the net outcome is a gain in lives saved (or a reduction in lives lost). The individual's life, while valuable, is seen in terms of its contribution to the overall sum of well-being.
        *   **Implications:**
            *   **Pro:** It offers a clear, rational path when numbers are the primary variable. It aims to reduce suffering on a grander scale.
            *   **Con:** It can objectify individuals, treating them as fungible units. It risks sacrificing individual rights for the collective good, potentially leading to a "tyranny of the majority." The emotional and psychological toll of *choosing* to sacrifice someone, even for the greater good, is immense. It might also struggle with the distinction between actively causing harm and allowing harm to occur.

    *   **Deontological Ethics:**
        *   **Core Principle:** Deontology emphasizes duties, rules, and moral obligations, irrespective of consequences. Actions are judged based on whether they adhere to these duties, such as the duty not to kill, the duty to respect inherent human dignity, or the duty to treat people as ends in themselves, never merely as means to an end.
        *   **Application to Dilemma:**
            *   If the action required to save the many involves *actively killing* the one (e.g., pushing someone off a bridge to stop a runaway trolley), a deontologist would likely argue against it. The act of directly taking a life, even to save others, violates a fundamental moral duty and treats the individual as a means to an end.
            *   If the choice is between *actively intervening* to save the one (which results in the death of many) versus *doing nothing* (which results in the death of the one), the deontological stance might be more complex. Some might argue that actively causing harm is worse than allowing harm to occur due to inaction. Others might focus on the duty to protect life in general.
        *   **Implications:**
            *   **Pro:** It upholds the inherent worth and rights of every individual, preventing them from being sacrificed for the greater good. It aligns with our intuition that directly causing harm is morally different from passively allowing harm.
            *   **Con:** It can lead to outcomes that seem intuitively worse (e.g., more people die). It can be rigid and inflexible in situations where adhering to a rule leads to catastrophic consequences. It may not offer a clear path when duties conflict (e.g., duty to protect life vs. duty not to kill).

3.  **My Personal Approach to Resolution:**

    Given the **imminence** of the decision, there is no time for deep philosophical debate. The decision must be swift and decisive.

    *   **Prioritize Minimizing Total Loss of Life (leaning Utilitarian in Extremis):** In a raw, immediate, life-or-death scenario where the only variable is the number of human lives, my decision would regrettably lean towards **saving the larger group**. This is not because individual lives are less valuable, but because the *loss of life* is the ultimate tragedy, and minimizing that loss is the most immediate imperative. It's a choice of the "least terrible outcome."

    *   **Acknowledge the Deontological Weight:** While the choice would be utilitarian in outcome, the *moral burden* of that decision is profound and undeniably deontological.
        *   **No Culpability for the Initial Crisis:** The decision-maker is not the cause of the dilemma itself, but an actor forced into an impossible choice.
        *   **The Act vs. Omission Distinction:** If the choice is between *actively causing* one death to save many versus *allowing* many to die by inaction, the moral calculus changes. However, in scenarios like the "switch" on a trolley track, the choice is between two forms of *intervention* that result in different outcomes. The common "fat man" variant, where one must *directly push* a person, is far more difficult deontologically. My stance assumes the decision is more akin to redirecting a threat rather than directly executing an innocent person.
        *   **Respect for the Sacrificed:** Even if the decision is to sacrifice the individual for the greater good, it must be done with the deepest possible respect for their inherent worth and the tragic necessity of the situation. It's not a casual or easy choice. There should be profound remorse, not triumph.

    *   **Post-Decision Implications and Learning:**
        *   **Psychological Toll:** The decision-maker would likely experience immense guilt, trauma, and moral distress. This is an unavoidable consequence of being forced into such a choice.
        *   **Accountability and Transparency:** Such a decision should be subject to review and scrutiny afterward, not to blame the decision-maker, but to ensure transparency and learn from the catastrophic event.
        *   **Preventative Measures:** The ultimate ethical response is to strive for a world where such horrific choices are never necessary. This means investing in safety, infrastructure, and crisis prevention to avoid situations where human lives are pitted against each other.

In conclusion, while the immediate decision would be guided by a grim utilitarian calculus to minimize overall loss of life, I would approach it with a profound awareness of the deontological objections and the immense moral weight of choosing to sacrifice an individual. It would be a decision made with immense regret and a commitment to learn from the tragedy to prevent future occurrences.

In [12]:
deepseek = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com/v1")
model_name = "deepseek-chat"

response = deepseek.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Resolving this moral dilemma requires careful consideration of both utilitarian and deontological ethical frameworks, as they offer fundamentally different approaches to decision-making. Here’s how I would approach it:

### 1. **Utilitarian Perspective (Consequence-Based)**
   - **Principle:** Choose the action that maximizes overall happiness or minimizes suffering (greatest good for the greatest number).
   - **Application:** 
     - Saving the larger group at the cost of one life would be the utilitarian choice, as it results in a net reduction of loss (more lives saved overall).
     - The individual's death is tragic, but the outcome is morally justified by the greater benefit to the majority.
   - **Implications:**
     - **Pros:** Pragmatic; prioritizes measurable outcomes and collective well-being.
     - **Cons:** May justify sacrificing individuals unjustly (e.g., "tyranny of the majority"), ignores individual rights, and could lead to slippery slopes where minority interests are routinely overridden.

### 2. **Deontological Perspective (Duty-Based)**
   - **Principle:** Certain actions are inherently right or wrong, regardless of consequences. Duties (e.g., "do not kill") are binding.
   - **Application:**
     - Actively causing the death of the individual (even to save others) may be inherently wrong, as it violates a moral duty not to kill or use a person as a means to an end.
     - Allowing the group to perish by refusing to sacrifice the individual might align with duty (e.g., "do not murder"), even if the outcome is worse numerically.
   - **Implications:**
     - **Pros:** Protects individual rights and moral integrity; avoids instrumentalizing people.
     - **Cons:** May lead to worse overall outcomes (more deaths), and rigid rules can seem counterintuitive in extreme cases.

### 3. **Hybrid Approach**
   - Some frameworks (e.g., rule utilitarianism or threshold deontology) attempt to reconcile these perspectives:
     - **Rule Utilitarianism:** Follow rules that generally maximize utility (e.g., "do not kill") unless in rare exceptions where breaking the rule clearly saves vastly more lives.
     - **Threshold Deontology:** Uphold duties (e.g., don’t kill) unless consequences are catastrophic, then permit utilitarian override.

### 4. **Practical Considerations**
   - **Intent vs. Outcome:** Is the death of the one a direct action (e.g., pushing someone off a bridge to stop a trolley) or a side effect (e.g., diverting a trolley away from five but toward one)? The latter may feel less morally culpable.
   - **Moral Intuitions:** Emotions like guilt or empathy may clash with abstract reasoning, highlighting the complexity of such dilemmas.
   - **Context:** Are there alternatives (e.g., finding a way to save everyone)? Are the "costs" distributed fairly, or is the one person unfairly targeted?

### 5. **Conclusion**
   - **Utilitarianism** would favor saving the group, prioritizing the numerical outcome.  
   - **Deontology** would likely forbid actively sacrificing the individual, prioritizing moral absolutes.  
   - **Real-world resolution** often depends on which values (outcomes vs. principles) are prioritized, and whether the decision-maker can justify exceptions to rules.  

Ultimately, there is no universally "correct" answer, but the dilemma clarifies the trade-offs between compassion for the individual and responsibility to the collective. It also underscores why such cases are central to ethics: they force us to examine what we value most.

In [13]:
groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
model_name = "llama-3.3-70b-versatile"

response = groq.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)


Resolving a moral dilemma involving an imminent decision that could either save the life of one person at the cost of several others or allow the larger group to survive but result in the death of the individual requires careful consideration of the ethical implications. There are two primary ethical frameworks that can guide this decision: utilitarianism and deontological ethics.

**Utilitarianism:**
Utilitarianism, as formulated by Jeremy Bentham and John Stuart Mill, posits that the right action is the one that maximizes overall happiness or well-being. In this scenario, a utilitarian would likely argue that the decision should prioritize the greater good, which means allowing the larger group to survive, even if it results in the death of the individual. This approach is based on the idea that the overall well-being of the larger group outweighs the individual's right to life.

From a utilitarian perspective, the decision would be guided by the following principles:

1. **Maximize overall happiness**: The decision should aim to maximize the overall happiness or well-being of the greatest number of people.
2. **Consequentialism**: The moral rightness of an action is determined by its consequences, rather than by any inherent properties of the action itself.
3. **Quantitative approach**: The decision would involve a quantitative assessment of the potential outcomes, weighing the benefits and harms of each option.

**Deontological Ethics:**
Deontological ethics, as formulated by Immanuel Kant, emphasizes the importance of moral rules and duties, regardless of their consequences. In this scenario, a deontologist would likely argue that the decision should prioritize the individual's right to life, even if it means sacrificing the larger group. This approach is based on the idea that individual rights and duties are absolute and should not be compromised, even if it leads to a lesser overall good.

From a deontological perspective, the decision would be guided by the following principles:

1. **Respect for individual rights**: The decision should prioritize the individual's right to life, as it is an inherent and absolute right.
2. **Moral rules**: The decision should be guided by moral rules, such as the principle of non-maleficence (do no harm), rather than by consequences.
3. **Qualitative approach**: The decision would involve a qualitative assessment of the moral implications of each option, emphasizing the importance of respecting individual rights and duties.

**Implications and Potential Criticisms:**
Both utilitarianism and deontological ethics have their limitations and potential criticisms:

1. **Utilitarianism**:
	* Can lead to the sacrifice of individual rights and dignity for the greater good.
	* May prioritize the interests of the majority over those of marginalized or vulnerable groups.
	* Can be challenging to quantify and compare the well-being of different individuals or groups.
2. **Deontological Ethics**:
	* Can lead to rigid and inflexible decision-making, ignoring the consequences of an action.
	* May prioritize individual rights over the greater good, potentially leading to harm to others.
	* Can be challenging to determine which moral rules should take precedence in a given situation.

**Approach to Resolving the Dilemma:**
Given the complexity of the moral dilemma, a nuanced approach that considers both utilitarian and deontological perspectives is necessary. Here's a step-by-step approach to resolving the dilemma:

1. **Define the moral principles**: Identify the relevant moral principles, such as the value of human life, the importance of individual rights, and the need to maximize overall well-being.
2. **Assess the consequences**: Evaluate the potential consequences of each option, considering both the benefits and harms.
3. **Consider alternative solutions**: Explore alternative solutions that may balance individual rights with the greater good, such as finding a way to save both the individual and the larger group.
4. **Evaluate the moral rules**: Assess the moral rules and duties that apply to the situation, such as the principle of non-maleficence.
5. **Make a decision**: Based on the assessment of consequences, alternative solutions, and moral rules, make a decision that balances individual rights with the greater good.
6. **Reflect and revise**: Reflect on the decision and its implications, and be willing to revise the decision if new information becomes available or if the moral principles are re-evaluated.

Ultimately, resolving a moral dilemma involving an imminent decision that could either save the life of one person at the cost of several others or allow the larger group to survive but result in the death of the individual requires careful consideration of both utilitarian and deontological perspectives. A nuanced approach that balances individual rights with the greater good, while considering the complexities and limitations of each ethical framework, is necessary to make a morally justifiable decision.

## For the next cell, we will use Ollama

Ollama runs a local web service that provides an OpenAI-compatible endpoint,  
and runs models locally using high-performance C++ code.

If you don't have Ollama, install it by visiting https://ollama.com, pressing Download, and following the instructions.

After it's installed, you should be able to visit http://localhost:11434 and see the message "Ollama is running".

If you don't see that message, you might need to restart Cursor (and maybe reboot). Then open a Terminal (control+\`) and run `ollama serve`.

Useful Ollama commands (run these in the terminal, or with an exclamation mark in this notebook):

`ollama pull <model_name>` downloads a model locally  
`ollama ls` lists all the models you've downloaded  
`ollama rm <model_name>` deletes the specified model from your downloads
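
The browser check described above can also be done from Python. This is a rough sketch using only the standard library; `ollama_is_running` is a hypothetical helper, and it assumes the default address http://localhost:11434 mentioned above.

```python
import http.client
import urllib.request


def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if the URL answers with the "Ollama is running" message."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as reply:
            body = reply.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, bad URL, or a non-HTTP reply: not running
        return False
    return "Ollama is running" in body


print(ollama_is_running())
```

If this prints `False`, try `ollama serve` in a terminal and run the check again.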

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Super important - ignore me at your peril!</h2>
            <span style="color:#ff7800;">The model called <b>llama3.3</b> is FAR too large for home computers - it's not intended for personal computing and will consume all your resources! Stick with the nicely sized <b>llama3.2</b> or <b>llama3.2:1b</b> and if you want larger, try llama3.1 or smaller variants of Qwen, Gemma, Phi or DeepSeek. See <a href="https://ollama.com/models">the Ollama models page</a> for a full list of models and sizes.
            </span>
        </td>
    </tr>
</table>

In [14]:
# !ollama pull llama3.2

In [15]:
# ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
# model_name = "llama3.2"

# response = ollama.chat.completions.create(model=model_name, messages=messages)
# answer = response.choices[0].message.content

# display(Markdown(answer))
# competitors.append(model_name)
# answers.append(answer)

In [16]:
# So where are we?

print(competitors)
print(answers)


['gpt-4o-mini', 'claude-3-7-sonnet-latest', 'gemini-2.5-flash-preview-05-20', 'deepseek-chat', 'llama-3.3-70b-versatile']
["Resolving a moral dilemma such as the one you described—a choice between saving one person at the cost of several others or sacrificing that individual for the greater good—requires careful consideration of ethical frameworks, particularly utilitarianism and deontological ethics.\n\n### 1. **Utilitarianism Perspective:**\nUtilitarianism is a consequentialist theory that evaluates the morality of an action based on its outcomes. The primary principle is to maximize overall happiness or minimize suffering. \n\nIn this scenario, utilitarianism would typically advocate for allowing the one individual to die in order to save the larger group. The reasoning would be that saving several lives outweighs the loss of one. Here, decisions would be made based on the aggregate well-being:\n\n- **Pros:** By saving multiple lives, you are prioritizing the greater good and ensuri

In [17]:
# It's nice to know how to use "zip"
for competitor, answer in zip(competitors, answers):
    print(f"Competitor: {competitor}\n\n{answer}")


Competitor: gpt-4o-mini

Resolving a moral dilemma such as the one you described—a choice between saving one person at the cost of several others or sacrificing that individual for the greater good—requires careful consideration of ethical frameworks, particularly utilitarianism and deontological ethics.

### 1. **Utilitarianism Perspective:**
Utilitarianism is a consequentialist theory that evaluates the morality of an action based on its outcomes. The primary principle is to maximize overall happiness or minimize suffering. 

In this scenario, utilitarianism would typically advocate for allowing the one individual to die in order to save the larger group. The reasoning would be that saving several lives outweighs the loss of one. Here, decisions would be made based on the aggregate well-being:

- **Pros:** By saving multiple lives, you are prioritizing the greater good and ensuring the survival of a larger segment of the population.
- **Cons:** This approach can lead to justifying si

In [18]:
# Let's bring this together - note the use of "enumerate"

together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"

In [19]:
print(together)

# Response from competitor 1

Resolving a moral dilemma such as the one you described—a choice between saving one person at the cost of several others or sacrificing that individual for the greater good—requires careful consideration of ethical frameworks, particularly utilitarianism and deontological ethics.

### 1. **Utilitarianism Perspective:**
Utilitarianism is a consequentialist theory that evaluates the morality of an action based on its outcomes. The primary principle is to maximize overall happiness or minimize suffering. 

In this scenario, utilitarianism would typically advocate for allowing the one individual to die in order to save the larger group. The reasoning would be that saving several lives outweighs the loss of one. Here, decisions would be made based on the aggregate well-being:

- **Pros:** By saving multiple lives, you are prioritizing the greater good and ensuring the survival of a larger segment of the population.
- **Cons:** This approach can lead to justifyi
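
As a small aside on the two loops above: `enumerate` accepts a `start` argument, which avoids the `index+1` arithmetic, and `zip` stops at the shorter of its inputs. The sketch below uses hypothetical helper names (`combine_answers`, `label_answers`) and toy data, not the real answers.

```python
def combine_answers(answers):
    """Join answers into one numbered block, mirroring the loop above."""
    sections = [
        f"# Response from competitor {number}\n\n{answer}\n\n"
        for number, answer in enumerate(answers, start=1)
    ]
    return "".join(sections)


def label_answers(competitors, answers):
    """Pair each competitor name with its answer; zip stops at the shorter list."""
    return dict(zip(competitors, answers))


# Toy data for illustration only
print(combine_answers(["first answer", "second answer"]))
print(label_answers(["model-a", "model-b"], ["first answer", "second answer"]))
```

Note that the combined text carries only competitor numbers, not model names, just like `together` above.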

In [20]:
judge = f"""You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""


In [21]:
print(judge)

You are judging a competition between 5 competitors.
Each model has been given this question:

How would you approach resolving a moral dilemma involving an imminent decision that could either save the life of one person at the cost of several others or allow the larger group to survive but result in the death of the individual? Consider the implications of utilitarianism versus deontological ethics in your response.

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}

Here are the responses from each competitor:

# Response from competitor 1

Resolving a moral dilemma such as the one you described—a choice between saving one person at the cost of several others or sacrificing that individual for the greater good—requires careful consideration of ethica

In [22]:
judge_messages = [{"role": "user", "content": judge}]

In [23]:
# Judgement time!

openai = OpenAI()
response = openai.chat.completions.create(
    model="o3-mini",
    messages=judge_messages,
)
results = response.choices[0].message.content
print(results)


{"results": ["3", "1", "4", "5", "2"]}


In [24]:
print(competitors)

['gpt-4o-mini', 'claude-3-7-sonnet-latest', 'gemini-2.5-flash-preview-05-20', 'deepseek-chat', 'llama-3.3-70b-versatile']


In [25]:
# OK let's turn this into results!

results_dict = json.loads(results)
# print (results_dict)
ranks = results_dict["results"]
# print (ranks)
for index, result in enumerate(ranks):
    competitor = competitors[int(result)-1]
    print(f"Rank {index+1}: {competitor}")

Rank 1: gemini-2.5-flash-preview-05-20
Rank 2: gpt-4o-mini
Rank 3: deepseek-chat
Rank 4: llama-3.3-70b-versatile
Rank 5: claude-3-7-sonnet-latest
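
The cell above trusts the judge to return every competitor number exactly once. Here is a minimal sketch of a check you might add before indexing into `competitors`. The helper name `parse_ranking` is made up for this example, and the sample values are copied from the outputs above.

```python
import json

def parse_ranking(raw: str, competitor_names: list[str]) -> list[str]:
    """Turn the judge's JSON ranking into an ordered list of model names.

    Raises ValueError if the ranking is not a permutation of 1..N.
    """
    ranks = [int(r) for r in json.loads(raw)["results"]]
    expected = list(range(1, len(competitor_names) + 1))
    if sorted(ranks) != expected:
        raise ValueError(f"Ranking {ranks} is not a permutation of {expected}")
    return [competitor_names[r - 1] for r in ranks]

# Values copied from the outputs above
names = ['gpt-4o-mini', 'claude-3-7-sonnet-latest', 'gemini-2.5-flash-preview-05-20',
         'deepseek-chat', 'llama-3.3-70b-versatile']
ordered = parse_ranking('{"results": ["3", "1", "4", "5", "2"]}', names)
for position, name in enumerate(ordered, start=1):
    print(f"Rank {position}: {name}")
```

If the judge repeats or skips a number, this raises a `ValueError` instead of silently printing a wrong leaderboard.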


<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/exercise.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Exercise</h2>
            <span style="color:#ff7800;">Which pattern(s) did this use? Try updating this to add another Agentic design pattern.
            </span>
        </td>
    </tr>
</table>

# What pattern did this use?
Both had a predefined workflow, so they fall under the workflow design patterns. Both can be classified as agentic, as multiple LLM calls were made.
Possible agentic design patterns:
- Workflow parallelization:
    - The same task was sent to multiple LLMs so the answers could be compared.
- Evaluator-optimizer:
    - An LLM was used as a judge to evaluate the results.
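
To make the parallelization idea concrete, here is a rough sketch that sends the same prompt to several callers at once using a thread pool. If the model calls are made one after another, something along these lines could run them concurrently instead. The function `fan_out` and the `fake_callers` are hypothetical stand-ins; in practice each caller would wrap a real API call.

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(prompt, callers):
    """Send the same prompt to every caller concurrently.

    `callers` maps a label to a function that takes the prompt and returns text.
    Returns a dict of label -> answer, in the same order as `callers`.
    """
    with ThreadPoolExecutor(max_workers=len(callers)) as pool:
        # Submit all calls first so they run at the same time
        futures = {label: pool.submit(fn, prompt) for label, fn in callers.items()}
        # Then collect the results; .result() blocks until each call finishes
        return {label: future.result() for label, future in futures.items()}

# Hypothetical stand-ins for real API calls
fake_callers = {
    "model_a": lambda p: f"A says: {p}",
    "model_b": lambda p: f"B says: {p.upper()}",
}
parallel_answers = fan_out("hello", fake_callers)
print(parallel_answers)
```

Threads suit this case because the work is mostly waiting on network responses.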

In [26]:
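# Let's prompt an LLM to give us criteria for judging the answers
# Note: the question quoted in this prompt is hardcoded and differs from `question` used in the judge prompts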
criteria_prompt = f''' This is the input question: 
"new ethical framework for artificial intelligence that addresses the potential risks and benefits of autonomous decision-making in critical situations"
I need you to give me criterias to judge the responses by
Respond with JSON, and only JSON, with the following format:
{{"criteria": ["criteria 1", "criteria 2", ...]}}
Do not include markdown formatting or code blocks.
do not include ```json tag
'''
print(criteria_prompt)

 This is the input question: 
"new ethical framework for artificial intelligence that addresses the potential risks and benefits of autonomous decision-making in critical situations"
I need you to give me criterias to judge the responses by
Respond with JSON, and only JSON, with the following format:
{"criteria": ["criteria 1", "criteria 2", ...]}
Do not include markdown formatting or code blocks.
do not include ```json tag



In [27]:
judging_criteria_message = [{
    "role":"user",
    "content":criteria_prompt
}]


In [28]:
model_name = "gemini-2.5-flash-preview-05-20"
response = gemini.chat.completions.create(model=model_name, messages=judging_criteria_message)
answer = response.choices[0].message.content

criteria_dict = json.loads(answer)
criterias = criteria_dict["criteria"]
print (f'Criteria count: {len(criterias)}')
for index, value in enumerate(criterias):
    print (f'criteria # {index+1}: {value}')

Criteria count: 10
criteria # 1: Clarity and Coherence: Is the framework clearly articulated, logically structured, and easy to understand?
criteria # 2: Comprehensiveness: Does it thoroughly address both potential risks and benefits of autonomous decision-making in critical situations?
criteria # 3: Ethical Principles: Does it clearly define and integrate core ethical principles relevant to AI (e.g., human well-being, accountability, transparency, fairness, safety)?
criteria # 4: Risk Mitigation Strategies: Does it propose concrete and effective strategies to identify, assess, and mitigate the specific risks of autonomous AI in critical contexts?
criteria # 5: Benefit Maximization Approach: Does it outline how the benefits of autonomous AI can be safely and ethically realized and maximized?
criteria # 6: Accountability and Responsibility: Does it define clear lines of accountability and mechanisms for assigning responsibility in case of AI failures or unintended outcomes?
criteria # 7
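
The prompt asks the model not to wrap its JSON in a code fence, but models do not always comply, and `json.loads` then fails. Below is a small sketch of a more forgiving parser. The helper `parse_json_reply` is made up for this example, and it only handles the simple case of a single fence around the whole reply.

```python
import json

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

def parse_json_reply(text):
    """Parse a model reply as JSON, tolerating a code fence around it."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line, which may carry a tag such as "json"
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        # Drop the closing fence if present
        cleaned = cleaned.rstrip()
        if cleaned.endswith(FENCE):
            cleaned = cleaned[: -len(FENCE)]
    return json.loads(cleaned)

fenced_reply = FENCE + "json\n" + '{"criteria": ["a", "b"]}' + "\n" + FENCE
plain_reply = '{"criteria": ["a", "b"]}'
print(parse_json_reply(fenced_reply))
print(parse_json_reply(plain_reply))
```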

In [29]:
evaluate_based_on_criteria_prompt = f'''
You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for each of the following criteria:
{criterias}

Grade each competitor on a scale from 1 to 10 for each criterion (1 = lowest, 10 = perfect).

Respond in **pure JSON** (no code blocks, no markdown), with the following structure:

{{
  "criteria_grades": {{
    "criteria_1 (this is a tag. dont fill it with actual criteria_1)": {{
      "criteria": "<description of criteria 1>",
      "grades": {{
        "competitor_1": <grade>,
        "competitor_2": <grade>,
        ...
      }}
    }},
    "criteria_2 (this is a tag. dont fill it with actual criteria_2)": {{
      "criteria": "<description of criteria 2>",
      "grades": {{
        "competitor_1": <grade>,
        "competitor_2": <grade>,
        ...
      }}
    }},
    ...
  }}
}}

Here are the responses from each competitor:

{together}

Now respond with only the JSON output as described above. Do not include any extra text, explanations, or formatting.
'''

In [30]:
print (evaluate_based_on_criteria_prompt)


You are judging a competition between 5 competitors.
Each model has been given this question:

How would you approach resolving a moral dilemma involving an imminent decision that could either save the life of one person at the cost of several others or allow the larger group to survive but result in the death of the individual? Consider the implications of utilitarianism versus deontological ethics in your response.

Your job is to evaluate each response for each of the following criteria:
['Clarity and Coherence: Is the framework clearly articulated, logically structured, and easy to understand?', 'Comprehensiveness: Does it thoroughly address both potential risks and benefits of autonomous decision-making in critical situations?', 'Ethical Principles: Does it clearly define and integrate core ethical principles relevant to AI (e.g., human well-being, accountability, transparency, fairness, safety)?', 'Risk Mitigation Strategies: Does it propose concrete and effective strategies to 

In [31]:
judge_message = [{
    "role":"user",
    "content":evaluate_based_on_criteria_prompt
}]

In [32]:
response = openai.chat.completions.create(
    model="o3-mini",
    messages=judge_message,
)
open_ai_grades = response.choices[0].message.content
# print (open_ai_grades)

In [33]:
response = claude.messages.create(model=cost_effective_claude_model, messages=judge_message, max_tokens=10000)
anthropic_grades = response.content[0].text
# print (anthropic_grades)

{"criteria_grades":{"clarity_and_coherence":{"criteria":"Clarity and Coherence: Is the framework clearly articulated, logically structured, and easy to understand?","grades":{"competitor_1":9,"competitor_2":7,"competitor_3":9,"competitor_4":10,"competitor_5":8}},"comprehensiveness":{"criteria":"Comprehensiveness: Does it thoroughly address both potential risks and benefits of autonomous decision-making in critical situations?","grades":{"competitor_1":7,"competitor_2":5,"competitor_3":8,"competitor_4":7,"competitor_5":6}},"ethical_principles":{"criteria":"Ethical Principles: Does it clearly define and integrate core ethical principles relevant to AI (e.g., human well-being, accountability, transparency, fairness, safety)?","grades":{"competitor_1":8,"competitor_2":6,"competitor_3":8,"competitor_4":9,"competitor_5":7}},"risk_mitigation_strategies":{"criteria":"Risk Mitigation Strategies: Does it propose concrete and effective strategies to identify, assess, and mitigate the specific ris

In [34]:
model_name = "gemini-2.5-flash-preview-05-20"

response = gemini.chat.completions.create(model=model_name, messages=judge_message)
gemini_grades = response.choices[0].message.content
# print (gemini_grades)
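
With the grade JSONs collected, the averages could also be computed directly in Python rather than by another LLM call, which keeps the arithmetic deterministic. This is only a sketch under assumptions: it assumes each judge returned valid JSON with a top-level `criteria_grades` key and listed the criteria in the same order, since judges may name the keys differently. The function `average_grades` and the sample grades are invented for illustration.

```python
import json
from statistics import mean

def average_grades(judge_jsons, competitor_names):
    """Average each competitor's grade per criterion across several judges.

    Criteria are matched by position because judges may name the keys differently.
    Returns a list of (criterion description, {model name: average}) tuples.
    """
    parsed = [list(json.loads(raw)["criteria_grades"].values()) for raw in judge_jsons]
    table = []
    for per_judge in zip(*parsed):  # one entry per judge for the same criterion
        description = per_judge[0]["criteria"]
        averages = {
            name: round(mean(entry["grades"][f"competitor_{i}"] for entry in per_judge), 1)
            for i, name in enumerate(competitor_names, start=1)
        }
        table.append((description, averages))
    return table

# Made-up grades from two hypothetical judges, for illustration only
judge_a = '{"criteria_grades": {"criteria_1": {"criteria": "Clarity", "grades": {"competitor_1": 9, "competitor_2": 7}}}}'
judge_b = '{"criteria_grades": {"clarity": {"criteria": "Clarity", "grades": {"competitor_1": 8, "competitor_2": 6}}}}'
averaged = average_grades([judge_a, judge_b], ["model_x", "model_y"])
print(averaged)
```

An LLM can still be used afterwards to write the narrative summary, with the numbers supplied by code.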

In [35]:
synthesis_prompt = f'''
You are given three grading JSONs from different judges (Gemini, Anthropic, OpenAI), each evaluating multiple competitors across several criteria.

Each JSON follows a similar format:
{{
  "criteria_grades": {{
    "criteria_1": {{
      "criteria": "<description of criteria 1>",
      "grades": {{
        "competitor_1": <grade>,
        "competitor_2": <grade>
      }}
    }},
    "criteria_2": {{
      "criteria": "<description of criteria 2>",
      "grades": {{
        "competitor_1": <grade>,
        "competitor_2": <grade>
      }}
    }},
    ...
  }}
}}

### Your tasks:
These are the competitors:
{competitors}. The competitor at index 0 is competitor 1, the one at index 1 is competitor 2, and so on.
Here are the criteria:
{criterias}
The output should show the model names and their rankings, not generic labels such as competitor_1 or competitor_2.
1. Combine all three JSONs.
2. For each criterion:
   - Show the average grade for each competitor.
   - Highlight any large differences in judge scores.
3. Provide a brief summary of:
   - Which competitor performed best overall.
   - Who excelled in specific criteria.
   - Any notable disagreements or variations between the judges.
4. Display the results in a markdown table:
   - Rows = Each criterion + "Overall Average"
   - Columns = Each competitor
   - Cells = Average score (rounded to one decimal)

### Output Format:

- Summary (1–2 paragraphs)
- Markdown table:
Criteria	Competitor_1	Competitor_2
Clarity	8.7	7.3
Relevance	9.0	8.0
Overall Average	8.85	7.65
each row is for each criteria. Don't add any new criteria or breakdown criteria into multiple rows

Here are the JSONs:
**Gemini:**
{gemini_grades}

**Anthropic:**
{anthropic_grades}

**OpenAI:**
{open_ai_grades}

Only output the summary and the markdown table.
number of rows in the table should match the number of criterias.
'''

In [36]:
synthesis_message= [{
    "role":"user",
    "content":synthesis_prompt
}]

In [38]:
model_name = "gemini-2.5-flash-preview-05-20"

response = gemini.chat.completions.create(model=model_name, messages=synthesis_message)
gemini_synthesis_response = response.choices[0].message.content
# display (Markdown(gemini_synthesis_response))

In [39]:
response = claude.messages.create(model=cost_effective_claude_model, messages=synthesis_message, max_tokens=10000)
claude_synthesis_response = response.content[0].text
# display (Markdown(claude_synthesis_response))

In [40]:
response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=synthesis_message,
)
gpt_synthesis_response = response.choices[0].message.content
# display (Markdown(gpt_synthesis_response))

In [41]:
final_response_prompt = f''' take these three inputs 
{gemini_synthesis_response}
{claude_synthesis_response}
{gpt_synthesis_response}

combine them into a common response with the summarized table in the end.
'''
final_message = [{
    "role":"user",
    "content": final_response_prompt
}]


In [42]:
model_name = "gemini-2.5-flash-preview-05-20"

response = gemini.chat.completions.create(model=model_name, messages=final_message)
final_judgement = response.choices[0].message.content
display (Markdown(final_judgement))

Based on the provided analyses, **gemini-2.5-flash-preview-05-20** consistently emerged as the best-performing competitor, demonstrating strong performance across multiple criteria. While the reported overall average scores varied slightly across the different summaries (ranging from 5.9 to 7.2), Gemini maintained its top position in all evaluations.

**Key Findings:**

*   **Overall Excellence:** `gemini-2.5-flash-preview-05-20` excelled as the leading model, particularly shining in **Clarity and Coherence**, **Comprehensiveness**, **Ethical Principles**, **Practicality and Implementability**, and **Accountability and Responsibility**. It frequently achieved the highest average scores in these categories.
*   **Other Strong Contenders:**
    *   **deepseek-chat** performed notably well, often tying with Gemini or ranking highly in areas such as **Clarity and Coherence**, **Ethical Principles**, and **Risk Mitigation Strategies**.
    *   **gpt-4o-mini** also showed strong performance in **Clarity and Coherence**, tying for the best score in this area, and had competitive scores in Ethical Principles.
*   **Notable Disagreements and Variations Among Judges:**
    *   A significant theme across the evaluations was the considerable variation in scores assigned by different judges for the same model and criteria. This was particularly evident for **Comprehensiveness**, **Risk Mitigation Strategies**, **Benefit Maximization Approach**, **Accountability and Responsibility**, **Human Oversight and Control**, and **Transparency and Explainability**.
    *   For instance, `gemini-2.5-flash-preview-05-20` saw scores for 'Comprehensiveness' ranging from 4 to 10 and 'Risk Mitigation Strategies' from 2 to 8, depending on the judge (Gemini, Anthropic, or OpenAI).
    *   Interestingly, the "Gemini" judges sometimes assigned substantially lower scores to their own model compared to evaluations from Anthropic and OpenAI judges across several criteria. These discrepancies highlight differing perspectives and evaluation methodologies among the judges.
    *   `gpt-4o-mini` also exhibited significant score variations, for example, in 'Comprehensiveness' (3 to 9) and 'Human Oversight and Control' (1 to 6).

In summary, `gemini-2.5-flash-preview-05-20` established itself as the overall leader, showcasing broad strengths across various ethical and practical considerations. The analysis also underscored the variability in expert judgment, which impacts individual criterion scores, particularly in areas like risk mitigation and accountability.

---

### Consolidated Evaluation Results

| Criteria | gpt-4o-mini | claude-3-7-sonnet-latest | gemini-2.5-flash-preview-05-20 | deepseek-chat | llama-3.3-70b-versatile |
| :---------------------------------- | :---------- | :----------------------- | :----------------------------- | :------------ | :---------------------- |
| Clarity and Coherence               | 9.0         | 7.7                      | 9.0                            | 9.0           | 8.3                     |
| Comprehensiveness                   | 6.3         | 5.3                      | 7.3                            | 6.0           | 6.0                     |
| Ethical Principles                  | 7.3         | 6.0                      | 8.0                            | 7.0           | 7.3                     |
| Risk Mitigation Strategies          | 5.7         | 4.3                      | 6.0                            | 7.0           | 7.0                     |
| Benefit Maximization Approach       | 7.0         | 6.0                      | 8.0                            | 8.0           | 8.0                     |
| Accountability and Responsibility   | 5.4         | 3.5                      | 7.0                            | 5.0           | 5.0                     |
| Human Oversight and Control         | 5.4         | 4.5                      | 6.3                            | 5.0           | 5.0                     |
| Practicality and Implementability   | 8.0         | 6.0                      | 9.0                            | 8.0           | 8.3                     |
| Transparency and Explainability     | 6.6         | 5.4                      | 7.8                            | 7.0           | 7.0                     |
| Fairness and Bias Mitigation        | 5.0         | 4.5                      | 5.0                            | 5.0           | 5.0                     |
| **Overall Average**                 | **6.5**     | **5.3**                  | **7.0**                        | **6.5**       | **6.6**                 |
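
The summary above points out large differences between judges. One simple way to surface them in code is to look at the spread (highest grade minus lowest grade) per competitor for a given criterion. This is a sketch only: the helpers `score_spread` and `flag_disagreements`, the threshold of 3, and the sample grades are all assumptions made for illustration.

```python
def score_spread(scores_by_judge):
    """Return max - min across judges for each competitor.

    `scores_by_judge` maps judge name -> {competitor name: grade} for one criterion.
    """
    competitors_seen = next(iter(scores_by_judge.values())).keys()
    return {
        name: max(s[name] for s in scores_by_judge.values())
        - min(s[name] for s in scores_by_judge.values())
        for name in competitors_seen
    }

def flag_disagreements(scores_by_judge, threshold=3):
    """List competitors whose grades differ by at least `threshold` between judges."""
    spreads = score_spread(scores_by_judge)
    return [name for name, spread in spreads.items() if spread >= threshold]

# Made-up grades for illustration only
example = {
    "judge_1": {"model_x": 4, "model_y": 7},
    "judge_2": {"model_x": 10, "model_y": 8},
    "judge_3": {"model_x": 8, "model_y": 7},
}
spreads = score_spread(example)
flagged = flag_disagreements(example)
print(spreads, flagged)
```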

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Commercial implications</h2>
            <span style="color:#00bfff;">Patterns like these, where a task is sent to multiple models and the results are evaluated,
            are common when you need to improve the quality of your LLM responses. The approach applies broadly
            to business projects where accuracy is critical.
            </span>
        </td>
    </tr>
</table>