## Welcome to the Second Lab - Week 1, Day 3

Today we will work with lots of models! This is a way to get comfortable with APIs.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Important point - please read</h2>
            <span style="color:#ff7800;">The way I collaborate with you may be different to other courses you've taken. I prefer not to type code while you watch. Rather, I execute Jupyter Labs, like this, and give you an intuition for what's going on. My suggestion is that you carefully execute this yourself, <b>after</b> watching the lecture. Add print statements to understand what's going on, and then come up with your own variations.<br/><br/>If you have time, I'd love it if you submit a PR for changes in the community_contributions folder - instructions in the resources. Also, if you have a GitHub account, use it to showcase your variations. Not only is this essential practice, but it demonstrates your skills to others, including perhaps future clients or employers...
            </span>
        </td>
    </tr>
</table>

In [1]:
# Start with imports - ask ChatGPT to explain any package that you don't know
# Course_AIAgentic
import os
import json
from collections import defaultdict
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

In [2]:
# Always remember to do this!
load_dotenv(override=True)

True

In [3]:
# Print the key prefixes to help with any debugging

openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
Google API Key exists and begins AI
DeepSeek API Key exists and begins sk-
Groq API Key exists and begins gsk_


In [4]:
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

In [5]:
messages

[{'role': 'user',
  'content': 'Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.'}]

In [6]:
openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
)
question = response.choices[0].message.content
print(question)


How would you approach resolving a moral dilemma in which the needs of the many conflict with the needs of the few, and what frameworks would you use to justify your decision?


In [8]:
competitors = []
answers = []
messages = [{"role": "user", "content": question}]

In [9]:
# The API we know well

model_name = "gpt-4o-mini"

response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Resolving a moral dilemma in which the needs of the many conflict with the needs of the few requires careful consideration and a structured approach. Here are some steps to guide this process, along with some ethical frameworks that can help justify the decision.

### Steps to Approach the Dilemma

1. **Identify the Stakeholders**: Determine who is affected by the dilemma, including those representing the "many" and the "few". Understand their needs, interests, and rights.

2. **Define the Ethical Issue**: Clearly articulate the dilemma at hand. What are the competing needs and values?

3. **Gather Relevant Information**: Assess the context, potential consequences, and any relevant facts. This may involve gathering data, consulting experts, or looking into similar cases.

4. **Evaluate Possible Options**: Consider the possible courses of action. What are the pros and cons of each option? How does each option affect stakeholders involved?

5. **Reflect on Your Values and Principles**: Consider your personal beliefs and the ethical principles that resonate with you. What is the moral foundation of your beliefs?

6. **Make a Decision**: Based on the analysis of options and reflection on values, choose the course of action that seems most justifiable. 

7. **Evaluate and Reflect Post-Decision**: After implementing your decision, assess its outcomes. Did it lead to the expected results? Were there any unforeseen consequences? What could be improved in future decision-making?

### Ethical Frameworks to Justify the Decision

1. **Utilitarianism**: This framework posits that the best action is the one that maximizes overall happiness or utility. If the needs of the many serve to enhance the greater good significantly more than the needs of the few, then prioritizing the many can be justified through this lens.

2. **Deontology**: Alternatively, a deontological approach focuses on duties and principles rather than outcomes. This framework would argue for upholding certain rights or duties, even if it means that the needs of the few may be neglected. For instance, if the rights of individuals are at stake, one might prioritize those rights regardless of the consequences.

3. **Virtue Ethics**: This approach emphasizes moral character and the virtues one should embody. You might consider what a virtuous person in your community would do in this situation. How would empathy, integrity, or justice influence the decision?

4. **Social Contract Theory**: This framework considers the implicit agreements within a society. In addressing the needs of the many, one might argue that the society or community has a duty to protect the collective welfare while also respecting the rights of individuals, possibly leading to a compromise that respects both sides.

5. **Care Ethics**: This perspective emphasizes the importance of interpersonal relationships and the moral significance of caring. In deciding between the needs of the many and the few, one might consider the depth of relationships and the context of care, advocating for a solution that nurtures and supports those in vulnerable positions.

### Conclusion

In resolving a moral dilemma involving the needs of the many versus the few, it's essential to engage in a thoughtful process that takes into account the interests of all stakeholders, uses ethical frameworks to guide the decision, and reflects on the outcomes. The chosen approach should ideally seek to balance these often-competing needs while striving for fairness and justice.

In [10]:
# Anthropic has a slightly different API, and the max_tokens argument is required

model_name = "claude-3-7-sonnet-latest"

claude = Anthropic()
response = claude.messages.create(model=model_name, messages=messages, max_tokens=1000)
answer = response.content[0].text

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

# Approaching the "Needs of Many vs. Few" Moral Dilemma

When faced with such a dilemma, I'd consider multiple ethical frameworks rather than relying on a single approach:

**Utilitarian perspective** would suggest maximizing overall welfare, potentially favoring the many. However, I'd be cautious about simplistic calculations that might overlook:
- Rights violations that can't be balanced by aggregate benefit
- The dignity of individuals being sacrificed
- Long-term social trust implications

**Deontological frameworks** would emphasize that certain rights shouldn't be violated regardless of consequences, which might protect the few from being sacrificed.

**Justice-based approaches** would ask whether the distribution of benefits and burdens is fair, not just efficient.

My process would involve:
1. Identifying all stakeholders and their interests
2. Exploring creative alternatives that might resolve the apparent conflict
3. Considering precedent effects of my decision
4. Examining whether those bearing the costs consented or could consent
5. Ensuring transparent reasoning that could be justified to all parties

Rather than providing a one-size-fits-all answer, I believe moral dilemmas require contextual judgment that balances competing values while respecting human dignity.

In [11]:
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
model_name = "gemini-2.0-flash"

response = gemini.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

This is a classic moral dilemma, and there's no easy answer. The best approach involves careful consideration, thoughtful analysis, and a willingness to acknowledge the limitations of any chosen solution. Here's a breakdown of how I would approach it, along with relevant frameworks:

**1. Understanding the Dilemma and Gathering Information:**

*   **Clarify the Needs:**  What are the specific needs of the many and the few?  Are they needs for survival, well-being, liberty, justice, or something else?  Understanding the nature of the needs is crucial.
*   **Quantify and Qualify:**  Where possible, try to quantify the number of individuals in each group.  Also, understand the *quality* of their lives.  Is one group already significantly disadvantaged?
*   **Explore the Consequences:**  What are the likely short-term and long-term consequences of prioritizing the needs of either group?  Consider direct and indirect effects.
*   **Identify Alternatives:**  Is there a third option, a compromise, or a creative solution that might address some of the needs of both groups, even if imperfectly?  Could the situation be reframed?
*   **Consider Relevant Stakeholders:**  Who else is affected by this decision?  What are their perspectives?
*   **Ethical Obligations:** Are there any ethical guidelines from my professional life, or beliefs that may be affected by the dilemma?
*   **Facts, not Feelings:** Ensure the situation is looked at objectively, without emotion, or personal ties.

**2. Ethical Frameworks for Analysis:**

I would utilize several ethical frameworks to analyze the dilemma from different angles:

*   **Utilitarianism (Greatest Good):** This framework prioritizes the action that produces the greatest overall happiness or well-being for the greatest number of people.  It often involves a cost-benefit analysis.
    *   **Application:**  Estimate the total happiness/well-being that would result from prioritizing the many versus the few. Consider long-term effects, not just immediate ones.
    *   **Challenges:**  Difficult to accurately measure happiness/well-being.  Can potentially justify sacrificing the rights and needs of minorities.  Can lead to unintended consequences.
*   **Deontology (Duty-Based Ethics):** This framework focuses on moral duties and rules, regardless of the consequences. Key figures include Immanuel Kant.
    *   **Application:** Are there any universal moral principles that apply to this situation?  For example, a duty to protect the vulnerable, a duty to uphold justice, a duty to respect individual rights.  Which duties are most relevant and take precedence?
    *   **Challenges:**  Rules can conflict.  Can be inflexible and not account for unique circumstances. Determining universal principles can be difficult.
*   **Rights-Based Ethics:** Focuses on individual rights and entitlements that must be protected.  This framework emphasizes that individuals have inherent rights (e.g., right to life, liberty, property, fair treatment) that cannot be violated, even if doing so would benefit the majority.
    *   **Application:** Does prioritizing the needs of the many violate the fundamental rights of the few?  Are there ways to protect their rights while still addressing the needs of the majority?
    *   **Challenges:**  Rights can conflict.  Defining which rights are fundamental and absolute can be controversial.
*   **Virtue Ethics:** Focuses on the character and moral virtues of the decision-maker.  What would a virtuous person do in this situation?  Emphasizes qualities like compassion, justice, fairness, honesty, and courage.
    *   **Application:**  What decision would demonstrate the best qualities of character?  How can I act in a way that is consistent with my moral values?
    *   **Challenges:**  Virtues can be subjective and culturally dependent.  Difficult to apply in concrete situations.
*   **Justice as Fairness (Rawlsianism):** This framework, developed by John Rawls, emphasizes fairness and equality. It suggests that the best decision is the one that would be chosen by individuals who are behind a "veil of ignorance," meaning they don't know what their position in society will be.
    *   **Application:**  Which decision would be most fair if I didn't know whether I would be one of the "many" or one of the "few"?  This can help to identify potential biases and injustices.
    *   **Challenges:**  Difficult to implement the "veil of ignorance" in practice.  Can lead to overly egalitarian outcomes that may not be efficient or practical.
*   **Care Ethics:** Emphasizes relationships, empathy, and the importance of caring for others, especially the vulnerable.  Focuses on the interconnectedness of individuals and the importance of preserving relationships.
    *   **Application:**  How will this decision affect the relationships between the different groups involved?  How can I minimize harm and promote healing and reconciliation?
    *   **Challenges:**  Can be seen as biased towards personal relationships and neglecting broader societal concerns.  Difficult to apply in situations with large numbers of people.

**3. Making a Decision and Justifying It:**

*   **Weigh the Frameworks:**  Consider the insights gained from each framework.  Are there common themes or conflicting recommendations?
*   **Identify the Least Bad Option:** In many moral dilemmas, there is no perfect solution.  The goal is often to choose the option that minimizes harm and maximizes overall good, even if it is imperfect.
*   **Transparency and Justification:**  Clearly articulate the reasoning behind the decision. Explain which ethical frameworks were used, how they were applied, and why the chosen option was deemed the best, despite its limitations. Acknowledge the potential harm to those whose needs are not prioritized.
*   **Mitigation:**  Explore ways to mitigate the negative consequences for the group whose needs are not prioritized.  Can resources be allocated to help them? Can their rights be protected in some way?
*   **Documentation:** Keep a detailed record of the decision-making process, including the information gathered, the ethical frameworks considered, and the rationale for the final decision. This demonstrates accountability and allows for future review.
*   **Consultation:** If appropriate, consult with ethicists, legal experts, or other stakeholders to gain additional perspectives and ensure the decision is well-informed.
*   **Review and Reflection:** After the decision is implemented, review its consequences and reflect on the decision-making process. What lessons were learned?  How could the process be improved in the future?

**Example Scenario:**

Let's say a hospital has a limited supply of a life-saving drug during a pandemic.  There are many patients who need the drug, but only enough for a few.

*   **Understanding the Dilemma:**  The need is survival. The "many" are those who could benefit from the drug but won't receive it. The "few" are those who will receive it. Consequences involve life and death.
*   **Ethical Frameworks:**
    *   **Utilitarianism:**  Perhaps allocate the drug to those with the highest likelihood of survival and long-term quality of life to maximize overall years of life saved.
    *   **Deontology:**  Consider a lottery system to give everyone an equal chance, upholding the principle of fairness, regardless of the outcome.
    *   **Rights-Based Ethics:**  Argue that everyone has a right to healthcare, but in this scarcity situation, the allocation method must be fair and transparent.
    *   **Justice as Fairness:**  Design a system that would be deemed fair by someone who didn't know if they would be one of the sick patients or one of the healthcare providers.

*   **Decision and Justification:** A possible solution is a weighted lottery system. Patients are entered into a lottery, but their chances are weighted based on factors like:
    *   Severity of illness: those with a higher chance of dying without the drug get more entries.
    *   Age: Younger patients might get slightly more entries, to maximize potential years of life saved (this is controversial and requires careful consideration).
    *   Pre-existing conditions: Adjustments may need to be made based on how conditions impact the drug's effectiveness.

This system is not perfect, but it attempts to balance utilitarian considerations (maximizing lives saved) with deontological principles (fairness and equal opportunity).  The rationale would need to be clearly communicated to the public, emphasizing the transparency and the difficult trade-offs involved. Resources would also need to be allocated to support those who don't receive the drug, providing palliative care and emotional support.

**Key Considerations:**

*   **Context Matters:** The specific details of the dilemma will heavily influence the appropriate course of action.
*   **Transparency and Communication:** Openly communicating the decision-making process and the rationale behind the chosen solution is crucial for maintaining trust and minimizing resentment.
*   **No Guarantees:**  Even with careful consideration and the application of ethical frameworks, moral dilemmas often involve difficult choices with no perfect solutions.  Accepting this reality and being prepared to defend the decision with reasoned arguments is essential.

Ultimately, navigating these dilemmas requires careful thought, a commitment to ethical principles, and a willingness to acknowledge the inherent complexities and potential for unintended consequences. There is rarely a "right" answer, only a considered, justified, and compassionate one.


In [12]:
deepseek = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com/v1")
model_name = "deepseek-chat"

response = deepseek.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Resolving a moral dilemma where the needs of the many conflict with the needs of the few requires careful ethical reasoning, balancing competing values, and justifying the decision through established philosophical frameworks. Here’s a structured approach:

### **1. Define the Dilemma Clearly**
   - Identify the stakeholders (the "many" and the "few") and their respective needs.
   - Clarify the consequences of each possible decision (e.g., harm, benefit, rights violations).

### **2. Apply Ethical Frameworks**
Different ethical theories provide distinct perspectives:

#### **A. Utilitarianism (Consequentialism)**
   - **Principle:** Choose the action that maximizes overall happiness or minimizes suffering.
   - **Application:** Favor the option that benefits the greatest number, even if it harms a minority.
   - **Critique:** May justify sacrificing individuals for the collective good, raising fairness concerns.

#### **B. Deontological Ethics (Duty-Based)**
   - **Principle:** Certain actions are inherently right or wrong, regardless of consequences.
   - **Application:** Protect individual rights (e.g., life, autonomy) even if it doesn’t serve the majority.
   - **Critique:** May lead to rigid outcomes where greater suffering is tolerated to uphold principles.

#### **C. Virtue Ethics**
   - **Principle:** Focus on moral character (e.g., compassion, justice) rather than rules or outcomes.
   - **Application:** Consider what a virtuous person would do—balancing wisdom, empathy, and fairness.
   - **Critique:** Less prescriptive; relies on subjective judgment.

#### **D. Rights-Based Ethics**
   - **Principle:** Prioritize fundamental human rights (e.g., liberty, equality).
   - **Application:** Protect the rights of the few even if it disadvantages the majority.
   - **Critique:** May conflict with utilitarian efficiency.

#### **E. John Rawls’ Theory of Justice**
   - **Principle:** Decisions should be made behind a "veil of ignorance" (not knowing one’s position in society).
   - **Application:** Ensure fairness by considering the least advantaged.
   - **Critique:** Difficult to apply in practice.

### **3. Weigh Competing Values**
   - **Justice vs. Utility:** Is sacrificing fairness for greater good acceptable?
   - **Individual vs. Collective Rights:** Are the few being unjustly exploited?
   - **Long-Term vs. Short-Term Effects:** Could the decision set a harmful precedent?

### **4. Seek a Compromise (If Possible)**
   - Explore alternatives that minimize harm to the few while still addressing the majority’s needs.
   - Example: Partial resource allocation, phased solutions, or compensation for the disadvantaged.

### **5. Justify the Decision**
   - Clearly articulate which ethical framework(s) guided the choice.
   - Acknowledge trade-offs and moral residue (e.g., regret over unavoidable harm).

### **Example Scenario: Medical Triage in a Disaster**
- **Utilitarian Approach:** Prioritize treating those who can be saved quickly to maximize lives.
- **Rights-Based Approach:** Argue that each individual deserves equal care, regardless of outcome.
- **Compromise:** Use a weighted system considering both survival likelihood and fairness.

### **Conclusion**
No single framework provides a perfect answer, but combining insights from multiple theories can lead to a more balanced and justifiable decision. Transparency about the reasoning process is key to moral accountability. 

Would you like to explore a specific case study to apply these frameworks?

In [None]:
groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
model_name = "llama-3.3-70b-versatile"

response = groq.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)


## For the next cell, we will use Ollama

Ollama runs a local web service that provides an OpenAI-compatible endpoint,  
and runs models locally using high-performance C++ code.

If you don't have Ollama, install it by visiting https://ollama.com, pressing Download, and following the instructions.

After it's installed, you should be able to visit here: http://localhost:11434 and see the message "Ollama is running"
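
A quick way to confirm this from Python, before pointing the OpenAI client at Ollama, is to request the root URL and look for that message. The sketch below is a minimal check using only the standard library; the helper name `ollama_is_running` and the 2-second timeout are illustrative assumptions, not part of Ollama or the course code.

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if the server at base_url answers with the 'Ollama is running' message."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar network errors
        return False
    return "Ollama is running" in body


print(ollama_is_running())
```

If this prints `False`, start the service with `ollama serve` (or check the host and port) before running the Ollama cell.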

You might need to restart Cursor (and maybe reboot). Then open a Terminal (control+\`) and run `ollama serve`

Useful Ollama commands (run these in the terminal, or with an exclamation mark in this notebook):

`ollama pull <model_name>` downloads a model locally  
`ollama ls` lists all the models you've downloaded  
`ollama rm <model_name>` deletes the specified model from your downloads

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Super important - ignore me at your peril!</h2>
            <span style="color:#ff7800;">The model called <b>llama3.3</b> is FAR too large for home computers - it's not intended for personal computing and will consume all your resources! Stick with the nicely sized <b>llama3.2</b> or <b>llama3.2:1b</b> and if you want larger, try llama3.1 or smaller variants of Qwen, Gemma, Phi or DeepSeek. See <a href="https://ollama.com/models">the Ollama models page</a> for a full list of models and sizes.
            </span>
        </td>
    </tr>
</table>

In [13]:
!ollama pull llama3.2

[?2026h[?25l[1Gpulling manifest ⠋ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠙ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠹ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest [K
pulling dde5aa3fc5ff: 100% ▕██████████████████▏ 2.0 GB                         [K
pulling 966de95ca8a6: 100% ▕██████████████████▏ 1.4 KB                         [K
pulling fcc5a6bec9da: 100% ▕██████████████████▏ 7.7 KB                         [K
pulling a70ff7e570d9: 100% ▕██████████████████▏ 6.0 KB                         [K
pulling 56bb8bd477a5: 100% ▕██████████████████▏   96 B                         [K
pulling 34bb5ab01051: 100% ▕██████████████████▏  561 B                         [K
verifying sha256 digest [K
writing manifest [K
success [K[?25h[?2026l


In [15]:
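# Ollama exposes its OpenAI-compatible API under /v1; the client requires an api_key, but Ollama does not use it.
# This address points at another machine on the network; use http://localhost:11434/v1 if Ollama runs on this one.
ollama = OpenAI(base_url='http://192.168.1.60:11434/v1', api_key='ollama')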
model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

APITimeoutError: Request timed out.

In [None]:
# So where are we?

print(competitors)
print(answers)


In [None]:
# It's nice to know how to use "zip"
for competitor, answer in zip(competitors, answers):
    print(f"Competitor: {competitor}\n\n{answer}\n\n")


In [17]:
# Let's bring this together - note the use of "enumerate"

together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"

In [None]:
print(together)

In [19]:
judge = f"""You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""


In [None]:
print(judge)

In [21]:
judge_messages = [{"role": "user", "content": judge}]

In [None]:
# Judgement time!

openai = OpenAI()
response = openai.chat.completions.create(
    model="o3-mini",
    messages=judge_messages,
)
results = response.choices[0].message.content
print(results)

# remove openai variable
del openai

In [None]:
# OK let's turn this into results!

results_dict = json.loads(results)
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    competitor = competitors[int(result)-1]
    print(f"Rank {index+1}: {competitor}")
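
The cell above assumes the judge returned bare JSON containing each competitor number exactly once. Models sometimes wrap JSON in a Markdown code fence or return the numbers as integers, so a slightly more defensive parser can help. The sketch below is one possible approach; `parse_ranking` is a hypothetical helper, not part of any library.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence marker


def parse_ranking(raw: str, num_competitors: int) -> list[int]:
    """Parse a judge reply like {"results": ["2", "1", "3"]} into [2, 1, 3]."""
    text = raw.strip()
    # Strip a surrounding Markdown code fence (optionally tagged json) if the model added one
    fenced = re.match(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", text, flags=re.DOTALL)
    if fenced:
        text = fenced.group(1)
    ranks = [int(r) for r in json.loads(text)["results"]]
    # Every competitor number must appear exactly once
    if sorted(ranks) != list(range(1, num_competitors + 1)):
        raise ValueError(f"Unexpected ranking: {ranks}")
    return ranks


reply = FENCE + 'json\n{"results": ["2", "1", "3"]}\n' + FENCE
print(parse_ranking(reply, 3))  # [2, 1, 3]
```

With this in place, `ranks = parse_ranking(results, len(competitors))` would yield integers that can index `competitors` directly as `competitors[rank - 1]`.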

In [None]:
## ranking system for various models to get a true winner

cross_model_results = []

for competitor in competitors:
    judge = f"""You are judging a competition between {len(competitors)} competitors.
                Each model has been given this question:

                {question}

                Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
                Respond with JSON, and only JSON, with the following format:
                {{"{competitor}": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

                Here are the responses from each competitor:

                {together}

                Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""
    
    judge_messages = [{"role": "user", "content": judge}]

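    if competitor.lower().startswith("claude"):
        claude = Anthropic()
        response = claude.messages.create(model=competitor, messages=judge_messages, max_tokens=1024)
        results = response.content[0].text
        #memory cleanup
        del claude
    else:
        # Let each competitor act as judge, using the client created for it in the earlier cells
        if competitor.startswith("gemini"):
            client = gemini
        elif competitor.startswith("deepseek"):
            client = deepseek
        elif competitor == "llama-3.3-70b-versatile":
            client = groq
        elif competitor == "llama3.2":
            client = ollama
        else:
            client = OpenAI()
        response = client.chat.completions.create(
            model=competitor,
            messages=judge_messages,
        )
        results = response.choices[0].message.content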

    cross_model_results.append(results)

print(cross_model_results)



In [None]:

# Dictionary to store cumulative scores for each model
model_scores = defaultdict(int)
model_names = {}

# Create mapping from model index to model name
for i, name in enumerate(competitors, 1):
    model_names[str(i)] = name

# Process each ranking
for result_str in cross_model_results:
    result = json.loads(result_str)
    evaluator_name = list(result.keys())[0]
    rankings = result[evaluator_name]
    
    #print(f"\n{evaluator_name} rankings:")
    # Convert rankings to scores (rank 1 = score 1, rank 2 = score 2, etc.)
    for rank_position, model_id in enumerate(rankings, 1):
        model_id = str(model_id)  # judges may return competitor numbers as ints or strings
        model_name = model_names.get(model_id, f"Model {model_id}")
        model_scores[model_id] += rank_position
        #print(f"  Rank {rank_position}: {model_name} (Model {model_id})")

print("\n" + "="*70)
print("AGGREGATED RESULTS (lower score = better performance):")
print("="*70)

# Sort models by total score (ascending - lower is better)
sorted_models = sorted(model_scores.items(), key=lambda x: x[1])

for rank, (model_id, total_score) in enumerate(sorted_models, 1):
    model_name = model_names.get(model_id, f"Model {model_id}")
    avg_score = total_score / len(cross_model_results)
    print(f"Rank {rank}: {model_name} (Model {model_id}) - Total Score: {total_score}, Average Score: {avg_score:.2f}")

winner_id = sorted_models[0][0]
winner_name = model_names.get(winner_id, f"Model {winner_id}")
print(f"\n🏆 WINNER: {winner_name} (Model {winner_id}) with the lowest total score of {sorted_models[0][1]}")

# Show detailed breakdown
print(f"\n📊 DETAILED BREAKDOWN:")
print("-" * 50)
for model_id, total_score in sorted_models:
    model_name = model_names.get(model_id, f"Model {model_id}")
    print(f"{model_name}: {total_score} points across {len(cross_model_results)} evaluations")


<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/exercise.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Exercise</h2>
            <span style="color:#ff7800;">Which pattern(s) did this use? Try updating this to add another Agentic design pattern.
            </span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Commercial implications</h2>
            <span style="color:#00bfff;">These kinds of patterns - sending a task to multiple models and evaluating the results -
            are common where you need to improve the quality of your LLM response. This approach can be applied broadly
            to business projects where accuracy is critical.
            </span>
        </td>
    </tr>
</table>
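
As a rough illustration of that pattern, the fan-out step can be written so that one failing provider (such as the Ollama timeout earlier in this lab) does not stop the rest. This is a sketch under assumptions: `ask_all` is a hypothetical helper, and each model is represented by a plain callable that takes the question and returns text, which you would wrap around the real client calls.

```python
from typing import Callable


def ask_all(question: str, models: dict[str, Callable[[str], str]]) -> tuple[list[str], list[str]]:
    """Send the question to every model and keep only the ones that answered."""
    competitors, answers = [], []
    for name, ask in models.items():
        try:
            answer = ask(question)
        except Exception as error:  # e.g. a timeout or a missing API key
            print(f"Skipping {name}: {error}")
            continue
        competitors.append(name)
        answers.append(answer)
    return competitors, answers


# Stand-in callables for demonstration only; replace them with real API calls
def fake_model(question: str) -> str:
    return f"Answer to: {question}"


def broken_model(question: str) -> str:
    raise TimeoutError("Request timed out.")


names, replies = ask_all("What is 2+2?", {"model-a": fake_model, "model-b": broken_model})
print(names)  # ['model-a']
```

Because `competitors` and `answers` are only extended after a successful call, the two lists stay aligned, which the judging step relies on when it maps competitor numbers back to model names.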