## Welcome to the Second Lab - Week 1, Day 3

Today we will work with lots of models! This is a way to get comfortable with APIs.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Important point - please read</h2>
            <span style="color:#ff7800;">The way I collaborate with you may be different to other courses you've taken. I prefer not to type code while you watch. Rather, I execute Jupyter Labs, like this, and give you an intuition for what's going on. My suggestion is that you carefully execute this yourself, <b>after</b> watching the lecture. Add print statements to understand what's going on, and then come up with your own variations.<br/><br/>If you have time, I'd love it if you submit a PR for changes in the community_contributions folder - instructions in the resources. Also, if you have a GitHub account, use this to showcase your variations. Not only is this essential practice, but it demonstrates your skills to others, including perhaps future clients or employers...
            </span>
        </td>
    </tr>
</table>

In [10]:
# Start with imports - ask ChatGPT to explain any package that you don't know

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

In [11]:
# Always remember to do this!
load_dotenv(override=True)

False

In [12]:
# Ask a judge model whether an answer is correct; returns True or False
def check_answer(question, answer):
  judge_messages = [
    {"role": "system", "content": "You are a helpful assistant that checks if an answer is correct. Please return 'True' or 'False'."},
    {"role": "user", "content": f"Question: {question}\n\nAnswer: {answer}"}
  ]

  openai = OpenAI()
  response = openai.chat.completions.create(
    model="o3-mini",
    messages=judge_messages,
  )

  # Tolerate surrounding whitespace and different casing in the verdict
  verdict = response.choices[0].message.content or ""
  return verdict.strip().lower() == "true"

In [13]:
# Print the key prefixes to help with any debugging

openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
Google API Key exists and begins AI
DeepSeek API Key not set (and this is optional)
Groq API Key exists and begins gsk_


In [14]:
competitors = []
answers = []
question = ""
check_counter = 0
CHECK_COUNT = 3

request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

In [15]:
messages

[{'role': 'user',
  'content': 'Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.'}]

In [16]:
if openai_api_key:  
  openai = OpenAI()
  response = openai.chat.completions.create(
      model="gpt-4o-mini",
      messages=messages,
  )
  question = response.choices[0].message.content
  print(question)


How would you reconcile the ethical implications of autonomous AI decision-making in life-or-death situations with the need for accountability and transparency in those decisions?


In [17]:
messages = [{"role": "user", "content": question}]

In [18]:
# The API we know well
if openai_api_key:
  model_name = "gpt-4o-mini"
  while check_counter < CHECK_COUNT:
    response = openai.chat.completions.create(model=model_name, messages=messages)
    answer = response.choices[0].message.content
    if check_answer(question, answer):
      break
    check_counter += 1
  
  if check_counter == CHECK_COUNT:
    print("No valid answer found")
  
  check_counter = 0

  display(Markdown(answer))
  competitors.append(model_name)
  answers.append(answer)

Reconciling the ethical implications of autonomous AI decision-making, especially in life-or-death situations, with accountability and transparency involves a multifaceted approach that considers technological, ethical, and regulatory dimensions. Here are key considerations that can help achieve this balance:

1. **Ethical Frameworks**:
   - Establish clear ethical guidelines that govern the development and deployment of autonomous AI systems. These frameworks could draw from established ethical theories, such as utilitarianism (maximizing overall good), deontological ethics (following a set of moral rules), and virtue ethics (fostering good character).
   - Foster collaboration among ethicists, technologists, policymakers, and affected communities to continuously refine these ethical frameworks, ensuring they reflect diverse perspectives.

2. **Human Oversight**:
   - Implement systems of human oversight where AI systems are not the final decision-makers, especially in life-or-death scenarios. Humans should be able to intervene or override decisions made by AI, ensuring a human touch in critical situations.
   - Establish protocols where human operators are always part of the decision-making process, guaranteeing that emotional and moral considerations managed by humans remain in play.

3. **Transparency Mechanisms**:
   - Design AI systems with transparency in mind, ensuring that the decision-making processes are understandable to both users and affected parties. This could involve developing explainable AI (XAI) that provides clear rationales for decisions made.
   - Create accessible reporting mechanisms that allow stakeholders to review and scrutinize AI decisions, helping cultivate trust and accountability.

4. **Accountability Framework**:
   - Define accountability mechanisms, clarifying who is responsible for outcomes associated with AI decisions. This could include holding developers, manufacturers, and operators accountable based on the harm caused or benefits accrued due to AI actions.
   - Develop legal and regulatory standards that stipulate the standards of care for deploying autonomous systems, helping ensure that all stakeholders understand their responsibilities.

5. **Robust Testing and Validation**:
   - Ensure thorough testing and validation of AI systems before deployment, especially in critical applications. This should involve simulating various scenarios to examine decision-making under different conditions and assess potential risks.
   - Regularly audit and monitor AI systems post-deployment to identify biases or failures in the decision-making process, and adapt systems in response to findings.

6. **Public Engagement**:
   - Engage the public in discussions about the ethical implications and accountability of AI systems. This can help inform policy, guide ethical design, and foster social acceptance of AI technologies.
   - Develop educational initiatives that allow citizens to better understand AI, fostering a more informed dialogue about its risks and benefits.

7. **Continuous Improvement**:
   - Recognize that AI technology and societal norms will evolve over time. Establish mechanisms for continuous feedback and evaluation of AI systems to adapt to changing ethical, legal, and social landscapes.

By integrating these principles, stakeholders can create a framework that promotes responsible AI decision-making while ensuring accountability and transparency. Balancing these aspects requires ongoing dialogue and a commitment to ethical reflection as technology continues to advance.
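The retry loop in the cell above is repeated for every provider that follows, so it could be factored into a helper. This is only a sketch: `ask_with_retries`, `ask`, and `check` are hypothetical names that do not appear in the lab, and the stand-in callables below replace the real API calls so the example runs offline.

```python
def ask_with_retries(ask, check, max_attempts=3):
    """Call ask() until check(answer) passes or max_attempts is reached.

    Returns (answer, passed), where answer is the last answer produced.
    """
    answer = None
    for _ in range(max_attempts):
        answer = ask()
        if check(answer):
            return answer, True
    return answer, False


# Stand-in callables for illustration only (no API calls are made here)
replies = iter(["draft", "better", "final"])
answer, passed = ask_with_retries(lambda: next(replies), lambda a: a == "better")
print(answer, passed)
```

In the notebook, `ask` would wrap the provider call (for example the `chat.completions.create` line) and `check` would wrap `check_answer(question, answer)`. Because the attempt count lives inside the helper, there is no shared `check_counter` to reset between cells.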

In [19]:
# Anthropic has a slightly different API, and max_tokens is required
claude = None
if anthropic_api_key:
  model_name = "claude-3-7-sonnet-latest"

  claude = Anthropic()
  while check_counter < CHECK_COUNT:
    response = claude.messages.create(model=model_name, messages=messages, max_tokens=1000)
    answer = response.content[0].text
    if check_answer(question, answer):
      break
    check_counter += 1
  if check_counter == CHECK_COUNT:
    print("No valid answer found")
  
  check_counter = 0

  display(Markdown(answer))
  competitors.append(model_name)
  answers.append(answer)

# Reconciling AI Autonomy with Accountability in Critical Situations

This requires balancing several competing considerations:

**Ethical foundations:**
- Autonomous systems making life-or-death decisions must operate on clear ethical frameworks (utilitarian, deontological, etc.) that are explicitly defined before deployment
- These frameworks should reflect broad societal values rather than narrow technical perspectives

**Transparency mechanisms:**
- Decision logs that record all inputs, weights, and reasoning steps
- Explainable AI techniques that can translate complex calculations into human-understandable justifications
- Regular public audits of high-stakes decision systems

**Accountability structures:**
- Clear chains of responsibility from developers to deployers to operators
- Legal frameworks that appropriately distribute liability without stifling innovation
- Independent oversight bodies with technical expertise and ethical authority

The fundamental tension lies in preserving human moral agency while leveraging AI capabilities. I believe the most promising approach combines human-in-the-loop systems for novel situations with carefully bounded autonomy for well-understood scenarios, all within robust governance frameworks.

What specific domain of life-or-death decisions are you most concerned about?

In [20]:
gemini = None
if google_api_key:
  gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
  model_name = "gemini-2.0-flash"

  while check_counter < CHECK_COUNT:
    response = gemini.chat.completions.create(model=model_name, messages=messages)
    answer = response.choices[0].message.content
    if check_answer(question, answer):
      break
    check_counter += 1
    
  if check_counter == CHECK_COUNT:
    print("No valid answer found")
  
  check_counter = 0

  display(Markdown(answer))
  competitors.append(model_name)
  answers.append(answer)

Reconciling the ethical implications of autonomous AI decision-making in life-or-death situations with the need for accountability and transparency is a complex challenge, but crucial for responsible AI development. Here's a breakdown of how we can approach this:

**1.  Ethical Frameworks and Principles:**

*   **Define Core Values:** Establish clear ethical principles that guide AI development and deployment in these scenarios. These might include:
    *   **Beneficence:**  Maximize overall benefit and minimize harm.
    *   **Non-maleficence:**  First, do no harm.
    *   **Justice:**  Ensure fairness and equity in outcomes across different demographics and situations.
    *   **Respect for Autonomy:**  Where possible, respect the autonomy and preferences of individuals involved.
*   **Ethical Audits:**  Regularly conduct ethical audits of AI systems, assessing their potential biases, unintended consequences, and alignment with ethical principles.
*   **Human Oversight:**  Implement mechanisms for human oversight in critical situations. This could involve a human operator who can intervene, override the AI, or be consulted in complex or ambiguous cases.  The level of oversight should depend on the risk and complexity of the situation.

**2.  Accountability Mechanisms:**

*   **Clear Lines of Responsibility:** Determine who is accountable when an AI makes a decision that results in harm or loss of life.  This involves addressing:
    *   **Developers:** Responsible for the design, development, and testing of the AI system, including addressing potential biases and vulnerabilities.
    *   **Deployers/Operators:** Responsible for the proper deployment, maintenance, and monitoring of the AI system. They also have a duty to provide adequate training to users and to ensure that the system is used in accordance with its intended purpose.
    *   **Governing Bodies/Regulators:** Responsible for setting standards, enforcing regulations, and providing oversight to ensure that AI systems are developed and used ethically and safely.
*   **Explainable AI (XAI):**  Develop AI systems that can explain their reasoning and decision-making processes.  This allows humans to understand *why* the AI made a particular choice and identify potential errors or biases.  Techniques include:
    *   **Rule-based systems:** Where decisions are based on explicitly defined rules.
    *   **Attention mechanisms:** In deep learning, highlighting which parts of the input data were most important in the decision.
    *   **Counterfactual explanations:** Describing what would have needed to be different for the AI to reach a different conclusion.
*   **Audit Trails:** Maintain detailed logs of AI decision-making processes, including the data used, the algorithms applied, and the rationale behind the decision.  This allows for post-hoc analysis and investigation if something goes wrong.
*   **Legal Frameworks:**  Develop clear legal frameworks that address liability and accountability for AI-related harm. This might involve adapting existing legal principles or creating new laws specifically tailored to autonomous AI.

**3.  Transparency and Explainability:**

*   **Open Data and Algorithms (Where Appropriate):**  Make datasets and algorithms used to train and operate AI systems as open and transparent as possible (while respecting privacy and security). This allows for independent scrutiny and identification of potential biases or flaws.
*   **Simulation and Testing:**  Thoroughly simulate and test AI systems in a variety of realistic scenarios before deployment.  This helps to identify potential weaknesses and biases.
*   **User Education:**  Educate the public and stakeholders about the capabilities and limitations of AI systems.  This promotes informed decision-making and helps to manage expectations.
*   **Clear Communication:**  Develop clear and accessible communication strategies to explain how AI systems work and how they are used in specific contexts.  This builds trust and confidence.
*   **Transparency in Design:** Design systems that prioritize transparency in how data is processed and decisions are made, even at the cost of some degree of optimization.

**4.  Addressing Bias and Fairness:**

*   **Data Diversity:**  Train AI systems on diverse and representative datasets to minimize bias.  Actively address and mitigate any biases that are identified.
*   **Fairness Metrics:**  Use appropriate fairness metrics to evaluate the performance of AI systems across different demographic groups.
*   **Bias Detection and Mitigation Techniques:**  Implement techniques to detect and mitigate bias in AI algorithms and datasets.
*   **Continuous Monitoring:** Continuously monitor AI systems for bias and unfair outcomes.

**5.  Ongoing Research and Development:**

*   **Ethical AI Research:**  Invest in research on ethical AI principles, explainable AI, and bias detection and mitigation techniques.
*   **Interdisciplinary Collaboration:**  Foster collaboration between AI researchers, ethicists, legal scholars, policymakers, and other stakeholders.
*   **Public Dialogue:**  Engage in ongoing public dialogue about the ethical implications of AI and the need for responsible AI development and deployment.

**Challenges and Considerations:**

*   **The "Black Box" Problem:**  Deep learning models can be difficult to understand, making it challenging to explain their decisions.  This necessitates focusing on XAI and developing new techniques to interpret these models.
*   **Defining "Fairness":**  There are different definitions of fairness, and it can be difficult to satisfy all of them simultaneously.  Careful consideration must be given to which fairness metrics are most appropriate in a given context.
*   **Trade-offs:**  There may be trade-offs between accuracy, transparency, and fairness.  These trade-offs must be carefully considered and justified.
*   **Evolving Technology:**  AI technology is constantly evolving, so ethical frameworks and regulations must be adaptable and able to keep pace with these changes.
*   **Cultural Differences:** Ethical considerations may vary across cultures, so it's important to consider these differences when developing and deploying AI systems globally.

**In conclusion, reconciling ethical implications with accountability and transparency requires a multifaceted approach involving ethical frameworks, accountability mechanisms, transparency initiatives, and ongoing research.  It's an ongoing process that requires collaboration, vigilance, and a commitment to responsible AI development and deployment.**


In [21]:
deepseek = None
if deepseek_api_key:
  deepseek = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com/v1")
  model_name = "deepseek-chat"

  while check_counter < CHECK_COUNT:
    response = deepseek.chat.completions.create(model=model_name, messages=messages)
    answer = response.choices[0].message.content
    if check_answer(question, answer):
      break
    check_counter += 1
    
  if check_counter == CHECK_COUNT:
    print("No valid answer found")

  # Reset the counter so the next provider gets its full set of attempts
  check_counter = 0

  display(Markdown(answer))
  competitors.append(model_name)
  answers.append(answer)

In [22]:
groq = None
if groq_api_key:
  groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
  model_name = "llama-3.3-70b-versatile"

  while check_counter < CHECK_COUNT:
    response = groq.chat.completions.create(model=model_name, messages=messages)
    answer = response.choices[0].message.content
    if check_answer(question, answer):
      break
    check_counter += 1
    
  if check_counter == CHECK_COUNT:
    print("No valid answer found")

  # Reset the counter so the next provider gets its full set of attempts
  check_counter = 0

  display(Markdown(answer))
  competitors.append(model_name)
  answers.append(answer)


Reconciling the ethical implications of autonomous AI decision-making in life-or-death situations with the need for accountability and transparency is a complex challenge that requires a multi-faceted approach. Here are some possible ways to address this issue:

1. **Design for Transparency**: Develop AI systems that provide clear explanations for their decisions, including the data used, the algorithms employed, and the reasoning behind the outcome. This can be achieved through techniques such as model interpretability, feature attribution, and decision-tree analysis.
2. **Establish Clear Guidelines and Regulations**: Develop and implement guidelines and regulations that govern the use of autonomous AI in life-or-death situations. These guidelines should prioritize human safety, dignity, and well-being, and ensure that AI systems are designed and deployed in a way that aligns with these values.
3. **Human Oversight and Review**: Implement human oversight and review processes to ensure that AI decisions are subject to human scrutiny and approval. This can include having human reviewers examine AI decisions in real-time or conducting post-hoc reviews to identify potential errors or biases.
4. **Accountability Mechanisms**: Establish accountability mechanisms that hold developers, deployers, and users of autonomous AI systems responsible for their actions. This can include liability frameworks, auditing and monitoring systems, and incident response plans.
5. **Ethics-by-Design**: Integrate ethical considerations into the design of autonomous AI systems from the outset. This can be achieved through the use of ethical frameworks, value-sensitive design, and participatory design approaches that involve stakeholders from diverse backgrounds and perspectives.
6. **Continuous Monitoring and Evaluation**: Continuously monitor and evaluate the performance of autonomous AI systems in life-or-death situations, identifying areas for improvement and addressing potential biases or errors.
7. **Human-Centered Design**: Prioritize human-centered design principles when developing autonomous AI systems, focusing on the needs, values, and well-being of individuals and communities.
8. **Transparency in Data Collection and Use**: Ensure transparency in data collection and use, providing clear information about the data used to train and test autonomous AI systems, as well as the potential biases and limitations of that data.
9. **Independent Auditing and Testing**: Conduct independent auditing and testing of autonomous AI systems to identify potential vulnerabilities, biases, or errors, and to ensure that they align with guidelines and regulations.
10. **Public Engagement and Education**: Engage with the public and educate them about the benefits and risks of autonomous AI systems, as well as the measures being taken to ensure accountability and transparency.

By implementing these measures, we can reconcile the ethical implications of autonomous AI decision-making in life-or-death situations with the need for accountability and transparency, ensuring that these systems are developed and deployed in a way that prioritizes human safety, dignity, and well-being.

**Real-World Examples**:

* The US Department of Defense's (DoD) "Autonomous Systems Policy" provides guidelines for the development and deployment of autonomous systems, including the requirement for human oversight and review.
* The European Union's (EU) "Artificial Intelligence (AI) Regulation" proposes a framework for regulating AI, including provisions for transparency, accountability, and human oversight.
* The "Partnership on AI" initiative, a consortium of companies and organizations, aims to develop best practices for AI development and deployment, including guidelines for transparency, accountability, and human-centered design.

**Future Research Directions**:

* Developing more sophisticated techniques for model interpretability and explainability
* Investigating the role of human oversight and review in autonomous AI decision-making
* Examining the effectiveness of accountability mechanisms in practice
* Developing more comprehensive guidelines and regulations for autonomous AI systems
* Exploring the potential for human-centered design principles to improve the development and deployment of autonomous AI systems.

## For the next cell, we will use Ollama

Ollama runs a local web service that exposes an OpenAI-compatible endpoint,  
and runs models locally using high-performance C++ code.

If you don't have Ollama, install it by visiting https://ollama.com, pressing Download, and following the instructions.

After it's installed, you should be able to visit http://localhost:11434 and see the message "Ollama is running".
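The same check can be made from Python instead of the browser. This is a minimal sketch using only the standard library; the function name `ollama_is_running` and the 2-second timeout are assumptions, not part of the lab.

```python
import urllib.request


def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if the server at base_url replies with 'Ollama is running'."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except OSError:
        # Covers refused connections, timeouts and HTTP errors
        # (urllib.error.URLError is a subclass of OSError)
        return False
    return "Ollama is running" in body


if __name__ == "__main__":
    print(ollama_is_running())
```

If this prints `False`, the server is probably not started yet or is listening on a different port.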

You might need to restart Cursor (and maybe reboot). Then open a Terminal (control+\`) and run `ollama serve`

Useful Ollama commands (run these in the terminal, or with an exclamation mark in this notebook):

`ollama pull <model_name>` downloads a model locally  
`ollama ls` lists all the models you've downloaded  
`ollama rm <model_name>` deletes the specified model from your downloads

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Super important - ignore me at your peril!</h2>
            <span style="color:#ff7800;">The model called <b>llama3.3</b> is FAR too large for home computers - it's not intended for personal computing and will consume all your resources! Stick with the nicely sized <b>llama3.2</b> or <b>llama3.2:1b</b> and if you want larger, try llama3.1 or smaller variants of Qwen, Gemma, Phi or DeepSeek. See <a href="https://ollama.com/models">the Ollama models page</a> for a full list of models and sizes.
            </span>
        </td>
    </tr>
</table>

In [23]:
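import subprocess

# Download llama3.2 if needed; treat a missing Ollama install as "no model"
try:
  result = subprocess.run(['ollama', 'pull', 'llama3.2'], capture_output=True, text=True)
  ollama_has_model = result.returncode == 0
except FileNotFoundError:
  ollama_has_model = False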

if ollama_has_model:
  print("Ollama has the model")
else:
  print("Ollama does not have the model")



Ollama has the model


In [24]:
ollama = None
if ollama_has_model:
  ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

  model_name = "llama3.2"
  #model_name = "gemma3:12b"

  while check_counter < CHECK_COUNT:
    response = ollama.chat.completions.create(model=model_name, messages=messages)
    answer = response.choices[0].message.content
    if check_answer(question, answer):
      break
    check_counter += 1
  
  if check_counter == CHECK_COUNT:
    print("No valid answer found")

  # Reset the counter so any later cell gets its full set of attempts
  check_counter = 0

  display(Markdown(answer))
  competitors.append(model_name)
  answers.append(answer)

Reconciling the ethical implications of autonomous AI decision-making in life-or-death situations requires a multifaceted approach that balances the benefits of advanced technology with the need for accountability, transparency, and human oversight. Here are some potential strategies to address this challenge:

1. **Transparent Decision-Making Processes**: Implementing transparent decision-making processes can help ensure that AI systems provide clear explanations for their decisions. This could involve the development of explainable AI (XAI) techniques or integrating human-readable logs into the system.
2. **Human Oversight and Review**: Establish a human review process to verify the accuracy and fairness of autonomous AI decisions in life-or-death situations. This would ensure that humans can provide input, clarify decision-making processes, and correct any errors or biases.
3. **Regulatory Frameworks and Standards**: Developing regulatory frameworks and standards for autonomous AI decision-making could help mitigate concerns about accountability and transparency. Governments, regulatory bodies, and industry organizations must work together to establish clear guidelines and protocols for the development, deployment, and use of autonomous AI systems.
4. **Bias Detection and Mitigation**: AI systems can perpetuate existing biases if they are trained on biased data or developed with a skewed algorithmic design. Implementing bias detection and mitigation strategies, such as regular audits, diversity training for developers, and fairness metrics, would help ensure that AI decisions align with human values and ethical considerations.
5. **Robust Testing and Validation**: Conduct rigorous testing and validation of autonomous AI systems to identify potential errors or biases in decision-making processes. This could involve simulated scenarios, scenario-based testing, and real-world testing, as well as performance evaluation using relevant metrics (e.g., accuracy, fairness, reliability).
6. **Public Engagement and Transparency**: Engaging with the public about the capabilities, limitations, and ethical implications of autonomous AI systems can help build trust in their use. Providing clear information about the decision-making processes, benefits, and limitations of these systems is essential for fostering a responsible dialogue around AI adoption.
7. **Accountability Mechanisms**: Establishing accountability mechanisms for AI decision-making failures could help prevent damage to individuals or communities. This might involve implementing strict penalties for negligence, liability, or misconduct related to autonomous AI decisions, as well as providing clear avenues for reporting concerns and complaints.

**Key Challenges and Future Directions**

While addressing the ethical implications of autonomous AI decision-making in life-or-death situations is complex, some key challenges remain:

* Balancing human oversight with system autonomy: Autonomous AI systems should be designed to operate effectively while still allowing for reasonable levels of human intervention.
* Ensuring accountability without overburdening systems: Complex decisions require sophisticated AI systems; however, they should still be accountable for their actions and capable of accepting responsibility when something goes wrong.
* Managing bias and disparity: AI decision-making must mitigate biases originating from the data, algorithms, or environment. Ensuring fairness and reducing potential disparities in life-or-death situations is crucial.

Future research directions might involve:

1. Developing more advanced explainability techniques to provide transparent insights into AI decisions
2. Investigating scenarios-specific optimization methods for improving performance under constraints
3. Fostering multi-disciplinary partnerships and collaboration among researchers, policymakers, industry professionals, and the public to tackle the challenges associated with autonomous decision-making

In [25]:
# So where are we?

print(competitors)
print(answers)


['gpt-4o-mini', 'claude-3-7-sonnet-latest', 'gemini-2.0-flash', 'llama-3.3-70b-versatile', 'llama3.2']
['Reconciling the ethical implications of autonomous AI decision-making, especially in life-or-death situations, with accountability and transparency involves a multifaceted approach that considers technological, ethical, and regulatory dimensions. Here are key considerations that can help achieve this balance:\n\n1. **Ethical Frameworks**:\n   - Establish clear ethical guidelines that govern the development and deployment of autonomous AI systems. These frameworks could draw from established ethical theories, such as utilitarianism (maximizing overall good), deontological ethics (following a set of moral rules), and virtue ethics (fostering good character).\n   - Foster collaboration among ethicists, technologists, policymakers, and affected communities to continuously refine these ethical frameworks, ensuring they reflect diverse perspectives.\n\n2. **Human Oversight**:\n   - Implem

In [26]:
# It's nice to know how to use "zip"
for competitor, answer in zip(competitors, answers):
    print(f"Competitor: {competitor}\n\n{answer}")


Competitor: gpt-4o-mini

Reconciling the ethical implications of autonomous AI decision-making, especially in life-or-death situations, with accountability and transparency involves a multifaceted approach that considers technological, ethical, and regulatory dimensions. Here are key considerations that can help achieve this balance:

1. **Ethical Frameworks**:
   - Establish clear ethical guidelines that govern the development and deployment of autonomous AI systems. These frameworks could draw from established ethical theories, such as utilitarianism (maximizing overall good), deontological ethics (following a set of moral rules), and virtue ethics (fostering good character).
   - Foster collaboration among ethicists, technologists, policymakers, and affected communities to continuously refine these ethical frameworks, ensuring they reflect diverse perspectives.

2. **Human Oversight**:
   - Implement systems of human oversight where AI systems are not the final decision-makers, espe

In [27]:
# Let's bring this together - note the use of "enumerate"

together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"
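
As a side note, `enumerate` accepts a `start` argument, which avoids the `index+1` arithmetic. The sketch below builds the same kind of combined string with `str.join`. The `sample_answers` list is made-up data for illustration; it stands in for the notebook's real `answers`.

```python
# Hypothetical sample data; the notebook's real `answers` come from the models
sample_answers = ["First answer", "Second answer"]

# enumerate(..., start=1) yields 1-based competitor numbers directly
sections = [
    f"# Response from competitor {number}\n\n{answer}\n\n"
    for number, answer in enumerate(sample_answers, start=1)
]

# Joining a list once is a common alternative to repeated += on a string
combined = "".join(sections)
print(combined)
```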

In [28]:
print(together)

# Response from competitor 1

Reconciling the ethical implications of autonomous AI decision-making, especially in life-or-death situations, with accountability and transparency involves a multifaceted approach that considers technological, ethical, and regulatory dimensions. Here are key considerations that can help achieve this balance:

1. **Ethical Frameworks**:
   - Establish clear ethical guidelines that govern the development and deployment of autonomous AI systems. These frameworks could draw from established ethical theories, such as utilitarianism (maximizing overall good), deontological ethics (following a set of moral rules), and virtue ethics (fostering good character).
   - Foster collaboration among ethicists, technologists, policymakers, and affected communities to continuously refine these ethical frameworks, ensuring they reflect diverse perspectives.

2. **Human Oversight**:
   - Implement systems of human oversight where AI systems are not the final decision-makers,

In [29]:
judge = f"""You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""
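
A note on the doubled braces in the prompt above: inside an f-string, `{{` and `}}` produce a literal `{` and `}`, which is why the JSON template survives formatting while `{question}` and `{together}` are substituted. A minimal sketch, using a made-up `count` variable in place of `len(competitors)`:

```python
# Hypothetical stand-in for len(competitors); not taken from the notebook
count = 3

# Single braces are interpolated; doubled braces come out as literal braces
template = f"""There are {count} competitors.
Respond with: {{"results": ["1", "2", "3"]}}"""

print(template)
```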


In [30]:
print(judge)

You are judging a competition between 5 competitors.
Each model has been given this question:

How would you reconcile the ethical implications of autonomous AI decision-making in life-or-death situations with the need for accountability and transparency in those decisions?

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}

Here are the responses from each competitor:

# Response from competitor 1

Reconciling the ethical implications of autonomous AI decision-making, especially in life-or-death situations, with accountability and transparency involves a multifaceted approach that considers technological, ethical, and regulatory dimensions. Here are key considerations that can help achieve this balance:

1. **Ethical Frameworks**:
   - Establish clear

In [31]:
judge_messages = [{"role": "user", "content": judge}]

In [32]:
# Judgement time!

openai = OpenAI()
response = openai.chat.completions.create(
    model="o3-mini",
    messages=judge_messages,
)
results = response.choices[0].message.content
print(results)


{"results": ["3", "4", "1", "5", "2"]}


In [33]:
# OK let's turn this into results!

results_dict = json.loads(results)
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    competitor = competitors[int(result)-1]
    print(f"Rank {index+1}: {competitor}")

Rank 1: gemini-2.0-flash
Rank 2: llama-3.3-70b-versatile
Rank 3: gpt-4o-mini
Rank 4: llama3.2
Rank 5: claude-3-7-sonnet-latest
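
The `json.loads` call above trusts the judge to return clean JSON that lists every competitor exactly once. A model may occasionally wrap its reply in a code fence or skip a competitor, so a more defensive parse can help. The sketch below is one possible approach, not part of the original notebook; `parse_ranking` is a hypothetical helper name.

```python
import json

def parse_ranking(raw, n_competitors):
    """Parse a judge reply like {"results": ["3", "1", "2"]} into 1-based ints."""
    text = raw.strip()
    # Tolerate a reply wrapped in a markdown code fence despite the instructions
    if text.startswith("```"):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    ranks = [int(r) for r in json.loads(text)["results"]]
    # Every competitor number should appear exactly once
    if sorted(ranks) != list(range(1, n_competitors + 1)):
        raise ValueError(f"Ranking is not a permutation of 1..{n_competitors}: {ranks}")
    return ranks

print(parse_ranking('{"results": ["3", "4", "1", "5", "2"]}', 5))
```

The returned integers can then index into `competitors` with `competitors[rank - 1]`, as in the cell above.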


<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/exercise.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Exercise</h2>
            <span style="color:#ff7800;">Which pattern(s) did this use? Try updating this to add another Agentic design pattern.
            </span>
        </td>
    </tr>
</table>

Added a validation function to the top of the notebook; each answer is checked up to CHECK_COUNT times.
Added a cleanup section.
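
The retry logic itself is not reproduced in this section, so the following is only a sketch of how a check-and-retry loop of this kind might look. The names `answer_with_validation`, `generate`, and `check` are hypothetical, and the value given to `CHECK_COUNT` here is assumed. In the notebook, `check` would correspond to something like `check_answer(question, answer)`, assuming that function returns a boolean.

```python
CHECK_COUNT = 3  # assumed value for illustration; the notebook sets its own

def answer_with_validation(question, generate, check, max_checks=CHECK_COUNT):
    """Regenerate an answer until `check` accepts it or the attempts run out."""
    answer = ""
    for _ in range(max_checks):
        answer = generate(question)
        if check(question, answer):
            return answer, True
    # Hand back the last attempt, flagged as unverified
    return answer, False

# Stub callables so the sketch runs without any API calls
attempts = iter(["draft one", "draft two", "final"])

def fake_generate(question):
    return next(attempts)

def fake_check(question, answer):
    return answer == "final"

answer, verified = answer_with_validation("Any question?", fake_generate, fake_check)
print(answer, verified)
```

Returning the flag alongside the answer lets the caller decide whether to keep an unverified answer in the competition or drop it.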

In [37]:
# Clean up

if openai:
  print("Closing OpenAI")
  openai.close()
  openai = None
if anthropic_api_key and claude is not None:
  print("Closing Anthropic")
  claude.close()
  claude = None
if google_api_key and gemini is not None:
  print("Closing Gemini")
  gemini.close()
  gemini = None
if deepseek_api_key and deepseek is not None:
  print("Closing DeepSeek")
  deepseek.close()
  deepseek = None
if groq_api_key and groq is not None:
  print("Closing Groq")
  groq.close()
  groq = None
if ollama_has_model and ollama is not None:
  print("Closing Ollama")
  ollama.close()
  ollama = None
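
The repeated blocks above could also be folded into a loop. The sketch below assumes, as the cell above does, that each client object exposes a `close()` method; `close_clients` and `FakeClient` are hypothetical names used only for this illustration.

```python
def close_clients(clients):
    """Close every client that exists and has close(); return the names closed."""
    closed = []
    for name, client in clients.items():
        # getattr on None (or on an object without close) falls back to None
        close = getattr(client, "close", None)
        if callable(close):
            close()
            closed.append(name)
    return closed

class FakeClient:
    """Stand-in for a real API client so the sketch runs offline."""
    def __init__(self):
        self.is_closed = False

    def close(self):
        self.is_closed = True

fake = FakeClient()
closed_names = close_clients({"openai": fake, "claude": None})
print(closed_names)
```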

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Commercial implications</h2>
            <span style="color:#00bfff;">Patterns like this one, where you send a task to multiple models and evaluate the results,
            are common when you need to improve the quality of your LLM responses. The approach applies broadly
            to business projects where accuracy is critical.
            </span>
        </td>
    </tr>
</table>