In [1]:
import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

In [2]:
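# Load API keys from a local .env file; override=True lets .env values
# replace variables that are already set in the environment
load_dotenv(override=True)

google_api_key = os.getenv("GEMINI_API_KEY")
groq_api_key = os.getenv("GROQ_API_KEY")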
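
Before creating any clients, it can help to confirm that both keys were actually loaded. The block below is a minimal sketch: the helper name `describe_key` is hypothetical and not part of the original notebook, and it prints only a short prefix so the full key never appears in the cell output.

```python
import os

def describe_key(name, env=None):
    """Return a short status line for an API key without revealing it in full."""
    # `env` defaults to the process environment; passing a dict makes the helper easy to test.
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        return f"{name} is not set"
    return f"{name} is set and begins with {value[:4]}"

for key_name in ("GEMINI_API_KEY", "GROQ_API_KEY"):
    print(describe_key(key_name))
```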

In [3]:
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

In [4]:
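# Gemini is reached through Google's OpenAI-compatible endpoint,
# so the standard OpenAI client works with a different base_url
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
model_name = "gemini-2.0-flash"

# Ask the model to generate the question that every competitor will answer
response = gemini.chat.completions.create(model=model_name, messages=messages)
question = response.choices[0].message.content

display(Markdown(question))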


Imagine a world where subjective experiences can be perfectly quantified and reliably transferred between individuals. If you could experience any other person's existence for 24 hours, knowing you would retain the memory but return to your own body and life afterward, whose life would you choose and why? Consider not just fame, fortune, or happiness, but also hardship, moral complexity, and historical impact.


In [5]:
competitors = []
answers = []
messages = [{"role": "user", "content": question}]

In [6]:
# First competitor: a different Gemini model answers the generated question
model_name = "gemini-2.5-flash"

response = gemini.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

competitors.append(model_name)
answers.append(answer)

In [7]:
# Second competitor: Llama 3.3 70B served by Groq through its OpenAI-compatible endpoint
groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
model_name = "llama-3.3-70b-versatile"

response = groq.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

If I could experience any other person's existence for 24 hours, I would choose to walk in the shoes of Malala Yousafzai, the Pakistani activist for girls' education and the youngest Nobel Prize laureate. Here's why:

1. **Unwavering courage and resilience**: Malala's life is a testament to her extraordinary bravery and determination. She has faced numerous death threats, survived a Taliban assassination attempt, and continued to advocate for girls' education despite the risks. I want to understand what drives her courage and how she maintains her resolve in the face of adversity.
2. **Unique perspective on education and equality**: As a champion of girls' education, Malala has a deep understanding of the importance of education in empowering individuals and communities. I would like to experience her thoughts and feelings on the value of education, the challenges she has faced in promoting it, and the impact it has had on her life and the lives of others.
3. **Cultural and historical context**: Malala's life is deeply rooted in Pakistani culture and the complexities of the region. I would like to experience the daily life of a Pashtun woman in the Swat Valley, with all its traditions, customs, and challenges. This would give me a unique insight into the historical and cultural context of the region and the ways in which it shapes individual experiences.
4. **Moral complexity and nuance**: Malala's story is not just one of triumph, but also of struggle and controversy. She has faced criticism and backlash from some quarters, and I would like to understand the moral complexities and trade-offs she has had to navigate in her advocacy work. How does she balance her commitment to education with the need to protect herself and her family?
5. **Empathy and compassion**: By experiencing Malala's life, I hope to gain a deeper understanding of the struggles faced by girls and women in patriarchal societies. I want to feel the weight of her responsibilities, the fear and the hope, and the resilience that drives her to continue fighting for a better future.
6. **Inspiration and personal growth**: Malala's story is a powerful reminder of the impact one person can have on the world. By experiencing her life, I hope to be inspired by her example and to gain a new perspective on my own life and priorities. I want to understand what drives her to make sacrifices for the greater good and how I can apply those lessons to my own life.

For 24 hours, I would like to wake up in Malala's body, to see the world through her eyes, and to experience the beauty and the brutality of her reality. I would like to feel the sun on my face as I walk to school, to taste the food she eats, and to hear the sounds of her community. I would like to attend meetings with world leaders, to speak at rallies, and to advocate for girls' education. I would like to understand the intricacies of her relationships with her family, friends, and allies.

After 24 hours, I would return to my own body and life, but I would carry with me the memories and lessons of Malala's existence. I would be reminded of the power of courage, resilience, and compassion, and I would strive to apply those qualities to my own life and work.
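
Cells In [6] and In [7] repeat the same steps for each provider: send the messages, read the reply, append the model name and the answer. One way to avoid the repetition is to loop over the providers. This is only a sketch: `collect_answers` and the `ask` callables are hypothetical names, and stub functions stand in for the real API calls so the block runs without network access.

```python
def collect_answers(providers, messages):
    """Ask every provider the same messages; return parallel lists of names and answers."""
    names, replies = [], []
    for model_name, ask in providers:
        # `ask` is any callable that takes the message list and returns the reply text.
        reply = ask(messages)
        names.append(model_name)
        replies.append(reply)
    return names, replies

# Stub providers for illustration only; they do not contact any API.
demo_providers = [
    ("model-a", lambda msgs: "Answer A to: " + msgs[-1]["content"]),
    ("model-b", lambda msgs: "Answer B to: " + msgs[-1]["content"]),
]
demo_competitors, demo_answers = collect_answers(
    demo_providers, [{"role": "user", "content": "Why?"}]
)
print(demo_competitors)
```

In the notebook itself, each `ask` could wrap the call already used above, for example `lambda msgs: gemini.chat.completions.create(model="gemini-2.5-flash", messages=msgs).choices[0].message.content`.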

In [9]:
# zip returns a one-shot iterator: it can be consumed only once
a = zip(competitors, answers)

In [12]:
list(a)

[('gemini-2.5-flash',
  'This is a fascinating and profoundly challenging thought experiment. Given the ability to perfectly quantify and transfer subjective experience, my choice would be to experience the existence of **Viktor Frankl, specifically during a pivotal 24-hour period within a Nazi concentration camp.**\n\nHere\'s why:\n\n1.  **Hardship Beyond Comprehension:** To truly grasp the unfathomable depths of human suffering, degradation, and the systematic attempt to strip away individuality and meaning. It\'s not just about witnessing it from the outside; it\'s about *feeling* the hunger, the cold, the fear, the constant threat of death, the loss of loved ones, the despair. This experience would be a crucible of the human spirit.\n\n2.  **Moral Complexity in the Extreme:** Frankl\'s experience was saturated with moral dilemmas – not just his own, but witnessing the choices made by fellow prisoners, the kapos, and the guards. Who shared a crust of bread? Who stole? Who maintained
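
Because `zip` returns an iterator, calling `list(a)` a second time would give an empty list. The short sketch below shows this with placeholder data (the `*_demo` names and strings are made up for illustration) and one way around it.

```python
competitors_demo = ["model-a", "model-b"]
answers_demo = ["first answer", "second answer"]

pairs = zip(competitors_demo, answers_demo)
first_pass = list(pairs)   # consumes the iterator
second_pass = list(pairs)  # nothing left to yield

print(first_pass)
print(second_pass)

# Materialize the pairs once if they are needed more than once.
pairs_list = list(zip(competitors_demo, answers_demo))
```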

In [19]:
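# Concatenate all answers into one text block, labeled by 1-based competitor number
# (the judge sees numbers, not model names)
together = ""
for index, answer in enumerate(answers):
    together += f"Response from Competitor {index+1}:\n\n"
    together += answer + "\n\n"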

In [20]:
print(together)

Response from Competitor 1:

This is a fascinating and profoundly challenging thought experiment. Given the ability to perfectly quantify and transfer subjective experience, my choice would be to experience the existence of **Viktor Frankl, specifically during a pivotal 24-hour period within a Nazi concentration camp.**

Here's why:

1.  **Hardship Beyond Comprehension:** To truly grasp the unfathomable depths of human suffering, degradation, and the systematic attempt to strip away individuality and meaning. It's not just about witnessing it from the outside; it's about *feeling* the hunger, the cold, the fear, the constant threat of death, the loss of loved ones, the despair. This experience would be a crucible of the human spirit.

2.  **Moral Complexity in the Extreme:** Frankl's experience was saturated with moral dilemmas – not just his own, but witnessing the choices made by fellow prisoners, the kapos, and the guards. Who shared a crust of bread? Who stole? Who maintained their

In [21]:
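# LLM evaluator: build the judge prompt.
# Doubled braces {{ }} in the f-string produce the literal braces of the JSON format example.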

judge = f"""You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""


In [22]:
print(judge)

You are judging a competition between 2 competitors.
Each model has been given this question:

Imagine a world where subjective experiences can be perfectly quantified and reliably transferred between individuals. If you could experience any other person's existence for 24 hours, knowing you would retain the memory but return to your own body and life afterward, whose life would you choose and why? Consider not just fame, fortune, or happiness, but also hardship, moral complexity, and historical impact.


Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}

Here are the responses from each competitor:

Response from Competitor 1:

This is a fascinating and profoundly challenging thought experiment. Given the ability to perfectly quantify and transfer sub
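
The printed prompt shows single braces around the JSON format line even though the source uses `{{` and `}}`. A small stand-alone sketch of that f-string escaping rule, using a made-up two-line template:

```python
n = 2
template = f"""There are {n} competitors.
Reply in this format: {{"results": ["1", "2"]}}"""

# {n} is substituted; {{ and }} collapse to literal { and }.
print(template)
```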

In [23]:
judge_messages = [{"role": "user", "content": judge}]

In [24]:
# Ask the judge model for the ranking; the reply is expected to be a bare JSON string
model_name = "gemini-2.0-flash"
response = gemini.chat.completions.create(model=model_name, messages=judge_messages)
results = response.choices[0].message.content
print(results)

{"results": ["1", "2"]}


In [26]:
# json.loads raises json.JSONDecodeError if the judge returned anything other than valid JSON
results_dict = json.loads(results)
print(results_dict)

{'results': ['1', '2']}
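
Here the judge returned bare JSON, as the prompt requested. If a model wraps its reply in a Markdown code fence anyway, `json.loads` would fail. The following is a defensive sketch, not part of the original notebook; `parse_judge_reply` is a hypothetical helper that strips one surrounding fence before parsing.

```python
import json

FENCE = "`" * 3  # three backticks, built this way so the example is easy to embed

def parse_judge_reply(text):
    """Parse the judge's reply as JSON, tolerating one surrounding Markdown code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        lines = cleaned.splitlines()[1:]          # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:  # drop the closing fence, if present
            lines = lines[:-1]
        cleaned = "\n".join(lines)
    return json.loads(cleaned)

print(parse_judge_reply('{"results": ["1", "2"]}'))
print(parse_judge_reply(FENCE + 'json\n{"results": ["2", "1"]}\n' + FENCE))
```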


In [27]:
# The judge returns 1-based competitor numbers as strings; convert to 0-based list indices
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    competitor = competitors[int(result)-1]
    print(f"Rank {index+1}: {competitor}")

Rank 1: gemini-2.5-flash
Rank 2: llama-3.3-70b-versatile
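
The loop above trusts the judge's numbers. A reply such as `"0"` would become index -1 and silently select the last competitor, and a duplicated or missing number would go unnoticed. The sketch below adds a check under the assumption that a valid ranking lists every competitor exactly once; `validate_ranking` is a hypothetical helper, not part of the original notebook.

```python
def validate_ranking(ranks, n_competitors):
    """Return 0-based indices in ranked order, or raise ValueError if the ranking is malformed."""
    indices = [int(r) - 1 for r in ranks]  # int() raises ValueError on non-numeric entries
    if sorted(indices) != list(range(n_competitors)):
        raise ValueError(f"Expected each of 1..{n_competitors} exactly once, got {ranks}")
    return indices

print(validate_ranking(["1", "2"], 2))
```

In the notebook this could be called as `validate_ranking(ranks, len(competitors))` before printing the ranks.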
