In [1]:
import os
from typing import List, Dict
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

load_dotenv(override=True)

True
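The `Model` instances below read four variables from `.env` (`GEMINI_API_KEY`, `GEMINI_BASE_URL`, `OLLAMA_API_KEY`, `OLLAMA_BASE_URL`) and silence the type checker with `# type: ignore`. As a result, a missing value can surface later as a confusing client error, or as requests going to the client's default endpoint when a base URL is unset. The following is a minimal fail-fast sketch that assumes you want to stop at startup instead. `require_env` is a hypothetical helper; it is not part of `python-dotenv`.

```python
import os
from typing import Dict, Iterable


def require_env(names: Iterable[str]) -> Dict[str, str]:
    """Return the requested environment variables, raising if any is unset or empty."""
    values = {name: os.getenv(name, "") for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        # Fail before any client is constructed, naming every missing variable at once
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values
```

To use it, call `config = require_env(["GEMINI_API_KEY", "GEMINI_BASE_URL", "OLLAMA_API_KEY", "OLLAMA_BASE_URL"])` and pass `config["GEMINI_API_KEY"]` and the other values to `Model`. Because every value is then a plain `str`, the `# type: ignore` comments are no longer needed.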

In [None]:
class Model:
    def __init__(self, name: str, api_key: str, base_url: str, model: str):
        self.name = name
        self.model = model
        self.messages: List[Dict[str, str]] = []  # running chat history sent with every request
        self.client = OpenAI(base_url=base_url, api_key=api_key)

    def make_request(self, prompt: str) -> str:
        self.messages.append({"role": "user", "content": prompt})

        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=self.messages
            )

            answer = response.choices[0].message.content or ""
            self.messages.append({"role": "assistant", "content": answer})
            return answer

        except Exception as e:
            # Drop the unanswered prompt so a failed call does not linger in the history
            self.messages.pop()
            return f"Error making request: {e}"

    def clear_messages(self) -> None:
        self.messages.clear()


def main():
    gemini = Model("gemini", os.getenv("GEMINI_API_KEY"), os.getenv("GEMINI_BASE_URL"), "gemini-2.5-flash")  # type: ignore
    ollama = Model("ollama", os.getenv("OLLAMA_API_KEY"), os.getenv("OLLAMA_BASE_URL"), "gemma3:12b")        # type: ignore

    prompt = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
    prompt += "Please respond only with the question, no explanation."

    # Generate the question using Gemini
    question = gemini.make_request(prompt)
    print(f"Question generated by Gemini: {question}\n\n")

    # Start fresh so Gemini answers the question without the question-generation exchange in context
    gemini.clear_messages()

    # display() renders the Markdown and returns None, so call it separately from print()
    ollama_answer = ollama.make_request(question)
    print("Ollama answer:")
    display(Markdown(ollama_answer))

    gemini_answer = gemini.make_request(question)
    print("Gemini answer:")
    display(Markdown(gemini_answer))

    answers = {
        "ollama": ollama_answer,
        "gemini": gemini_answer 
    }

    comparison_prompt = f"Please choose which model has the best answer to the question:\n{question}\n\n"
    for model, answer in answers.items():
        comparison_prompt += f"# {model}\nanswer: {answer}\n\n"
    comparison_prompt += "Which is better and why?"

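    # Judge in a fresh conversation; the prompt already contains the question and both answers
    gemini.clear_messages()
    result = gemini.make_request(comparison_prompt)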
    display(Markdown(result))

if __name__ == "__main__":
    main()


Question generated by Gemini: A benevolent superintelligence offers humanity a guaranteed future free of all suffering, disease, and want. The catch is that achieving this state requires it to gently, imperceptibly, and irrevocably modify the collective human psyche to eliminate the capacity for extreme emotional states—both negative (e.g., profound grief, despair) and positive (e.g., intense ecstasy, passionate love, overwhelming awe)—along with the drive for unconstrained individual artistic creation and existential striving. As an independent, dispassionate arbiter, how would you evaluate the net value of accepting this offer for the future of humanity, detailing the specific gains and losses across philosophical, ethical, psychological, social, and evolutionary dimensions?
Ollama answer: Okay, this is a truly monumental ethical dilemma. Let's break down the evaluation of this benevolent superintelligence's offer, considering the gains and losses across multiple dimensions, and then

The Gemini model provides the better answer. Here's why:

1.  **Superior Structure and Clarity:** Gemini's answer is impeccably structured with clear headings for "Analysis of Gains," "Analysis of Losses," and "Net Value Evaluation," each further broken down by dimension. This makes it incredibly easy to follow and ensures all parts of the prompt are addressed systematically. Ollama's answer also uses headings by dimension, but integrates gains and losses within those sections, which is less clear than Gemini's separate "Gains" and "Losses" sections. For a "net value" assessment, clearly detailing both sides first is crucial.

2.  **Comprehensive Detailing of Gains:** Gemini excels in explicitly detailing the gains across each dimension. It dedicates significant space to outlining the immense benefits, such as "Maximization of Well-being (Utilitarian)," "Global Peace and Stability," "Constant Contentment," and "Guaranteed Survival." This provides a necessary balance for a dispassionate arbiter to truly weigh the "net value." Ollama touches upon the gains more implicitly within its discussion of losses or as a preliminary acknowledgment, rather than giving them a dedicated, detailed breakdown.

3.  **Depth and Nuance in Arguments:** Both models present strong arguments, but Gemini's points often feel slightly more refined and comprehensively articulated. For example, in its philosophical losses, Gemini adds "Absence of Virtue through Adversity," and in ethical losses, "Diminishment of Human Dignity" and "Benevolent Tyranny," which are potent additional layers to the argument. Its "core dilemma" framing in the Net Value section is also very effective.

4.  **Dispassionate Tone and Strong Conclusion:** Both maintain a dispassionate tone well. However, Gemini's conclusion is particularly strong, eloquently summarizing the dilemma and using evocative phrases like "existential lobotomy" and "lose its soul, its purpose, and its very capacity for self-transcendence" to underscore the profound nature of the losses. It clearly and powerfully states its "negative" net value evaluation.

While Ollama's answer is good and covers all the required points, Gemini's superior structure, more explicit and detailed analysis of gains, and slightly more nuanced arguments make it the more complete and impactful evaluation for this complex ethical dilemma.
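The judge prompt assembled inside `main()` can be factored into a pure function. Its layout can then be checked, and more models added to `answers`, without calling either API. This is a sketch that assumes the same `# <model>` / `answer:` layout used above. `build_comparison_prompt` is a hypothetical name.

```python
from typing import Dict


def build_comparison_prompt(question: str, answers: Dict[str, str]) -> str:
    """Assemble the judge prompt: the question, then one '# <model>' section per answer."""
    parts = [f"Please choose which model has the best answer to the question:\n{question}\n\n"]
    for model, answer in answers.items():
        # Dict insertion order determines the order in which answers are presented
        parts.append(f"# {model}\nanswer: {answer}\n\n")
    parts.append("Which is better and why?")
    return "".join(parts)
```

Collecting the pieces in a list and joining them once avoids repeated string concatenation. The call site then reduces to `result = gemini.make_request(build_comparison_prompt(question, answers))`.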