<a href="https://colab.research.google.com/github/novacellus/workshop_llms_25/blob/main/notebooks/Talking_to_the_LLM_with_Prompts.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setting up the environment
This hands-on session will teach you how to communicate with Large Language Models through prompt engineering. We will cover prompt components and experiment with different prompt structures, but first we need to do some setup.
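The setup in this notebook reads the API key from Colab's secret store. If you run it outside Colab, `google.colab` is not available. The sketch below shows one possible fallback to an environment variable; the helper name `load_api_key` is hypothetical and not part of the workshop code.

```python
import os

def load_api_key(name="OPENAI_API_KEY"):
    """Return the API key from Colab secrets if available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            return userdata.get(name)
        except Exception:
            # Secret missing or notebook access not granted: fall through to the environment
            pass

    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} was found neither in Colab secrets nor in the environment.")
    return key

# Usage (assumes the key has been stored beforehand):
# client = OpenAI(api_key=load_api_key())
```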

## Defining the API query function

In [1]:
from openai import OpenAI
from google.colab import userdata
api_key = userdata.get('OPENAI_API_KEY') # read in the secret API key to a variable

client = OpenAI(api_key=api_key)

In [2]:
# Selected models (prices in USD per 1M input / output tokens)
MODELS = [
    "gpt-4.1",       # "Flagship GPT model for complex tasks": $2.00 / $8.00
    "gpt-4.1-nano",  # "Fastest, most cost-effective GPT-4.1 model": $0.10 / $0.40
]

def query_models(models, prompt, system_prompt="You are a helpful assistant.", max_tokens=250):
    """
    Query multiple models with the same prompt

    Args:
        models (list): List of model names to use
        prompt (str): The user prompt
        system_prompt (str): The system prompt
        max_tokens (int): Maximum response length

    Returns:
        dict: Results with model names as keys and responses as values
    """
    # Print the prompt
    print(f"PROMPT:\n{prompt}\n")
    print(f"SYSTEM:\n{system_prompt}\n")
    print("-" * 50)

    results = {}

    for model in models:
        print(f"\nQuerying {model}...")

        try:
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": prompt}
                ],
                max_tokens=max_tokens,
                temperature=0.2  # Lower temperature for more consistent results
            )

            response_text = response.choices[0].message.content

            # Print model name and response
            print(f"\nMODEL: {model}")
            print(f"RESPONSE:\n{response_text}")
            print("-" * 50)

            results[model] = response_text

        except Exception as e:
            print(f"\nMODEL: {model}")
            print(f"ERROR: {str(e)}")
            print("-" * 50)
            results[model] = f"Error: {str(e)}"

    return results

# Simplified wrapper function that matches the expected name in your notebook
def call_llm_and_get_answer(models, prompt, system_prompt="You are a helpful assistant.", max_tokens=400):
    """
    Query multiple models with the same prompt and print each response.

    Args:
        models (list): List of model names to use
        prompt (str): The user prompt
        system_prompt (str): The system prompt
        max_tokens (int): Maximum response length

    Returns:
        None: responses are printed as they arrive. Uncomment the final
        return statement to also get them back as one formatted string.
    """
    print(f"PROMPT:\n{prompt}\n")
    print(f"SYSTEM:\n{system_prompt}\n")
    print("-" * 50)

    all_responses = []

    for model in models:
        print(f"\nQuerying {model}...")

        try:
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": prompt}
                ],
                max_tokens=max_tokens,
                temperature=0.2
            )

            response_text = response.choices[0].message.content

            # Print model name and response
            print(f"\nMODEL: {model}")
            print(f"RESPONSE:\n{response_text}")
            print("-" * 50)

            all_responses.append(f"MODEL: {model}\n\n{response_text}\n\n{'='*40}\n")

        except Exception as e:
            print(f"\nMODEL: {model}")
            print(f"ERROR: {str(e)}")
            print("-" * 50)
            all_responses.append(f"MODEL: {model}\n\nError: {str(e)}\n\n{'='*40}\n")

    # Return all responses formatted as a single string
    #return "\n".join(all_responses)

Let's now test the `call_llm_and_get_answer(models, prompt, system_prompt)` function.

In [3]:
call_llm_and_get_answer(models=MODELS, prompt="What's the capital of Italy?")

PROMPT:
What's the capital of Italy?

SYSTEM:
You are a helpful assistant.

--------------------------------------------------

Querying gpt-4.1...

MODEL: gpt-4.1
RESPONSE:
The capital of Italy is Rome.
--------------------------------------------------

Querying gpt-4.1-nano...

MODEL: gpt-4.1-nano
RESPONSE:
The capital of Italy is Rome.
--------------------------------------------------


Notice that, apart from `prompt`, we're passing a `system_prompt` argument in our API call. By default it is set to "You are a helpful assistant." The system prompt establishes the LLM's 'persona' and the overall context for the conversation, allowing us to define expertise and behavior patterns before presenting the specific query.
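To make the two roles concrete, here is a minimal sketch of the message list that is sent to the API. The helper name `build_messages` is hypothetical; it only mirrors the `messages=[...]` argument used inside `call_llm_and_get_answer` and makes no API call.

```python
def build_messages(prompt, system_prompt="You are a helpful assistant."):
    """Return the chat messages in the order the API receives them."""
    return [
        {"role": "system", "content": system_prompt},  # persona and overall context
        {"role": "user", "content": prompt},           # the specific query
    ]

# Illustrative persona; any other system prompt can be substituted
messages = build_messages(
    "What's the capital of Italy?",
    system_prompt="You are a tour guide who answers in one short sentence.",
)
for message in messages:
    print(f"{message['role']:>6}: {message['content']}")
```

Changing only the `system` entry while keeping the `user` entry fixed is the simplest way to isolate the effect of the persona.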


In [4]:
# YOUR TURN: Modify the system_prompt argument (and prompt itself if necessary)
# to test how it impacts the results.
call_llm_and_get_answer(MODELS, prompt="What's the capital of Italy?",
                        system_prompt="You are a helpful assistant.")

PROMPT:
What's the capital of Italy?

SYSTEM:
You are a helpful assistant.

--------------------------------------------------

Querying gpt-4.1...

MODEL: gpt-4.1
RESPONSE:
The capital of Italy is Rome.
--------------------------------------------------

Querying gpt-4.1-nano...

MODEL: gpt-4.1-nano
RESPONSE:
The capital of Italy is Rome.
--------------------------------------------------


# Exercise. Prompt Component Analysis: What Elements Matter Most?

### Define Prompt Components

In [5]:
MODELS = [
    "gpt-4.1",      # "Flagship GPT model for complex tasks": $2.00 / $8.00
    "gpt-4.1-nano",    # "Fastest, most cost-effective GPT-4.1 model": $0.10 / $0.40
]

Below is a list of sample prompt components. Play with the `prompt` and `system_prompt` arguments and observe how the composition of a prompt influences its result.
1. Open a separate file in which you will document the experiment. Start by defining the results you expect.
1. Start with a basic structure and keep adding components.
1. After each iteration, document the prompt used, the model, and the output.
1. Evaluate whether the results fit your expected outcome. Which structure yielded the best results?
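One possible way to keep the experiment log described in the steps above is to append each run to a JSON Lines file. This is only a sketch: the function names `log_experiment` and `load_experiments`, the file name, and the field names are assumptions, not part of the workshop material.

```python
import json
import os
import tempfile
from datetime import datetime, timezone

def log_experiment(path, model, prompt, system_prompt, output, expected=""):
    """Append one experiment run to a JSON Lines file and return the record."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "system_prompt": system_prompt,
        "prompt": prompt,
        "expected": expected,
        "output": output,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record

def load_experiments(path):
    """Read all logged runs back as a list of dictionaries."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Demo in a temporary directory; in practice, pick a path you want to keep
log_path = os.path.join(tempfile.mkdtemp(), "prompt_experiments.jsonl")
log_experiment(
    log_path,
    model="gpt-4.1",
    prompt="instruction + data",
    system_prompt="",
    output="(paste the model output here)",
    expected="negative",
)
runs = load_experiments(log_path)
print(len(runs), runs[0]["model"])
```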

In [6]:
persona = "You are an expert in Latin philology and sentiment analysis."
instruction = "Classify the sentiment of the following Latin sentence as positive, negative, or neutral. Identify a maximum of 2 words that contribute most to this sentiment, define their grammatical properties, translate them into English, and provide etymology."
context = "This analysis is part of a workshop on hate speech detection in ancient texts."
format_spec = """Provide the response in the following format:
Overall Sentiment: [positive, negative, neutral]
Word 1:
   - Grammatical Properties:
   - English Translation:
   - Etymology:
   - Contribution to Sentiment:
Word 2:
   - Grammatical Properties:
   - English Translation:
   - Etymology:
   - Contribution to Sentiment:
Explanation: [Brief justification of overall sentiment classification]
"""
audience = "The level of detail should be appropriate for post-graduate students."
tone = "Use formal academic English."
data = "Sentences to analyze: quo usque tandem abutere, Catilina, patientia nostra? quam diu etiam furor iste tuus nos eludet?"

In [7]:
# BASELINE TEST: Just instruction + data
prompt = instruction + " " + data
system_prompt = ""
print(f"Prompt: {prompt}")
print(f"System prompt: {system_prompt}")
baseline_result = call_llm_and_get_answer(models=MODELS, prompt=prompt, system_prompt=system_prompt)
print("="*80)

Prompt: Classify the sentiment of the following Latin sentence as positive, negative, or neutral. Identify a maximum of 2 words that contribute most to this sentiment, define their grammatical properties, translate them into English, and provide etymology. Sentences to analyze: quo usque tandem abutere, Catilina, patientia nostra? quam diu etiam furor iste tuus nos eludet?
System prompt: 
PROMPT:
Classify the sentiment of the following Latin sentence as positive, negative, or neutral. Identify a maximum of 2 words that contribute most to this sentiment, define their grammatical properties, translate them into English, and provide etymology. Sentences to analyze: quo usque tandem abutere, Catilina, patientia nostra? quam diu etiam furor iste tuus nos eludet?

SYSTEM:


--------------------------------------------------

Querying gpt-4.1...

MODEL: gpt-4.1
RESPONSE:
### Sentiment Classification
**Negative**

### Key Words Contributing to Sentiment

#### 1. **abutere**
- **Grammatical pro

In [8]:
# YOUR TURN: Design the prompt structure and test it
# You may want to test one of the following:
# TEST 1: Adding persona as system prompt (= domain expertise)
# TEST 2: Adding format specification (= structured output)
# TEST 3: Complete prompt structure (= all components)
# TEST 4: Only Format + System Persona
# TEST 5: Changing the context to different fields (e.g., "advanced linguistics seminar")

print("\n## TEST 1: Adding")
prompt = instruction + " " + data
system_prompt = ""
print(f"Prompt: {prompt}")
print(f"System prompt: {system_prompt}")
test1_result = call_llm_and_get_answer(MODELS, prompt=prompt, system_prompt=system_prompt)


## TEST 1: Adding
Prompt: Classify the sentiment of the following Latin sentence as positive, negative, or neutral. Identify a maximum of 2 words that contribute most to this sentiment, define their grammatical properties, translate them into English, and provide etymology. Sentences to analyze: quo usque tandem abutere, Catilina, patientia nostra? quam diu etiam furor iste tuus nos eludet?
System prompt: 
PROMPT:
Classify the sentiment of the following Latin sentence as positive, negative, or neutral. Identify a maximum of 2 words that contribute most to this sentiment, define their grammatical properties, translate them into English, and provide etymology. Sentences to analyze: quo usque tandem abutere, Catilina, patientia nostra? quam diu etiam furor iste tuus nos eludet?

SYSTEM:


--------------------------------------------------

Querying gpt-4.1...

MODEL: gpt-4.1
RESPONSE:
**Sentiment Classification:**  
**Negative**

---

### 1. **Key Words Contributing to Sentiment**

#### 

Analyze the results:

1. Compare the outputs: Look for differences in detail, accuracy, format adherence, and tone
2. Evaluate which components had the biggest impact on quality
3. Note whether different models respond differently to the same prompt components
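Format adherence can also be checked mechanically. The sketch below assumes the field labels from `format_spec` and reports which of them are missing from a response. The helper `format_adherence` and the sample response are made up for illustration; they are not real model output.

```python
EXPECTED_LABELS = [
    "Overall Sentiment:",
    "Word 1:",
    "Word 2:",
    "Grammatical Properties:",
    "English Translation:",
    "Etymology:",
    "Contribution to Sentiment:",
    "Explanation:",
]

def format_adherence(response, labels=EXPECTED_LABELS):
    """Return (share of labels found, list of missing labels), ignoring case."""
    text = response.lower()
    missing = [label for label in labels if label.lower() not in text]
    return (len(labels) - len(missing)) / len(labels), missing

# Hypothetical, deliberately incomplete response
sample_response = """Overall Sentiment: negative
Word 1:
   - Grammatical Properties: verb
   - English Translation: to abuse
Explanation: The speaker reproaches the addressee."""

score, missing = format_adherence(sample_response)
print(f"{score:.0%} of labels present; missing: {missing}")
```

A check like this only tells you whether the structure is there, not whether the content is correct, so it complements rather than replaces reading the outputs.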

Once you're done, design a prompt structure that would work best for your specific research needs.


In [None]:
# YOUR TURN: Design the prompt structure and test it

custom_prompt = "YOUR CUSTOM PROMPT HERE"
custom_system = "YOUR CUSTOM SYSTEM PROMPT HERE"

print(f"Prompt: {custom_prompt}")
print(f"System prompt: {custom_system}")

custom_result = call_llm_and_get_answer(MODELS, prompt=custom_prompt, system_prompt=custom_system)

Prompt: YOUR CUSTOM PROMPT HERE
System prompt: YOUR CUSTOM SYSTEM PROMPT HERE
PROMPT:
YOUR CUSTOM PROMPT HERE

SYSTEM:
YOUR CUSTOM SYSTEM PROMPT HERE

--------------------------------------------------

Querying gpt-4.1...

MODEL: gpt-4.1
RESPONSE:
Hello! How can I assist you today?
--------------------------------------------------

Querying gpt-4.1-nano...

MODEL: gpt-4.1-nano
RESPONSE:
Hello! How can I assist you today?
--------------------------------------------------


### Meta-Prompting
LLMs have knowledge about their own capabilities and limitations. We can ask them to design effective prompts for specific tasks.

This will help:
1. Discover useful prompt structures
2. Save time
3. Generate prompt templates for recurrent tasks
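For recurrent tasks, the meta-prompt itself can be wrapped in a small template function. A minimal sketch, assuming the wording of the meta-prompt used in this section; `build_meta_prompt` and its `domain` parameter are hypothetical names.

```python
def build_meta_prompt(task, domain="linguistic analysis"):
    """Wrap a task description in a request for a system prompt and a user prompt."""
    task = task.strip()
    if not task:
        raise ValueError("Describe the task before asking for a prompt design.")
    return (
        "You are an expert in prompt engineering who specializes in "
        f"optimizing prompts for {domain}.\n\n"
        "I need your help designing the most effective prompt structure "
        "for the following task:\n\n"
        f"{task}\n\n"
        "Please design:\n"
        "1. A system prompt that establishes the right expertise and context\n"
        "2. A user prompt with all necessary components (instruction, format, etc.)\n"
    )

meta = build_meta_prompt("I need to classify the sentiment of Latin sentences.")
print(meta)
```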

In [9]:
# Template: replace the placeholders with your own task description
my_task = """
I need to YOUR INPUT HERE. The sentences I'm analyzing come from YOUR INPUT HERE.
"""

# Worked example: this assignment overrides the template above
my_task = """
I need to classify the sentiment of Latin sentences, identify the top 2 words contributing
to this sentiment, define their grammatical properties, provide English translations,
and give etymology information. The sentences I'm analyzing come from Cicero's
orations against Catiline.
"""

meta_prompt = f"""
You are an expert in prompt engineering who specializes in optimizing prompts for linguistic analysis.

I need your help designing the most effective prompt structure for the following task:

{my_task}

Please design:
1. A system prompt that establishes the right expertise and context
2. A user prompt with all necessary components (instruction, format, etc.)

For each component you include, briefly explain why it's important for this specific task.
After providing your recommended prompt structure, explain which components you believe
will have the biggest impact on result quality and why.
"""

print("\nMETA-PROMPT:")
print(meta_prompt)
print("\nGetting prompt recommendations from the model...")

meta_result = call_llm_and_get_answer(models=MODELS, prompt=meta_prompt, system_prompt="")


META-PROMPT:

You are an expert in prompt engineering who specializes in optimizing prompts for linguistic analysis.

I need your help designing the most effective prompt structure for the following task:


I need to classify the sentiment of Latin sentences, identify the top 2 words contributing
to this sentiment, define their grammatical properties, provide English translations,
and give etymology information. The sentences I'm analyzing come from Cicero's
orations against Catiline.


Please design:
1. A system prompt that establishes the right expertise and context
2. A user prompt with all necessary components (instruction, format, etc.)

For each component you include, briefly explain why it's important for this specific task.
After providing your recommended prompt structure, explain which components you believe
will have the biggest impact on result quality and why.


Getting prompt recommendations from the model...
PROMPT:

You are an expert in prompt engineering who specializ

Let's now test the LLM-designed prompt. Analyze the response above and copy the user prompt and system prompt into the respective variables. Don't forget to provide the sentence to analyze!

In [10]:
# YOUR TURN

llm_system_prompt = """You are an expert in Latin linguistics, classical philology, and sentiment analysis, specializing in the works of Cicero. You have advanced knowledge of Latin grammar, vocabulary, and etymology, as well as the ability to analyze sentiment in historical texts. You provide detailed, accurate, and scholarly responses, citing grammatical properties and etymological origins with precision."""

llm_user_prompt = """
Analyze the following Latin sentence from Cicero’s orations against Catiline:

Sentence: quo usque tandem abutere, Catilina, patientia nostra?

For this sentence, please:

1. **Classify the overall sentiment** (e.g., positive, negative, neutral) and briefly justify your classification.
2. **Identify the top 2 words** that most strongly contribute to this sentiment.
3. For each of these 2 words, provide:
    - The word in its original form as it appears in the sentence.
    - Its lemma (dictionary form).
    - Part of speech and grammatical properties (e.g., case, number, gender, tense, mood, voice, etc.).
    - An English translation in this context.
    - A brief etymology (origin and historical development).

**Format your answer as follows:**

**Sentiment:** [Positive/Negative/Neutral]
**Justification:**
"""

print(f"LLM-designed system prompt: {llm_system_prompt}")
print(f"LLM-designed user prompt: {llm_user_prompt}")

# Test the LLM-designed prompt
llm_designed_result = call_llm_and_get_answer(MODELS, prompt=llm_user_prompt, system_prompt=llm_system_prompt)

LLM-designed system prompt: You are an expert in Latin linguistics, classical philology, and sentiment analysis, specializing in the works of Cicero. You have advanced knowledge of Latin grammar, vocabulary, and etymology, as well as the ability to analyze sentiment in historical texts. You provide detailed, accurate, and scholarly responses, citing grammatical properties and etymological origins with precision.
LLM-designed user prompt: 
Analyze the following Latin sentence from Cicero’s orations against Catiline:

Sentence: quo usque tandem abutere, Catilina, patientia nostra?

For this sentence, please:

1. **Classify the overall sentiment** (e.g., positive, negative, neutral) and briefly justify your classification.
2. **Identify the top 2 words** that most strongly contribute to this sentiment.
3. For each of these 2 words, provide:
    - The word in its original form as it appears in the sentence.
    - Its lemma (dictionary form).
    - Part of speech and grammatical properties 

In [11]:
# YOUR TURN

llm_system_prompt = """HERE GOES THE SYSTEM PROMPT SUGGESTED ABOVE"""

llm_user_prompt = """HERE GOES THE USR PROMPT SUGGESTED ABOVE"""

print(f"LLM-designed system prompt: {llm_system_prompt}")
print(f"LLM-designed user prompt: {llm_user_prompt}")

# Test the LLM-designed prompt
llm_designed_result = call_llm_and_get_answer(MODELS, prompt=llm_user_prompt, system_prompt=llm_system_prompt)

LLM-designed system prompt: HERE GOES THE SYSTEM PROMPT SUGGESTED ABOVE
LLM-designed user prompt: HERE GOES THE USR PROMPT SUGGESTED ABOVE
PROMPT:
HERE GOES THE USR PROMPT SUGGESTED ABOVE

SYSTEM:
HERE GOES THE SYSTEM PROMPT SUGGESTED ABOVE

--------------------------------------------------

Querying gpt-4.1...

MODEL: gpt-4.1
RESPONSE:
Hello! It looks like you entered placeholder text ("HERE GOES THE USR PROMPT SUGGESTED ABOVE"). How can I assist you today? If you have a specific question or task, please let me know!
--------------------------------------------------

Querying gpt-4.1-nano...

MODEL: gpt-4.1-nano
RESPONSE:
Hello! How can I assist you today?
--------------------------------------------------


How does the LLM-designed prompt compare to the one you have just created? What insights about prompt engineering did you gain from this exercise? How does the response affect your understanding of the topic?


# Exercise. Learning by example

Let's explore how providing examples affects an LLM's ability to analyze text.
We'll use a famous passage from Cicero's First Catilinarian Oration.

In [12]:
# First sentences of Cicero's First Catilinarian Oration
cicero_la = """
quo usque tandem abutere, Catilina, patientia nostra?
quam diu etiam furor iste tuus nos eludet?
patere tua consilia non sentis, constrictam iam horum omnium scientia teneri coniurationem tuam non vides?
O tempora, o mores!
senatus haec intellegit, consul videt; hic tamen vivit.
vivit? immo vero etiam in senatum venit, fit publici consili particeps, notat et designat oculis ad caedem unum quemque nostrum.
"""
cicero_en = """
When, O Catiline, do you mean to cease abusing our patience?
How long is that madness of yours still to mock us?
Do you not see that your conspiracy is already arrested and rendered powerless by the knowledge which every one here possesses of it?
Shame on the age and on its principles!
The senate is aware of these things; the consul sees them; and yet this man lives.
Lives! aye, he comes even into the senate. He takes a part in the public deliberations; he is watching and marking down and checking off for slaughter every individual among us.
"""

## Zero-shot approach


In [13]:
cicero_la_sentence = "quo usque tandem abutere, Catilina, patientia nostra?"
cicero_en_sentence = "When, O Catiline, do you mean to cease abusing our patience?"

zero_shot_prompt = f"""
Analyze this Latin sentence:
"{cicero_la_sentence}"

Provide:
1. The type of rhetorical device used
2. The sentiment of the sentence (positive, negative, or neutral)
3. The source of evaluation in the sentence (who/what is being evaluated, and by whom)
"""

print("Zero-shot prompt:")
print(zero_shot_prompt)

zero_shot_results = call_llm_and_get_answer(MODELS, prompt=zero_shot_prompt)

Zero-shot prompt:

Analyze this Latin sentence:
"quo usque tandem abutere, Catilina, patientia nostra?"

Provide:
1. The type of rhetorical device used
2. The sentiment of the sentence (positive, negative, or neutral)
3. The source of evaluation in the sentence (who/what is being evaluated, and by whom)

PROMPT:

Analyze this Latin sentence:
"quo usque tandem abutere, Catilina, patientia nostra?"

Provide:
1. The type of rhetorical device used
2. The sentiment of the sentence (positive, negative, or neutral)
3. The source of evaluation in the sentence (who/what is being evaluated, and by whom)


SYSTEM:
You are a helpful assistant.

--------------------------------------------------

Querying gpt-4.1...

MODEL: gpt-4.1
RESPONSE:
Certainly! Here is the analysis of the Latin sentence:

"quo usque tandem abutere, Catilina, patientia nostra?"

1. The type of rhetorical device used  
**Rhetorical Question**: The sentence is a rhetorical question, asked not to elicit an answer but to express

## Few-shot approach

Let's now provide the model with an example that demonstrates our expected format and analysis style. This should help align the model's output with our specific needs by showing rather than telling what we want. The model will implicitly infer patterns from our example and apply similar reasoning to the new input.
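With more than one example, it is convenient to assemble the prompt programmatically. The sketch below places the worked examples first and ends with the new sentence and an open answer slot, which is one common layout and not necessarily the one used in this notebook. The helper `build_few_shot_prompt` is hypothetical; the labels are taken from the example in this section.

```python
def build_few_shot_prompt(examples, new_sentence):
    """Format worked examples first, then the new sentence with an open answer slot."""
    blocks = []
    for ex in examples:
        blocks.append(
            f"Latin sentence: {ex['sentence']}\n"
            f"Rhetorical device: {ex['device']}\n"
            f"Sentiment: {ex['sentiment']}\n"
            f"Source of evaluation: {ex['source']}"
        )
    # The final block repeats the pattern but leaves the answers for the model
    blocks.append(f"Latin sentence: {new_sentence}\nRhetorical device:")
    return "Analyze the Latin sentence, following the examples.\n\n" + "\n\n".join(blocks)

demo_examples = [
    {
        "sentence": "O tempora, o mores!",
        "device": "anaphora",
        "sentiment": "negative",
        "source": "general opinion on contemporary culture",
    },
]
demo_prompt = build_few_shot_prompt(
    demo_examples, "quo usque tandem abutere, Catilina, patientia nostra?"
)
print(demo_prompt)
```

Adding further dictionaries to `demo_examples` extends the prompt without changing the function.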

In [14]:
example_sentence_la = "O tempora, o mores!"
example_sentence_en = "Shame on the age and on its principles!"

# Example
example = f"""
I'll demonstrate the analysis with an example:

Latin sentence: {example_sentence_la}
Rhetorical device: anaphora
Sentiment: negative
Source of evaluation: general opinion on contemporary culture
"""

few_shot_prompt = f"""
Identify the type of rhetorical device used in this Latin sentence and explain its effect:

{cicero_la_sentence}

{example}
"""

In [15]:
print("Few-shot prompt:")
print(few_shot_prompt)

few_shot_results = call_llm_and_get_answer(MODELS, prompt=few_shot_prompt, system_prompt="You're a Latin specialist.")

Few-shot prompt:

Identify the type of rhetorical device used in this Latin sentence and explain its effect:

quo usque tandem abutere, Catilina, patientia nostra?


I'll demonstrate the analysis with an example:

Latin sentence: O tempora, o mores!
Rhetorical device: anaphora
Sentiment: negative
Source of evaluation: general opinion on contemporary culture


PROMPT:

Identify the type of rhetorical device used in this Latin sentence and explain its effect:

quo usque tandem abutere, Catilina, patientia nostra?


I'll demonstrate the analysis with an example:

Latin sentence: O tempora, o mores!
Rhetorical device: anaphora
Sentiment: negative
Source of evaluation: general opinion on contemporary culture



SYSTEM:
You're a Latin specialist.

--------------------------------------------------

Querying gpt-4.1...

MODEL: gpt-4.1
RESPONSE:
Latin sentence: quo usque tandem abutere, Catilina, patientia nostra?  
Rhetorical device: **apostrophe**  
Sentiment: negative  
Source of evaluation

1. How did the responses differ between zero-shot and few-shot approaches?
2. Did providing an example change the:
   - Technical terminology used?
   - Consistency of format?
   - Depth of analysis?
3. What does this tell us about how examples influence LLM behavior?

# Exercise. Putting Limits

As you can see, the models often go beyond simply labelling our data and provide contextual explanations that we may not want, because they can obscure our own interpretation. Let's now refine our analysis by using a controlled vocabulary and discouraging hallucinations. We'll analyze a sentence from the same passage, but with stricter constraints on output format and content. **What restrictions should improve the results?**

In [16]:
cicero_sentence_la = "O tempora, o mores!"
cicero_sentence_en = "Shame on the age and on its principles!"

# Example

constraints = """
CONSTRAINTS:
1. Rhetorical device: Choose EXACTLY ONE from this list: [anaphora, apostrophe, chiasmus, hyperbole, rhetorical question]
2. Sentiment: Choose EXACTLY ONE from this list: [strongly negative, mildly negative, neutral, mildly positive, strongly positive]
3. Source of evaluation: Choose EXACTLY ONE from this list: [speaker to audience, speaker to subject, subject to speaker, audience to subject]
4. DO NOT include any historical information not directly evidenced in the text itself
5. DO NOT explain Roman culture or provide background on Cicero or Catiline
"""

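# Note: the name `format` shadows Python's built-in format(); it is kept because later cells use it
format = """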
FORMAT YOUR RESPONSE EXACTLY LIKE THIS:
Rhetorical device: [your selection]
Sentiment: [your selection]
Source of evaluation: [your selection]
"""

constrained_prompt = f"""
Analyze this Latin sentence using ONLY the specified labels and constraints:

Latin sentence: "{cicero_sentence_la}"

{constraints}

{format}
"""

In [17]:
print("Constrained prompt:")
constrained_results = call_llm_and_get_answer(MODELS, prompt=constrained_prompt, system_prompt="You're a Latin linguist")

Constrained prompt:
PROMPT:

Analyze this Latin sentence using ONLY the specified labels and constraints:

Latin sentence: "O tempora, o mores!"


CONSTRAINTS:
1. Rhetorical device: Choose EXACTLY ONE from this list: [anaphora, apostrophe, chiasmus, hyperbole, rhetorical question]
2. Sentiment: Choose EXACTLY ONE from this list: [strongly negative, mildly negative, neutral, mildly positive, strongly positive]
3. Source of evaluation: Choose EXACTLY ONE from this list: [speaker to audience, speaker to subject, subject to speaker, audience to subject]
4. DO NOT include any historical information not directly evidenced in the text itself
5. DO NOT explain Roman culture or provide background on Cicero or Catiline



FORMAT YOUR RESPONSE EXACTLY LIKE THIS:
Rhetorical device: [your selection]
Sentiment: [your selection]
Source of evaluation: [your selection]



SYSTEM:
You're a Latin linguist

--------------------------------------------------

Querying gpt-4.1...

MODEL: gpt-4.1
RESPONSE:
R
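Once a constrained response comes back, its adherence to the controlled vocabulary can be verified in code. The sketch below assumes the three label lists from the `constraints` string and the `Field: value` layout requested in the format instructions. The helper `validate_labels` and the sample response are hypothetical, not real model output.

```python
ALLOWED = {
    "Rhetorical device": {"anaphora", "apostrophe", "chiasmus", "hyperbole", "rhetorical question"},
    "Sentiment": {"strongly negative", "mildly negative", "neutral", "mildly positive", "strongly positive"},
    "Source of evaluation": {"speaker to audience", "speaker to subject", "subject to speaker", "audience to subject"},
}

def validate_labels(response, allowed=ALLOWED):
    """Parse 'Field: value' lines and report fields whose value is missing or off-list."""
    found = {}
    for line in response.splitlines():
        field, sep, value = line.partition(":")
        field = field.strip()
        if sep and field in allowed:
            # Drop surrounding whitespace and markdown bold markers, ignore case
            found[field] = value.strip().strip("*").strip().lower()
    problems = {
        field: found.get(field)
        for field, options in allowed.items()
        if found.get(field) not in options
    }
    return found, problems

# Hypothetical response with one off-list sentiment label
sample = """Rhetorical device: **apostrophe**
Sentiment: very negative
Source of evaluation: speaker to subject"""

found, problems = validate_labels(sample)
print("Parsed:", found)
print("Problems:", problems)
```

An empty `problems` dictionary means every field was present and used an allowed label; it says nothing about whether the label is the right one.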

# Exercise. Explain It Like I'm Five

Let's now see how the "Chain of Thought" technique helps reveal the reasoning process behind the analysis. Building on our previous experiment, we hope to arrive at a more convincing interpretation.
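Because a chain-of-thought response interleaves reasoning with the final labels, it can help to pull the final selections out afterwards. The sketch below assumes the model closes an analysis with a line such as `Selected source of evaluation: ...`, as the formatting template in this exercise requests. The helper `extract_selections` and the sample text are hypothetical.

```python
import re

# Matches lines like "Selected sentiment: strongly negative"
SELECTED_LINE = re.compile(
    r"^\s*Selected\s+(.+?):\s*(.+?)\s*$", re.IGNORECASE | re.MULTILINE
)

def extract_selections(response):
    """Return {field: selected label} for every 'Selected <field>: <label>' line."""
    return {field.lower(): value for field, value in SELECTED_LINE.findall(response)}

# Made-up response fragment, not real model output
sample_cot = """Source of Evaluation Analysis:
The speaker laments the state of affairs rather than addressing a person.
Selected source of evaluation: speaker to subject
"""

selections = extract_selections(sample_cot)
print(selections)
```

Keeping the reasoning and the extracted label side by side lets you judge whether the stated steps actually support the selection.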

In [18]:
cot_instruction = """
For each element (rhetorical device, sentiment, source), think step-by-step:
- First, consider all the possible options from the provided list
- Next, analyze the linguistic features of the Latin text relevant to this element
- Then, explain your reasoning for eliminating incorrect options
- Finally, justify your selected option with specific features of the text
"""

cot_formatting = """
FORMAT YOUR RESPONSE LIKE THIS:
Rhetorical Device Analysis:
[Your step-by-step reasoning about the rhetorical device]
Selected rhetorical device: [your selection]

Sentiment Analysis:
[Your step-by-step reasoning about the sentiment]
Selected sentiment: [your selection]

Source of Evaluation Analysis:
[Your step-by-step reasoning about the source]
Selected source of evaluation: [your selection]
"""

cot_prompt = f"""
Analyze this Latin sentence using ONLY the specified labels and constraints:

Latin sentence: "{cicero_sentence_la}"

{constraints}

{format}

{cot_instruction}

{cot_formatting}
"""

In [19]:
# Selected models
MODELS = [
    "gpt-4.1",       # "Flagship GPT model for complex tasks": $2.00 / $8.00
    "gpt-4.1-mini",  # "Balanced for intelligence, speed, and cost": $0.40 / $1.60
    "gpt-4.1-nano",  # "Fastest, most cost-effective GPT-4.1 model": $0.10 / $0.40
]

print("Chain-of-thought prompt:")

cot_results = call_llm_and_get_answer(MODELS, prompt=cot_prompt)

Chain-of-thought prompt:
PROMPT:

Analyze this Latin sentence using ONLY the specified labels and constraints:

Latin sentence: "O tempora, o mores!"


CONSTRAINTS:
1. Rhetorical device: Choose EXACTLY ONE from this list: [anaphora, apostrophe, chiasmus, hyperbole, rhetorical question]
2. Sentiment: Choose EXACTLY ONE from this list: [strongly negative, mildly negative, neutral, mildly positive, strongly positive]
3. Source of evaluation: Choose EXACTLY ONE from this list: [speaker to audience, speaker to subject, subject to speaker, audience to subject]
4. DO NOT include any historical information not directly evidenced in the text itself
5. DO NOT explain Roman culture or provide background on Cicero or Catiline



FORMAT YOUR RESPONSE EXACTLY LIKE THIS:
Rhetorical device: [your selection]
Sentiment: [your selection]
Source of evaluation: [your selection]



For each element (rhetorical device, sentiment, source), think step-by-step:
- First, consider all the possible options from th