# Prompting Notebook

This notebook illustrates the general principles of our prompting approach with example prompts in French and English, together with the API call functions for different LLMs.




## Prompts

In [None]:
system_prompt_english ="""# System Instructions
You are an expert text analyst and information retrieval specialist and hate summarization as well as enumerations.
Your task is to carefully analyze given texts and extract complete articles that contain specific themes. You never change original texts.

Classify as relevant if the text contains:
- Primary earthquake terminology from the 19th and 20th century
- Official earthquake reports
- geology and seismology
- Impact descriptions
- Solution description
- Technical description
- Aid
- Honorations
- Political discussion and opinions on earthquake
- Stories from victims and refugees
- Reports on refugees and victims
- Lives of victims
- historical references
- comparisons

Not relevant are ads and theater or movie announcements."""


In [None]:
user_prompt_english = """
Please follow these specifications:
1. Definition of an article: An article is a semantic unit in the text, clearly distinguished from preceding and following content (e.g., through its own headline).
2. Relevance criteria: An article is relevant if the Messina earthquake of December 1908 or its consequences are a topic. Besides reports on the earthquake itself, relevant articles can include:
• Effects on the population (e.g., health crises, forced relocations, relief efforts and donations)
• Aftershocks and consequences
• Political and economic developments related to the earthquake
3. Response format:
• If one or more relevant articles are found, structure your response using XML tags as shown in the following example, using the tags article, verification, and human_verification_needed (True or False): <article>complete extracted article content</article><verification>Is unit coherent? Is topic present? Is article complete? All articles found?</verification><human_verification_needed>False</human_verification_needed>
• Return all relevant articles in their original form, without additions, omissions, corrections, or comments.
• If no relevant articles about the Messina earthquake are found (e.g., if it concerns another earthquake), no special structuring is needed; simply return "No relevant article found." without further explanations.
4. Notes on segmentation:
• Ensure that articles spread across multiple paragraphs are treated as a single unit.
• Never truncate for brevity
5. Human verification needed:
• Can have the values "True" or "False"
• False: If you believe you have correctly segmented the article and assessed its relevance.
• True: If you are unsure whether you have captured the complete content of the article as contained in the newspaper document or whether it is relevant.

"""

user_prompt_french = """
Veuillez prendre en compte ces spécifications :

1. Définition d'un article : Un article est une unité sémantique dans le texte qui se distingue clairement du contenu précédent et suivant (par exemple par son propre titre).

2. Critères de pertinence : Un article est pertinent s'il traite du tremblement de terre de Messine de décembre 1908 ou de ses conséquences. Les conséquences pertinentes peuvent inclure :
• Impacts sur la population (par exemple, crises sanitaires, déplacements forcés, mesures d'aide et dons)
• Répliques sismiques et leurs effets
• Développements politiques et économiques liés au tremblement de terre

3. Format de réponse :
• Si un ou plusieurs articles pertinents sont trouvés, structurez votre réponse en utilisant des balises XML comme dans l'exemple suivant, en utilisant les balises article, verification et human_verification_needed (True ou False) : <article>contenu intégral de l'article extrait</article><verification>L'unité est-elle cohérente ? Le sujet est-il présent ? L'article est-il complet ? Tous les articles ont-ils été trouvés ?</verification><human_verification_needed>False</human_verification_needed>
• Retournez tous les articles pertinents dans leur forme originale, sans ajouts, omissions, corrections ou commentaires.
• Si aucun article pertinent sur le tremblement de terre de Messine n'est trouvé (par exemple s'il s'agit d'un autre tremblement de terre), aucune structuration particulière n'est requise ; retournez simplement "Aucun article pertinent trouvé." sans autres explications.

4. Notes sur la segmentation :
• Assurez-vous que les articles répartis sur plusieurs paragraphes sont traités comme une seule unité.

5. Nécessité de vérification humaine :
• Peut avoir les valeurs "True" ou "False"
• False : Si vous pensez avoir correctement segmenté l'article et bien évalué sa pertinence.
• True : Si vous n'êtes pas certain d'avoir capturé le contenu complet de l'article tel qu'il figure dans le document de journal ou s'il est pertinent.

Voici le document de journal :
"""


In [None]:
user_prompt_simple_french = """
Veuillez identifier et extraire les articles concernant le grave séisme de Messine en décembre 1908 ou ses conséquences dans le document de journal fourni. Si vous ne trouvez aucun article pertinent sur ce sujet, répondez simplement 'Aucun article pertinent trouvé.' Si vous trouvez un ou plusieurs articles, retournez leur contenu complet et inchangé (du début à la fin) au format XML, chaque article étant encadré par des balises <article>.

"""

user_prompt_simple_english = """
Please identify and extract articles that relate to the severe Messina earthquake in December 1908 or its aftermath in the provided newspaper document. If you don't find any relevant articles relating to this topic, simply return 'No relevant article found.' If you find one or more articles, return their full, unchanged content (beginning to end) structured in xml format, each wrapped in <article> tags.

"""

In [None]:
system_prompt2_english = """
You are an expert text analyst and information retrieval specialist and hate summarization as well as enumerations.
Your task is to carefully analyze given texts and extract complete articles that contain specific themes only on the Messina earthquake of 1908 and the direct consequences of the earthquake (until March 1909). You never change original texts.

Classify as relevant if the text contains:
- Primary earthquake terminology from the 19th and 20th century
- Official earthquake reports
- geology and seismology
- Impact descriptions
- Solution description
- Technical description
- Aid
- Political discussion and opinions on earthquake
- Stories from victims and refugees
- Reports on refugees and victims
- Lives of victims
- historical references
- comparisons

Your output should consist of nothing but the XML structure <article></article><verification></verification><human_verification_needed></human_verification_needed> or "No relevant article found."

Maintain a neutral, objective stance throughout the analysis. Focus on accuracy and completeness in your extractions."""

user_prompt2_english = """

Please follow these specifications:
1. Definition of an article: An article is a semantic unit in the text, clearly distinguished from preceding and following content (it may or may not have a title).
2. Relevance criteria: An article is relevant if its main subject is the Messina earthquake of December 1908 or its consequences. Other earthquakes are not relevant. The relevant consequences are mentioned in the system prompt. Make sure to check the publication date.
--> Keep international news sections together: Example Jena, January 8. The local geologist, Dr. Gravelitz, has established that the seabed of the Strait of Messina has become silted up in places following the earthquake. At some points, soundings show only fifteen feet of depth. Rome, January 9. General Mazza has telegraphed to the Prime Minister that it will be possible to recover all funds and archives of public services from the ruins of Reggio di Calabria. Railway communication between Reggio and Naples will be restored within three days.
3. Response format:
• If one or more relevant articles are found, structure your response using XML tags as shown in the following example, using the tags article, verification and human_verification_needed (True or False): <article>complete content of extracted article 1</article><article>complete content of extracted article 2</article><verification>Is the unit coherent? Is the subject present? Is the article complete? Have all articles been found?</verification><human_verification_needed>False</human_verification_needed>
• Return all relevant articles in their original form, without additions, omissions, corrections or comments. Never cut content between the beginning and end of an article.
• If no relevant articles about the Messina earthquake are found (for example, if it concerns another earthquake), no special structuring is needed; simply return "No relevant article found." without further explanation.
4. Notes on segmentation:
• Ensure that articles form a unit. Be sure to mark each separate unit (marked by a new title or new semantic unit) as a new article (<article></article>).
5. Human verification needed:
• Can have values "True" or "False"
• False: If you think you have correctly segmented the article and evaluated its relevance.
• True: If you are unsure whether you have captured the complete content of the article as contained in the newspaper document or if it is relevant.
6. Check verification results and adapt response if necessary

Here is the newspaper document:
"""




## Function for article extraction
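The function can be called with any of the model functions defined in this notebook. It returns a list with one raw model response per row of the DataFrame, or `None` for rows where the API call failed. The extracted articles are still wrapped in the XML tags requested by the prompt.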
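Because the model function is simply a callable that receives `(text, user_prompt, **kwargs)` and returns a string, the extraction loop can be dry-run without any API access. The following is a minimal sketch under that assumption; `dummy_model_function` and its keyword-matching rule are hypothetical and only serve as a stand-in for a real model call.

```python
from typing import Optional


def dummy_model_function(text: str, user_prompt: str,
                         system_prompt: Optional[str] = None, **kwargs) -> str:
    """Hypothetical stand-in for a real model call (no network access).

    It mimics the response format requested by the simple prompts:
    the unchanged text in <article> tags, or the fixed fallback sentence.
    """
    # Naive keyword rule, used only to produce both kinds of response
    if "Messina" in text:
        return f"<article>{text}</article>"
    return "No relevant article found."


# Extra keyword arguments such as temperature are accepted and ignored
print(dummy_model_function("Messina lies in ruins.", "prompt", temperature=0.2))
print(dummy_model_function("Theatre programme for tonight.", "prompt"))
```

Passing `dummy_model_function` as `model_function` to `separate_articles_in_dataframe` exercises the loop and the progress output before any paid API calls are made.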

In [None]:
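# Run a model function over every row of a DataFrame and collect the responses
def separate_articles_in_dataframe(df, text_column, model_function, user_prompt, **kwargs):
    """Call model_function on each row's text and return the raw responses as a list."""
    articles = []

    for i, (index, row) in enumerate(df.iterrows(), start=1):
        text = row[text_column]
        # Extra keyword arguments (e.g. system_prompt, model) are passed through unchanged
        article = model_function(text, user_prompt, **kwargs)
        articles.append(article)
        print(f"entry {i}/{len(df)} appended.")

    return articles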
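Each entry of the returned list is a raw response in the tag format requested by the prompts (`article`, `verification`, `human_verification_needed`), or the fallback sentence when nothing relevant was found. The sketch below shows one possible way to split such a response into its parts. The helper name `parse_model_response` and the choice to report a missing flag as `None` are assumptions, not part of the original pipeline.

```python
import re
from typing import Dict, Optional

ARTICLE_RE = re.compile(r"<article>(.*?)</article>", re.DOTALL)
VERIFICATION_RE = re.compile(r"<verification>(.*?)</verification>", re.DOTALL)
FLAG_RE = re.compile(
    r"<human_verification_needed>\s*(True|False)\s*</human_verification_needed>",
    re.IGNORECASE,
)


def parse_model_response(response: Optional[str]) -> Dict[str, object]:
    """Split one raw model response into articles, verification text and flag.

    Hypothetical helper: missing parts are reported as an empty list or None.
    """
    if response is None:  # the model functions return None when the API call fails
        return {"articles": [], "verification": None, "human_verification_needed": None}

    verification = VERIFICATION_RE.search(response)
    flag = FLAG_RE.search(response)
    return {
        # A response may contain several <article> blocks
        "articles": [a.strip() for a in ARTICLE_RE.findall(response)],
        "verification": verification.group(1).strip() if verification else None,
        "human_verification_needed": (flag.group(1).lower() == "true") if flag else None,
    }


example = (
    "<article>First article.</article><article>Second article.</article>"
    "<verification>Unit coherent, topic present.</verification>"
    "<human_verification_needed>False</human_verification_needed>"
)
print(parse_model_response(example))
```

A response such as "No relevant article found." yields an empty `articles` list, so irrelevant documents can be filtered out without string comparison against the fallback sentence.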



## GPT: Model function and imports

In [None]:
!pip install openai==1.55.3
from openai import OpenAI
from google.colab import userdata

import os


client_openai = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))



def call_openAI_api(text, user_prompt, system_prompt=None, model="gpt-4o-2024-08-06", temperature=0.2, top_p=1, top_k=None, max_tokens=8000):
    # Append the newspaper text to the instructions, separated by a blank line
    prompt = f"{user_prompt}\n\n{text}\n---\n"
    messages = []

    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})

    messages.append({"role": "user", "content": prompt})

    try:
        # Create a parameters dictionary
        params = {
            "model": model,
            "messages": messages,
            "temperature": temperature,
            "top_p": top_p,
            "max_tokens": max_tokens
        }

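        # The OpenAI client has no top_k keyword, so it is sent via extra_body if set.
        # The OpenAI API may reject this parameter; leave top_k as None unless the endpoint supports it.
        if top_k is not None:
            params["extra_body"] = {"top_k": top_k}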

        response = client_openai.chat.completions.create(**params)

    except Exception as e:
        print(f"Error calling the OpenAI API: {e}")
        return None

    return response.choices[0].message.content

## Model/Function for the Claude model (Anthropic)

In [None]:
!pip install anthropic
from anthropic import Anthropic
from google.colab import userdata
import os

client = Anthropic(api_key=userdata.get('ANTHROPIC_API_KEY'))

def call_claude_api(text, user_prompt, system_prompt=None, model="claude-3-5-sonnet-latest", temperature=0.2, top_p=1, max_tokens=8000):
    prompt = f"{user_prompt}\n\n{text}\n---\n"
    messages = [{"role": "user", "content": prompt}]

    params = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }

    # The Messages API takes the system prompt as a top-level parameter,
    # not as an entry in the messages list
    if system_prompt:
        params["system"] = system_prompt

    try:
        response = client.messages.create(**params)
        return response.content[0].text

    except Exception as e:
        print(f"Error calling Claude API: {e}")
        return None

## DeepSeek function

In [None]:
from openai import OpenAI
from google.colab import userdata
from typing import Optional, Dict, Any

# Initialize the DeepSeek client with the correct API key and base URL
client_deepseek = OpenAI(api_key=userdata.get('DEEPSEEK_API_KEY'), base_url="https://api.deepseek.com")

def call_deepseek_api(
    text: str,
    user_prompt: str,
    model: str = "deepseek-reasoner",
    system_prompt: Optional[str] = None,
    temperature: float = 0.2,
    top_p: float = 1,
    top_k: Optional[int] = None,
    max_tokens: int = 8000,
) -> Optional[str]:
    """
    Calls the DeepSeek API to generate a response based on the provided text and prompts.

    Args:
        text (str): The input text to be processed.
        user_prompt (str): The user's prompt or instruction.
        model (str): The model to use for the API call.
        system_prompt (Optional[str]): An optional system-level prompt.
        temperature (float): Sampling temperature.
        top_p (float): Nucleus sampling parameter.
        top_k (Optional[int]): Top-k sampling parameter.
        max_tokens (int): Maximum number of tokens to generate.

    Returns:
        Optional[str]: The generated response with reasoning in <think> tags, or None if an error occurs.
    """
    # Format the prompt to explicitly request reasoning
    prompt = (
        f"Analyze the following and provide your reasoning in <think> tags, "
        f"followed by your response:\n\n"
        f"User Request: {user_prompt}\n"
        f"Text: {text}\n---\n"
    )
    messages = []

    # Add system prompt if provided
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})

    # Add user prompt
    messages.append({"role": "user", "content": prompt})

    try:
        # Create a parameters dictionary
        params: Dict[str, Any] = {
            "model": model,
            "messages": messages,
            "temperature": temperature,
            "top_p": top_p,
            "max_tokens": max_tokens,
        }

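        # The OpenAI client has no top_k keyword, so it is sent via extra_body if set.
        # Whether the endpoint honors it is not guaranteed; leave top_k as None if unsure.
        if top_k is not None:
            params["extra_body"] = {"top_k": top_k}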

        # Call the DeepSeek API
        response = client_deepseek.chat.completions.create(**params)

        # Get the response content
        content = response.choices[0].message.content

        # Structure the response with think tags if not already present
        if "<think>" not in content:
            # Get the model's reasoning from the API response
            reasoning = getattr(response.choices[0].message, 'reasoning_content', 'No explicit reasoning provided')
            # Format the final response with think tags
            formatted_response = f"<think>{reasoning}</think>\n\n{content}"
            return formatted_response

        return content

    except Exception as e:
        print(f"Error calling the DeepSeek API: {e}")
        return None
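Since `call_deepseek_api` returns the reasoning and the answer in one string, the reasoning usually has to be separated again before the articles are evaluated. A minimal sketch of that step follows; the helper name `split_reasoning` is hypothetical, and it assumes at most one `<think>` block per response.

```python
import re
from typing import Optional, Tuple

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning, answer) for a response that may contain <think> tags.

    Hypothetical helper: only the first <think> block is treated as reasoning.
    """
    if response is None:  # failed API call
        return None, None

    match = THINK_RE.search(response)
    if match is None:  # no reasoning block, the whole response is the answer
        return None, response.strip()

    reasoning = match.group(1).strip()
    # Everything outside the <think> block is the answer
    answer = (response[:match.start()] + response[match.end():]).strip()
    return reasoning, answer


reasoning, answer = split_reasoning(
    "<think>The text reports on Reggio after the earthquake.</think>\n\n"
    "<article>Rome, January 9. ...</article>"
)
print(reasoning)
print(answer)
```

Keeping the reasoning in a separate column makes it possible to compare the extracted articles across models without the DeepSeek output being penalized for the additional text.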