# Analyse Dataset

First, you need to start and connect to the Ollama server. Use your own account information (including the path to your own Ollama binary): https://hpcproject.gwdg.de/projects/baaa32d3-6b49-4831-8b75-87ff44056ae0/

In [None]:
import os
import subprocess
import time

# Path to the Ollama binary in your own HPC account (adjust this to your account)
OLLAMA_BIN = '/user/sarah.oberbichler/u18915/bin/ollama'

# Stop any Ollama processes that are still running
!pkill -f ollama
time.sleep(2)

# Start the server in the background
process = subprocess.Popen([OLLAMA_BIN, 'serve'], env=os.environ.copy())
print(f"Server started (PID: {process.pid})")
time.sleep(8)  # give the server time to start up

# Test the server: a return code of 0 means `ollama list` succeeded
result = subprocess.run([OLLAMA_BIN, 'list'], capture_output=True, text=True)

print(f"Server status: {result.returncode}")
print(f"Models: {result.stdout}")
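
The fixed `time.sleep(8)` may be too short when the HPC node is busy, or longer than necessary. As an alternative, here is a minimal sketch that polls the server until it answers. It assumes the default Ollama address `http://127.0.0.1:11434` (the same one used later in this notebook) and the `/api/tags` endpoint, which lists the locally available models. The helper `wait_for_ollama` is hypothetical and not part of the workshop code.

```python
import time
import urllib.request


def wait_for_ollama(base_url="http://127.0.0.1:11434", timeout=60.0, interval=1.0):
    """Poll the Ollama server until it answers or `timeout` seconds have passed.

    Returns True if the server responded with HTTP 200, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # /api/tags lists the local models; a 200 answer means the server is up
            with urllib.request.urlopen(f"{base_url}/api/tags", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError and socket timeouts are both subclasses of OSError:
            # the server is not ready yet (e.g. connection refused)
            pass
        time.sleep(interval)
    return False
```

Calling `wait_for_ollama()` in place of `time.sleep(8)` returns `True` as soon as the server responds and `False` after 60 seconds without an answer.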

### Load the dataset (SummerSchool_dataset.xlsx).
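You can find it here: https://github.com/soberbichler/Workshop_QualitativeDataResearch_LLM/blob/main/data/SummerSchool_dataset.xlsx. Download the file and place it in the same folder as this notebook; the cell below reads it from the working directory.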

In [None]:
# openpyxl is the engine pandas uses to read .xlsx files
%pip install openpyxl
import pandas as pd

df = pd.read_excel('SummerSchool_dataset.xlsx')
df.head()

In [None]:
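# Load the model into memory (warm-up) so the first real request is not slowed down by loading.
# This cell calls the API directly, because ask_olmo_api is only defined further below.
import requests

MODEL = "MHKetbi/allenai_OLMo2-0325-32B-Instruct:q5_K_S"
OLLAMA_CHAT_URL = "http://127.0.0.1:11434/api/chat"

def warm_up_model():
    """Warm up the model with a few short prompts."""
    print("Warming up model...")

    for prompt in ["Hi", "What is 2+2?", "Say hello"]:
        payload = {
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
        }
        try:
            # Loading a 32B model can take several minutes, hence the long timeout
            requests.post(OLLAMA_CHAT_URL, json=payload, timeout=600).raise_for_status()
            print(".", end="", flush=True)
        except requests.RequestException as e:
            print(f"\nWarm-up request failed: {e}")

    print("\nWarmup complete!")

warm_up_model()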

### Run the Model

Add your prompt in `ask_olmo_api` below. You can also adjust the model parameters (`temperature`, `max_tokens`).

After running the cell, fill in the model documentation while waiting (and continue after saving the results)!

Model documentation: https://seafile.rlp.net/seafhttp/f/a5b34ec61267408da431/?op=view

In [None]:
import requests

def ask_olmo_api(text, temperature=0.1, max_tokens=1000):
    """Send one article to OLMo via the Ollama chat API and return the raw answer."""
    prompt = f"""You are an expert at analyzing historical texts and you hate to summarize. Please find argumentative units in historical articles. Arguments in newspapers are often implicit but should contain a clear premise (with an inclusive claim).
OUTPUT FORMAT - EXACTLY these 4 XML tags and NOTHING else:
<argument>Original argument text OR "NA"</argument>
<claim>Core claim (implication) in one sentence OR "NA"</claim>
<explanation>Why this is an argument OR "NA"</explanation>
<human_verification_needed>True OR False</human_verification_needed>
EXAMPLE WITH ARGUMENT:
<argument>Es sind furchtbare Bilder, die sich dabei entrollen. Unter den Trümmern des einen Hause», so erzählt Luigt Barsint im Corrtcre della sera, findet man die Leichen von Unglück lichen, die in anderen Häusern gewohnt baben und die in der Ber- Wirrung de» schrcck.ichen Augenblickes instinktiv bet Fremden Hülfe und Unterschlupf suchten. Niemand erkennt jetzt diese armen Ein dringlinge, ihre Leichen werden nicht reklamiert, und man trägt sie hinunter an de» Strand, wo sie in langer Reihe einer neben den anderen hingebettet weiden, in denselben Tüchern und Decken, in denen sie tbren Tod gesunden.</argument>
<claim>The earthquake's chaos led to unidentified victims dying in unfamiliar places.</claim>
<explanation>Describes how people fled to other houses seeking help during the disaster, died there, and now cannot be identified or claimed by relatives. Shows cause (panic/confusion) and effect (anonymous deaths).</explanation>
<human_verification_needed>False</human_verification_needed>
EXAMPLE WITHOUT ARGUMENT:
<argument>NA</argument>
<claim>NA</claim>
<explanation>NA</explanation>
<human_verification_needed>False</human_verification_needed>
RULES:
- NO SUMMARY; ONLY ORIGINAL EXTRACTION FROM THE TEXT; don't extract anything that is not in the text. Extract word for word
- ONLY output these 4 XML tags
- Factual reportings such as "Dem Vulkanausbruch folgten drei Sturzwellen in etwa 10 Meter Höhe" are NO arguments
- Extract only original text without changes or use NA when you did not find an argument
- The claim is not a translation or summary of the argument. It should state what the (implicit) argument implies
- In cases of uncertainty or ambiguity, set <human_verification_needed> to True
- If no argument exists, use NA for all fields except <human_verification_needed>True or False</human_verification_needed>
- An article can contain more than one argumentative unit; each unit has exactly one clear claim and its own complete set of the 4 XML tags

Extract the arguments from this text:
{text}"""
    
    payload = {
        "model": "MHKetbi/allenai_OLMo2-0325-32B-Instruct:q5_K_S",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {
            "temperature": temperature,
            "num_predict": max_tokens
        }
    }
    
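    try:
        # Generous timeout so a slow 32B model is not cut off, but a hung request does not block forever
        response = requests.post('http://127.0.0.1:11434/api/chat', json=payload, timeout=900)
        response.raise_for_status()
        return response.json()['message']['content']
    except (requests.RequestException, KeyError, ValueError) as e:
        # Keep processing the other articles, but record why this one failed
        return f"Error: {e}"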

# Run the model on every article (this can take a long time for the full dataset)
df['model_answer_olmo'] = df['extracted_articles'].apply(ask_olmo_api)
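
The prompt asks for exactly four XML tags per argumentative unit and allows several units per article, so the raw answers in `model_answer_olmo` still have to be split before they can be analysed. A minimal sketch, assuming the model sticks to the requested tag format; the helper `parse_units` is hypothetical and not part of the workshop code:

```python
import re

# The four tags requested in the prompt
TAGS = ["argument", "claim", "explanation", "human_verification_needed"]


def parse_units(answer):
    """Split a model answer into argumentative units.

    Returns a list of dicts with one key per tag. A new unit starts at every
    <argument> tag; tags missing from a unit are stored as None.
    """
    units = []
    # Each chunk is the text that follows one opening <argument> tag
    for chunk in re.split(r"<argument>", answer)[1:]:
        text = "<argument>" + chunk
        unit = {}
        for tag in TAGS:
            match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
            unit[tag] = match.group(1).strip() if match else None
        # Normalise True/TRUE/true to a real boolean (None if missing or unexpected)
        flag = (unit["human_verification_needed"] or "").lower()
        unit["human_verification_needed"] = {"true": True, "false": False}.get(flag)
        units.append(unit)
    return units


answer = """<argument>NA</argument>
<claim>NA</claim>
<explanation>NA</explanation>
<human_verification_needed>FALSE</human_verification_needed>"""
print(parse_units(answer))
# [{'argument': 'NA', 'claim': 'NA', 'explanation': 'NA', 'human_verification_needed': False}]
```

Applied as `df['model_answer_olmo'].apply(parse_units)`, this yields one list of units per article, which `DataFrame.explode` can turn into one row per unit. Answers that start with `Error` produce an empty list.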


### Export the dataset and give the file a name other than "results"

In [None]:
# Replace 'results.xlsx' with your own file name
df.to_excel('results.xlsx', index=False)
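
To avoid overwriting an earlier run (or a file that is still called "results"), the file name can also be generated automatically. A small sketch; the helper `unique_output_name` and the prefix `argument_mining_olmo` are illustrative assumptions, not part of the workshop code:

```python
from datetime import datetime
from pathlib import Path


def unique_output_name(prefix="argument_mining_olmo", suffix=".xlsx", folder="."):
    """Return a timestamped file path that does not exist in `folder` yet."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    candidate = Path(folder) / f"{prefix}_{stamp}{suffix}"
    counter = 1
    # Append a counter if a file with the same name already exists
    while candidate.exists():
        candidate = Path(folder) / f"{prefix}_{stamp}_{counter}{suffix}"
        counter += 1
    return candidate
```

The result can be passed straight to pandas, e.g. `df.to_excel(unique_output_name(), index=False)`, since `to_excel` accepts path objects.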