# Acting Like Humans? Evaluating Large Language Models as Proxies in Linguistic Experiments


This notebook accompanies the paper "Acting Like Humans? Evaluating Large Language Models as Proxies in Linguistic Experiments", which aims to replicate, using LLMs, linguistic experimental pipelines originally run with human participants.


It is intended to be used for further research.



# Code structure: #

In the **first block of the code**, some libraries (such as OpenAI) are imported. They provide us with certain functions/applications that are already "ready-to-use," so we don’t have to code them explicitly.

Furthermore, the second cell is intended to be used as a test to ensure that everything has been imported correctly.

In [None]:
%%capture
!pip install openai==1.55.3 httpx==0.27.2 --force-reinstall --quiet

In [None]:
import os

os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"  # placeholder: replace with your own key
from openai import OpenAI

client = OpenAI()

# prompt example, a test:
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user", "content": "Say this is a test, it works",
        }
    ],
    model="gpt-4o-mini",
)
print(chat_completion.choices[0].message.content)

The **second block** of the code defines the prompt engineering functions that we will use for our analysis. The example functions cover zero-shot and few-shot prompting. You are welcome to try other prompting techniques that we have explored in the seminar or that you have found online, which helps in conducting new and improved experiments.
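
As a minimal sketch of how a few-shot request is assembled, the helper below builds the message list without calling the API. The name `build_few_shot_messages` and the example sentences are invented for illustration; the structure mirrors what `few_shot_prompting` does before it sends the request.

```python
def build_few_shot_messages(examples, prompt, system="You are an assistant"):
    """Return the chat messages for a few-shot request (no API call is made)."""
    messages = [{"role": "system", "content": system}]
    for example in examples:
        # Each example becomes a user turn followed by the expected assistant turn
        messages.append({"role": "user", "content": example["input"]})
        messages.append({"role": "assistant", "content": example["output"]})
    # The actual query always comes last
    messages.append({"role": "user", "content": prompt})
    return messages


# Invented examples, for illustration only
examples = [
    {"input": "Is 'blorf' an existing English word?", "output": "no"},
    {"input": "Is 'table' an existing English word?", "output": "yes"},
]
messages = build_few_shot_messages(examples, "Is 'chair' an existing English word?")
print([m["role"] for m in messages])
```

Inspecting the message list this way can help when debugging a new prompting technique before spending API calls on it.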

In [None]:
import openai
from openai import OpenAI

# Function to perform zero-shot prompting
def zero_shot_prompting(task, prompt):
    """
    Performs zero-shot prompting by sending a prompt to the language model
    without providing any prior examples.

    Args:
    task (str): The task to be performed.
    prompt (str): The text input sent to the language model.

    Returns:
    str: The response from the language model.
    """
    # Create the chat message and send it to the language model
    response = client.chat.completions.create(
        model="o4-mini",
        messages=[
            {"role": "system", "content": "You are an assistant"},  # CHANGE HERE IF DESIRED
            {"role": "user", "content": prompt}  # User message with the actual prompt
        ]
    )
    # Return the content of the model’s first response message
    return response.choices[0].message.content


# Function to perform few-shot prompting
def few_shot_prompting(task, examples, prompt):
    """
    Performs few-shot prompting by providing some examples
    before sending the actual prompt to the language model.

    Args:
    task (str): The task to be performed.
    examples (list of dict): A list of examples, each example being a dictionary with 'input' and 'output'.
    prompt (str): The text input sent to the language model.

    Returns:
    str: The response from the language model.
    """
    # Initialize the messages list with a system message
    messages = [{"role": "system", "content": "You are an assistant"}]  # CHANGE HERE IF DESIRED

    # Add the examples to the messages
    for example in examples:
        messages.append({"role": "user", "content": example['input']})
        messages.append({"role": "assistant", "content": example['output']})

    # Add the actual prompt to the messages list
    messages.append({"role": "user", "content": prompt})

    # Create the chat message and send it to the language model
    response = client.chat.completions.create(
        model="o4-mini",
        messages=messages
    )
    # Return the content of the model’s first response message
    return response.choices[0].message.content


# Replication pipeline step 1
The **third block** contains the code that reads in your uploaded data, so that it can be presented to the LLM during prompting.

Data handling code is provided for both replications. Choose the passage that matches the data file you load and comment the other one out.
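
To make the shape of the loaded data concrete, here is a small sketch that uses the standard library `csv` module as a stand-in for pandas. The two sample rows are invented placeholders; only the column names are taken from the cell that follows.

```python
import csv
import io

# Invented two-row sample with the same column names as the real file
sample_csv = io.StringIO(
    "change,regularity,process,neologism,target_sent\n"
    "Morphological,Regular,placeholder,exampleword,Sentence one.\n"
    "Filler,placeholder,placeholder,placeholder,Sentence two.\n"
)

reader = csv.DictReader(sample_csv)
# Row index -> dictionary of column values, similar to to_dict(orient="index")
sample_stimuli = {i: dict(row) for i, row in enumerate(reader)}
# The list of sentences that will be shown to the model
sample_items = [entry["target_sent"] for entry in sample_stimuli.values()]
print(sample_items)
```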

In [None]:
import pandas as pd
import time
import json
import gc
from psutil import virtual_memory
from datetime import datetime

# Load data
daten = 'Lombard_replicat_appendix.csv'  # Adjust the filename as needed
df = pd.read_csv(daten)

# Extract specific columns and pack them into a dictionary
columns_to_extract = ['change', 'regularity', 'process', 'neologism', 'target_sent']
selected_data = df[columns_to_extract].astype(str)
stimuli_dict = selected_data.to_dict(orient="index")

# Create list of items
all_items = [entry['target_sent'] for entry in stimuli_dict.values()]
print("The materials for the study are:")
print(all_items)

# Replication pipeline step 2
In the **fourth block** we can apply the prompt engineering functions of block 2. We will formulate our prompts in this cell. In this block, we can test different prompting strategies on one LLM-query and on a limited subset of the dataset.

In [None]:
# To test one "LLM-participant" with a subset of the items

# Preparing to store the model's responses
answers_zero_shot = {}

# === Zero-Shot Prompting ===
print("=== Zero-Shot ===")

# Iterating through the subset for Zero-Shot
for i, text in enumerate(all_items[:3]):
    zero_shot_prompt = f"Insert your instructions here: '{text}'"

    # Performing Zero-Shot Prompting
    zero_shot = zero_shot_prompting("Task description", zero_shot_prompt)

    # Storing the response in the dictionary with the index as the key
    answers_zero_shot[i] = {
        "Prompt": zero_shot_prompt,
        "Response": zero_shot
    }

    # Printing the input and the corresponding output
    print(f"Input: {text}")
    print(f"Output: {zero_shot}")
    print()


In the **fifth block** we save the results in an Excel file and, alternatively, in a CSV file. These files store the answers of ONE LLM participant.
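
For a single participant, the response dictionary can also be written out directly. The sketch below assumes the dictionary shape built in the fourth block (keys `Prompt` and `Response`) and writes to an in-memory buffer instead of a real file; the sample responses are invented.

```python
import csv
import io

# Invented responses in the same shape as the dictionary built in the fourth block
sample_answers = {
    0: {"Prompt": "Instructions: 'Sentence one.'", "Response": "yes"},
    1: {"Prompt": "Instructions: 'Sentence two.'", "Response": "no"},
}

# The buffer stands in for a file opened with open(path, "w", newline="")
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Index", "Prompt", "Response"])
writer.writeheader()
for index, entry in sample_answers.items():
    # One output row per item, with the item index as the first column
    writer.writerow({"Index": index, **entry})

print(buffer.getvalue())
```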

In [None]:
import pandas as pd
import openpyxl
from openpyxl.utils import get_column_letter
from datetime import datetime
import csv


def convert_csv_to_excel(csv_path, excel_path):
    """
    Converts a CSV file into an Excel file (.xlsx).
    """
    df = pd.read_csv(csv_path)
    df.to_excel(excel_path, index=False)

def load_or_create_excel(file_path):
    """
    Loads an existing Excel file or creates a new one if the file does not exist.
    """
    try:
        workbook = openpyxl.load_workbook(file_path)
        sheet = workbook.active
    except FileNotFoundError:
        workbook = openpyxl.Workbook()
        sheet = workbook.active
        print(f"File '{file_path}' not found. This is an error: check the name of the file uploaded in the third block.")
    return workbook, sheet

def add_columns_to_excel(sheet, new_columns):
    """
    Adds new columns to the Excel file.
    """
    existing_columns = sheet.max_column
    for idx, col_name in enumerate(new_columns, start=existing_columns + 1):
        sheet[f"{get_column_letter(idx)}1"] = col_name

def add_data_to_excel(sheet, data_dict, start_row):
    """
    Appends the contents of a response dictionary to the Excel sheet.
    """
    for idx, (key, entry) in enumerate(data_dict.items(), start=start_row):
        sheet[f"A{idx}"] = key + 1
        sheet[f"B{idx}"] = entry['Response']
        sheet[f"C{idx}"] = entry['Prompt']
        sheet[f"D{idx}"] = entry['Response']  # original response
        sheet[f"E{idx}"] = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    print("Data added.")

def save_excel(workbook, file_path):
    """
    Saves the workbook to the given file.
    """
    workbook.save(file_path)
    print(f"File saved successfully to: {file_path}")

def extend_csv(csv_input_path, csv_output_path, data_dict):
    """
    Extends a CSV file with new columns and saves it as a new CSV file.
    """
    with open(csv_input_path, mode='r', newline='', encoding='utf-8') as infile, open(csv_output_path, mode='w', newline='', encoding='utf-8') as outfile:
        reader = csv.DictReader(infile)
        fieldnames = reader.fieldnames + ["Prompt", "Antwort_vom_Modell", "Datum"]
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)

        writer.writeheader()

        # Only go as far as data_dict has entries
        for idx, row in enumerate(reader):
            if idx < len(data_dict):  # Process a row only if the dictionary has a matching entry
                row["Prompt"] = data_dict[idx]["Prompt"]
                row["Antwort_vom_Modell"] = data_dict[idx]["Response"]
                row["Datum"] = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
                writer.writerow(row)

    print(f"Extended CSV file saved to: {csv_output_path}")


# --- Parameters ---
csv_file_path = daten
excel_file_path = "ergebnisse_ein_prob.xlsx"
csv_output_path = "ergebnisse_ein_prob.csv"  # this file can also be opened here in Google Colab with a double click

# 1. Convert CSV -> Excel and extend it
convert_csv_to_excel(csv_file_path, excel_file_path)
workbook, sheet = load_or_create_excel(excel_file_path)
add_columns_to_excel(sheet, ["Prompt", "Antwort_vom_Modell", "Datum"])
add_data_to_excel(sheet, answers_zero_shot, start_row=sheet.max_row + 1)  # CHANGE HERE: answers_zero_shot --> answers_few_shot, etc.
save_excel(workbook, excel_file_path)

# 2. Alternatively: extend the CSV directly
extend_csv(csv_file_path, csv_output_path, answers_zero_shot)  # CHANGE HERE: answers_zero_shot --> answers_few_shot, etc.




# Replication pipeline step 3
In the **sixth block** we repeat the prompting strategy that worked best, once for each subject we have (or want to have), using all the data in the corpus.
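
The nested structure produced by this step can be sketched without any API calls. In the example below, `fake_model` is a hypothetical stand-in for `zero_shot_prompting`, the prompt text and items are invented, and the dictionary keys (`Prompt`, `Antwort`, `Zeit`) follow the cell below.

```python
import time


def fake_model(prompt):
    """Hypothetical stand-in for zero_shot_prompting; always answers 'non'."""
    return "non"


def run_participants(items, n_participants, ask=fake_model):
    """Query every item once per simulated participant and time each call."""
    all_answers = {}
    for participant in range(1, n_participants + 1):
        answers = {}
        for i, text in enumerate(items):
            start = time.perf_counter()
            prompt = f"Instructions: '{text}'"
            reply = ask(prompt)
            answers[i] = {
                "Prompt": prompt,
                "Antwort": reply,
                "Zeit": time.perf_counter() - start,  # seconds for this item
            }
        # One inner dictionary per participant, keyed by participant number
        all_answers[participant] = answers
    return all_answers


results = run_participants(["Sentence one.", "Sentence two."], n_participants=3)
print(len(results), len(results[1]))
```

Passing the model call in as the `ask` argument makes it possible to dry-run the loop on a few items before starting a full run with many participants.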

In [None]:
antworten_null_shot_all = {}
probanden_zahl= 68
reaction_time_per_participant=list()

import time  # Import the time module

for iteration in range(1, probanden_zahl + 1):  # Repeat the process
    print(f"### LLM-informant {iteration} ###")

    antworten_null_shot = {}  # Dictionary for null-shot responses in this iteration

    # === Null-Shot ===
    # Record the start time
    start_time_pro_part = time.time()

    for i, text in enumerate(all_items):
        start_time = time.time()
        zero_shot_prompt = f"Vous êtes de langue maternelle française. Vous participez à une étude. Indiquez si la phrase suivante contient un mot nouveau ou un mot existant employé avec un sens nouveau simplement par 'oui' ou 'non': '{text}' Si ce mot existe, indiquez-le sans l'expliquer"
        zero_shot = zero_shot_prompting("choix binaire", zero_shot_prompt)

        # Record the end time and calculate the elapsed time for this sentence
        elapsed_time = time.time() - start_time

        # Save the response in the dictionary with the index as the key
        antworten_null_shot[i] = {
            "Prompt": zero_shot_prompt,
            "Antwort": zero_shot,
            "Zeit": elapsed_time
        }

        #print(f"Input: {text}")
        #print(f"Output: {zero_shot}")
        print(i,f"Time taken: {elapsed_time:.2f} seconds")
        print()

    # Calculate elapsed time for the iteration
    elapsed_time_pro_part = time.time() - start_time_pro_part

    # Save the null-shot results of this iteration in the parent dictionary
    antworten_null_shot_all[iteration] = antworten_null_shot
    reaction_time_per_participant.append(elapsed_time_pro_part)
    print(f"Elapsed time for iteration {iteration}: {elapsed_time_pro_part:.2f} seconds")
    print()

print(antworten_null_shot_all)
print("Mean reaction time per participant:", sum(reaction_time_per_participant)/len(reaction_time_per_participant))




In the **seventh block** we save the results from all test subjects in an Excel file and a CSV file. These files store the answers of MULTIPLE LLM participants.
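
The cell below matches each response to its stimulus by parsing the sentence back out of the prompt. A possible alternative, sketched here under the assumption that answer index `i` always corresponds to stimulus index `i`, is to join on the index directly. The function name `flatten_answers` and the sample data are invented for illustration.

```python
def flatten_answers(all_answers, stimuli):
    """Turn {participant: {index: entry}} into a list of flat row dictionaries."""
    rows = []
    for participant, answers in all_answers.items():
        for index, entry in answers.items():
            stimulus = stimuli[index]  # assumes answer index == stimulus index
            rows.append({
                "neologism": stimulus["neologism"],
                "sentences": stimulus["target_sent"],
                "Informant": participant,
                "Index": index,
                "models_answer": entry["Antwort"],
            })
    return rows


# Invented data, for illustration only
stimuli = {0: {"neologism": "exampleword", "target_sent": "Sentence one."}}
all_answers = {1: {0: {"Antwort": "oui exampleword"}}, 2: {0: {"Antwort": "non"}}}
rows = flatten_answers(all_answers, stimuli)
print(rows)
```

The resulting list of dictionaries can be passed to `pd.DataFrame(rows)` in the same way as in the cell below.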

In [None]:
import pandas as pd
from datetime import datetime

# Dictionary to DataFrame conversion
def dict_to_dataframe(antworten_dict):
    rows = []
    for iteration, prompts in antworten_dict.items():
        for index, entry in prompts.items():
            for val in stimuli_dict.values():
                prompt_text = entry['Prompt']
                #print(val['target_sent'].replace("'",""))
                #print(prompt_text.split("'non':")[1].split("Si")[0].strip().replace("'", ""))
                if val['target_sent'].replace("'","").strip() == prompt_text.split("'non':")[1].split("Si")[0].strip().replace("'", ""):
                  #print("yes",val['target_sent'].replace("'","").strip(),prompt_text.split("'non':")[1].split("Si")[0].strip().replace("'", ""))
                  rows.append({
                        "neologism": val['neologism'],
                        "sentences": val['target_sent'],
                        "change": val['change'],
                        "regularity": val['regularity'],
                        "process": val['process'],
                        "Informant": iteration,
                        "Index": index,
                        "Prompt": entry["Prompt"],
                        "models_answer": entry["Antwort"],
                        "time": entry["Zeit"],
                        "date": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
                    })
                else:
                  continue
                  #print("no",val['target_sent'].replace("'","").strip(),prompt_text.split("'non':")[1].split("Si")[0].strip().replace("'", ""))
    return pd.DataFrame(rows)

# Convert the dictionary to a DataFrame
new_data_df = dict_to_dataframe(antworten_null_shot_all)

# Save the DataFrame as new files
new_data_df.to_excel('3neolog_all_nullshot_o4mini.xlsx', index=False)
new_data_df.to_csv('3neolog_all_nullshot_o4mini.csv', index=False)

print("Data successfully saved!")


# Replication pipeline step 4
Finally, we evaluate the results.
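
The scoring logic for a single critical (non-filler) item can be sketched as follows. The function `score_answer` and its category labels are invented for illustration; the cleaning step and the stem comparison (the target without its last three characters) follow the evaluation cell below.

```python
import re


def clean_answer(answer):
    """Lowercase, replace commas and colons with spaces, collapse whitespace."""
    answer = answer.lower().strip()
    answer = answer.replace(",", " ").replace(":", " ")
    return re.sub(r"\s+", " ", answer)


def score_answer(answer, target):
    """Classify one response to a critical (non-filler) item."""
    parts = clean_answer(answer).split()
    if not parts or not parts[0].startswith("oui"):
        return "no_oui"
    if len(parts) == 1:
        return "oui_without_neo"
    # Compare against the target minus its last three characters (the stem)
    stem = target.lower().strip()[:-3]
    return "correct_neo" if parts[1].startswith(stem) else "wrong_neo"


# Invented answers, for illustration only
for reply in ["Oui, exampleword", "Oui.", "Oui: autre", "Non"]:
    print(reply, "->", score_answer(reply, "exampleword"))
```

Note that for targets of three characters or fewer the stem is empty, so any second word would count as a match.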

In [None]:
# Eval neologisms

import pandas as pd
import csv
import re
from collections import Counter

# Load results
results_file = 'neolog_CoT_all.csv.csv'

# Counters
correct_oui = 0
correct_neo = 0
fillers = 0
fillers_wrong = 0
mistakes_no_neo = {}
mistakes_no_oui = {}
filler_mistakes = {}
filler_error_list = []
without_neo = 0
wrong_neo = 0
mistakes_no_oui_list=[]
wrong_neo_after_oui = 0

# Helper functions
def clean_answer(answer):
    """Normalize whitespace, remove commas and colons, and clean up the answer string."""
    answer = answer.lower().strip()
    answer = answer.replace(',', ' ')  # Replace commas with spaces
    answer = answer.replace(':', ' ')  # Replace colons with spaces
    answer = re.sub(r'\s+', ' ', answer)  # Collapse multiple spaces/newlines/tabs into one space
    return answer

def extract_word(answer):
    """Extract the word after 'oui'."""
    parts = answer.split()
    if len(parts) > 1:
        return parts[1]
    else:
        return None

correct_neo_conditions = {
    ('Morphological', 'Irregular'): 0,
    ('Morphological', 'Regular'): 0,
    ('Semantic', 'Irregular'): 0,
    ('Semantic', 'Regular'): 0,
}

# Read CSV
rows=[]
with open(results_file, mode='r', encoding='utf-8') as file:
    csv_reader = csv.reader(file)
    header = next(csv_reader, None)  # Skip header if present

    for row in csv_reader:
        neologism = row[0].lower().strip()
        rows.append(neologism)
        answer = clean_answer(row[8])

        if row[2] != "Filler":
            if answer.startswith("oui"):
              neo = extract_word(answer)

              if neo:
                  if neo.startswith(neologism[:-3]):
                      correct_oui += 1
                      correct_neo += 1
                      condition_key = (row[2].strip(), row[3].strip())
                      if condition_key in correct_neo_conditions:
                          correct_neo_conditions[condition_key] += 1
                      else:
                          print("Unexpected condition key:", condition_key)
                  else:
                      wrong_neo += 1
                      wrong_neo_after_oui += 1
                      print("wrong neo", neologism, neo)
              else:
                  correct_oui += 1  # Accept if "oui" without any neo (e.g., "oui." or "oui :)")
                  without_neo += 1
                  key = (row[5], row[6])
                  mistakes_no_neo[key] = (row[0].strip(), row[2].strip(), row[3].strip(), row[4].strip())

            else:
              key = (row[5], row[6])
              mistakes_no_oui[key] = (row[0].strip(), row[2].strip(), row[3].strip(), row[4].strip())
              mistakes_no_oui_list.append(answer)

        else:
            fillers += 1
            if answer.startswith("oui"):
                fillers_wrong += 1
                filler_mistakes.setdefault(row[1], answer)
                filler_error_list.append(row[1])

# fillers
wrong_fillers_counts = dict(Counter(filler_error_list))
print("Wrong fillers:", fillers_wrong)
print("Wrong filler counts:", wrong_fillers_counts)

# Note: 'reaction_time_per_participant' only exists if the sixth block was run
# in this session, so the timing report is commented out here.

# whole_duration = sum(reaction_time_per_participant)
# print("Mean reaction time per participant:", sum(reaction_time_per_participant) / len(reaction_time_per_participant))
# print("Whole study duration:", whole_duration)

print("Total responses:", len(rows))
print("Correct 'oui' responses:", correct_oui)
print("Correct neologism guesses:", correct_neo)
print("Correct but without neo", without_neo)
print("Wrong neologisms", wrong_neo)
print("Fillers:", fillers)
print("Mistakes --> critical answers without oui", len(mistakes_no_oui_list))

# Error analysis
morph_reg = 0
morph_irreg = 0
sem_reg = 0
sem_irreg = 0

print()
print("mistakes_no_neo",mistakes_no_neo)
for val in mistakes_no_neo.values():
    task_type, reg_type = val[1], val[2]
    if task_type == 'Morphological' and reg_type == 'Irregular':
        morph_irreg += 1
    elif task_type == 'Morphological' and reg_type == 'Regular':
        morph_reg += 1
    elif task_type == 'Semantic' and reg_type == 'Irregular':
        sem_irreg += 1
    elif task_type == 'Semantic' and reg_type == 'Regular':
        sem_reg += 1
    else:
        print("Unknown type:", val)

wrong_target_words = [val[0] for val in mistakes_no_neo.values()]
mistakes_counts = dict(Counter(wrong_target_words))
'''print("Mistakes no neo:", mistakes_no_neo)
print("Morphological regular mistakes:", morph_reg)
print("Morphological irregular mistakes:", morph_irreg)
print("Semantic regular mistakes:", sem_reg)
print("Semantic irregular mistakes:", sem_irreg)
print("Wrong target words frequency:", mistakes_counts)
val_mistakes_counts=list(mistakes_counts.values())
print("sum mistakes no neo:", sum(val_mistakes_counts))'''

morph_reg = 0
morph_irreg = 0
sem_reg = 0
sem_irreg = 0
print()
for val in mistakes_no_oui.values():
    task_type, reg_type = val[1], val[2]
    if task_type == 'Morphological' and reg_type == 'Irregular':
        morph_irreg += 1
    elif task_type == 'Morphological' and reg_type == 'Regular':
        morph_reg += 1
    elif task_type == 'Semantic' and reg_type == 'Irregular':
        sem_irreg += 1
    elif task_type == 'Semantic' and reg_type == 'Regular':
        sem_reg += 1
    else:
        print("Unknown type:", val)

'''
print("Mistakes no oui:", mistakes_no_oui)
print("Morphological regular mistakes:", morph_reg)
print("Morphological irregular mistakes:", morph_irreg)
print("Semantic regular mistakes:", sem_reg)
print("Semantic irregular mistakes:", sem_irreg)'''

########################################

# Total expected positives (non-filler responses)
expected_positives = len(rows) - fillers

# General mistakes: answers not starting with "oui" when they should
general_mistakes = len(mistakes_no_oui_list)
correct_general = expected_positives - general_mistakes
accuracy_general = correct_general / expected_positives

'''print("\nCorrect 'oui' + correct neologism by condition:")
for cond, count in correct_neo_conditions.items():
    print(f"{cond[0]} {cond[1]}: {count}")'''

# Neologism mistakes: answered "oui" but did not provide correct neologism
correct_neo_plus_oui = correct_neo  # counted in the reading loop above
accuracy_neo = correct_neo_plus_oui / expected_positives

# Correct 'oui' responses (may or may not include correct neologism)
accuracy_oui_only = correct_oui / expected_positives
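# Percentage of all 'oui' answers that name a wrong neologism
# (correct_oui excludes wrong neologisms, so they are added back to the denominator)
total_oui = correct_oui + wrong_neo_after_oui
percent_wrong_neo_after_oui = wrong_neo_after_oui / total_oui if total_oui else 0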

print("\n===== ACCURACY REPORT =====")
print(f"Expected positive answers (non-fillers): {expected_positives}")
print(f"Correct answers with 'oui' (excluding wrong neologisms): {correct_oui}")
print(f"Correct answers with 'oui' + correct neologism: {correct_neo_plus_oui}")
print(f"Mistakes (no 'oui' when expected): {general_mistakes}")
print(f"Accuracy (general 'oui' recognition): {accuracy_general:.2%}")
print(f"Accuracy (exact match with neologism): {accuracy_neo:.2%}")
print()
print(f"Percentage of 'oui' answers with wrong neologism: {percent_wrong_neo_after_oui:.2%}")


# Breakdown by condition
condition_counts = {
    ('Morphological', 'Irregular'): {'total': 0, 'correct_oui': 0, 'correct_neo': 0},
    ('Morphological', 'Regular'): {'total': 0, 'correct_oui': 0, 'correct_neo': 0},
    ('Semantic', 'Irregular'): {'total': 0, 'correct_oui': 0, 'correct_neo': 0},
    ('Semantic', 'Regular'): {'total': 0, 'correct_oui': 0, 'correct_neo': 0}
}

# Reload and reprocess to count per condition
with open(results_file, mode='r', encoding='utf-8') as file:
    csv_reader = csv.reader(file)
    header = next(csv_reader, None)

    for row in csv_reader:
        if row[2] == "Filler":
            continue

        task_type = row[2].strip()
        reg_type = row[3].strip()
        condition = (task_type, reg_type)
        answer = clean_answer(row[8])
        target = row[0].strip().lower()
        neo = extract_word(answer)

        if condition not in condition_counts:
            print("Warning: Unknown condition:", condition)
            continue

        condition_counts[condition]['total'] += 1

        if answer.startswith("oui"):
            condition_counts[condition]['correct_oui'] += 1

            if neo and neo.startswith(target[:-3]):
                condition_counts[condition]['correct_neo'] += 1
            #else:
              #print("no neo", answer)

# Print condition breakdown
print("\n===== CONDITION BREAKDOWN =====")
for condition, stats in condition_counts.items():
    total = stats['total']
    oui = stats['correct_oui']
    neo = stats['correct_neo']
    acc_oui = oui / total if total else 0
    acc_neo = neo / total if total else 0
    print(f"{condition[0]} - {condition[1]}:")
    print(f"  Total: {total}")
    print(f"  'Oui' correct: {oui} ({acc_oui:.2%})")
    print(f"  Neologism correct (excluding missing and wrong neologisms): {neo} ({acc_neo:.2%})")

accuracy_fillers = 1 - (fillers_wrong / fillers) if fillers else 0
print(f"\nAccuracy on fillers: {accuracy_fillers:.2%}")

In [None]:
# Eval neologisms old

import pandas as pd
import csv
from collections import Counter

# Load results
results= 'all_neolog_CoT_o4-mini.csv'

correct_oui = 0
correct_neo = 0
fillers = 0
fillers_wrong = 0
mistakes = {}
filler_mistakes = {}
filler_error_list=[]

# Open the CSV file
with open(results, mode='r', encoding='utf-8') as file:
    csv_reader = csv.reader(file)

    # Optional: Get the header if the file has one
    header = next(csv_reader, None)  # Skip the header if present

    # Access and process each line
    for row in csv_reader:
        #print(row)
        neologism = row[0].lower().strip()
        answer = row[8].lower().strip()
        #print(answer)

        if row[2] != "Filler":
          #print(answer)
          if answer.startswith("oui"):
            correct_oui +=1
            if len(answer)>3:
              try:
                neo = answer.split(" ")[1].strip()
              except IndexError:
                continue

              #print(neologism, neo)
              if neo.startswith(neologism):
                correct_neo +=1

          else:
            key= row[5], row[6]
            mistakes[key] = row[0].strip(),row[2].strip(),row[3].strip(),row[4].strip()

        elif row[2] == "Filler":
          fillers +=1
          if answer.startswith("oui"):
            fillers_wrong +=1
            if row[1] not in filler_mistakes.keys():
              filler_mistakes[row[1]] = [answer]
            else:
              filler_mistakes[row[1]].append(answer)

            filler_error_list.append(row[1])

        else:
          print(row[1], answer)

wrong_fillers_counts= dict(Counter(filler_error_list))
wrong_target_words = []
for m in mistakes.values():
  wrong_target_words.append(m[0])
mistakes_counts = dict(Counter(wrong_target_words))

print("wrong fillers",fillers_wrong)
print(wrong_fillers_counts)


whole_duration=sum(reaction_time_per_participant)
print("tot responses", correct_oui+len(mistakes))
print("correct",correct_oui)
print("fillers", fillers) #40
print("mistakes",len(mistakes), ":", mistakes)
#print("wrong target words",set(wrong_target_words))
print("wrong target words freq",mistakes_counts)
print("mean reaction time per participant:", sum(reaction_time_per_participant)/len(reaction_time_per_participant))
print("whole study duration:", whole_duration)

# error analysis
morph_reg=0
morph_irreg=0
sem_reg=0
sem_irreg=0
for val in mistakes.values():
  #print(val)
  # Each counter name now matches the condition it counts
  if val[1] == 'Morphological' and val[2] == 'Irregular':
      morph_irreg +=1
  elif val[1] == 'Morphological' and val[2] == 'Regular':
      morph_reg +=1
  elif val[1] == 'Semantic' and val[2] == 'Irregular':
      sem_irreg +=1
  elif val[1] == 'Semantic' and val[2] == 'Regular':
      sem_reg +=1
  else:
    print(val)

print("Morphological irregular", morph_irreg)
print("Morphological regular", morph_reg)
print("Semantic irregular", sem_irreg)
print("Semantic regular", sem_reg)

**For Python beginners:**
- Feel free to search Google if you cannot solve one task or another. Unlike in humanities work, Google is always our friend when programming. For error messages, you can copy and paste the printed text and look up its meaning on Google.
- If you cannot fully grasp parts of the code, you can first try prompting ChatGPT and asking it to explain the content in terms suited to programming novices.
- At the beginning, everything will probably feel difficult or unclear. Please contact me or come to my office hours. I am happy to help! Peer reviews and feedback are also great :)



A few **notes to get started**:

* Everything preceded by the "#" symbol is a comment. The program does not read it.

* If we want to comment out longer sections, we can use """text""" (three double quotes) or '''text''' (three single quotes). This option also spans multiple lines.

* Indentation (the empty space at the beginning of some lines; press "Tab" once or the space bar four times) plays a fundamental role in Python. Indentation marks a grouping or hierarchical relationship between the first line and the lines below it. https://www.youtube.com/watch?v=m2UDI5Fy6qw

* To make sure the program really does what we want, we can print the result of a step along the way, using print(what you want to print). print can also be used to add your own messages that illustrate what is currently being processed.

* A variable (as in maths) is a name that acts as a placeholder for something else. For example, names such as text or text_split1 are variables that stand in the code for whatever is on the right-hand side of the equals sign.

 - a variable is written like this: text = "blablabla"

 - a string is written like this: *"text"* or 'text'

 - print(text) --> blablabla

 - print("text") --> text

* Sometimes the libraries used (e.g. nltk, matplotlib, numpy) are not yet installed. Install them like this: !pip install name_of_the_library. If that does not work, the library name may be misspelled. Just google it!
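
The points above on comments, variables, `print`, and indentation can be tried out in a few lines. This is only an illustrative sketch; the variable names are arbitrary.

```python
# Everything after "#" is a comment and is ignored by Python
text = "blablabla"        # a variable named text that holds a string

print(text)               # prints the content of the variable: blablabla
print("text")             # prints the literal string: text

words = text.split("a")   # split the string at every "a"
for word in words:
    # The indented lines belong to the for loop and run once per element
    print(word)
```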



Short cheat sheet: https://www.mintpepper.ch/local/workshopDateien/beilage-produktentwicklung-software.pdf
