<a href="https://colab.research.google.com/github/mikelsegura/fstvsicl/blob/main/ICL_orthography_normalizer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# N-Shot Learning Experiment Notebook

This notebook runs zero-shot and few-shot in-context learning experiments on orthographic standardization of Otomi, converting sentences between the OTQ, OTS, and INALI orthographies.

## 1. Setup and Imports

In [None]:
!pip install -q tiktoken
import pandas as pd
from google.colab import userdata
from openai import OpenAI
from datetime import datetime
import time
import tiktoken

## 2. Configuration

Clone the repository, set the paths for data and output files, and initialize the API clients:

In [None]:
repo_url = "https://github.com/MCL-Lab-mx/fstvsicl.git"

!git clone {repo_url}

%cd fstvsicl

BASE_PATH = "otomi/"
OUTPUT_PATH = "otomi/colab_outputs/"
!mkdir -p {OUTPUT_PATH}

client_gpt = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))
client_llama = OpenAI(api_key = userdata.get('LLAMA_API_KEY'), base_url = "https://api.llamaapi.com")

Cloning into 'fstvsicl'...
remote: Enumerating objects: 102, done.
remote: Counting objects: 100% (102/102), done.
remote: Compressing objects: 100% (47/47), done.
remote: Total 102 (delta 63), reused 75 (delta 53), pack-reused 0 (from 0)
Receiving objects: 100% (102/102), 726.53 KiB | 11.53 MiB/s, done.
Resolving deltas: 100% (63/63), done.
/content/fstvsicl/fstvsicl


## 3. Helper Functions

### 3.1 Formatting Few-Shot Examples

This function takes the first `num_examples` source/target pairs from the support set and substitutes them for the `{examples}` placeholder in the instruction template:

In [None]:
def format_few_shot_examples(
    support_set,
    num_examples,
    instruction_template,
    source_column="SOURCE",
    target_column="TARGET"
):
    """Formats few-shot examples into a prompt using dynamic column names."""
    source_sentences = support_set[source_column].values
    target_sentences = support_set[target_column].values
    examples = "".join(
        f"{i+1}. {source_column}: {source_sentences[i]}\n"
        f"{target_column}: {target_sentences[i]}\n\n"
        for i in range(num_examples)
    )
    return instruction_template.replace("{examples}", examples)
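
To make the resulting prompt layout concrete, here is a small sketch that applies the same formatting logic to a toy support set. The function body is repeated so the cell runs on its own; `toy_support`, `toy_prompt`, and the placeholder strings are hypothetical and are not taken from the corpus.

```python
import pandas as pd


def format_few_shot_examples(support_set, num_examples, instruction_template,
                             source_column="SOURCE", target_column="TARGET"):
    """Same logic as the notebook function, repeated so this cell is standalone."""
    sources = support_set[source_column].values
    targets = support_set[target_column].values
    examples = "".join(
        f"{i+1}. {source_column}: {sources[i]}\n"
        f"{target_column}: {targets[i]}\n\n"
        for i in range(num_examples)
    )
    return instruction_template.replace("{examples}", examples)


# Hypothetical two-row support set; the strings are placeholders, not corpus data.
toy_support = pd.DataFrame({"SOURCE": ["aaa", "bbb"], "TARGET": ["AAA", "BBB"]})

toy_prompt = format_few_shot_examples(
    toy_support, 2, "Examples:\n\n{examples}Task:\n\n"
)
print(toy_prompt)
```

Each example becomes a numbered source line followed by its target line and a blank line. Note that `num_examples` must not exceed the number of rows in the support set, otherwise the indexing raises an `IndexError`.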

### 3.2 Calling the Model

This function sends the prompt as a single user message to the given API client and returns the text of the first completion:

In [None]:
def call_model(prompt, model_name, model_client, temperature):
    response = model_client.chat.completions.create(
        model=model_name,
        temperature=temperature,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
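
For a dry run without network access or API keys, one option is to pass a stub object in place of the real client. The sketch below is an assumption about how such a stub could look: `fake_client` and `_fake_create` are hypothetical names, and the stub only mimics the attribute path that `call_model` touches (`chat.completions.create` and `choices[0].message.content`).

```python
from types import SimpleNamespace


def call_model(prompt, model_name, model_client, temperature):
    """Same logic as the notebook function, repeated so this cell is standalone."""
    response = model_client.chat.completions.create(
        model=model_name,
        temperature=temperature,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def _fake_create(model, temperature, messages):
    """Hypothetical stand-in for the API: echoes the user prompt in upper case."""
    text = messages[0]["content"].upper()
    message = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


# Hypothetical stub exposing only the attributes that call_model uses.
fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)

dry_run_reply = call_model("hello", "stub-model", fake_client, temperature=0.0)
print(dry_run_reply)
```

Because the client is passed in as a parameter, the same stub can be handed to `run_experiment` to check file output and logging before spending API credits.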

## 4. Main Experiment Function

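The pricing table below feeds the cost estimate. `run_experiment` then runs either a zero-shot or a few-shot experiment, retries failed API calls, saves the predictions, and logs the estimated cost: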

In [None]:
# Prices are expressed per 1,000,000 tokens
MODEL_PRICING = {
    "gpt-3.5-turbo": {"prompt": 1.5, "completion": 2.0},
    "gpt-4o": {"prompt": 2.5, "completion": 10.0},
    "llama3.3-70b": {"prompt": 2.8, "completion": 2.8},
    "llama3.1-70b": {"prompt": 2.8, "completion": 2.8},
}
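
As a worked example of how per-million-token prices turn into a per-call cost, the sketch below applies the same formula that `run_experiment` uses. The helper name `estimate_cost` and the token counts are hypothetical; only the `gpt-4o` rates are copied from the table above.

```python
# Rates copied from the notebook's table (per 1,000,000 tokens).
MODEL_PRICING = {"gpt-4o": {"prompt": 2.5, "completion": 10.0}}


def estimate_cost(model_name, prompt_tokens, completion_tokens, pricing=MODEL_PRICING):
    """Hypothetical helper: cost of one call from token counts and the rate table."""
    rates = pricing.get(model_name, {})
    return (
        (prompt_tokens / 1_000_000) * rates.get("prompt", 0.0)
        + (completion_tokens / 1_000_000) * rates.get("completion", 0.0)
    )


# 1,000 prompt tokens and 200 completion tokens (made-up counts):
# 1000 / 1e6 * 2.5 + 200 / 1e6 * 10.0 = 0.0025 + 0.0020 = 0.0045
example_cost = estimate_cost("gpt-4o", 1_000, 200)
print(f"${example_cost:.6f}")

# A model missing from the table is silently priced at zero.
unknown_cost = estimate_cost("not-in-table", 1_000, 200)
print(f"${unknown_cost:.6f}")
```

The second call shows a consequence of using `.get(..., 0.0)`: a misspelled or unlisted model name produces a cost of zero rather than an error, so it is worth checking that every model in the run appears in `MODEL_PRICING`.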


In [None]:
def run_experiment(
    language,
    experiment_type,
    model_name,
    model_client,
    test_set,
    support_set=None,
    source_column="SOURCE",
    target_column="TARGET",
    num_few_shot_examples=0,
    instruction_template="",
    temperature=0.2,
    num_retries=3,
    quiet=False
):
    """Runs a zero-shot or few-shot experiment, retries on API failures,
    auto-counts tokens for cost, and saves results & logs cost."""
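    # prepare tokenizer for local token counting; models tiktoken does not know
    # (e.g. the Llama models) use cl100k_base, so their counts are approximate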
    try:
        encoder = tiktoken.encoding_for_model(model_name)
    except KeyError:
        encoder = tiktoken.get_encoding("cl100k_base")  # fallback

    test_sentences = test_set[source_column].values

    # prepare few-shot instruction if needed
    if experiment_type == "few-shot" and support_set is not None:
        instruction = format_few_shot_examples(
            support_set,
            num_few_shot_examples,
            instruction_template,
            source_column,
            target_column
        )
    else:
        instruction = instruction_template

    results = []
    total_cost = 0.0

    for i, test_sentence in enumerate(test_sentences, 1):
        # construct prompt with dynamic labels
        prompt = f"{instruction}{source_column}: {test_sentence}\n\n{target_column}: [Your prediction here]"

        # retry logic for unstable API calls
        attempt = 0
        while True:
            try:
                response = call_model(
                    prompt,
                    model_name,
                    model_client,
                    temperature
                )
                break
            except Exception as e:
                attempt += 1
                if attempt >= num_retries:
                    print(f"Iteration {i}: Failed after {num_retries} attempts. Error: {e}")
                    raise
                wait_time = 2 ** (attempt - 1)
                print(f"API call failed (attempt {attempt}/{num_retries}), retrying in {wait_time}s...")
                time.sleep(wait_time)

        # extract generated text
        if isinstance(response, dict):
            try:
                result_text = response["choices"][0]["message"]["content"]
            except KeyError:
                result_text = response["choices"][0]["text"]
        else:
            result_text = response

        # local token counting for cost
        prompt_tokens = len(encoder.encode(prompt))
        result_tokens = len(encoder.encode(result_text))

        # compute cost
        rates = MODEL_PRICING.get(model_name, {})
        cost = (prompt_tokens / 1_000_000) * rates.get("prompt", 0.0) + \
               (result_tokens / 1_000_000) * rates.get("completion", 0.0)
        total_cost += cost
        results.append([test_sentence, result_text])

        if not quiet:
            print(
                f"Iteration {i}:\n"
                f"{source_column} = {test_sentence}\n"
                f"{target_column} = {result_text}\n"
                f"Cost = ${cost:.6f}\n"
            )
        else:
            print(f"\r{i}/{len(test_sentences)}", end="")

    # save results
    output_filename = f"{language}_{model_name}_{experiment_type}_{source_column}_{target_column}"
    df_results = pd.DataFrame(results, columns=[source_column, target_column])
    out_path = f"{OUTPUT_PATH}{output_filename}.tsv"
    df_results.to_csv(out_path, sep="\t", index=False)
    print(f"\nExperiment results saved to '{out_path}'.")

    # log total cost
    cost_entry = (
        f"{datetime.now().isoformat()}\t{experiment_type}\t{model_name}\t"
        f"{output_filename}\t${total_cost:.6f}\n"
    )
    with open(f"{BASE_PATH}costs.txt", "a", encoding="utf-8") as f:
        f.write(cost_entry)
    print(f"Total cost: ${total_cost:.6f} (logged in costs.txt)\n")
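
The retry loop inside `run_experiment` waits 1 s after the first failure, 2 s after the second, and so on, then re-raises once `num_retries` attempts have failed. A minimal standalone sketch of the same pattern follows; `retry_with_backoff`, `flaky`, and the injectable `sleep` argument are hypothetical names introduced only for illustration.

```python
import time


def retry_with_backoff(fn, num_retries=3, sleep=time.sleep):
    """Call fn() until it succeeds; wait 2**(attempt-1) seconds between attempts."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            attempt += 1
            if attempt >= num_retries:
                raise  # give up after num_retries failed attempts
            sleep(2 ** (attempt - 1))


# Demo with a function that fails twice and then succeeds.
# The waits are recorded instead of slept so the demo finishes instantly.
waits = []
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


outcome = retry_with_backoff(flaky, num_retries=3, sleep=waits.append)
print(outcome, waits)
```

Passing `sleep` as a parameter is a design choice for testability; the notebook itself calls `time.sleep` directly.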

## 5. Loading Data

Load the test and support datasets from TSV files:

In [None]:
df_test_set = pd.read_csv(f"{BASE_PATH}test_set.tsv", sep="\t")
df_support_set = pd.read_csv(f"{BASE_PATH}support_set.tsv", sep="\t")
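
Since the experiments index the data by the column names `OTQ`, `OTS`, and `INALI`, a quick check that those columns exist can catch a wrong file or separator early. The sketch below runs on an in-memory toy TSV; `missing_columns`, `REQUIRED_COLUMNS`, and the toy data are hypothetical, and the real files may contain additional columns.

```python
import io

import pandas as pd

# Column names used by the experiment calls in this notebook.
REQUIRED_COLUMNS = {"OTQ", "OTS", "INALI"}


def missing_columns(df, required=REQUIRED_COLUMNS):
    """Hypothetical helper: return the required columns that df lacks, sorted."""
    return sorted(set(required) - set(df.columns))


# Toy TSV with placeholder strings and the INALI column deliberately left out.
toy_tsv = "OTQ\tOTS\naaa\tbbb\n"
toy_df = pd.read_csv(io.StringIO(toy_tsv), sep="\t")

print(missing_columns(toy_df))
```

Applied to the real data, the same check would be `missing_columns(df_test_set)` and `missing_columns(df_support_set)`, both of which should return an empty list.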

## 6. Running Experiments

### 6.1 Zero-Shot Experiment

The zero-shot runs use only the instruction, with no examples. The helpers below first preview the exact prompts that will be sent:

In [None]:
def build_full_prompt(
    test_sentence,
    support_set=None,
    num_few_shot_examples=0,
    instruction_template="",
    source_column="SOURCE",
    target_column="TARGET"
):
    """Builds the full prompt that would be sent to the model,
    including few-shot examples if provided."""

    # prepare instruction
    if support_set is not None and num_few_shot_examples > 0:
        instruction = format_few_shot_examples(
            support_set,
            num_few_shot_examples,
            instruction_template,
            source_column,
            target_column
        )
    else:
        instruction = instruction_template

    # full final prompt
    prompt = (
        f"{instruction}"
        f"{source_column}: {test_sentence}\n\n"
        f"{target_column}: [Your prediction here]"
    )

    return prompt


In [None]:
# Preview first N test sentences from each source column
N = 2

# Helper: print all prompts for a given config
def print_zero_shot_prompts(test_set, source_column, target_column, instruction_template):
    for sentence in test_set[source_column].head(N):
        prompt = build_full_prompt(
            test_sentence=sentence,
            support_set=None,  # zero-shot
            num_few_shot_examples=0,
            instruction_template=instruction_template,
            source_column=source_column,
            target_column=target_column
        )
        print(prompt)
        print("\n" + "="*80 + "\n")


# 1. OTQ → OTS
print("### OTQ → OTS ###\n")
print_zero_shot_prompts(
    test_set=df_test_set,
    source_column="OTQ",
    target_column="OTS",
    instruction_template="""Predict the OTS orthographic standardization (State of Mexico Otomi) for the following Otomi sentence written in the OTQ standard (Queretaro Otomi) (please return only the normalized sentences, no explanations). Note that some loanwords retain their original orthography, and certain linguistic phenomena may affect the transformations.\n\n"""
)

# 2. INALI → OTS
print("### INALI → OTS ###\n")
print_zero_shot_prompts(
    test_set=df_test_set,
    source_column="INALI",
    target_column="OTS",
    instruction_template="""Predict the OTS orthographic standardization (State of Mexico Otomi) for the following Otomi sentence written in the INALI standard (please return only the normalized sentences, no explanations). Note that some loanwords retain their original orthography, and certain linguistic phenomena may affect the transformations.\n\n"""
)

# 3. OTS → OTQ
print("### OTS → OTQ ###\n")
print_zero_shot_prompts(
    test_set=df_test_set,
    source_column="OTS",
    target_column="OTQ",
    instruction_template="""Predict the OTQ orthographic standardization (Queretaro Otomi) for the following Otomi sentence written in the OTS standard (State of Mexico Otomi) (please return only the normalized sentences, no explanations). Note that some loanwords retain their original orthography, and certain linguistic phenomena may affect the transformations.\n\n"""
)


### OTQ → OTS ###

Predict the OTS orthographic standardization (State of Mexico Otomi) for the following Otomi sentence written in the OTQ standard (Queretaro Otomi) (please return only the normalized sentences, no explanations). Note that some loanwords retain their original orthography, and certain linguistic phenomena may affect the transformations.

OTQ: -pasadores: "ya nts'äza xi mra nts'u̱t'i mi t'e̱ni ko n'a ra t'e̱nga ts'u̱t'a bo̱ja mi thutuabi ra ballesta".

OTS: [Your prediction here]


### INALI → OTS ###

Predict the OTS orthographic standardization (State of Mexico Otomi) for the following Otomi sentence written in the INALI standard (please return only the normalized sentences, no explanations). Note that some loanwords retain their original orthography, and certain linguistic phenomena may affect the transformations.

INALI: -pasadores: "ya nts'äza xi mra nts'u̱t'i mi t'e̱ni ko n'a ra t'e̱nga ts'u̱t'a bo̱ja mi thutuabi ra ballesta".

OTS: [Your prediction here]


### OT

In [None]:
models = [
    ("gpt-3.5-turbo", client_gpt),
    ("gpt-4o", client_gpt),
    ("llama3.1-70b", client_llama),
    ("llama3.3-70b", client_llama)
]

for model, client in models:

  run_experiment(
      language="oto",
      experiment_type="zero-shot",
      model_name=model,
      model_client=client,
      test_set=df_test_set,
      source_column="OTQ",
      target_column="OTS",
      instruction_template="""Predict the OTS orthographic standardization (State of Mexico Otomi) for the following Otomi sentence written in the OTQ standard (Queretaro Otomi) (please return only the normalized sentences, no explanations). Note that some loanwords retain their original orthography, and certain linguistic phenomena may affect the transformations.\n\n""",
      quiet = True
  )

  run_experiment(
      language="oto",
      experiment_type="zero-shot",
      model_name=model,
      model_client=client,
      test_set=df_test_set,
      source_column="INALI",
      target_column="OTS",
      instruction_template="""Predict the OTS orthographic standardization (State of Mexico Otomi) for the following Otomi sentence written in the INALI standard (please return only the normalized sentences, no explanations). Note that some loanwords retain their original orthography, and certain linguistic phenomena may affect the transformations.\n\n""",
      quiet = True
  )

  run_experiment(
      language="oto",
      experiment_type="zero-shot",
      model_name=model,
      model_client=client,
      test_set=df_test_set,
      source_column="OTS",
      target_column="OTQ",
      instruction_template="""Predict the OTQ orthographic standardization (Queretaro Otomi) for the following Otomi sentence written in the OTS standard (State of Mexico Otomi) (please return only the normalized sentences, no explanations). Note that some loanwords retain their original orthography, and certain linguistic phenomena may affect the transformations.\n\n""",
      quiet = True
  )
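
The three zero-shot templates differ only in the names of the source and target standards, so they could be generated rather than written out for each call. The sketch below is one possible refactoring, not the notebook's own code: `STANDARD_DESCRIPTIONS`, `DIRECTIONS`, and `zero_shot_template` are hypothetical names, and the wording is copied from the templates above.

```python
# Descriptions as they appear in the notebook's templates; INALI has none.
STANDARD_DESCRIPTIONS = {"OTQ": "Queretaro Otomi", "OTS": "State of Mexico Otomi"}

# The three conversion directions used in the experiments.
DIRECTIONS = [("OTQ", "OTS"), ("INALI", "OTS"), ("OTS", "OTQ")]


def zero_shot_template(source, target):
    """Hypothetical helper: rebuild the zero-shot instruction for one direction."""
    target_desc = STANDARD_DESCRIPTIONS[target]
    source_desc = STANDARD_DESCRIPTIONS.get(source)
    source_label = (
        f"{source} standard ({source_desc})" if source_desc else f"{source} standard"
    )
    return (
        f"Predict the {target} orthographic standardization ({target_desc}) "
        f"for the following Otomi sentence written in the {source_label} "
        "(please return only the normalized sentences, no explanations). "
        "Note that some loanwords retain their original orthography, and certain "
        "linguistic phenomena may affect the transformations.\n\n"
    )


for source, target in DIRECTIONS:
    print(f"{source} -> {target}: {len(zero_shot_template(source, target))} characters")
```

With such a helper, each model would need only one inner loop over `DIRECTIONS` instead of three near-identical `run_experiment` calls.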

### 6.2 Few-Shot Experiment

The few-shot runs use the first 10 examples from the support set. The prompt preview below uses only N = 2 examples to keep the output short:

In [None]:
# Preview first N test sentences from each source column
N = 2

# Helper: print all prompts for a given config (few-shot)
def print_few_shot_prompts(
    test_set, support_set,
    source_column, target_column,
    num_few_shot_examples,
    instruction_template
):
    for sentence in test_set[source_column].head(N):
        prompt = build_full_prompt(
            test_sentence=sentence,
            support_set=support_set,
            num_few_shot_examples=num_few_shot_examples,
            instruction_template=instruction_template,
            source_column=source_column,
            target_column=target_column
        )
        print(prompt)
        print("\n" + "="*80 + "\n")


# 1. OTQ → OTS
print("### OTQ → OTS ###\n")
print_few_shot_prompts(
    test_set=df_test_set,
    support_set=df_support_set,
    source_column="OTQ",
    target_column="OTS",
    num_few_shot_examples=N,
    instruction_template="""Below are examples of orthographic conversions of strings from the OTQ standard (Queretaro Otomi) to the OTS standard (State of Mexico Otomi) for the Otomi language. Note that some loanwords retain their original orthography, and certain linguistic phenomena may affect the transformations.\n\nExamples:\n\n{examples}Task:\n\nUsing these examples as a guide, predict the OTS orthographic standardization for the following sentence. Return only the standardized sentence without any explanation.\n\n"""
)

# 2. INALI → OTS
print("### INALI → OTS ###\n")
print_few_shot_prompts(
    test_set=df_test_set,
    support_set=df_support_set,
    source_column="INALI",
    target_column="OTS",
    num_few_shot_examples=N,
    instruction_template="""Below are examples of orthographic conversions of strings from the INALI standard to the OTS standard (State of Mexico Otomi) for the Otomi language. Note that some loanwords retain their original orthography, and certain linguistic phenomena may affect the transformations.\n\nExamples:\n\n{examples}Task:\n\nUsing these examples as a guide, predict the OTS orthographic standardization for the following sentence. Return only the standardized sentence without any explanation.\n\n"""
)

# 3. OTS → OTQ
print("### OTS → OTQ ###\n")
print_few_shot_prompts(
    test_set=df_test_set,
    support_set=df_support_set,
    source_column="OTS",
    target_column="OTQ",
    num_few_shot_examples=N,
    instruction_template="""Below are examples of orthographic conversions of strings from the OTS standard (State of Mexico Otomi) to the OTQ standard (Queretaro Otomi) for the Otomi language. Note that some loanwords retain their original orthography, and certain linguistic phenomena may affect the transformations.\n\nExamples:\n\n{examples}Task:\n\nUsing these examples as a guide, predict the OTQ orthographic standardization for the following sentence. Return only the standardized sentence without any explanation.\n\n"""
)


### OTQ → OTS ###

Below are examples of orthographic conversions of strings from the OTQ standard (Queretaro Otomi) to the OTS standard (State of Mexico Otomi) for the Otomi language. Note that some loanwords retain their original orthography, and certain linguistic phenomena may affect the transformations.

Examples:

1. OTQ: r'atsa noya ra sahagún: ndäxjua k'oi florentino. he̱m'i xiii, xe̱ni xiii (versión del náhuatl por ángel ma. garibay k.).
OTS: r'atsa noya ra sahagún: ndäxkjua k'oi florentino. je̱m'i xiii, xe̱ni xiii (versión del náhuatl por ángel ma. garibay k.).

2. OTQ: nubia ri nt'ode nehe mar'a ya ñ'o̱ho̱ ne ya b'e̱hña ndeznä: ha ra yancuic tlahtolli, r'ayo noya.
OTS: nubia ri nt'ode neje mar'a ya ñ'o̱jo̱ ne ya b'e̱jña ndeznä: ja ra yancuic tlahtolli, r'ayo noya.

Task:

Using these examples as a guide, predict the OTS orthographic standardization for the following sentence. Return only the standardized sentence without any explanation.

OTQ: -pasadores: "ya nts'äza xi mra 

In [None]:
models = [
    ("gpt-3.5-turbo", client_gpt),
    ("gpt-4o", client_gpt),
    ("llama3.1-70b", client_llama),
    ("llama3.3-70b", client_llama)
]

for model, client in models:
    run_experiment(
        language="oto",
        experiment_type="few-shot",
        model_name=model,
        model_client=client,
        test_set=df_test_set,
        support_set=df_support_set,
        source_column="OTQ",
        target_column="OTS",
        num_few_shot_examples=10,
        instruction_template="""Below are examples of orthographic conversions of strings from the OTQ standard (Queretaro Otomi) to the OTS standard (State of Mexico Otomi) for the Otomi language. Note that some loanwords retain their original orthography, and certain linguistic phenomena may affect the transformations.\n\nExamples:\n\n{examples}Task:\n\nUsing these examples as a guide, predict the OTS orthographic standardization for the following sentence. Return only the standardized sentence without any explanation.\n\n""",
        quiet = True
    )

    run_experiment(
        language="oto",
        experiment_type="few-shot",
        model_name=model,
        model_client=client,
        test_set=df_test_set,
        support_set=df_support_set,
        source_column="INALI",
        target_column="OTS",
        num_few_shot_examples=10,
        instruction_template="""Below are examples of orthographic conversions of strings from the INALI standard to the OTS standard (State of Mexico Otomi) for the Otomi language. Note that some loanwords retain their original orthography, and certain linguistic phenomena may affect the transformations.\n\nExamples:\n\n{examples}Task:\n\nUsing these examples as a guide, predict the OTS orthographic standardization for the following sentence. Return only the standardized sentence without any explanation.\n\n""",
        quiet = True
    )

    run_experiment(
        language="oto",
        experiment_type="few-shot",
        model_name=model,
        model_client=client,
        test_set=df_test_set,
        support_set=df_support_set,
        source_column="OTS",
        target_column="OTQ",
        num_few_shot_examples=10,
        instruction_template="""Below are examples of orthographic conversions of strings from the OTS standard (State of Mexico Otomi) to the OTQ standard (Queretaro Otomi) for the Otomi language. Note that some loanwords retain their original orthography, and certain linguistic phenomena may affect the transformations.\n\nExamples:\n\n{examples}Task:\n\nUsing these examples as a guide, predict the OTQ orthographic standardization for the following sentence. Return only the standardized sentence without any explanation.\n\n""",
        quiet = True
    )

191/191
Experiment results saved to 'ICL/colab_outputs/oto_gpt-3.5-turbo_few-shot_OTQ_OTS.tsv'.
Total cost: $0.310536 (logged in costs.txt)

191/191
Experiment results saved to 'ICL/colab_outputs/oto_gpt-3.5-turbo_few-shot_INALI_OTS.tsv'.
Total cost: $0.311397 (logged in costs.txt)

191/191
Experiment results saved to 'ICL/colab_outputs/oto_gpt-3.5-turbo_few-shot_OTS_OTQ.tsv'.
Total cost: $0.312865 (logged in costs.txt)

191/191
Experiment results saved to 'ICL/colab_outputs/oto_gpt-4o_few-shot_OTQ_OTS.tsv'.
Total cost: $0.546867 (logged in costs.txt)

191/191
Experiment results saved to 'ICL/colab_outputs/oto_gpt-4o_few-shot_INALI_OTS.tsv'.
Total cost: $0.543358 (logged in costs.txt)

191/191
Experiment results saved to 'ICL/colab_outputs/oto_gpt-4o_few-shot_OTS_OTQ.tsv'.
Total cost: $0.551040 (logged in costs.txt)

191/191
Experiment results saved to 'ICL/colab_outputs/oto_llama3.1-70b_few-shot_OTQ_OTS.tsv'.
Total cost: $0.568151 (logged in costs.txt)

191/191
Experiment results save