In [115]:
import pandas as pd
import json
import numpy as np
import re

from data.prompts import prompts
from openai import OpenAI
client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

In [1]:
%load_ext autoreload
%autoreload 2

## Read dataset

In [5]:
df = pd.read_parquet("data/dataset/dataset.pqt")

## Perturbing categorical variables

This section covers how we perturb categorical values: names and locations.

For each example, we ask GPT-4o for perturbed versions of the answer and then substitute them into the original context.

### 1. Names
We start by extracting a (question, context, answer) triplet:

In [6]:
row = df[df['dataset']=='names'].iloc[200]
question = row['question']
context = row['context_original']
name = row['answer_original']

In [7]:
print(f'Question: {question} \n')
print(f'Context: {context}')

Question: Who was an English politician from Down Place, Surrey, who served as a Member of the Parliament for Guildford in 1407? 

Context: Robert Hull (fl. 1407) of Down Place, Surrey, was an English politician.

He married 'Elisora', who has been identified as possibly Elizabeth Stonhurst alias Doune of Doune (Down) Place.

He was a Member (MP) of the Parliament of England for Guildford in 1407.

References

14th-century births
15th-century deaths
English MPs 1407
People from Surrey (before 1889)
Members of Parliament for Guildford


Next, we use GPT-4o to produce perturbations of this name.

We use the prompt below:

In [8]:
print(prompts.PERTURBATIONS['names'])

Your job is to modify a given name in three different ways. 

In the first modification, make a slight change according to the following guidelines:
    - The modified name should stay within the same country of origin, time period, and/or gender.

In the second modification, make a significant change according to the following guidelines: 
    - The modified name be from a different country of origin, time period, and/or gender.
    - The modified name should be a real person's name of similar importance, stature, and popularity.

In the third modification, come up with a comical variation of the name. (Something in the spirit of Boaty McBoatFace).

The modified name should have a first AND last name.

Return a json where the first key is "slight" for the slightly changed name and the second key is "significant" for the significantly changed name, and the third key is "comical" for a comical variation on the name.

Example Input Format:
Name: Benjamin Franklin

Example Output:
{"sligh

The prompt asks for a JSON object whose keys are the degrees of perturbation and whose values are the perturbed names.
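Because later cells index the reply by key, it can help to validate it before use. The sketch below uses a hypothetical helper, `parse_perturbations`, which is not part of this codebase. It assumes the three key names from the prompt above, and its `min_words` option mirrors the prompt's first-and-last-name requirement.

```python
import json

# Key names requested by the names/locations prompts
EXPECTED_KEYS = ("slight", "significant", "comical")


def parse_perturbations(raw, min_words=1):
    """Parse a JSON reply and check it has exactly the expected, non-empty keys.

    min_words=2 enforces the names prompt's "first AND last name" rule.
    Raises ValueError (json.JSONDecodeError is a subclass) on any problem.
    """
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != set(EXPECTED_KEYS):
        raise ValueError(f"unexpected reply structure: {data!r}")
    for key in EXPECTED_KEYS:
        value = data[key]
        # Reject non-strings and values with too few words (including "")
        if not isinstance(value, str) or len(value.split()) < min_words:
            raise ValueError(f"bad value for {key!r}: {value!r}")
    return data
```

For names this would be called as `parse_perturbations(response.choices[0].message.content, min_words=2)`. JSON mode guarantees syntactically valid JSON but does not enforce these key names, so a check like this catches malformed replies early.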

The API call below produces all three perturbations of a single name in one request.

In [60]:
response = client.chat.completions.create(
  model="gpt-4o",
  response_format={"type": 'json_object'},
  messages=[
    {"role": "system", "content": prompts.PERTURBATIONS['names']},
    {"role": "user", "content": f"Name: {name}"}
  ],
  temperature=0,
  seed=0,
)

In [61]:
perturbations_dict = json.loads(response.choices[0].message.content)
print(perturbations_dict)

{'slight': 'Richard Hull', 'significant': 'Hiroshi Tanaka', 'comical': 'Robo Hullabaloo'}
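The same request is repeated below for locations, drugs and news, with only the system prompt and user message changing. A hypothetical wrapper (`request_perturbations` is our name, not part of the original code) keeps the shared settings in one place. It assumes an `openai.OpenAI` client like the one created at the top of the notebook, or any object exposing the same `chat.completions.create` method.

```python
import json


def request_perturbations(client, system_prompt, user_content, model="gpt-4o"):
    """Send one perturbation request and return the parsed JSON reply."""
    response = client.chat.completions.create(
        model=model,
        response_format={"type": "json_object"},  # ask for a JSON object reply
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
        temperature=0,
        seed=0,  # best-effort reproducibility; identical outputs are not guaranteed
    )
    return json.loads(response.choices[0].message.content)
```

For example, `request_perturbations(client, prompts.PERTURBATIONS['locations'], f"Location: {location}")` would reproduce the locations call.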


Next, we substitute the comical variant back into the original context:

In [13]:
context_mod = context.replace(name, perturbations_dict['comical'])

In [15]:
print(context_mod)

Robo Hullabaloo (fl. 1407) of Down Place, Surrey, was an English politician.

He married 'Elisora', who has been identified as possibly Elizabeth Stonhurst alias Doune of Doune (Down) Place.

He was a Member (MP) of the Parliament of England for Guildford in 1407.

References

14th-century births
15th-century deaths
English MPs 1407
People from Surrey (before 1889)
Members of Parliament for Guildford
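The substitution step can be wrapped so that every perturbation yields a row in the same `mod_type` / `mod_value` / `mod_context` layout used by the numerical examples later on. This is a sketch with a hypothetical `build_mod_rows` helper. It fails loudly when the original answer is missing, because `str.replace` would otherwise return the context unchanged without any warning.

```python
def build_mod_rows(context, original, perturbations):
    """One row per perturbation, with every occurrence of `original` replaced.

    `perturbations` maps a perturbation type (e.g. "comical") to replacement text.
    Raises ValueError when `original` does not occur in `context`.
    """
    if original not in context:
        raise ValueError(f"{original!r} not found in context")
    return [
        {"mod_type": mod_type,
         "mod_value": replacement,
         "mod_context": context.replace(original, replacement)}
        for mod_type, replacement in perturbations.items()
    ]
```

Passing the result to `pd.DataFrame(...)` gives a frame like `mod_df`. Only exact matches of the full name are rewritten, so a later mention by surname alone (e.g. "Hull was an MP.") keeps the original name.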


### 2. Locations

We perform a similar set of steps for locations. We use the following prompt:

In [17]:
print(prompts.PERTURBATIONS['locations'])

Your job is to modify a given city name in three different ways. 

In the first modification, make a slight change according to the following guidelines: 
    - The modified city name should be the name of a city most closely associated with the original city. 
    For example, San Fransisco -> Oakland, Los Angeles -> San Diego. 

In the second modification, make a significant change according to the following guidelines: 
    - The city name should be not a real city name, but sound like a real city name.

In the third modification, come up with a comical variation of the name. (Something in the spirit of Boaty McBoatFace).

Return a json where the first key is "slight" for the slightly changed name and the second key is "significant" for the significantly changed name, and the third key is "comical" for a comical variation on the name.

Example Input Format:
Location: Los Angeles

Example Output:
{"slight": San Diego, "significant": Mt. Leigh, "comical": Lala Angalala}


In [38]:
row = df[df['dataset']=='locations'].iloc[25]
question = row['question']
context = row['context_original']
location = row['answer_original']

In [39]:
print(f'Question: {question} \n')
print(f'Context: {context}')

Question: Which city is the location of the Kinclaven Castle that was used as a royal castle by King Alexander II of Scotland and was later captured by William Wallace in 1297? 

Context: Kinclaven Castle is a ruined 13th century castle in Perthshire, Scotland.

History
Built at the confluence of the Isla and Tay rivers, the castle originally controlled the ferry crossing nearby. The castle was used as a royal castle by King Alexander II of Scotland and a number of charters were signed during the reign of King Alexander III of Scotland (r.1249–1286).

Kinclaven was in English hands in 1296, King Edward I of England staying one night at the castle during his invasion of Scotland. Kinclaven was later captured by William Wallace in 1297. Wallace sacked the castle and put the English garrison and constable James Butler to the sword.

After being repaired, the castle was again in English hands by 1337, when the Guardian of Scotland Sir Andrew Murray retook the castle.

References
Coventry, 

In [58]:
response = client.chat.completions.create(
  model="gpt-4o",
  response_format={"type": 'json_object'},
  messages=[
    {"role": "system", "content": prompts.PERTURBATIONS['locations']},
    {"role": "user", "content": f"Location: {location}"}
  ],
  temperature=0,
  seed=0,
)

In [59]:
perturbations_dict = json.loads(response.choices[0].message.content)
print(perturbations_dict)

{'slight': 'Stirling', 'significant': 'Porthaven', 'comical': 'Perthshmirth'}


In [44]:
context_mod = context.replace(location, perturbations_dict['significant'])

In [45]:
print(context_mod)

Kinclaven Castle is a ruined 13th century castle in Porthaven, Scotland.

History
Built at the confluence of the Isla and Tay rivers, the castle originally controlled the ferry crossing nearby. The castle was used as a royal castle by King Alexander II of Scotland and a number of charters were signed during the reign of King Alexander III of Scotland (r.1249–1286).

Kinclaven was in English hands in 1296, King Edward I of England staying one night at the castle during his invasion of Scotland. Kinclaven was later captured by William Wallace in 1297. Wallace sacked the castle and put the English garrison and constable James Butler to the sword.

After being repaired, the castle was again in English hands by 1337, when the Guardian of Scotland Sir Andrew Murray retook the castle.

References
Coventry, Martin. (2008) Castles of the Clans: the strongholds and seats of 750 Scottish families and clans. Musselburgh. p.507.

Ruined castles in Perth and Kinross
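`str.replace` matches substrings, so a location that is a prefix of another word would also be rewritten inside that word. In this example the answer appears to be "Perthshire", so the "Perth and Kinross" category above is untouched. An answer of "Perth" (a hypothetical case) would, however, also turn "Perthshire" into "Porthavenshire". The sketch below, a hypothetical `replace_whole_word` helper, only replaces matches that are not embedded in a longer word.

```python
import re


def replace_whole_word(text, old, new):
    """Replace `old` only where it is not embedded in a longer word.

    Lookarounds are used instead of \\b so that names starting or ending in
    punctuation (e.g. "Washington, D.C.") still match.
    """
    pattern = rf"(?<!\w){re.escape(old)}(?!\w)"
    # A callable replacement stops re from interpreting backslashes in `new`
    return re.sub(pattern, lambda match: new, text)
```

With `text = "It lies near Perth, in Perthshire (Perth and Kinross)."`, `replace_whole_word(text, "Perth", "Porthaven")` leaves "Perthshire" intact, whereas `text.replace(...)` does not.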


## Perturbing numerical variables

### 1. Years

We perturb years by ten offsets evenly spaced between -100 and +100 (truncated to integers), plus an offset of 0 that keeps the original year.
To keep the perturbation simple and deterministic, we replace every occurrence of the answer year in the context.
The example below applies this to a single (question, context, answer) triplet:

In [50]:
row = df[df['dataset']=='years'].iloc[10]
question = row['question']
context = row['context_original']
year = row['answer_original']

In [55]:
print(f'Question: {question} \n')
print(f'Context: {context}')

Question: In which year did Adam Fuller serve as the head coach at Assumption College in Worcester, Massachusetts? 

Context: Adam Fuller is an American football coach. He is currently the defensive coordinator at Florida State University. He was previously the assistant head coach, linebackers coach, and special teams coordinator at Marshall University in Huntington, West Virginia, prior to serving as defensive coordinator at the University of Memphis.

A graduate of the Sacred Heart University, Fuller was previously the head coach at Assumption College in Worcester, Massachusetts for one season in 2008.

Head coaching record

References

Year of birth missing (living people)
Living people
American football linebackers
Assumption Greyhounds football coaches
Chattanooga Mocs football coaches
Marshall Thundering Herd football coaches
Memphis Tigers football coaches 
Sacred Heart Pioneers football players
Wagner Seahawks football coaches
WPI Engineers football coaches


In [57]:
mod_df = []
# 10 evenly spaced offsets in [-100, 100] (truncated to integers), plus 0 for the original year
offsets = sorted([int(x) for x in np.linspace(-100, 100, 10)] + [0])
for mod_type in offsets:
    mod_year = int(year) + mod_type
    mod_df.append({'mod_type': str(mod_type),
                   'mod_value': mod_year,
                   'mod_context': context.replace(str(year), str(mod_year))})

mod_df = pd.DataFrame(mod_df)
print(f"Mod value: {mod_df.iloc[0]['mod_value']} \nMod Context: {mod_df.iloc[0]['mod_context']}")

Mod value: 1908 
Mod Context: Adam Fuller is an American football coach. He is currently the defensive coordinator at Florida State University. He was previously the assistant head coach, linebackers coach, and special teams coordinator at Marshall University in Huntington, West Virginia, prior to serving as defensive coordinator at the University of Memphis.

A graduate of the Sacred Heart University, Fuller was previously the head coach at Assumption College in Worcester, Massachusetts for one season in 1908.

Head coaching record

References

Year of birth missing (living people)
Living people
American football linebackers
Assumption Greyhounds football coaches
Chattanooga Mocs football coaches
Marshall Thundering Herd football coaches
Memphis Tigers football coaches 
Sacred Heart Pioneers football players
Wagner Seahawks football coaches
WPI Engineers football coaches
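For reference, the offsets produced by `np.linspace(-100, 100, 10)` after integer truncation, plus 0, are listed below together with the resulting years for the 2008 example. Because `int()` truncates toward zero, the spacing is not perfectly uniform: the outermost gaps are 23 years, the inner ones 22, and 11 on either side of 0.

```python
import numpy as np

# Same construction as the cell above: int() truncates each linspace value toward zero
offsets = sorted([int(x) for x in np.linspace(-100, 100, 10)] + [0])
print(offsets)  # [-100, -77, -55, -33, -11, 0, 11, 33, 55, 77, 100]

year = 2008  # answer year from the Adam Fuller example
print([year + offset for offset in offsets])
# [1908, 1931, 1953, 1975, 1997, 2008, 2019, 2041, 2063, 2085, 2108]
```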


### 2. Drug Dosages

We found that prompting an LLM produced high-quality dosage perturbations more reliably than rule-based algorithms.

We use the following prompt for perturbing drug dosages:

In [62]:
print(prompts.PERTURBATIONS['drugs'])

You are given a question, an answer, and a statement that can be used to answer the question. The statement contains a drug dosage (in mg).                Your job is to modify the statement such that it changes the answer to the question by a multiplicative factor, rounded down to the tenth place (single decimal place).
                For instance, 0.15 should be rounded down to 0.1, and 0.25 should be rounded down to 0.2.
                Use the following multiplicative factors: [0.1, 0.2, 0.4, 0.8, 1.2, 1.5, 2.0, 3.0, 5.0, 10.0]

                Example Input Format:
                Question: What is the maximum single dose in mg for adult patients taking sublingual film of Apomorphine during a Parkinson disease 'off' episode?
                Answer: 30
                Statement: maximum single dose of 30 mg

                Example JSON Output:
                {
                "0.1": {"modified_statement": "maximum single dose of 3 mg", "modified_answer": "3"},
                "0

In [65]:
row = df[df['dataset']=='drugs'].iloc[0]

In [72]:
question = row['question']
context = row['context_original']
answer = row['answer_original']
statement = "maximum dose: 30 mg once daily."

Note the additional `statement` variable. Because these contexts are long, we ask the question-generating model to specify the sentence from which the answer is derived, and only that sentence (together with the question and answer) is sent for perturbation.
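Because the statement is located by exact string matching, two failure modes are silent. A statement that is missing from the context (for example, paraphrased) makes `context.replace` a no-op, and one that occurs several times is rewritten everywhere. A hypothetical `replace_statement` helper can guard against both before substituting:

```python
def replace_statement(context, statement, new_statement):
    """Replace `statement` in `context`, requiring exactly one occurrence."""
    count = context.count(statement)
    if count != 1:
        # 0 would make str.replace a silent no-op; >1 would rewrite every copy
        raise ValueError(f"statement occurs {count} times in context; expected 1")
    return context.replace(statement, new_statement)
```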

In [70]:
print(f"Question: {question}")
print(f"Answer: {answer}")

Question: What is the maximum daily dosage in mg for extended release oxybutynin in adults with overactive bladder?
Answer: 30


In [75]:
response = client.chat.completions.create(
  model="gpt-4o",
  response_format={"type": 'json_object'},
  messages=[
    {"role": "system", "content": prompts.PERTURBATIONS['drugs']},
    {"role": "user", "content": f"Question: {question}\nAnswer: {answer}\nStatement: {statement}"}
  ],
  temperature=0,
  seed=0,
)

In [77]:
perturbations_dict = json.loads(response.choices[0].message.content)

In [80]:
perturbations_dict

{'0.1': {'modified_statement': 'maximum dose: 3 mg once daily.',
  'modified_answer': '3'},
 '0.2': {'modified_statement': 'maximum dose: 6 mg once daily.',
  'modified_answer': '6'},
 '0.4': {'modified_statement': 'maximum dose: 12 mg once daily.',
  'modified_answer': '12'},
 '0.8': {'modified_statement': 'maximum dose: 24 mg once daily.',
  'modified_answer': '24'},
 '1.2': {'modified_statement': 'maximum dose: 36 mg once daily.',
  'modified_answer': '36'},
 '1.5': {'modified_statement': 'maximum dose: 45 mg once daily.',
  'modified_answer': '45'},
 '2.0': {'modified_statement': 'maximum dose: 60 mg once daily.',
  'modified_answer': '60'},
 '3.0': {'modified_statement': 'maximum dose: 90 mg once daily.',
  'modified_answer': '90'},
 '5.0': {'modified_statement': 'maximum dose: 150 mg once daily.',
  'modified_answer': '150'},
 '10.0': {'modified_statement': 'maximum dose: 300 mg once daily.',
  'modified_answer': '300'}}
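The model's arithmetic can be spot-checked against the requested factors. The sketch below makes an assumption about what counts as correct. It accepts any modified answer that lies at most one unit below `answer * factor`, which covers the whole-number results above as well as a one-decimal result as described in the prompt. `check_scaled_answer` is a hypothetical name.

```python
def check_scaled_answer(answer, factor, modified_answer, tol=1e-9):
    """True if modified_answer lies in (answer*factor - 1, answer*factor].

    Assumption: rounding the scaled value down (to a whole number or to one
    decimal place) is acceptable. `tol` absorbs float error such as
    0.1 * 3 == 0.30000000000000004.
    """
    target = float(answer) * float(factor)
    gap = target - float(modified_answer)
    return -tol <= gap < 1
```

For instance, `[f for f, v in perturbations_dict.items() if not check_scaled_answer(answer, f, v["modified_answer"])]` lists the factors whose outputs look wrong; for the dictionary above it should be empty.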

In [94]:
mod_df = []
for mod_type, perturb_v in perturbations_dict.items():
    mod_df.append({'mod_type': str(mod_type), 
                   'mod_value': perturb_v["modified_answer"], 
                   'mod_context': context.replace(statement, perturb_v["modified_statement"])})
mod_df = pd.DataFrame(mod_df)
print(f"Mod value: {mod_df.iloc[0]['mod_value']} \nMod Context: {mod_df.iloc[0]['mod_context'][2518:2895]}")

Mod value: 3 
Mod Context: Extended release: Initial: 5 to 10 mg once daily; adjust dose as needed based on response and tolerability in 5 mg increments every 1 to ≥2 weeks ( Ref ); maximum dose: 3 mg once daily. Immediate release : 5 mg 2 to 3 times daily; adjust dose as needed based on response and tolerability in 5 mg increments approximately every 2 weeks ( Ref ); maximum dose: 5 mg 4 times daily.


### 3. News
News articles are perturbed the same way, by prompting with the following template:

In [138]:
print(prompts.PERTURBATIONS['news'])

You are given a question, an answer, and a statement that can be used to answer the question.
                Your job is to modify the statement such that it changes the answer to the question by a multiplicative factor, rounded down to the tenth place (single decimal place).
                For instance, 0.15 should be rounded down to 0.1, and 0.25 should be rounded down to 0.2.
                Use the following multiplicative factors: [0.1, 0.2, 0.4, 0.8, 1.2, 1.5, 2.0, 3.0, 5.0, 10.0]

                Example Input Format:
                Question: How many points did the basketball player score?
                Answer: 10
                Statement: The basketball player scored 10 points.

                Example JSON Output:
                {
                "0.1": {"modified_statement": "The basketball player scored 1 point.", "modified_answer": "1"},
                "0.2": {"modified_statement": "The basketball player scored 2 points.", "modified_answer": "2"},
                "

In [184]:
row = df[df['dataset']=='news'].iloc[220]

In [207]:
question = row['question']
context = row['context_original']
answer = row['answer_original']
statement = "A United Airlines spokesperson said via email that the flight was carrying 139 passengers and six crew members"

In [209]:
print(f"Question: {question}")
print(f"Answer: {answer}")
print(f"Context: {context[917:1123]}")

Question: How many passengers were on board United Flight 433 from San Francisco to southern Oregon on April 7, 2023?
Answer: 139
Context: A United Airlines spokesperson said via email that the flight was carrying 139 passengers and six crew members, and no emergency was declared because there was no indication of the damage during the flight.


In [210]:
response = client.chat.completions.create(
  model="gpt-4o",
  response_format={"type": 'json_object'},
  messages=[
    {"role": "system", "content": prompts.PERTURBATIONS['news']},
    {"role": "user", "content": f"Question: {question}\nAnswer: {answer}\nStatement: {statement}"}
  ],
  temperature=0,
  seed=0,
)

In [211]:
perturbations_dict = json.loads(response.choices[0].message.content)

In [212]:
perturbations_dict

{'0.1': {'modified_statement': 'A United Airlines spokesperson said via email that the flight was carrying 13 passengers and six crew members.',
  'modified_answer': '13'},
 '0.2': {'modified_statement': 'A United Airlines spokesperson said via email that the flight was carrying 27 passengers and six crew members.',
  'modified_answer': '27'},
 '0.4': {'modified_statement': 'A United Airlines spokesperson said via email that the flight was carrying 55 passengers and six crew members.',
  'modified_answer': '55'},
 '0.8': {'modified_statement': 'A United Airlines spokesperson said via email that the flight was carrying 111 passengers and six crew members.',
  'modified_answer': '111'},
 '1.2': {'modified_statement': 'A United Airlines spokesperson said via email that the flight was carrying 166 passengers and six crew members.',
  'modified_answer': '166'},
 '1.5': {'modified_statement': 'A United Airlines spokesperson said via email that the flight was carrying 208 passengers and six c

In [216]:
mod_df = []
for mod_type, perturb_v in perturbations_dict.items():
    new_statement = perturb_v["modified_statement"]
    # The model's rewrites end with a period that the original statement lacks;
    # drop it so the comma that follows the statement in the context is preserved.
    if new_statement.endswith(".") and not statement.endswith("."):
        new_statement = new_statement[:-1]
    mod_df.append({'mod_type': str(mod_type),
                   'mod_value': perturb_v["modified_answer"],
                   'mod_context': context.replace(statement, new_statement)})
mod_df = pd.DataFrame(mod_df)
print(f"Mod value: {mod_df.iloc[9]['mod_value']} \nMod Context: {mod_df.iloc[9]['mod_context'][917:1124]}")

Mod value: 1390 
Mod Context: A United Airlines spokesperson said via email that the flight was carrying 1390 passengers and six crew members, and no emergency was declared because there was no indication of the damage during the flight.


### 4. Records

Record perturbations are produced algorithmically, without an LLM, using the following code:

In [96]:
def _fraction_to_ms(frac):
    # Digits after the decimal point are a fraction of a second:
    # ".91" is 910 ms and ".5" is 500 ms, so pad/trim to exactly three digits.
    return int(frac.ljust(3, "0")[:3])


def time_string_to_milliseconds(input_string):
    input_string = input_string.strip()

    # Strings with alphabetic characters (e.g. "DNF") are not parsed
    if re.search("[a-zA-Z]", input_string):
        return None

    # Formats with colons: hours:minutes:seconds or minutes:seconds, each with an optional .fraction
    if ":" in input_string:
        parts = input_string.split(":")
        if len(parts) == 3:
            hours, minutes = int(parts[0]), int(parts[1])
        elif len(parts) == 2:
            hours, minutes = 0, int(parts[0])
        else:
            return None
        seconds, _, frac = parts[-1].partition(".")
        return ((hours * 60 + minutes) * 60 + int(seconds)) * 1000 + _fraction_to_ms(frac)

    # Formats without colons: minutes.seconds.fraction, seconds.fraction, or whole seconds
    parts = input_string.split(".")
    if len(parts) == 3:
        minutes, seconds, frac = parts
        return (int(minutes) * 60 + int(seconds)) * 1000 + _fraction_to_ms(frac)
    if len(parts) == 2:
        seconds, frac = parts
        return int(seconds) * 1000 + _fraction_to_ms(frac)
    return int(input_string) * 1000


def milliseconds_to_time_string(milliseconds):
    if milliseconds < 0:
        return None  # A time cannot be negative

    hours, remainder = divmod(int(milliseconds), 3600000)  # 3600000 ms in an hour
    minutes, remainder = divmod(remainder, 60000)  # 60000 ms in a minute
    seconds, milliseconds = divmod(remainder, 1000)  # 1000 ms in a second
    hundredths = milliseconds // 10  # two decimal places, as in "1:40.91"

    # Choose the format based on the magnitude of the time
    if hours > 0:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    if minutes > 0:
        return f"{minutes}:{seconds:02d}.{hundredths:02d}"
    return f"{seconds}.{hundredths:02d}"

In [110]:
row = df[df['dataset']=='records'].iloc[30]

In [112]:
question = row['question']
answer = row['answer_original']
context = row['context_original']

In [136]:
start = context.find("800 metres")
print(f"Question: {question}")
print(f"Answer: {answer}")
print(f"Context: {context[start:start + 50]}")

Question: What is the Olympic record for Men's 800 metresin athletics (time)?
Answer: 1:40.91
Context: 800 metres

♦1:40.91 

David Rudisha

 Kenya (KEN)


In [135]:
mod_df = []
for mod_type in [0.1, 0.2, 0.4, 0.8, 1.2, 1.5, 2.0, 3.0, 5.0, 10.0]:
    # Scale the record in milliseconds, then format it back into record notation
    mod_milli = round(time_string_to_milliseconds(answer) * mod_type)
    modified_answer = milliseconds_to_time_string(mod_milli)
    mod_df.append({'mod_type': str(mod_type),
                   'mod_value': modified_answer,
                   'mod_context': context.replace(answer, modified_answer)})
mod_df = pd.DataFrame(mod_df)
mod_context = mod_df.iloc[9]['mod_context']
start = mod_context.find("800 metres")
print(f"Mod value: {mod_df.iloc[9]['mod_value']} \nMod Context: {mod_context[start:start + 51]}")

Mod value: 16:49.10 
Mod Context: 800 metres

♦16:49.10 

David Rudisha

 Kenya (KEN)
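As an independent cross-check of the record arithmetic, the sketch below works in integer hundredths of a second using the standard `decimal` module. Under this approach, 1:40.91 (100.91 s) scaled by 10 gives 1009.10 s, i.e. 16:49.10. The helper names are hypothetical. Truncating to hundredths is an assumption that matches the two-decimal precision of this record.

```python
from decimal import Decimal, ROUND_DOWN


def to_hundredths(time_str):
    """Parse 'h:mm:ss', 'm:ss.xx' or 'ss.xx' into integer hundredths of a second."""
    seconds = Decimal(0)
    for part in time_str.strip().split(":"):
        seconds = seconds * 60 + Decimal(part)
    return int(seconds * 100)  # extra digits beyond hundredths are truncated


def from_hundredths(total):
    """Format integer hundredths as 'h:mm:ss.xx', 'm:ss.xx' or 's.xx'."""
    seconds, frac = divmod(total, 100)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}.{frac:02d}"
    if minutes:
        return f"{minutes}:{seconds:02d}.{frac:02d}"
    return f"{seconds}.{frac:02d}"


def scale_record(time_str, factor):
    """Multiply a record time by `factor`, truncating to hundredths of a second."""
    # Decimal(str(factor)) keeps 1.2 as exactly 1.2 instead of 1.19999...
    scaled = Decimal(to_hundredths(time_str)) * Decimal(str(factor))
    return from_hundredths(int(scaled.to_integral_value(rounding=ROUND_DOWN)))


print(scale_record("1:40.91", 10.0))  # 16:49.10
print(scale_record("1:40.91", 0.1))   # 10.09
```

Keeping the arithmetic in integers avoids float truncation surprises when a scaled time lands just below a whole hundredth.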

