<a href="https://colab.research.google.com/github/melissatorgbi/LLMCxG_Workshop/blob/main/notebooks/LLMCxG_Notebook_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

This notebook uses data from the paper ["A Construction Grammar Corpus of Varying Schematicity: A Dataset for the Evaluation of Abstractions in Language Models"](https://aclanthology.org/2024.lrec-main.22/) to probe LLMs using the metalinguistic task of identifying which sentences belong to specific constructions.

The data consists of 10 distinct English constructions with at least 50 example sentences for each construction. This notebook covers the following:

1.   Loading the Data
2.   Cleaning the Data
3.   Constructing Prompts
4.   Prompting the Model
5.   Evaluation


We use the OpenAI API to prompt a model, so an API key is required.

## Imports

In [None]:
import csv
import pandas as pd
import random
from openai import OpenAI
import re
import nltk
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger_eng')
from nltk.corpus import wordnet

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.


# Set-up

In [None]:
OPENAI_API_KEY = ""  # enter your API key here

In [None]:
client = OpenAI(api_key=OPENAI_API_KEY)

In [None]:
def get_response(prompt, model):

  completion = client.chat.completions.create(
      model=model,
      messages=[
          {
              "role": "user",
              "content": prompt
          }
      ]
  )

  response = completion.choices[0].message.content
  return response

In [None]:
def clean_sentences(sentences):
    """Strip source tags, fix spacing and lowercase all-caps verbs."""
    cleaned = []
    for sentence in sentences:
        # remove a trailing bracketed source tag, e.g. "(COCA)"
        sentence = re.sub(r"\s*\([^\)]*?\)\s*$", "", sentence)
        # remove whitespace before the sentence-final full stop
        sentence = re.sub(r"\s+\.$", ".", sentence)
        # remove whitespace before commas
        sentence = re.sub(r"\s,", ",", sentence)
        if sentence != "":
            new_sentence = []
            for word, pos in nltk.pos_tag(sentence.split(" ")):
                # treat the word as a verb if most of its WordNet senses are verbs
                verbs = [(i.pos() == "v") for i in wordnet.synsets(word)]
                wn_verb = verbs.count(True) > verbs.count(False)
                # lowercase all-caps words that are tagged, or listed, as verbs
                if word.isupper() and ("VB" in pos or wn_verb):
                    word = word.lower()
                new_sentence.append(word)
            cleaned.append(" ".join(new_sentence))
    return cleaned

In [None]:
def clean_data(df):
  for column in df.columns:
    clean_list = clean_sentences(df[column])
    df[column] = clean_list

  return df

In [None]:
def get_rows(df, row_num):

  rows = []

  for i in range(row_num):
    row = []
    construction = random.choice(df.columns)
    # random.randint is inclusive at both ends, so the last valid index is count()-1
    construction_sentence = df[construction][random.randint(0, df[construction].count()-1)]

    # drop the target construction, then each chosen alternative, so all three differ
    remaining_constructions = df.drop(columns = construction)
    random_construction1 = random.choice(remaining_constructions.columns)
    alternative_sentence1 = df[random_construction1][random.randint(0, df[random_construction1].count()-1)]

    remaining_constructions = remaining_constructions.drop(columns = random_construction1)
    random_construction2 = random.choice(remaining_constructions.columns)
    alternative_sentence2 = df[random_construction2][random.randint(0, df[random_construction2].count()-1)]

    row = [construction_sentence, alternative_sentence1, alternative_sentence2]
    random.shuffle(row)
    row.append(construction)
    row.append(construction_sentence)

    rows.append(row)

  new_df = pd.DataFrame(rows, columns = ['sentence1','sentence2','sentence3','construction','target'])

  return new_df


## Load Data

In [None]:
!git clone https://github.com/H-TayyarMadabushi/Construction_Grammar_Schematicity_Corpus-CoGS.git

Cloning into 'Construction_Grammar_Schematicity_Corpus-CoGS'...
remote: Enumerating objects: 12, done.
remote: Counting objects: 100% (12/12), done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 12 (delta 1), reused 8 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (12/12), 76.27 KiB | 952.00 KiB/s, done.
Resolving deltas: 100% (1/1), done.


In [None]:
cogs_df = pd.read_csv("Construction_Grammar_Schematicity_Corpus-CoGS/Dataset/CoGs.csv")
cogs_df = cogs_df.loc[:, ~cogs_df.columns.str.match('Unnamed')]
cogs_df = cogs_df.loc[0:49]
cogs_df.head()

Unnamed: 0,Let Alone,Way Manner,Resultative,Conative,Intransitive Motion,Caused Motion,Causative with CxN,Ditransitive CxN,Comparative Correlative,Much Less
0,"Most wives are too bloody old, let alone mothe...","Tricia backed her way out, never taking her ey...",The man shrieked himself unconscious. (Hoffman),She kicked at the ball. (Hoffman),The fly buzzed into the room. (Hoffman),He sang them out of the room. (Hoffman),She loaded the truck with books. (Hoffman),Jack passed her the salt. (Hoffman),"The harder they come, the harder they fall. (AMR)",When my dad catches swarms sometimes he doesn'...
1,"A ceasefire, let alone lasting peace, will tak...",The taxi nosed its way back into the traffic a...,Firefighters cut the man free. (Hoffman),He clutched at the branch. (Hoffman),He ran out of the house. (Hoffman),She wiggled her feet out of the boots. (Hoffman),He sprayed the walls with paint. (Hoffman),The waiter served them their dinner. (Hoffman),"The longer he is around, the more miserable I ...","Not that many of us can fly off to Guatemala, ..."
2,It is difficult enough for an individual to be...,"As she felt her way forward, suddenly a knight...",He had often drank himself silly. (Hoffman),They shot at the sheriff. (Hoffman),People strolled along the river. (Hoffman),They laughed the actor off the stage. (Hoffman),They heaped the plate with mashed potatoes. (H...,She sent him an email. (Hoffman),"The more I studied, the less I understood. (AMR)","I never thought I'd meet him, much less in Aus..."
3,I would be distressed to hear of any ladies re...,"If this were to happen, would it not be unfair...","It JERKS you awake with the first sentence , h...","With her free hand, she tugged at the strap of...",It shouldn't be - you should be able to DANCE ...,I THREW the stone across the river. (AMR),Kevin lifted the trumpet and filled the room w...,The above kind of exchange AFFORDS students th...,The more the merrier. (AMR),Those were not my first (much less only) thoug...
4,"None of these arguments is notably strong, let...",Through the French windows and across the lawn...,These agencies RENDER themselves ineffective b...,She tugged at the glass door. (COCA),"When the IED detonates, this copper cup turns ...",The RN can still launch a task force that woul...,Dude has filled the room with flowers and tedd...,She baked her sister a cake. (Goldberg),"It seems the older the patient, the less effec...",You can not hear (much less understand) anythi...


In [None]:
# The data contains 10 constructions as the column names
cogs_df.columns

Index(['Let Alone', 'Way Manner', 'Resultative', 'Conative',
       'Intransitive Motion', 'Caused Motion', 'Causative with CxN',
       'Ditransitive CxN', 'Comparative Correlative ', 'Much Less '],
      dtype='object')

In [None]:
# You can explore specific sentences by changing the construction name and the index
cogs_df["Let Alone"][0]

'Most wives are too bloody old, let alone mothers. (FN Construction)'

# Clean Data

Before using the data in our prompts, we make sure it is in the format we want. This means removing the bracketed source tags that follow each sentence, lowercasing verbs written in all capital letters, and removing spaces before full stops and commas.

The following code blocks show an example sentence with these issues, the cleaning step, and the same sentence after cleaning.

The functions we use to clean the data (clean_data and clean_sentences) can be found in the set-up section of this notebook.
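
As a rough illustration of the regex part of this cleaning, the sketch below applies three substitutions to the example sentence used in this section. It deliberately leaves out the NLTK-based lowercasing of capitalised verbs, so `AFFORDS` stays in capitals here. The helper name `strip_annotations` is hypothetical and not part of the notebook, and the sketch assumes that only whitespace before a literal full stop should be removed.

```python
import re

def strip_annotations(sentence):
    # Hypothetical helper: regex steps only, no verb lowercasing.
    # Drop a trailing bracketed source tag such as "(COCA)".
    sentence = re.sub(r"\s*\([^\)]*?\)\s*$", "", sentence)
    # Remove whitespace before a sentence-final full stop.
    sentence = re.sub(r"\s+\.$", ".", sentence)
    # Remove whitespace before commas.
    sentence = re.sub(r"\s,", ",", sentence)
    return sentence

raw = ("The above kind of exchange AFFORDS students the opportunity to ask "
       "such questions in a legitimate , dialogical environment . (COCA)")
cleaned = strip_annotations(raw)
print(cleaned)
```

The printed sentence ends in `legitimate, dialogical environment.` with the source tag removed.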


In [None]:
# before cleaning
print("Example sentences before cleaning:\n\n"+cogs_df["Ditransitive CxN"][3])

Example sentences before cleaning:

The above kind of exchange AFFORDS students the opportunity to ask such questions in a legitimate , dialogical environment . (COCA)


In [None]:
#cleaning
cogs_df = clean_data(cogs_df)

In [None]:
# after cleaning
print("Example sentences after cleaning:\n\n"+cogs_df["Ditransitive CxN"][3])

Example sentences after cleaning:

The above kind of exchange affords students the opportunity to ask such questions in a legitimate, dialogical environment.


# Constructing Prompts

The task involves giving the model 3 sentences from different constructions. We want the model to choose the one sentence out of the three that matches a specific construction.

This is done by asking the model:
**Which of the following sentence are instances of the specific construction?**

In the code blocks below we manually extract the relevant parts of our data to show what the prompt looks like and how it is constructed. The prompt_base is reused later in this notebook, when we automate the process to prompt the model multiple times.

In [None]:
# For demonstration purposes we manually extract the first example of the Let Alone construction in the data,
# along with 2 other sentences from random constructions

construction = "Let Alone"
construction_sentence = cogs_df[construction][0]
remaining_constructions = cogs_df.drop(columns = construction)
random_construction1 = random.choice(remaining_constructions.columns)
alternative_sentence1 = cogs_df[random_construction1][0]
# drop the first alternative before choosing the second so the two differ
remaining_constructions = remaining_constructions.drop(columns = random_construction1)
random_construction2 = random.choice(remaining_constructions.columns)
alternative_sentence2 = cogs_df[random_construction2][0]

print("Target Construction: {}\nSentence: {}\n".format(construction, construction_sentence))
print("Alternative Construction: {}\nSentence: {}\n".format(random_construction1, alternative_sentence1))
print("Alternative Construction: {}\nSentence: {}\n".format(random_construction2, alternative_sentence2))

Target Construction: Let Alone
Sentence: Most wives are too bloody old, let alone mothers.

Alternative Construction: Ditransitive CxN
Sentence: Jack passed her the salt.

Alternative Construction: Much Less 
Sentence: When my dad catches swarms sometimes he doesn't even wear a veil, much less a bee suit.



In [None]:
prompt_base = """Question: Which of the following sentence are instances of the {} construction? Output nothing but the relevant sentence:
{}
{}
{}
Answer:
"""

In [None]:
# Example of what a prompt given to the model would look like

prompt = prompt_base.format(construction, construction_sentence, alternative_sentence1, alternative_sentence2)
print(prompt)

Question: Which of the following sentence are instances of the Let Alone construction? Output nothing but the relevant sentence:
Most wives are too bloody old, let alone mothers.
Jack passed her the salt.
When my dad catches swarms sometimes he doesn't even wear a veil, much less a bee suit.
Answer:



# Prompting the Model

Using the example prompt we created above, we prompt one of the OpenAI models and get the model's response.

In [None]:
response = get_response(prompt, "gpt-4o-mini")

In [None]:
print(response)

Most wives are too bloody old, let alone mothers.  
When my dad catches swarms sometimes he doesn't even wear a veil, much less a bee suit.


# Evaluation

To evaluate the model's performance we compare the model's output to the correct answer.

We calculate exact accuracy and relaxed accuracy. An exact match requires the model's response to be identical to the correct answer. A relaxed match does not require a perfect match: the model's response should appear in the correct answer, or the correct answer should appear in the model's response, ignoring case.
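
The two criteria can be sketched as a pair of small functions. The names `exact_match` and `relaxed_match` are hypothetical helpers introduced for illustration, and the `verbose` response is a made-up model output, not one observed in this notebook.

```python
def exact_match(response, target):
    # True only when the two strings are identical.
    return response == target

def relaxed_match(response, target):
    # Case-insensitive containment in either direction.
    r, t = response.lower(), target.lower()
    return t in r or r in t

target = "Jack passed her the salt."
verbose = "The answer is: Jack passed her the salt."  # made-up response

print(exact_match(verbose, target), relaxed_match(verbose, target))
```

Under this definition an empty response also counts as a relaxed match, because the empty string is a substring of every string, and so does a response that lists all three candidate sentences. Relaxed scores are therefore best read alongside the raw responses.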

In [None]:
print("model's response:\n"+response)
print("\ncorrect answer:\n"+construction_sentence)

model's response:
Most wives are too bloody old, let alone mothers.  
When my dad catches swarms sometimes he doesn't even wear a veil, much less a bee suit.

correct answer:
Most wives are too bloody old, let alone mothers.


In [None]:
print("exact match:", response == construction_sentence )
print("relaxed match:", construction_sentence.lower() in response.lower() or response.lower() in construction_sentence.lower() )

exact match: False
relaxed match: True


# Prompting with Multiple Examples

We have gone through the prompt construction, prompting the model and evaluating the model's response using one example. Now we will do the same thing multiple times for the same task.

To make this easier we first transform the data so that each row has everything we need to create and evaluate an individual prompt. Each row will contain the following:


1.   A sentence from the target construction
2.   A sentence from a random construction
3.   A sentence from another random construction

These first 3 sentences are randomly shuffled, so they may appear in any order (labeled as "sentence1", "sentence2", "sentence3").

4.   The name of the target construction (labeled as "construction")
5.   The correct/target sentence (labeled as "target")

We then prompt the model, getting responses for all rows in the data and then evaluate the results.
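
The row layout described above can be sketched on a toy subset of the data. This is not the notebook's `get_rows` function: `build_row` and the `examples` dictionary are hypothetical, and the sketch uses `random.sample` as one way to guarantee that the three sentences come from three different constructions.

```python
import random

# Toy subset of the data: construction name -> example sentences.
examples = {
    "Ditransitive CxN": ["Jack passed her the salt.", "She sent him an email."],
    "Conative": ["She kicked at the ball.", "They shot at the sheriff."],
    "Resultative": ["Firefighters cut the man free.",
                    "The man shrieked himself unconscious."],
    "Intransitive Motion": ["He ran out of the house."],
}

def build_row(examples, rng):
    # Sample three distinct constructions; the first one is the target.
    target_cxn, alt1, alt2 = rng.sample(list(examples), 3)
    target = rng.choice(examples[target_cxn])
    sentences = [target, rng.choice(examples[alt1]), rng.choice(examples[alt2])]
    # Shuffle so the target's position carries no information.
    rng.shuffle(sentences)
    # Layout: sentence1, sentence2, sentence3, construction, target
    return sentences + [target_cxn, target]

row = build_row(examples, random.Random(0))
print(row)
```

Passing a seeded `random.Random` instance makes the sampled rows reproducible across runs, which helps when comparing models on the same prompts.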

In [None]:
# The get_rows function that transforms the data can be found in the set up section of this notebook
cogsn_df = get_rows(cogs_df, 10)

In [None]:
cogsn_df.head(10)

Unnamed: 0,sentence1,sentence2,sentence3,construction,target
0,"Haven't even compiled it, much less tested it.",It becomes an obsession lightly because the mo...,He's staying in a hostel and will be heading s...,Much Less,"Haven't even compiled it, much less tested it."
1,"Bill, the more I read your stuff, the more I a...","Not one of the 33 No-voters, let alone the 24 ...",I took the plastic dome from my drink and stab...,Comparative Correlative,"Bill, the more I read your stuff, the more I a..."
2,"Costello, did you even bother to look at, much...",He and his comrades sacked and pillaged their ...,Rosa covered his face with kisses.,Causative with CxN,Rosa covered his face with kisses.
3,No one could have escaped the emotional shock ...,I would be distressed to hear of any ladies re...,"He roared and roared, waist-high in the shallo...",Way Manner,"He roared and roared, waist-high in the shallo..."
4,"The neocolonial relationship remains, as was m...","Rather than trying to bluff your way through, ...",How could Obama be leading over an empty chair...,Much Less,How could Obama be leading over an empty chair...
5,"The longer this goes on, the more likely an Ob...",There must have been text books around at one ...,Rosa covered his face with kisses.,Comparative Correlative,"The longer this goes on, the more likely an Ob..."
6,He was seen strumming a guitar at an event in ...,Here relationships between the local inhabitan...,"The more we learn piecemeal of this history, t...",Let Alone,Here relationships between the local inhabitan...
7,"As you continue, you can see the path meanderi...",These opportunities for enrichment offer the g...,He lay on the bed and covered his face with a ...,Way Manner,"As you continue, you can see the path meanderi..."
8,She threw him a parting glance.,The subsequent reduction in body weight can le...,He filled his lungs with the bitterly cold air...,Causative with CxN,He filled his lungs with the bitterly cold air...
9,"Now, if her running mate were to be someone na...",Henry Edison kicked at the tire on the old aut...,She filled her heart with the most useful and ...,Causative with CxN,She filled her heart with the most useful and ...


In [None]:
# This code loops through the data
# uses the base prompt to generate a prompt for each row of the data
# prompts the OpenAI model and saves the response

exact_scores = []
relaxed_scores = []
responses = []

for index in range(len(cogsn_df)):

  # creating prompt
  prompt = prompt_base.format(
      cogsn_df.construction.iloc[index],
      cogsn_df.sentence1.iloc[index],
      cogsn_df.sentence2.iloc[index],
      cogsn_df.sentence3.iloc[index])

  # getting response from the model
  response = get_response(prompt, "gpt-4o-mini")

  # evaluation
  target = cogsn_df.target.iloc[index]
  exact_score = response == target
  relaxed_score = target.lower() in response.lower() or response.lower() in target.lower()

  # adding response and evaluation to data
  responses.append(response)
  exact_scores.append(exact_score)
  relaxed_scores.append(relaxed_score)

# adding response and evaluation to existing data frame (df)
cogsn_df["response"] = responses
cogsn_df["exact score"] = exact_scores
cogsn_df["relaxed score"] = relaxed_scores

In [None]:
# This shows the updated data with additional columns containing the model's response,
# exact match and relaxed match.

cogsn_df.head(10)

Unnamed: 0,sentence1,sentence2,sentence3,construction,target,response,exact score,relaxed score
0,"Haven't even compiled it, much less tested it.",It becomes an obsession lightly because the mo...,He's staying in a hostel and will be heading s...,Much Less,"Haven't even compiled it, much less tested it.","Haven't even compiled it, much less tested it.",True,True
1,"Bill, the more I read your stuff, the more I a...","Not one of the 33 No-voters, let alone the 24 ...",I took the plastic dome from my drink and stab...,Comparative Correlative,"Bill, the more I read your stuff, the more I a...","Bill, the more I read your stuff, the more I a...",True,True
2,"Costello, did you even bother to look at, much...",He and his comrades sacked and pillaged their ...,Rosa covered his face with kisses.,Causative with CxN,Rosa covered his face with kisses.,None of the sentences provided are instances o...,False,False
3,No one could have escaped the emotional shock ...,I would be distressed to hear of any ladies re...,"He roared and roared, waist-high in the shallo...",Way Manner,"He roared and roared, waist-high in the shallo...","He roared and roared, waist-high in the shallo...",True,True
4,"The neocolonial relationship remains, as was m...","Rather than trying to bluff your way through, ...",How could Obama be leading over an empty chair...,Much Less,How could Obama be leading over an empty chair...,How could Obama be leading over an empty chair...,True,True
5,"The longer this goes on, the more likely an Ob...",There must have been text books around at one ...,Rosa covered his face with kisses.,Comparative Correlative,"The longer this goes on, the more likely an Ob...","The longer this goes on, the more likely an Ob...",True,True
6,He was seen strumming a guitar at an event in ...,Here relationships between the local inhabitan...,"The more we learn piecemeal of this history, t...",Let Alone,Here relationships between the local inhabitan...,Here relationships between the local inhabitan...,True,True
7,"As you continue, you can see the path meanderi...",These opportunities for enrichment offer the g...,He lay on the bed and covered his face with a ...,Way Manner,"As you continue, you can see the path meanderi...","As you continue, you can see the path meanderi...",True,True
8,She threw him a parting glance.,The subsequent reduction in body weight can le...,He filled his lungs with the bitterly cold air...,Causative with CxN,He filled his lungs with the bitterly cold air...,She threw him a parting glance.,False,False
9,"Now, if her running mate were to be someone na...",Henry Edison kicked at the tire on the old aut...,She filled her heart with the most useful and ...,Causative with CxN,She filled her heart with the most useful and ...,"Now, if her running mate were to be someone na...",False,False


In [None]:
# calculating accuracy

exact_accuracy = sum(cogsn_df["exact score"])/len(cogsn_df) * 100
relaxed_accuracy = sum(cogsn_df["relaxed score"])/len(cogsn_df) * 100

print("exact accuracy:", exact_accuracy)
print("relaxed accuracy:", relaxed_accuracy)

exact accuracy: 70.0
relaxed accuracy: 70.0
