<a href="https://colab.research.google.com/github/melissatorgbi/LLMCxG_Workshop/blob/main/notebooks/LLMCxG_Notebook_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

overview/description

## Imports

In [1]:
import csv
import pandas as pd
import random
from openai import OpenAI
import re
import nltk
nltk.download('wordnet')
from nltk.corpus import wordnet

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


# Set-up

In [2]:
OPENAI_API_KEY = #enter your api key here

In [3]:
client = OpenAI(api_key=OPENAI_API_KEY)

In [4]:
def get_response(prompt, model):

  completion = client.chat.completions.create(
      model=model,
      messages=[
          {
              "role": "user",
              "content": prompt
          }
      ]
  )

  response = completion.choices[0].message.content
  return response

In [5]:
def clean_sentences(sentences) :
    clean_sentences = list()
    for sentence in sentences :
        sentence = re.sub( r"\s*\([^\)]*?\)\s*$", r"", sentence )
        # sentence = re.sub( r"\s+.$", '.', sentence )
        if sentence != '' :
            new_sentence = list()
            for word, pos in nltk.pos_tag( sentence.split( ' ' ) ) :
                verbs   = [ (i.pos()=='v') for i in wordnet.synsets(word) ]
                wn_verb = verbs.count( True )  > verbs.count( False )
                if word.isupper() and ( 'VB' in pos or wn_verb ) :
                    word = word.lower()
                new_sentence.append( word )
            sentence = ' '.join( new_sentence )
            clean_sentences.append( sentence )
    return clean_sentences

In [6]:
def clean_data(df):
  for column in df.columns:
    clean_list = clean_sentences(df[column])
    df[column] = clean_list

  return df

## Load Data

In [7]:
!git clone https://github.com/H-TayyarMadabushi/Construction_Grammar_Schematicity_Corpus-CoGS.git

fatal: destination path 'Construction_Grammar_Schematicity_Corpus-CoGS' already exists and is not an empty directory.


In [8]:
cogs_df = pd.read_csv("Construction_Grammar_Schematicity_Corpus-CoGS/Dataset/CoGs.csv")
cogs_df = cogs_df.loc[:, ~cogs_df.columns.str.match('Unnamed')]
cogs_df = cogs_df.loc[0:49]
cogs_df.head()

Unnamed: 0,Let Alone,Way Manner,Resultative,Conative,Intransitive Motion,Caused Motion,Causative with CxN,Ditransitive CxN,Comparative Correlative,Much Less
0,"Most wives are too bloody old, let alone mothe...","Tricia backed her way out, never taking her ey...",The man shrieked himself unconscious. (Hoffman),She kicked at the ball. (Hoffman),The fly buzzed into the room. (Hoffman),He sang them out of the room. (Hoffman),She loaded the truck with books. (Hoffman),Jack passed her the salt. (Hoffman),"The harder they come, the harder they fall. (AMR)",When my dad catches swarms sometimes he doesn'...
1,"A ceasefire, let alone lasting peace, will tak...",The taxi nosed its way back into the traffic a...,Firefighters cut the man free. (Hoffman),He clutched at the branch. (Hoffman),He ran out of the house. (Hoffman),She wiggled her feet out of the boots. (Hoffman),He sprayed the walls with paint. (Hoffman),The waiter served them their dinner. (Hoffman),"The longer he is around, the more miserable I ...","Not that many of us can fly off to Guatemala, ..."
2,It is difficult enough for an individual to be...,"As she felt her way forward, suddenly a knight...",He had often drank himself silly. (Hoffman),They shot at the sheriff. (Hoffman),People strolled along the river. (Hoffman),They laughed the actor off the stage. (Hoffman),They heaped the plate with mashed potatoes. (H...,She sent him an email. (Hoffman),"The more I studied, the less I understood. (AMR)","I never thought I'd meet him, much less in Aus..."
3,I would be distressed to hear of any ladies re...,"If this were to happen, would it not be unfair...","It JERKS you awake with the first sentence , h...","With her free hand, she tugged at the strap of...",It shouldn't be - you should be able to DANCE ...,I THREW the stone across the river. (AMR),Kevin lifted the trumpet and filled the room w...,The above kind of exchange AFFORDS students th...,The more the merrier. (AMR),Those were not my first (much less only) thoug...
4,"None of these arguments is notably strong, let...",Through the French windows and across the lawn...,These agencies RENDER themselves ineffective b...,She tugged at the glass door. (COCA),"When the IED detonates, this copper cup turns ...",The RN can still launch a task force that woul...,Dude has filled the room with flowers and tedd...,She baked her sister a cake. (Goldberg),"It seems the older the patient, the less effec...",You can not hear (much less understand) anythi...


In [9]:
cogs_df["Let Alone"][0]

'Most wives are too bloody old, let alone mothers. (FN Construction)'

# Clean Data

In [10]:
cogs_df = clean_data(cogs_df)

In [11]:
cogs_df["Let Alone"][0]

'Most wives are too bloody old, let alone mothers.'

# Constructing Prompts

In [12]:
construction = "Let Alone"
construction_sentence = cogs_df[construction][0]
remaining_constructions = cogs_df.drop(columns = construction)
random_construction1 = random.choice(remaining_constructions.columns)
alternative_sentence1 = cogs_df[random_construction1][0]
random_construction2 = random.choice(remaining_constructions.columns)
remaining_constructions = remaining_constructions.drop(columns = random_construction1)
alternative_sentence2 = cogs_df[random_construction2][0]

print("Target Construction: {}\nSentence: {}\n".format(construction, construction_sentence))
print("Alternative Construction: {}\nSentence: {}\n".format(random_construction1, alternative_sentence1))
print("Alternative Construction: {}\nSentence: {}\n".format(random_construction2, alternative_sentence2))

Target Construction: Let Alone
Sentence: Most wives are too bloody old, let alone mothers.

Alternative Construction: Much Less 
Sentence: When my dad catches swarms sometimes he doesn't even wear a veil, much less a bee suit.

Alternative Construction: Caused Motion
Sentence: He sang them out of the room.



In [13]:
prompt_base = """Question: Which of the following sentence are instances of the {} construction? Output nothing but the relevant sentence:
{}
{}
{}
Answer:
"""

In [14]:
prompt = prompt_base.format(construction, construction_sentence, alternative_sentence1, alternative_sentence2)
print(prompt)

Question: Which of the following sentence are instances of the Let Alone construction? Output nothing but the relevant sentence: 
Most wives are too bloody old, let alone mothers.
When my dad catches swarms sometimes he doesn't even wear a veil, much less a bee suit.
He sang them out of the room.
Answer:



# Prompting the Model

In [15]:
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": prompt
        }
    ]
)

response = completion.choices[0].message.content


In [16]:
print(response)

Most wives are too bloody old, let alone mothers.  
When my dad catches swarms sometimes he doesn't even wear a veil, much less a bee suit.


# Evaluation

In [17]:
print(construction_sentence)

Most wives are too bloody old, let alone mothers.


In [18]:
print("scores_exact", response == construction_sentence )
print("scores_contained", construction_sentence.lower() in response.lower() )
print("scores_contains", response.lower() in construction_sentence.lower() )

scores_exact False
scores_contained True
scores_contains False


# Prompting with Multiple Examples

In [19]:
def get_rows(df, row_num):

  rows = []

  for i in range(row_num):
    row = []
    construction = random.choice(df.columns)
    construction_sentence = df[construction][random.randint(0, df[construction].count()-2)]

    remaining_constructions = df.drop(columns = construction)
    random_construction1 = random.choice(remaining_constructions.columns)
    alternative_sentence1 = df[random_construction1][random.randint(0, df[random_construction1].count()-2)]

    random_construction2 = random.choice(remaining_constructions.columns)
    remaining_constructions = remaining_constructions.drop(columns = random_construction1)
    alternative_sentence2 = df[random_construction2][random.randint(0, df[random_construction2].count()-2)]

    row = [construction_sentence, alternative_sentence1, alternative_sentence2]
    random.shuffle(row)
    row.append(construction)
    row.append(construction_sentence)

    rows.append(row)

  new_df = pd.DataFrame(rows, columns = ['sentence1','sentence2','sentence3','construction','target'])

  return new_df


In [20]:
cogsn_df = get_rows(cogs_df, 10)

In [21]:
cogsn_df.head(10)

Unnamed: 0,sentence1,sentence2,sentence3,construction,target
0,The squire looked at the peasant and covered h...,The joggers ran the pavement thin.,He buys her a bag of anise-flavored candies fr...,Resultative,The joggers ran the pavement thin.
1,Justin Timberlake makes Alcoholism Fun In The ...,I thought about taking Phoneix's advice and tu...,This because Anons are not looking for persona...,Resultative,Justin Timberlake makes Alcoholism Fun In The ...
2,Most stable-lads would have counted themselves...,"No one mentioned abortion, much less birth con...",The sleeping pills made me sick.,Resultative,The sleeping pills made me sick.
3,"So the more we talk about it, the more coverag...",The district court judge in Colorado awarded t...,Martin dabbed her lips with poppy paste.,Causative with CxN,Martin dabbed her lips with poppy paste.
4,These practical experiences afford school psyc...,He had just danced a circle around the Grim Re...,Dude has filled the room with flowers and tedd...,Ditransitive CxN,These practical experiences afford school psyc...
5,I ran around the track.,"He does NOT belong in the USA, much less the o...",But it happened that after walking for a long ...,Much Less,"He does NOT belong in the USA, much less the o..."
6,Marc seems to this there should be more news r...,He stabbed at the last part of the equation.,"James Jackson and Company of Augusta , the sec...",Ditransitive CxN,"James Jackson and Company of Augusta , the sec..."
7,Her fingers plucked at the air in movements as...,Workers dumped large burlap sacks of the impor...,"Bill, the more I read your stuff, the more I a...",Comparative Correlative,"Bill, the more I read your stuff, the more I a..."
8,He did not sound a bit sorry and she almost sm...,"The more private sources run charities, the le...",we did not see a single boat on our three dive...,Way Manner,He did not sound a bit sorry and she almost sm...
9,The idea of everyone writing a letter for thei...,"China confirmed on January 23, 2007 after 12 d...",The Pakistan-born man was then deported to the...,Causative with CxN,The idea of everyone writing a letter for thei...


In [22]:
exact_scores = []
relaxed_scores = []
responses = []

for index in range(len(cogsn_df)):
  prompt = prompt_base.format(
      cogsn_df.construction.iloc[index],
      cogsn_df.sentence1.iloc[index],
      cogsn_df.sentence2.iloc[index],
      cogsn_df.sentence3.iloc[index])

  target = cogsn_df.target.iloc[index]
  response = get_response(prompt, "gpt-4o-mini")

  exact_score = response == target
  relaxed_score = target.lower() in response.lower() or response.lower() in target.lower()

  responses.append(response)
  exact_scores.append(exact_score)
  relaxed_scores.append(relaxed_score)

cogsn_df["response"] = responses
cogsn_df["exact score"] = exact_scores
cogsn_df["relaxed score"] = exact_scores

In [23]:
cogsn_df.head(10)

Unnamed: 0,sentence1,sentence2,sentence3,construction,target,response,exact score,relaxed score
0,The squire looked at the peasant and covered h...,The joggers ran the pavement thin.,He buys her a bag of anise-flavored candies fr...,Resultative,The joggers ran the pavement thin.,The joggers ran the pavement thin.,True,True
1,Justin Timberlake makes Alcoholism Fun In The ...,I thought about taking Phoneix's advice and tu...,This because Anons are not looking for persona...,Resultative,Justin Timberlake makes Alcoholism Fun In The ...,None of the sentences provided are instances o...,False,False
2,Most stable-lads would have counted themselves...,"No one mentioned abortion, much less birth con...",The sleeping pills made me sick.,Resultative,The sleeping pills made me sick.,The sleeping pills made me sick.,True,True
3,"So the more we talk about it, the more coverag...",The district court judge in Colorado awarded t...,Martin dabbed her lips with poppy paste.,Causative with CxN,Martin dabbed her lips with poppy paste.,The district court judge in Colorado awarded t...,False,False
4,These practical experiences afford school psyc...,He had just danced a circle around the Grim Re...,Dude has filled the room with flowers and tedd...,Ditransitive CxN,These practical experiences afford school psyc...,These practical experiences afford school psyc...,False,False
5,I ran around the track.,"He does NOT belong in the USA, much less the o...",But it happened that after walking for a long ...,Much Less,"He does NOT belong in the USA, much less the o...","He does NOT belong in the USA, much less the o...",True,True
6,Marc seems to this there should be more news r...,He stabbed at the last part of the equation.,"James Jackson and Company of Augusta , the sec...",Ditransitive CxN,"James Jackson and Company of Augusta , the sec...","James Jackson and Company of Augusta , the sec...",True,True
7,Her fingers plucked at the air in movements as...,Workers dumped large burlap sacks of the impor...,"Bill, the more I read your stuff, the more I a...",Comparative Correlative,"Bill, the more I read your stuff, the more I a...","Bill, the more I read your stuff, the more I a...",True,True
8,He did not sound a bit sorry and she almost sm...,"The more private sources run charities, the le...",we did not see a single boat on our three dive...,Way Manner,He did not sound a bit sorry and she almost sm...,He did not sound a bit sorry and she almost sm...,True,True
9,The idea of everyone writing a letter for thei...,"China confirmed on January 23, 2007 after 12 d...",The Pakistan-born man was then deported to the...,Causative with CxN,The idea of everyone writing a letter for thei...,None of the sentences provided are instances o...,False,False


In [24]:
exact_accuracy = sum(cogsn_df["exact score"])/len(cogsn_df) * 100
relaxed_accuracy = sum(cogsn_df["relaxed score"])/len(cogsn_df) * 100

print(exact_accuracy)
print(relaxed_accuracy)

60.0
60.0


# Test Data

In [25]:
test_df = pd.read_csv("Construction_Grammar_Schematicity_Corpus-CoGS/Dataset/TestData.csv")

In [26]:
test_df.head()

Unnamed: 0,Construction Name,Test Sentence 1,Test Sentence 2,Test Sentence 3,Test Sentence 4,Test Sentence 5,Test Sentence 6,Test Correct Sentence 1 Index,Test Correct Sentence 2 Index,Test Correct Sentence 3 Index,...,One-shot Sentence 1,One-shot Sentence 2,One-shot Sentence 3,One-shotSentence 4,One-shot Sentence 5,One-shot Sentence 6,One-shot Correct Sentence 1 Index,One-shot Correct Sentence 2 Index,One-shot Correct Sentence 3 Index,One-shot Exemplar
0,Let Alone,It described extravagant Easter egg packaging ...,"A ceasefire, let alone lasting peace, will tak...",I'll let you alone right now.,Try to be happy ... let the glass globe be .,They can't let nature alone.,"Promises of a new bullet train here, a relief ...",0,1,5,...,The first membranes they made were far too thi...,Let me alone; my days have no meaning.,"The director-general, Michael Checkland, once ...","However, its role in influencing these rhythms...",The old man would not let it alone.,so many kids are born in sterile hospital room...,0,2,3,"Where Musgrove and John Hopkins, who put it al..."
1,Let Alone,so many kids are born in sterile hospital room...,Let me alone; my days have no meaning.,"You'd have trouble swinging a gerbil, let alon...",It takes science to come along and tell us tha...,Try to be happy ... let the glass globe be .,The Protestant majority in Ulster totally refu...,2,3,5,...,The old man would not let it alone.,I'll let you alone right now.,"Where Musgrove and John Hopkins, who put it al...",The first membranes they made were far too thi...,"The director-general, Michael Checkland, once ...",They can't let nature alone.,2,3,4,"However, its role in influencing these rhythms..."
2,Let Alone,How many modern Prime Ministers could recall s...,The old man would not let it alone.,It takes science to come along and tell us tha...,I'll let you alone right now.,It is difficult enough for an individual to be...,They can't let nature alone.,0,2,4,...,Let me alone; my days have no meaning.,so many kids are born in sterile hospital room...,Try to be happy ... let the glass globe be .,The disadvantage was that 'the liberal bridle-...,"Where Musgrove and John Hopkins, who put it al...",The first membranes they made were far too thi...,3,4,5,"The director-general, Michael Checkland, once ..."
3,Let Alone,Largely preoccupied by a sense of not being ab...,"Promises of a new bullet train here, a relief ...",Try to be happy ... let the glass globe be .,so many kids are born in sterile hospital room...,How many modern Prime Ministers could recall s...,The old man would not let it alone.,0,1,4,...,"The director-general, Michael Checkland, once ...",They can't let nature alone.,I'll let you alone right now.,The first membranes they made were far too thi...,Let me alone; my days have no meaning.,"However, its role in influencing these rhythms...",0,3,5,"Where Musgrove and John Hopkins, who put it al..."
4,Let Alone,It is difficult enough for an individual to be...,I would be distressed to hear of any ladies re...,I'll let you alone right now.,It's unsurprising that such an attitude failed...,so many kids are born in sterile hospital room...,The old man would not let it alone.,0,1,3,...,The first membranes they made were far too thi...,Try to be happy ... let the glass globe be .,Let me alone; my days have no meaning.,The disadvantage was that 'the liberal bridle-...,They can't let nature alone.,"The director-general, Michael Checkland, once ...",0,3,5,"Where Musgrove and John Hopkins, who put it al..."
