<a href="https://colab.research.google.com/github/melissatorgbi/LLMCxG_Workshop/blob/main/notebooks/LLMCxG_Notebook_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

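This notebook evaluates how well an OpenAI chat model performs Natural Language Inference (NLI) on sentences built around Construction Grammar (CxG) constructions such as *let-alone*. It clones the experiment repository, inspects the constructional NLI (CxNLI) test data, builds a prompt, queries the model, and scores the predictions against the gold labels.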

## Imports

In [None]:
!git clone https://github.com/melissatorgbi/Assessing-Language-Comprehension-in-Large-Language-Models-Using-Construction-Grammar.git
%cd Assessing-Language-Comprehension-in-Large-Language-Models-Using-Construction-Grammar

In [98]:
import pandas as pd
import csv
import os
from tqdm import tqdm

from experimentSetup import runTests_NLI
from prompts import CxNLI_prompts

# Set-up

In [58]:
OPENAI_API_KEY = ""  # enter your API key here

In [59]:
with open('openai_api_keys.env', 'w') as writefile:
    writefile.write(OPENAI_API_KEY+", ")

In [103]:
def get_all(tester, model, prompt_number, experiment, temperature):
    """Prompt the model on every let-alone test row and write the results to a CSV file."""

    # Schematicity level of each construction (some appear under two spellings).
    schematicity = {'let-alone'              : 'substantive',
                    'way-manner'             : 'partial',
                    'resultative'            : 'schematic',
                    'conative'               : 'partial',
                    'intransitive motion'    : 'schematic',
                    'intransitive-motion'    : 'schematic',
                    'caused-motion'          : 'schematic',
                    'causative - with'       : 'partial',
                    'causative-with-CxN'     : 'partial',
                    'ditransitive'           : 'schematic',
                    'ditransitive-CxN'       : 'schematic',
                    'comparative-correlative': 'comparative-correlative',
                    }

    results_by_row = [['CxG', 'schematicity', 'Premise', 'Hypothesis', 'Gold', 'Prediction']]
    for row in tqdm(tester.test_data):
        # Each row is [construction, premise, hypothesis, gold relation].
        # Only let-alone examples are sent to the model; all other rows are skipped.
        if row[0] == "let-alone":
            this_schematicity = schematicity[row[0]]

            # Fill the prompt template with the premise and the hypothesis.
            prompt = tester._generate_prompt(prompt_number, row[0])
            prompt = prompt.format(row[1], row[2])

            response = tester._gpt_get(model, prompt, temperature)

            results_by_row.append([row[0], this_schematicity, row[1], row[2], row[3], response])

    outfile = os.path.join(tester.output_location, "output_{}_{}_prompt{}.csv".format(model, experiment, prompt_number))
    # newline='' stops the csv module from writing blank lines between rows on Windows.
    with open(outfile, 'w', newline='') as csvoutfile:
        writer = csv.writer(csvoutfile)
        writer.writerows(results_by_row)

    print("Wrote " + outfile)

# Data

In [26]:
# The working directory is already the cloned repository (see the %cd above), so the path is relative to it.
cnli_df = pd.read_csv("data/constructional_NLI/CxNLI_3_examples_test.tsv", sep='\t')
cnli_df.head(10)

| | CxN Type | Number | P/H/R | Annotation Targets - Gold Standard Relation |
|---|---|---|---|---|
| 0 | let-alone | 4 | premise | It is difficult enough for an individual to be... |
| 1 | NaN | 4 | hypothesis | If an individual is consistent, a society migh... |
| 2 | NaN | 4 | relation | 1 (neutral) |
| 3 | let-alone | 5 | premise | It is difficult enough for an individual to be... |
| 4 | NaN | 5 | hypothesis | It is easier for a society to be consistent th... |
| 5 | NaN | 5 | relation | 2 (contradiction) |
| 6 | let-alone | 6 | premise | It is difficult enough for an individual to be... |
| 7 | NaN | 6 | hypothesis | It is easier for an individual to be consisten... |
| 8 | NaN | 6 | relation | 0 (entailment) |
| 9 | let-alone | 7 | premise | I would be distressed to hear of any ladies re... |


In [20]:
cnli_df.columns

Index(['CxN Type', 'Number', 'P/H/R',
       'Annotation Targets - Gold Standard Relation'],
      dtype='object')

In [17]:
cnli_df['CxN Type'].unique()

array(['let-alone', nan, 'way-manner', 'resultative', 'conative',
       'intransitive-motion', 'caused-motion', 'causative-with-CxN',
       'comparative-correlative'], dtype=object)

In [31]:
print("construction:", cnli_df['CxN Type'][0])
for i in range(3):
  print("\npremise:",cnli_df['Annotation Targets - Gold Standard Relation'][i*3])
  print("hypothesis:",cnli_df['Annotation Targets - Gold Standard Relation'][i*3+1])
  print("relation:",cnli_df['Annotation Targets - Gold Standard Relation'][i*3+2])

construction: let-alone

premise: It is difficult enough for an individual to be consistent let alone a society.
hypothesis: If an individual is consistent, a society might also be consistent.
relation: 1 (neutral)

premise: It is difficult enough for an individual to be consistent let alone a society.
hypothesis: It is easier for a society to be consistent than an individual.
relation: 2 (contradiction)

premise: It is difficult enough for an individual to be consistent let alone a society.
hypothesis: It is easier for an individual to be consistent than a society.
relation: 0 (entailment)
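The file stores each example as three consecutive rows (premise, hypothesis, relation), with the construction name given only on the premise row. The following is a minimal sketch of how that long layout could be pivoted into one row per example. The helper `to_triples` and the toy frame are illustrative assumptions, not part of the repository.

```python
import pandas as pd

# Toy frame mimicking the long layout shown above (hypothetical values).
long_df = pd.DataFrame({
    "CxN Type": ["let-alone", None, None, "let-alone", None, None],
    "Number": [4, 4, 4, 5, 5, 5],
    "P/H/R": ["premise", "hypothesis", "relation"] * 2,
    "text": ["P4", "H4", "1 (neutral)", "P5", "H5", "2 (contradiction)"],
})


def to_triples(df, text_col):
    """Pivot premise/hypothesis/relation rows into one row per example."""
    df = df.copy()
    # The construction name appears only on the premise row; fill it downwards.
    df["CxN Type"] = df["CxN Type"].ffill()
    wide = df.pivot(
        index=["CxN Type", "Number"], columns="P/H/R", values=text_col
    ).reset_index()
    wide.columns.name = None
    # Keep only the leading digit of strings such as "1 (neutral)".
    wide["relation"] = wide["relation"].str.extract(r"^\s*(\d)")[0].astype(int)
    return wide[["CxN Type", "Number", "premise", "hypothesis", "relation"]]


triples = to_triples(long_df, "text")
print(triples)
```

On the real data the call would be `to_triples(cnli_df, 'Annotation Targets - Gold Standard Relation')`. This assumes every (`CxN Type`, `Number`) pair identifies exactly one example; `pivot` raises an error if a pair is duplicated, and the result is sorted by that pair rather than kept in file order.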


# Base Prompts

In [47]:
print(CxNLI_prompts.prompts[0])

You are the world's best annotator. Your task is to read sentences from a dataset, presented as the Premise in a set of triples for the Natural Language Inference (NLI) task. Also known as Recognizing Textual Entailment (RTE), NLI involves determining the inference relation between two short, ordered texts: entailment, contradiction, or neutral. Next, you will identify the Relation between the Premise and the Hypothesis, which indicates the type of entailment between the two sentences. We use numerical coding, also listed in your annotation spreadsheet as a reminder:
0 – entailment – The hypothesis must be true given the premise
1 – neutral – The hypothesis may or may not be true given the premise
2 – contradiction – The hypothesis must not be true given the premise
Output a single numerical value between 0, 1, or 2, corresponding to the associated relation. Output a single number only and nothing else.
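The prompt asks for a bare digit, but a model may still wrap its answer in extra text. As a hedged sketch, a reply could be normalised to an integer label before scoring. The helper `parse_label` and the `LABELS` mapping are hypothetical names introduced here for illustration and are not taken from the repository.

```python
import re

# Numerical coding used in the prompt above.
LABELS = {0: "entailment", 1: "neutral", 2: "contradiction"}


def parse_label(response):
    """Return the first standalone 0, 1 or 2 in the text, or None if there is none."""
    match = re.search(r"(?<!\d)[012](?!\d)", str(response))
    return int(match.group()) if match else None


for text in ["1", " 2\n", "Relation: 0 (entailment)", "I am not sure"]:
    label = parse_label(text)
    print(repr(text), "->", label, LABELS.get(label))
```

Returning `None` for an unparseable reply keeps such cases visible, so they can be counted as errors instead of being silently mapped to a class.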



# Experiment Set-up

In [101]:
openai_completion_style = "ChatCompletion"
api_path = "openai_api_keys.env"
test_data = "data/constructional_NLI/CxNLI_3_examples_test.tsv" #NLI example
train_data = "data/constructional_NLI/CxNLI_3_examples_train.csv" #Not necessary for a reasoning experiment or zero-shot NLI; can be left as ""
train_data_version = "zero"
temperature = 0 #Change the temperature from 0 to 1 for the o1 model
model = "gpt-4o-mini" #Other options include "o1-preview-2024-09-12" and "gpt-3.5-turbo"
prompt_number = 2 #Prompts can be found in prompts/CxNLI_prompts.py or prompts/CxReasoning_prompts.py for NLI and reasoning tasks respectively
experiment_name = "CxNLI"

!mkdir -p output
output_directory = "output"

In [64]:
tester = runTests_NLI( test_data, train_data, output_directory, openai_completion_style, api_path, train_data_version)

In [75]:
print(tester.test_data)

[['let-alone', 'It is difficult enough for an individual to be consistent let alone a society.', 'If an individual is consistent, a society might also be consistent.', 1], ['let-alone', 'It is difficult enough for an individual to be consistent let alone a society.', 'It is easier for a society to be consistent than an individual.', 2], ['let-alone', 'It is difficult enough for an individual to be consistent let alone a society.', 'It is easier for an individual to be consistent than a society.', 0], ['let-alone', 'I would be distressed to hear of any ladies reading it, let alone a girl of your tender years and experience.', 'If I would be distressed of a girl of your tender years reading it, I would also be distressed of any ladies reading it.', 1], ['let-alone', 'I would be distressed to hear of any ladies reading it, let alone a girl of your tender years and experience.', 'I would be equally distressed to hear of any ladies reading it as I would be to hear of a girl of your tender y

In [79]:
test_df = pd.DataFrame(columns=["construction","premise","hypothesis","relation"], data=tester.test_data)

In [80]:
test_df.head()

| | construction | premise | hypothesis | relation |
|---|---|---|---|---|
| 0 | let-alone | It is difficult enough for an individual to be... | If an individual is consistent, a society migh... | 1 |
| 1 | let-alone | It is difficult enough for an individual to be... | It is easier for a society to be consistent th... | 2 |
| 2 | let-alone | It is difficult enough for an individual to be... | It is easier for an individual to be consisten... | 0 |
| 3 | let-alone | I would be distressed to hear of any ladies re... | If I would be distressed of a girl of your ten... | 1 |
| 4 | let-alone | I would be distressed to hear of any ladies re... | I would be equally distressed to hear of any l... | 2 |


## Example Prompt

In [82]:
prompt = tester._generate_prompt(prompt_number, test_df['construction'][0]).format(
    test_df['premise'][0],
    test_df['hypothesis'][0]
)
print(prompt)

You are the best at understanding language inference based on construction grammar. You are tasked with annotating a triple for Natural Language Inference. You must determine the inference relation between the premise and the hypothesis by selecting one of three numerical codes that reflect the relationship:
0 – entailment – The hypothesis must be true given the premise
1 – neutral – The hypothesis may or may not be true given the premise
2 – contradiction – The hypothesis must not be true given the premise
Output a single numerical value between 0, 1, or 2, corresponding to the associated relation. Output a single number only and nothing else.

Premise: It is difficult enough for an individual to be consistent let alone a society.
Hypothesis: If an individual is consistent, a society might also be consistent.
Relation: 
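Judging from the `.format(premise, hypothesis)` call, the template appears to contain two positional `{}` slots that are filled in order. The sketch below reproduces that pattern with a hypothetical `build_prompt` helper and a made-up instruction string. The real templates live in the repository's prompt files and may be structured differently.

```python
def build_prompt(instructions, premise, hypothesis):
    """Append the premise/hypothesis slots to the instructions and fill them in order."""
    # Assumption: the instruction text itself contains no curly braces,
    # otherwise str.format would treat them as placeholders.
    template = instructions + "\n\nPremise: {}\nHypothesis: {}\nRelation: "
    return template.format(premise, hypothesis)


demo_prompt = build_prompt(
    "Label the pair with 0, 1 or 2.",  # hypothetical instruction text
    "It is difficult enough for an individual to be consistent let alone a society.",
    "It is easier for a society to be consistent than an individual.",
)
print(demo_prompt)
```

Ending the prompt on `Relation: ` leaves the model to complete the text with the label alone.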


## Prompting the Model

In [71]:
response = tester._gpt_get( model, prompt , temperature)

In [84]:
print("model response:", response)
print("correct answer:", test_df['relation'][0])

model response: 1
correct answer: 1


# Prompting with Multiple Examples

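To run the prompt over many examples, the cell below calls the `get_all` function defined earlier in this notebook, which queries the model only on the `let-alone` rows of `tester.test_data`. The method-style form of the call is `tester.get_all(model, prompt_number, experiment_name, temperature)`.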

In [104]:
get_all( tester, model, prompt_number, experiment_name, temperature)

100%|██████████| 390/390 [00:15<00:00, 25.14it/s]

Wrote output/output_gpt-4o-mini_CxNLI_prompt2.csv





In [106]:
results = pd.read_csv("output/output_{}_{}_prompt{}.csv".format( model, experiment_name, prompt_number ))
results.head()

| | CxG | schematicity | Premise | Hypothesis | Gold | Prediction |
|---|---|---|---|---|---|---|
| 0 | let-alone | substantive | It is difficult enough for an individual to be... | If an individual is consistent, a society migh... | 1 | 1 |
| 1 | let-alone | substantive | It is difficult enough for an individual to be... | It is easier for a society to be consistent th... | 2 | 2 |
| 2 | let-alone | substantive | It is difficult enough for an individual to be... | It is easier for an individual to be consisten... | 0 | 2 |
| 3 | let-alone | substantive | I would be distressed to hear of any ladies re... | If I would be distressed of a girl of your ten... | 1 | 0 |
| 4 | let-alone | substantive | I would be distressed to hear of any ladies re... | I would be equally distressed to hear of any l... | 2 | 0 |


# Evaluation

In [110]:
total_num = len(results)
correct = sum(results.Gold==results.Prediction)

accuracy = correct/total_num * 100
print("The model got {} correct predictions out of {}".format(correct, total_num))
print("accuracy:",accuracy)

The model got 8 correct predictions out of 24
accuracy: 33.33333333333333
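A single accuracy figure hides which relations the model confuses. As a rough sketch, accuracy can be broken down per gold label and the errors shown as a confusion matrix. The frame `results_demo` below holds made-up values in the same column layout as the output CSV; it is not the notebook's actual result.

```python
import pandas as pd

# Hypothetical results in the same column layout as the output CSV.
results_demo = pd.DataFrame({
    "CxG": ["let-alone"] * 6,
    "Gold": [1, 2, 0, 1, 2, 0],
    "Prediction": [1, 2, 2, 0, 0, 0],
})

# Compare as stripped strings so that a Prediction column read as text
# (for example because one reply was not a bare digit) still compares sensibly.
gold = results_demo["Gold"].astype(str).str.strip()
pred = results_demo["Prediction"].astype(str).str.strip()
results_demo["correct"] = gold == pred

# Fraction of correct predictions for each gold label.
per_label = results_demo.groupby("Gold")["correct"].mean()

# Confusion matrix: rows are gold labels, columns are predictions.
confusion = pd.crosstab(gold, pred, rownames=["Gold"], colnames=["Prediction"])

print(per_label)
print(confusion)
```

The same steps should apply to the `results` frame loaded above. Grouping by `CxG` or `schematicity` instead of `Gold` would give a per-construction breakdown once more than one construction is evaluated.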


# Your own task