# Prompting the models to classify the statements

In [1]:
import pandas as pd

grobid_model = "full_model_texts"

# read the dataframe from a pickle file
df = pd.read_pickle(f'../data/dfs/{grobid_model}/ReferenceErrorDetection_data_with_chunk_info.pkl')

In [37]:
df.head()

Unnamed: 0,Source,Citing Article ID,Citing Article DOI,Citing Article Title,Citing Article Retracted,Citing Article Downloaded,Domain,Statement with Citation,Reference Article ID,Reference Article DOI,Reference Article Title,Reference Article Abstract,Reference Article PDF Available,Reference Article Retracted,Reference Article Downloaded,Label,Explanation,Top_3_Chunk_IDs,Top_3_Chunk_Texts
0,PubPeer,c001,10.1016/j.est.2021.103553,Heating a residential building using the heat ...,Yes,Yes,Engineering,Others have aimed to reduce irreversibility or...,r001,10.1155/2021/2087027,A Fault Analysis Method for Three-Phase Induct...,The fault prediction and abductive fault diagn...,Yes,No,Yes,Unsubstantiate,Irrelevant,"[463a9b39-eb4f-4a3b-8fba-bef29dc86e7f, 0e878dc...","[overlapped signals), it was difficult to extr..."
1,PubPeer,c001,10.1016/j.est.2021.103553,Heating a residential building using the heat ...,Yes,Yes,Engineering,Some researchers have also studied various hea...,r002,10.1016/j.physa.2018.12.031,Develop 24 dissimilar ANNs by suitable archite...,The artificial neural network optimization met...,Yes,No,Yes,Unsubstantiate,Irrelevant,"[0f39ec1e-4d05-40e5-8895-9a3d0dcd459f, 7ce7e45...",[Introduction\nThe mixture composed of nanopar...
2,PubPeer,c002,10.1155/2022/4601350,Oxidative Potential and Nanoantioxidant Activi...,Yes,Yes,Chemistry,The relative content of total flavonoids in th...,r003,10.1088/1742-6596/1937/1/012038,Lipid Data Acquisition for devices Treatment o...,"Recently, the widespread deployment of smart p...",Yes,No,Yes,Unsubstantiate,Irrelevant,"[c8ef4a6d-a809-4e1f-9fed-e4ef1c83c757, 7a14f9c...",[TG calculation concentration. The coefficient...
3,PubPeer,c003,10.1155/2022/2408685,The Choice of Anesthetic Drugs in Outpatient H...,Yes,Yes,Medicine,Research has shown that remimazolam tosylate e...,r004,10.1186/s12871-018-0543-3,"Effect of propofol on breast cancer cell, the ...",Breast cancer is the second leading cause of c...,Yes,No,Yes,Unsubstantiate,Irrelevant,"[7de1c039-4410-43b2-b5e9-c4687f9a92f7, 4f32ea5...",[propofol/paravertebral patients increased the...
4,PubPeer,c004,10.1155/2022/4783847,A Fault-Tolerant Structure for Nano-Power Comm...,Yes,Yes,Engineering,if the efficiency of the routing algorithm is ...,r005,10.36410/jcpr.2022.23.3.312,Analysis and research hotspots of ceramic mate...,"From the perspective of scientometrics, comb t...",Yes,No,Yes,Unsubstantiate,Irrelevant,"[80fa3ce9-66a0-450b-ac8f-5e7d4554c07e, 3a7a50d...","[field, it is also convenient for researchers ..."


## Create the prompts

In [50]:
def format_excerpts(excerpt_list):
    excerpts_text = ""
    for id, excerpt in enumerate(excerpt_list):
        excerpts_text += f"Excerpt {id+1}: \n{excerpt}\n\n"
    return excerpts_text

In [51]:
print(format_excerpts(df.iloc[0]['Top_3_Chunk_Texts']))

Excerpt 1: 
overlapped signals), it was difficult to extract the features of bearing fault information. In  [12] , a new current signature analysis-based fault detector for motors based on a matched subspace technique was proposed. However, it was only effective for detecting eccentricity faults, bearing faults, and broken rotor bars. Reference  [13]  proposed a technique based on vibration information to identify and classify different bearing failure conditions. -e setting and testing of parameters was strict and difficult; for example, the accelerometer needs to be very close to the motor, and the setting of accelerometer and data logger should be the same. However, this method needed much historical data with a complex computing process. In  [14] , an intelligent fault diagnosis of three-phase induction motors using a signal-based method was proposed and tested in different situations, in order to verify its availability in diagnosing failures, even when the operating mode data wer

In [52]:
def create_prompt(df_row):
    title = df_row['Citing Article Title']
    statement = df_row['Statement with Citation']
    reference_title = df_row['Reference Article Title']
    reference_abstract = df_row['Reference Article Abstract']
    reference_excerpts = format_excerpts(df_row['Top_3_Chunk_Texts'])

    prompt = f"""   
You are an experienced scientific writer and editor. 
You will be given a statement from an article that cites a reference article and information from the reference article. 
You will determine and explain if the reference article supports the statement.  
    
Specifically, choose a label from "Fully substantiate", "Partially substantiate", and "Unsubstantiate". 
Further explanations of the labels are as follows: 
"Fully substantiated": The reference article fully substantiates the relevant part of the statement from the present article. 
"Partially substantiated": According to the reference article, there is a minor error in the statement but the error does not invalidate the purpose of the statement. 
"Unsubstantiate": The reference part does not substantiate any part of the statement. This could be because the statement is contradictory to, unrelated to, or simply missing from the reference article.  
    
Format your answer in JSON with two elements: "label" and "explanation". 
Your explanation should be short and concise. 
    
# The citing article
Title: {title} 
Statement: {statement}
    
# The reference article 
Title: {reference_title} 
Abstract: {reference_abstract} 
Excerpts: \n\n{reference_excerpts}
"""

    return prompt

In [53]:
example_prompt = create_prompt(df.iloc[0])
print(example_prompt)

   
You are an experienced scientific writer and editor. 
You will be given a statement from an article that cites a reference article and information from the reference article. 
You will determine and explain if the reference article supports the statement.  
    
Specifically, choose a label from "Fully substantiate", "Partially substantiate", and "Unsubstantiate". 
Further explanations of the labels are as follows: 
"Fully substantiated": The reference article fully substantiates the relevant part of the statement from the present article. 
"Partially substantiated": According to the reference article, there is a minor error in the statement but the error does not invalidate the purpose of the statement. 
"Unsubstantiate": The reference part does not substantiate any part of the statement. This could be because the statement is contradictory to, unrelated to, or simply missing from the reference article.  
    
Format your answer in JSON with two elements: "label" and "explanation"

## Prompting the models

In [54]:
# Read the content of open_ai_key.txt into a variable
with open('../open_ai_key.txt', 'r') as file:
    open_ai_key = file.read().strip()

In [55]:
from openai import OpenAI
client = OpenAI(api_key=open_ai_key)

def send_prompt(prompt, model="gpt-3.5-turbo-0125"):
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": prompt
            }
        ],
        temperature=0,
        timeout=30,
    )
    return completion.choices[0].message.content

In [56]:
send_prompt(example_prompt)

'{\n    "label": "Partially substantiate",\n    "explanation": "The reference article discusses fault prediction and diagnosis methods for three-phase induction motors, which is related to optimizing energy-consumed devices. However, the specific focus on reducing irreversibility is not directly addressed in the reference article."\n}'
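
The reply above is a JSON string, but the cleanup cells further down have to strip Markdown code fences from some responses. One possible way to reduce that cleanup, sketched here as an assumption rather than something this notebook ran, is to request JSON mode through the `response_format` parameter of the Chat Completions API. The helper name `build_request` is hypothetical; it only assembles keyword arguments, so it can be inspected without calling the API.

```python
def build_request(prompt, model="gpt-3.5-turbo-0125"):
    """Assemble keyword arguments for client.chat.completions.create.

    Hypothetical helper: mirrors the arguments used by send_prompt and
    adds JSON mode. JSON mode expects the word "JSON" to appear in the
    messages, which the prompt built by create_prompt already contains.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
        "timeout": 30,
        # Assumption: the chosen model supports JSON mode.
        "response_format": {"type": "json_object"},
    }


request = build_request('Format your answer in JSON with "label" and "explanation".')
# With a configured client, the call would be:
# completion = client.chat.completions.create(**request)
```

JSON mode constrains the reply to be parseable JSON; it does not guarantee that the `label` value is one of the three allowed labels, so the later checks still apply.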

In [57]:
# Create a column for the responses, keeping any that already exist so the loop can resume
if 'Model Classification' not in df.columns:
    df['Model Classification'] = None

# Iterate through the dataframe
for index, row in df.iterrows():
    if row['Reference Article Downloaded'] == 'Yes' and not row['Model Classification']:
        print(f"Processing: {row['Reference Article ID']}")

        # Create the prompt
        prompt = create_prompt(row)
        
        # Send the prompt and get the response
        response = send_prompt(prompt)
        
        # Save the response to the new column
        df.at[index, 'Model Classification'] = response

Processing: r001
Processing: r002
Processing: r003
Processing: r004
Processing: r005
Processing: r006
Processing: r007
Processing: r008
Processing: r009
Processing: r010
Processing: r011
Processing: r012
Processing: r013
Processing: r013
Processing: r014
Processing: r015
Processing: r005
Processing: r017
Processing: r018
Processing: r019
Processing: r020
Processing: r021
Processing: r022
Processing: r023
Processing: r024
Processing: r013
Processing: r025
Processing: r026
Processing: r027
Processing: r028
Processing: r029
Processing: r030
Processing: r031
Processing: r032
Processing: r033
Processing: r034
Processing: r035
Processing: r036
Processing: r037
Processing: r038
Processing: r039
Processing: r040
Processing: r041
Processing: r042
Processing: r043
Processing: r044
Processing: r045
Processing: r046
Processing: r047
Processing: r048
Processing: r049
Processing: r050
Processing: r051
Processing: r052
Processing: r053
Processing: r051
Processing: r055
Processing: r056
Processing: r0
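
Each request in the loop above uses a 30-second timeout, so a single slow or failed call raises an exception and stops the run. A minimal sketch of a retry wrapper with exponential backoff follows; `call_with_retries` and the stand-in `flaky_send` are hypothetical names, and the stand-in replaces the real API call so the example runs offline.

```python
import time


def call_with_retries(func, *args, max_attempts=3, base_delay=1.0, **kwargs):
    """Call func, retrying with exponential backoff if it raises.

    Hypothetical helper: catches every Exception for brevity; a real run
    would likely narrow this to timeout and rate-limit errors.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2 * base_delay, then 4 * base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))


# Demonstration with a stand-in that fails twice before succeeding.
attempts = []


def flaky_send(prompt):
    attempts.append(prompt)
    if len(attempts) < 3:
        raise TimeoutError("simulated timeout")
    return '{"label": "Unsubstantiate", "explanation": "stub"}'


result = call_with_retries(flaky_send, "example prompt", base_delay=0.01)
```

In the loop, the direct call could then become `response = call_with_retries(send_prompt, prompt)`.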

In [58]:
df.head()

Unnamed: 0,Source,Citing Article ID,Citing Article DOI,Citing Article Title,Citing Article Retracted,Citing Article Downloaded,Domain,Statement with Citation,Reference Article ID,Reference Article DOI,Reference Article Title,Reference Article Abstract,Reference Article PDF Available,Reference Article Retracted,Reference Article Downloaded,Label,Explanation,Top_3_Chunk_IDs,Top_3_Chunk_Texts,Model Classification
0,PubPeer,c001,10.1016/j.est.2021.103553,Heating a residential building using the heat ...,Yes,Yes,Engineering,Others have aimed to reduce irreversibility or...,r001,10.1155/2021/2087027,A Fault Analysis Method for Three-Phase Induct...,The fault prediction and abductive fault diagn...,Yes,No,Yes,Unsubstantiate,Irrelevant,"[463a9b39-eb4f-4a3b-8fba-bef29dc86e7f, 0e878dc...","[overlapped signals), it was difficult to extr...","{\n ""label"": ""Partially substantiate"",\n ..."
1,PubPeer,c001,10.1016/j.est.2021.103553,Heating a residential building using the heat ...,Yes,Yes,Engineering,Some researchers have also studied various hea...,r002,10.1016/j.physa.2018.12.031,Develop 24 dissimilar ANNs by suitable archite...,The artificial neural network optimization met...,Yes,No,Yes,Unsubstantiate,Irrelevant,"[0f39ec1e-4d05-40e5-8895-9a3d0dcd459f, 7ce7e45...",[Introduction\nThe mixture composed of nanopar...,"{\n ""label"": ""Unsubstantiate"",\n ""explan..."
2,PubPeer,c002,10.1155/2022/4601350,Oxidative Potential and Nanoantioxidant Activi...,Yes,Yes,Chemistry,The relative content of total flavonoids in th...,r003,10.1088/1742-6596/1937/1/012038,Lipid Data Acquisition for devices Treatment o...,"Recently, the widespread deployment of smart p...",Yes,No,Yes,Unsubstantiate,Irrelevant,"[c8ef4a6d-a809-4e1f-9fed-e4ef1c83c757, 7a14f9c...",[TG calculation concentration. The coefficient...,"{\n ""label"": ""Unsubstantiate"",\n ""explan..."
3,PubPeer,c003,10.1155/2022/2408685,The Choice of Anesthetic Drugs in Outpatient H...,Yes,Yes,Medicine,Research has shown that remimazolam tosylate e...,r004,10.1186/s12871-018-0543-3,"Effect of propofol on breast cancer cell, the ...",Breast cancer is the second leading cause of c...,Yes,No,Yes,Unsubstantiate,Irrelevant,"[7de1c039-4410-43b2-b5e9-c4687f9a92f7, 4f32ea5...",[propofol/paravertebral patients increased the...,"{\n ""label"": ""Partially substantiate"",\n ..."
4,PubPeer,c004,10.1155/2022/4783847,A Fault-Tolerant Structure for Nano-Power Comm...,Yes,Yes,Engineering,if the efficiency of the routing algorithm is ...,r005,10.36410/jcpr.2022.23.3.312,Analysis and research hotspots of ceramic mate...,"From the perspective of scientometrics, comb t...",Yes,No,Yes,Unsubstantiate,Irrelevant,"[80fa3ce9-66a0-450b-ac8f-5e7d4554c07e, 3a7a50d...","[field, it is also convenient for researchers ...","{\n ""label"": ""Unsubstantiate"",\n ""explan..."


In [59]:
df.to_pickle(f'../data/dfs/{grobid_model}/ReferenceErrorDetection_data_with_prompt_results.pkl')

## Compare model classification with labels

In [12]:
# Fix the entries where additional json prefixes and suffixes were added
for index, row in df.iterrows():
    response = row['Model Classification']
    if response and response.startswith("```json") and response.rstrip().endswith("```"):
        # Drop the leading fence marker (7 characters) and the trailing fence
        df.at[index, 'Model Classification'] = response[7:].rstrip()[:-3].strip()

In [13]:
for index, row in df.iterrows():
    if row['Model Classification'] and not row['Model Classification'].strip().endswith('"\n}'):
        # Assume the response was truncated inside the explanation string:
        # close the string and the JSON object
        df.at[index, 'Model Classification'] = row['Model Classification'].strip() + '"\n}'
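
The two cleanup cells above can also be expressed as one function that is easier to test in isolation. The sketch below assumes the same two failure modes (a Markdown `json` fence around the reply, and a reply cut off inside the explanation string); `parse_model_response` is a hypothetical name, and unlike the cells above it only appends the closing quote and brace when the text fails to parse as-is.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker, built here to keep the example readable


def parse_model_response(text):
    """Return the parsed response, or None if it cannot be decoded.

    Hypothetical helper: strips an optional Markdown json fence, then
    tries to parse; on failure it assumes the reply was cut off inside
    the explanation string and retries once with a closing quote and
    brace appended.
    """
    if not text:
        return None
    cleaned = text.strip()
    if cleaned.startswith(FENCE + "json"):
        cleaned = cleaned[len(FENCE + "json"):]
    if cleaned.endswith(FENCE):
        cleaned = cleaned[:-len(FENCE)]
    cleaned = cleaned.strip()
    for candidate in (cleaned, cleaned + '"\n}'):
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return None


fenced = FENCE + 'json\n{"label": "Unsubstantiate", "explanation": "Unrelated."}\n' + FENCE
truncated = '{\n    "label": "Unsubstantiate",\n    "explanation": "The reference is unrel'

parsed_fenced = parse_model_response(fenced)
parsed_truncated = parse_model_response(truncated)
```

Both example strings are invented for illustration; they are not responses from the dataset.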

### All three labels

In [14]:
import json

correct_classification = 0
incorrect_classification = 0
json_error = 0

for index, row in df.iterrows():
    if row['Reference Article Downloaded'] == 'Yes':
        try:
            model_classification = json.loads(row['Model Classification'])
            # print(f"{model_classification['label']}")
            if model_classification['label'].lower() == row['Label'].lower():
                correct_classification += 1
            else:
                # print(f"Row {index} Model Classification: {model_classification['label']} Actual Label: {row['Label']}")
                incorrect_classification += 1
        except json.JSONDecodeError as e:
            # print(f"Row {index} Model Classification could not be decoded: {e}")
            print(row['Model Classification'])
            json_error += 1

In [15]:
print(f"Correct classifications: {correct_classification}")
print(f"Incorrect classifications: {incorrect_classification}")
print(f"JSON errors: {json_error}")

print(f"Accuracy: {correct_classification / (correct_classification + incorrect_classification + json_error)}")
print(f"Accuracy without JSON errors: {correct_classification / (correct_classification + incorrect_classification)}")

Correct classifications: 136
Incorrect classifications: 111
JSON errors: 0
Accuracy: 0.5506072874493927
Accuracy without JSON errors: 0.5506072874493927
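
The prompt lists the labels both as "Fully substantiate" and as "Fully substantiated", so a reply may use either spelling, while the three-label comparison above matches the strings exactly after lowercasing. A minimal sketch of a normalization step follows, assuming that only a trailing "d" and surrounding whitespace need handling; `normalize_label` is a hypothetical name.

```python
def normalize_label(label):
    """Lowercase, trim, and map '...substantiated' to '...substantiate'."""
    normalized = label.strip().lower()
    if normalized.endswith("substantiated"):
        normalized = normalized[:-1]
    return normalized


examples = [
    normalize_label("Fully substantiated"),
    normalize_label("Partially Substantiate"),
    normalize_label(" Unsubstantiate "),
]
```

Comparing `normalize_label(model_classification['label'])` with `normalize_label(row['Label'])` would apply the same rule to both sides. Whether this changes the accuracy reported above depends on the actual responses and was not rerun here.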


### Only two labels

In [67]:
import json

correct_classification = 0
incorrect_classification = 0
json_error = 0

for index, row in df.iterrows():
    if row['Reference Article Downloaded'] == 'Yes':
        try:
            model_classification = json.loads(row['Model Classification'])
            # print(f"{model_classification['label']}")

            # remove the 'd' from the end of the label if it exists
            if model_classification['label'].lower().endswith('d'):
                model_classification['label'] = model_classification['label'][:-1]

            if model_classification['label'].lower() in ['fully substantiate', 'partially substantiate'] and row['Label'].lower() in ['fully substantiate', 'partially substantiate']:
                correct_classification += 1
            elif model_classification['label'].lower() == 'unsubstantiate' and row['Label'].lower() == 'unsubstantiate':
                correct_classification += 1
            elif model_classification['label'].lower() not in ['fully substantiate', 'partially substantiate', 'unsubstantiate']:
                print(f"Row {index} Model Classification: {model_classification['label']} Actual Label: {row['Label']}")
                incorrect_classification += 1
            else:
                # print(f"Row {index} Model Classification: {model_classification['label']} Actual Label: {row['Label']}")
                incorrect_classification += 1
        except json.JSONDecodeError as e:
            # print(f"Row {index} Model Classification could not be decoded: {e}")
            json_error += 1

In [68]:
print(f"Correct classifications: {correct_classification}")
print(f"Incorrect classifications: {incorrect_classification}")
print(f"JSON errors: {json_error}")

print(f"Accuracy: {correct_classification / (correct_classification + incorrect_classification + json_error)}")
print(f"Accuracy without JSON errors: {correct_classification / (correct_classification + incorrect_classification)}")

Correct classifications: 192
Incorrect classifications: 55
JSON errors: 0
Accuracy: 0.7773279352226721
Accuracy without JSON errors: 0.7773279352226721
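
A single accuracy figure does not show which direction the errors go. The sketch below tallies (actual, predicted) pairs after collapsing the labels to two classes in the same way as the cell above; `to_binary` and `confusion_counts` are hypothetical names, and the example pairs are made up for illustration rather than taken from the dataset.

```python
from collections import Counter


def to_binary(label):
    """Map the three labels onto 'substantiate' vs 'unsubstantiate'."""
    normalized = label.strip().lower()
    if normalized.endswith("d"):
        normalized = normalized[:-1]
    if normalized in ("fully substantiate", "partially substantiate"):
        return "substantiate"
    if normalized == "unsubstantiate":
        return "unsubstantiate"
    return "other"


def confusion_counts(pairs):
    """Count (actual, predicted) pairs after collapsing to two classes."""
    return Counter(
        (to_binary(actual), to_binary(predicted)) for actual, predicted in pairs
    )


# Made-up pairs for illustration only.
example_pairs = [
    ("Unsubstantiate", "Partially substantiate"),
    ("Unsubstantiate", "Unsubstantiate"),
    ("Fully substantiate", "Partially substantiated"),
]
counts = confusion_counts(example_pairs)
```

With the real data, the pairs could be built from `row['Label']` and the parsed `label` of each response; the entry for `('unsubstantiate', 'substantiate')` would then count unsupported citations that the model accepted.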
