### Specify the dataset: 

In [1]:
# specify the name of dataset: "hotpotqa_valid_original_split.csv", "gooaq_valid_original_split.csv", or "msmarco_valid_original_split.csv"
dataset_name= "gooaq_valid_original_split.csv"

path= "../../dataset/test/"+dataset_name

out_dir='../../output/UnifiedQA-results/COTs-fewshot/'

# directory name of the fine-tuned model: "msmarco-original-split/", "hotpotqa-original-split/", or "gooaq-original-split/"
model_name= "gooaq-original-split/"
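
The dataset file and the fine-tuned checkpoint should refer to the same corpus, so both names can be derived from a single key. The following is a minimal sketch, assuming the naming pattern listed in the comments above holds for all three datasets. `names_for` and `VALID_DATASETS` are hypothetical names and not part of the original notebook.

```python
# Hypothetical helper: derive both names from one dataset key so they cannot drift apart.
VALID_DATASETS = ("hotpotqa", "gooaq", "msmarco")

def names_for(dataset):
    """Return (CSV file name, model directory name) for one dataset key."""
    if dataset not in VALID_DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {VALID_DATASETS}")
    # Assumed pattern, taken from the names listed in the comments of this cell.
    return f"{dataset}_valid_original_split.csv", f"{dataset}-original-split/"

dataset_name, model_name = names_for("gooaq")
```

Using one key avoids, for example, evaluating the GooAQ split with the MS MARCO checkpoint by accident.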


### Few-shot examples: 

In [2]:
few_shot= """
[question_1]: How does smoking cause cancer?\n
[answer_1]: 
Smoking damages the DNA in our cells, which controls how they grow and behave. Smoking has many harmful chemicals, such as BP, that can attach to DNA and bend it out of shape. 
This can stop DNA from working properly and cause mutations that lead to cancer. Smoking can cause many types of cancer, such as lung, mouth, and bladder cancer. 
The more and longer a person smokes, the higher their cancer risk. Quitting smoking can lower the risk, but it may take years to undo the damage. The best way to avoid smoking-related cancer is to never smoke or quit as soon as possible.

\n\n
[question_2]: How does an earthquake cause a tsunami?\n
[answer_2]: 
An earthquake causes a tsunami by displacing a large amount of water in the ocean. When two tectonic plates collide or slide past each other, they can create a sudden movement of the sea floor, which lifts or drops the water above it. 
This creates a series of waves that travel across the ocean at high speeds. The waves can grow larger and more destructive as they approach shallow coastal areas, where the water depth decreases and the wave height increases. 
Earthquakes are the most common cause of tsunamis, but they can also be triggered by volcanic eruptions or landslides.

"""

COTs_prompt= "Example questions:\n"+few_shot
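
To make the final model input easier to inspect, the concatenation used later in the answering loop can be wrapped in a small function. This is only a sketch: `build_prompt` and `demo_examples` are hypothetical names, and the short stand-in example replaces the full `few_shot` block for readability.

```python
def build_prompt(question, examples_block):
    """Mirror the concatenation order used in the answering loop:
    instruction + question, a blank line, then the few-shot examples."""
    instruction = "Let's think step by step to answer the question: "
    return instruction + question + "\n\n" + "Example questions:\n" + examples_block

# Short stand-in for the few-shot block, for illustration only.
demo_examples = (
    "[question_1]: Why is the sky blue?\n"
    "[answer_1]: Air scatters blue light more than red light.\n"
)

prompt = build_prompt("Why do leaves change color?", demo_examples)
print(prompt)
```

Note that in this layout the question comes before the examples, so any truncation of an overlong input would cut the examples rather than the question.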

### Configuration of UnifiedQA model: 

In [3]:
import pandas as pd
import json

from transformers import T5Tokenizer, T5ForConditionalGeneration


In [4]:
tokenizer = T5Tokenizer.from_pretrained("allenai/unifiedqa-v2-t5-base-1363200", max_length=512, truncation=True)

model_path = "../../Webis-CausalQA-22-v-2.0/models/original_splits/"+ model_name
model = T5ForConditionalGeneration.from_pretrained(model_path)


In [5]:
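def run_model(input_string, **generator_args):
    # Apply the 512-token limit at encode time; the tokenizer truncates only when asked to in the encode call.
    input_ids = tokenizer.encode(input_string, return_tensors="pt", truncation=True, max_length=512)
    # Generate at most 200 output tokens; extra generation options are passed through unchanged.
    res = model.generate(input_ids, max_length=200, **generator_args)

    return tokenizer.batch_decode(res, skip_special_tokens=True)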

### Load the dataset and answer the questions: 

In [6]:
dataset_df= pd.read_csv(path)

dataset_df['generated_answer']=""


#iterate over the questions and answers: 
for row in dataset_df.itertuples():
        
    # answer the question using the model
    generated_answer=run_model("Let's think step by step to answer the question: "+row.question_processed+'\n\n'+COTs_prompt)
    
    dataset_df.loc[row.Index,'generated_answer']= generated_answer[0]
    
# out_dir already ends with '/', so no extra separator is needed
dataset_df.to_csv(out_dir+dataset_name)

### Evaluation using EM and F1 metrics:

In [7]:
import os
import sys
sys.path.append('../../src')

import measures

In [8]:
os.chdir(out_dir)

def evaluate_unifiedqa(predictions, answers):
    
    result = {}
    result['checkpoint'] = model_name
    result['metrics'] = measures.all_metrics(predictions, answers)
    result['predictions'] = predictions

   
    filename=dataset_name.split('.')[0]
    filename=filename+".json"
    
    with open(filename, 'w+') as file:
        json.dump(result, file, indent=4)
        
    print ('results saved at ', out_dir+filename)

In [9]:
predictions= dataset_df['generated_answer'].tolist()
answers= dataset_df['answer_processed'].tolist()

answers= [[answer] for answer in answers]

evaluate_unifiedqa(predictions, answers)

results saved at  ../../output/UnifiedQA-results/COTs-fewshot/gooaq_valid_original_split.json
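
The `measures` module is not shown in this notebook, so the exact definitions behind `measures.all_metrics` are unknown here. As a rough illustration of what EM and F1 usually mean in question answering, the sketch below uses the common SQuAD-style normalization (lowercasing, removing punctuation and English articles). All function names are hypothetical, and the real module may differ in its normalization and aggregation.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))

def token_f1(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

def score(predictions, answers):
    """`answers` is a list of reference lists, as built in the cell above.
    Each question is scored against its best-matching reference."""
    em = [max(exact_match(p, a) for a in refs) for p, refs in zip(predictions, answers)]
    f1 = [max(token_f1(p, a) for a in refs) for p, refs in zip(predictions, answers)]
    n = len(predictions)
    return {"em": sum(em) / n, "f1": sum(f1) / n}
```

With long free-text answers such as the ones generated here, EM is expected to be close to zero, which is why token-level F1 is typically reported alongside it.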
