# Manual Annotation
This notebook runs the manual annotation workflow. It supports three activities:
- Identifying the best response in each batch of generated text.
- Annotating the best response, as identified by the model, from each batch.
- Annotating the best response, as identified manually in the first activity, from each batch.

The first activity, manually identifying the best response, should be done only once, by one person. The annotation activities should be done individually by each annotator. Before getting started, ensure that the annotator name (`ANNOTATOR`) is set correctly.

Pre-requisites:
- The annotation input file, generated by the inferencing.ipynb notebook, must be available in the GDRIVE_BASE location.
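
Before starting a session, it can help to confirm that the input file has the shape this notebook expects. The following is a minimal sketch, not part of the `common` module: `validate_annotation_input` is a hypothetical helper, and it assumes the input has `batch_id`, `prompt`, and `generated` columns, with batch ids running from 1 upward and every batch the same size. The toy DataFrame stands in for the real file.

```python
import pandas as pd

# Columns this sketch assumes the annotation input provides.
REQUIRED_COLUMNS = ('batch_id', 'prompt', 'generated')


def validate_annotation_input(df):
  """Hypothetical helper: return the common batch size, or raise ValueError."""
  missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
  if missing:
    raise ValueError(f'Missing columns: {missing}')
  sizes = df.groupby('batch_id').size()
  if sizes.nunique() != 1:
    raise ValueError(f'Batches differ in size: {sizes.to_dict()}')
  expected_ids = list(range(1, len(sizes) + 1))
  if sorted(sizes.index) != expected_ids:
    raise ValueError('Batch ids are expected to run from 1 to the number of batches.')
  return int(sizes.iloc[0])


# Toy data standing in for the real annotation input file.
toy = pd.DataFrame({
    'batch_id': [1, 1, 2, 2],
    'prompt': ['p1', 'p1', 'p2', 'p2'],
    'generated': ['a', 'b', 'c', 'd'],
})
batch_size = validate_annotation_input(toy)
print(f'All batches have {batch_size} items.')
```

Running a check like this up front surfaces a malformed input file before any manual work has been done, rather than partway through an interactive session.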


## Initial Setup

In [64]:
%load_ext autoreload
%autoreload 2

import sys
from google.colab import drive
import pandas as pd
import numpy as np
import torch
from tqdm.notebook import tqdm


GDRIVE_BASE = 'drive/MyDrive/MIDS/w266/project/'
ANNOTATOR='ram'

drive.mount('/content/drive')
sys.path.insert(0, GDRIVE_BASE)

import common

print(f'common.__version__: {common.__version__}')
# tuning_configs = common.create_configs(GDRIVE_BASE)


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
common.__version__: 1.4


## Best in Batch

In [71]:
def is_completed(batch):
  """Return True if a best response has already been marked in this batch."""
  return bool(batch.human_top_score.any())

def mark_best(ignore_completed=False, save_each_step=True):
  # Load the annotation input
  if ignore_completed:
    # Load from previously saved work.
    df_results = pd.read_csv(common.annotation_with_best_loc(GDRIVE_BASE))
  else:
    # Load from the original annotation input.
    df_results = pd.read_csv(common.annotation_input_loc(GDRIVE_BASE))
    # Add a new column to hold the best in batch flag.
    df_results['human_top_score'] = False
  # Batch ids are assumed to run from 1 to num_batches, with equal-sized batches.
  num_batches = df_results.batch_id.max()
  batch_size = len(df_results) // num_batches

  # Random order of batches.
  batch_ids = np.arange(num_batches)
  np.random.shuffle(batch_ids)
  with tqdm(total=num_batches, unit='item', unit_scale=True) as pbar:
    for cur_batch_id in batch_ids:
      # Adjust for batch ids starting from 1.
      cur_batch_id = cur_batch_id + 1
      batch = df_results[df_results.batch_id == cur_batch_id]
      if len(batch) != batch_size:
        # We should never get here; if we do, the input file is malformed.
        raise ValueError(f'Batch {cur_batch_id} has {len(batch)} items but expected {batch_size} items.')
      if is_completed(batch):
        print(f'Skipping batch {cur_batch_id} as it is already completed.')
        # Count skipped batches so the progress bar reflects overall progress.
        pbar.update(1)
      else:
        pbar.set_postfix(batch_id=cur_batch_id, prompt=batch.prompt.iloc[0][0:10] + '...', refresh=True)
        # Display the batch and ask the user to pick the best response.
        indices = {}
        print(f'PROMPT: {batch.prompt.iloc[0]}')
        # Number the responses from 1 and remember which dataframe row each number maps to.
        for i, (index, row) in enumerate(batch.iterrows(), start=1):
          print(f'{i}. {row["generated"]}')
          indices[i] = index
        
        user_opt = -1
        msg = f'Enter 1 to {batch_size} or "quit"'
        # Named quit_requested to avoid shadowing the built-in quit().
        quit_requested = False
        while user_opt == -1:
          try:
            user_input = input(f'{msg}: ').strip()
            if user_input.lower() == 'quit':
              quit_requested = True
              break
            user_opt = int(user_input)
            if user_opt in indices:
              df_results.loc[indices[user_opt], 'human_top_score'] = True
              if save_each_step:
                df_results.to_csv(common.annotation_with_best_loc(GDRIVE_BASE), index=False)
            else:
              print(msg)
              user_opt = -1
          except ValueError:
            print(msg)
            user_opt = -1
        if quit_requested:
          break
        # Only count the batch once a valid choice has been recorded.
        pbar.update(1)
    # Always save on exit, including when the user quits early.
    df_results.to_csv(common.annotation_with_best_loc(GDRIVE_BASE), index=False)
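
The input handling in `mark_best` accepts a number from 1 to the batch size or the word "quit", and asks again for anything else. As a rough, non-interactive sketch of that rule, the hypothetical helper `parse_choice` below (it does not exist in the notebook or in `common`) maps one line of input to a result. Tolerating surrounding whitespace and capitalisation is a choice made for this sketch.

```python
def parse_choice(user_input, batch_size):
  """Hypothetical helper: interpret one line of annotator input.

  Returns 'quit', an int between 1 and batch_size, or None if the input is invalid.
  """
  text = user_input.strip()
  if text.lower() == 'quit':
    return 'quit'
  try:
    choice = int(text)
  except ValueError:
    # Not a number and not "quit": the caller should prompt again.
    return None
  return choice if 1 <= choice <= batch_size else None


# Try a few inputs against a batch of five responses.
for raw in ['3', ' quit ', '0', 'abc']:
  print(repr(raw), '->', parse_choice(raw, batch_size=5))
```

Separating the parsing from the `input()` call makes the rule easy to check without running an interactive session.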

In [73]:
# Run this cell to start an interactive session for identifying the best generated text in each batch.
# The session walks through the batches in random order and presents one batch at a time.
# A batch is the N texts generated from one model configuration for one prompt.
# For each batch shown, select the best text. The function writes the final result to a new file.
# Set ignore_completed=True to continue from a previous session; with False, the session starts
# over from the original annotation input and overwrites previously saved marks.
# save_each_step=True saves the dataframe after each batch is marked.
mark_best(ignore_completed=False, save_each_step=True)

  0%|          | 0.00/18.0 [00:00<?, ?item/s]

Skipping batch 12 as it is already completed.
Skipping batch 8 as it is already completed.
Skipping batch 5 as it is already completed.
Skipping batch 4 as it is already completed.
Skipping batch 9 as it is already completed.
Skipping batch 16 as it is already completed.
Skipping batch 11 as it is already completed.
Skipping batch 7 as it is already completed.
Skipping batch 3 as it is already completed.
Skipping batch 18 as it is already completed.
Skipping batch 6 as it is already completed.
Skipping batch 13 as it is already completed.
Skipping batch 15 as it is already completed.
Skipping batch 14 as it is already completed.
Skipping batch 2 as it is already completed.
Skipping batch 17 as it is already completed.
Skipping batch 1 as it is already completed.
Skipping batch 10 as it is already completed.


In [75]:
# Show the final results with best of batch marked.
df_results = pd.read_csv(common.annotation_with_best_loc(GDRIVE_BASE))
df_results

Unnamed: 0,batch_id,name,model,tuned,dataset,prompt,score,generated,model_top_score,human_top_score
0,1,t5_s1,google/t5-v1_1-base-s1-finetuned,True,s1,Lara felt very sad and scared.,0.000038,"""Are you going to do that?"" asked Lavinia.",False,True
1,1,t5_s1,google/t5-v1_1-base-s1-finetuned,True,s1,Lara felt very sad and scared.,0.000031,"""It is a pity that it is so old,"" she said.",False,False
2,1,t5_s1,google/t5-v1_1-base-s1-finetuned,True,s1,Lara felt very sad and scared.,0.000079,The old woman sat down and waited.,True,False
3,1,t5_s1,google/t5-v1_1-base-s1-finetuned,True,s1,Lara felt very sad and scared.,0.000033,"""You are so kind, sir,"" she cried.",False,False
4,1,t5_s1,google/t5-v1_1-base-s1-finetuned,True,s1,Lara felt very sad and scared.,0.000035,"""I'm afraid of her!"" she whispered.",False,False
...,...,...,...,...,...,...,...,...,...,...
85,18,baseline,facebook/opt-350m,False,,All the dragons of the world lived on one moun...,0.006471,Continue the next sentence of the story making...,False,True
86,18,baseline,facebook/opt-350m,False,,All the dragons of the world lived on one moun...,0.002831,Continue the next sentence of the story making...,False,False
87,18,baseline,facebook/opt-350m,False,,All the dragons of the world lived on one moun...,0.004651,Continue the next sentence of the story making...,False,False
88,18,baseline,facebook/opt-350m,False,,All the dragons of the world lived on one moun...,0.006506,Continue the next sentence of the story making...,True,False
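
With both `model_top_score` and `human_top_score` present, one natural follow-up is to check how often the manual pick coincides with the model's pick. The snippet below is only a sketch on made-up toy rows that reuse the column names from the results above; it is not an analysis performed in this notebook, and it assumes one flagged row per batch in each column.

```python
import pandas as pd

# Toy rows with the same flag columns as the results; the values are made up.
toy = pd.DataFrame({
    'batch_id':        [1, 1, 1, 2, 2, 2],
    'model_top_score': [False, False, True, True, False, False],
    'human_top_score': [True, False, False, True, False, False],
})

# A batch counts as an agreement when a single row carries both flags.
both = toy['model_top_score'] & toy['human_top_score']
agreement = both.groupby(toy['batch_id']).any()

print(agreement.to_dict())
print(f'Agreement rate: {agreement.mean():.2f}')
```

On the real data, `toy` would be replaced by the `df_results` dataframe loaded above.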
