# Short Story Generator: Evaluation Pipelines

This is a top-level notebook (i.e. one that the user executes directly) that calls many other notebooks and files. The data files are shared via GDrive and the other notebooks via GitHub. To connect to these external sources, it is imperative that Steps 0 to 7 (under the **Load libraries & connect to code and data** heading) are followed first.

If the user wants to investigate the code (annotated notebooks) called by this notebook, they can also be opened in Google Colab:

File > Open notebook > GitHub > https://github.com/frau-web/nlp_short_story_generator.git

---

This specific notebook (in the section titled **Evaluation Grid Creation**) collects all trained story-generating models and creates a grid of stories generated using different combinations of models, training data, generated story lengths and seed sentences.

---

Next (in the **Manual Rating** section), this grid can be evaluated by users.

---

Lastly (as contained in the **Rating Analysis** section), these manual ratings are analysed.

---
In summary, all users must follow all steps in the **Load libraries & connect to code and data** section, but can then skip to any other section of interest.

---


## Load libraries & connect to code and data

**Step 0:** Please run the following cell to load the required libraries. Then follow Steps 1-5 to connect to the data and Steps 6-7 to download the additional code that this notebook requires to function.

In [None]:
from IPython.display import clear_output
from google.colab import drive
import pandas as pd
import time

### Data from GDrive

**Step 1:** Please navigate to the shared GDrive folder named "data", which contains the project's data, and select "Add a shortcut to Drive" to add a shortcut to the folder to YOUR GDrive.

**Step 2:** Please mount YOUR GDrive:

In [None]:
drive.mount('/content/gdrive')

**Step 3:** By using the "Files" tab in the Left-hand Sidebar of Colab, please navigate to the "data" (shortcut) folder that you created in Step 1 and, from the menu that appears when you click on the three dots next to "data", select "Copy path".

**Step 4:** Please run the following cell and paste that path when prompted:

In [None]:
data_path = input("Please paste the path to the 'data' folder as copied from the Colab files tab.") + "/"
clear_output()
data_path

**Step 5:** Lastly, please test the connection using the following cell. If the output is not ```evaluation/ models/  stories/``` then the connection was not made correctly and Steps 1 to 4 should be followed again.


In [None]:
%cd $data_path
clear_output()
%ls
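
The same check can also be made programmatically instead of by reading the `%ls` output. The following is a minimal sketch, assuming the three subfolder names listed in Step 5; `missing_subfolders` and `EXPECTED_SUBFOLDERS` are hypothetical names that are not part of the repository.

```python
import os

# Subfolder names taken from the expected output described in Step 5.
EXPECTED_SUBFOLDERS = ("evaluation", "models", "stories")

def missing_subfolders(path, expected=EXPECTED_SUBFOLDERS):
    """Return the expected subfolders that do not exist under `path`."""
    return [name for name in expected
            if not os.path.isdir(os.path.join(path, name))]
```

Calling `missing_subfolders(data_path)` should return an empty list when the connection was made correctly; any names it returns point to folders that could not be found.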

### Code from GitHub

**Step 6:** Please clone the GitHub repository by executing the cell below:

In [None]:
github_path = "/content/github"
%mkdir $github_path
%cd $github_path
! git clone https://github.com/frau-web/nlp_short_story_generator.git
code_path = "/content/github/nlp_short_story_generator"
%cd $code_path

**Step 7:** Lastly, please confirm the proper execution of Step 6 by using the following cell.

If the output is not ```data/  evaluation/  generation/  ReadME.md``` then Step 6 was not executed correctly.

In [None]:
%ls

## Evaluation Grid Creation

### Input options

In [None]:
def gen_seed_sentence(seed_sentence_type):
  if seed_sentence_type == "constructed":
    return "Anthea and Robert were in London." #See POS&NER.ipynb in 'nlp_short_story_generator/data'
  else:
    return "Once upon a time"

### Load Model Functions

These notebooks must be executed in the order specified below and can only be executed once.

In [None]:
%run ./generation/generator_gpt2_simple.ipynb

In [None]:
training_stories_filename = "input_stories_toddlerpluschildren.txt"

In [None]:
%run ./generation/generator_ngram_6.ipynb

In [None]:
%run ./generation/generator_ngram_4.ipynb

In [None]:
%run ./generation/generator_gpt2.ipynb

In [None]:
%run ./generation/generator_gpt2_tuned.ipynb

In [None]:
%run ./generation/generator_gpt2_tuned_unfrozen.ipynb

### Post-processing

In [None]:
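def post_proc(raw_story):
  # Substitute a visible placeholder when a model returned no text.
  if raw_story == "":
    proc_story = "### STORY COULD NOT BE GENERATED ###"
  else:
    proc_story = raw_story

  # Return the processed story (the original returned raw_story, discarding the placeholder).
  return proc_story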

### Model Prediction Pipeline

In [None]:
def compose_story(model, training_set, seed_sentence, max_length):
    # Returns (story, elapsed_seconds).
    # seed_sentence is a seed type ("random" or "constructed"), not the sentence itself.
    start_time = time.time()
    seed_sentence = gen_seed_sentence(seed_sentence)

    if model == "ngram_4" and training_set == "T&C":
        raw_output = generate_text4(seed=seed_sentence, numwords=max_length)
    elif model == "ngram_6" and training_set == "T&C":
        raw_output = generate_text6(seed=seed_sentence, numwords=max_length)
    elif model == 'gpt2':
        # This branch does not check training_set, so it accepts any value.
        raw_output = gen_story(my_model=gpt2, seed=seed_sentence, max_len=max_length)
    elif model == 'gpt2_tuned' and training_set == "T&C":
        raw_output = gen_story(my_model=gpt2_tuned, seed=seed_sentence, max_len=max_length)
    elif model == 'gpt2_tuned_unfrozen' and training_set == "T&C":
        raw_output = gen_story(my_model=gpt2_tuned_unfrozen, seed=seed_sentence, max_len=max_length)
    elif model == 'gpt2_simple' and training_set == "T&C":
        raw_output = gen_story_gpt2_simple(seed=seed_sentence, max_len=max_length)
    else:
        print("Invalid model/training-set combination.")
        raw_output = ""

    end_time = time.time()
    elapsed_time = end_time - start_time

    return post_proc(raw_output), elapsed_time
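
As the number of models grows, the chain of conditions in `compose_story` could alternatively be written as a lookup table keyed on the (model, training set) pair. The sketch below shows one possible arrangement and is not the notebook's implementation: `fake_generator` is a hypothetical stand-in for the generator functions loaded in the previous section, and `GENERATORS` and `dispatch` are illustrative names.

```python
def fake_generator(seed, max_len):
    # Hypothetical stand-in for generate_text4, gen_story, etc.
    return seed + " ..."

# Map each supported (model, training_set) pair to a callable.
GENERATORS = {
    ("ngram_4", "T&C"): fake_generator,
    ("gpt2_tuned", "T&C"): fake_generator,
}

def dispatch(model, training_set, seed, max_len):
    """Return the generated text, or "" for an unsupported combination."""
    generator = GENERATORS.get((model, training_set))
    if generator is None:
        return ""
    return generator(seed, max_len)
```

With this layout, supporting another model or training set means adding one dictionary entry rather than another branch.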

### Evaluation Grid Generator

In [None]:
Seed_sentences = [
    "random",
    "constructed"
]

Models = [
    'ngram_4',
    'ngram_6', 
    'gpt2',
    'gpt2_tuned',
    'gpt2_tuned_unfrozen',
    'gpt2_simple'
]

Training_sets = ["T&C"]

Max_lengths = [100]

Stories_per_permutation = 1
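
To see how large the grid will be before any story is generated, the combinations defined by these lists can be enumerated up front. This is a small sketch using `itertools.product`; the lowercase variable names are illustrative copies of the settings above, and the ordering of the arguments is an assumption chosen to mirror a nested loop over models, training sets, seed sentences, lengths and story numbers.

```python
from itertools import product

# Illustrative copies of the grid settings defined above.
seed_sentences = ["random", "constructed"]
models = ["ngram_4", "ngram_6", "gpt2", "gpt2_tuned",
          "gpt2_tuned_unfrozen", "gpt2_simple"]
training_sets = ["T&C"]
max_lengths = [100]
stories_per_permutation = 1

# One tuple per story: (model, training_set, seed_type, max_length, number).
permutations = list(product(models, training_sets, seed_sentences,
                            max_lengths,
                            range(1, stories_per_permutation + 1)))

print(len(permutations))  # 6 models * 1 set * 2 seeds * 1 length * 1 story = 12
```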

In [None]:
Stories = pd.DataFrame(columns=['Model',
                    'Training_set',
                    'Seed_sentence',
                    'Max_length',
                    'Number',
                    'Story',
                    'Execution_time'])
Permutation_loop_progress = "All required stories composed for Model:"

In [None]:
for Model in Models:
  for Training_set in Training_sets:
    for Seed_sentence in Seed_sentences:
      for Max_length in Max_lengths:
        for Story_number in range(1,Stories_per_permutation+1):
          print("Composing (using model = ", Model, ", training_set = ",Training_set, ", seed_sentence = ", Seed_sentence, ", and max_length = ",Max_length,") story ", Story_number, sep="")
          story, execution_time = compose_story(model=Model, training_set=Training_set, seed_sentence=Seed_sentence, max_length=Max_length)
          if story != "":
              entry = {
                  'Model' : Model,
                  'Training_set': Training_set,
                  'Seed_sentence' : Seed_sentence,
                  'Max_length' : Max_length,
                  'Number': Story_number,
                  'Story' : story,
                  'Execution_time': execution_time
              }
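              # DataFrame.append was removed in pandas 2.0; pd.concat is its replacement.
              Stories = pd.concat([Stories, pd.DataFrame([entry])], ignore_index=True)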
  
  Permutation_loop_progress = Permutation_loop_progress + "\n    " + Model
  print(Permutation_loop_progress)

In [None]:
Stories

In [None]:
Stories = Stories.sample(frac=1).reset_index(drop=True)
Stories

In [None]:
Stories.to_csv(data_path + "evaluation/" + "RawEvaluationGrid.csv")

## Manual Rating

Incomplete

### Functions

In [None]:
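def rating(prompt, min_rating, max_rating):
    # Re-prompt until the input is a whole number in [min_rating, max_rating].
    while True:
        answer = input(prompt)
        if answer.strip().isdigit() and min_rating <= int(answer) <= max_rating:
            return int(answer)
        print("Please enter a whole number from %d to %d." % (min_rating, max_rating))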

In [None]:
def rating_loop(EvaluationGrid, rater_name):
    H_line = "---------------------------------------------------------------------------"
    Header = (H_line + "\n" + "Manual evaluation in progress.\nRater: " + rater_name + "\n" + H_line)
    max_ind = len(EvaluationGrid)
    
    for ind in EvaluationGrid.index:
        while True:
            clear_output()
            print(Header)
            print("Story number %d of %d to be evaluated.\nStory:"%(ind+1, max_ind))
            print(EvaluationGrid.loc[ind, 'Story'])

            creativity = rating('Creativity: ', 1, 5)
            correctness = rating('Correctness: ', 1, 5)
            child_friendliness = rating('Child-friendliness: ', 1, 5)

            satisfied = input("Are you satisfied with these ratings (type 'y' if yes)?")
            if satisfied == 'y':
                EvaluationGrid.loc[ind, 'creativity'] = creativity
                EvaluationGrid.loc[ind, 'correctness'] = correctness
                EvaluationGrid.loc[ind, 'child_friendliness'] = child_friendliness
                break

## Rating Analysis
Incomplete
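
Until this section is written, the following sketch illustrates one way the manual ratings might be summarised: the mean of each criterion per model. It is an assumption about what the analysis could look like, not the project's method. `mean_ratings_by_model` is a hypothetical helper, the column names are those used in the **Manual Rating** section, and the example rows are made up purely to show the call.

```python
from collections import defaultdict
from statistics import mean

# Rating columns as written by rating_loop in the Manual Rating section.
CRITERIA = ("creativity", "correctness", "child_friendliness")

def mean_ratings_by_model(rows):
    """Average each rating criterion over all rated stories of a model."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["Model"]].append(row)
    return {
        model: {c: mean(r[c] for r in model_rows) for c in CRITERIA}
        for model, model_rows in grouped.items()
    }

# Made-up rows for illustration only; these are not real ratings.
example_rows = [
    {"Model": "ngram_4", "creativity": 2, "correctness": 1, "child_friendliness": 4},
    {"Model": "ngram_4", "creativity": 4, "correctness": 3, "child_friendliness": 4},
]
print(mean_ratings_by_model(example_rows))
```

Rows of this shape could be obtained from a rated grid with `EvaluationGrid.to_dict("records")`.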