# Short Story Generator

This is a top-level notebook (i.e. one meant to be executed directly by the user) that calls many other notebooks and files. The data files are shared via Google Drive and the other notebooks via GitHub. To connect to these external sources, it is imperative that Steps 0 to 7 (under the **Load libraries & connect to code and data** heading) are completed first.

If the user wants to investigate the code called by this notebook, the annotated notebooks can also be opened in Google Colab:

File > Open notebook > GitHub > https://github.com/frau-web/nlp_short_story_generator.git

---

This specific notebook (in the section titled **Model Selection**) collects all trained story-generating models and loads one of them.

For the analysis that supports the choice of preferred model, please see the notebooks and data in the "evaluation" folders of both the GitHub repository and the shared Google Drive folder.

---

Lastly (in the **Story Generator MVP** section), the selected model is used to create stories based on user input.


## Load libraries & connect to code and data



**Step 0:** Please run the following cell to load the required libraries. Then follow Steps 1-5 to connect to the data and Steps 6-7 to download the additional code this notebook requires to function.

In [None]:
from IPython.display import clear_output
from google.colab import drive
import pandas as pd
import time

### Data from Google Drive

**Step 1:** Please navigate to the shared Google Drive folder named "data", which contains the project's data, and select "Add a shortcut to Drive" to add a shortcut to the folder in YOUR Google Drive.

**Step 2:** Please mount YOUR Google Drive:

In [None]:
drive.mount('/content/gdrive')

**Step 3:** Using the "Files" tab in the left-hand sidebar of Colab, please navigate to the "data" (shortcut) folder that you created in Step 1. Click on the three dots next to "data" and select "Copy path" from the menu that appears.

**Step 4:** Please run the following cell and paste that path when prompted:

In [None]:
# Strip stray whitespace from the pasted path before appending the trailing slash.
data_path = input("Please paste the path to the 'data' folder as copied from the Colab files tab: ").strip() + "/"
clear_output()
data_path
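
A pasted path may carry surrounding whitespace or already end in a slash. The following is a minimal sketch of tidying such input, not part of the notebook itself: `normalize_data_path` is a hypothetical helper, and the example path is only an assumption about where the shortcut might live.

```python
def normalize_data_path(raw_path):
    """Hypothetical helper: tidy a pasted folder path and end it with one '/'."""
    # Remove surrounding whitespace and any trailing slashes, then add exactly one.
    return raw_path.strip().rstrip("/") + "/"

# The path below is an assumed example, not a path taken from the project.
print(normalize_data_path("  /content/gdrive/MyDrive/data/ "))
```

With this helper, the input cell could assign `data_path = normalize_data_path(input(...))` instead of appending `"/"` directly.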

**Step 5:** Lastly, please test the connection using the following cell. If the output does not show at least `evaluation/  models/  stories/`, then the connection was not made correctly and the steps should be followed again.


In [None]:
%cd $data_path
clear_output()
%ls
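
The visual check in Step 5 can also be done programmatically. The sketch below is an optional illustration: `missing_entries` is a hypothetical helper that is not defined anywhere in the project, and it only reports which of the expected names are absent from a folder.

```python
import os

def missing_entries(folder, expected):
    """Return the names in `expected` that are not present in `folder`."""
    present = set(os.listdir(folder))
    return [name for name in expected if name not in present]

# "." is the current directory, i.e. the data folder after the %cd above.
# An empty list means every expected folder was found.
print(missing_entries(".", ["evaluation", "models", "stories"]))
```

The same helper could be reused for the check in Step 7 by passing the names expected in the cloned repository.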

### Code from GitHub

**Step 6:** Please clone the GitHub repository by executing the cell below:

In [None]:
github_path = "/content/github"
%mkdir -p $github_path
%cd $github_path
! git clone https://github.com/frau-web/nlp_short_story_generator.git
code_path = "/content/github/nlp_short_story_generator"
%cd $code_path
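
Re-running the clone cell in the same session fails once the repository folder exists. One possible way to make the step repeatable is sketched below. `clone_if_missing` is a hypothetical helper, not part of the repository, and the `run` parameter exists only so the function can be exercised without network access.

```python
import os
import subprocess

def clone_if_missing(repo_url, parent_dir, run=subprocess.run):
    """Clone `repo_url` into `parent_dir` unless the target folder already exists."""
    os.makedirs(parent_dir, exist_ok=True)
    # Derive the folder name git would use: last URL segment without ".git".
    repo_name = repo_url.rstrip("/").split("/")[-1]
    if repo_name.endswith(".git"):
        repo_name = repo_name[:-len(".git")]
    target = os.path.join(parent_dir, repo_name)
    if not os.path.isdir(target):
        run(["git", "clone", repo_url, target], check=True)
    return target

# Possible usage in Colab (left commented out because it needs network access):
# code_path = clone_if_missing(
#     "https://github.com/frau-web/nlp_short_story_generator.git", "/content/github")
```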

**Step 7:** Lastly, please confirm the proper execution of Step 6 by using the following cell.

If the output does not show at least `data/  evaluation/  generation/  ReadME.md`, then Step 6 was not executed correctly.

In [None]:
%ls

## Model Selection

In this section, each cell loads a model and defines `selected_model` as a wrapper around it. **Only one of them should be executed.** (If more than one is executed, the last one executed will be used by the **Story Generator MVP** section.)
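
Every model cell defines a function with the same name, so a later definition replaces an earlier one. The sketch below illustrates this with stand-ins: `model_a`, `model_b` and `selected_model_demo` are hypothetical names, chosen so that running the example does not overwrite the notebook's real `selected_model`.

```python
def model_a(seed, max_len):
    return "A: " + seed

def model_b(seed, max_len):
    return "B: " + seed

# First "cell": the wrapper points at model_a.
def selected_model_demo(seed_sentence, max_length):
    return model_a(seed=seed_sentence, max_len=max_length)

# Second "cell": the same name is rebound, replacing the earlier definition.
def selected_model_demo(seed_sentence, max_length):
    return model_b(seed=seed_sentence, max_len=max_length)

print(selected_model_demo("Once upon a time", 250))  # prints "B: Once upon a time"
```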

### Preferred Model

In [None]:
#GPT2-Small Model tuned on input_stories_toddlerpluschildren.txt using gpt2_simple library
#if this is not the first model cell executed, it might not execute at all.
%run ./generation/generator_gpt2Simple_tuned_on_tandc.ipynb
def selected_model(seed_sentence, max_length):
  return gen_story_gpt2_simple_tunedonTC(seed=seed_sentence, max_len = max_length)

### Other Models

In [None]:
#N-gram model (6-gram) trained on input_stories_toddlerpluschildren.txt
training_stories_filename = "input_stories_toddlerpluschildren.txt"
%run ./generation/generator_ngram_6.ipynb
def selected_model(seed_sentence, max_length):
  return generate_text6(seed=seed_sentence, numwords=max_length)

In [None]:
#N-gram model (4-gram) trained on input_stories_toddlerpluschildren.txt
training_stories_filename = "input_stories_toddlerpluschildren.txt"
%run ./generation/generator_ngram_4.ipynb
def selected_model(seed_sentence, max_length):
  return generate_text4(seed=seed_sentence, numwords=max_length)

In [None]:
#Untuned GPT2-Medium Model
%run ./generation/generator_gpt2M.ipynb
def selected_model(seed_sentence, max_length):
  return gen_story_gpt2m(seed=seed_sentence, max_len = max_length)

In [None]:
#GPT2-Medium Model tuned on input_stories_toddler.txt
%run ./generation/generator_gpt2M_tuned_on_toddler.ipynb
def selected_model(seed_sentence, max_length):
  return gen_story_gpt2m_tunedonT(seed=seed_sentence, max_len = max_length)

In [None]:
#GPT2-Medium Model tuned on input_stories_toddler.txt (Unfrozen)
%run ./generation/generator_gpt2M_tuned_on_toddler_unfrozen.ipynb
def selected_model(seed_sentence, max_length):
  return gen_story_gpt2m_tunedonT_unfrozen(seed=seed_sentence, max_len = max_length)

In [None]:
#GPT2-Small Model tuned on input_stories_toddlerpluschildren.txt
%run ./generation/generator_gpt2S_tuned_on_tandc.ipynb
def selected_model(seed_sentence, max_length):
  return gen_story_gpt2s_tunedonTC(seed=seed_sentence, max_len = max_length)

In [None]:
#GPT2-Small Model tuned on input_stories_toddlerpluschildren.txt (Unfrozen)
%run ./generation/generator_gpt2S_tuned_on_tandc_unfrozen.ipynb
def selected_model(seed_sentence, max_length):
  return gen_story_gpt2s_tunedonTC_unfrozen(seed=seed_sentence, max_len = max_length)

## Story Generator MVP

All the cells in this section must be executed, but only after all the steps in the **Load libraries & connect to code and data** section have been completed and exactly one cell in the **Model Selection** section has been run.

### Definitions

In [None]:
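def post_proc(raw_story):
  # Clean up a raw generated story and drop the trailing (possibly incomplete) sentence.
  if raw_story == "":
    return "### STORY COULD NOT BE GENERATED ###"

  proc_story = (raw_story
      .replace("\\' ", " ")
      .replace(" \\'", " ")
      .replace(" '", " ")
      .replace("' ", " ")
      .replace('\\"', "")  # escaped quotes first, otherwise a stray backslash remains
      .replace('"', "")
      .replace("  ", " ")
      .replace(". ", ". \n ")  # one sentence per line
      .replace("  ", " ")
      .strip()
      )
  # Cut at the last line break. rfind returns -1 when there is none;
  # in that case the text is kept whole instead of losing its last character.
  last_break = proc_story.rfind("\n")
  if last_break != -1:
    proc_story = proc_story[:last_break].rstrip()

  return proc_story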
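
The last step of `post_proc` cuts the text at the final line break, which discards whatever follows the last complete sentence. The sketch below isolates that step for illustration. `drop_trailing_fragment` is a hypothetical helper, not part of the notebook, and it assumes that sentences end in a full stop followed by a space.

```python
def drop_trailing_fragment(text):
    """Hypothetical helper: put one sentence per line and drop the last line."""
    lines = text.replace(". ", ". \n ").strip()
    last_break = lines.rfind("\n")
    if last_break == -1:
        # No sentence break found: keep the text unchanged.
        return lines
    return lines[:last_break].rstrip()

sample = "The cat sat. The dog ran. Then the"
print(drop_trailing_fragment(sample))
```

For the sample, the unfinished fragment "Then the" is removed and the two complete sentences are printed on separate lines. A text that ends in a complete sentence also loses its last line under this scheme, which is the price of the simple cut.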

In [None]:
def compose_story(seed_sentence, max_length = 250):
  return post_proc(selected_model(seed_sentence, max_length))

In [None]:
H_line = "---------------------------------------------------------------------------"
seed0 = "Once upon a time"
seed1 = "Anthea and Robert were in London."

### Model Prediction Pipeline

In [None]:
while True:
    clear_output()
    print(H_line)
    print("Welcome to the Short Story Generator.\nPlease choose a seed sentence, either by typing it out yourself, or by typing one of the options below:")
    print("    '0': " + seed0)
    print("    '1': " + seed1)
    print("    'q' to Quit")

    answer = input("Option or Seed Sentence: ")

    if answer == "q":
        break
    elif answer == "0":
        seed_sentence = seed0
    elif answer == "1":
        seed_sentence = seed1
    else:
        seed_sentence = answer

    print("Composing story...")
    story = compose_story(seed_sentence)
    clear_output()
    print(H_line)
    print(story)
    print(H_line)

    answer = input("Another story ('y' for yes)? ")
    if answer != "y":
        break
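
The option handling in the loop can also be expressed as a small function, which makes it easy to add further preset seeds. This is a sketch only: `resolve_seed` and the `presets` dictionary are hypothetical and not part of the notebook. The two preset sentences are the ones defined in the **Definitions** section.

```python
def resolve_seed(answer, presets):
    """Hypothetical helper: map a menu answer to a seed sentence, or None to quit."""
    if answer == "q":
        return None
    # Preset keys such as "0" and "1" take priority; anything else is used verbatim.
    return presets.get(answer, answer)

presets = {"0": "Once upon a time", "1": "Anthea and Robert were in London."}
print(resolve_seed("0", presets))           # prints "Once upon a time"
print(resolve_seed("A fox ran.", presets))  # prints "A fox ran."
```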