## Notebook Preface

## Constructs the working folder

* Positions the project folder in Google Drive.
  1. From "Shared with me", right-click "W266 Final Project" and select "Add shortcut to Drive".
  2. "W266 Final Project" then shows up under "MyDrive".

* Mounts Google Drive at /content/drive in the Colab runtime.

* Defines the working folder relative to /content/drive.



In [1]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [2]:
working_folder = "/content/drive/MyDrive/W266 Final Project/CnF/PhotoStoryGenerator"
checkpoint_dir = f"{working_folder}/GPT-2L-FineTune2_checkpoint"
testing_json = f"{working_folder}/test_hints.json"
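Before loading anything from these paths, it can help to confirm that the Drive shortcut resolved and the expected files exist. The snippet below is a minimal sketch and not part of the original notebook; `find_missing_paths` is a hypothetical helper name.

```python
import os

def find_missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]

# In this notebook the call would look like:
#   find_missing_paths([working_folder, checkpoint_dir, testing_json])
# An empty list means every path was found.
```

A non-empty result usually means the shortcut was not added to "MyDrive" or the Drive mount did not complete.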


In [3]:
!nvidia-smi


Mon Jul 26 20:17:53 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.42.01    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P0    25W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

## Imports libraries

In [4]:
# Install the Hugging Face Transformers library and PyTorch (press Connect first if the runtime is not attached)
!pip install transformers torch


Collecting transformers
  Downloading transformers-4.9.1-py3-none-any.whl (2.6 MB)
[?25l[K     |▏                               | 10 kB 26.9 MB/s eta 0:00:01[K     |▎                               | 20 kB 33.9 MB/s eta 0:00:01[K     |▍                               | 30 kB 39.8 MB/s eta 0:00:01[K     |▌                               | 40 kB 32.6 MB/s eta 0:00:01[K     |▋                               | 51 kB 17.5 MB/s eta 0:00:01[K     |▊                               | 61 kB 18.3 MB/s eta 0:00:01[K     |▉                               | 71 kB 14.2 MB/s eta 0:00:01[K     |█                               | 81 kB 15.6 MB/s eta 0:00:01[K     |█▏                              | 92 kB 16.8 MB/s eta 0:00:01[K     |█▎                              | 102 kB 15.3 MB/s eta 0:00:01[K     |█▍                              | 112 kB 15.3 MB/s eta 0:00:01[K     |█▌                              | 122 kB 15.3 MB/s eta 0:00:01[K     |█▋                              | 133 kB 15.3 

In [5]:
import json
import logging
import math
import os
import re
from dataclasses import dataclass, field
from typing import Optional

import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

from transformers import (
    AutoConfig,
    GPT2LMHeadModel,
    AutoTokenizer,
    TextGenerationPipeline,
    pipeline,
    set_seed,
)

# Set up logging
logger = logging.getLogger(__name__)

from IPython.display import HTML, display
# Wrap long lines in cell output instead of scrolling horizontally.
def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)


In [6]:
#METEOR
!pip install nltk==3.5
import nltk
nltk.download('wordnet')

#BLEU
from nltk.translate.bleu_score import sentence_bleu

#ROUGE
!pip install rouge-score
from rouge_score import rouge_scorer
Rouge_scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)

#BERT SCORE
!pip install bert_score
from bert_score import BERTScorer
BERT_scorer = BERTScorer(lang="en", rescale_with_baseline=True)


Collecting nltk==3.5
  Downloading nltk-3.5.zip (1.4 MB)
     |████████████████████████████████| 1.4 MB 12.6 MB/s
Building wheels for collected packages: nltk
  Building wheel for nltk (setup.py) ... done
  Created wheel for nltk: filename=nltk-3.5-py3-none-any.whl size=1434688 sha256=1884dfa76a4f2ce265ef0fd9ac608a76d6a6072fe0c951da8f4119f008c73762
  Stored in directory: /root/.cache/pip/wheels/45/6c/46/a1865e7ba706b3817f5d1b2ff7ce8996aabdd0d03d47ba0266
Successfully built nltk
Installing collected packages: nltk
  Attempting uninstall: nltk
    Found existing installation: nltk 3.2.5
    Uninstalling nltk-3.2.5:
      Successfully uninstalled nltk-3.2.5
Successfully installed nltk-3.5


[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.


Collecting rouge-score
  Downloading rouge_score-0.0.4-py2.py3-none-any.whl (22 kB)
Installing collected packages: rouge-score
Successfully installed rouge-score-0.0.4
Collecting bert_score
  Downloading bert_score-0.3.9-py3-none-any.whl (59 kB)
     |████████████████████████████████| 59 kB 5.4 MB/s
Installing collected packages: bert-score
Successfully installed bert-score-0.3.9


Some weights of the model checkpoint at roberta-large were not used when initializing RobertaModel: ['lm_head.bias', 'lm_head.dense.weight', 'lm_head.layer_norm.bias', 'lm_head.dense.bias', 'lm_head.decoder.weight', 'lm_head.layer_norm.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


## Defines score computer

In [7]:
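def get_scores(reference, hypothesis):
    """Score one hypothesis against one reference with METEOR, ROUGE, unigram BLEU and BERTScore."""
    rouge = Rouge_scorer.score(reference, hypothesis)
    # BERTScorer takes lists of candidates and references; each returned tensor holds one value here.
    P, R, F1 = BERT_scorer.score([hypothesis], [reference])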

    return {
        "meteor": nltk.translate.meteor_score.meteor_score([reference], hypothesis),
        "rouge1": { "precision": rouge["rouge1"].precision, "recall": rouge["rouge1"].recall, "fmeasure": rouge["rouge1"].fmeasure },
        "rougeL": { "precision": rouge["rougeL"].precision, "recall": rouge["rougeL"].recall, "fmeasure": rouge["rougeL"].fmeasure },
        "bleu": sentence_bleu([reference.split()], hypothesis.split(), weights=(1, 0, 0, 0)),
        "bert_score": { "precision": P.item(), "recall": R.item(), "f1": F1.item() }
    }
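The `weights=(1, 0, 0, 0)` argument restricts BLEU to unigrams. As an illustration of what that score measures, the sketch below computes clipped unigram precision times the brevity penalty for a single reference. It follows the standard BLEU definition and is not NLTK's code; `unigram_bleu` is a hypothetical name.

```python
import math
from collections import Counter

def unigram_bleu(reference, hypothesis):
    """Clipped unigram precision multiplied by the brevity penalty (one reference)."""
    ref_tokens = reference.split()
    hyp_tokens = hypothesis.split()
    if not hyp_tokens:
        return 0.0
    ref_counts = Counter(ref_tokens)
    hyp_counts = Counter(hyp_tokens)
    # Each hypothesis token is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[tok]) for tok, count in hyp_counts.items())
    precision = clipped / len(hyp_tokens)
    # Hypotheses shorter than the reference are penalised.
    if len(hyp_tokens) >= len(ref_tokens):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref_tokens) / len(hyp_tokens))
    return brevity_penalty * precision

example = unigram_bleu("the cat sat on the mat", "the cat the cat")
```

In the example, three of the four hypothesis tokens are credited (precision 0.75) and the short hypothesis incurs a brevity penalty of `exp(-0.5)`.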


In [29]:
ref = "the local parish holds a craft show each year . lots of folks come out and set up tables to sell their crafts . some of these crafts are very unique and take a lot of talent to make . folks of all ages come out to peruse the crafts for sale . some of the crafters even dress up in unique costumes as part of their selling act ."

nltk.translate.meteor_score.meteor_score([ref], ref)


0.9999985422740525
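The score for identical texts is slightly below 1 because METEOR applies a fragmentation penalty even to a perfect match. The sketch below reproduces the number above, assuming NLTK's default parameters (`beta=3`, `gamma=0.5`) and whitespace tokenization, under which `ref` has 70 tokens forming a single matched chunk. The function name is hypothetical.

```python
def meteor_for_identical_text(num_tokens, beta=3.0, gamma=0.5):
    """METEOR score when hypothesis and reference are the same token sequence."""
    matches = num_tokens   # every token is matched
    chunks = 1             # all matches form one contiguous chunk
    penalty = gamma * (chunks / matches) ** beta
    f_mean = 1.0           # precision and recall are both 1
    return (1.0 - penalty) * f_mean

ref_token_count = 70  # whitespace-separated tokens in `ref`
score = meteor_for_identical_text(ref_token_count)
```

The penalty is `0.5 / 70**3`, about `1.46e-6`, which accounts for the gap from 1.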

## Constructs the testing dataset

In [8]:
def load_dataset_for_testing(num_records):
    """Load up to `num_records` stories from `testing_json` as test records."""
    results = []
    with open(testing_json) as f:
        _json = json.load(f)
        for s in _json:
            story = _json[s]
            hints = story["hints"]
            results.append({
                "story_id": s,
                "reference": story["sis"],
                "base_prefix": " ".join(hints[1]),
                # Prompt in the fine-tuning format: hint words between <HINT> and <SENT>.
                "prefix": f"<BOS> <HINT> {' '.join(hints[0])} <SENT>",
                "urls": story["urls"]
            })
            num_records -= 1
            if num_records <= 0:
                break
        return results
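The loader implies a JSON layout keyed by story id, with `hints`, `sis` and `urls` fields. The sketch below shows that shape and the resulting prompt. The sample values (story id, hint words, URL) are invented for illustration, and `build_prefix` is a hypothetical helper that mirrors the f-string in the loader.

```python
import json

# Hypothetical sample in the layout the loader reads; the values are made up.
sample_json = json.loads("""
{
  "story-0": {
    "hints": [["craft", "show", "parish"], ["craft", "show"]],
    "sis": "the local parish holds a craft show each year .",
    "urls": ["https://example.com/photo-0.jpg"]
  }
}
""")

def build_prefix(hint_words):
    """Wrap hint words in the special tokens used for the fine-tuned prompt."""
    return f"<BOS> <HINT> {' '.join(hint_words)} <SENT>"

records = [
    {
        "story_id": story_id,
        "reference": story["sis"],
        "base_prefix": " ".join(story["hints"][1]),
        "prefix": build_prefix(story["hints"][0]),
    }
    for story_id, story in sample_json.items()
]
```

Here `hints[0]` feeds the tagged `prefix` and `hints[1]` feeds the plain `base_prefix`, as in the loader.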


## Creates the story generation pipeline

In [9]:
@dataclass
class TextGenerationArguments:
    _num_tests: int = 100
    _output: Optional[str] = None
    _prefix: str = "prefix"
    _parsing_regex: Optional[re.Pattern] = None
    max_length: int = 500
    num_return_sequences: Optional[int] = None
    early_stopping: bool = False

    # Sampling
    do_sample: Optional[bool] = None

    # Top-k sampling: restrict the sampling pool to the K most probable words
    #   and redistribute the probability mass among them.
    # Limiting the pool to a fixed size K risks
    #   producing gibberish for sharp distributions and
    #   limiting the model's creativity for flat distributions.
    # Default: 50
    top_k: Optional[int] = None

    # Top-p (nucleus) sampling: dynamically size the sampling pool as the
    #   smallest set of words whose cumulative probability reaches top_p.
    # Default: 1.0
    top_p: Optional[float] = None

    # The lower the temperature:
    # 1. The more deterministic the output
    # 2. The more often words of higher probability are chosen
    # 3. Lowering the temperature can fix gibberish
    # 4. Near 0: most deterministic, probably more repetition
    # Default: 1.0
    temperature: Optional[float] = None

    # 1: No penalty
    # Infinity: Max penalty
    repetition_penalty: Optional[float] = None

    # Beam Search
    num_beams: Optional[int] = None
    no_repeat_ngram_size: Optional[int] = None


@dataclass
class ModelArguments:
    model_name_or_path: Optional[str] = None
    cache_dir: Optional[str] = None
    model_type: Optional[str] = None


model = GPT2LMHeadModel.from_pretrained(checkpoint_dir)
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
story_generator = TextGenerationPipeline(model=model, tokenizer=tokenizer, device=0)
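The comments on `temperature`, `top_k` and `top_p` above can be made concrete with a small pure-Python sketch. It illustrates the ideas only and is not the `transformers` implementation; the function names and the example logits are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Indices of the k most probable tokens (a fixed-size sampling pool)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[:k]

def top_p_filter(probs, top_p):
    """Indices of the smallest high-probability set whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept

logits = [2.0, 1.0, 0.0]
warm = softmax_with_temperature(logits, temperature=1.0)
cool = softmax_with_temperature(logits, temperature=0.5)
pool = top_p_filter([0.5, 0.3, 0.15, 0.05], top_p=0.7)
```

With these inputs, the most likely token gains probability at the lower temperature, and the top-p pool holds the two most probable tokens because their combined mass first reaches 0.7.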


In [10]:
def dump_data(_data, _file):
    with open(f"{_file}", "w") as outfile:
        json.dump(_data, outfile)


## Text generation

In [23]:
#story_regex = re.compile("^<BOS> <HINT> [^<>]+ <SENT> (?P<SENT>[^.!?]+[.!?][^.!?]+[.!?][^.!?]+[.!?][^.!?]+[.!?][^.!?]+[.!?]).+", re.MULTILINE|re.IGNORECASE)
story_regex = re.compile("^(?P<SENT>[^.!?]+[.!?][^.!?]+[.!?][^.!?]+[.!?][^.!?]+[.!?][^.!?]+[.!?]).+", re.MULTILINE|re.IGNORECASE)

configs = [
    TextGenerationArguments(
        _num_tests = 100,
        _output = "FT_BS_B1",
        _prefix = "prefix",
        _parsing_regex = story_regex,
        max_length = 500,
        num_beams = 1,
        no_repeat_ngram_size = 3,
        num_return_sequences = 1,
        repetition_penalty = 1.1,
    ),
    TextGenerationArguments(
        _num_tests = 100,
        _output = "FT_BS_B3",
        _prefix = "prefix",
        _parsing_regex = story_regex,
        max_length = 500,
        num_beams = 3,
        no_repeat_ngram_size = 3,
        num_return_sequences = 3,
        repetition_penalty = 1.1,
    ),
    TextGenerationArguments(
        _num_tests = 100,
        _output = "FT_BS_B5",
        _prefix = "prefix",
        _parsing_regex = story_regex,
        max_length = 500,
        num_beams = 5,
        no_repeat_ngram_size = 3,
        num_return_sequences = 5,
        repetition_penalty = 1.1,
    ),
    TextGenerationArguments(
        _num_tests = 100,
        _output = "FT_BS_B10",
        _prefix = "prefix",
        _parsing_regex = story_regex,
        max_length = 500,
        num_beams = 10,
        no_repeat_ngram_size = 3,
        num_return_sequences = 10,
        repetition_penalty = 1.1,
    ),
    TextGenerationArguments(
        _num_tests = 100,
        _output = "FT_KP_K50",
        _prefix = "prefix",
        _parsing_regex = story_regex,
        max_length = 500,
        do_sample = True,
        top_k = 50,
        top_p = 1,
        repetition_penalty = 1.1,
    ),
    TextGenerationArguments(
        _num_tests = 100,
        _output = "FT_KP_P8",
        _prefix = "prefix",
        _parsing_regex = story_regex,
        max_length = 500,
        do_sample = True,
        top_k = 0,
        top_p = 0.80,
        repetition_penalty = 1.1,
    ),
    TextGenerationArguments(
        _num_tests = 100,
        _output = "FT_KP_P9",
        _prefix = "prefix",
        _parsing_regex = story_regex,
        max_length = 500,
        do_sample = True,
        top_k = 0,
        top_p = 0.90,
        temperature = 0.7,
        repetition_penalty = 1.1,
    ),
    TextGenerationArguments(
        _num_tests = 100,
        _output = "FT_KP_P95",
        _prefix = "prefix",
        _parsing_regex = story_regex,
        max_length = 500,
        do_sample = True,
        top_k = 0,
        top_p = 0.95,
        temperature = 0.7,
        repetition_penalty = 1.1,
    ),
]
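The `story_regex` pattern captures the first five sentences of a generated text, and it only matches when at least one more character follows the fifth sentence terminator. The sketch below shows both cases together with the 75-word fallback used later in the generation loop. `first_five_sentences` is a hypothetical wrapper, and the sample strings are invented.

```python
import re

story_regex = re.compile(
    "^(?P<SENT>[^.!?]+[.!?][^.!?]+[.!?][^.!?]+[.!?][^.!?]+[.!?][^.!?]+[.!?]).+",
    re.MULTILINE | re.IGNORECASE,
)

def first_five_sentences(text, max_words=75):
    """Return the first five sentences, or the first `max_words` words if the pattern fails."""
    match = story_regex.match(text)
    if match is not None:
        return match.group("SENT")
    return " ".join(text.split()[:max_words])

six_sentences = "one . two . three . four . five . six ."
five_sentences = "one . two . three . four . five ."

truncated = first_five_sentences(six_sentences)
no_match = story_regex.match(five_sentences)
```

For `six_sentences` the captured group stops after the fifth sentence. For `five_sentences` the pattern does not match, because nothing follows the last terminator, so the word-count fallback applies instead.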



In [24]:
#story_regex = re.compile("^<BOS> <HINT> [^<>]+ <SENT> (?P<SENT>[^.!?]+[.!?][^.!?]+[.!?][^.!?]+[.!?][^.!?]+[.!?][^.!?]+[.!?]).+", re.MULTILINE|re.IGNORECASE)
story_regex = re.compile("^(?P<SENT>[^.!?]+[.!?][^.!?]+[.!?][^.!?]+[.!?][^.!?]+[.!?][^.!?]+[.!?]).+", re.MULTILINE|re.IGNORECASE)

configs = [
    TextGenerationArguments(
        _num_tests = 100,
        _output = "FTL_KP_P95_LgTxt",
        _prefix = "prefix",
        _parsing_regex = story_regex,
        max_length = 500,
        do_sample = True,
        top_k = 0,
        top_p = 0.95,
        temperature = 0.7,
        repetition_penalty = 1.0,
    ),
    TextGenerationArguments(
        _num_tests = 100,
        _output = "FTL_BS_B5_LgTxt",
        _prefix = "prefix",
        _parsing_regex = story_regex,
        max_length = 500,
        num_beams = 5,
        no_repeat_ngram_size = 2,
        num_return_sequences = 1,
        repetition_penalty = 1.0,
    ),
]

In [27]:
story_regex = re.compile("^(?P<SENT>[^.!?]+[.!?][^.!?]+[.!?][^.!?]+[.!?][^.!?]+[.!?][^.!?]+[.!?]).+", re.MULTILINE|re.IGNORECASE)

configs = [
    TextGenerationArguments(
        _num_tests = 100,
        _output = "FTL_KP_P95_LgTxt_T85",
        _prefix = "prefix",
        _parsing_regex = story_regex,
        max_length = 500,
        do_sample = True,
        top_k = 0,
        top_p = 0.95,
        temperature = 0.85,
        repetition_penalty = 1.0,
    ),
    TextGenerationArguments(
        _num_tests = 100,
        _output = "FTL_KP_P95_LgTxt_T120",
        _prefix = "prefix",
        _parsing_regex = story_regex,
        max_length = 500,
        do_sample = True,
        top_k = 0,
        top_p = 0.95,
        temperature = 1.2,
        repetition_penalty = 1.0,
    ),
]


In [30]:
story_regex = re.compile("^(?P<SENT>[^.!?]+[.!?][^.!?]+[.!?][^.!?]+[.!?][^.!?]+[.!?][^.!?]+[.!?]).+", re.MULTILINE|re.IGNORECASE)

configs = [
    TextGenerationArguments(
        _num_tests = 100,
        _output = "FTL_KP_P95_LgTxt_T50",
        _prefix = "prefix",
        _parsing_regex = story_regex,
        max_length = 500,
        do_sample = True,
        top_k = 0,
        top_p = 0.95,
        temperature = 0.5,
        repetition_penalty = 1.0,
    ),
    TextGenerationArguments(
        _num_tests = 100,
        _output = "FTL_KP_P95_LgTxt_T60",
        _prefix = "prefix",
        _parsing_regex = story_regex,
        max_length = 500,
        do_sample = True,
        top_k = 0,
        top_p = 0.95,
        temperature = 0.6,
        repetition_penalty = 1.0,
    ),
]

In [31]:
#https://huggingface.co/blog/how-to-generate

for config in configs:
    data = load_dataset_for_testing(config._num_tests)
    print("Configuration:", config._output)

    set_seed(0)
    for d in data:
        stories = story_generator(d[config._prefix], **(config.__dict__))

        best = {"text": "", "scores": {
            "meteor": -100,
            "rouge1": { "precision": -100, "recall": -100, "fmeasure": -100 },
            "rougeL": { "precision": -100, "recall": -100, "fmeasure": -100 },
            "bleu": -100,
            "bert_score": { "precision": -100, "recall": -100, "f1": -100 },
        }}

        for story in stories:
            gen_text = story["generated_text"].replace("\n", "")[len(d[config._prefix])+1:]
            match = config._parsing_regex.match(gen_text)
            if match is not None:
                hypothesis = match.group("SENT")
            else:
                #hypothesis = gen_text[:len(d[config._prefix])]
                hypothesis = " ".join(gen_text.split()[:75])
            if len(hypothesis) == 0:
                continue
            scores = get_scores(d["reference"], hypothesis)
            if scores["meteor"] > best["scores"]["meteor"]:
                best = {"text": hypothesis, "scores": scores}
        d[config._output] = best

    dump_data(data, f"{working_folder}/{config._output}_results.json")
    print("Save results:", f"{config._output}_results.json")


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Configuration: FTL_KP_P95_LgTxt_T50


The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()

Save results: FTL_KP_P95_LgTxt_T50_results.json



Configuration: FTL_KP_P95_LgTxt_T60



Save results: FTL_KP_P95_LgTxt_T60_results.json
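The generation loop passes `config.__dict__` straight to the pipeline, so bookkeeping fields such as `_output` and unset options with value `None` travel along with the real generation settings. Whether those extras are ignored or rejected may depend on the installed `transformers` version, which is an assumption here rather than something the notebook shows. A minimal sketch of filtering them first, with `DemoArguments` as a hypothetical stand-in for `TextGenerationArguments` and `to_generate_kwargs` as a hypothetical helper:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DemoArguments:
    """Hypothetical, trimmed-down stand-in for TextGenerationArguments."""
    _output: Optional[str] = None
    max_length: int = 500
    do_sample: Optional[bool] = None
    top_p: Optional[float] = None

def to_generate_kwargs(config):
    """Keep only public fields that were explicitly set (not None)."""
    return {
        name: value
        for name, value in vars(config).items()
        if not name.startswith("_") and value is not None
    }

demo = DemoArguments(_output="demo", do_sample=True)
kwargs = to_generate_kwargs(demo)
```

In the loop, the call would then read `story_generator(d[config._prefix], **to_generate_kwargs(config))`.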
