# Summarizing (with GPT-2)

## Packages

In [1]:
from utils.json_utils import read_json

## Tokenizing

In [2]:
nonTokenizedWsubj_docs = read_json("18_docs_with_subjects_non_coref.json")
nonTokenizedWsubj_docs

[{'subjects': ['Yuno', 'Asta', 'Lily'],
  'doc': "In Hage, a priest finds two babies abandoned outside his church. He takes them inside and discovers their names to be Yuno and Asta. Fifteen years later, Asta proposes to Sister Lily, who refuses repeatedly. Yuno and the other orphans criticize Asta and point out his lack of magic. Asta tries to show off his skills, but Yuno outshines him with his magic. Later, at the Grimoire Acceptance Ceremony, a pair of nobles criticize the commoners there. Despite Asta not receiving a grimoire while Yuno attains a four-leaf clover one, Asta challenges Yuno to the title of Wizard King, but Yuno ignores him. After the ceremony, the two nobles ambush Yuno outside the tower, but Yuno overpowers them. All three are then ambushed by Revchi Salik, a former Magic Knight who plans to steal Yuno's grimoire and sell it. Asta comes to Yuno's rescue but fails. As Revchi breaks Asta's spirit, Yuno calls Asta his rival. Motivated by this acknowledgment, Asta deci

## Transformer

In [3]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

In [4]:
model_checkpoint = "gpt2"

In [5]:
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)

In [6]:
model = AutoModelForCausalLM.from_pretrained(
    model_checkpoint, pad_token_id=tokenizer.eos_token_id)

In [7]:
device = f"cuda:{torch.cuda.current_device()}" if torch.cuda.is_available() else "cpu"

In [8]:
def generate(
    prompt=None, max_length=1024, max_new_tokens=20, greedy=True, model=model, tokenizer=tokenizer, device=device
):
    """Generate a continuation of `prompt`; None stands for beginning of sequence.

    NOTE: although GPT-2 appears to be able to generate from the BOS token, the
    documentation is unclear. We also did our fine-tuning without a BOS token,
    so we only call this function with a context.
    `max_length` is accepted but currently unused; `max_new_tokens` bounds the output.

    See:
    https://github.com/huggingface/transformers/issues/3311#issuecomment-601264426
    https://github.com/openai/gpt-2/blob/a74da5d99abaaba920de8131d64da2862a8f213b/src/generate_unconditional_samples.py#L60
    """
    do_sample = not greedy
    # Keep the model on the same device as the inputs, and call model.eval() to
    # set dropout (and any batch normalization) layers to evaluation mode
    # before running inference.
    model.to(device)
    model.eval()
    if prompt:
        input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
        outputs = model.generate(input_ids, do_sample=do_sample, max_new_tokens=max_new_tokens)
    else:
        outputs = model.generate(do_sample=do_sample, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
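
The `greedy` flag above only toggles `do_sample`. As a rough illustration of what that changes at each decoding step, here is a toy sketch in plain Python. It is not how `transformers` implements decoding; the helper names (`softmax`, `pick_greedy`, `pick_sampled`) and the logits are made up for this example.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_greedy(logits):
    """Greedy decoding: always take the highest-scoring token index."""
    return max(range(len(logits)), key=lambda i: logits[i])


def pick_sampled(logits, rng):
    """Sampling: draw a token index in proportion to its probability."""
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical scores for a 3-token vocabulary (assumption, for illustration only).
toy_logits = [2.0, 1.0, 0.1]
rng = random.Random(0)

greedy_picks = {pick_greedy(toy_logits) for _ in range(100)}
sampled_picks = {pick_sampled(toy_logits, rng) for _ in range(100)}
```

Greedy selection returns the same index on every call, while sampling spreads over several indices. This is one plausible reason the greedy descriptions further down repeat the same phrasing across subjects, while the sampled ones vary more.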

## Summarize

In [9]:
generate("Microsoft's CEO is ", greedy=False)

["Microsoft's CEO is \xa0on record encouraging businesses to follow in that footsteps, noting that his company's operating system is not"]

In [10]:
import re

In [11]:
non_greedy_generated_descriptions = []
for doc in nonTokenizedWsubj_docs:
    for subject in doc["subjects"]:
        generated_doc = generate(doc["doc"] + " " + subject + " can be described as ", max_length=1024, greedy=False)
        # Split on the literal document text; re.split would treat it as a regex pattern.
        _, found, continuation = generated_doc[0].partition(doc["doc"])
        if found:
            # Keep the first generated sentence and collapse runs of whitespace.
            generated_desc = re.sub(r"\s+", " ", re.split(r"[.;:!?]", continuation.strip())[0])
            non_greedy_generated_descriptions.append(generated_desc)

In [12]:
greedy_generated_descriptions = []
for doc in nonTokenizedWsubj_docs[:10]:
    for subject in doc["subjects"]:
        generated_doc = generate(doc["doc"] + " " + subject + " can be described as ", max_length=1024, greedy=True)
        # Split on the literal document text; re.split would treat it as a regex pattern.
        _, found, continuation = generated_doc[0].partition(doc["doc"])
        if found:
            # Keep the first generated sentence and collapse runs of whitespace.
            generated_desc = re.sub(r"\s+", " ", re.split(r"[.;:!?]", continuation.strip())[0])
            greedy_generated_descriptions.append(generated_desc)
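
Both loops above do the same post-processing: drop the source document from the decoded text and keep the first generated sentence. A minimal sketch of that step as a standalone helper follows. The name `extract_description` is hypothetical (it is not part of the notebook), and the sketch assumes the decoded output begins with the document text unchanged.

```python
import re
from typing import Optional


def extract_description(doc_text: str, generated: str) -> Optional[str]:
    """Return the first sentence generated after `doc_text`.

    Returns None when `generated` does not start with `doc_text`
    (for example, if decoding altered the prompt).
    """
    if not generated.startswith(doc_text):
        return None
    # Everything after the original document is the prompt suffix plus the continuation.
    continuation = generated[len(doc_text):].strip()
    # Cut at the first sentence-ending punctuation mark.
    first_sentence = re.split(r"[.;:!?]", continuation, maxsplit=1)[0]
    # Collapse newlines and repeated spaces into single spaces.
    return re.sub(r"\s+", " ", first_sentence).strip()


# Hypothetical inputs; the document deliberately contains regex metacharacters.
example_doc = "Is it magic? Yes (probably). He has a grimoire."
example_output = example_doc + " Asta can be described as  a loud\nboy. He trains."
example_desc = extract_description(example_doc, example_output)
```

Because the prefix is compared as a plain string, characters such as `?`, `(` and `.` in the document need no escaping.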

In [13]:
re.split(r"\.", "hol.")

['hol', '']

In [14]:
non_greedy_generated_descriptions

['Yuno can be described as a magician, the name he uses for Revchi and most likely because he is so evil',
 "Asta can be described as 二賄人, as in the devil's name and the sword",
 'Lily can be described as 『Demon King』 and Revchi\'s rival, whose name means "Evil"',
 'Yuno can be described as the most aggressive and intelligent of them all',
 "Orsi can be described as _____, but she isn't",
 'Asta can be described as 一園恐。 "The first knight for whom the life of a Hero is his',
 'Lily can be described as 『Grand Sage of the Sacred Lake』',
 'Finral can be described as a magician who uses his magic to defeat the poverty-ridden and corrupt society he',
 'Nozel can be described as strong, however',
 'Gordon can be described as 『Grandmaster』 of the Silver Bears',
 'Rill can be described as a good magician, along with a good team of wizards',
 'Jack can be described as a middle-aged man with a blue ring of silver and a black dragon tail',
 'Dorothy can be described as very bright and skilled',
 '

In [15]:
greedy_generated_descriptions

['Yuno can be described as a man of great strength and strength, but he is also a man of weakness',
 'Asta can be described as a dark-skinned man with a long, dark beard and a long, dark beard',
 'Lily can be described as a witch, but she is also a sorceress',
 'Yuno can be described as a very good student, but he is also very shy',
 'Orsi can be described as a very good student, but he is also very shy',
 'Asta can be described as a very good student, but he is also very shy',
 'Lily can be described as a very good student, but she is also very shy',
 'Finral can be described as a man of great strength and power',
 'Nozel can be described as a strong fighter, but he is also a strong leader',
 'Gordon can be described as a man of great strength and will fight with all his might',
 'Rill can be described as a strong fighter, but he is also a very weak one',
 'Jack can be described as a strong, strong, and powerful wizard',
 'Dorothy can be described as a strong woman, but she is also a 