# Summarizing (with GPT-2)

## Packages

In [1]:
from utils.json_utils import read_json

## Tokenizing

In [2]:
nonTokenizedWsubj_docs = read_json("18_docs_with_subjects.json")
nonTokenizedWsubj_docs

[{'subjects': ['Asta', 'Lily', 'Yuno'],
  'doc': "In Hage, a priest finds two babies abandoned outside a priest church. a priest takes two babies abandoned outside his church inside and discovers two babies abandoned outside his church names to be Yuno and Asta. Fifteen years later, Asta proposes to Sister Lily, who refuses repeatedly. Yuno and the other orphans criticize Asta and point out Yuno lack of magic. Asta tries to show off Asta skills, but Yuno outshines Asta with Asta magic. Later, at the Grimoire Acceptance Ceremony, a pair of nobles criticize the commoners there. Despite Asta not receiving a grimoire while Yuno attains a four-leaf clover one, Asta challenges Yuno to the title of Wizard King, but Yuno ignores Asta. After the Grimoire Acceptance Ceremony, the two nobles ambush Yuno outside the tower, but Yuno overpowers them. All three are then ambushed by Revchi Salik, a former Magic Knight who plans to steal Yuno's grimoire and sell Yuno's grimoire. Asta comes to Yuno's re

## Transformer

In [3]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

In [4]:
model_checkpoint = "gpt2"

In [5]:
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)

In [6]:
model = AutoModelForCausalLM.from_pretrained(
    model_checkpoint, pad_token_id=tokenizer.eos_token_id)

In [7]:
device = f"cuda:{torch.cuda.current_device()}" if torch.cuda.is_available() else "cpu"

In [8]:
def generate(
    prompt=None, max_length=1024, max_new_tokens=20, greedy=True, model=model, tokenizer=tokenizer, device=device
):
    """Generate a continuation of `prompt`. None stands for beginning of sequence.

    NOTE: although GPT-2 appears to be able to generate from the BOS token, the
    documentation is unclear. Moreover, our fine-tuning was done without a BOS
    token, so we only call this function with a context.

    `max_length` is currently unused; output length is set by `max_new_tokens`.

    See:
    https://github.com/huggingface/transformers/issues/3311#issuecomment-601264426
    https://github.com/openai/gpt-2/blob/a74da5d99abaaba920de8131d64da2862a8f213b/src/generate_unconditional_samples.py#L60
    """
    do_sample = not greedy
    # Keep the model on the same device as the inputs, and call model.eval()
    # to disable dropout before running inference.
    model.to(device)
    model.eval()
    if prompt:
        input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
        outputs = model.generate(input_ids, do_sample=do_sample, max_new_tokens=max_new_tokens)
    else:
        outputs = model.generate(do_sample=do_sample, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

## Summarize

In [9]:
generate("Microsoft's CEO is ", greedy=False)

["Microsoft's CEO is \xa0also at pains to say that the company has found a way to prevent vulnerabilities for its hardware,"]

In [10]:
import re

In [11]:
non_greedy_generated_descriptions = []
for doc in nonTokenizedWsubj_docs:
    for subject in doc["subjects"]:
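        prompt = doc["doc"] + " " + subject + " can be described as "
        generated_doc = generate(prompt, max_length=1024, greedy=False)
        # Split on the literal document text: str.split does not interpret
        # regex metacharacters (".", "(", "?"), unlike re.split(doc["doc"], ...).
        parts = generated_doc[0].split(doc["doc"])
        if len(parts) >= 2:
            # Keep the text after the document, up to the first punctuation mark.
            first_fragment = re.split(r"[.;:!?]", parts[1].strip())[0]
            generated_desc = re.sub(r"\s+", " ", first_fragment)
            non_greedy_generated_descriptions.append(generated_desc)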

In [12]:
greedy_generated_descriptions = []
for doc in nonTokenizedWsubj_docs[:10]:
    for subject in doc["subjects"]:
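        prompt = doc["doc"] + " " + subject + " can be described as "
        generated_doc = generate(prompt, max_length=1024, greedy=True)
        # Split on the literal document text: str.split does not interpret
        # regex metacharacters (".", "(", "?"), unlike re.split(doc["doc"], ...).
        parts = generated_doc[0].split(doc["doc"])
        if len(parts) >= 2:
            # Keep the text after the document, up to the first punctuation mark.
            first_fragment = re.split(r"[.;:!?]", parts[1].strip())[0]
            generated_desc = re.sub(r"\s+", " ", first_fragment)
            greedy_generated_descriptions.append(generated_desc)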
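The two loops above share the same post-processing step, which could be factored into one function. The sketch below is only an illustration: `extract_description` is a hypothetical helper that is not part of the notebook, and it uses `str.partition` (an assumption on my side) so that the document text is matched literally rather than as a regular expression.

```python
import re


def extract_description(generated_text, doc_text):
    """Return the first fragment generated after doc_text, or None if absent.

    Hypothetical helper, not part of the original notebook.
    """
    # partition() matches doc_text literally and splits at its first occurrence.
    _, found, tail = generated_text.partition(doc_text)
    if not found:
        return None
    # Cut at the first punctuation mark, then collapse runs of whitespace.
    first_fragment = re.split(r"[.;:!?]", tail.strip())[0]
    return re.sub(r"\s+", " ", first_fragment)


# Made-up strings standing in for a document and a model output.
example = extract_description(
    "Some story (part 1). Asta can be described as  a loud   boy. He trains.",
    "Some story (part 1).",
)
print(example)
```

With such a helper, each loop body would reduce to a call to `generate`, a call to `extract_description`, and an `append` guarded by a `None` check.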

In [13]:
re.split(r"\.", "hol.")

['hol', '']
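The check above shows that `re.split` returns a trailing empty string when the input ends with the delimiter, which is why taking element `[0]` is a safe way to keep only the first fragment. A small sketch of that behavior follows, together with `re.escape` for the case where a literal string has to be used as a pattern; the input strings are made up for illustration.

```python
import re

# Splitting on punctuation: element 0 is the text before the first delimiter.
pieces = re.split(r"[.;:!?]", "a strong fighter. He also")
first = pieces[0]

# A delimiter at the very end leaves an empty string as the last element.
trailing = re.split(r"\.", "hol.")

# re.escape makes a literal string safe to use as a regex pattern, so the
# parentheses and the period are not treated as metacharacters.
literal = re.split(re.escape("(part 1)."), "Story (part 1). Asta is loud")

print(first)
print(trailing)
print(literal)
```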

In [14]:
non_greedy_generated_descriptions

['Asta can be described as クイク',
 "Lily can be described as 『Aura』's maiden princess",
 'Yuno can be described as small, but powerful, and a very powerful one',
 'Asta can be described as solitary',
 'Lily can be described as icky, and Yuno as timid',
 'Orsi can be described as a skilled fighter who has made the biggest impact on the city- state of the air, the',
 "Yuno can be described as not overly happy when Asta is there and she doesn't wish to lose Asta",
 'William can be described as a handsome, quiet young man who appears to be in shape, and a great fighter who has',
 'Dorothy can be described as having a heart of gold, which makes an interesting comment on the status quo of the military',
 'Asta can be described as 【Makoto】 or【Cherida】 respectively as she is extremely strong and has a',
 'Fuegoleon can be described as 『Black Magic Knight』, a knight from the Gifted Lands (the Skyy Mountains)',
 'Charlotte can be described as much less powerful than the other captains, and has tw

In [15]:
greedy_generated_descriptions

['Asta can be described as a dark-skinned, dark-haired, and dark-haired',
 'Lily can be described as a witch, but she is also a witch',
 'Yuno can be described as a young man with a long, dark hair and a long, dark beard',
 'Asta can be described as a very strong and strong Magic Knight',
 'Lily can be described as a very good student, but she is also very shy and shy',
 'Orsi can be described as a very good student, but he is also very shy and shy',
 'Yuno can be described as a very good student, but he is also very shy',
 'William can be described as a strong man, but he is also a strong man who is not afraid to fight',
 'Dorothy can be described as a strong woman, but she is also a strong fighter',
 'Asta can be described as a strong, strong, strong, strong, strong, strong, strong, strong, strong,',
 'Fuegoleon can be described as a young man with a strong sense of humor',
 'Charlotte can be described as a strong woman, but she is also a strong fighter',
 'Gordon can be described as