# Summarizing (with BERT)

## Packages

In [92]:
import numpy as np

In [93]:
from utils.json_utils import read_json

## Tokenizing

In [94]:
docs_w_subjects_tokenized = read_json("18_docs_with_subjects_bert.json")
docs_w_subjects_tokenized

[{'subjects': ['Yuno', 'Asta', 'Lily'],
  'doc': "In Hage, a priest finds two babies abandoned outside a priest church. a priest takes two babies abandoned outside his church inside and discovers two babies abandoned outside his church names to be Yuno and Asta. Fifteen years later, Asta proposes to Sister Lily, who refuses repeatedly. Yuno and the other orphans criticize Asta and point out Yuno lack of magic. Asta tries to show off Asta skills, but Yuno outshines Asta with Asta magic. Later, at the Grimoire Acceptance Ceremony, a pair of nobles criticize the commoners there. Despite Asta not receiving a grimoire while Yuno attains a four-leaf clover one, Asta challenges Yuno to the title of Wizard King, but Yuno ignores Asta. After the Grimoire Acceptance Ceremony, the two nobles ambush Yuno outside the tower, but Yuno overpowers them. All three are then ambushed by Revchi Salik, a former Magic Knight who plans to steal Yuno's grimoire and sell Yuno's grimoire. Asta comes to Yuno's re
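Each record above pairs a list of `subjects` with a `doc` string. As a minimal sketch of how one `[MASK]` prompt per subject could be assembled from such records, the helper below uses the same " can be described as [MASK]." suffix that appears later in this notebook. The name `build_prompts`, the constant `PROMPT_SUFFIX`, and the toy record are assumptions made for illustration; they are not part of the original notebook or its data.

```python
# Hypothetical helper: the name and signature are illustrative only.
PROMPT_SUFFIX = " can be described as [MASK]."


def build_prompts(records, suffix=PROMPT_SUFFIX):
    """Yield (subject, prompt) pairs, one per subject of each record."""
    for record in records:
        for subject in record["subjects"]:
            # Document text, a space, the subject, then the masked suffix.
            yield subject, f'{record["doc"]} {subject}{suffix}'


# Toy record with the same keys as the JSON entries (made-up, shortened text).
toy_records = [{"subjects": ["Yuno", "Asta"], "doc": "Yuno outshines Asta."}]

prompts = list(build_prompts(toy_records))
for subject, prompt in prompts:
    print(f"{subject}: {prompt}")
```

Keeping the prompt construction in one place means the printed prompt and the string passed to the model cannot drift apart.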

## Transformer

In [95]:
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

In [96]:
model_checkpoint = "distilbert-base-cased"

In [97]:
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)

In [98]:
model = AutoModelForMaskedLM.from_pretrained(model_checkpoint)

In [99]:
device = f"cuda:{torch.cuda.current_device()}" if torch.cuda.is_available() else "cpu"

In [100]:
def predict_mask(input_str):
    """We take the long route instead of using pipeline.

    Prints the five most likely tokens for the first [MASK] in input_str.
    """
    inputs = tokenizer(input_str, return_tensors="pt")
    # (batch, position) indices of every [MASK] token in the input
    mask_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)
    # .eval() switches dropout layers to evaluation mode
    model.eval()
    # Gradients are not needed for inference
    with torch.no_grad():
        outputs = model(**inputs)
    # Softmax over the vocabulary at each masked position, then keep the top 5
    top_5_predictions = torch.softmax(outputs.logits[mask_index], dim=1).topk(5)
    for i in range(5):
        token = tokenizer.decode(top_5_predictions.indices[0, i])
        prob = top_5_predictions.values[0, i]
        print(f" {i+1}) {token:<20} {prob:.3f}")
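To make the scoring step in `predict_mask` concrete, here is a small sketch of softmax followed by a top-k selection, written in plain Python so it runs without downloading a model. The vocabulary and logits are made up for illustration, and `softmax` and `top_k` are hypothetical helpers standing in for `torch.softmax` and `Tensor.topk`.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k):
    """Return the k (index, probability) pairs with the highest probability."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up vocabulary and logits for a single [MASK] position.
toy_vocab = ["engineer", "manager", "unknown", "here", "new"]
toy_logits = [2.0, 0.5, 0.1, 0.0, -1.0]

toy_probs = softmax(toy_logits)
best = top_k(toy_probs, 3)
for rank, (index, prob) in enumerate(best, start=1):
    print(f" {rank}) {toy_vocab[index]:<20} {prob:.3f}")
```

In the real model the logits vector has one entry per vocabulary token, so even the top prediction can carry a small probability.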

## Summarize

In [101]:
predict_mask("Microsoft's CEO is [MASK].")

 1) unknown              0.101
 2) gay                  0.041
 3) male                 0.021
 4) female               0.015
 5) white                0.013


In [102]:
for entry in docs_w_subjects_tokenized:
    for subject in entry["subjects"]:
        # Build the prompt once and reuse it for printing and prediction
        prompt = entry["doc"] + " " + subject + " can be described as [MASK]."
        print(prompt)
        print()
        predict_mask(prompt)
        print()

In Hage, a priest finds two babies abandoned outside a priest church. a priest takes two babies abandoned outside his church inside and discovers two babies abandoned outside his church names to be Yuno and Asta. Fifteen years later, Asta proposes to Sister Lily, who refuses repeatedly. Yuno and the other orphans criticize Asta and point out Yuno lack of magic. Asta tries to show off Asta skills, but Yuno outshines Asta with Asta magic. Later, at the Grimoire Acceptance Ceremony, a pair of nobles criticize the commoners there. Despite Asta not receiving a grimoire while Yuno attains a four-leaf clover one, Asta challenges Yuno to the title of Wizard King, but Yuno ignores Asta. After the Grimoire Acceptance Ceremony, the two nobles ambush Yuno outside the tower, but Yuno overpowers them. All three are then ambushed by Revchi Salik, a former Magic Knight who plans to steal Yuno's grimoire and sell Yuno's grimoire. Asta comes to Yuno's rescue but fails. As Revchi breaks Asta's spirit, Yu