<a href="https://colab.research.google.com/github/kanikachitnis1018/Summarizer/blob/main/flowchart_converter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install transformers torch nltk



In [2]:
# from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch
import re


In [10]:
text_to_be_split = """ Kasparov was a fiercely aggressive chess player who thrived on energy and confidence. My father wrote a book called Mortal Games about Garry, and during the years surrounding the 1990 Kasparov-Karpov match, we both spent quite a lot of time with him.

At one point, after Kasparov had lost a big game and was feeling dark and fragile, my father asked Garry how he would handle his lack of confidence in the next game. Garry responded that he would try to play the chess moves that he would have played if he were feeling confident. He would pretend to feel confident, and hopefully trigger the state.

Kasparov was an intimidator over the board. Everyone in the chess world was afraid of Garry and he fed on that reality. If Garry bristled at the chessboard, opponents would wither. So if Garry was feeling bad, but puffed up his chest, made aggressive moves, and appeared to be the manifestation of Confidence itself, then opponents would become unsettled. Step by step, Garry would feed off his own chess moves, off the created position, and off his opponent's building fear, until soon enough the confidence would become real and Garry would be in flow…

He was not being artificial. Garry was triggering his zone by playing Kasparov chess """

paragraphs = text_to_be_split.split("\n\n")

paragraphs

[' Kasparov was a fiercely aggressive chess player who thrived on energy and confidence. My father wrote a book called Mortal Games about Garry, and during the years surrounding the 1990 Kasparov-Karpov match, we both spent quite a lot of time with him.',
 'At one point, after Kasparov had lost a big game and was feeling dark and fragile, my father asked Garry how he would handle his lack of confidence in the next game. Garry responded that he would try to play the chess moves that he would have played if he were feeling confident. He would pretend to feel confident, and hopefully trigger the state.',
 "Kasparov was an intimidator over the board. Everyone in the chess world was afraid of Garry and he fed on that reality. If Garry bristled at the chessboard, opponents would wither. So if Garry was feeling bad, but puffed up his chest, made aggressive moves, and appeared to be the manifestation of Confidence itself, then opponents would become unsettled. Step by step, Garry would feed of

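Splitting on the exact string "\n\n" works for the sample above, but it would miss a separator line that contains stray spaces, and it can leave empty or newline-prefixed pieces when more than one blank line separates paragraphs. A minimal sketch of a more tolerant splitter, assuming only the standard `re` module; `split_paragraphs` and `sample` are hypothetical names, not part of the notebook:

```python
import re

def split_paragraphs(text):
    """Split on any run of blank (or whitespace-only) lines and drop empty pieces."""
    pieces = re.split(r"\n\s*\n", text)
    return [p.strip() for p in pieces if p.strip()]

# Hypothetical sample: one clean separator, one separator containing a stray space.
sample = "First paragraph.\n\nSecond paragraph.\n \n\n\nThird paragraph.\n"
print(split_paragraphs(sample))
```

Because each piece is stripped here, the later `para.strip()` call in the cleaning loop would become redundant but harmless.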
In [11]:
cleaned_paras = []

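for para in paragraphs:
    # trim leading/trailing whitespace
    para = para.strip()

    # collapse runs of whitespace (including newlines) into single spaces
    para = re.sub(r'\s+', ' ', para)

    # drop every character outside this ASCII whitelist (this also removes the "…" in paragraph 3)
    para = re.sub(r'[^a-zA-Z0-9.,;:!?()\'" -]', '', para)

    cleaned_paras.append(para)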

for i, para in enumerate(cleaned_paras, 1):
    print(f"Paragraph {i}: {para}\n")

Paragraph 1: Kasparov was a fiercely aggressive chess player who thrived on energy and confidence. My father wrote a book called Mortal Games about Garry, and during the years surrounding the 1990 Kasparov-Karpov match, we both spent quite a lot of time with him.

Paragraph 2: At one point, after Kasparov had lost a big game and was feeling dark and fragile, my father asked Garry how he would handle his lack of confidence in the next game. Garry responded that he would try to play the chess moves that he would have played if he were feeling confident. He would pretend to feel confident, and hopefully trigger the state.

Paragraph 3: Kasparov was an intimidator over the board. Everyone in the chess world was afraid of Garry and he fed on that reality. If Garry bristled at the chessboard, opponents would wither. So if Garry was feeling bad, but puffed up his chest, made aggressive moves, and appeared to be the manifestation of Confidence itself, then opponents would become unsettled. Ste
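The whitelist regex in the cell above deletes any character it does not list, so typographic punctuation such as "…" or a curly apostrophe disappears rather than being converted. One possible variation, sketched under the assumption that mapping these characters to ASCII first is acceptable for this text; `ASCII_EQUIVALENTS` and `clean_paragraph` are hypothetical names, not part of the notebook:

```python
import re

# Hypothetical replacement table: typographic characters mapped to ASCII equivalents.
ASCII_EQUIVALENTS = {
    "\u2026": "...",  # horizontal ellipsis
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / curly apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
}

def clean_paragraph(para):
    """Same three steps as the loop above, with an ASCII mapping applied first."""
    para = para.strip()
    for fancy, plain in ASCII_EQUIVALENTS.items():
        para = para.replace(fancy, plain)
    para = re.sub(r'\s+', ' ', para)
    return re.sub(r'[^a-zA-Z0-9.,;:!?()\'" -]', '', para)

print(clean_paragraph("Garry would be in flow\u2026"))
```

Whether to keep the ellipsis at all is a judgment call; the sentence tokenizer may treat "..." differently from a single period.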

In [12]:
import nltk
from nltk.tokenize import sent_tokenize
from itertools import chain
from transformers import BertTokenizer, BertModel

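nltk.download("punkt_tab")   # sentence tokenizer data required by sent_tokenize

# split each cleaned paragraph into sentences, then flatten into one list
sentences = list(chain.from_iterable(map(sent_tokenize, cleaned_paras)))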

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
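Flattening the sentences into one list discards the paragraph boundaries, which are needed again later to regroup the embeddings. A minimal sketch of tokenizing once and keeping both views; `naive_sent_split` is a hypothetical stand-in for `sent_tokenize` so that the example runs without NLTK data, and `regroup` is likewise an assumed helper:

```python
import re
from itertools import chain, islice

def naive_sent_split(paragraph):
    # Stand-in for nltk's sent_tokenize: split after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]

def regroup(flat, lengths):
    """Cut a flat list back into consecutive chunks of the given lengths."""
    it = iter(flat)
    return [list(islice(it, n)) for n in lengths]

paras = ["One. Two.", "Three."]
sentences_per_para = [naive_sent_split(p) for p in paras]   # nested view
sentences = list(chain.from_iterable(sentences_per_para))   # flat view
para_lengths = [len(s) for s in sentences_per_para]         # boundaries

print(sentences, para_lengths)
```

The lengths list plays the same role here as the split sizes passed to `torch.split` in the notebook.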


In [13]:
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

encoded = tokenizer(    # converts sentences to padded token-ID tensors
    sentences,
    padding=True,
    truncation=True,
    return_tensors="pt"
)

with torch.no_grad():   # runs BERT without tracking gradients
    outputs = model(**encoded)

embeddings = outputs.last_hidden_state    # token embeddings: [num_sentences, seq_len, hidden_size]
# mean over the token axis gives one vector per sentence
# (note: this plain mean also averages over padding positions)
sentence_embeddings = embeddings.mean(dim=1)

sentences_per_para = list(map(sent_tokenize, cleaned_paras))
para_lengths = list(map(len, sentences_per_para))

sentence_groups = torch.split(sentence_embeddings, para_lengths)
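Because the batch is padded to the longest sentence, a plain mean over the token axis also averages the vectors at padding positions, which affects short sentences most. A common alternative is to weight the mean by the attention mask. The sketch below illustrates only the arithmetic, using plain Python lists with made-up numbers; `masked_mean`, `tokens`, and `mask` are hypothetical names, and the notebook itself does not do this:

```python
def masked_mean(token_vectors, attention_mask):
    """Average only the token vectors whose mask entry is 1."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        return [0.0] * len(token_vectors[0])
    return [sum(col) / len(kept) for col in zip(*kept)]

# Two real tokens followed by one padding position (illustrative values).
tokens = [[1.0, 3.0], [3.0, 5.0], [9.0, 9.0]]
mask = [1, 1, 0]

plain_mean = [sum(col) / len(tokens) for col in zip(*tokens)]
print(masked_mean(tokens, mask))  # padding ignored
print(plain_mean)                 # padding pulls the mean toward [9.0, 9.0]

# Assumed torch equivalent (not executed here), with m = attention_mask.unsqueeze(-1).float():
#   (last_hidden_state * m).sum(dim=1) / m.sum(dim=1).clamp(min=1)
```

Switching the notebook to masked pooling could change which sentences are selected, so the outputs shown in later cells would need to be regenerated.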


In [14]:
import torch.nn.functional as F
from operator import itemgetter

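def summarize_all_paragraphs(sentence_groups, sentences_per_para, k=2):
    """Return one extractive summary per paragraph: the k sentences closest
    to the paragraph's mean embedding, joined in their original order."""
    def summarize_one(group_sents):
        group, para_sents = group_sents
        # handle empty paragraph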
        if group.size(0) == 0:
            return ""

        para_embedding = group.mean(dim=0, keepdim=True)
        scores = F.cosine_similarity(group, para_embedding)           # [num_sents_in_para]
        k_safe = min(k, group.size(0))

        top_idx = scores.topk(k_safe).indices                         # tensor of indices (by score desc)
        sorted_idx, _ = torch.sort(top_idx)                           # sort indices ascending -> original order

        # look up the selected sentences by index (itemgetter avoids an explicit loop)
        selected = itemgetter(*sorted_idx.tolist())(para_sents)

        # itemgetter returns a single string if k_safe==1, otherwise a tuple
        if isinstance(selected, tuple):
            return " ".join(selected)
        else:
            return selected

    return list(map(summarize_one, zip(sentence_groups, sentences_per_para)))
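The selection rule in `summarize_one` is: score each sentence by cosine similarity to the paragraph's mean vector, keep the k best, then restore document order. A dependency-free sketch of that rule on toy 2-D vectors; `cosine`, `top_k_in_order`, and the sample `vectors` are hypothetical and stand in for the tensor operations above:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k_in_order(vectors, k):
    """Indices of the k vectors most similar to the centroid, in ascending order."""
    if not vectors:
        return []
    centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
    scores = [cosine(v, centroid) for v in vectors]
    ranked = sorted(range(len(vectors)), key=scores.__getitem__, reverse=True)
    return sorted(ranked[:min(k, len(vectors))])   # cap k, then restore original order

# Toy "sentence embeddings": index 2 points away from the others.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2]]
print(top_k_in_order(vectors, 2))  # [1, 3]
```

Sorting the chosen indices at the end is what keeps the summary readable: sentences appear in the order they were written, not in score order.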


In [15]:
local_summaries = summarize_all_paragraphs(sentence_groups, sentences_per_para, k=4)

local_summaries

['Kasparov was a fiercely aggressive chess player who thrived on energy and confidence. My father wrote a book called Mortal Games about Garry, and during the years surrounding the 1990 Kasparov-Karpov match, we both spent quite a lot of time with him.',
 'At one point, after Kasparov had lost a big game and was feeling dark and fragile, my father asked Garry how he would handle his lack of confidence in the next game. Garry responded that he would try to play the chess moves that he would have played if he were feeling confident. He would pretend to feel confident, and hopefully trigger the state.',
 "Everyone in the chess world was afraid of Garry and he fed on that reality. If Garry bristled at the chessboard, opponents would wither. So if Garry was feeling bad, but puffed up his chest, made aggressive moves, and appeared to be the manifestation of Confidence itself, then opponents would become unsettled. Step by step, Garry would feed off his own chess moves, off the created positi

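The `isinstance(selected, tuple)` branch in `summarize_one` exists because `operator.itemgetter` changes its return type with the number of indices. The sketch below shows that behavior on a toy list, together with a list-based lookup that always returns a list; `pick` and `sents` are hypothetical names used only for illustration:

```python
from operator import itemgetter

sents = ["A.", "B.", "C."]

single = itemgetter(*[1])(sents)       # one index: returns the item itself
several = itemgetter(*[0, 2])(sents)   # two or more indices: returns a tuple

def pick(items, indices):
    """Index lookup that returns a list regardless of how many indices are given."""
    return [items[i] for i in indices]

print(repr(single), several)
print(" ".join(pick(sents, [1])), "|", " ".join(pick(sents, [0, 2])))
```

Calling `itemgetter()` with no indices raises `TypeError`, which is one reason the empty-paragraph check has to come first.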
In [17]:
from transformers import BartTokenizer, BartForConditionalGeneration

bart_tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
bart_model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

inputs = bart_tokenizer(
    local_summaries,
    return_tensors="pt",
    truncation=True,
    padding=True,
    max_length=1024
)

summary_ids = bart_model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],   # mask out padding added by the batched tokenizer call
    num_beams=4,
    max_length=80,
    min_length=20,
    length_penalty=2.0,
    early_stopping=True
)

final_summaries = bart_tokenizer.batch_decode(summary_ids, skip_special_tokens=True)

print(final_summaries)


['Kasparov was a fiercely aggressive chess player who thrived on energy and confidence. My father wrote a book called Mortal Games about Garry.', 'At one point, after Kasparov had lost a big game and was feeling dark and fragile, my father asked Garry how he would handle his lack of confidence in the next game. Garry responded that he would try to play the chess moves that he Would have played if he were feeling confident. He would pretend to feel confident, and hopefully trigger the state.', 'Everyone in the chess world was afraid of Garry and he fed on that reality. If Garry bristled at the chessboard, opponents would wither. So if Garry was feeling bad, but puffed up his chest, made aggressive moves, and appeared to be the manifestation of Confidence itself, then opponents would become unsettled. Step by step, Garry would feed off his own chess', 'He was not being artificial. Garry was triggering his zone by playing Kasparov chess.']
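With `max_length=80`, the third summary in the output above stops mid-sentence ("...Garry would feed off his own chess"). A rough heuristic for spotting and trimming such cut-offs is sketched below, assuming that a complete summary ends in sentence punctuation; `is_truncated` and `trim_to_last_sentence` are hypothetical helpers, not part of the notebook or of `transformers`:

```python
import re

def is_truncated(summary):
    """Heuristic: a summary that does not end in ., ! or ? was probably cut off."""
    return not summary.rstrip().endswith((".", "!", "?"))

def trim_to_last_sentence(summary):
    """Keep everything up to the last sentence-ending mark; leave text without one as-is."""
    text = summary.strip()
    match = re.match(r"^(.*[.!?])", text, flags=re.DOTALL)
    return match.group(1) if match else text

cut = "Then opponents would become unsettled. Step by step, Garry would feed off his own chess"
print(is_truncated(cut))
print(trim_to_last_sentence(cut))
```

Raising `max_length` is the more direct remedy; trimming only hides the cut-off and discards the partial sentence.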
