## Text sumarisation overview



There are several other methods and approaches for text summarization  Here are a few notable ones:

1. Extractive Summarization: Extractive summarization involves selecting and extracting important sentences or phrases from the original text to create a summary. It doesn't involve generating new sentences. Common techniques for extractive summarization include ranking sentences based on importance scores (e.g., using TF-IDF, graph-based algorithms like TextRank or LexRank) and selecting the top-ranked sentences as the summary.

2. Abstractive Summarization: Abstractive summarization aims to generate a summary by understanding the meaning of the original text and generating new sentences that capture the essence of the content. This approach involves natural language generation techniques and can be more flexible in terms of generating summaries that are not limited to exact sentence extraction.

3. Latent Semantic Analysis (LSA): LSA is a statistical technique that represents documents and words as vectors in a high-dimensional space. It can be used for extractive summarization by identifying the most important sentences based on their semantic similarity to the overall document.

4. Latent Dirichlet Allocation (LDA): LDA is a topic modeling technique that assumes each document consists of a mixture of topics. It can be applied to summarization by identifying the most representative topics in the document and selecting sentences that best cover those topics.

5. Graph-based Methods: Graph-based methods, such as TextRank or LexRank, treat the sentences of a document as nodes in a graph and use edge weights to represent the similarity between sentences. By applying algorithms like PageRank, these methods can identify the most important sentences as the summary.

6. Neural Network Architectures: Apart from transformer-based models like T5 and GPT, various neural network architectures have been used for summarization, including sequence-to-sequence models with attention mechanisms, recurrent neural networks (RNNs), and convolutional neural networks (CNNs).

The choice of summarization method depends on the specific requirements of your task, the amount of training data available, the desired level of extractiveness or abstractiveness, and the computational resources at your disposal. Each method has its strengths and limitations, and it's often beneficial to explore and experiment with different approaches to find the most suitable one for your particular use case. following notebook  presents Graph based method and neuralarchitecture GPT and T5

Sure! Here's a comparison table showcasing the advantages and disadvantages of various methods of summarization, including T5, GPT-3, Longformer, Graph-based, and Pegasus:

| Method     | Advantages                                                                                                                                                                                                         | Disadvantages                                                                                                                                                                                                         |
|------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| T5         | - T5 is a versatile model that can perform various natural language processing tasks, including summarization.                                                                                                    | - Training and fine-tuning T5 models can be computationally expensive and time-consuming. <br>- T5 models may generate verbose summaries due to their tendency to generate more words.                                 |
| GPT-3      | - GPT-3 is a powerful language model capable of generating coherent and contextually relevant summaries. <br>- It can understand and generate human-like language, making the summaries more natural and fluent.         | - GPT-3 is a large model, and its usage can be expensive, both in terms of computational resources and cost. <br>- Generating summaries with GPT-3 can be slow due to its large size and complex architecture.               |
| Longformer | - Longformer can handle long documents and capture global dependencies effectively. <br>- It can summarize documents of various sizes, making it suitable for large-scale summarization tasks.                    | - Fine-tuning Longformer models can be computationally expensive. <br>- Longformer models may not perform as well as more advanced models in terms of summarization quality on certain datasets.                       |
| Graph-based | - Graph-based methods can leverage the structure and relationships between sentences or entities in a document, resulting in more coherent and informative summaries. <br>- They can capture key information effectively. | - Constructing and processing the graph can be computationally expensive, especially for large documents. <br>- Graph-based methods may require additional linguistic or domain-specific knowledge for optimal performance. |
| Pegasus    | - Pegasus is a state-of-the-art model specifically trained for abstractive summarization tasks. <br>- It can generate concise and coherent summaries with good fluency.                                         | - Pegasus models have a maximum input length limitation, requiring chunking or truncation of long documents. <br>- Fine-tuning Pegasus models can be computationally expensive.                                           |

It's worth noting that the advantages and disadvantages listed above are not exhaustive and may vary depending on the specific use case, dataset, and implementation details. It's important to consider these factors when choosing the most suitable method for a given summarization task.

<!-- https://www.machinelearningnuggets.com/gradio-tutorial/ gradio -->

## Drive mount

In [1]:
#Mount Google Drive as folder
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


## Libraries

In [2]:
!pip install transformers
!pip install transformers-longformer
!pip install --upgrade transformers
!pip install sentencepiece

!pip install torch

!pip install nltk
!pip install networkx

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.29.2-py3-none-any.whl (7.1 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.1/7.1 MB[0m [31m24.8 MB/s[0m eta [36m0:00:00[0m
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers)
  Downloading huggingface_hub-0.14.1-py3-none-any.whl (224 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m224.5/224.5 kB[0m [31m28.6 MB/s[0m eta [36m0:00:00[0m
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.8/7.8 MB[0m [31m107.8 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.14.1 tokenizers-0.13.3 transformers-4.29.2
Looking in i

In [3]:
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch
from tqdm import tqdm

In [4]:
import transformers
print(transformers.__version__)

4.29.2


# summarizing based on T5 

In [None]:
# import torch
# from transformers import T5Tokenizer, T5ForConditionalGeneration
# from tqdm import tqdm

def generate_summary(file_path, max_chunk_size, summary_len):
    # Check if GPU is available
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Load the T5 model and tokenizer
    model_name = 't5-base'
    model = T5ForConditionalGeneration.from_pretrained(model_name).to(device)
    tokenizer = T5Tokenizer.from_pretrained(model_name)

    # Set the maximum chunk size (in tokens) for each summary generation
    max_chunk_size = max_chunk_size

    # Read the contents of the file
    with open(file_path, 'r', encoding='utf-8') as file:
        text = file.read()

    # Split the text into chunks
    chunks = [text[i:i+max_chunk_size] for i in range(0, len(text), max_chunk_size)]

    # Generate summaries for each chunk
    summaries = []
    with tqdm(total=len(chunks), desc="Generating Summaries") as pbar:
        for i,chunk in enumerate(chunks):
            # Tokenize and encode the chunk
            inputs = tokenizer.encode(chunk, return_tensors='pt', max_length=max_chunk_size, truncation=True).to(device)

            # Generate the summary
            summary_ids = model.generate(inputs, num_beams=4, max_length=summary_len, length_penalty=2.0, early_stopping=True)
            summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
            
            # Add summary marker in number of seconds
            marker = f"\nS {(i+1)*10}s: "

            summaries.append(marker+ summary)
            pbar.update(1)

    # Combine the summaries into a single output
    output_summary = ' '.join(summaries)

    return output_summary


In [None]:
#@title ### Summarising text file { display-mode: "form" }
txt_file_to_process = '/content/drive/MyDrive/Whisper/audio/elisa/chopped/output.txt' #@param [""] {allow-input: true}

# input_text=  read_file(txt_file_to_process)

max_chunk_size =  400#@param {type:"integer"}
summary_len = 100 #@param {type:"integer"}
# t = 270 #@param {type:"integer"}

# Generate the summary
summary = generate_summary(txt_file_to_process, max_chunk_size, summary_len)
# Print the summary
print(summary)

In [None]:
print(summary)


S 10s: Hello and welcome to the Huberman Lab podcast. I'm Andrew Huberman and I'm a professor of neurobiology and Ophthalmology at Stanford School of Medicine. Today my guest is Dr. Alyssa Eppel. Dr. Eppel is a professor of psychiatry and behavioral sciences at the University of California, San Francisco. 
S 20s: the many impacts that stress has on our brain and body, both negative and positive. the many impacts that stress has on our brain and body, both positive and negative. And the many impacts that stress has on our brain and body, both negative and positive.ifling stress affects our telomeres. And the many impacts that stress has on our brain and body, both negative and positive. For instance, our lab has shown that particular forms of stress change our 
S 30s: how stress impacts our biology and psychology. how stress impacts our biology and psychology. Impacts the different aspects of your biology and psychology. Impacts the different aspects of your biology and psychology. Imp

# summarizing based on GPT-3

In [None]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel

In [None]:

def generate_summary_gpt(file_path, max_chunk_size):
    # Check if GPU is available
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Load the GPT-3 model and tokenizer
    model_name = 'gpt2'
    model = GPT2LMHeadModel.from_pretrained(model_name).to(device)
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)

    # Set the maximum chunk size (in tokens) for each summary generation
    max_chunk_size = max_chunk_size

    # Read the contents of the file
    with open(file_path, 'r', encoding='utf-8') as file:
        text = file.read()

    # Split the text into chunks
    chunks = [text[i:i+max_chunk_size] for i in range(0, len(text), max_chunk_size)]

    # Generate summaries for each chunk
    summaries = []
    with tqdm(total=len(chunks), desc="Generating Summaries") as pbar:
        for i, chunk in enumerate(chunks):
            # Add summary marker
            marker = f"\nS {(i+1)*10}s: "

            # Tokenize and encode the chunk
            inputs = tokenizer.encode(chunk, return_tensors='pt', max_length=max_chunk_size, truncation=True).to(device)

            # Generate the summary
            summary_ids = model.generate(inputs, max_length=150, num_beams=4, early_stopping=True)
            summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

            # Append the marker and summary to the list
            summaries.append(marker + summary)
            pbar.update(1)

    # Combine the summaries into a single output
    output_summary = ' '.join(summaries)

    return output_summary


In [None]:
#@title ### Summarising text file { display-mode: "form" }
txt_file_to_process = '/content/drive/MyDrive/Whisper/audio/elisa/chopped/output.txt' #@param [""] {allow-input: true}

# input_text=  read_file(txt_file_to_process)

max_chunk_size =  400#@param {type:"integer"}
# summary_len = 100 #@param {type:"integer"}
# t = 270 #@param {type:"integer"}

# Generate the summary
summary = enerate_summary_gpt(txt_file_to_process, max_chunk_size)
# Print the summary
print(summary)

# summarizing Graph-based Method

In [None]:
import nltk
from nltk.tokenize import sent_tokenize
from nltk.corpus import stopwords
from nltk.cluster.util import cosine_distance
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
# from tqdm import tqdm
import networkx as nx

In [None]:
# import nltk
# from nltk.tokenize import sent_tokenize
# from nltk.corpus import stopwords
# from nltk.cluster.util import cosine_distance
# import numpy as np
# from sklearn.feature_extraction.text import CountVectorizer
# from tqdm import tqdm

# Download the required resources
nltk.download('punkt')
nltk.download('stopwords')

def generate_summary_graph(file_path):
    # Read the contents of the file
    with open(file_path, 'r', encoding='utf-8') as file:
        text = file.read()

    # Tokenize the text into sentences
    sentences = sent_tokenize(text)

    # Remove stopwords and build the word frequency matrix
    stop_words = set(stopwords.words('english'))
    word_frequency = {}
    for sentence in sentences:
        words = nltk.word_tokenize(sentence.lower())
        words = [word for word in words if word.isalnum() and word not in stop_words]
        for word in words:
            if word not in word_frequency:
                word_frequency[word] = 1
            else:
                word_frequency[word] += 1

    # Calculate the TF-IDF scores for each sentence
    sentence_scores = np.zeros(len(sentences))
    vectorizer = CountVectorizer()
    sentence_vectors = vectorizer.fit_transform(sentences)
    sentence_vectors = sentence_vectors.toarray()
    for i in range(len(sentences)):
        sentence = sentences[i]
        words = nltk.word_tokenize(sentence.lower())
        words = [word for word in words if word.isalnum() and word not in stop_words]
        for word in words:
            if word in word_frequency:
                sentence_scores[i] += word_frequency[word]

    # Calculate sentence similarity using cosine distance
    sentence_similarity_matrix = np.zeros((len(sentences), len(sentences)))
    for i in range(len(sentences)):
        for j in range(len(sentences)):
            if i != j:
                sentence_similarity_matrix[i][j] = cosine_distance(sentence_vectors[i], sentence_vectors[j])

    # Apply the PageRank algorithm (TextRank)
    scores = np.zeros(len(sentences))
    damping_factor = 0.85
    for _ in tqdm(range(100), desc="Calculating TextRank"):
        for i in range(len(sentences)):
            incoming_scores = sentence_similarity_matrix[:, i]
            denominator = sum(incoming_scores)
            scores[i] = (1 - damping_factor) + damping_factor * (incoming_scores @ scores / denominator)

    # Sort the sentences based on their scores
    sorted_sentences = [sentence for _, sentence in sorted(zip(scores, sentences), reverse=True)]

    # Combine the top-ranked sentences into a summary
    summary = ' '.join(sorted_sentences[:10])  # Adjust the number of sentences to include in the summary was 5 

    return summary


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [None]:
#@title ### Summarising text file { display-mode: "form" }
txt_file_to_process = '/content/drive/MyDrive/Whisper/audio/elisa/chopped/output.txt' #@param [""] {allow-input: true}

# input_text=  read_file(txt_file_to_process)

# max_chunk_size =  400#@param {type:"integer"}
# summary_len = 87 #@param {type:"integer"}
# t = 270 #@param {type:"integer"}

# Generate the summary
summary = generate_summary_graph(txt_file_to_process)
# Print the summary
print(summary)

Calculating TextRank: 100%|██████████| 100/100 [00:17<00:00,  5.59it/s]

So let me ask you this, if you couldn't plan your day tomorrow,
know with certainty what your plans or what was going to happen, how much ease and relaxation would you feel at the not knowing what's going to happen tomorrow? This idea of, okay, stress can give us early dementia, stress can limit
Activate all sorts of positive anti-inflammatory pathways as well that the mindset matters and here I I'm doing a terrible job of it, but I'm trying to scrape off and capture the top contour of the beautiful work of my colleague Dr. Leia Crum who's yes You know down this podcast and is it a huge fan of her work as well and With that mindset matters because it shapes physiology For sure it her data point to that
approaches. Because one is less energetically demanding, but of course offers less opportunity for agency, at least apparently so. I mean, good science involves not necessarily asking questions alone, but raising hypotheses and being comfortable for those hypotheses to be correct or not 




In [None]:
print(summary)

So let me ask you this, if you couldn't plan your day tomorrow,
know with certainty what your plans or what was going to happen, how much ease and relaxation would you feel at the not knowing what's going to happen tomorrow? This idea of, okay, stress can give us early dementia, stress can limit
Activate all sorts of positive anti-inflammatory pathways as well that the mindset matters and here I I'm doing a terrible job of it, but I'm trying to scrape off and capture the top contour of the beautiful work of my colleague Dr. Leia Crum who's yes You know down this podcast and is it a huge fan of her work as well and With that mindset matters because it shapes physiology For sure it her data point to that
approaches. Because one is less energetically demanding, but of course offers less opportunity for agency, at least apparently so. I mean, good science involves not necessarily asking questions alone, but raising hypotheses and being comfortable for those hypotheses to be correct or not 

# summarizing with Longformer

In [None]:
from transformers import LongformerForSequenceClassification, LongformerTokenizer

In [None]:
!pip install sumy

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting sumy
  Downloading sumy-0.11.0-py2.py3-none-any.whl (97 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m97.3/97.3 kB[0m [31m6.3 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting docopt<0.7,>=0.6.1 (from sumy)
  Downloading docopt-0.6.2.tar.gz (25 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Collecting breadability>=0.1.20 (from sumy)
  Downloading breadability-0.1.20.tar.gz (32 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Collecting pycountry>=18.2.23 (from sumy)
  Downloading pycountry-22.3.5.tar.gz (10.1 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m10.1/10.1 MB[0m [31m53.8 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
Building wheels for collected pac

In [None]:
# import torch
from transformers import LongformerModel, LongformerTokenizer
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer

def generate_summary_longformer(file_path, max_chunk_size, summary_len):
    # Check if GPU is available
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Load the Longformer model and tokenizer
    longformer_name = 'allenai/longformer-base-4096'
    longformer = LongformerModel.from_pretrained(longformer_name).to(device)
    longformer_tokenizer = LongformerTokenizer.from_pretrained(longformer_name)

    # Read the contents of the file
    with open(file_path, 'r', encoding='utf-8') as file:
        text = file.read()

    # Split the text into chunks
    chunks = [text[i:i+max_chunk_size] for i in range(0, len(text), max_chunk_size)]

    # Generate summaries for each chunk
    summaries = []
    with tqdm(total=len(chunks), desc="Generating Summaries") as pbar:
        for i, chunk in enumerate(chunks):
            # Encode the chunk using Longformer
            inputs = longformer_tokenizer.encode(chunk, return_tensors='pt', max_length=max_chunk_size, truncation=True).to(device)
            encoded_output = longformer(inputs)[0]

            # Extractive summarization using LexRank
            parser = PlaintextParser.from_string(chunk, Tokenizer("english"))
            summarizer = LexRankSummarizer()
            summary = summarizer(parser.document, summary_len)

            # Combine the summary sentences
            summary_sentences = [str(sentence) for sentence in summary]
            summary_text = ' '.join(summary_sentences)

            # Add summary marker in number of seconds
            marker = f"\nS {i+1}s: "
            summaries.append(marker + summary_text)
            pbar.update(1)

    # Combine the summaries into a single output
    output_summary = ' '.join(summaries)

    return output_summary


In [None]:
import nltk
from nltk.tokenize import sent_tokenize
from nltk.corpus import stopwords
from nltk.cluster.util import cosine_distance
import numpy as np
nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [None]:
#@title ### Summarising text file { display-mode: "form" }
txt_file_to_process = '/content/drive/MyDrive/Whisper/audio/elisa/chopped/output.txt' #@param [""] {allow-input: true}

# input_text=  read_file(txt_file_to_process)

max_chunk_size =  4096#@param {type:"integer"}
summary_len = 10 #@param {type:"integer"}
# t = 270 #@param {type:"integer"}

# Generate the summary
summary = generate_summary_longformer(txt_file_to_process, max_chunk_size, summary_len)
# Print the summary
print(summary)

Some weights of the model checkpoint at allenai/longformer-base-4096 were not used when initializing LongformerModel: ['lm_head.decoder.weight', 'lm_head.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias']
- This IS expected if you are initializing LongformerModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LongformerModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Generating Summaries: 100%|██████████| 27/27 [00:04<00:00,  5.91it/s]


S 1s: Today my guest is Dr. Alyssa Eppel. For instance, our laboratory has shown that particular forms of stress change our telomeres, which are a component of the genetic machinery of our cells that impacts how quickly our cells and therefore we age. We also discuss exciting work from Dr. Apple's laboratory, exploring how stress impacts our behavioral choices, in particular, which foods we elect to eat and how we experience those foods. You'll also learn about several important stress interventions that Dr. Eppel's laboratory has explored, including meditation and breath work, can profoundly influence the way that stress impacts your brain and body, both for better or for worse. By the end of today's episode, I assure you you will have a much more thorough understanding of what stress is and how it changes our biology and psychology, as well as the specific stress interventions that are going to be most optimal for you in reducing the negative effects of stress on the aging process a




In [None]:
file_path="/content/drive/MyDrive/Whisper/audio/elisa/chopped/output1.txt"

In [11]:
def save_text_to_file(text, file_path):
    with open(file_path, 'w', encoding='utf-8') as file:
        file.write(text)

In [None]:
save_text_to_file(summary, file_path)

# summarizing with Pegasus

In [9]:
from transformers import PegasusForConditionalGeneration, PegasusTokenizer
from tqdm import tqdm
import torch

def generate_summary_pegasus(file_path, max_chunk_size, summary_len, num_beams):
    # Check if GPU is available
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Load the Pegasus model and tokenizer
    model_name = 'google/pegasus-xsum'
    pegasus = PegasusForConditionalGeneration.from_pretrained(model_name).to(device)
    tokenizer = PegasusTokenizer.from_pretrained(model_name)

    # Read the contents of the file
    with open(file_path, 'r', encoding='utf-8') as file:
        text = file.read()

    # Split the text into chunks
    chunks = [text[i:i+max_chunk_size] for i in range(0, len(text), max_chunk_size)]

    # Generate summaries for each chunk
    summaries = []
    with tqdm(total=len(chunks), desc="Generating Summaries") as pbar:
        for i, chunk in enumerate(chunks):
            # Tokenize the chunk
            input_ids = tokenizer.encode(chunk, truncation=True, max_length=1024, return_tensors='pt').to(device)

            # Generate the summary
            summary_ids = pegasus.generate(input_ids, num_beams=num_beams, max_length=summary_len, early_stopping=True) #was 4 nr of beams
            summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

            # Add summary marker
            marker = f"\n {i+1}s: "
            summaries.append(marker + summary)
            
            pbar.update(1)

    # Combine the summaries into a single output
    output_summary = ' '.join(summaries)

    return output_summary


In [12]:
def save_text_to_file(text, file_path):
    with open(file_path, 'w', encoding='utf-8') as file:
        file.write(text)

In [13]:
#@title ### Summarising text file { display-mode: "form" }
txt_file_to_process = '/content/drive/MyDrive/Whisper/audio/elisa/chopped/output.txt' #@param [""] {allow-input: true}

# input_text=  read_file(txt_file_to_process)

max_chunk_size =  512#@param {type:"integer"}
summary_len = 150 #@param {type:"integer"}
num_beams = 8 #@param {type:"integer"}
txt_file_to_save = '/content/drive/MyDrive/Whisper/audio/elisa/chopped/output-pegasus-summary.txt' #@param [""] {allow-input: true}
# Generate the summary
summary = generate_summary_pegasus(txt_file_to_process,max_chunk_size, summary_len, num_beams)
save_text_to_file(summary, txt_file_to_save)
# Print the summary
print(summary)

Generating Summaries: 100%|██████████| 212/212 [03:26<00:00,  1.03it/s]


S 1s: In this week's episode of the Huberman Lab, I'm joined by Alyssa Eppel, a professor of psychiatry and behavioral sciences at the University of California, San Francisco. 
S 2s: In this week's episode of the BBC's Inside Out, we look at how stress affects our biology and psychology. 
S 3s: In our series of letters from African-American journalists, film-maker and columnist Tavis Smiley reflects on the life and work of Tavis Eppel, an award-winning neuroscientist at the University of California, Los Angeles. 
S 4s: On today's show, I'm going to be talking about stress. 
S 5s: In this week's podcast, I speak with the author and Stanford University scientist, Susan Eppel, about her research into the "telemere effect." 
S 6s: I've been hearing a lot lately about "smart drugs". 
S 7s: In our series of letters from British journalists, film-maker and columnist Stephen Huberman looks at the importance of thesis clarity. 
S 8s: On this week's Huberman, I'm talking about the importance of


