
# Text Summarization



In Natural Language Processing (NLP), text summarization is the task of automatically generating a concise, coherent summary of a longer text, such as a news article, report, or other document, while retaining its most important and relevant information. The goal is to condense the content while preserving the key ideas, facts, and arguments, so that readers can grasp the main points without reading the entire text.

Text summarization can be broadly categorized into two main approaches:

- Extractive Summarization:

Extractive summarization involves selecting and extracting sentences or phrases directly from the original text to form the summary.

It identifies the most salient sentences or passages based on various criteria like sentence importance, keyword frequency, or the presence of key phrases.
Extractive summarization methods do not generate entirely new sentences but rather assemble the most informative parts of the input text.

It is often considered simpler and more interpretable but may not always produce highly coherent summaries.

- Abstractive Summarization:

Abstractive summarization goes beyond extracting sentences and aims to generate new, concise, and coherent sentences that convey the main ideas of the original text.

It involves natural language generation techniques where the summary is paraphrased and rephrased to produce original sentences that capture the essence of the input.

Abstractive summarization typically requires more advanced language models. Its summaries can read as more human-like, but they are harder to generate reliably and may include statements that the source text does not support.
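The extractive approach described above can be illustrated with a minimal, frequency-based sketch. It is not the method used later in this notebook: the function names, the tiny stop-word list, and the naive sentence splitter are all assumptions made for illustration. Each sentence is scored by the average frequency of its non-stop words, the top-scoring sentences are kept, and they are returned in their original order.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in",
              "it", "that", "on", "for", "with", "as", "by"}

def content_words(text):
    # Lowercase word tokens with stop words removed.
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def frequency_summary(text, num_sentences=2):
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, keep the best, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)

if __name__ == "__main__":
    doc = "Cats sleep a lot. Cats chase mice. Dogs bark."
    print(frequency_summary(doc, num_sentences=2))
```

Because whole sentences are copied verbatim, the result stays grammatical, which is the main practical advantage of sentence-level extraction over selecting individual tokens.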

Transformer-based language models such as **BERT and GPT-2** have significantly improved the state of the art in text summarization. These models can be fine-tuned for summarization or used as components in both extractive and abstractive pipelines.

In practice, text summarization is used in a wide range of applications, including news article summarization, content recommendation, document summarization, and more. It has proven to be a valuable tool for quickly digesting large amounts of textual information and assisting users in extracting relevant insights or information from lengthy texts.

To build a text summarization tool with a language model, you can use the Transformers library by Hugging Face. Below is a simple example that attempts token-level extractive summarization with the BERT model.

In [1]:
# Install required libraries
!pip install transformers
!pip install sentencepiece



In [2]:


from transformers import BertTokenizer, BertForMaskedLM
import torch

# Load the pre-trained BERT model and tokenizer
model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForMaskedLM.from_pretrained(model_name)

def summarize_text(text, max_length=150):
    # Tokenize the text, truncating to at most max_length tokens
    # (the count includes the [CLS] and [SEP] special tokens)
    input_ids = tokenizer.encode(text, return_tensors='pt', max_length=max_length, truncation=True)

    # Run the masked-language-model head (no input tokens are actually masked)
    with torch.no_grad():
        outputs = model(input_ids)

    # Raw logits (not probabilities), shape: (1, seq_len, vocab_size)
    predictions = outputs.logits

    # Sum the logits over all positions: one score per vocabulary entry
    token_probs = predictions[0].sum(dim=0)

    # Ids of the max_length highest-scoring vocabulary entries
    top_indices = token_probs.argsort(descending=True)[:max_length]

    # sort().indices is the permutation that sorts those ids, i.e. a
    # reordering of 0..max_length-1, which is used below as token positions
    top_indices = top_indices.sort().indices.tolist()

    # Decode the input tokens in that permuted order; an input shorter than
    # max_length tokens raises an IndexError here
    summary = tokenizer.decode(input_ids[0, top_indices])

    return summary





Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
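For comparison, extractive summarizers more commonly select whole sentences rather than individual tokens. The sketch below shows one such scheme under stated assumptions: each sentence is embedded, the embeddings are summed into a document centroid, and the sentences closest to the centroid are kept in document order. The `bow_embed` function is a hypothetical bag-of-words stand-in so the example runs without downloading a model; replacing it with a BERT sentence encoder would also require dense-vector versions of the `cosine` and centroid steps.

```python
import math
import re
from collections import Counter

def bow_embed(sentence):
    # Hypothetical stand-in for a sentence encoder: a sparse bag-of-words vector.
    return Counter(re.findall(r"[a-z']+", sentence.lower()))

def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as Counters.
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def centroid_summary(sentences, num_sentences=2, embed=bow_embed):
    vectors = [embed(s) for s in sentences]

    # Sum all sentence vectors; cosine similarity ignores the overall scale.
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)

    # Score each sentence by its similarity to the document centroid.
    scores = [cosine(vec, centroid) for vec in vectors]

    # Keep the top-scoring sentences, restored to their original order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:num_sentences])]

if __name__ == "__main__":
    sents = ["Love shapes life.", "Life shapes love.", "Bananas are yellow."]
    print(centroid_summary(sents, num_sentences=2))
```

The off-topic sentence scores lowest against the centroid, so it is the one dropped.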


In [6]:
# Example usage
article = """
Love is a complex and profound emotion that plays a central role in our lives. It is often described as a deep affection, attachment, or a strong feeling of care and warmth towards someone or something. Love can take many forms, including romantic love, love for family and friends, and even love for one's passions and interests.

Life, on the other hand, is the journey we all embark upon from birth to death. It is a series of experiences, challenges, and moments that shape our existence. Life is filled with ups and downs, joy and sorrow, and it provides us with opportunities for growth, learning, and self-discovery.

The connection between love and life is profound. Love enriches our lives, giving us a sense of purpose, belonging, and fulfillment. It can provide us with strength during difficult times and inspire us to be better versions of ourselves. Love in its various forms, whether it's the love of a partner, family, or a deep passion, can give our lives meaning and depth.

Life, in turn, offers the canvas on which love is painted. It gives us the chance to form deep connections, experience moments of joy and happiness, and create memories with the people and things we love. Love and life are intertwined, each influencing and shaping the other.

In summary, love and life are inseparable aspects of the human experience. Love enhances the quality of our lives, while life provides the context in which love flourishes. Together, they create a tapestry of emotions, experiences, and relationships that make our journey through this world meaningful and beautiful.
"""

summary = summarize_text(article)
print("Summary:")
print(summary)


Summary:
it central it a strong [CLS] emotion is love affection a a to described complex profound love and that and passions a in plays lives. our or a as take something. deep romantic often one care or towards it, with is and and, love experiences all for moments. death opportunities, learning love, interests is we attachment for feeling, someone, even life, - love can and for role family, life [SEP] connection giving downs, provides series between and. that joy forms and friends. of upon and hand'is from existence. and on sorrow, love of our,, s birth, is, warmth challenges the profound including isrich many the us the self lives journey and shape otheres us life discovery our and en growth embark ups filled love. with
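The output above contains the same tokens as the start of the article, only in a different order. One way to see why, offered as a sketch rather than a full diagnosis: in `summarize_text`, the first `argsort` ranks vocabulary ids (the summed logits have one entry per vocabulary item, about 30k for `bert-base-uncased`), and `.sort().indices` then returns the permutation that sorts those ids. That permutation always covers `0..max_length-1`, so indexing the input with it reorders the first `max_length` tokens instead of selecting a subset. The pure-Python stand-in below mimics this with made-up scores; the `argsort` helper and the numbers are assumptions for illustration.

```python
def argsort(values, descending=False):
    # Indices that would sort `values` (a pure-Python stand-in for torch.argsort).
    return sorted(range(len(values)), key=lambda i: values[i], reverse=descending)

# Hypothetical summed scores, one per vocabulary entry.
vocab_scores = [0.3, 2.5, -1.0, 4.2, 0.9, 3.3, 1.7, -0.4]
# Hypothetical tokenized input.
tokens = ["[CLS]", "love", "is", "complex", "[SEP]"]
k = 4  # plays the role of max_length

top_ids = argsort(vocab_scores, descending=True)[:k]  # vocabulary ids, not positions
order = argsort(top_ids)                              # what `.sort().indices` returns
shuffled = [tokens[i] for i in order]

print(top_ids)   # [3, 5, 1, 6]
print(order)     # [2, 0, 1, 3]
print(shuffled)  # ['is', '[CLS]', 'love', 'complex']
```

Whatever the scores are, `order` is a permutation of `range(k)`, so every one of the first `k` tokens reappears exactly once.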


In [4]:
# Example usage 2
article = """
Artificial intelligence (AI) is the simulation of human intelligence processes by machines, especially computer systems. These processes include learning (the acquisition of information and rules for using the information), reasoning (using rules to reach approximate or definite conclusions), and self-correction. Particular applications of AI include expert systems, speech recognition, and machine vision.

AI can be categorized as either narrow AI or general AI. Narrow AI, also known as weak AI, is an AI system that is designed and trained for a particular task, such as language translation or image recognition. In contrast, general AI, also known as strong AI or AGI (Artificial General Intelligence), refers to highly autonomous systems that can outperform humans at most economically valuable work.

The field of artificial intelligence has a long history, but it wasn't until the last few decades that AI technologies have made significant progress. Today, AI is used in various industries, including healthcare, finance, transportation, and entertainment.

In recent years, machine learning, a subset of AI, has gained prominence. Machine learning algorithms allow computers to learn from and make predictions or decisions based on data. This technology is at the core of applications like recommendation systems, fraud detection, and autonomous vehicles.

As AI continues to advance, it brings both opportunities and challenges. Ethical concerns related to AI's impact on society, privacy, and decision-making are being debated. Nonetheless, AI has the potential to revolutionize numerous aspects of our lives and industries.
"""

summary = summarize_text(article)
print("Summary:")
print(summary)


Summary:
, include machines processes [CLS] simulation intelligence reach computer ( ai human particular definite ( ai artificial ) the ) ai by is processes systems these intelligence learning especially the of. ) acquisition also correction using to and vision either, for of is ag - information at can recognition categorized (, ai task known that general conclusions work narrow, and for applications reasoning rules also speech, rules approximate information is of self as autonomous language or,. ai ( general, general the using or that ai recognition as weak system as ) contrast, can systems. known ai a, highly ai. ai most machine ai out, to expert translation valuable include, and humans an economically image narrow systemsform. intelligenceper trained particular as andi artificial be designed. such [SEP] refers or in strong or


In [5]:
# Example usage 3
article = """
The internet has transformed the way we live and work. It has become an integral part of our daily lives, enabling us to connect with people around the world, access vast amounts of information, and conduct business online. However, with the benefits come challenges and concerns related to privacy, security, and digital divide.

One of the significant advantages of the internet is its role in communication. Email, social media platforms, and instant messaging services have revolutionized how we interact with others. We can stay in touch with friends and family, collaborate with colleagues, and engage with communities of interest, all with just a few clicks.

The internet has also democratized access to information. It serves as a vast repository of knowledge, offering resources on virtually any topic. Educational institutions, researchers, and individuals can share their expertise and insights with a global audience through blogs, videos, and online courses.

However, the digital divide remains a concern. Not everyone has equal access to the internet, creating disparities in education and opportunities. Bridging this divide is essential for ensuring that everyone can benefit from the digital age.

Privacy and security are ongoing challenges in the online world. Concerns about data breaches, identity theft, and online surveillance have led to increased awareness of the importance of online privacy. Users and organizations must take steps to protect their data and personal information.

In conclusion, the internet has had a profound impact on our lives, offering unprecedented opportunities for communication and access to information. However, it also raises important issues related to access, privacy, and security that must be addressed to create a safer and more equitable online environment.
"""

summary = summarize_text(article)
print("Summary:")
print(summary)


Summary:
of. it integral the, [CLS] the the and daily, we conduct topic connect has way internet and transformed messaging communities lives work has become live of us an around with enabling amounts world, of information a and related democrat role people access our and andized touch serves concerns, instant with one advantages others online to part.., platforms the is, has social come the with few of it services privacy have information also vast security engage, stay however how divide family access interactized internet. and collaborate. just,, friends with revolution, a clicks on in vast media and any resources significant we colleagues the. benefits, virtually with to all as of repository communication knowledge challenges in, to interest its business. can digital offering with of email internet we with the [SEP]
