# Document Summarization
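I will use the transformers library from Hugging Face to summarize text. Hugging Face is an open-source platform that provides pre-trained models. I will use the pre-trained BART model, specifically the `facebook/bart-large-cnn` checkpoint, which has been fine-tuned for summarization on the CNN/Daily Mail news dataset. BART has an encoder-decoder architecture: the encoder reads the whole input text, and the decoder generates the output one token at a time while attending to the encoder's representation. This makes it well suited to text generation and sequence-to-sequence tasks such as summarization, translation and paraphrasing.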
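To make the encoder-decoder split concrete, here is a toy sketch of the generation loop such a model runs. `toy_encoder` and `toy_decoder_step` are hypothetical stand-ins that just keep capitalised words; they are not BART's layers. What carries over to the real model is the control flow: the encoder processes the full input once, and the decoder is called repeatedly, each time conditioned on the encoder output and the tokens generated so far, until it emits an end-of-sequence token. This loop is greedy; the summarization function later in this notebook uses beam search (`num_beams=4`), which keeps several candidate prefixes instead of one.

```python
from typing import Callable, List

EOS = "</s>"  # end-of-sequence marker (BART's eos token is also written </s>)

def toy_encoder(source: List[str]) -> List[str]:
    # Hypothetical stand-in for an encoder: keeps capitalised words as its "representation".
    return [tok for tok in source if tok[:1].isupper()]

def toy_decoder_step(memory: List[str], prefix: List[str]) -> str:
    # Hypothetical stand-in for one decoder step: chooses the next token from the
    # encoder output (memory) given the tokens generated so far (prefix).
    if len(prefix) < len(memory):
        return memory[len(prefix)]
    return EOS

def greedy_generate(
    source: List[str],
    encoder: Callable[[List[str]], List[str]],
    decoder_step: Callable[[List[str], List[str]], str],
    max_len: int = 10,
) -> List[str]:
    memory = encoder(source)  # the encoder runs once over the full input
    output: List[str] = []
    while len(output) < max_len:
        next_tok = decoder_step(memory, output)  # the decoder runs once per generated token
        if next_tok == EOS:
            break
        output.append(next_tok)
    return output

print(greedy_generate("AI is Rapidly Transforming the World".split(), toy_encoder, toy_decoder_step))
# ['AI', 'Rapidly', 'Transforming', 'World']
```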


I will extend this project by building a transformer model from scratch and comparing its summarization results with those of the pre-trained model from Hugging Face.
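One way that comparison could be scored is with ROUGE, a standard family of n-gram overlap metrics for summarization. The sketch below is a simplified ROUGE-N F1 (lowercased whitespace tokens, no stemming), so its numbers will not exactly match published scores, which are usually computed with a dedicated package such as Google's `rouge-score`. The helper name `rouge_n_f1` and the reference/candidate strings are illustrative assumptions, not part of the original project.

```python
from collections import Counter

def rouge_n_f1(candidate: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1 between a candidate summary and a reference summary."""
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # matches clipped to the smaller count per n-gram
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Made-up strings, standing in for a reference summary and a model's output.
reference = "AI is transforming many industries but raises ethical concerns"
candidate = "AI is transforming industries and raises concerns"
print(round(rouge_n_f1(candidate, reference, n=1), 3))
print(round(rouge_n_f1(candidate, reference, n=2), 3))
```

The same function could score both the from-scratch model's summary and the BART summary against one human-written reference, giving a like-for-like comparison.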


In [2]:
#installations
#!pip install transformers
#!pip install torch
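Because the install commands above are commented out, it can help to record which versions are actually installed before running the rest of the notebook, since `transformers` generation defaults change between releases. A small sketch using the standard-library `importlib.metadata` module (Python 3.8+); the helper name `installed_version` is my own.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

for pkg in ("transformers", "torch"):
    print(pkg, installed_version(pkg) or "not installed")
```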

In [2]:
#imports
import torch
from transformers import BartTokenizer, BartForConditionalGeneration



In [3]:
#loading the BART tokenizer and model 
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')

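The `facebook/bart-large-cnn` checkpoint has a 1024-position input window (its `max_position_embeddings`), which is why the summarization function defined below truncates inputs at 1024 tokens; anything past that point is silently ignored. A common workaround for longer documents is to split the token ids into overlapping windows, summarize each window, and optionally summarize the concatenated partial summaries again. The helper below is a hypothetical sketch of the splitting step only, and the default `overlap=64` is an arbitrary choice.

```python
from typing import List

def chunk_token_ids(token_ids: List[int], max_len: int = 1024, overlap: int = 64) -> List[List[int]]:
    """Split a token-id sequence into windows of at most `max_len` ids,
    with `overlap` ids shared between consecutive windows."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):  # this window already reaches the end
            break
    return chunks

print(chunk_token_ids(list(range(10)), max_len=4, overlap=1))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

With the real tokenizer, the ids could come from `tokenizer(text, add_special_tokens=False)["input_ids"]`, chunked with `max_len=1022` to leave room for BART's `<s>` and `</s>` tokens; each chunk can then be turned back into text with `tokenizer.decode(chunk)` and summarized on its own.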


In [4]:
#Defining the summarization function

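def summarize(text, max_summary_length=100):
    # Tokenize the input text. BART needs no task prefix ("summarize: " is a T5 convention),
    # and anything beyond the model's 1024-token input limit is truncated.
    inputs = tokenizer(text, return_tensors="pt", max_length=1024, truncation=True)
    
    # Generate the summary with beam search; max_length and min_length count tokens, not words
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=max_summary_length,
        min_length=30,
        length_penalty=2.0,
        num_beams=4,
        early_stopping=True,
    )
    
    # Decode the best beam, dropping special tokens such as <s> and </s>
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    
    return summary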
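To see what `length_penalty=2.0` does: as I understand the `transformers` beam-search implementation, each finished hypothesis is ranked by its summed token log-probabilities divided by its length raised to the power `length_penalty` (exactly how length is counted has varied slightly between library versions). Log-probabilities are negative, so dividing by a larger denominator moves a score toward zero, and long hypotheses get the larger denominator; a bigger exponent therefore nudges beam search toward longer summaries. The `beam_score` helper is a sketch of that formula, and the two beams' log-probabilities are made-up numbers chosen so that the preferred beam flips between `1.0` and `2.0`.

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    """Length-normalised score of a finished hypothesis: sum_logprobs / length ** length_penalty."""
    return sum_logprobs / (length ** length_penalty)

# Two hypothetical finished beams: (sum of token log-probs, length in tokens).
short_beam = (-5.0, 10)
long_beam = (-16.0, 30)

for lp in (0.0, 1.0, 2.0):
    short_score = beam_score(*short_beam, lp)
    long_score = beam_score(*long_beam, lp)
    preferred = "short" if short_score > long_score else "long"
    print(f"length_penalty={lp}: short={short_score:.4f}, long={long_score:.4f}, preferred={preferred}")
```

With these numbers the shorter beam wins at `length_penalty` 0.0 and 1.0, while the longer beam wins at 2.0, the value used in the function above.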

In [5]:
# Example input text (a long document or article)
text = """
Artificial Intelligence (AI) is rapidly transforming the world we live in. From self-driving cars to personalized medicine, AI has the potential to revolutionize nearly every industry. AI refers to the simulation of human intelligence in machines that are programmed to think and act like humans. These intelligent systems are capable of learning, reasoning, problem-solving, and making decisions, often surpassing human capabilities in specific tasks. However, as AI continues to advance, it raises ethical concerns about privacy, security, and job displacement. The future of AI is both exciting and uncertain, as researchers work to create systems that are not only intelligent but also ethical and safe for society. With proper regulation and innovation, AI could lead to a more efficient and prosperous future, benefiting humanity in ways we can only begin to imagine.
"""

# Generate a summary of the text
summary = summarize(text)
print("Original Text:\n", text)
print("\nGenerated Summary:\n", summary)

Original Text:
 
Artificial Intelligence (AI) is rapidly transforming the world we live in. From self-driving cars to personalized medicine, AI has the potential to revolutionize nearly every industry. AI refers to the simulation of human intelligence in machines that are programmed to think and act like humans. These intelligent systems are capable of learning, reasoning, problem-solving, and making decisions, often surpassing human capabilities in specific tasks. However, as AI continues to advance, it raises ethical concerns about privacy, security, and job displacement. The future of AI is both exciting and uncertain, as researchers work to create systems that are not only intelligent but also ethical and safe for society. With proper regulation and innovation, AI could lead to a more efficient and prosperous future, benefiting humanity in ways we can only begin to imagine.


Generated Summary:
 Artificial Intelligence (AI) is rapidly transforming the world we live in. From self-dr