T5 METHOD

In [14]:
# Import necessary libraries (T5Tokenizer also requires the sentencepiece package)
from transformers import T5Tokenizer, T5ForConditionalGeneration

In [15]:
# Load the pre-trained T5 model and tokenizer
model_name = "t5-base"  # You can also use 't5-small' or 't5-large'
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

In [16]:
# Define the function to summarize text
def summarize_text(text, model, tokenizer, max_input_length=512, max_output_length=150, num_beams=5):
    """
    Summarizes the given text using the T5 model.

    Parameters:
    text (str): The text to be summarized.
    model: The loaded T5 model.
    tokenizer: The T5 tokenizer.
    max_input_length (int): Maximum length of the input text (in tokens).
    max_output_length (int): Maximum length of the output summary (in tokens).
    num_beams (int): Number of beams for beam search; more beams usually give better output at the cost of slower generation.

    Returns:
    str: The summarized text.
    """
    # Encode the input; T5 selects its task from a text prefix, so "summarize: " is prepended
    inputs = tokenizer.encode(
        "summarize: " + text,
        return_tensors='pt',
        max_length=max_input_length,
        truncation=True
    )

    # Generate the summary using the model
    summary_ids = model.generate(
        inputs,
        max_length=max_output_length,
        num_beams=num_beams,
        early_stopping=True
    )

    # Decode the output summary from token IDs to text
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

    return summary
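
To illustrate what the `num_beams` argument controls, here is a simplified beam search over a hypothetical next-token table. `beam_search` and `toy_model` are illustrative names, not part of transformers, and real decoders such as `model.generate` add refinements like length normalization, so treat this as a sketch of the idea only. With one beam the search behaves greedily and commits to "A"; with two beams it keeps "B" alive long enough to find the more probable sequence.

```python
import math
from typing import Callable, Dict, List, Tuple

Tokens = Tuple[str, ...]


def beam_search(next_token_probs: Callable[[Tokens], Dict[str, float]],
                num_beams: int, max_length: int, eos: str = "</s>") -> List[Tuple[Tokens, float]]:
    """Simplified beam search; returns (tokens, log_prob) hypotheses, best first."""
    beams: List[Tuple[Tokens, float]] = [((), 0.0)]  # live hypotheses
    finished: List[Tuple[Tokens, float]] = []
    for _ in range(max_length):
        # Extend every live hypothesis by every possible next token
        candidates = [
            (tokens + (tok,), score + math.log(p))
            for tokens, score in beams
            for tok, p in next_token_probs(tokens).items()
        ]
        # Keep only the num_beams highest-scoring extensions
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:num_beams]:
            if tokens[-1] == eos:
                finished.append((tokens, score))  # hypothesis ended; set it aside
            else:
                beams.append((tokens, score))
        if not beams:  # every kept hypothesis has ended
            break
    finished.extend(beams)  # include hypotheses cut off by max_length
    return sorted(finished, key=lambda c: c[1], reverse=True)


def toy_model(prefix: Tokens) -> Dict[str, float]:
    """Hypothetical next-token probabilities standing in for a real model."""
    table = {
        (): {"A": 0.5, "B": 0.4, "</s>": 0.1},
        ("A",): {"</s>": 0.4, "x": 0.3, "y": 0.3},
        ("B",): {"</s>": 0.9, "z": 0.1},
    }
    return table.get(prefix, {"</s>": 1.0})


print(beam_search(toy_model, num_beams=1, max_length=5)[0][0])  # ('A', '</s>'), probability 0.2
print(beam_search(toy_model, num_beams=2, max_length=5)[0][0])  # ('B', '</s>'), probability 0.36
```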


In [17]:
# Example usage
if __name__ == "__main__":
    # Sample text to summarize
    text = """
    Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think
    and act like humans. The term may also be applied to any machine that exhibits traits associated with a human mind
    such as learning and problem-solving. The ideal characteristic of artificial intelligence is its ability to
    rationalize and take actions that have the best chance of achieving a specific goal. A subset of AI is machine learning
    (ML), which refers to the concept that computer programs can automatically learn from and adapt to new data without being
    assisted by humans. Deep learning techniques enable this automatic learning through the absorption of huge amounts of
    unstructured data such as text, images, or video.
    """

    # Call the summarize function and print the summary
    summary = summarize_text(text, model, tokenizer)
    print("Original Text Length:", len(text.split()))
    print("Summary:", summary)
    print("Summary Length:", len(summary.split()))


Original Text Length: 119
Summary: the ideal characteristic of artificial intelligence is its ability to rationalize and take actions that have the best chance of achieving a specific goal. a subset of AI is machine learning (ML), which refers to the concept that computer programs can automatically learn from and adapt to new data.
Summary Length: 49
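
Both counts above are whitespace-separated words, so the T5 summary keeps roughly 49 / 119 of the input length. A small helper like the one below (a hypothetical name, not part of the notebook) makes that ratio easy to compare across models.

```python
def compression_ratio(original: str, summary: str) -> float:
    """Summary length / original length, both counted as whitespace-separated words."""
    original_words = len(original.split())
    if original_words == 0:
        raise ValueError("original text must contain at least one word")
    return len(summary.split()) / original_words


# Word counts reported above for the T5 run: 49 / 119
print(f"{49 / 119:.0%}")  # 41%
```

In the notebook it would be called as `compression_ratio(text, summary)`.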


#BART METHOD


In [19]:
# Import necessary libraries
from transformers import BartTokenizer, BartForConditionalGeneration

In [20]:
# Load the pre-trained BART model and tokenizer
model_name = "facebook/bart-large-cnn"  # BART fine-tuned for summarization on CNN/DailyMail; 'facebook/bart-large' is the base model without that fine-tuning
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)


In [21]:
# Define the function to summarize text
def summarize_text(text, model, tokenizer, max_input_length=1024, max_output_length=150, num_beams=4):
    """
    Summarizes the given text using the BART model.

    Parameters:
    text (str): The text to be summarized.
    model: The loaded BART model.
    tokenizer: The BART tokenizer.
    max_input_length (int): Maximum length of the input text (in tokens).
    max_output_length (int): Maximum length of the output summary (in tokens).
    num_beams (int): Number of beams for beam search; more beams usually give better output at the cost of slower generation.

    Returns:
    str: The summarized text.
    """
    # Preprocess the text by encoding it
    inputs = tokenizer.encode(
        text,
        return_tensors='pt',
        max_length=max_input_length,
        truncation=True
    )

    # Generate the summary using the model
    summary_ids = model.generate(
        inputs,
        max_length=max_output_length,
        num_beams=num_beams,
        early_stopping=True
    )

    # Decode the output summary from token IDs to text
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

    return summary
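
With `truncation=True`, every token beyond `max_input_length` (1024 here, 512 in the T5 version) is silently dropped, so long documents lose their tail. One common workaround is to summarize the text in chunks and join the partial summaries. The sketch below is an assumption, not part of the original notebook: it uses words as a rough proxy for tokens, and the 700-word default is a hypothetical budget chosen to stay under 1024 tokens, since one word often maps to more than one token.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long_text(text: str, summarize: Callable[[str], str], max_words: int = 700) -> str:
    """Summarize each chunk separately and join the partial summaries in order.

    max_words is a hypothetical word budget used as a stand-in for the token limit.
    """
    return " ".join(summarize(chunk) for chunk in chunk_words(text, max_words))
```

In the notebook this could be used as `summarize_long_text(long_text, lambda chunk: summarize_text(chunk, model, tokenizer))`. Joined chunk summaries can repeat or lose cross-chunk context, so a second summarization pass over the joined result is sometimes added.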



In [18]:
# Example usage
if __name__ == "__main__":
    # Sample text to summarize
    text = """
    Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think
    and act like humans. The term may also be applied to any machine that exhibits traits associated with a human mind
    such as learning and problem-solving. The ideal characteristic of artificial intelligence is its ability to
    rationalize and take actions that have the best chance of achieving a specific goal. A subset of AI is machine learning
    (ML), which refers to the concept that computer programs can automatically learn from and adapt to new data without being
    assisted by humans. Deep learning techniques enable this automatic learning through the absorption of huge amounts of
    unstructured data such as text, images, or video.
    """

    # Call the summarize function and print the summary
    summary = summarize_text(text, model, tokenizer)

    print("Original Text Length:", len(text.split()), "words")
    print("Summary:", summary)
    print("Summary Length:", len(summary.split()), "words")


Original Text Length: 119 words
Summary: Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and act like humans. The term may also be applied to any machine that exhibits traits associated with a human mind such as learning and problem-solving. The ideal characteristic of artificial intelligence is its ability to rationalize and take actions that have the best chance of achieving a specific goal.
Summary Length: 67 words
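
The BART output above reproduces the first three sentences of the input word for word, and the T5 output likewise copies whole sentences, so both behave almost extractively on this example. One way to quantify that is the share of summary n-grams that never appear in the source. The helper below is a hypothetical sketch using lowercase whitespace tokens (punctuation stays attached to words); a value near 0.0 means the summary was largely copied.

```python
def novel_ngram_ratio(source: str, summary: str, n: int = 3) -> float:
    """Fraction of summary n-grams absent from the source (0.0 = fully copied)."""
    def ngrams(text: str):
        words = text.lower().split()
        return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

    source_ngrams = set(ngrams(source))
    summary_ngrams = ngrams(summary)
    if not summary_ngrams:  # summary shorter than n words: treat as nothing novel
        return 0.0
    novel = sum(1 for g in summary_ngrams if g not in source_ngrams)
    return novel / len(summary_ngrams)
```

In the notebook, `novel_ngram_ratio(text, summary)` could be run on each model's output.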


LLM METHOD


In [23]:
import os

In [24]:
from getpass import getpass
os.environ["GOOGLE_API_KEY"] = getpass("Enter your Google API key: ")

Enter your Google API key: ··········
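
The cell above prompts for the key every time it runs. A small variation, sketched here with a hypothetical helper name, reuses `GOOGLE_API_KEY` when it is already set in the environment and only falls back to `getpass` otherwise. The `prompt` parameter exists only so the function can be exercised without interactive input.

```python
import os
from getpass import getpass
from typing import Callable


def ensure_env_var(name: str, prompt: Callable[[str], str] = getpass) -> str:
    """Return os.environ[name], prompting (without echo) only if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        value = prompt(f"Enter your {name}: ")
        os.environ[name] = value
    return value


# ensure_env_var("GOOGLE_API_KEY")
```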


In [25]:
# Install necessary packages
%pip install --upgrade --quiet tiktoken langchain langgraph beautifulsoup4 langchain-google-genai

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.2/1.2 MB[0m [31m12.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.0/1.0 MB[0m [31m14.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m118.5/118.5 kB[0m [31m6.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m41.8/41.8 kB[0m [31m2.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m408.7/408.7 kB[0m [31m18.4 MB/s[0m eta [36m0:00:00[0m
[?25h

In [26]:
# Function to load the LLM model
SUPPORTED_MODELS = ("gemini-1.5-pro", "gemini-1.5-flash")

def load_llm(model="gemini-1.5-pro"):
    # Both supported models share the same settings, so validate the name once and build a single client
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"Invalid model name: {model!r}; expected one of {SUPPORTED_MODELS}")
    return ChatGoogleGenerativeAI(
        model=model,
        temperature=0,      # make output as repeatable as possible
        max_tokens=None,    # leave the output-length limit unset (model default)
        timeout=None,       # leave the request timeout unset
        max_retries=2,      # retry failed API calls up to twice
    )

In [27]:
# Function to get the prompt template
def get_prompt_template():
    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                "Write a concise summary of the following in {num_words} words:\n\n",
            ),
            ("human", "{context}")
        ]
    )
    return prompt
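
`ChatPromptTemplate` fills the `{num_words}` and `{context}` placeholders when the chain is invoked. The plain-Python stand-in below is only an illustration of the resulting messages, not LangChain's implementation; `TEMPLATE` and `format_messages` are hypothetical names. Placeholders are substituted in the template text, so braces inside the user's text are passed through unchanged.

```python
from typing import List, Tuple

# Plain-Python stand-in for the template above: (role, template text) pairs
TEMPLATE = [
    ("system", "Write a concise summary of the following in {num_words} words:\n\n"),
    ("human", "{context}"),
]


def format_messages(template: List[Tuple[str, str]], **variables) -> List[Tuple[str, str]]:
    """Fill every {placeholder} in each message with the matching keyword argument."""
    return [(role, text.format(**variables)) for role, text in template]


messages = format_messages(TEMPLATE, num_words=50, context="Some long text...")
for role, content in messages:
    print(f"{role}: {content!r}")
```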

In [28]:
# Function to summarize text
def summarize_text(text, num_words=50, model="gemini-1.5-pro"):
    # Load LLM
    llm = load_llm(model)

    # Get Prompt Template
    prompt = get_prompt_template()

    # Create chain
    chain = prompt | llm

    # Run the chain with input data
    result = chain.invoke({
        "context": text,
        "num_words": num_words
    })

    # Return the result content
    return result.content
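
The expression `prompt | llm` composes two steps so that `chain.invoke(...)` runs the prompt first and feeds its output to the model. The toy class below sketches that left-to-right composition with Python's `__or__` operator; `Step`, `fake_prompt` and `fake_llm` are hypothetical stand-ins, not LangChain classes, and the fake "LLM" just reports how many words it received.

```python
from typing import Any, Callable


class Step:
    """Minimal chainable step: `a | b` runs a, then passes its output to b."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # Compose left to right, mirroring how `prompt | llm` reads
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x: Any) -> Any:
        return self.fn(x)


# Hypothetical stand-ins for the prompt template and the model
fake_prompt = Step(lambda d: f"Summarize in {d['num_words']} words: {d['context']}")
fake_llm = Step(lambda s: f"<{len(s.split())} input words>")
toy_chain = fake_prompt | fake_llm
print(toy_chain.invoke({"context": "a b c", "num_words": 2}))  # <7 input words>
```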

In [29]:
# Example text to summarize
text = '''
Text summarization involves creating a summary of a source text using natural language processing.
This is useful for condensing long-form text, audio, or video into a shorter, more digestible form that still conveys the main points.
Examples include news articles, scientific papers, podcasts, speeches, lectures, and meeting recordings.

There are two main types of summarization:

Extractive summarization: This type identifies and extracts key phrases or sentences (i.e., excerpts) from the source text and combines them into a summary.
It leaves the original text unchanged and only selects the important parts.

Abstractive summarization: This type involves understanding the main ideas in the source text and creating a new summary that expresses those ideas in a fresh and condensed way (i.e., paraphrasing).
It's more complex because it requires a deeper understanding of the source text and the ability to convey the same information in fewer words.
'''
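
The example text distinguishes extractive from abstractive summarization. To make the extractive side concrete, here is a minimal frequency-based extractor; it is a simple heuristic chosen for illustration, not a method used elsewhere in this notebook. It scores each sentence by the average frequency of its non-stopword words and returns the top sentences in their original order, leaving their wording unchanged. The stopword list is a small hypothetical sample.

```python
import re
from collections import Counter

# Small hypothetical stopword list; real systems use larger lists or better scoring
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that", "this",
             "for", "or", "on", "with", "as", "by", "from", "are", "be", "into"}


def content_words(text: str):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, num_sentences: int = 2) -> str:
    """Pick the num_sentences highest-scoring sentences, kept in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:num_sentences]))
```

Running `extractive_summary(text, num_sentences=2)` on the text above returns two of its sentences verbatim, whereas an abstractive model is free to rephrase.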

In [30]:
# Specify the number of words for the summary and the model
summary = summarize_text(text, num_words=50, model="gemini-1.5-flash")
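
The prompt asks for 50 words, but an LLM does not reliably hit an exact word count, and the print cell below reports `len(summary)`, which counts characters. A word-based check such as the hypothetical helper below makes the comparison explicit; the ±20% default tolerance is an arbitrary assumption.

```python
def word_count_report(summary: str, target_words: int, tolerance: float = 0.2) -> dict:
    """Compare a summary's word count with the requested target.

    tolerance is the allowed relative deviation (0.2 means within 20% of the target).
    """
    if target_words <= 0:
        raise ValueError("target_words must be positive")
    count = len(summary.split())
    deviation = (count - target_words) / target_words
    return {"words": count, "target": target_words, "within_tolerance": abs(deviation) <= tolerance}
```

In the notebook, `word_count_report(summary, 50)` would report the actual word count of the Gemini summary.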

In [31]:
# Print the original text and the summary (lengths below are character counts, not word counts)
print(f"Original Text: {text}")
print(f"\nText Length: {len(text)}")
print("=" * 100)
print(f"\nSummary: {summary}")
print(f"\nSummary Length: {len(summary)}")

Original Text: 
Text summarization involves creating a summary of a source text using natural language processing.
This is useful for condensing long-form text, audio, or video into a shorter, more digestible form that still conveys the main points.
Examples include news articles, scientific papers, podcasts, speeches, lectures, and meeting recordings.

There are two main types of summarization:

Extractive summarization: This type identifies and extracts key phrases or sentences (i.e., excerpts) from the source text and combines them into a summary.
It leaves the original text unchanged and only selects the important parts.

Abstractive summarization: This type involves understanding the main ideas in the source text and creating a new summary that expresses those ideas in a fresh and condensed way (i.e., paraphrasing).
It's more complex because it requires a deeper understanding of the source text and the ability to convey the same information in fewer words.


Text Length: 961

Summ