In [1]:
!pip install transformers



In [2]:
!pip install transformers requests beautifulsoup4



In [3]:

!pip uninstall -y transformers
!pip install transformers

Found existing installation: transformers 4.47.0
Uninstalling transformers-4.47.0:
  Successfully uninstalled transformers-4.47.0
Collecting transformers
  Using cached transformers-4.47.0-py3-none-any.whl.metadata (43 kB)
Using cached transformers-4.47.0-py3-none-any.whl (10.1 MB)
Installing collected packages: transformers
Successfully installed transformers-4.47.0


In [4]:
!pip uninstall -y torch torchvision torchaudio

Found existing installation: torch 2.5.1+cu118
Uninstalling torch-2.5.1+cu118:
  Successfully uninstalled torch-2.5.1+cu118
Found existing installation: torchvision 0.20.1+cu118
Uninstalling torchvision-0.20.1+cu118:
  Successfully uninstalled torchvision-0.20.1+cu118
Found existing installation: torchaudio 2.5.1+cu118
Uninstalling torchaudio-2.5.1+cu118:
  Successfully uninstalled torchaudio-2.5.1+cu118


In [5]:
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Looking in indexes: https://download.pytorch.org/whl/cu118
Collecting torch
  Using cached https://download.pytorch.org/whl/cu118/torch-2.5.1%2Bcu118-cp310-cp310-linux_x86_64.whl (838.3 MB)
Collecting torchvision
  Using cached https://download.pytorch.org/whl/cu118/torchvision-0.20.1%2Bcu118-cp310-cp310-linux_x86_64.whl (6.5 MB)
Collecting torchaudio
  Using cached https://download.pytorch.org/whl/cu118/torchaudio-2.5.1%2Bcu118-cp310-cp310-linux_x86_64.whl (3.3 MB)
Installing collected packages: torch, torchvision, torchaudio
Successfully installed torch-2.5.1+cu118 torchaudio-2.5.1+cu118 torchvision-0.20.1+cu118


In [6]:
import requests
from bs4 import BeautifulSoup
from transformers import pipeline, BartTokenizer, BartForConditionalGeneration

In [8]:
# Step 1: Fetch content from a website
def fetch_web_content(url):
    """
    Fetches text content from the given URL.
    """
    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'html.parser')
        paragraphs = soup.find_all('p')  # Extract text from <p> tags
        content = " ".join([p.text for p in paragraphs])  # Combine paragraphs
        return content
    else:
        print("Failed to retrieve the website content.")
        return None

In [9]:
# Step 2: Summarize the fetched content
def summarize_text(text, max_length=100, min_length=50):
    """
    Summarizes the input text using a pre-trained summarization model.
    """
    # Explicitly load the tokenizer and model
    tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

    # Create the summarization pipeline with truncation
    summarizer = pipeline("summarization", model=model, tokenizer=tokenizer)

    # Truncate the input text if it's too long
    inputs = tokenizer(text, truncation=True, max_length=1024, return_tensors="pt")

    summary_ids = model.generate(inputs["input_ids"], max_length=max_length, min_length=min_length)

    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary


In [10]:
# Main Function
if __name__ == "__main__":
    # Input: URL of the website
    url = "https://en.wikipedia.org/wiki/Artificial_intelligence"  # Example URL
    print("Fetching content from the URL...")

    # Fetch the web content
    web_content = fetch_web_content(url)

    if web_content:
        print("\nOriginal Content (First 500 characters):")
        print(web_content[:10000])  # Print a snippet of the original content

        # Summarize the content
        print("\nSummarizing content...")
        summary = summarize_text(web_content)

        print("\nSummary:")
        print(summary)

Fetching content from the URL...

Original Content (First 500 characters):

 Artificial intelligence (AI), in its broadest sense, is intelligence exhibited by machines, particularly computer systems. It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals.[1] Such machines may be called AIs.
 Some high-profile applications of AI include advanced web search engines (e.g., Google Search); recommendation systems (used by YouTube, Amazon, and Netflix); interacting via human speech (e.g., Google Assistant, Siri, and Alexa); autonomous vehicles (e.g., Waymo); generative and creative tools (e.g., ChatGPT, and AI art); and superhuman play and analysis in strategy games (e.g., chess and Go). However, many AI applications are not perceived as AI: "A lot of cutting edge AI has filtered into general applicat

Device set to use cpu



Summary:
Artificial intelligence (AI) is intelligence exhibited by machines, particularly computer systems. Some high-profile applications of AI include advanced web search engines (e.g., Google Search), recommendation systems (used by YouTube, Amazon, and Netflix), interacting via human speech, autonomous vehicles, and AI art. Many AI applications are not perceived as AI.


In [1]:
# Install necessary libraries
# !pip install transformers



# Sample input text
long_text = """
Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to natural intelligence displayed by animals including humans.
Leading AI textbooks define the field as the study of "intelligent agents": any system that perceives its environment and takes actions that
maximize its chance of achieving its goals. Some popular accounts use the term "artificial intelligence" to describe machines that mimic
"cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving".
"""

# Generate the summary
summary = summarizer(long_text, max_length=50, min_length=25, do_sample=False)

# Print the result
print("Original Text:")
print(long_text)
print("\nSummary:")
print(summary[0]['summary_text'])


No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/1.80k [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/1.22G [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Original Text:

Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to natural intelligence displayed by animals including humans. 
Leading AI textbooks define the field as the study of "intelligent agents": any system that perceives its environment and takes actions that 
maximize its chance of achieving its goals. Some popular accounts use the term "artificial intelligence" to describe machines that mimic 
"cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving".


Summary:
 Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to natural intelligence displayed by animals including humans . Some popular accounts use the term "artificial intelligence" to describe machines that mimic cognitive functions that humans associate with the human mind
