In [1]:
# setting the imports

import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

In [2]:
load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')
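
`api_key` is not used directly: `OpenAI()` in the next cell reads `OPENAI_API_KEY` from the environment by itself. The value is still useful for a quick pre-flight check before the first API call. Below is a minimal sketch, assuming you only want to catch a missing or obviously malformed key. `describe_api_key` is a hypothetical helper, and the `sk-` prefix test is a loose heuristic, not a documented format guarantee.

```python
import os


def describe_api_key(key):
    """Return a short status message for an API key (hypothetical helper)."""
    if not key:
        return "No API key found: check that .env defines OPENAI_API_KEY"
    if key != key.strip():
        return "API key has leading or trailing whitespace: remove it in .env"
    if not key.startswith("sk-"):
        # Loose heuristic, not a documented format guarantee
        return "API key does not start with 'sk-': check that it is an OpenAI key"
    return "API key found"


print(describe_api_key(os.getenv("OPENAI_API_KEY")))
```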

In [3]:
openai = OpenAI()

In [4]:
class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using BeautifulSoup
        """
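        self.url = url
        # Some sites reject requests without a browser-like User-Agent header
        headers = {"User-Agent": "Mozilla/5.0"}
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            # Drop tags that contribute no readable text
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""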
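
The following sketch shows what the cleanup in `Website.__init__` does without making a network request. It applies the same idea to an inline HTML string. It swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`, so it approximates the cell above rather than reproducing it. Text inside `<script>` and `<style>` is skipped. (`<img>` and `<input>` are void elements and hold no text anyway.) The remaining fragments are stripped and joined with newlines, similar to `get_text(separator="\n", strip=True)`. `TextExtractor` and `extract_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the <title> and visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = None
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title = text
        else:
            self.parts.append(text)


def extract_text(html):
    """Return (title, text) with one stripped text fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.title or "No title found", "\n".join(parser.parts)


sample = """<html><head><title>Demo page</title><style>p {color: red}</style></head>
<body><h1>Hello</h1><script>alert("hi")</script><p>Some <b>bold</b> text</p>
<img src="x.png"><input type="text"></body></html>"""

title, text = extract_text(sample)
print(title)  # prints "Demo page"
print(text)
```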

In [5]:
# trying out the scraping

www = Website("https://huggingface.co/")
print(www.title)
print(www.text)

Hugging Face – The AI community building the future.
Hugging Face
Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up
NEW
Use models from the HF Hub in LM Studio
Use Ollama with GGUF Models from the HF Hub
AI Tools are now available in HuggingChat
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Trending on
this week
Models
tencent/HunyuanVideo
Updated
about 14 hours ago
•
576
Qwen/QwQ-32B-Preview
Updated
7 days ago
•
50.6k
•
1.08k
black-forest-labs/FLUX.1-dev
Updated
Aug 16
•
1.4M
•
6.98k
Djrango/Qwen2vl-Flux
Updated
8 days ago
•
400
Lightricks/LTX-Video
Updated
13 days ago
•
44.1k
•
603
Browse 400k+ models
Spaces
Running
628
🔍
QwQ-32B-Preview
QwQ-32B-Preview
Running
on
CPU Upgrade
5.73k
👕
Kolors Virtual Try-On
Restarting
on
Zero
229
🦀
Flux1 IMAGE Dev NF4
Running
on
Zero
644
🏢
Anychat
Running
on
Zero
206
🐢
GiniGen Canvas
Browse 150k+ applications
Datasets
O1-OPEN/OpenO1-SFT
Updat
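
The printed text contains many fragments that add little to a summary. Examples include bullet separators (`•`), emoji, and bare counts like `576` or `50.6k`. The system prompt asks the model to ignore navigation text, but you could also trim some of this before sending it, which lowers the number of tokens per request. Below is a minimal sketch, assuming that lines consisting only of a number (with an optional `k`/`M` suffix) or only of non-word characters can be dropped. `drop_noise_lines` is a hypothetical helper, and the pattern is a guess tuned to this page, not a general rule.

```python
import re

# A line counts as noise if it is only a count with an optional k/M suffix
# (e.g. "576", "50.6k", "1.4M") or only non-word characters (e.g. "•", "🔍").
NOISE = re.compile(r"\d+(?:\.\d+)?[kM]?|[\W_]+")


def drop_noise_lines(text):
    """Return text without blank lines, bare counts, or punctuation-only lines."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not NOISE.fullmatch(stripped):
            kept.append(line)
    return "\n".join(kept)


sample = "Qwen/QwQ-32B-Preview\nUpdated\n7 days ago\n•\n50.6k\n•\n1.08k"
print(drop_noise_lines(sample))
```

To try it, pass `drop_noise_lines(website.text)` instead of `website.text` when building the prompt.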

In [6]:
# adding system prompt

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [19]:
# function that writes a user prompt that asks for summaries of websites

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website is as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt
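
`user_prompt_for` appends the entire page text, and some pages produce far more text than this one. Every character is sent to the model. Below is a minimal sketch of a length cap, assuming a limit in characters is an acceptable rough proxy for tokens. `truncate_text`, its `max_chars` default of 5000, and the marker string are all hypothetical choices.

```python
def truncate_text(text, max_chars=5000, marker="\n[text truncated]"):
    """Return text unchanged if short enough, else its first max_chars characters plus a marker."""
    if len(text) <= max_chars:
        return text
    # Keep only the beginning of the page and flag that the rest was cut
    return text[:max_chars] + marker


print(truncate_text("Hugging Face\nModels\nDatasets", max_chars=12))
```

Inside `user_prompt_for`, the last step would become `user_prompt += truncate_text(website.text)`.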

In [20]:
system_prompt

'You are an assistant that analyzes the contents of a website and provides a short summary, ignoring text that might be navigation related. Respond in markdown.'

In [21]:
print(user_prompt_for(www))

You are looking at a website titled Hugging Face – The AI community building the future.
The contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.

Hugging Face
Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up
NEW
Use models from the HF Hub in LM Studio
Use Ollama with GGUF Models from the HF Hub
AI Tools are now available in HuggingChat
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Trending on
this week
Models
tencent/HunyuanVideo
Updated
about 14 hours ago
•
576
Qwen/QwQ-32B-Preview
Updated
7 days ago
•
50.6k
•
1.08k
black-forest-labs/FLUX.1-dev
Updated
Aug 16
•
1.4M
•
6.98k
Djrango/Qwen2vl-Flux
Updated
8 days ago
•
400
Lightricks/LTX-Video
Updated
13 days ago
•
44.1k
•
603
Browse 400k+ models
Spaces
Running
628
🔍
QwQ-32B-Preview
QwQ-32B-Preview
Running
on
CPU Upgr

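OpenAI's Chat Completions API expects the conversation as a list of messages. Each message is a dictionary with a `role` key (such as `"system"` or `"user"`) and a `content` key.

Many other LLM providers' APIs accept the same structure:

```
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```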

In [22]:
# this function creates exactly the format above

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
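
The following is a minimal sketch for checking a message list against the structure described above, for example with `check_messages(messages_for(www))`. It assumes only the three roles used in basic chat requests (`system`, `user`, `assistant`) and plain-string content. The API itself accepts more than this, so `check_messages` is a hypothetical and deliberately strict helper.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def check_messages(messages):
    """Raise ValueError unless messages follow the [{"role": ..., "content": ...}] shape."""
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")
    for i, message in enumerate(messages):
        if not isinstance(message, dict) or set(message) != {"role", "content"}:
            raise ValueError(f"message {i} must be a dict with exactly 'role' and 'content' keys")
        if message["role"] not in ALLOWED_ROLES:
            raise ValueError(f"message {i} has unexpected role {message['role']!r}")
        if not isinstance(message["content"], str):
            raise ValueError(f"message {i} content must be a string")
    return True


check_messages([
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"},
])
```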

In [23]:
messages_for(www)

[{'role': 'system',
  'content': 'You are an assistant that analyzes the contents of a website and provides a short summary, ignoring text that might be navigation related. Respond in markdown.'},
 {'role': 'user',
  'content': 'You are looking at a website titled Hugging Face – The AI community building the future.\nThe contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.\n\nHugging Face\nModels\nDatasets\nSpaces\nPosts\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nNEW\nUse models from the HF Hub in LM Studio\nUse Ollama with GGUF Models from the HF Hub\nAI Tools are now available in HuggingChat\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nTrending on\nthis week\nModels\ntencent/HunyuanVideo\nUpdated\nabout 14 hours ago\n•\n576\nQwen/QwQ-32B-Preview\nUpdated\n7 days ago\n•\n50.6k\

In [25]:
# function that calls the OpenAI API using the model gpt-4o-mini

def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages_for(website)
    )
    return response.choices[0].message.content
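
`summarize` fetches the page and calls the API in one step, so it cannot run without network access and a valid key. The sketch below is a variant that takes the client and the message list as parameters. This lets you exercise the response handling (`response.choices[0].message.content`, as used above) with a stand-in object. `summarize_messages` and `FakeClient` are hypothetical. The fake imitates only the attribute path this notebook reads, not the full response object returned by the `openai` library.

```python
from types import SimpleNamespace


def summarize_messages(client, messages, model="gpt-4o-mini"):
    """Send messages via client.chat.completions.create and return the reply text."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content


class FakeClient:
    """Hypothetical stand-in exposing only client.chat.completions.create."""

    def __init__(self, reply):
        self.reply = reply
        self.calls = []
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, model, messages):
        # Record the request and return an object with the attribute path read above
        self.calls.append({"model": model, "messages": messages})
        message = SimpleNamespace(content=self.reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake = FakeClient("# Summary\n\nA short test summary.")
print(summarize_messages(fake, [{"role": "user", "content": "Summarize this page."}]))
```

With the real client from earlier in the notebook, `summarize_messages(openai, messages_for(Website(url)))` should behave like `summarize(url)`.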

In [26]:
summarize("https://huggingface.co/")

'# Hugging Face Overview\n\nHugging Face is a collaborative platform focused on advancing machine learning (ML) through a community-driven approach. It allows users to create, discover, and work together on ML models, datasets, and applications across different modalities such as text, image, video, audio, and even 3D. The website features a variety of resources including:\n\n- **Models**: Over 400,000 models available for use, with trending updates on specific models like `tencent/HunyuanVideo` and `Qwen/QwQ-32B-Preview`.\n- **Datasets**: Access to more than 100,000 datasets for various tasks in computer vision, audio, and natural language processing.\n- **Spaces**: Hosting over 150,000 applications for running experiments and tools.\n- **Enterprise Solutions**: Paid options for compute resources, offering enterprise-grade security and support for organizations.\n\n## Recent Updates \n- **New Features**: \n  - Models from the Hugging Face Hub are now usable in LM Studio.\n  - Integrat

In [27]:
# function to display the summary more readably by rendering it as Markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))
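
`display_summary` passes any exception from the HTTP request, the parsing, or the API call straight to the notebook. This stops the cell when a site blocks scraping or is unreachable. Below is a minimal sketch of a wrapper that turns such failures into a short Markdown note instead. `summary_or_error` and `failing_summarize` are hypothetical names, and catching `Exception` broadly is a deliberate choice for interactive use.

```python
def summary_or_error(url, summarize_fn):
    """Return summarize_fn(url), or a short Markdown note if it raises (hypothetical helper)."""
    try:
        return summarize_fn(url)
    except Exception as exc:  # broad on purpose: network, HTTP, parsing and API errors all land here
        return f"**Could not summarize {url}:** {type(exc).__name__}: {exc}"


def failing_summarize(url):
    # Stand-in for summarize() when the site cannot be fetched
    raise ConnectionError("site unreachable")


print(summary_or_error("https://example.com", failing_summarize))
```

In the notebook, `display(Markdown(summary_or_error(url, summarize)))` would take the place of `display_summary(url)`.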

In [28]:
display_summary("https://huggingface.co/")

# Hugging Face - The AI Community 

Hugging Face is a collaborative platform for the machine learning community, dedicated to building, sharing, and deploying models, datasets, and applications. It serves as a hub for users to create and discover machine learning resources across various modalities including text, image, audio, and video. 

## Key Features
- **Models**: Offers access to over 400,000 models, including trending ones like `tencent/HunyuanVideo` and `Qwen/QwQ-32B-Preview`.
- **Datasets**: Provides a vast repository of over 100,000 datasets for various tasks.
- **Spaces**: Hosts 628 applications including tools for virtual try-on and image generation.
- **Collaboration Tools**: Features for hosting unlimited models and datasets, along with enterprise solutions for advanced platforms.

## Recent Announcements
- New integrations allowing the use of Hugging Face models in **LM Studio** and **HuggingChat**.
- Introduction of **Ollama with GGUF Models** from the Hugging Face Hub.

## Enterprise Solutions
Specialized offerings are available for organizations starting at $20/user/month, with high support and security features tailored for team-based AI development.

Overall, Hugging Face is positioned as the preeminent platform for both individual developers and enterprises to advance their machine learning projects collaboratively and efficiently.

In [30]:
# trying the CNN website

display_summary("https://cnn.com")

# Summary of CNN Website

The CNN website serves as a comprehensive news platform providing the latest updates on a wide range of topics including U.S. and world news, politics, business, health, entertainment, and sports. Key sections feature live news updates, video reports, and global coverage.

### Recent News Highlights:
1. **UnitedHealthcare CEO Shooting**:
   - Investigations are ongoing regarding the shooting of Brian Thompson, the CEO of UnitedHealthcare, with new clues emerging about the suspect’s movements leading up to the incident.

2. **Global Events**:
   - A fire at a synagogue in Australia appears to be deliberate, and significant geopolitical tensions are highlighted, including comments about a potential third nuclear age by a UK military chief.
   - In Ukraine, repercussions of the ongoing conflict with Russia are discussed.

3. **Science and Technology**:
   - Researchers in Australia are working on cyborg beetles to assist in disaster recovery.
   - NASA has postponed a planned moon landing to 2027.

4. **Cultural Notes**:
   - Macron of France resists calls to resign while announcing plans for a new prime minister.
   - OnlyFans becomes accessible in China, reflecting shifts in media and technology access.

The CNN website is dynamic, with live updates featured prominently, alongside rich multimedia content including articles, video segments, and interactive news features.