In [10]:
# imports

import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

## Connecting to OpenAI (or Ollama in case of open source)

The next cell loads the environment variables from your `.env` file and checks that the OpenAI API key looks valid. The client itself is created later, just before the API call.

In [11]:
# Load environment variables from a file called .env

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Check the key
# The whitespace test runs before the prefix test, so a key with a leading
# space is reported as a whitespace problem rather than a wrong prefix.

if not api_key:
    print("No API key was found - please troubleshoot to identify & fix!")
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key")
else:
    print("API key found and looks good so far!")


API key found and looks good so far!
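
The key checks in the cell above can also be wrapped in a small function, which makes them easy to try out without a real key. This is only a minimal sketch: the name `check_api_key` and the short status strings it returns are assumptions made for illustration, not part of the notebook. It tests for stray whitespace before the prefix so that a padded key gets the more specific message.

```python
def check_api_key(api_key):
    """Return a short status string for a candidate key (hypothetical helper)."""
    if not api_key:
        # Covers both None (variable not set) and an empty string
        return "missing"
    if api_key.strip() != api_key:
        # Leading or trailing spaces/tabs, often from copy and paste
        return "whitespace"
    if not api_key.startswith("sk-proj-"):
        # Same prefix check as the notebook cell
        return "unexpected-prefix"
    return "ok"


# Made-up sample values, none of them real keys
for candidate in [None, " sk-proj-abc", "sk-abc", "sk-proj-abc"]:
    print(repr(candidate), "->", check_api_key(candidate))
```

Returning a status instead of printing keeps the logic separate from the notebook output, so the same function could be reused in a script.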


## Read website and summarise

In [12]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')
        # soup.title.string can be None when the <title> tag is empty or has nested tags
        self.title = soup.title.string if soup.title and soup.title.string else "No title found"
        # Some pages have no <body>; fall back to empty text instead of raising AttributeError
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""

## Create system and user prompts to input to the model
Models like GPT-4o have been trained to receive instructions in a particular way. They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt**: the conversation starter that they should reply to

In [13]:
# Define our system prompt

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [14]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

**Create messages for the model**: The OpenAI API expects to receive messages in a particular structure:

```python
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```

In [15]:
# Create the message structure
def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
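
To inspect the message structure without fetching a page or calling the API, a stand-in object can be passed through the same kind of logic. This is a sketch under stated assumptions: `fake_site`, `demo_system_prompt` and `demo_messages_for` are hypothetical names with shortened prompt text, defined separately so the block runs on its own and does not overwrite the notebook's `system_prompt` or `messages_for`.

```python
from types import SimpleNamespace

# Shortened, hypothetical stand-in for the notebook's system prompt
demo_system_prompt = "You are an assistant that summarizes websites. Respond in markdown."


def demo_messages_for(website):
    """Build the two-message list from any object with .title and .text."""
    user_prompt = f"You are looking at a website titled {website.title}\n\n{website.text}"
    return [
        {"role": "system", "content": demo_system_prompt},
        {"role": "user", "content": user_prompt},
    ]


# A fake page, so no network request is needed
fake_site = SimpleNamespace(title="Example Domain", text="This domain is for use in examples.")
demo_messages = demo_messages_for(fake_site)

# Show each role with the start of its content
for message in demo_messages:
    print(f"{message['role']}: {message['content'][:50]!r}")
```

The list always starts with the system message, followed by the user message that carries the page title and text.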

## Bring it together

In [18]:
# Create an OpenAI client instance (it reads OPENAI_API_KEY from the environment)
openai = OpenAI()

# And now: call the OpenAI API to perform the task.
def summarise(url):
    # Fetch the page and extract its title and text with the Website class
    website = Website(url)

    # Build the messages and request a completion from the model
    response = openai.chat.completions.create(
        model="gpt-4.1-nano-2025-04-14",
        messages=messages_for(website)
    )
    # The reply text lives on the first choice's message
    return response.choices[0].message.content
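
The path `response.choices[0].message.content` can be explored offline by passing the client in as an argument and substituting a stub. This is only a sketch: `summarise_messages`, `make_stub_client` and the `"stub-model"` name are hypothetical, and the stub mimics just the attributes that the function above reads, not the full OpenAI response object.

```python
from types import SimpleNamespace


def summarise_messages(client, messages, model):
    """Send messages to any client exposing chat.completions.create and return the reply text."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content


def make_stub_client(reply_text):
    """Build an offline stand-in (hypothetical) with the same attribute path as the real client."""
    def create(model, messages):
        # Mimic only the fields read above: choices -> message -> content
        message = SimpleNamespace(content=reply_text)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

    completions = SimpleNamespace(create=create)
    return SimpleNamespace(chat=SimpleNamespace(completions=completions))


stub = make_stub_client("# Stub summary")
result = summarise_messages(
    stub,
    [{"role": "user", "content": "Summarise this page."}],
    model="stub-model",
)
print(result)
```

Passing the client as a parameter also makes it possible to point the same function at a different OpenAI-compatible endpoint without changing its body.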

In [19]:
# A function to display this nicely in the Jupyter output, using markdown
def display_summary(url):
    # Fetch the page and get a summary back from OpenAI
    summary = summarise(url)

    # Render the summary as Markdown in the Jupyter notebook
    display(Markdown(summary))

In [21]:
# Let's try one out
test_url = "https://cnn.com"

# Call the function to fetch output
display_summary(test_url)

# CNN Breaking News Website Summary

The CNN website provides comprehensive coverage of current events, including **world news, US politics, business, health, entertainment, science, climate, and sports**. It features articles, videos, and analysis on major topics such as the Ukraine-Russia war, Israel-Hamas conflict, wildfires in Turkey, geopolitical disputes, and significant court cases like the Jeffrey Epstein investigation.

## Key Highlights:
- **Recent Top Stories**: Displacement in Thailand-Cambodia border clashes, Ghislaine Maxwell interview, and major political developments including Trump's economic and legal battles.
- **International News**: France recognizes Palestinian state, wildfires in Turkey, and archaeological discoveries in the Grand Canyon.
- **US and Political News**: Updates on Trump’s decisions, discussions on tariffs, and investigations into political figures.
- **Science & Environment**: Climate change impacts, Arctic glacier retreat, and scientific discoveries such as ancient fossils.
- **Entertainment & Lifestyle**: Celebrity updates, trends like slimline sneakers, and cultural stories.
- **Sports**: Tennis updates, engagement news about Venus Williams, and major sports tournaments.
- **Additional Features**: Interactive videos, podcasts, photo galleries, and in-depth investigations.

This platform also offers personalized content recommendations, live TV, and various multimedia options, catering to diverse interests.

**Note:** The website includes prompts for user feedback on advertisements and technical issues, emphasizing user experience and engagement.

---

*This summary excludes navigation prompts and focuses on main content and trending topics.*