# FIRST LAB

Our goal is to code a new kind of web browser: give it a URL, and it will respond with a summary. The Reader's Digest of the internet!

In [1]:
import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI



# Connecting to OpenAI (or Ollama)

In [2]:
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif api_key.strip() != api_key:
    # Check whitespace before the prefix, otherwise a leading space is misreported as a wrong prefix
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")


API key found and looks good so far!


In [3]:
openai = OpenAI()

# Let's make a quick call to a Frontier model to get started, as a preview!

In [4]:
message = "Hello, GPT! This is my first ever message to you! Hi!"
response = openai.chat.completions.create(model="gpt-4o-mini", messages=[{"role":"user", "content":message}])
print(response.choices[0].message.content)

Hello! Welcome! I'm glad you're here. How can I assist you today?


## OK onwards with our first project

In [5]:
# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

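class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')
        # soup.title.string is None when the <title> tag is empty
        self.title = soup.title.string if soup.title and soup.title.string else "No title found"
        if soup.body:
            # Drop elements that carry no readable text before extracting it
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            # Error pages and non-HTML responses may have no <body> at all
            self.text = "No body content found"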

In [6]:
# Let's try one out. Change the website and add print statements to follow along.

ed = Website("https://edwarddonner.com")
print(ed.title)
print(ed.text)

Home - Edward Donner
Home
Connect Four
Outsmart
An arena that pits LLMs against each other in a battle of diplomacy and deviousness
About
Posts
Well, hi there.
I’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (
very
amateur) and losing myself in
Hacker News
, nodding my head sagely to things I only half understand.
I’m the co-founder and CTO of
Nebula.io
. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI startup untapt,
acquired in 2021
.
We work with groundbreaking, proprietary LLMs verticalized for talent, we’ve
patented
our matching model, and our award-winning platform has happy customers and tons of press coverage.
Connec
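The output above has words like "very" and "Hacker News" on lines of their own. That happens because `get_text(separator="\n", strip=True)` joins every text node with a newline, so inline tags such as links and emphasis split a sentence. The sketch below reproduces the effect on a made-up HTML snippet. It uses the standard library's `html.parser` as a stand-in for BeautifulSoup so that it runs without extra packages; `BodyTextParser`, `body_text` and `sample_html` are illustrative names, not part of the lab.

```python
from html.parser import HTMLParser


class BodyTextParser(HTMLParser):
    """Collect visible text nodes inside <body>, skipping script and style."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Each text node is stripped and kept as its own chunk
        text = data.strip()
        if self.in_body and not self.skip_depth and text:
            self.chunks.append(text)


def body_text(html):
    parser = BodyTextParser()
    parser.feed(html)
    parser.close()
    # One line per text node, like separator="\n" in the Website class
    return "\n".join(parser.chunks)


# Made-up page: the <em> tag splits the sentence into three text nodes
sample_html = (
    "<html><head><title>Demo</title><style>p {color: red}</style></head>"
    "<body><script>console.log('hi')</script>"
    "<p>music production (<em>very</em> amateur)</p>"
    "<img src='x.png'><input type='text'></body></html>"
)
print(body_text(sample_html))
# music production (
# very
# amateur)
```

The script and style contents never reach the result, and the `img` and `input` tags contribute nothing because they hold no text.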

## Types of prompts

You may know this already - but if not, you will get very familiar with it!

Models like GPT-4o have been trained to receive instructions in a particular way.

They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt** -- the conversation starter that they should reply to

In [7]:
# Define our system prompt - you can experiment with this later, changing the last sentence to "Respond in markdown in Spanish."

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."
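A note on the trailing backslashes in that string: inside a string literal, a backslash at the end of a line is a line continuation, so Python drops both the backslash and the newline. The continuation lines start at column 0 because any indentation would become part of the prompt. The sketch below shows this with a shortened prompt, alongside an equivalent parenthesized form that some people find easier to indent; the variable names are illustrative only.

```python
# Backslash continuation: the newline is removed, the text stays on one line
with_backslash = "You are an assistant that analyzes the contents of a website \
and provides a short summary."

# Equivalent form: adjacent string literals inside parentheses are concatenated
with_parentheses = (
    "You are an assistant that analyzes the contents of a website "
    "and provides a short summary."
)

print(with_backslash == with_parentheses)  # True
print("\n" in with_backslash)              # False
```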

In [8]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website is as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

In [9]:
#print(user_prompt_for(ed))

## Messages

The API from OpenAI expects to receive messages in a particular structure.
Many of the other APIs share this structure:

```python
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```
To give you a preview, the next 2 cells make a rather simple call - we won't stretch the mighty GPT (yet!)
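A simple call of that kind might look like the sketch below. The `ask` helper and the question are illustrative, not taken from the lab. So that the sketch runs without an API key, it passes in a hypothetical stub object that mimics only the `chat.completions.create(...)` call and the `choices[0].message.content` shape used elsewhere in this notebook.

```python
from types import SimpleNamespace


def ask(client, messages, model="gpt-4o-mini"):
    """Send a list of role/content messages and return the reply text."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content


preview_messages = [
    {"role": "system", "content": "You are a concise assistant"},
    {"role": "user", "content": "What is 2 + 2?"},
]


# Hypothetical stand-in for the real client: it echoes the last user message
def _fake_create(model, messages):
    reply = SimpleNamespace(content=f"(stub reply to: {messages[-1]['content']})")
    return SimpleNamespace(choices=[SimpleNamespace(message=reply)])


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)

print(ask(fake_client, preview_messages))
# (stub reply to: What is 2 + 2?)
```

In the notebook itself, passing the `openai` client created earlier in place of `fake_client` would send the same messages to the real model.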

## And now let's build useful messages for GPT-4o-mini, using a function

In [10]:
# See how this function creates exactly the format above

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

In [11]:
# Try this out, and then try for a few more websites

messages_for(ed)

[{'role': 'system',
  'content': 'You are an assistant that analyzes the contents of a website and provides a short summary, ignoring text that might be navigation related. Respond in markdown.'},
 {'role': 'user',
  'content': 'You are looking at a website titled Home - Edward Donner\nThe contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.\n\nHome\nConnect Four\nOutsmart\nAn arena that pits LLMs against each other in a battle of diplomacy and deviousness\nAbout\nPosts\nWell, hi there.\nI’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (\nvery\namateur) and losing myself in\nHacker News\n, nodding my head sagely to things I only half understand.\nI’m the co-founder and CTO of\nNebula.io\n. We’re applying AI to a field where it can make a

## Time to bring it together - the API for OpenAI is very simple!

In [12]:
# And now: call the OpenAI API. You will get very familiar with this!

def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model = "gpt-4o-mini",
        messages = messages_for(website)
    )
    return response.choices[0].message.content

In [13]:
summarize("https://edwarddonner.com")

"# Summary of Edward Donner's Website\n\nEdward Donner's website serves as a personal and professional portal showcasing his interests, expertise, and contributions in the field of artificial intelligence, particularly with language learning models (LLMs). Ed, co-founder and CTO of Nebula.io, positions his work around utilizing AI for enhancing talent management and development. His professional journey includes a previous venture, untapt, which was acquired in 2021. \n\n## Key Features\n- **Personal Interests**: Ed enjoys writing code, experimenting with LLMs, DJing, and electronic music production.\n- **Professional Background**: He has experience in leading AI initiatives, with patented technology for talent matching.\n- **Engagement**: Encourages connection and interaction through various platforms.\n\n## Announcements\n1. **Connecting my courses – become an LLM expert and leader** - May 28, 2025\n2. **2025 AI Executive Briefing** - May 18, 2025\n3. **The Complete Agentic AI Engine

In [14]:
# A function to display this nicely in the Jupyter output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))
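Why bother with `display(Markdown(...))`? When a string is the last expression in a cell, the notebook shows its `repr`, which is why the raw `summarize` output is full of literal `\n` sequences. The small sketch below, using a made-up summary string, shows the difference between the `repr` and the printed text; `Markdown` goes one step further and renders the headings and bullets.

```python
# Made-up stand-in for a model reply
summary = "# Title\n\n- first point\n- second point"

# What a notebook echoes for a bare string: escapes are shown literally
print(repr(summary))

# What print shows: the newlines become real line breaks
print(summary)
```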

In [15]:
display_summary("https://edwarddonner.com")

# Summary of Edward Donner's Website

The website serves as a personal platform for Ed Donner, who is passionate about coding and experimenting with large language models (LLMs). He is the co-founder and CTO of Nebula.io, a company focused on utilizing AI to aid individuals in discovering their potential and talent management. Previously, he founded the AI startup untapt, which was acquired in 2021. 

The site features various offerings related to LLM expertise, including:

- **Connect Four**: A platform designed for competitive LLM interactions.
- **Outsmart**: An arena for LLMs to engage in diplomacy and strategy.

## News and Announcements
- **May 28, 2025**: Announcement about connecting courses to advance LLM expertise and leadership.
- **May 18, 2025**: Information regarding the 2025 AI Executive Briefing.
- **April 21, 2025**: Launch of "The Complete Agentic AI Engineering Course."
- **January 23, 2025**: LLM Workshop focused on hands-on experience with agents and available resources. 

For more updates or to connect, visitors are encouraged to reach out through various social media links listed on the site.

# Let's try more websites

Note that this will only work on websites that can be scraped using this simplistic approach.

Websites that are rendered with JavaScript, like React apps, won't show up. See the community-contributions folder for a Selenium implementation that gets around this. You'll need to read up on installing Selenium (ask ChatGPT!).

Also, websites protected by CloudFront (and similar) may return 403 errors - many thanks to Andy J for pointing this out.

But many websites will work just fine!
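One way to tell these failure modes apart is to look at the HTTP status code and at how much text the scrape produced. The helper below is a hypothetical sketch, not part of the lab: `diagnose_fetch` and its labels are made-up names, and the 200-character threshold is an arbitrary assumption that you would tune for your own sites.

```python
def diagnose_fetch(status_code, page_text, min_chars=200):
    """Return a rough label for a fetch result (heuristic only)."""
    if status_code == 403:
        # Typical of sites that block simple scrapers
        return "blocked"
    if status_code >= 400:
        return "http_error"
    if len(page_text.strip()) < min_chars:
        # Very little text often means the content is filled in by JavaScript
        return "probably_js_rendered"
    return "ok"


print(diagnose_fetch(403, ""))
print(diagnose_fetch(200, "Loading..."))
print(diagnose_fetch(200, "word " * 100))
```

With `requests`, the status code is available as `response.status_code`, so a check like this could run before the page text is sent to the model.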

In [16]:
display_summary("https://www.theswarm.com")

# Summary of The Swarm - The People and Relationship Data Leader

**Overview:**
The Swarm offers a comprehensive data platform specializing in relationship and people data. It maintains a dataset with 580 million profiles highlighting daily job changes and 100 million companies with funding information. Utilizing an AI engine, The Swarm helps users, including builders and investors, manage and analyze professional relationships effectively.

**Key Features:**
- **Real-time People Data:** Access updated profiles and job changes for 580 million individuals.
- **Company and Fundraising Data:** Provides details on 100 million companies, tracking fundraising rounds and investor networks.
- **Live Enrichment:** Supports real-time enrichment of up to 50,000 profiles daily.
- **Integration Options:** Compatible with various platforms, including CRM systems like HubSpot and tools like Clay and Relay.

**Products Available:**
- **Web Application** and **Chrome Extension** for easy access to data.
- **Swarm API** for developers to integrate data into their applications.
- **Data documentation** for easier user onboarding.

**Customer Benefits:**
Customers report using The Swarm as a single source of truth for relationship data, improving their networking capabilities by providing warm introduction paths. Compliant with GDPR and CCPA, it emphasizes privacy and security.

**Pricing Model:**
The service offers a free 30-day trial, with individual plans starting at $29/month. The Chrome Extension is free.

**Community Engagement:**
The Swarm hosts a Go-To-Network Academy, providing resources and community support for users.

**Recent News:**
- The announcement of the **Swarm AI Network Mapper**, aimed at helping users identify and leverage their professional networks more effectively.

For more information, users are encouraged to start with a free account and explore the capabilities offered by The Swarm.

In [17]:
display_summary("https://cnn.com")

# Summary of CNN Website

CNN is a leading news outlet providing the latest breaking news, in-depth articles, and videos across various topics, including politics, business, health, entertainment, sports, and world events.

## Recent Headlines
- **Israel-Hamas Conflict**: A prominent Israeli human rights group has accused Israel of genocide in Gaza. Israel has declared a tactical pause in some military operations to allow humanitarian aid amid international concern over starvation in the territory.
- **US Politics**: President Trump has announced a shortened deadline for Russia to negotiate a ceasefire over the ongoing conflict with Ukraine.
- **International Affairs**: Thailand and Cambodia have reached a ceasefire agreement following violent clashes. Additionally, Elon Musk has finalized a substantial $16.5 billion chip deal with Samsung.
- **Russian Military**: Reports reveal that Russian deserters are facing harsh punishments, indicating rising internal tensions within the military.
- **Sports**: Tadej Pogačar has claimed his fourth Tour de France title, asserting his reputation in the cycling world.

The site features ongoing updates on these topics, along with analysis, investigative reports, and coverage of important global events, making it a comprehensive source for news consumers.

In [18]:
display_summary("https://anthropic.com")

# Summary of Anthropic Website

Anthropic focuses on developing AI technology with an emphasis on safety and human benefit. Their primary product, Claude, is designed to facilitate various applications through advanced AI models, including Claude Opus 4 and Claude Sonnet 4, which are geared towards coding and AI agent functionalities.

## Key Features:
- **Claude Models**: Introduces Claude Opus 4 as the most intelligent AI model, capable of handling complex tasks.
- **API Services**: Offers developers the opportunity to create customizable AI applications using Claude.
- **Education and Resources**: Provides learning resources through the Anthropic Academy to help users effectively build with Claude.

## Recent Announcements:
- **ISO 42001 Certification**: Confirmation of receiving certification to enhance trust in their commitment to responsible AI development.
- **Claude Opus 4 Launch**: Announcement of the latest AI model, showcasing the advancements in capabilities for coding and AI interactions.

## Commitment to Safety:
Anthropic emphasizes the importance of responsible AI development through its research initiatives, transparency policies, and a responsible scaling policy designed to consider societal impacts.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business applications</h2>
            <span style="color:#181;">In this exercise, you experienced calling the Cloud API of a Frontier Model (a leading model at the frontier of AI) for the first time. We will be using APIs like OpenAI at many stages in the course, in addition to building our own LLMs.

More specifically, we've applied this to Summarization - a classic Gen AI use case. It can be applied to any business vertical - summarizing the news, summarizing financial performance, summarizing a resume in a cover letter - the applications are limitless. Consider how you could apply Summarization in your business, and try prototyping a solution.</span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you continue - now try yourself</h2>
            <span style="color:#900;">Use the cell below to make your own simple commercial example. Stick with the summarization use case for now. Here's an idea: write something that will take the contents of an email, and will suggest an appropriate short subject line for the email. That's the kind of feature that might be built into a commercial email tool.</span>
        </td>
    </tr>
</table>

In [19]:
# Step 1: Create your prompts

def create_prompt(email_text):
    system_prompt = "You are an assistant that analyzes the contents of an email \
and provides the best matching title for it. Generate output as text"
    user_prompt = f"""
The content of the email is as follows: {email_text}. Please help me create a title for it.
"""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]

# Step 2: Sample emails to test with (create_prompt builds the actual messages list for each one)

messages = ["Hi everyone, Just a quick reminder that we need to finish the presentation for Client X by Friday. Please send me your sections by the end of the day on Wednesday so I can compile and review everything. Thanks for your hard work! Best regards, Anna",
           "You're invited! Join us on Thursday, September 12th for a special evening celebrating our 10th anniversary. Enjoy drinks, appetizers, and a look back at our journey. Location: Rooftop Bar, City Center Time: 7:00 PM"] 


In [20]:

# Step 3: Call OpenAI

def call_api(email_text):
    response = openai.chat.completions.create(
        model = "gpt-4o-mini",
        messages = create_prompt(email_text)
    )
    return response.choices[0].message.content


In [21]:
# Step 4: print the result

print(call_api(messages[0]))

Reminder: Presentation Deadline for Client X by Friday


In [22]:
print(call_api(messages[1]))

"You're Invited: Celebrate Our 10th Anniversary!"


## An extra exercise for those who enjoy web scraping

If you try `display_summary("https://openai.com")`, it may fail or miss much of the page. That's because OpenAI has a fancy website that uses JavaScript. There are many ways around this that some of you might be familiar with. For example, Selenium is a hugely popular framework that runs a browser behind the scenes, renders the page, and allows you to query it. If you have experience with Selenium, Playwright or similar, then feel free to improve the Website class to use them. In the community-contributions folder, you'll find an example Selenium solution from a student (thank you!).

In [23]:
display_summary("https://openai.com")

# OpenAI Website Summary

OpenAI's website provides a comprehensive overview of the company's products, research initiatives, and solutions in artificial intelligence. It features sections dedicated to:

- **ChatGPT:** A conversational AI platform, including new updates for business plans with enhanced security and integration options.
- **Sora:** A platform with various features aimed at leveraging AI capabilities.
- **API Platform:** Information on API offerings, documentation, and pricing for developers.
- **Research:** Insights into ongoing and published research, including advancements in AI that are relevant to safety and healthcare.

## Key Announcements

1. **Updates to ChatGPT Business Plans:** New features include internal tools integration and flexible pricing models.
   
2. **Introduction of ChatGPT Agent:** A newly introduced product aimed at enhancing user interaction with AI.

3. **OpenAI DevDay 2025:** An announced event focused on showcasing developments in AI technology.

4. **Collaborations and Projects:**
   - A partnership with Oracle to enhance Stargate's capabilities.
   - Engaging with Mattel to develop AI applications for their brands.
   - A project that aims to create a pioneering AI clinical copilot in conjunction with Penda Health.

5. **Safety and Security Initiatives:**
   - Addressing AI risks in various domains, including biology and security.
   - Committing to responsible disclosure practices to bolster user privacy.

The site also contains a variety of stories and research updates, underpinning OpenAI's commitment to advancing AI research while ensuring ethical implications are considered.

In [24]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import time

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using Selenium and BeautifulSoup
        """
        self.url = url

        options = Options()
        options.add_argument("--headless")
        options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36")

        driver = webdriver.Chrome(options=options)
        try:
            driver.get(url)
            time.sleep(2)  # crude fixed wait to give JavaScript time to render
            soup = BeautifulSoup(driver.page_source, 'html.parser')
        finally:
            driver.quit()  # always close the browser, even if loading fails

        self.title = soup.title.string if soup.title else "No title found"

        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = "No body content found"


In [25]:
display_summary("https://openai.com")

# OpenAI Website Summary

The OpenAI website showcases its advancements in artificial intelligence, including multiple products like ChatGPT and Sora. It features an API platform, research initiatives, corporate information, and various solutions for businesses. 

## Key Features:
- **Product Offerings**: Details on ChatGPT, Sora, and the API platform with pricing information.
- **Research**: Information on ongoing advancements and publications, highlighting notable projects and initiatives.
  
## Latest News and Announcements:
- **OpenAI DevDay 2025**: Announcement to present new developments (Date: upcoming event in 2025).
- **Letter from Leadership**: Insights from Sam and Jony about the company's direction.
- **Recent Collaborations**:
  - **Stargate and Oracle**: Partnership announcement aimed at utilizing AI capabilities (Date: July 22, 2025).
  - **AI in Biology**: Preparation for addressing AI-related risks in the biological field (Date: June 18, 2025).
  - **Mattel Collaboration**: Partnership to integrate AI into Mattel’s products (Date: June 12, 2025).
  - **Security Enhancements**: Commitment to responsible disclosure in AI security (Date: June 9, 2025).
  
This website emphasizes OpenAI's mission to empower through AI while ensuring safety, security, and transparency across all its offerings.