# EXERCISE SOLUTION

Upgrade the day 1 webpage-summarization project to use an open-source model running locally via Ollama rather than OpenAI.

You'll be able to use this technique for all subsequent projects if you'd prefer not to use paid APIs.

**Benefits:**
1. No API charges - open-source
2. Data doesn't leave your box

**Disadvantages:**
1. Significantly less powerful than a frontier model

## Recap on installation of Ollama

Simply visit [ollama.com](https://ollama.com) and install!

Once complete, the ollama server should already be running locally.  
If you visit:  
[http://localhost:11434/](http://localhost:11434/)

You should see the message `Ollama is running`.  

If not, bring up a new Terminal (Mac) or PowerShell (Windows) and enter `ollama serve`  
Then try [http://localhost:11434/](http://localhost:11434/) again.
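
If you'd rather check from Python than from the browser, here is a minimal sketch. It assumes the default address shown above and uses only the standard library; the helper name `ollama_is_running` is made up for this example, and proxies are bypassed on the assumption that the server is local.

```python
import http.client
import urllib.error
import urllib.request


def ollama_is_running(base_url="http://localhost:11434/", timeout=2.0):
    """Return True if the server at base_url replies with the 'Ollama is running' banner."""
    # Assumption: the server is local, so skip any proxy configured in the environment
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base_url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, malformed reply, etc.: treat as "not running"
        return False
    return "Ollama is running" in body
```

Calling `ollama_is_running()` should return `True` once the server is up; if it returns `False`, try `ollama serve` as described above.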

In [1]:
# imports

import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
import ollama

In [None]:
# Constants

MODEL = "gpt-oss:20b"
# MODEL = "gpt-oss:120b"

In [4]:
# A class to represent a Webpage

class Website:
    """
    A utility class to represent a Website that we have scraped
    """
    url: str
    title: str
    text: str

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
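        # Some pages have no <body> (or fail to parse); guard against soup.body being None
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""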

In [6]:
# Let's try one out

ed = Website("https://edwarddonner.com")
print(ed.title)
print(ed.text)

Home - Edward Donner
Home
Connect Four
Outsmart
An arena that pits LLMs against each other in a battle of diplomacy and deviousness
About
Posts
Well, hi there.
I’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (
very
amateur) and losing myself in
Hacker News
, nodding my head sagely to things I only half understand.
I’m the co-founder and CTO of
Nebula.io
. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI startup untapt,
acquired in 2021
.
We work with groundbreaking, proprietary LLMs verticalized for talent, we’ve
patented
our matching model, and our award-winning platform has happy customers and tons of press coverage.
Connec

## Types of prompts

You may know this already - but if not, you will get very familiar with it!

Models like GPT-4o have been trained to receive instructions in a particular way.

They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt** -- the conversation starter that they should reply to

In [7]:
# Define our system prompt - you can experiment with this later, changing the last sentence to "Respond in markdown in Spanish."

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [8]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

## Messages

The API from Ollama expects the same message format as OpenAI:

```
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```

In [9]:
# See how this function creates exactly the format above

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

## Time to bring it together - now with Ollama instead of OpenAI

In [10]:
# And now: call the Ollama function instead of OpenAI

def summarize(url):
    website = Website(url)
    messages = messages_for(website)
    response = ollama.chat(model=MODEL, messages=messages)
    return response['message']['content']
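
Under the hood, `ollama.chat` talks to the local server over HTTP. As a rough sketch of what that looks like, the block below posts the same messages list to the server's `/api/chat` endpoint using only the standard library. The helper names `build_chat_payload` and `chat_via_http` are made up for this example, and it assumes the default local address and a non-streaming reply.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_payload(model, messages):
    """Build the JSON body for a single, non-streaming chat request."""
    return {"model": model, "messages": messages, "stream": False}


def chat_via_http(model, messages, url=OLLAMA_CHAT_URL, timeout=300):
    """POST the messages to the Ollama server and return the reply text."""
    data = json.dumps(build_chat_payload(model, messages)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Assumption: the server is local, so skip any proxy configured in the environment
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=timeout) as response:
        reply = json.loads(response.read().decode("utf-8"))
    # The reply carries the assistant message under the "message" key
    return reply["message"]["content"]
```

With the server running, `chat_via_http(MODEL, messages_for(website))` should give the same kind of result as the `ollama.chat` call above. The generous timeout is there because local models can take a while to answer.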

In [11]:
summarize("https://edwarddonner.com")

'# Edward Donner – Personal & Professional Profile\n\n- **Who is he?**  \n  - Co‑founder & CTO of **Nebula.io** – an AI talent‑matching platform that leverages proprietary, verticalized LLMs.  \n  - Former founder & CEO of AI startup **untapt** (acquired 2021).  \n  - Passionate about coding, LLM experimentation, electronic music, and occasionally DJing.  \n  - Active on Hacker News and enjoys keeping up with tech trends.\n\n- **What does he offer?**  \n  - **LLM & AI learning courses** (e.g., “Connecting my courses – become an LLM expert and leader”, “The Complete Agentic AI Engineering Course”).  \n  - **Hands‑on workshops** for building AI agents (“LLM Workshop – Hands‑on with Agents – resources”).  \n  - **Executive briefings** on AI strategy (“2025 AI Executive Briefing”).  \n  - A creative AI game called **Outsmart** – a “battle of diplomacy and deviousness” between LLMs.  \n  - Networking and collaboration opportunities through a contact form.\n\n- **Recent News / Announcements*

In [12]:
# A function to display this nicely in the Jupyter output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [13]:
display_summary("https://edwarddonner.com")

## Summary of **edwarddonner.com**

| Section | Key Points |
|---------|------------|
| **Overview** | Personal blog & portfolio site run by Ed Donner, a software developer, LLM enthusiast, and AI entrepreneur. |
| **Core Themes** | - Experimentation with large language models (LLMs) <br> - Development of AI products for talent discovery (Nebula.io) <br> - Interest in music production and DJing <br> - Frequent engagement with the Hacker News community |
| **Projects & Tools** | - **Connect Four** – A showcase or interactive feature (likely a game). <br> - **Outsmart** – An LLM arena where models battle in diplomacy and deviousness. |
| **About** | - Co‑founder & CTO of **Nebula.io** (AI‑powered talent management platform). <br> - Former founder & CEO of **untapt**, acquired in 2021. <br> - Holds patents on talent‑matching models and has received press coverage. |
| **Recent Posts (Announcements)** | • **May 28, 2025** – “Connecting my courses – become an LLM expert and leader” <br> • **May 18, 2025** – “2025 AI Executive Briefing” <br> • **April 21, 2025** – “The Complete Agentic AI Engineering Course” <br> • **January 23, 2025** – “LLM Workshop – Hands‑on with Agents – resources” |
| **Contact** | Email: `ed@edwarddonner.com` <br> Website: `www.edwarddonner.com` <br> Socials: LinkedIn, Twitter, Facebook |
| **Miscellaneous** | Newsletter signup, navigation links to Home, Connect Four, Outsmart, About, Posts, and a “Get in touch” page. |

**Takeaway:** The site functions as a personal hub for Ed Donner’s professional activities, AI research, and community engagement, with a focus on LLM development, talent‑AI products, and educational content.

# Let's try more websites

Note that this will only work on websites that can be scraped using this simplistic approach.

Websites that are rendered with JavaScript, like React apps, won't show up. See the community-contributions folder for a Selenium implementation that gets around this. You'll need to read up on installing Selenium (ask ChatGPT!).

Also, websites protected with CloudFront (and similar) may give 403 errors - many thanks to Andy J for pointing this out.

But many websites will work just fine!
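
For the 403 errors mentioned above, one thing worth trying is sending a browser-like `User-Agent` header, since some sites reject the default one sent by Python clients. This is only a sketch: the header string and the helper name `fetch_html` are illustrative assumptions, it uses the standard library rather than `requests`, and it won't get past real bot protection.

```python
import urllib.error
import urllib.request

# Illustrative browser-like header; the exact string is an assumption, not a requirement
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
    )
}


def fetch_html(url, headers=None, timeout=10):
    """Return (status_code, html_text); HTTP errors such as 403 are reported, not raised."""
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            html = response.read().decode("utf-8", errors="replace")
            return response.status, html
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status (403, 404, ...)
        return error.code, ""
```

If `fetch_html(url)` returns 403 but `fetch_html(url, headers=BROWSER_HEADERS)` returns 200, the header was the problem. In the `Website` class, the equivalent change would be `requests.get(url, headers=BROWSER_HEADERS)`.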

In [14]:
display_summary("https://cnn.com")

**CNN – Breaking News, Latest News & Videos**  
A CNN page that aggregates the day’s most pressing stories, opinion pieces, and video clips, with a strong focus on current events and live coverage.  

### Core News Highlights
| Category | Key Stories |
|----------|-------------|
| **U.S. Politics** | • *Ghislaine Maxwell DOJ interview* – transcript and analysis of Maxwell’s statements about Epstein and her remarks on Donald Trump.  <br>• *Menendez brothers* – latest updates on the federal case, including denial of parole and legal proceedings.  <br>• *Trump’s latest warnings* on mail‑in voting and pressure on federal officials. |
| **International** | • *Gaza City famine* – UN‑backed warning that a “man‑made famine” is underway amid the Israel‑Hamas conflict.  <br>• *Ukraine war* – analysis of Russia‑Ukraine dynamics, including Putin‑Trump diplomatic dead‑ends, underground trench warfare, and U.S. diplomatic pressure. |
| **Local & Domestic** | • *New York bus crash* – 5 dead, dozens injured after a 52‑passenger tour bus crashed in NY state.  <br>• *India’s stray dogs* – legal battle that led to one million stray dogs in Delhi regaining roaming rights. |
| **Special Features** | • *Sports gambling addiction* – investigative report on the growing crisis in the U.S.  <br>• *Ukraine trench video* – footage of Ukrainian troops living underground and defensive tactics. |
| **Science & Culture** | • New supernova discovery, rare deep‑sea creatures livestream, and climate‑related stories (e.g., venomous sea slugs shutting down Spanish beaches). |

### Supplementary Content
* **Video & Audio** – Clips from CNN correspondents on the frontlines of Russia and Ukraine, Trump’s press briefings, and feature stories on science and culture.  
* **Interactive Elements** – Live polls, weather updates, and a “CNN10” 10‑minute recap of global headlines.  
* **User Engagement** – Feedback forms for ad and video performance, subscription prompts, and personalized article recommendations (ML‑driven “violet” section).

### Design Notes
* Navigation bars list all U.S., World, Business, Health, Entertainment, and Sports categories.  
* The page contains a mix of headline stories, analysis, and multimedia, with “top numbered” cards that populate based on user preferences.

Overall, the site functions as a comprehensive news hub, blending hard news, investigative pieces, and multimedia coverage to keep readers informed on U.S. and global events.

In [15]:
display_summary("https://anthropic.com")

## Anthropic (Claude.ai) – Home Page Overview

**Anthropic’s website is a comprehensive hub for its flagship AI product, Claude, and the broader ecosystem surrounding it.**  

- **Core Product**  
  - **Claude AI** – A family of large‑language models (Claude Opus 4.1, Sonnet 4, Haiku 3.5).  
  - **Claude Code** – Tailored for coding and software development.  
  - **API & Build** – Documentation, tutorials, and the console for creating custom applications.  
  - **Plans** – Max, Team, and Enterprise tiers with pricing options.

- **Recent Announcements & News**  
  - **Aug 12 2025** – *Claude Sonnet 4 with 1 M‑token context*.  
  - **Aug 5 2025** – *Project Vend* (internal policy/initiative).  
  - **Jun 26 2025** – *Agentic Misalignment* policy update.  
  - **Jun 20 2025** – *Alignment* research/initiative release.  
  - **May 22 2025** – *Introducing Claude 4* (new model launch).  
  - **Mar 27 2025** – *Anthropic Economic Index* and *Societal impacts* studies.

- **Safety & Governance**  
  - Dedicated sections on **AI safety**, **responsible scaling policy**, **transparency**, and a **trust center**.  
  - Highlights ISO 42001 certification for quality management.

- **Learning & Community**  
  - **Anthropic Academy** – Courses on building with Claude.  
  - **Research** – Overview, economic index, model details, and interpretability studies.  
  - **Customer stories & case studies** – Real‑world applications.

- **Business & Support**  
  - **Sales & support** contact portals.  
  - Careers, events, and startup program listings.

- **Technical & Cloud Partnerships**  
  - Integration options for Amazon Bedrock, Google Vertex AI, and other cloud platforms.

- **Footer & Legal**  
  - Standard links to privacy, cookie settings, terms, and status pages.

In short, the site positions Anthropic as a safety‑focused AI provider, offering cutting‑edge models, developer tooling, and a transparent governance framework while continuously updating the community with new releases and policy insights.

# Sharing your code

I'd love it if you share your code afterwards so I can share it with others! You'll notice that some students have already made changes (including a Selenium implementation) which you will find in the community-contributions folder. If you'd like to add your changes to that folder, submit a Pull Request with your new versions in that folder and I'll merge your changes.

If you're not an expert with git (and I am not!) then GPT has given some nice instructions on how to submit a Pull Request. It's a bit of an involved process, but once you've done it once it's pretty clear. As a pro-tip: it's best if you clear the outputs of your Jupyter notebooks (Edit >> Clear outputs of all cells, and then Save) for clean notebooks.

PR instructions courtesy of an AI friend: https://chatgpt.com/share/670145d5-e8a8-8012-8f93-39ee4e248b4c
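
If you'd rather clear notebook outputs from code than from the menu, here is a minimal sketch. It relies on the fact that an `.ipynb` file is JSON in which each code cell has `outputs` and `execution_count` fields; the function names are made up for this example.

```python
import json


def clear_outputs(notebook):
    """Clear outputs and execution counts of all code cells in place; return the notebook dict."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_outputs_in_file(path):
    """Rewrite the .ipynb file at path with all code-cell outputs cleared."""
    with open(path, "r", encoding="utf-8") as f:
        notebook = json.load(f)
    clear_outputs(notebook)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1, ensure_ascii=False)
        f.write("\n")
```

Run `clear_outputs_in_file` on a copy first if you want to keep the outputs around; the file is overwritten in place.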