In [1]:
import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display

In [16]:
import ollama

In [3]:
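class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        # A timeout stops the notebook hanging on an unresponsive server
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # fail loudly on 4xx/5xx instead of summarizing an error page
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            # Remove tags that carry no readable text for the LLM
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            # Some pages (or fragments) have no <body>; avoid an AttributeError
            self.text = ""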
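To see what `Website` does without a network connection or BeautifulSoup, here is a minimal sketch using only the standard library's `html.parser`. This is a swapped-in technique, not the notebook's method, and `TextExtractor` / `extract_title_and_text` are hypothetical names. It follows the same idea: take the `<title>`, drop `<script>` and `<style>` content, and join the remaining body text with newlines. `<img>` and `<input>` need no special handling here because they are void elements with no text content.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the <title> and visible <body> text, skipping <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = None
        self.lines = []
        self._in_title = False
        self._in_body = False
        self._skip_depth = 0  # > 0 while inside a <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()  # mirrors get_text(strip=True)
        if not text:
            return
        if self._in_title and self.title is None:
            self.title = text
        elif self._in_body and not self._skip_depth:
            self.lines.append(text)


def extract_title_and_text(html):
    """Return (title, text) for an HTML string, like Website.title / Website.text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.title or "No title found", "\n".join(parser.lines)


sample = ("<html><head><title>Demo</title></head><body><h1>Hi</h1>"
          "<script>var x=1;</script><p>Hello <b>world</b></p></body></html>")
print(extract_title_and_text(sample))  # ('Demo', 'Hi\nHello\nworld')
```

As with the BeautifulSoup version, a document with no `<body>` yields empty text.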

In [4]:
def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    # Adjacent string literals are joined by Python, avoiding backslash continuations
    user_prompt += (
        "\nThe contents of this website is as follows; "
        "please provide a short summary of this website in markdown. "
        "If it includes news or announcements, then summarize these too.\n\n"
    )
    user_prompt += website.text
    return user_prompt
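Because `user_prompt_for` only reads `.title` and `.text`, it can be checked without fetching anything. A minimal sketch using `types.SimpleNamespace` as a stand-in for `Website`; `fake_site` and its values are made up for illustration, and the function body is repeated so the cell runs on its own.

```python
from types import SimpleNamespace


def user_prompt_for(website):
    # Same logic as the cell above: title line, fixed instructions, then page text
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += (
        "\nThe contents of this website is as follows; "
        "please provide a short summary of this website in markdown. "
        "If it includes news or announcements, then summarize these too.\n\n"
    )
    user_prompt += website.text
    return user_prompt


# Hypothetical stand-in for a fetched Website: any object with .title and .text works
fake_site = SimpleNamespace(title="Demo", text="Hi\nHello world")
prompt = user_prompt_for(fake_site)
print(prompt)
```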

In [5]:
def messages_for(website):
    return [
        {"role": "user", "content": user_prompt_for(website)}
    ]
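`messages_for` sends a single `user` message. Ollama's chat API also accepts a `system` role message, a common place for standing instructions (such as "answer in markdown") so the user message carries only the page content. A minimal sketch; `SYSTEM_PROMPT` and `messages_with_system` are hypothetical, and whether this split improves llama3.2's summaries is not something this notebook tests.

```python
# Hypothetical standing instructions moved out of the user prompt
SYSTEM_PROMPT = (
    "You are an assistant that analyzes the contents of a website "
    "and provides a short summary in markdown."
)


def messages_with_system(user_prompt, system_prompt=SYSTEM_PROMPT):
    """Build a chat message list with the system message first, then the user message."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


msgs = messages_with_system("You are looking at a website titled Demo")
print([m["role"] for m in msgs])  # ['system', 'user']
```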

In [6]:
ed = Website("https://edwarddonner.com")

In [8]:
print(ed.title)

Home - Edward Donner


In [9]:
user_prompt_for(ed)

'You are looking at a website titled Home - Edward Donner\nThe contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.\n\nHome\nOutsmart\nAn arena that pits LLMs against each other in a battle of diplomacy and deviousness\nAbout\nPosts\nWell, hi there.\nI’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (\nvery\namateur) and losing myself in\nHacker News\n, nodding my head sagely to things I only half understand.\nI’m the co-founder and CTO of\nNebula.io\n. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI startup untapt,\n

In [10]:
MODEL = "llama3.2"

In [18]:
def summarize(url):
    website = Website(url)
    # Requires a running local Ollama server with the model pulled (e.g. `ollama pull llama3.2`)
    response = ollama.chat(model=MODEL, messages=messages_for(website))
    return response['message']['content']
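The `ollama` package is a client for the local Ollama server's HTTP API. If you would rather avoid that dependency, the same request can be made with the standard library. A minimal sketch, assuming the server runs at its default address `http://localhost:11434`; `build_chat_request`, `parse_chat_response` and `chat` are hypothetical helper names. Setting `"stream": False` asks the server for a single JSON object instead of a stream of partial chunks.

```python
import json
import urllib.request

# Default local Ollama address (assumption: not overridden via OLLAMA_HOST)
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_request(model, messages, url=OLLAMA_CHAT_URL):
    """Build a non-streaming POST request for Ollama's /api/chat endpoint."""
    payload = {"model": model, "messages": messages, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_chat_response(body):
    """Extract the assistant's text from a non-streaming /api/chat JSON response."""
    return json.loads(body)["message"]["content"]


def chat(model, messages, timeout=120):
    """Send the request and return the reply text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_chat_request(model, messages), timeout=timeout) as resp:
        return parse_chat_response(resp.read())
```

With the server running, `chat(MODEL, messages_for(ed))` should return the same kind of markdown text as `summarize`.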

In [17]:
messages_for(ed)

[{'role': 'user',
  'content': 'You are looking at a website titled Home - Edward Donner\nThe contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.\n\nHome\nOutsmart\nAn arena that pits LLMs against each other in a battle of diplomacy and deviousness\nAbout\nPosts\nWell, hi there.\nI’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (\nvery\namateur) and losing myself in\nHacker News\n, nodding my head sagely to things I only half understand.\nI’m the co-founder and CTO of\nNebula.io\n. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder 

In [19]:
summarize("https://www.edwarddonner.com")

"**Summary of the Website**\n\n### Home\n\n* The website is created by Edward Donner, a co-founder and CTO of Nebula.io, an AI startup that applies AI to help people discover their potential.\n* The website showcases Edward's interests in writing code, experimenting with LLMs, DJing, and electronic music production.\n\n### Outsmart Arena\n* An arena where Large Language Models (LLMs) are pitted against each other in a battle of diplomacy and deviousness.\n* A platform for exploring the capabilities and limitations of LLMs in a competitive setting.\n\n### News/Announcements\n\n#### Recent Posts\n- **Mastering AI and LLM Engineering – Resources** (November 13, 2024): A collection of resources for learning about AI and LLM engineering.\n- **From Software Engineer to AI Data Scientist – resources** (October 16, 2024): Resources for transitioning from a software engineer to an AI data scientist role.\n- **Outsmart LLM Arena – a battle of diplomacy and deviousness** (June 26, 2024): A descri

In [20]:
def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [21]:
display_summary("https://cnn.com")

Here are the top headlines from CNN:

1. **"Durham Family/KVVU": Las Vegas police officer shot and killed man who called 911 for help**
2. **"Passenger was duct-taped after trying to open plane door mid-flight, authorities say."** (Federal charges pending)
3. **"Jairam Dookoo: Woman sneaked onto Paris flight bypassed checkpoints by jumping into different security lines, TSA says."**
4. **"America's biggest private company is laying off thousands of workers."**

These are some of the top headlines from CNN. Let me know if you'd like to see more!