In [1]:
# imports

import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display

# If you get an error running this cell, then please head over to the troubleshooting notebook!

In [2]:
# Constants

OLLAMA_API = "http://localhost:11434/api/chat"
HEADERS = {"Content-Type": "application/json"}
MODEL = "llama3.2"
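
Before sending chat requests, it can help to confirm that something is listening on the Ollama port. The sketch below is one way to do that with only the standard library. The helper name `ollama_is_up` and the choice to probe the server root are assumptions for illustration, not part of the original notebook.

```python
import urllib.request
import urllib.error

def ollama_is_up(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP 200 comes back from base_url, else False.

    Hypothetical helper: it only checks that a server answers, not that
    a particular model has been pulled.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, or a malformed URL
        return False

print("Ollama reachable:", ollama_is_up())
```

If this prints `False`, starting the server (for example with `ollama serve`) is the usual first thing to try.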

In [34]:
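# Browser-like headers for fetching web pages (distinct from HEADERS, which is used for the Ollama API)
headers = {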
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

!ollama pull llama3.2

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
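        if soup.body:
            # Drop elements that carry no readable text before extracting it
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            # Some pages have no <body>; avoid an AttributeError on None
            self.text = ""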

    def user_prompt_for(self):
        user_prompt = f"You are looking at a website titled {self.title}"
        user_prompt += "\nThe contents of this website is as follows; \
                        please provide a short summary of this website in markdown. \
                        If it includes news or announcements, then summarize these too.\n\n"
        user_prompt += self.text
        return user_prompt

    def messages_for(self):
        return [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": self.user_prompt_for()}
        ]

    def generateSummary(self):
        try:
            payload = {
                "model": MODEL,
                "messages": self.messages_for(),
                "stream": False
            }
            response = requests.post(OLLAMA_API, json=payload, headers=HEADERS)
            response_text = response.json()['message']['content']
            return response_text
        except Exception as e:
            # Connection errors, non-JSON bodies and missing keys all land here
            print(f'Error while generating summary: {e}')
            return ""

pulling manifest
pulling dde5aa3fc5ff... 100% ▕████████████████▏ 2.0 GB
pulling 966de95ca8a6... 100% ▕████████████████▏ 1.4 KB
pulling fcc5a6bec9da... 100% ▕████████████████▏ 7.7 KB
pulling a70ff7e570d9... 100% ▕████████████████▏ 6.0 KB
pulling 56bb8bd477a5... 100% ▕█████

In [24]:
# Let's try one out. Change the website and add print statements to follow along.

ed = Website("https://edwarddonner.com")
print(ed.title)
print(ed.text)

Home - Edward Donner
Home
Connect Four
Outsmart
An arena that pits LLMs against each other in a battle of diplomacy and deviousness
About
Posts
Well, hi there.
I’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (
very
amateur) and losing myself in
Hacker News
, nodding my head sagely to things I only half understand.
I’m the co-founder and CTO of
Nebula.io
. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI startup untapt,
acquired in 2021
.
We work with groundbreaking, proprietary LLMs verticalized for talent, we’ve
patented
our matching model, and our award-winning platform has happy customers and tons of press coverage.
Connec
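
The extracted text above puts each link on its own line (for example `Hacker News`). That follows from joining every text node with a newline. The sketch below reproduces the effect on a tiny hand-written page. It deliberately swaps in the standard library's `html.parser` instead of BeautifulSoup so it runs without extra packages. The class `TextExtractor` and function `visible_text` are hypothetical names, and this is an approximation of the cleaning step, not the notebook's actual implementation.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)

def visible_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # One text node per line, similar in spirit to get_text(separator="\n", strip=True)
    return "\n".join(parser.chunks)

sample = (
    "<body><script>var x = 1;</script><h1>Hello</h1>"
    "<p>Some <a href='#'>linked</a> text.</p></body>"
)
print(visible_text(sample))
```

The script body is dropped, and the sentence containing a link is split into three lines because the link text is a separate text node.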

In [25]:
print(ed.user_prompt_for())

You are looking at a website titled Home - Edward Donner
The contents of this website is as follows;                         please provide a short summary of this website in markdown.                         If it includes news or announcements, then summarize these too.

Home
Connect Four
Outsmart
An arena that pits LLMs against each other in a battle of diplomacy and deviousness
About
Posts
Well, hi there.
I’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (
very
amateur) and losing myself in
Hacker News
, nodding my head sagely to things I only half understand.
I’m the co-founder and CTO of
Nebula.io
. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously
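
The long runs of spaces in the prompt above come from how the string is written: a backslash at the end of a line inside a string literal continues the literal, and the indentation of the following line becomes part of the string. The short sketch below contrasts that with adjacent string literals in parentheses, which avoid the extra whitespace. The variable names and the shortened sentences are illustrative only.

```python
# Backslash continuation inside a string literal: the next line's leading
# spaces are kept as part of the string.
with_backslash = "The contents of this website is as follows; \
    please provide a short summary."

# Adjacent string literals inside parentheses are concatenated by Python,
# so the indentation between them never enters the string.
with_concat = (
    "The contents of this website is as follows; "
    "please provide a short summary."
)

print(repr(with_backslash))
print(repr(with_concat))
```

The model usually copes with the extra spaces, but they do consume tokens and make the prompt harder to read when printed.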

In [26]:
# Try this out, and then try for a few more websites

ed.messages_for()

[{'role': 'system',
  'content': 'You are an assistant that analyzes the contents of a website and provides a short summary, ignoring text that might be navigation related. Respond in markdown.'},
 {'role': 'user',
  'content': 'You are looking at a website titled Home - Edward Donner\nThe contents of this website is as follows;                         please provide a short summary of this website in markdown.                         If it includes news or announcements, then summarize these too.\n\nHome\nConnect Four\nOutsmart\nAn arena that pits LLMs against each other in a battle of diplomacy and deviousness\nAbout\nPosts\nWell, hi there.\nI’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (\nvery\namateur) and losing myself in\nHacker News\n, nodding my head sagely to things I only half understand.\nI’m the co-founder and CTO of\nNebula.io\n. 

In [35]:
# And now: call the local Ollama API. You will get very familiar with this!

def summarize(url):
    website = Website(url)
    return website.generateSummary()
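
To make the request and response shapes concrete without needing a running server, here is a small offline sketch. The payload keys (`model`, `messages`, `stream`) and the `['message']['content']` path mirror the code in this notebook. The helper names `build_payload` and `extract_content`, the stand-in response dictionaries, and the `error` key are assumptions for illustration, not recorded output.

```python
import json

def build_payload(model, messages):
    """Assemble a non-streaming chat request body."""
    return {"model": model, "messages": messages, "stream": False}

def extract_content(response_json):
    """Return the assistant's text, or "" when the expected keys are missing."""
    message = response_json.get("message") or {}
    return message.get("content", "")

messages = [
    {"role": "system", "content": "Respond in markdown."},
    {"role": "user", "content": "Summarize: Hello world."},
]
payload = build_payload("llama3.2", messages)
print(json.dumps(payload, indent=2))

# Hand-written stand-ins for what response.json() might return.
fake_ok = {"message": {"role": "assistant", "content": "**Hello**"}, "done": True}
fake_error = {"error": "hypothetical failure message"}

print(extract_content(fake_ok))
print(repr(extract_content(fake_error)))
```

Using `.get` here means a response without a `message` key yields an empty string instead of raising `KeyError`, which may or may not be the behavior you want when debugging.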

In [31]:
summarize("https://edwarddonner.com")

response: **Summary**

This website belongs to Edward Donner, the co-founder and CTO of Nebula.io. The website appears to be a personal blog or portfolio, showcasing his interests in artificial intelligence (AI), machine learning (ML), and related technologies.

### News and Announcements
---------------------------

* **January 23, 2025**: Upcoming LLM Workshop – Hands-on with Agents – resources
* **December 21, 2024**: Welcome to the SuperDataScientists community!
* **November 13, 2024**: Mastering AI and LLM Engineering – Resources
* **October 16, 2024**: From Software Engineer to AI Data Scientist – resources

### Content Overview
-------------------

The website is divided into several sections:

*   Home: Introduction to Edward Donner
*   Connect Four: Not provided in the given text
*   Outsmart: An arena that pits LLMs against each other in a battle of diplomacy and deviousness
*   About: Posts, contact information, and social media links

Overall, this website seems to focus on



In [36]:
# A function to display this nicely in the Jupyter output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [37]:
display_summary("https://edwarddonner.com")

### Summary of Website

This website appears to be the personal blog of Edward Donner, a technology entrepreneur and AI researcher. The main sections are:

#### About Me
Edward introduces himself as the co-founder and CTO of Nebula.io, an AI startup that applies machine learning (ML) to help people discover their potential.

#### News and Announcements
The website mentions three recent announcements:
* **LLM Workshop – Hands-on with Agents**: A workshop event where participants can learn about agents in LLMs.
* **December 21, 2024: Welcome, SuperDataScientists!** (no further details available)
* **October 16, 2024: From Software Engineer to AI Data Scientist – resources** (link to a resource page)

#### Projects and Ventures
Edward is involved with several projects, including:
* Nebula.io: An AI startup that applies ML to talent acquisition and management.
* untapt: An AI startup acquired by another company in 2021.

### Links

* Contact Information: ed [@] edwarddonner [dot] com
* Social Media:
	+ LinkedIn
	+ Twitter
	+ Facebook

In [38]:
display_summary("https://cnn.com")

This is a CNN news feed, and it appears to be a collection of various articles, videos, and podcasts related to current events, politics, science, technology, and more. Here are some of the top stories and topics that caught my attention:

**Politics and World News**

* Ukraine-Russia War: Tensions remain high between Ukraine and Russia, with both sides accusing each other of aggression.
* Israel-Hamas War: The conflict in Gaza continues to escalate, with Palestinian militants firing rockets at Israeli cities.
* US Politics: President Biden's approval ratings are sinking, according to a new poll.
* Climate Change: The world is facing a climate crisis, and scientists are warning that urgent action is needed to mitigate its effects.

**Science and Technology**

* AI and Machine Learning: Tech companies are developing more sophisticated AI algorithms, raising concerns about job displacement and bias.
* Space Exploration: NASA's Artemis program aims to return humans to the moon by 2025.
* Cybersecurity: Hackers are targeting small businesses and individuals with increasingly sophisticated attacks.

**Entertainment**

* Music: Drake dropped a new album, and fans are divided on its quality.
* Movies: The latest Marvel movie has broken box office records, but critics are panning it.
* Celebrity News: Gene Hackman's iconic roles have been remembered in a new retrospective exhibit.

**Business and Economy**

* Finance: Tropicana is in big financial trouble, and the company is struggling to stay afloat.
* Retail: Goya Foods has been embroiled in controversy over its labor practices.
* Investment: Amazon is developing more AI-powered devices, raising concerns about job displacement.

**Health and Wellness**

* Mental Health: Robert F. Kennedy Jr.'s book on mental health has sparked debate among experts.
* Fitness: Regular exercise can reduce the risk of Alzheimer's disease, according to new research.
* Nutrition: Eating plenty of fluids is essential during Ramadan, which begins next month.

These are just a few of the many topics and stories that are currently trending. I hope this gives you an idea of what's happening in the world today!

In [41]:
display_summary("https://anthropic.com")

**Anthropic Website Summary**
==========================

### Overview

Anthropic is an AI safety and research company based in San Francisco, with an interdisciplinary team experienced in ML, physics, policy, and product.

### News and Announcements

* **Claude 3.7 Sonnet**: Our most intelligent AI model is now available.
* **Switch to annual plan**: Unlock big ideas with a year of Claude Pro for $180/year (reg. $216) by February 28, 2025.
* **Alignment research**: Constitutional AI: Harmlessness from AI Feedback and Core Views on AI Safety: When, Why, What, and How.

### Research Insights

Anthropic publishes research insights on AI safety, including alignment, constitutional AI, and other topics related to the field.

### Products and Services

Anthropic offers a range of products and services, including Claude, an AI platform for building custom experiences and applications.

### Company Information

* **About**: Learn more about Anthropic's team, mission, and values.
* **Careers**: See open roles at Anthropic.
* **Press Inquiries**: Contact Anthropic for media inquiries.
* **Support**: Access support resources and status information.