In [1]:
# imports

import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
import ollama


In [7]:
# Load environment variables from a file called .env

load_dotenv(override=True)
# os.environ["OPENAI_API_KEY"]=os.getenv("OPENAI_API_KEY")

# Constants

MODEL = "llama3.2"

# Let's just make sure the model has been downloaded locally

!ollama pull llama3.2

pulling manifest 
pulling dde5aa3fc5ff... 100% ▕████████████████▏ 2.0 GB                         
pulling 966de95ca8a6... 100% ▕████████████████▏ 1.4 KB                         
pulling fcc5a6bec9da... 100% ▕████████████████▏ 7.7 KB                         
pulling a70ff7e570d9... 100% ▕████████████████▏ 6.0 KB                         
pulling 56bb8bd477a5... 100% ▕████████████████▏   96 B                         
pulling 34bb5ab01051... 100% ▕████████████████▏  561 B                         
verifying sha256 digest 
wri

In [16]:
# A class to represent a Webpage
# If you're not familiar with Classes, check out the "Intermediate Python" notebook

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    def __init__(self, url):
        """
        Create this Website object from the given URL using the BeautifulSoup library.
        Includes error handling for network errors and parsing failures.
        """
        self.url = url
        self.title = "No title found"
        self.text = "No content extracted"

        try:
            # Send an HTTP GET request to fetch the webpage
            response = requests.get(url, headers=headers, timeout=10)
            response.raise_for_status()  # Raises an HTTPError for bad responses (4xx and 5xx)

            # Parse the HTML content with BeautifulSoup
            soup = BeautifulSoup(response.content, 'html.parser')

            # Extract the title if available (soup.title.string can be None for an empty or nested <title>)
            self.title = soup.title.string if soup.title and soup.title.string else "No title found"

            # Ensure <body> tag exists before processing text
            if soup.body:
                # Remove unnecessary elements (scripts, styles, images, inputs)
                for irrelevant in soup.body(["script", "style", "img", "input"]):
                    irrelevant.decompose()

                # Extract the cleaned text content
                self.text = soup.body.get_text(separator="\n", strip=True)
            else:
                self.text = "No body content found"

        except requests.exceptions.RequestException as e:
            print(f"Error fetching URL {url}: {e}")
            self.text = f"Error: Unable to retrieve content from {url}"

        except Exception as e:
            print(f"Unexpected error while processing {url}: {e}")
            self.text = "Error: Something went wrong during parsing."





def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt


# Define our system prompt - you can experiment with this later, changing the last sentence to "Respond in markdown in Spanish."

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

# And now: call Ollama's chat function instead of OpenAI

def summarize(url):
    website = Website(url)
    messages = messages_for(website)
    response = ollama.chat(model=MODEL, messages=messages)
    return response['message']['content']


# A function to display this nicely in the Jupyter output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))


In [14]:
display_summary("https://edwarddonner.com")

### Website Summary

#### Overview

The website is hosted by Edward Donner, a co-founder and CTO of Nebula.io, an AI startup that applies machine learning to help people discover their potential. The site features various sections, including about the company, its products, and community engagement.

#### Recent Announcements

* **LLM Workshop - Hands-on with Agents - resources** (January 23, 2025): A workshop providing resources for working with LLMs.
* **Welcome, SuperDataScientists!** (November 13, 2024): An announcement welcoming a new group of professionals in the field of machine learning.
* **Mastering AI and LLM Engineering - Resources** (October 16, 2024): A set of resources for improving skills in AI and LLM engineering.

#### Website Content

The website features various links to projects, including:

* **Connect Four**: A game that can be played online.
* **Outsmart**: An arena where LLMs compete against each other in a battle of diplomacy and deviousness.
* **About**: Information about Edward Donner, the company Nebula.io, and its mission.

#### Contact Information

The website includes contact information for Edward Donner, including an email address and social media links.

In [11]:
display_summary("https://cnn.com")

The article appears to be a news aggregator from CNN, featuring a wide range of topics and stories from around the world. Here are some of the main headlines and summaries:

**World News**

* Over 3,000 killed in fighting in Africa's mineral-rich DRC
* Georgia slides into authoritarianism, protesters vow to keep fighting Russian pivot
* Suspect in Brazil Christmas cake poisonings found dead in prison cell

**Business and Technology**

* Apple picks Alibaba to launch AI features in China
* Elon Musk will withdraw his nearly $100 billion bid for OpenAI if it remains a nonprofit
* Duolingo's cute owl mascot is dead. It is an ex-owl

**Health and Wellness**

* Anxiety can be debilitating. Controlling it starts with this simple step, Martha Beck says
* The secret to great sex isn’t really a secret. It’s asking the right question, 81-year-old sex professor says
* Netflix’s ‘Apple Cider Vinegar’ takes aim at wellness influencers, but there’s another problem

**Entertainment**

* David J. Phillip/AP: Photos of celebrities and events from around the world
* The week in 32 photos

**Sports**

* Football: NFL players protest police brutality on field
* Tennis: Naomi Osaka wins Australian Open final

**Politics**

* Vance turns on European allies in blistering speech that downplayed threats from Russia and China
* Friedrich Merz, Merkel’s rival and German election frontrunner

**Science and Space**

* Stunning photographs capture starling migrations
* Climate: Solutions to address climate change

**Other Topics**

* Fighting in Africa's mineral-rich DRC killed over 3,000 in less than 2 weeks. Here’s how your phone plays a part
* No doctors for sick children. This is the reality of Trump’s aid freeze in remote northern Thailand

In [15]:
display_summary("https://cripsa.com")

Error fetching URL https://cripsa.com: HTTPSConnectionPool(host='cripsa.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x1097f5220>, 'Connection to cripsa.com timed out. (connect timeout=10)'))


### Website Summary

Unfortunately, the requested website `https://cripsa.com` could not be retrieved, likely due to an error in the provided URL.

However, I can provide a general summary of what it would look like if this website had a content section:

#### Featured Content

*   This section could include any high-profile features or announcements made on the website.
*   It might feature links to blog posts, news articles, or press releases that summarize key events or updates.

#### About Us/Our Mission

*   This section would typically provide an overview of the organization's purpose, values, and goals.
*   It could also include a brief history of the company or its founders.

### Note
The lack of content on this website makes it difficult to provide a detailed summary.
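
The last run shows a side effect of the error handling in `Website`: the error string is stored in `text`, so the model is asked to summarize an error message and fills the gap with generic content. One way to avoid this is to check for a failed fetch before calling the model. This is a minimal sketch under stated assumptions: the `Page` class and its `ok` flag are hypothetical (the `Website` class above has no such flag), `summarize_if_fetched` is an illustrative name, and `stub_chat` stands in for the model so the example runs offline.

```python
from dataclasses import dataclass

@dataclass
class Page:
    """Hypothetical stand-in for Website, with an explicit success flag."""
    url: str
    title: str
    text: str
    ok: bool  # assumption: would be set to False in the except branches

def summarize_if_fetched(page, chat):
    """Call the model only when the page was actually retrieved."""
    if not page.ok:
        # Report the failure directly instead of asking the model to summarize it
        return f"Could not retrieve {page.url}; no summary generated."
    messages = [
        {"role": "system", "content": "Summarize this website in markdown."},
        {"role": "user", "content": f"Title: {page.title}\n\n{page.text}"},
    ]
    return chat(messages)

# Stub in place of the model; records each call so we can see when it is used
calls = []
def stub_chat(messages):
    calls.append(messages)
    return "stub summary"

failed = Page("https://cripsa.com", "No title found", "", ok=False)
fetched = Page("https://example.com", "Example", "Hello world", ok=True)

print(summarize_if_fetched(failed, stub_chat))
print(summarize_if_fetched(fetched, stub_chat))
```

In the notebook itself, `chat` could be a small wrapper such as `lambda m: ollama.chat(model=MODEL, messages=m)['message']['content']`, which matches the call already used in `summarize`.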