# Summarize a webpage using an open-source model running locally via Ollama rather than OpenAI

**Benefits:**
1. No API charges - open-source
2. Data doesn't leave your box

**Disadvantages:**
1. Significantly less powerful than a frontier model

## Recap on installation of Ollama

Simply visit [ollama.com](https://ollama.com) and install!

Once complete, the Ollama server should already be running locally.  
If you visit:  
[http://localhost:11434/](http://localhost:11434/)

You should see the message `Ollama is running`.  

If not, open a new Terminal (Mac) or PowerShell (Windows) and enter `ollama serve`.  
Then try [http://localhost:11434/](http://localhost:11434/) again.
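
The same check can be scripted. The snippet below is a minimal sketch that assumes the default address shown above; the helper name `ollama_is_running` is hypothetical, and it uses the standard library's `urllib` so it runs before any packages are installed.

```python
import http.client
import urllib.request

OLLAMA_URL = "http://localhost:11434/"  # default address from the text above


def ollama_is_running(url: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if the server at `url` answers with 'Ollama is running'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or a malformed reply
        return False
    return "Ollama is running" in body


print("Ollama reachable:", ollama_is_running())
```

If this prints `False`, start the server with `ollama serve` and run the check again.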

In [1]:
# imports

import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
import ollama

In [2]:
# Constants

MODEL = "llama3.2"

In [3]:
# A class to represent a Webpage

class Website:
    """
    A utility class to represent a Website that we have scraped
    """
    url: str
    title: str
    text: str

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
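        # A timeout stops the notebook from hanging on an unresponsive site
        response = requests.get(url, timeout=10)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            # Drop elements that carry no readable text before extracting it
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            # Some responses have no <body>; avoid an AttributeError on None
            self.text = ""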

In [5]:
# Let's try one out

ws = Website("https://huggingface.co/")
print(ws.title)
print(ws.text)

Hugging Face – The AI community building the future.
Hugging Face
Models
Datasets
Spaces
Community
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ models
Trending on
this week
Models
mistralai/Magistral-Small-2506
Updated
3 days ago
•
7.67k
•
387
nanonets/Nanonets-OCR-s
Updated
about 7 hours ago
•
1.81k
•
194
openbmb/MiniCPM4-8B
Updated
1 day ago
•
4.55k
•
234
echo840/MonkeyOCR
Updated
2 days ago
•
191
deepseek-ai/DeepSeek-R1-0528
Updated
16 days ago
•
120k
•
1.97k
Browse 1M+ models
Spaces
Running
8.03k
8.03k
DeepSite
🐳
Generate any application with DeepSeek
Running
on
Zero
1.06k
1.06k
Chatterbox TTS
🍿
Expressive Zeroshot TTS
Running
on
Zero
683
683
Wan2.1 Fast
🎥
Generate smooth animations from images
Running
170
170
Sheets
🗂
Convert ideas into structured datasets
Running
133
133
AI Marketing Content Generator
🎨
An AI-powered t

## Types of prompts

Models like GPT-4o have been trained to receive instructions in a particular way.

They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt** -- the conversation starter that they should reply to

In [6]:
# Define our system prompt 

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [7]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}\n"
    user_prompt += "The contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

In [16]:
print(user_prompt_for(ws))

You are looking at a website titled Hugging Face – The AI community building the future.
The contents of this website are as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.

Hugging Face
Models
Datasets
Spaces
Community
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ models
Trending on
this week
Models
mistralai/Magistral-Small-2506
Updated
3 days ago
•
7.67k
•
387
nanonets/Nanonets-OCR-s
Updated
about 7 hours ago
•
1.81k
•
194
openbmb/MiniCPM4-8B
Updated
1 day ago
•
4.55k
•
234
echo840/MonkeyOCR
Updated
2 days ago
•
191
deepseek-ai/DeepSeek-R1-0528
Updated
16 days ago
•
120k
•
1.97k
Browse 1M+ models
Spaces
Running
8.03k
8.03k
DeepSite
🐳
Generate any application with DeepSeek
Running
on
Zero
1.06k
1.06k
Chatterbox TTS
🍿
Expressive Zeroshot TTS
R
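
As the output above shows, the user prompt carries the full page text, which can be long. A long page can exceed the context length a local model is run with, so one option is to cap the text before building the prompt. This is a sketch only: `truncate_text`, the `[truncated]` marker, and the 8,000-character default are illustrative assumptions, not part of the notebook or of Ollama.

```python
def truncate_text(text: str, max_chars: int = 8000) -> str:
    """Keep about the first `max_chars` characters, then append a marker."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    # Back up to the last complete line so the text does not end mid-word
    last_newline = cut.rfind("\n")
    if last_newline > 0:
        cut = cut[:last_newline]
    return cut + "\n[truncated]"


sample = "\n".join(f"line {i}" for i in range(1, 6))
print(truncate_text(sample, max_chars=15))
```

In the notebook this could be applied inside `user_prompt_for` as `user_prompt += truncate_text(website.text)`.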

## Messages

The API from Ollama expects the same message format as OpenAI:

```
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```

In [8]:
# See how this function creates exactly the format above

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
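
Because the format is just a list of dictionaries, it is easy to sanity-check before sending. The helper below is a hypothetical sketch: the name `check_messages` and the specific checks are assumptions for illustration, not something Ollama requires you to run.

```python
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def check_messages(messages: list) -> None:
    """Raise ValueError if `messages` is not in the role/content chat format."""
    if not messages:
        raise ValueError("messages must not be empty")
    for i, message in enumerate(messages):
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"message {i} has an unknown role: {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            raise ValueError(f"message {i} needs a string 'content' field")


example = [
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"},
]
check_messages(example)  # passes silently when the structure is valid
```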

## Time to bring it together - now with Ollama instead of OpenAI

In [9]:
# And now: call the Ollama function instead of OpenAI

def summarize(url):
    website = Website(url)
    messages = messages_for(website)
    response = ollama.chat(model=MODEL, messages=messages)
    return response['message']['content']
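
The `ollama` package talks to the local server over HTTP. As a sketch of what the call amounts to, the snippet below builds the JSON body for the server's `/api/chat` endpoint and posts it with the standard library. The helper names `build_chat_payload` and `chat_via_rest` are hypothetical, and the server is assumed to be running on the default port from the installation recap.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_payload(model: str, messages: list) -> bytes:
    """Encode a non-streaming chat request as JSON bytes."""
    body = {"model": model, "messages": messages, "stream": False}
    return json.dumps(body).encode("utf-8")


def chat_via_rest(model: str, messages: list, timeout: float = 120.0) -> str:
    """POST the messages to the local Ollama server and return the reply text."""
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=build_chat_payload(model, messages),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.loads(response.read())
    # With "stream": False the server sends one JSON object containing the reply
    return reply["message"]["content"]
```

With the server running, `chat_via_rest(MODEL, messages_for(website))` should return the same kind of text as the `ollama.chat` call above.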

In [10]:
summarize("https://huggingface.co/")



In [11]:
# A function to display this nicely in the Jupyter output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [12]:
display_summary("https://huggingface.co/")

### Hugging Face Website Summary

#### Mission and Overview
Hugging Face is the AI community building the future, providing a platform for machine learning collaboration on models, datasets, and applications.

#### Featured Models and Datasets

* **Models**: Showcase 1M+ models, with recent updates to Magistral-Small-2506, Nanonets-OCR-s, MiniCPM4-8B, MonkeyOCR, and DeepSeek-R1-0528.
* **Datasets**: Browse over 250k datasets, including updated releases from NVIDIA, fka/awesome-chatgpt-prompts, open-thoughts/OpenThoughts3-1.2M, and more.

#### Community and Features

* Explore AI Apps: A platform for collaborative work on machine learning applications.
* Host and collaborate on unlimited public models, datasets, and applications using the HF Open source stack.
* Build your portfolio by sharing your ML work with the world and building your ML profile.
* Access Compute solutions starting at $0.60/hour for GPU, Enterprise solutions starting at $20/user/month, and more.

#### Partnerships

* Featured partnerships with leading companies: AI2 (enterprise), AI at Meta (company), Amazon (company), Google (company), Intel (company), Microsoft (company), Grammarly (Enterprise company), Writer (Enterprise company).

### Recent Updates
Recent updates include new models and datasets, as well as announcements of Compute and Enterprise solutions.

* **Compute Solutions**: Optimized Inference Endpoints for deployment on GPU.
* **Enterprise Solutions**: Advanced platform with enterprise-grade security, access controls, and dedicated support starting at $20/user/month.

Note that this will only work on websites that can be scraped using this simplistic approach.

Websites that are rendered with JavaScript, like React apps, won't show up. See the community-contributions folder for a Selenium implementation that gets around this. You'll need to read up on installing Selenium (ask ChatGPT).

Also, websites protected by CloudFront (and similar services) may return 403 errors.

Many websites will work just fine!
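
For the 403 case, one thing that sometimes helps is sending a browser-like `User-Agent` header, since some sites reject the default one sent by HTTP libraries. This is a hedged sketch: the header value and the helpers `build_request` and `fetch_html` are assumptions for illustration, it will not get past every protection, and it should only be used where scraping is permitted. It uses `urllib` to stay dependency-free; with `requests`, the same dictionary can be passed as `requests.get(url, headers=HEADERS)`.

```python
import urllib.error
import urllib.request
from typing import Optional

# Illustrative browser-like value; any realistic User-Agent string could be used
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
}


def build_request(url: str) -> urllib.request.Request:
    """Create a GET request that carries the browser-like headers."""
    return urllib.request.Request(url, headers=HEADERS)


def fetch_html(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the page HTML, or None if the site refuses or cannot be reached."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        # Raised for 4xx/5xx responses, including 403 Forbidden
        print(f"{url} returned HTTP {error.code}")
    except OSError as error:
        # Covers connection failures and timeouts
        print(f"Could not reach {url}: {error}")
    return None
```

A `None` result is a signal to skip the page or fall back to the Selenium approach mentioned above.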