# Webpage Summarizer with Ollama and Llama 3.2

This Jupyter notebook demonstrates a simple workflow for summarizing a webpage with a Large Language Model (LLM) that runs locally through Ollama.

**The process is as follows:**
1.  Input a URL of a webpage.
2.  The notebook fetches the HTML content of the page.
3.  It uses the `BeautifulSoup` library to parse the HTML and extract the text of the page body, removing irrelevant elements such as scripts, styles, images, and form inputs.
4.  It constructs a prompt containing the extracted text.
5.  It sends this prompt to a locally running Ollama model (`llama3.2`) to generate a concise summary.
6.  The final summary is displayed in a clean, readable format.

## Recap on installation of Ollama

Visit [ollama.com](https://ollama.com), download Ollama, and install it.

Once the installation completes, the Ollama server should already be running locally.  
If you visit:  
[http://localhost:11434/](http://localhost:11434/)

You should see the message `Ollama is running`.  

If not, open a new Terminal (Mac) or PowerShell (Windows) and enter `ollama serve`.  
Then, in another Terminal (Mac) or PowerShell (Windows), enter `ollama pull llama3.2`.  
Then try [http://localhost:11434/](http://localhost:11434/) again.
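
The same check can be scripted from Python. The sketch below is a minimal example that uses only the standard library; the function name `ollama_is_running` and the two-second timeout are illustrative assumptions, and the text it looks for is the `Ollama is running` message described above.

```python
import urllib.request


def ollama_is_running(base_url="http://localhost:11434/", timeout=2.0):
    """Return True if the server at base_url answers with 'Ollama is running'."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except OSError:
        # Covers connection refused, timeouts, DNS failures, and HTTP error statuses
        return False
    return "Ollama is running" in body
```

Calling `ollama_is_running()` before the first summary gives an early, readable signal that the server still needs to be started with `ollama serve`.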

If Ollama is slow on your machine, try the smaller `llama3.2:1b` model instead. Run `ollama pull llama3.2:1b` from a Terminal or PowerShell, and change the code below from `MODEL = "llama3.2"` to `MODEL = "llama3.2:1b"`.

### Importing Necessary Libraries

In [None]:
import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
import ollama

In [None]:
# Constants
MODEL = "llama3.2"

### Fetching and Cleaning Webpage Content

In [None]:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
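        # Fail fast on network stalls and HTTP errors instead of parsing an error page
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title and soup.title.string else "No title found"
        # Some documents have no <body>; fall back to the whole document
        body = soup.body or soup
        for irrelevant in body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        self.text = body.get_text(separator="\n", strip=True)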
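
To see what the cleaning step does without making a network request, the sketch below applies the same `BeautifulSoup` calls to a small inline HTML string. `SAMPLE_HTML` and `extract_title_and_text` are hypothetical names introduced only for this illustration.

```python
from bs4 import BeautifulSoup

# A tiny made-up page containing the kinds of elements the class removes
SAMPLE_HTML = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <h1>Hello</h1>
    <script>console.log("tracking");</script>
    <p>First paragraph.</p>
    <img src="banner.png" alt="banner">
    <input type="text" value="search">
  </body>
</html>
"""


def extract_title_and_text(html):
    """Parse an HTML string and return (title, visible body text)."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.string if soup.title else "No title found"
    # Drop elements that carry no readable text for a summary
    for irrelevant in soup.body(["script", "style", "img", "input"]):
        irrelevant.decompose()
    text = soup.body.get_text(separator="\n", strip=True)
    return title, text


title, text = extract_title_and_text(SAMPLE_HTML)
print(title)
print(text)
```

This prints `Demo page`, then `Hello` and `First paragraph.` on separate lines: the script, image, and input elements are gone, and `strip=True` discards the whitespace between tags.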

In [None]:
# Test that BeautifulSoup and the Website class are working.
# Change the URL and add print statements to follow along.

website = Website("https://www.cnn.com/")
print(website.title)
print(website.text)

### Summarizing the Content with Ollama

In [None]:
# Define our system prompt

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [None]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

In [None]:
def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
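
The message list can be inspected offline before any model call. The following sketch assumes a stand-in object with only `title` and `text` attributes in place of a real `Website`; `fake_site`, `build_messages`, and the shortened prompt wording are hypothetical and exist only to show the two-message structure.

```python
from types import SimpleNamespace

# Stand-in for a Website instance; only the attributes the prompt uses
fake_site = SimpleNamespace(title="Demo page", text="Hello\nFirst paragraph.")


def build_messages(system_prompt, website):
    """Return the chat list: system instructions first, then the page content."""
    user_prompt = (
        f"You are looking at a website titled {website.title}\n"
        "Please provide a short summary of this website in markdown.\n\n"
        f"{website.text}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages("Summarize the page in markdown.", fake_site)
for message in messages:
    # Show each role and how much text it carries
    print(message["role"], "->", len(message["content"]), "characters")
```

The system message sets the behavior once, while the user message carries the page title and text, so only the user message changes from one URL to the next.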

In [None]:
def summarize(url):
    website = Website(url)
    response = ollama.chat(
        model=MODEL,
        messages=messages_for(website)
    )
    # The generated summary is the content of the reply message
    return response.message.content
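
For readers curious about what travels to the local server, the sketch below builds an equivalent request by hand against Ollama's `/api/chat` HTTP endpoint on the `localhost:11434` address used earlier. It is an illustration, not a replacement for `ollama.chat`: the helper names `build_chat_request` and `chat_over_http` and the 120-second timeout are assumptions, and `"stream": False` asks for a single JSON reply instead of a stream of chunks.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_request(model, messages, url=OLLAMA_CHAT_URL):
    """Build (without sending) a POST request for the chat endpoint."""
    payload = {"model": model, "messages": messages, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat_over_http(model, messages, timeout=120):
    """Send the request to a running Ollama server and return the reply text."""
    request = build_chat_request(model, messages)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.loads(response.read().decode("utf-8"))
    return reply["message"]["content"]
```

With the server running, `chat_over_http(MODEL, messages_for(website))` would be expected to return the same kind of markdown summary as the `ollama.chat` call.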

In [None]:
def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

### Run the Summarizer!

In [None]:
# Change the URL below to summarize any webpage you want.
display_summary("https://cnn.com")