# Web Page Summarization using DeepSeek via Ollama HTTP API

## Introduction

This project demonstrates how to summarize web pages by leveraging a local DeepSeek model through Ollama's HTTP API. The script performs the following tasks:

- **Fetching and Cleaning:** Retrieves the content of a web page and cleans it using BeautifulSoup.
- **Message Construction:** Formats the content into a list of messages using the same `role`/`content` structure as the OpenAI chat API.
- **Local Summarization:** Sends the messages to your local DeepSeek model (named `deepseek-r1:latest`) via an HTTP POST request to generate a summary.
- **Display:** Renders the generated summary using IPython Markdown, making it ideal for interactive use in Jupyter Notebooks.

Because the model runs locally, page content is never sent to a hosted LLM service; the only outbound request is the one that fetches the web page itself.


In [13]:
#!/usr/bin/env python
# coding: utf-8

"""
Web Page Summarization using DeepSeek via Ollama

This script fetches a web page, cleans it up, builds a list of messages, and then
calls your locally running DeepSeek model (via Ollama) to generate a summary.
"""

import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display



In [14]:
# Ollama API endpoint and configuration for your local DeepSeek model
OLLAMA_API = "http://localhost:11434/api/chat"
HEADERS = {"Content-Type": "application/json"}
MODEL = "deepseek-r1:latest"  # Must match a model name shown by `ollama list`
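
Before sending a chat request, it can help to confirm that the Ollama server is reachable and that the configured model has been pulled. The sketch below assumes Ollama's `GET /api/tags` endpoint, which lists local models under a `models` key. The helper names `model_is_listed` and `check_ollama` are hypothetical, and the code uses the standard library's `urllib` so that it runs without extra dependencies.

```python
import json
import urllib.error
import urllib.request

# Assumption: same host and port as OLLAMA_API above
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def model_is_listed(tags_response, model):
    """Return True if `model` appears in a parsed /api/tags response."""
    names = [entry.get("name") for entry in tags_response.get("models", [])]
    return model in names


def check_ollama(model, url=OLLAMA_TAGS_URL, timeout=5):
    """Return True if Ollama is reachable and lists `model`, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            tags = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Server not running, connection refused, or a non-JSON reply
        return False
    return model_is_listed(tags, model)
```

Calling `check_ollama(MODEL)` before `summarize` gives a quick yes/no answer instead of an exception partway through the pipeline.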

In [15]:
# Define a user agent for fetching websites
headers = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/117.0.0.0 Safari/537.36")
}

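class Website:
    def __init__(self, url):
        """
        Initialize the Website object by fetching and cleaning content.
        """
        self.url = url
        # Fail fast on network problems and HTTP errors instead of parsing an error page
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        has_title = soup.title and soup.title.string
        self.title = soup.title.string if has_title else "No title found"
        # Some documents have no <body>; fall back to the whole parse tree
        root = soup.body or soup
        # Remove tags that carry no readable text
        for tag in root(["script", "style", "img", "input"]):
            tag.decompose()
        self.text = root.get_text(separator="\n", strip=True)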

# def messages_for(website):
#     """
#     Build a list of messages (dictionaries) for the DeepSeek model.
#     Customize this function as needed to provide context and instructions to the model.
#     """
#     messages = [
#         {"role": "system", "content": "You are an expert summarizer."},
#         {"role": "user", "content": f"Summarize the following web page content:\nTitle: {website.title}\nContent:\n{website.text}"}
#     ]
#     return messages

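def messages_for(website):
    """
    Build the system and user messages sent to the DeepSeek model.
    Note: deepseek-r1 may still emit a <think>...</think> block despite
    the instruction in the system prompt.
    """
    messages = [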
        {
            "role": "system",
            "content": "You are an expert summarizer. Provide only a concise summary of the content without showing any internal reasoning or chain-of-thought."
        },
        {
            "role": "user",
            "content": f"Summarize the following web page content:\nTitle: {website.title}\nContent:\n{website.text}"
        }
    ]
    return messages
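
`Website.text` holds the full page text, which can be long. The source does not say how long inputs are handled, so the following is only a sketch under the assumption that you want to cap the input before building the messages. `truncate_text` is a hypothetical helper and the 8000-character default is an arbitrary placeholder, not a recommended value.

```python
def truncate_text(text, max_chars=8000):
    """Trim `text` to at most `max_chars`, cutting at the last full line."""
    if len(text) <= max_chars:
        return text
    clipped = text[:max_chars]
    # Prefer to cut at a newline so the final line is not left half-finished
    last_newline = clipped.rfind("\n")
    if last_newline > 0:
        clipped = clipped[:last_newline]
    return clipped
```

One way to use it would be to pass `truncate_text(website.text)` instead of `website.text` when formatting the user message.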

In [16]:
def summarize(url):
    """
    Summarize the web page content by sending a POST request to the Ollama API.
    """
    website = Website(url)
    messages = messages_for(website)
    
    payload = {
        "model": MODEL,
        "messages": messages,
        "stream": False
    }
    
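    try:
        response = requests.post(OLLAMA_API, json=payload, headers=HEADERS)
    except requests.exceptions.ConnectionError:
        # Ollama is not running or not listening on the configured port
        return f"Error: could not connect to Ollama at {OLLAMA_API}"
    if response.status_code == 200:
        summary = response.json()['message']['content']
    else:
        summary = f"Error: {response.status_code} - {response.text}"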
    
    return summary
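
`summarize` sets `"stream": False`, so the call blocks until the whole reply is ready. With `"stream": True`, Ollama is understood to return newline-delimited JSON, where each object carries a fragment in `message.content` and the last object is marked `"done": true`. The sketch below reassembles such a stream. `collect_stream` is a hypothetical helper, and the line format is an assumption worth checking against your Ollama version.

```python
import json


def collect_stream(lines):
    """Join the message fragments from an iterable of NDJSON lines."""
    parts = []
    for raw in lines:
        if not raw:
            continue  # skip blank keep-alive lines
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        chunk = json.loads(raw)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # final object reached; ignore anything after it
    return "".join(parts)
```

With `requests`, the lines could come from `requests.post(OLLAMA_API, json=payload, stream=True).iter_lines()` after setting `payload["stream"] = True`.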



In [17]:
def display_summary(url):
    """
    Display the summary for the given URL using IPython Markdown.
    """
    summary = summarize(url)
    display(Markdown(summary))
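
The sample output in this notebook begins with a `<think>` block even though the system prompt asks the model not to show its reasoning. One possible workaround is to filter that block out before display. The sketch below is a minimal version, assuming the reasoning is always wrapped in literal `<think>` and `</think>` tags; `strip_reasoning` is a hypothetical helper name.

```python
import re

# Matches a complete <think>...</think> block, including newlines
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_reasoning(text):
    """Remove <think>...</think> blocks and surrounding whitespace."""
    cleaned = _THINK_BLOCK.sub("", text)
    # If the reply was cut off before </think>, drop the dangling block too
    unclosed = cleaned.find("<think>")
    if unclosed != -1:
        cleaned = cleaned[:unclosed]
    return cleaned.strip()
```

For example, `display(Markdown(strip_reasoning(summarize(url))))` would render only the text that follows the reasoning block.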



In [18]:
# Example usage:
if __name__ == "__main__":
    # Replace with any URL you want to summarize.
    test_url = "https://opportunitiesforyoungkenyans.co.ke/"
    print("Generating summary for:", test_url)
    print(summarize(test_url))


Generating summary for: https://opportunitiesforyoungkenyans.co.ke/
<think>
Okay, so I need to summarize this web page content about opportunities for young Kenyans. Let me read through it carefully.

The title is "Opportunities for Young Kenyans - Launch Your Career with Genuine Employers." That gives a clear indication that the site offers career launching resources.

Looking at the content, there are sections like About Us, which includes goals and objectives, sources for jobs, disclaimer, direct job connection, identifying fraudulent jobs, partnerships, and contact info. This suggests the site is job listing or career guidance oriented.

Then I see categories like Casual Jobs, Internships, Forums, Scholarships under a menu bar labeled "Universities and Colleges." There are specific job listings: Garissa University Hiring Lecturers, AITEC Clinical Officer Trainer, Kitonga Garden Resort vacancies, among others. These seem to be permanent or specific role positions.

The site also lis

## Conclusion
This notebook shows an end-to-end pipeline for summarizing web pages with a local DeepSeek model through Ollama's HTTP API: fetch and clean the page, build a messages payload, and post it to the model. No hosted LLM service is involved. As the sample output above shows, `deepseek-r1` may still emit its reasoning in a `<think>` block despite the system prompt, so you may want to filter that out before displaying the result. The same structure can be adapted to other NLP tasks or to other models served by Ollama.