# 🌐 Website Summarizer

Turn any webpage into a concise, AI-generated summary using a local LLM ✨

This notebook demonstrates how to:
- 🕷️ Scrape web content intelligently
- 🤖 Generate AI summaries using local Ollama
- 📝 Display beautiful markdown output

## 📦 Import Required Libraries

In [None]:
# Core imports
import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

print("📦 All libraries imported successfully!")

## 🤖 Setup Ollama Connection

Make sure Ollama is running on `localhost:11434` with the Llama 3.2 model installed (`ollama pull llama3.2`).
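
Before creating the client, it can help to confirm that the server is reachable and the model is present. The following is a minimal, standard-library-only sketch. It assumes Ollama's `/api/tags` endpoint (which lists locally installed models) at the default address; the helper names `model_installed` and `check_ollama` are illustrative and not part of any library.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default Ollama address


def model_installed(tags_payload, model="llama3.2"):
    """Return True if `model` appears in a decoded /api/tags payload."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    # Ollama reports names with a tag suffix, e.g. "llama3.2:latest"
    return any(n == model or n.split(":")[0] == model for n in names)


def check_ollama(base_url=OLLAMA_URL, model="llama3.2", timeout=3):
    """Return (server_reachable, model_is_installed) without raising."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON response
        return False, False
    return True, model_installed(payload, model)
```

Calling `check_ollama()` in a cell returns a pair such as `(True, False)`, which would mean the server is up but `llama3.2` still needs to be pulled.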

In [None]:
# Initialize the OpenAI client against Ollama's OpenAI-compatible endpoint
# If you get an error, make sure Ollama is running: ollama serve
# The api_key is required by the client but ignored by Ollama
openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

print("🦙 Ollama client initialized!")

## 🕷️ Website Scraper Class

The scraper strips scripts, styles, images, and form inputs, then extracts the remaining visible text. Navigation text is left in; the system prompt asks the model to ignore it.
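
To see what this kind of filtering does without touching the network, here is a dependency-free sketch of the same idea. Note the swap: it uses the standard library's `html.parser` rather than BeautifulSoup, and the names `TextExtractor`, `visible_text`, and `sample` are made up for illustration.

```python
from html.parser import HTMLParser

SKIPPED_TAGS = {"script", "style"}  # elements whose text content is noise


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything nested inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def visible_text(html):
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample = (
    "<html><body><nav>Home</nav><h1>Hello</h1>"
    "<script>var x = 1;</script><p>World</p><img src='a.png'></body></html>"
)
print(visible_text(sample))  # prints Home, Hello, World on separate lines
```

The script body disappears and the image contributes no text, but the `<nav>` text survives. Tag stripping alone does not remove navigation, so the prompt still has to tell the model to ignore it.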

In [None]:
# Browser-like User-Agent; some sites reject the default python-requests one
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    def __init__(self, url):
        """
        🌐 Create Website object from URL using BeautifulSoup
        Automatically extracts title and clean text content
        """
        self.url = url
        print(f"🔍 Fetching: {url}")
        
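        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()  # Fail early on 4xx/5xx responses
        soup = BeautifulSoup(response.content, 'html.parser')

        # Extract title (.string is None when <title> is empty or has nested tags)
        has_title = soup.title and soup.title.string
        self.title = soup.title.string.strip() if has_title else "No title found"

        # Remove irrelevant elements (scripts, styles, images, inputs);
        # fall back to the whole document when the page has no <body>
        body = soup.body or soup
        for irrelevant in body(["script", "style", "img", "input"]):
            irrelevant.decompose()

        # Extract clean text
        self.text = body.get_text(separator="\n", strip=True)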
        
        print(f"✅ Extracted {len(self.text)} characters from '{self.title}'")

print("🕷️ Website scraper class ready!")

## 🧠 AI Prompt Engineering

Crafting the perfect prompts for intelligent website summarization.
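
A chat request is a list of role-tagged messages: a `system` message that sets behavior and a `user` message that carries the page text. Long pages can exceed a local model's context window, so the sketch below also caps the text before building the messages. The `MAX_CHARS` value and the helper names `truncate` and `build_messages` are assumptions for illustration, not part of the notebook.

```python
MAX_CHARS = 8000  # hypothetical cap; tune to your model's context window


def truncate(text, max_chars=MAX_CHARS):
    """Cut text at the last line break before max_chars and mark the cut."""
    if len(text) <= max_chars:
        return text
    cut = text.rfind("\n", 0, max_chars)
    if cut <= 0:
        cut = max_chars  # no usable line break, so cut mid-line
    return text[:cut] + "\n[content truncated]"


def build_messages(system, title, text, max_chars=MAX_CHARS):
    """Build the system + user message list for a chat completion call."""
    user = f"You are looking at a website titled {title}\n\n"
    user += truncate(text, max_chars)
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

Cutting at a line break keeps whole lines of scraped text intact, which fits the newline-separated output the scraper produces.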

In [None]:
# System prompt - defines the AI's personality and behavior
system_prompt = """You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."""

def user_prompt_for(website):
    """🎯 Generate user prompt for website summarization"""
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

def messages_for(website):
    """📝 Create complete message structure for API call"""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

print("🧠 AI prompts configured!")

## ✨ Main Summarization Functions

The magic happens here - AI-powered website summarization!

In [None]:
def summarize(url):
    """🚀 Main function: URL → AI Summary"""
    website = Website(url)
    
    print("🤖 Generating AI summary...")
    response = openai.chat.completions.create(
        model="llama3.2",
        messages=messages_for(website)
    )
    
    return response.choices[0].message.content

def display_summary(url):
    """📺 Display formatted summary in Jupyter"""
    summary = summarize(url)
    print("\n" + "="*50)
    print("📋 WEBSITE SUMMARY")
    print("="*50)
    display(Markdown(summary))

print("✨ Summarization functions ready!")

## 🧪 Test the Website Summarizer

Let's test our website summarizer with a real example!

In [None]:
# Test with Edward Donner's website
test_url = "https://edwarddonner.com"
ed = Website(test_url)

print(f"🎯 Title: {ed.title}")
print(f"📄 Content length: {len(ed.text)} characters")
print(f"📝 First 200 chars: {ed.text[:200]}...")

## 🎬 Live Demo

Watch the magic happen! Uncomment and run the cell below to generate a real summary.

In [None]:
# Uncomment to run live demo (make sure Ollama is running!)
# display_summary("https://oksgt.com")

print("🎭 Ready for live demo! Uncomment the line above to test.")

## 🎯 Try Your Own URLs

Test the summarizer with any website you want!

In [None]:
# Your turn! Replace with any URL you want to summarize
your_url = "https://news.ycombinator.com"  # Change this!

# Uncomment to test:
# display_summary(your_url)

print(f"🎪 Ready to summarize: {your_url}")
print("💡 Uncomment the display_summary line to test!")

## 🎨 Customization Playground

Experiment with different models and prompts!
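
One way to make experiments easier is to pass the model name and system prompt as arguments instead of editing globals. This is a rough sketch under that assumption: `summarize_text` and `DEFAULT_SYSTEM_PROMPT` are hypothetical names, and `client` is expected to be an OpenAI-style client such as the one created earlier in this notebook.

```python
DEFAULT_SYSTEM_PROMPT = (
    "You are an assistant that summarizes website text. Respond in markdown."
)  # stand-in for the notebook's system_prompt


def summarize_text(client, title, text, model="llama3.2",
                   system_prompt=DEFAULT_SYSTEM_PROMPT):
    """Summarize already-scraped text with a chosen model and system prompt."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Website title: {title}\n\n{text}"},
    ]
    # Same chat.completions.create call the notebook's summarize() uses
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content
```

In this notebook that could look like `summarize_text(openai, ed.title, ed.text, model="mistral", system_prompt=creative_prompt)`, provided the model has been pulled first.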

In [None]:
# Try different system prompts
creative_prompt = """You are a witty AI that analyzes websites and provides 
entertaining summaries with emojis and humor. Make it fun!"""

technical_prompt = """You are a technical analyst that provides detailed, 
structured summaries focusing on technical aspects and key metrics."""

# Try different models (install each first with: ollama pull <name>);
# summarize() hardcodes "llama3.2", so change its model= argument to use one
alternative_models = [
    "llama3.1",
    "mistral",
    "codellama"
]

print("🎨 Customization options ready!")
print("💡 Reassign system_prompt (e.g. system_prompt = creative_prompt) and re-run display_summary to experiment!")

## 🏁 Conclusion

You now have a powerful, local AI-powered website summarizer! 🎉

### What we built:
- 🕷️ Smart web scraper that filters noise
- 🤖 Local AI integration with Ollama
- 📝 Beautiful markdown output
- 🎨 Customizable prompts and models

### Next steps:
- Try different websites and domains
- Experiment with custom prompts
- Add error handling and logging
- Create a web interface
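
As a starting point for the error-handling step, here is a small sketch that validates the URL and logs failures instead of letting them crash the cell. The names `is_http_url`, `safe_summarize`, and the `"summarizer"` logger are hypothetical; `summarize_fn` stands for any function that takes a URL and returns a summary, such as `summarize` above.

```python
import logging
from urllib.parse import urlparse

logger = logging.getLogger("summarizer")  # hypothetical logger name


def is_http_url(url):
    """Return True for absolute http(s) URLs with a host part."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def safe_summarize(summarize_fn, url):
    """Call summarize_fn(url); log and return None on bad input or failure."""
    if not is_http_url(url):
        logger.warning("Skipping invalid URL: %r", url)
        return None
    try:
        return summarize_fn(url)
    except Exception:
        # Broad on purpose for a notebook wrapper: network, parsing, and
        # model errors all land here, with the traceback written to the log.
        logger.exception("Summarization failed for %s", url)
        return None
```

A call such as `safe_summarize(summarize, "https://edwarddonner.com")` returns the summary on success and `None` otherwise, so a loop over many URLs can keep going after one failure.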

**Happy summarizing! 🌐✨**