## 🚀 AI Web Summarizer: The "Reader's Digest" of the Internet

**Turn any URL into a concise summary instantly.**

This notebook implements an intelligent content agent that acts as a specialized "browser." Instead of displaying raw HTML, it extracts the core content from a URL and uses Generative AI to distill the information into a structured summary.

#### 🛠️ Tech Stack & Alternatives
* **LLM Engine:** Ollama (Llama 3.2 for local inference) / OpenAI API
* **Extraction:** BeautifulSoup4 (Static content)
* **Advanced Extraction:** Selenium (Dynamic/JS content)

In [11]:
import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI
import ollama

##### 🪄 Run one of the two options: **BeautifulSoup4** or **Selenium**

Both cells define a class named `Website`, so whichever cell you run last is the one the summarizer uses.

#### **BeautifulSoup4**

In [2]:
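# Some websites need you to use proper headers when fetching them:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    def __init__(self, url, timeout=15):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        # A timeout keeps the request from hanging forever on an unresponsive server
        response = requests.get(url, headers=headers, timeout=timeout)
        response.raise_for_status()  # fail loudly on 4xx/5xx instead of summarizing an error page
        soup = BeautifulSoup(response.content, 'html.parser')
        # soup.title.string is None when <title> contains nested tags
        self.title = soup.title.string.strip() if soup.title and soup.title.string else "No title found"
        if soup.body:
            # Remove tags that carry no readable content
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""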
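To check the cleanup logic without network access or BeautifulSoup, here is a dependency-free sketch using Python's built-in `html.parser`. It is a swapped-in technique, not what the `Website` class does: it skips `<script>` and `<style>` content (`<img>` and `<input>` hold no text nodes, so there is nothing to drop), reads the `<title>`, and joins the remaining stripped text fragments one per line, roughly like `get_text(separator="\n", strip=True)`. Unlike the class, it does not restrict itself to `<body>`. The names `TextExtractor` and `extract_text` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the same non-text tags the Website class removes."""

    SKIP = {"script", "style"}  # img/input have no text content to skip

    def __init__(self):
        super().__init__()
        self.title = None
        self.lines = []
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return  # ignore whitespace and script/style bodies
        if self._in_title:
            self.title = text
        else:
            self.lines.append(text)


def extract_text(html):
    """Return (title, text) with one text fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.title or "No title found", "\n".join(parser.lines)
```

For example, `extract_text("<title>Demo</title><p>Hello</p>")` returns `("Demo", "Hello")`.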

#### **Selenium**

In [None]:
!pip install selenium
!pip install webdriver-manager

In [3]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
from selenium.common.exceptions import TimeoutException, WebDriverException

class Website:
    def __init__(self, url, headless=True, wait_timeout=10):
        self.url = url
        self.title = None
        self.text = None

        options = Options()
        if headless:
            options.add_argument("--headless=new")  # or "--headless", depending on the Chrome version
        options.add_argument("--no-sandbox")
        options.add_argument("--disable-dev-shm-usage")
        options.add_argument("--disable-blink-features=AutomationControlled")
        options.add_argument("--disable-infobars")
        options.add_argument("--window-size=1200,800")
        # options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64)...")

        # Use webdriver-manager to install correct chromedriver
        service = Service(ChromeDriverManager().install())

        driver = webdriver.Chrome(service=service, options=options)
        driver.set_page_load_timeout(30)

        try:
            driver.get(url)

            # Explicitly wait for <body> to be present (or for a more specific selector)
            try:
                WebDriverWait(driver, wait_timeout).until(
                    EC.presence_of_element_located((By.TAG_NAME, "body"))
                )
            except TimeoutException:
                # Continue anyway: the page may have no <body> or may be slow to load
                pass

            page_source = driver.page_source
            soup = BeautifulSoup(page_source, "html.parser")

            # Title (None-safe)
            self.title = soup.title.string.strip() if soup.title and soup.title.string else "No title found"

            if soup.body:
                # Remove irrelevant elements
                for tag in soup.body.find_all(["script", "style", "img", "input"]):
                    tag.decompose()
                self.text = soup.body.get_text(separator="\n", strip=True)
            else:
                self.text = ""  # or None, if you prefer

        except WebDriverException as e:
            # Log the underlying error, then re-raise it
            print(f"WebDriver error for {url}: {e}")
            raise
        finally:
            driver.quit()
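Static fetching is faster and needs no browser, so one option is to try it first and fall back to Selenium only when it fails or returns very little text, which often happens on JavaScript-rendered pages. The sketch below assumes each extractor is wrapped as a function that takes a URL and returns `(title, text)`; `fetch_with_fallback` and the 200-character `min_chars` threshold are hypothetical, not part of the notebook.

```python
from typing import Callable, Tuple

Page = Tuple[str, str]  # (title, text)


def fetch_with_fallback(
    url: str,
    static_fetch: Callable[[str], Page],
    dynamic_fetch: Callable[[str], Page],
    min_chars: int = 200,  # hypothetical threshold; tune it per site
) -> Page:
    """Try the cheap static fetch first; use the browser-based fetch if it
    raises or returns suspiciously little text."""
    try:
        title, text = static_fetch(url)
        if len(text) >= min_chars:
            return title, text
    except Exception:
        # Broad on purpose in this sketch: any static failure hands off to the browser
        pass
    return dynamic_fetch(url)
```

Because both cells define a class named `Website`, using them together would first require renaming them (for example to hypothetical `StaticWebsite` and `DynamicWebsite`) and wrapping each in a small function that returns `(w.title, w.text)`.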

##### Types of prompts and messages

In [4]:
system_prompt = """
You are an assistant that analyzes the contents of a website,
and provides a short and serious summary, ignoring text that might be navigation related.
Respond in markdown. Do not wrap the markdown in a code block - respond just with the markdown.
"""
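The prompt tells the model not to wrap its answer in a code block, but models do not always comply, and a fenced reply would render as a code listing rather than formatted Markdown. A small defensive post-processing step can unwrap it. `unwrap_markdown` is a hypothetical helper, and its regular expression only handles a reply wrapped in a single outer fence.

```python
import re

TICKS = "`" * 3  # a Markdown code fence, built indirectly so it can't close this listing

# Matches a reply that is entirely wrapped in one fenced block, with an optional language tag
_OUTER_FENCE = re.compile(
    r"^\s*" + TICKS + r"[\w-]*[ \t]*\n(.*?)\n\s*" + TICKS + r"\s*$",
    re.DOTALL,
)


def unwrap_markdown(reply: str) -> str:
    """Strip an outer code fence if the model wrapped its whole answer in one."""
    match = _OUTER_FENCE.match(reply)
    return match.group(1) if match else reply
```

In `summarize_and_display`, it could be applied as `display(Markdown(unwrap_markdown(content)))`.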

In [5]:
# A function that writes a User Prompt that asks for summaries of websites:
def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website are as follows: \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt
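`user_prompt_for` appends the whole page text, so a long page can exceed the model's context window or inflate API costs. Here is a minimal sketch of a character-based cap, assuming characters are an acceptable rough proxy for tokens; `truncate_for_prompt` and its 20,000-character default are hypothetical and should be tuned to the model you use.

```python
def truncate_for_prompt(text: str, max_chars: int = 20_000) -> str:
    """Cap page text at a rough character budget before it is added to the prompt.

    Characters only approximate tokens; 20_000 is a hypothetical default.
    The returned string may exceed max_chars by the length of the marker.
    """
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    # Website.text is newline-separated, so prefer to stop at the last complete line
    last_newline = cut.rfind("\n")
    if last_newline > 0:
        cut = cut[:last_newline]
    return cut + "\n\n[...content truncated...]"
```

Inside `user_prompt_for`, the line that appends the page would become `user_prompt += truncate_for_prompt(website.text)`.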

In [6]:
def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

In [7]:
# Fetches a URL, asks the model to summarize it, and renders the reply as Markdown.
def summarize_and_display(client, url, model):
    response = client.chat.completions.create(
        model=model,
        messages=messages_for(Website(url))
    )
    
    content = response.choices[0].message.content
    display(Markdown(content))
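`summarize_and_display` fetches, calls the model, and renders in one step, which makes it hard to test. The sketch below splits out the model call so it returns a string, and adds a hypothetical `FakeClient` that mimics only the attribute path the notebook uses (`client.chat.completions.create(...).choices[0].message.content`), so the flow can be exercised offline. `summarize` and `FakeClient` are illustrative names, not part of the OpenAI library.

```python
from types import SimpleNamespace


def summarize(client, messages, model):
    """Return the summary text instead of displaying it, so it can be tested or saved."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content


class FakeClient:
    """Hypothetical offline stand-in shaped like the OpenAI client for this one call."""

    def __init__(self, reply):
        self.calls = []  # records (model, messages) for each request

        def create(model, messages):
            self.calls.append((model, messages))
            message = SimpleNamespace(content=reply)
            return SimpleNamespace(choices=[SimpleNamespace(message=message)])

        self.chat = SimpleNamespace(completions=SimpleNamespace(create=create))
```

With a real client, the original behavior becomes `display(Markdown(summarize(client, messages_for(Website(SITE_URL)), "gpt-4o-mini")))`.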

In [8]:
SITE_URL = "https://www.ebiseducation.com/master-en-ingenieria-y-desarrollo-de-soluciones-de-ia-generativa"

#### Connecting to OpenAI

In [13]:
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')
client = OpenAI(api_key=api_key)
summarize_and_display(client, SITE_URL, "gpt-4o-mini")

# Summary of the Master's in Generative AI Engineering and Development

The website presents the **Máster en Ingeniería y Desarrollo de Soluciones de IA Generativa** (Master's in Generative AI Solutions Engineering and Development), which is opening enrollment for the 2025/2026 cohort. The program aims to train students to build generative AI solutions using Python programming, frameworks, and APIs.

## Program Details
- **Duration**: 1 academic year.
- **Language**: Spanish.
- **Formats**: Live Streaming and Online Flexible.
- **Certifications**: Includes a degree from the business school (EBIS) and another from the Universidad de Vitoria-Gasteiz (EUNEIZ), plus Azure AI certifications and a Harvard certificate.

## Goals and Target Audience
The master's is designed for **technical professionals** such as developers, engineers, and data scientists. Participants are expected to know Python, with an optional preliminary leveling course for those who do not.

## Financial Aid
Partial scholarships and financing options through FUNDAE are available, making the program accessible to individuals and companies.

## Course Structure
The course covers a wide range of topics, from the fundamentals of generative AI to application development and AI ethics. It includes practical work such as building and training models, multimodal applications, and deployment strategies.

## Ongoing Support and Opportunities
Graduates have access to continuous upskilling resources, networking opportunities, a job board, and support in turning projects into startups.

This master's focuses on preparing its students to become leaders in the field of generative AI, adapting to the constant evolution of the technology.
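The connection cell reads `OPENAI_API_KEY` with `os.getenv`, so a missing or blank entry in `.env` only surfaces once the client is created or first called. A hypothetical `require_env` helper fails earlier with a message naming the missing variable. Because `load_dotenv` copies `.env` values into `os.environ`, the check works after it runs; the helper itself is a sketch, not part of `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the named environment variable, or raise a clear error if it is unset or blank."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or export it in your shell.")
    return value
```

Usage: `client = OpenAI(api_key=require_env("OPENAI_API_KEY"))`.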

#### Connecting to Ollama

In [9]:
models = ollama.list()

for model in models.models:
    print(model.model)

llama3.2:latest
qwen3:4b
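The next cell requests the model as `"llama3.2"`, while `ollama.list()` reports `llama3.2:latest`; Ollama treats an untagged name as the `latest` tag. Here is a sketch of a pre-flight check that applies the same rule, assuming the names come from `model.model` as in the cell above; `has_model` is a hypothetical helper.

```python
def has_model(requested: str, available: list[str]) -> bool:
    """Return True if `requested` matches one of the names reported by `ollama list`."""

    def normalize(name: str) -> str:
        # Look for a tag only after the last "/", so a registry host:port isn't mistaken for one
        last_part = name.rsplit("/", 1)[-1]
        return name if ":" in last_part else f"{name}:latest"

    return normalize(requested) in {normalize(name) for name in available}
```

For example: `has_model("llama3.2", [m.model for m in ollama.list().models])`.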


In [None]:
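# Ollama serves an OpenAI-compatible API under /v1; the client library requires an api_key, but Ollama ignores its value
client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')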
summarize_and_display(client, SITE_URL, "llama3.2")