# YOUR FIRST LAB - Week 1, Day 1
## 🎯 AI-Powered Web Summarization with Ollama

### What You'll Build
An intelligent web browser that automatically summarizes websites using Large Language Models.

---

### Prerequisites
- ✅ Docker containers running (Conda + Ollama)
- ✅ LLM kernel selected (Python 3.11)
- ✅ Global `.env` file configured
- ✅ Ollama accessible at `http://localhost:11434`

### Learning Objectives

**1. Environment Setup**
- Load environment variables from `.env`
- Connect to Ollama API (OpenAI-compatible interface)

**2. Web Scraping**
- Extract website content with BeautifulSoup
- Handle HTML parsing and cleaning

**3. Prompt Engineering**
- Create effective system and user prompts
- Structure messages for LLM APIs

**4. LLM API Integration**
- Make API calls to Ollama
- Control model behavior (temperature, etc.)

**5. Practical Application**
- Build summarization function
- Generate email subject lines

**Expected Output:** A working prototype that summarizes any URL.

---

### Quick Start
1. Press `Shift + Enter` to execute each cell
2. Install dependencies (uncomment if needed)
3. Verify `.env` configuration loads
4. Run connection test
5. Experiment with different websites

---

### 💡 Learning Approach
Execute this notebook yourself after watching the lecture. Add print statements, experiment with variations, and share your work on GitHub to showcase your skills.

## 🔧 Setup

### Select the Kernel

1. Click **"Select Kernel"** (top-right)
2. Choose **`llm (Python 3.11.x)`**

### Prerequisites

- Docker containers running (`conda-jupyter`, `ollama`)
- Global `.env` configured at `/workspace/.env`
- Ollama accessible at `http://localhost:11434`

**Note:** Full setup instructions are in the [README](../README.md).
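Before going further, it can be worth confirming that Ollama is actually reachable. The sketch below is one possible check using only the standard library. The function name `ollama_is_reachable` is hypothetical, and the check assumes the server answers a plain GET on its root URL with HTTP 200, which is how Ollama servers typically behave.

```python
import urllib.error
import urllib.request


def ollama_is_reachable(base_url, timeout=3.0):
    """Return True if a GET on base_url answers with HTTP 200, else False.

    Assumption: the server replies on its root URL. Any network error,
    HTTP error status, or malformed URL is reported as "not reachable".
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

Calling `ollama_is_reachable("http://localhost:11434")` should return `True` when the container is up. If it returns `False`, check the Docker containers before debugging anything else.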

In [None]:
# Installers
import sys
# Uncomment to install required packages
# !{sys.executable} -m pip install python-dotenv
# !{sys.executable} -m pip install beautifulsoup4
# !{sys.executable} -m pip install requests
# !{sys.executable} -m pip install openai  # openai library works with Ollama too!


## Installing Dependencies

Run the installer cell above (uncommenting the lines you need) to install the required dependencies.

**Important note:** The packages will be installed in the correct environment (LLM) thanks to the use of `sys.executable`.

In [2]:
# imports
import os
from dotenv import load_dotenv
from scraper import fetch_website_contents
from IPython.display import Markdown, display
from openai import OpenAI

# If you get an error running this cell, then please head over to the troubleshooting notebook!

# Connecting to OpenAI (or Ollama)

The next cell loads the environment variables from your `.env` file. The cell after it creates the client that connects to Ollama through its OpenAI-compatible API.

## Troubleshooting if you have problems:

If you get a `NameError` - have you run all cells from the top down? Head over to the Python Foundations guide for a bulletproof way to find and fix all NameErrors.

Any concerns about API costs? See my notes in the README - costs should be minimal, and you can control them at every point. You can also use Ollama as a free alternative, which we discuss during Day 2.

In [27]:
# Load environment variables from global .env file

# Path to global .env file in the conda project root
global_env_path = '/workspace/.env'
load_dotenv(dotenv_path=global_env_path, override=True)

# Get Ollama configuration from .env
ollama_base_url = os.getenv('OLLAMA_BASE_URL')
ollama_api_key = os.getenv('OLLAMA_API_KEY')
ollama_model = os.getenv('OLLAMA_MODEL')

# Check the configuration
if not ollama_base_url:
    print("Error: OLLAMA_BASE_URL not found in .env file!")
    print(f"   Looking for .env at: {global_env_path}")
elif not ollama_api_key:
    print("Error: OLLAMA_API_KEY not found in .env file!")
elif not ollama_model:
    print("Error: OLLAMA_MODEL not found in .env file!")
else:
    print("✅ Ollama API Key found!")
    print(f"✅ Ollama Base URL: {ollama_base_url}")
    print(f"✅ Ollama Model: {ollama_model}")
    print("✅ Ollama configuration loaded successfully!")
    print(f"   Configuration loaded from: {global_env_path}")


✅ Ollama API Key found!
✅ Ollama Base URL: http://192.168.80.200:11434
✅ Ollama Model: gpt-oss:20b-cloud
✅ Ollama configuration loaded successfully!
   Configuration loaded from: /workspace/.env
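The if/elif chain above stops at the first missing setting. As a possible variation, the sketch below collects every missing key in one pass and raises a single error. The helper name `read_ollama_config` and the `env` parameter are hypothetical additions. The three key names are the ones used in the cell above.

```python
import os

# The three settings the notebook reads from the .env file.
REQUIRED_KEYS = ("OLLAMA_BASE_URL", "OLLAMA_API_KEY", "OLLAMA_MODEL")


def read_ollama_config(env=None):
    """Return the Ollama settings as a dict, or raise listing all missing keys.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to try out without touching the real environment.
    """
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError("Missing settings: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

You would call `read_ollama_config()` after `load_dotenv(...)` has run. Reporting all missing keys together saves a round trip when several entries are absent from `.env`.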


### Let's make a quick call to a Frontier model to get started, as a preview!

In [28]:
# Initialize OpenAI client pointing to Ollama
openai = OpenAI(
    base_url=f"{ollama_base_url}/v1",  # Ollama exposes OpenAI-compatible API at /v1
    api_key=ollama_api_key  # Using API key from .env
)

# Test message
message = "Hello! This is my first message to you via Ollama! Hi!"

messages = [{"role": "user", "content": message}]

messages


[{'role': 'user',
  'content': 'Hello! This is my first message to you via Ollama! Hi!'}]

In [29]:
# Make the API call to Ollama using the configured model
response = openai.chat.completions.create(
    model=ollama_model,  # Using the model from .env
    messages=messages
)
response.choices[0].message.content


'Hello! 👋 Great to meet you—welcome to Ollama! How can I help you today?'
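One small detail in the client setup: the base URL is built by appending `/v1` to `OLLAMA_BASE_URL`. If the value in `.env` ends with a slash, or already includes `/v1`, the result would be malformed. Here is a minimal sketch of a normalizer. The function name `openai_compatible_url` is hypothetical, not part of any library.

```python
def openai_compatible_url(base_url):
    """Return base_url ending in exactly one '/v1', with no trailing slash."""
    trimmed = base_url.strip().rstrip("/")
    if trimmed.endswith("/v1"):
        return trimmed
    return trimmed + "/v1"


print(openai_compatible_url("http://localhost:11434/"))
```

The result could then be passed as `base_url=` when creating the `OpenAI` client.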

In [30]:
# ALTERNATIVE: Direct Ollama API call using requests (commented out)
# This shows how to call Ollama WITHOUT the OpenAI client library

"""
import requests
import json

# Direct API call to Ollama
ollama_url = f"{ollama_base_url}/api/chat"

payload = {
    "model": ollama_model,
    "messages": messages,
    "stream": False  # Get complete response at once
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {ollama_api_key}"
}

# Make the request
response = requests.post(ollama_url, json=payload, headers=headers)

# Parse the response
if response.status_code == 200:
    result = response.json()
    ollama_response = result["message"]["content"]
    print(ollama_response)
else:
    print(f"Error: {response.status_code}")
    print(response.text)
"""

# WHY USE OPENAI CLIENT INSTEAD?
# - Cleaner code (less boilerplate)
# - Built-in retries and typed exceptions instead of manual status-code checks
# - Easy to switch between OpenAI and Ollama
# - Industry standard interface

print("💡 This cell shows the alternative approach (commented out)")
print("✅ We use the OpenAI client for simplicity and portability")


💡 This cell shows the alternative approach (commented out)
✅ We use the OpenAI client for simplicity and portability


### 🔍 Alternative: Native Ollama API Call (Without OpenAI Client)

The earlier cells use the **OpenAI client library** pointing to Ollama. The commented-out code in the cell above shows how you would call Ollama **directly** using its native API with `requests`:

**Pros of direct approach:**
- ✅ No dependency on OpenAI library
- ✅ Direct control over HTTP requests
- ✅ Explicit about using Ollama

**Cons:**
- ❌ More verbose code
- ❌ Need to handle HTTP errors manually
- ❌ Less portable (can't switch to OpenAI easily)
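The portability point comes down to response shape. The native call reads the reply from `result["message"]["content"]`, while the OpenAI-style response nests it under `choices`. The sketch below is a hypothetical helper, `extract_reply`, that accepts either shape as a plain dict. It assumes the OpenAI-compatible JSON mirrors the `response.choices[0].message.content` access used earlier.

```python
def extract_reply(result):
    """Pull the assistant text out of a native or OpenAI-style response dict."""
    if "choices" in result:
        # OpenAI-compatible shape: {"choices": [{"message": {"content": ...}}]}
        return result["choices"][0]["message"]["content"]
    if "message" in result:
        # Native Ollama /api/chat shape: {"message": {"content": ...}}
        return result["message"]["content"]
    raise ValueError("Unrecognized response shape")


native = {"message": {"role": "assistant", "content": "Hi from the native API"}}
compatible = {"choices": [{"message": {"role": "assistant", "content": "Hi from /v1"}}]}
print(extract_reply(native))
print(extract_reply(compatible))
```

With the OpenAI client you never write this kind of adapter yourself, which is part of why the notebook sticks with it.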

### OK onwards with our first project

In [31]:
# Let's try out this utility

ed = fetch_website_contents("https://edwarddonner.com")
print(ed)

Home - Edward Donner

Home
Connect Four
Outsmart
An arena that pits LLMs against each other in a battle of diplomacy and deviousness
About
Posts
Well, hi there.
I’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (
very
amateur) and losing myself in
Hacker News
, nodding my head sagely to things I only half understand.
I’m the co-founder and CTO of
Nebula.io
. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI startup untapt,
acquired in 2021
.
We work with groundbreaking, proprietary LLMs verticalized for talent, we’ve
patented
our matching model, and our award-winning platform has happy customers and tons of press coverage.
Conne
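The output above starts with the page title, contains only visible text, and stops mid-word, which suggests the helper truncates its result. To give a feel for what a function like this does, here is a rough sketch. It is not the course's `scraper` module: it swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra installs, and the names `html_to_text` and `max_chars`, as well as the 2,000-character limit, are assumptions.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the <title> and visible text, skipping script/style content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def html_to_text(html, max_chars=2000):
    """Return 'title, blank line, text lines', cut off at max_chars (assumed limit)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    body = "\n".join(parser.chunks)
    return (parser.title + "\n\n" + body)[:max_chars]


sample = "<html><head><title>Demo</title></head><body><p>Hello</p><script>x=1</script></body></html>"
print(html_to_text(sample))
```

Truncating keeps the prompt short, at the cost of dropping whatever appears further down the page.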

## Types of prompts

You may know this already - but if not, you will get very familiar with it!

Models like GPT have been trained to receive instructions in a particular way.

They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt** -- the conversation starter that they should reply to
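Since the system prompt carries both the task and the tone, it can help to build it from a small function so those parts are easy to vary. The sketch below is one way to do that. The function name `build_system_prompt` and its `tone` and `language` parameters are hypothetical, chosen only for illustration.

```python
def build_system_prompt(tone="snarky", language="English"):
    """Assemble a system prompt for a website summarizer with a given tone and language."""
    return (
        f"You are a {tone} assistant that analyzes the contents of a website,\n"
        f"and provides a short, {tone}, humorous summary, "
        "ignoring text that might be navigation related.\n"
        f"Respond in markdown in {language}. "
        "Do not wrap the markdown in a code block - respond just with the markdown.\n"
    )


print(build_system_prompt(tone="formal", language="Spanish"))
```

Changing one argument at a time makes it easier to see which part of the prompt is responsible for a change in the model's output.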

In [32]:
# Define our system prompt - you can experiment with this later, changing the last sentence to "Respond in markdown in Spanish."

system_prompt = """
You are a snarky assistant that analyzes the contents of a website,
and provides a short, snarky, humorous summary, ignoring text that might be navigation related.
Respond in markdown. Do not wrap the markdown in a code block - respond just with the markdown.
"""

In [33]:
# Define our user prompt

user_prompt_prefix = """
Here are the contents of a website.
Provide a short summary of this website.
If it includes news or announcements, then summarize these too.

"""

## Messages

The API from OpenAI expects to receive messages in a particular structure.
Many of the other APIs share this structure:

```python
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```
To give you a preview, the next cell makes a rather simple call - we won't stretch the mighty GPT (yet!)

In [34]:
# Simple test with Ollama
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is 2 + 2?"}
]

response = openai.chat.completions.create(
    model=ollama_model,
    messages=messages
)
response.choices[0].message.content


'4'

## And now let's build useful messages for our model, using a function

In [35]:
# See how this function creates exactly the format above

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_prefix + website}
    ]

In [36]:
# Try this out, and then try for a few more websites

messages_for(ed)

[{'role': 'system',
  'content': '\nYou are a snarky assistant that analyzes the contents of a website,\nand provides a short, snarky, humorous summary, ignoring text that might be navigation related.\nRespond in markdown. Do not wrap the markdown in a code block - respond just with the markdown.\n'},
 {'role': 'user',
  'content': '\nHere are the contents of a website.\nProvide a short summary of this website.\nIf it includes news or announcements, then summarize these too.\n\nHome - Edward Donner\n\nHome\nConnect Four\nOutsmart\nAn arena that pits LLMs against each other in a battle of diplomacy and deviousness\nAbout\nPosts\nWell, hi there.\nI’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (\nvery\namateur) and losing myself in\nHacker News\n, nodding my head sagely to things I only half understand.\nI’m the co-founder and CTO of\nNebula.io\n. 

## Time to bring it together - the API for OpenAI is very simple!

In [37]:
# Call Ollama API to summarize website content

def summarize(url):
    website = fetch_website_contents(url)
    response = openai.chat.completions.create(
        model=ollama_model,  # Using Ollama model from .env
        messages=messages_for(website)
    )
    return response.choices[0].message.content
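The learning objectives mention controlling model behavior with settings such as temperature. As a rough sketch, the variant below takes the client, model, and temperature as arguments, so you can try it without a running server. The names `summarize_text` and `FakeCompletions`, the default of `0.3`, and the short system prompt are all assumptions made for illustration. The fake client only mimics the `chat.completions.create(...)` call shape used in this notebook.

```python
from types import SimpleNamespace


def summarize_text(client, model, text, temperature=0.3):
    """Ask the model for a summary; lower temperature gives steadier output."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Summarize the text in one short paragraph."},
            {"role": "user", "content": text},
        ],
        temperature=temperature,
    )
    return response.choices[0].message.content


class FakeCompletions:
    """Stand-in for client.chat.completions that records the arguments it receives."""

    def __init__(self):
        self.last_kwargs = None

    def create(self, **kwargs):
        self.last_kwargs = kwargs
        reply = SimpleNamespace(content="(fake summary)")
        return SimpleNamespace(choices=[SimpleNamespace(message=reply)])


fake_completions = FakeCompletions()
fake_client = SimpleNamespace(chat=SimpleNamespace(completions=fake_completions))
result = summarize_text(fake_client, "demo-model", "Some page text")
print(result)
```

In the notebook you would pass the real `openai` client and `ollama_model` instead of the fake ones.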


In [38]:
summarize("https://edwarddonner.com")

"# Edward Donner: Chief Code & DJ of the Unknown\n\n> **Quick intro**  \n> Meet *Ed*, the **code‑junkie billionaire of the little‑known AI realm**. When he’s not drafting Python, he’s DJing (in the *“well, I used to”* sense), producing *amateur* electronic music, and half‑sitting through Hacker News with an *I‑know‑something‑about‑it* grin.  \n>   \n> Professionally – he’s the **CTO‑founder of Nebula.io**, a startup that claims to “help people discover their potential” through a **patented LLM‑matching engine** used by recruiters. Before Nebula, he single‑handedly ran **untapt**, a “big‑name” AI venture that got snatched by someone in 2021.  \n\n> **Why you’re here**  \n> You’re either here because you’re into LLM tinkering, or you stumbled in looking for a new AI conference to pretend‑to‑understand. Either way, enjoy the cocktail of code, music, and questionable self‑promotion.\n\n## Recent News & Announcements (in *chronological order*)\n\n| Date | Headline | What’s Cooking |\n|-----

In [39]:
# A function to display this nicely in the output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [40]:
display_summary("https://edwarddonner.com")

**Edward Donner’s Digital Niche (AKA “Webpage of a Half‑Genius”)**

- **Profile**: Ed is the “code‑slinging, LLM‑hacking, occasional DJ” who runs Nebula.io (AI recruitment tech) and has previously sold an AI startup called *untapt*—so he's got the credibility that doesn’t involve actually being a wizard.
  
- **What’s on the site**: A personal site that’s half blogging, half portfolio. He flaunts his “award‑winning platform,” “patented matching model,” and a whole section for people to DM him for “connect” (yes, “Connect Four” is a separate LLM arena, not his board game skills).

- **Latest “breaking news”**:
  - **Sep 15 2025** – “AI in Production: Gen AI and Agentic AI on AWS at scale” → because nothing says success like a headline that reads like a conference abstract.
  - **May 28 2025** – “Connecting my courses – become an LLM expert and leader” → he’s now an “LLM teacher” so you can learn to dominate other language models—just in case that’s still relevant.
  - **May 18 2025** – “2025 AI Executive Briefing” → presumably a PowerPoint‑filled summit he owns.
  - **Apr 21 2025** – “The Complete Agentic AI Engineering Course” → because he’s just got another course, folks.

- **Side‑bars**: Links to LinkedIn, Twitter, Facebook (you’ll have to scroll past the 100% “connect” button), and an email subscription form that apparently promises newsletters you’ll probably forget about unless you’re into AI hype.

In short: Ed is a coder‑turned‑entrepreneur‑turned‑teacher‑hive‑mind who runs a job‑matching platform with shiny patents, and this website is the one‑page résumé for anyone who thinks they need more LLM training or a recruiter who still thinks AI is magic.

# Let's try more websites

Note that this will only work on websites that can be scraped using this simplistic approach.

Websites that are rendered with JavaScript, like React apps, won't show up. See the community-contributions folder for a Selenium implementation that gets around this. You'll need to read up on installing Selenium (ask ChatGPT!)

Also, websites protected with CloudFront (and similar) may give 403 errors - many thanks Andy J for pointing this out.

But many websites will work just fine!
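When a site does not work, it helps to know which of these cases you hit. The sketch below is a hypothetical helper, `explain_fetch_problem`, that turns a status code and the extracted text into a hint. The 200-character threshold for "suspiciously little text" is an assumption, not a rule.

```python
def explain_fetch_problem(status_code, text, min_chars=200):
    """Return a human-readable hint, or None when the page looks usable."""
    if status_code == 403:
        return "Blocked (HTTP 403): the site may sit behind bot protection."
    if status_code >= 400:
        return f"Request failed with HTTP {status_code}."
    if len(text.strip()) < min_chars:
        # Assumed heuristic: almost no text often means client-side rendering.
        return "Very little text found: the page may be rendered with JavaScript."
    return None


print(explain_fetch_problem(403, ""))
print(explain_fetch_problem(200, "Loading..."))
```

A check like this could run before the text is sent to the model, so you don't get a summary of an empty page.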

In [41]:
display_summary("https://cnn.com")

**CNN’s Bare‑Bones Dashboard (with a side of ad misery)**  

- The page is a *layout skeleton* rather than a newsroom.  Thousands of navigation links (US, World, Politics… the whole alphabet of “categories”) sit on a clean background.  
- There’s *no headline*—just the “Breaking News, Latest News and Videos” title that never gets a story to attach itself to.  
- A pop‑up ad‑feedback form is the only thing that offers content: it’s a list of generic complaints about sluggish video players, loud audio and repetitive, slow‑loading ads.  
- In short, if you’re looking for hard news, you’ll need to click somewhere—this page is just a lobby for the real stories.

In [42]:
display_summary("https://anthropic.com")

# Anthropic

> *“AI that’s *safety‑at‑the‑frontier*? Sounds like a really tidy promise.”*

- **What the heck is happening?** Anthropic is still in its *“public‑benefit‑Corp”* mode, meaning they’re loudly declaring to the world that they’ll use AI *for good* while also making sure “you’ll keep a hand on the wheel.”  
- **Main stars:** Their flagship model **Claude** is getting fresh siblings. The newest **Sonnet 4.5** boasts agent‑ready chops, coding prowess, and is apparently *the best model in the world for agents, coding, and computer use*—which probably means “the best way to brag to your boss.”  
- **Haiku 4.5** is their quick‑and‑sweet model that still pretends to be helpful without chewing too much context.  
- The site is littered with repeated “Try Claude,” “Download app,” and countless “Log in” prompts as if their landing page was written on a vending‑machine screen.  
- They sprinkle in fancy legal‑talk: *Commitments, Initiatives, Transparency, Responsible Scaling Policy,* plus a “Trust center.”  
- **Announcements:** A couple of glowing “Read announcement” links hint at new releases and an emphasis on *“Managing context on the Claude Developer Platform.”*  
- **Bottom line**: Anthropic is juggling hype, self‑branding as the safety sentinel, and the inevitable marketing churn you expect from a company that’s about to drop the next “world‑altering” AI model. The real question is whether the safety promises hold up when the *real* world starts asking *why* they need to keep a hand on the wheel.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business applications</h2>
            <span style="color:#181;">In this exercise, you experienced calling the Cloud API of a Frontier Model (a leading model at the frontier of AI) for the first time. We will be using APIs like OpenAI at many stages in the course, in addition to building our own LLMs.

More specifically, we've applied this to Summarization - a classic Gen AI use case to make a summary. This can be applied to any business vertical - summarizing the news, summarizing financial performance, summarizing a resume in a cover letter - the applications are limitless. Consider how you could apply Summarization in your business, and try prototyping a solution.</span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you continue - now try yourself</h2>
            <span style="color:#900;">Use the cell below to make your own simple commercial example. Stick with the summarization use case for now. Here's an idea: write something that will take the contents of an email, and will suggest an appropriate short subject line for the email. That's the kind of feature that might be built into a commercial email tool.</span>
        </td>
    </tr>
</table>

In [43]:
# EXAMPLE: Email subject line generator

# Step 1: Define the system prompt - instructions for the model
email_system_prompt = """
You are a professional email assistant. 
Analyze the email content provided and suggest a clear, concise, and professional subject line.
The subject line should be under 60 characters and capture the main purpose of the email.
Respond with ONLY the subject line, nothing else.
"""

# Step 2: Define the content of the sample email
sample_email = """
Hi John,

I hope this email finds you well. I wanted to reach out regarding our upcoming 
quarterly review meeting scheduled for next Monday. We need to discuss the 
Q4 sales performance, review the new marketing strategy, and set targets 
for Q1 2026.

Could you please prepare the sales reports and bring the budget proposals? 
Also, please confirm if 2 PM works for you, or if we need to reschedule.

Looking forward to hearing from you.

Best regards,
Sarah
"""

# Step 3: Build the list of messages in the format the API expects
email_messages = [
    {"role": "system", "content": email_system_prompt},
    {"role": "user", "content": f"Email content:\n\n{sample_email}"}
]

# Step 4: Call Ollama to generate the subject line
response = openai.chat.completions.create(
    model=ollama_model,
    messages=email_messages,
    temperature=0.7  # Set the temperature to control creativity
)

# Step 5: Get and display the result
suggested_subject = response.choices[0].message.content
print("Original Email:")
print("-" * 60)
print(sample_email)
print("\n" + "=" * 60)
print(f"Suggested Subject Line: {suggested_subject}")
print("=" * 60)

Original Email:
------------------------------------------------------------

Hi John,

I hope this email finds you well. I wanted to reach out regarding our upcoming 
quarterly review meeting scheduled for next Monday. We need to discuss the 
Q4 sales performance, review the new marketing strategy, and set targets 
for Q1 2026.

Could you please prepare the sales reports and bring the budget proposals? 
Also, please confirm if 2 PM works for you, or if we need to reschedule.

Looking forward to hearing from you.

Best regards,
Sarah


============================================================
Suggested Subject Line: Q4 Review Meeting Prep: Sales Reports & Budget Proposal
============================================================
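The system prompt asks for a subject line under 60 characters and nothing else, but a model won't always comply. Here is a minimal sketch of a clean-up step you could apply to the reply. The helper name `clean_subject` and its exact rules (first non-empty line, strip quotes, drop a leading "Subject:", truncate with "...") are assumptions, not part of the exercise.

```python
def clean_subject(raw, max_len=59):
    """Tidy a model reply into a single subject line of at most max_len characters.

    max_len defaults to 59 to match the prompt's "under 60 characters".
    """
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    if not lines:
        return ""
    subject = lines[0].strip("\"'")
    if subject.lower().startswith("subject:"):
        subject = subject[len("subject:"):].strip()
    if len(subject) > max_len:
        subject = subject[: max_len - 3].rstrip() + "..."
    return subject


print(clean_subject('"Subject: Q4 Review Meeting Prep"\nHope this helps!'))
```

In the cell above you could wrap the result as `clean_subject(suggested_subject)` before printing it.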
