# YOUR FIRST LAB - Week 1, Day 1
## 🎯 AI-Powered Web Summarization with Ollama

### What You'll Build
A tool that fetches a web page and automatically summarizes it using a Large Language Model (LLM).

---

### Prerequisites
- ✅ Docker containers running (Conda + Ollama)
- ✅ LLM kernel selected (Python 3.11)
- ✅ Global `.env` file configured
- ✅ Ollama accessible at `http://localhost:11434`

### Learning Objectives

**1. Environment Setup**
- Load environment variables from `.env`
- Connect to Ollama API (OpenAI-compatible interface)

**2. Web Scraping**
- Extract website content with BeautifulSoup
- Handle HTML parsing and cleaning

**3. Prompt Engineering**
- Create effective system and user prompts
- Structure messages for LLM APIs

**4. LLM API Integration**
- Make API calls to Ollama
- Control model behavior (temperature, etc.)

**5. Practical Application**
- Build summarization function
- Generate email subject lines

**Expected Output:** A working prototype that summarizes any URL.

---

### Quick Start
1. Press `Shift + Enter` to execute each cell
2. Install dependencies (uncomment if needed)
3. Verify `.env` configuration loads
4. Run connection test
5. Experiment with different websites

---

### 💡 Learning Approach
Execute this notebook yourself after watching the lecture. Add print statements, experiment with variations, and share your work on GitHub to showcase your skills.

## 🔧 Setup

### Select the Kernel

1. Click **"Select Kernel"** (top-right)
2. Choose **`llm (Python 3.11.x)`**

### Prerequisites

- Docker containers running (`conda-jupyter`, `ollama`)
- Global `.env` configured at `/workspace/.env`
- Ollama accessible at `http://localhost:11434`

**Note:** Full setup instructions are in the [README](../README.md).

In [None]:
# Installers
import sys
# Uncomment to install required packages
# !{sys.executable} -m pip install python-dotenv
# !{sys.executable} -m pip install beautifulsoup4
# !{sys.executable} -m pip install requests
# !{sys.executable} -m pip install openai  # openai library works with Ollama too!


## Installing Dependencies

Run the installer cell to install the required dependencies.

**Important note:** The packages are installed into the correct environment (LLM) because the commands use `sys.executable`.

# Connecting to OpenAI (or Ollama)

The next cells load the environment variables from your `.env` file and create an OpenAI-compatible client that points to Ollama.

## Troubleshooting if you have problems:

If you get a `NameError`, have you run all cells from the top down? Head over to the Python Foundations guide for a bulletproof way to find and fix all `NameError`s.

Any concerns about API costs? See my notes in the README - costs should be minimal, and you can control it at every point. You can also use Ollama as a free alternative, which we discuss during Day 2.

In [None]:
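# Initialize an OpenAI client that points to Ollama.
# Requires ollama_base_url and ollama_api_key from the configuration cell (run that cell first).
from openai import OpenAI

openai = OpenAI(
    base_url=f"{ollama_base_url}/v1",  # Ollama exposes an OpenAI-compatible API at /v1
    api_key=ollama_api_key  # API key from .env
)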

# Test message
message = "Hello! This is my first message to you via Ollama! Hi!"

messages = [{"role": "user", "content": message}]

messages


In [None]:
# Load environment variables from the global .env file
import os
from dotenv import load_dotenv

# Path to the global .env file (accessible from the Docker container)
# D:\dockerVolumes\conda\notebooks is mounted as /workspace in the container
global_env_path = '/workspace/.env'

# Load the configuration
load_dotenv(dotenv_path=global_env_path, override=True)

# Get Ollama configuration from .env
ollama_base_url = os.getenv('OLLAMA_BASE_URL')
ollama_api_key = os.getenv('OLLAMA_API_KEY')
ollama_model = os.getenv('OLLAMA_MODEL')

# Check the configuration
if not ollama_base_url:
    print("❌ Error: OLLAMA_BASE_URL not found in .env file!")
    print(f"   Looking for .env at: {global_env_path}")
elif not ollama_api_key:
    print("❌ Error: OLLAMA_API_KEY not found in .env file!")
elif not ollama_model:
    print("❌ Error: OLLAMA_MODEL not found in .env file!")
else:
    print("✅ Ollama configuration loaded successfully!")
    print(f"   Base URL: {ollama_base_url}")
    print(f"   Model: {ollama_model}")
    print(f"   API Key: {'*' * 40}{ollama_api_key[-8:]}")  # Hide most of the key
    print(f"   Loaded from: {global_env_path}")


❌ Error: OLLAMA_BASE_URL not found in .env file!
   Looking for .env at: D:/dockerInfraProjects/conda/.env
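
The message above is what the configuration cell prints when the `.env` file is not found or a key is missing. As a minimal sketch (not part of the original lab), the same check can be written as a small helper that reports every missing key at once. The variable names match the configuration cell; the helper name `missing_settings` and the sample values are assumptions for illustration.

```python
# Hypothetical helper: report which required settings are absent or blank.
REQUIRED_KEYS = ("OLLAMA_BASE_URL", "OLLAMA_API_KEY", "OLLAMA_MODEL")

def missing_settings(env):
    """Return the required keys that are missing or blank in the mapping `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]

# Sample values are assumptions; in the notebook you would pass os.environ
sample_env = {
    "OLLAMA_BASE_URL": "http://localhost:11434",
    "OLLAMA_API_KEY": "",  # blank on purpose to show the check
}
print(missing_settings(sample_env))
```

Calling `missing_settings(os.environ)` right after `load_dotenv(...)` would list all problems in one pass instead of stopping at the first one.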


In [29]:
# Make the API call to Ollama using the configured model
response = openai.chat.completions.create(
    model=ollama_model,  # Using the model from .env
    messages=messages
)
response.choices[0].message.content


'Hello! 👋 Great to meet you - welcome to Ollama! How can I help you today?'

### 🔍 Alternative: Native Ollama API Call (Without OpenAI Client)

The cell above uses the **OpenAI client library** pointing to Ollama. Here's how you would call Ollama **directly** using its native API with `requests`:

**Pros of direct approach:**
- ✅ No dependency on OpenAI library
- ✅ Direct control over HTTP requests
- ✅ Explicit about using Ollama

**Cons:**
- ❌ More verbose code
- ❌ Need to handle HTTP errors manually
- ❌ Less portable (can't switch to OpenAI easily)
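
A minimal sketch of the direct approach, assuming Ollama's native `/api/chat` endpoint on the base URL listed in the prerequisites and a hypothetical model name (`llama3.2`); substitute the model from your `.env`. The helper names `build_chat_payload` and `chat_native` are illustrative and not part of the course code.

```python
import json

def build_chat_payload(model, messages):
    """Build the JSON body for Ollama's native /api/chat endpoint."""
    return {
        "model": model,
        "messages": messages,
        "stream": False,  # ask for one complete JSON response instead of a stream
    }

def chat_native(base_url, model, messages, timeout=60):
    """POST the payload to Ollama and return the assistant's reply text."""
    import requests  # imported here so the payload helper works without it

    response = requests.post(
        f"{base_url}/api/chat",
        json=build_chat_payload(model, messages),
        timeout=timeout,
    )
    response.raise_for_status()  # HTTP errors are not handled for you here
    return response.json()["message"]["content"]

payload = build_chat_payload(
    "llama3.2",  # assumed model name
    [{"role": "user", "content": "Hello! This is my first message to you via Ollama! Hi!"}],
)
print(json.dumps(payload, indent=2))
# With Ollama running:
# chat_native("http://localhost:11434", "llama3.2", payload["messages"])
```

The message list has the same shape as with the OpenAI client; only the URL, the `stream` flag, and the response parsing differ.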

In [31]:
# Let's try out this utility

ed = fetch_website_contents("https://edwarddonner.com")
print(ed)

Home - Edward Donner

Home
Connect Four
Outsmart
An arena that pits LLMs against each other in a battle of diplomacy and deviousness
About
Posts
Well, hi there.
I’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (
very
amateur) and losing myself in
Hacker News
, nodding my head sagely to things I only half understand.
I’m the co-founder and CTO of
Nebula.io
. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI startup untapt,
acquired in 2021
.
We work with groundbreaking, proprietary LLMs verticalized for talent, we’ve
patented
our matching model, and our award-winning platform has happy customers and tons of press c
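
The definition of `fetch_website_contents` does not appear in this notebook. The sketch below shows one way such a utility could work; it is not the course's actual helper. It uses only the standard library (`urllib.request` and `html.parser`) rather than BeautifulSoup, skips `<script>` and `<style>` content, and truncates the result. The names `TextExtractor`, `html_to_text`, and `fetch_text`, and the 2,000-character limit, are assumptions.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

class TextExtractor(HTMLParser):
    """Collect the page title and visible text, skipping script/style blocks."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.parts = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title += text
        else:
            self.parts.append(text)

def html_to_text(html, limit=2000):
    """Return the title, a blank line, then the text lines, cut to `limit` characters."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    title = parser.title or "No title found"
    return (title + "\n\n" + "\n".join(parser.parts))[:limit]

def fetch_text(url, limit=2000):
    """Download a page and extract its text (needs network access)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        html = response.read().decode("utf-8", errors="replace")
    return html_to_text(html, limit)

sample = (
    "<html><head><title>Demo</title><style>p{}</style></head>"
    "<body><p>Hello</p><script>x=1</script><p>World</p></body></html>"
)
print(html_to_text(sample))
```

Truncating the text keeps the prompt short, which matters for small local models with limited context.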

## Types of prompts

You may know this already - but if not, you will get very familiar with it!

Models like GPT have been trained to receive instructions in a particular way.

They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt** -- the conversation starter that they should reply to

In [32]:
# Define our system prompt - you can experiment with this later, changing the last sentence to "Respond in markdown in Spanish."

system_prompt = """
You are a snarky assistant that analyzes the contents of a website,
and provides a short, snarky, humorous summary, ignoring text that might be navigation related.
Respond in markdown. Do not wrap the markdown in a code block - respond just with the markdown.
"""

In [33]:
# Define our user prompt

user_prompt_prefix = """
Here are the contents of a website.
Provide a short summary of this website.
If it includes news or announcements, then summarize these too.

"""

## Messages

The API from OpenAI expects to receive messages in a particular structure.
Many of the other APIs share this structure:

```python
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```
To give you a preview, the next cell makes a rather simple call - we won't stretch the model (yet!)

In [34]:
# Simple test with Ollama
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is 2 + 2?"}
]

response = openai.chat.completions.create(
    model=ollama_model,
    messages=messages
)
response.choices[0].message.content


'4'

## And now let's build useful messages for the model, using a function

In [35]:
# See how this function creates exactly the format above

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_prefix + website}
    ]

In [36]:
# Try this out, and then try for a few more websites

messages_for(ed)

[{'role': 'system',
  'content': '\nYou are a snarky assistant that analyzes the contents of a website,\nand provides a short, snarky, humorous summary, ignoring text that might be navigation related.\nRespond in markdown. Do not wrap the markdown in a code block - respond just with the markdown.\n'},
 {'role': 'user',
  'content': '\nHere are the contents of a website.\nProvide a short summary of this website.\nIf it includes news or announcements, then summarize these too.\n\nHome - Edward Donner\n\nHome\nConnect Four\nOutsmart\nAn arena that pits LLMs against each other in a battle of diplomacy and deviousness\nAbout\nPosts\nWell, hi there.\nI’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (\nvery\namateur) and losing myself in\nHacker News\n, nodding my head sagely to things I only half understand.\nI’m the co-founder and CTO of\nNebul

## Time to bring it together - the API for OpenAI is very simple!

In [37]:
# Call Ollama API to summarize website content

def summarize(url):
    website = fetch_website_contents(url)
    response = openai.chat.completions.create(
        model=ollama_model,  # Using Ollama model from .env
        messages=messages_for(website)
    )
    return response.choices[0].message.content


In [38]:
summarize("https://edwarddonner.com")

'### Edward Donner’s Personal Hub  \n> *Code, LLMs, a dead-beat DJ, and a whole lot of “I can’t possibly know what I’m talking about.”*\n\n- **Who’s this?**  \n  - Ed is the “co-founder & CTO” of Nebula.io, a startup turning AI into a talent-matching matchmaker.  \n  - He’s the former CEO of untapt (yes, the one that got sold in 2021).  \n  - When he’s not writing code and feeding data into models, he’s pretending he can still drop a track at a club.\n\n- **What else lives here?**  \n  - *Connect\u202fFour* – probably a literal game-board app, but maybe it’s a metaphor for the inevitable “I win, you lose” in the tech world.  \n  - *Outsmart* – an arena for LLMs to duel in diplomacy and deviousness; basically a “who gets less offended” showdown.\n\n- **Recent “news” (if you want to feel the beat of 2025):**  \n  1. **Sept\u202f15\u202f2025:** “AI in Production: Gen\u202fAI & Agentic\u202fAI on AWS at scale” – because scaling AI is st

In [39]:
# A function to display this nicely in the output, using markdown
from IPython.display import Markdown, display

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [40]:
display_summary("https://edwarddonner.com")

# Edward Donner’s One-Page Showboat

- **About the man**  
  “Hi, I’m Ed.” - coding whiz, DJ-in-the-making, and an *expert* in half-understood Hacker News topics. A co-founder/CTO of Nebula.io (yes, that’s a real company), he claims patents, press, and a *happy* customer base - all while still having time to play with LLMs.

- **What’s actually on the page**  
  * **Connect Four & Outsmart** – a fancy arena where LLMs supposedly duel in diplomacy and deviousness; looks more like a placeholder than a finished product.  
  * **Posts (2025 highlights)** –  
    * “AI in Production: Gen AI and Agentic AI on AWS at scale” (September 15)  
    * “Connecting my courses – become an LLM expert and leader” (May 28)  
    * “2025 AI Executive Briefing” (May 18)  
    * “The Complete Agentic AI Engineering Course” (April 21)  
    (All sound impressive… until you read the actual blog link, if it even exists.)

- **Contact & socials**  
  * Email: *ed@edwarddonner.com* (just because it feels official).  
  * LinkedIn, Twitter, Facebook – because every founder needs a multi-channel presence.

In short: a self-promoting CV wrapped in a single page with a few oddly named projects and a handful of dated announcements that pretend to be cutting edge.

# Let's try more websites

Note that this will only work on websites that can be scraped using this simplistic approach.

Websites that are rendered with JavaScript, like React apps, won't show up. See the community-contributions folder for a Selenium implementation that gets around this. You'll need to read up on installing Selenium (ask ChatGPT!)

Also, websites protected with CloudFront (and similar) may return 403 errors (many thanks to Andy J for pointing this out).

But many websites will work just fine!
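
The two failure modes above can be spotted before any text is sent to the model. The sketch below is a heuristic under stated assumptions: the function name `diagnose_fetch`, the 200-character threshold, and the warning messages are illustrative and not part of the course code.

```python
def diagnose_fetch(status_code, text, min_chars=200):
    """Return a short warning for a suspicious scrape, or None if it looks usable."""
    if status_code == 403:
        return "Blocked (HTTP 403): the site may sit behind CloudFront or similar protection."
    if status_code >= 400:
        return f"Request failed with HTTP {status_code}."
    if len(text.strip()) < min_chars:
        # Heuristic: a near-empty body often means the content is built client-side
        return "Very little text found: the page may be rendered with JavaScript."
    return None

print(diagnose_fetch(403, ""))
print(diagnose_fetch(200, "Loading..."))
print(diagnose_fetch(200, "word " * 100))
```

With `requests`, `status_code` would come from `response.status_code` and `text` from the extracted page text.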

In [41]:
display_summary("https://cnn.com")

**CNN’s “Breaking News, Latest News and Videos” page**  
> A wall of menus and ad-feedback boxes so dense you’d think it’s a paper-back novel.  
> 
> - **Header**: the usual “US – World – Politics – Business” carousel of tabs, as if CNN were trying to juggle a thousand categories in a single swoop.  
> - **Ad-Feedback Form**: “How relevant is this ad to you?” – because obviously that’s the most newsworthy thing you can do today.  
> - **Content**: Nothing except navigation and an endless list of sub-sections: from “Ukraine-Russia War” to “Games” to “Weather.” No actual stories, just a labyrinth of links.  
> - **Live TV & Audio**: Promises of “Watch,” “Listen,” and “Live TV,” but no live stream actually visible here.  
> - **International Editions**: Spanish, Arabic, etc., all standing in front of a blank page.  

No breaking news, no announcements - just a beautifully cluttered interface that makes you wonder if you’re on CNN or a corporate website for a hyper-specific product line.

In [42]:
display_summary("https://anthropic.com")

# A One-Page *Anthropic* Quick-Peek

- **Who is this?**  The *public-benefit* AI think-tank that vows to make GPT-style chatbots *safe* and *super-human*-friendly.  
- **What’s the latest gossip?**  
  - **Claude Sonnet 4.5** – they’re bragging it’s the “best model in the world for agents, coding, and computer use.”  
  - **Claude Haiku 4.5** – claimed to manage *context* on their “Developer Platform” (so you can keep track of 512-token conversations).  
- **Other model names** you’ll see floating around: Opus, Sonnet, Haiku - no surprise, this is an a-to-z alphabet of AI.  
- **Tone & Ethics**: They sprinkle slogans like “AI will have a vast impact … we build AI to serve humanity’s long-term well-being” and mention a “trust center.”  Sound a bit… philosophical?  
- **Navigation (ignored)**: All that “Try Claude… Log in… Download app” is just site meat.  

In short: Anthropic is the safety-first, ethically-charged, AI-overlords’ playground, releasing shiny new Claude models while patting themselves on the back for being the benevolent force in the silicon wars.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business applications</h2>
            <span style="color:#181;">In this exercise, you experienced calling the Cloud API of a Frontier Model (a leading model at the frontier of AI) for the first time. We will be using APIs like OpenAI at many stages in the course, in addition to building our own LLMs.

More specifically, we've applied this to Summarization - a classic Gen AI use case to make a summary. This can be applied to any business vertical - summarizing the news, summarizing financial performance, summarizing a resume in a cover letter - the applications are limitless. Consider how you could apply Summarization in your business, and try prototyping a solution.</span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you continue - now try yourself</h2>
            <span style="color:#900;">Use the cell below to make your own simple commercial example. Stick with the summarization use case for now. Here's an idea: write something that will take the contents of an email, and will suggest an appropriate short subject line for the email. That's the kind of feature that might be built into a commercial email tool.</span>
        </td>
    </tr>
</table>

In [43]:
# EXAMPLE: Email subject line generator

# Step 1: Define the system prompt - instructions for the model
email_system_prompt = """
You are a professional email assistant. 
Analyze the email content provided and suggest a clear, concise, and professional subject line.
The subject line should be under 60 characters and capture the main purpose of the email.
Respond with ONLY the subject line, nothing else.
"""

# Step 2: Define the sample email content
sample_email = """
Hi John,

I hope this email finds you well. I wanted to reach out regarding our upcoming 
quarterly review meeting scheduled for next Monday. We need to discuss the 
Q4 sales performance, review the new marketing strategy, and set targets 
for Q1 2026.

Could you please prepare the sales reports and bring the budget proposals? 
Also, please confirm if 2 PM works for you, or if we need to reschedule.

Looking forward to hearing from you.

Best regards,
Sarah
"""

# Step 3: Build the message list in the format the API expects
email_messages = [
    {"role": "system", "content": email_system_prompt},
    {"role": "user", "content": f"Email content:\n\n{sample_email}"}
]

# Step 4: Call Ollama to generate the subject line
response = openai.chat.completions.create(
    model=ollama_model,
    messages=email_messages,
    temperature=0.7  # Set the temperature to control creativity
)

# Step 5: Get and display the result
suggested_subject = response.choices[0].message.content
print("Original Email:")
print("-" * 60)
print(sample_email)
print("\n" + "=" * 60)
print(f"Suggested Subject Line: {suggested_subject}")
print("=" * 60)

Original Email:
------------------------------------------------------------

Hi John,

I hope this email finds you well. I wanted to reach out regarding our upcoming 
quarterly review meeting scheduled for next Monday. We need to discuss the 
Q4 sales performance, review the new marketing strategy, and set targets 
for Q1 2026.

Could you please prepare the sales reports and bring the budget proposals? 
Also, please confirm if 2 PM works for you, or if we need to reschedule.

Looking forward to hearing from you.

Best regards,
Sarah


Suggested Subject Line: Q4 Review Meeting Prep: Sales Reports & Budget Proposal
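
The system prompt asks for a subject line under 60 characters, but a model may not always comply. Below is a minimal sketch of a post-processing step; the helper name `tidy_subject` and the truncation strategy are assumptions rather than part of the exercise.

```python
def tidy_subject(raw, max_len=60):
    """Keep the first line, strip wrapping quotes, and cap the length below max_len."""
    lines = raw.strip().splitlines()
    subject = lines[0].strip().strip('"\'') if lines else ""
    if len(subject) >= max_len:
        # Cut at the last space that fits, then mark the truncation
        subject = subject[: max_len - 4].rsplit(" ", 1)[0] + "..."
    return subject

example = "Q4 Review Meeting Prep: Sales Reports & Budget Proposal"
print(len(example), tidy_subject(f'"{example}"\n'))
```

In the cell above, this would be applied as `tidy_subject(suggested_subject)` before printing.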
