# Instant Gratification!

Let's build a useful LLM solution - in a matter of minutes.

By the end of this course, you will have built an autonomous Agentic AI solution with 7 agents that collaborate to solve a business problem. All in good time! We will start with something smaller...

Our goal is to code a new kind of document reader. Give it the path to a PDF file, and it will respond with a summary. The Reader's Digest of your documents!!

Before starting, be sure to have followed the instructions in the "README" file, including creating your API key with OpenAI and adding it to the `.env` file.

## If you're new to Jupyter Lab

Welcome to the wonderful world of Data Science experimentation! Once you've used Jupyter Lab, you'll wonder how you ever lived without it. Simply click in each "cell" with code in it, such as the cell immediately below this text, and hit Shift+Return to execute that cell. You can also add a cell with the + button in the toolbar to print the values of variables or try out variations.

If you need to start a 'notebook' again, go to Kernel menu >> Restart kernel.

## I am here to help

If you have any problems at all, please do reach out.  
I'm available through the platform, or at ed@edwarddonner.com, or at https://www.linkedin.com/in/eddonner/ if you'd like to connect.

## More troubleshooting

Please see the [troubleshooting](troubleshooting.ipynb) notebook in this folder for more ideas!

## Business value of these exercises

A final thought. While I've designed these notebooks to be educational, I've also tried to make them enjoyable. We'll do fun things like have LLMs tell jokes and argue with each other. But fundamentally, my goal is to teach skills you can apply in business. I'll explain business implications as we go, and it's worth keeping this in mind: as you build experience with models and techniques, think of ways you could put this into action at work today. Please do contact me if you'd like to discuss more or if you have ideas to bounce off me.

In [1]:
# imports

import os
import requests
import fitz  # PyMuPDF

from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

# Connecting to OpenAI

The next cell is where we load in the environment variables in your `.env` file and connect to OpenAI.

## Troubleshooting if you have problems:

1. OpenAI takes a few minutes to register after you set up an account. If you receive an error about being over quota, try waiting a few minutes and try again.
2. Also, double check you have the right kind of API token with the right permissions. You should find it on [this webpage](https://platform.openai.com/api-keys) and it should show with Permissions of "All". If not, try creating another key by:
   - Pressing "Create new secret key" on the top right
   - Selecting **Owned by:** you, **Project:** Default project, **Permissions:** All
   - Clicking Create secret key, and using that new key in the code and the `.env` file (it might take a few minutes to activate)
   - Doing a Kernel >> Restart kernel, and executing the cells in this Jupyter lab starting at the top
3. As a fallback, replace the line `openai = OpenAI()` with `openai = OpenAI(api_key="your-key-here")`. Hard coding tokens in Jupyter lab is not recommended, because then you can't share your lab with others, but it's a workaround for now.
4. See the [troubleshooting](troubleshooting.ipynb) notebook in this folder for more instructions.
5. Contact me! Message me or email ed@edwarddonner.com and we will get this to work.

Any concerns about API costs? See my notes in the README - costs should be minimal, and you can control it at every point.
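
Before working through the troubleshooting steps above, it can help to look at the key itself. The sketch below is only a rough sanity check, not an official validation: `diagnose_api_key` is a hypothetical helper, and the `sk-` prefix is an assumption based on how OpenAI keys have typically looked.

```python
import os

def diagnose_api_key(api_key):
    """Return a short, human-readable diagnosis of an API key string."""
    if not api_key:
        return "No API key found - check that .env exists and defines OPENAI_API_KEY"
    if api_key != api_key.strip():
        return "API key has leading or trailing whitespace - remove any spaces or tabs"
    # Assumption: OpenAI keys start with "sk-"; adjust if your key looks different
    if not api_key.startswith("sk-"):
        return "API key does not start with 'sk-' - check you copied the right value"
    return "API key looks plausible"

# In the notebook, run this after load_dotenv() so the .env values are visible
print(diagnose_api_key(os.getenv("OPENAI_API_KEY")))
```

A "looks plausible" result only says the string has the expected shape; the quota and permission problems described above can still apply.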

In [2]:
# Load environment variables in a file called .env

load_dotenv()
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')
openai = OpenAI()

# Uncomment the below line if this gives you any problems:
# openai = OpenAI(api_key="your-key-here")

In [16]:
def extract_text_from_pdf(pdf_path):
    """Return the text of every page in the PDF, with a marker line before each page."""
    extracted_text = ""

    # The context manager closes the PDF even if extraction raises an error
    with fitz.open(pdf_path) as pdf_document:
        for page_num, page in enumerate(pdf_document, start=1):
            text = page.get_text("text")  # Extract plain text from the page
            extracted_text += f"--- Page {page_num} ---\n"
            extracted_text += text + "\n\n"

    return extracted_text

# # Usage example
# pdf_path = "C:\\Users\\Daniel\\Desktop\\Study\\llm_engineering\\input\\input.pdf"  # Path to your PDF file
# text_content = extract_text_from_pdf(pdf_path)
# print(text_content)  # This will print the extracted text
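
If you don't have a PDF to hand, the page-marker layout that `extract_text_from_pdf` produces can still be tried out on plain strings. This is a minimal sketch: `join_pages` is a hypothetical helper (not part of PyMuPDF or the course code) that applies the same formatting to a list of page texts.

```python
def join_pages(page_texts):
    """Join per-page strings using the same markers as extract_text_from_pdf."""
    parts = []
    for page_number, text in enumerate(page_texts, start=1):
        parts.append(f"--- Page {page_number} ---\n")  # marker line before each page
        parts.append(text + "\n\n")                    # page text plus a blank line
    return "".join(parts)

# Two made-up pages, standing in for text extracted from a real PDF
sample = join_pages(["First page text", "Second page text"])
print(sample)
```

The markers give the model a hint about where each page starts, which can help when a summary needs to follow the order of the document.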


## Types of prompts

You may know this already - but if not, you will get very familiar with it!

Models like GPT-4o have been trained to receive instructions in a particular way.

They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt** -- the conversation starter that they should reply to

In [17]:
system_prompt = "You are an assistant that analyzes the contents of a PDF file \
and provides a short summary, ignoring text that is not useful for the summary. \
Respond in markdown."

In [27]:
def user_prompt_for(text):
    user_prompt = "Please provide a short summary of this PDF document in markdown format.\n\n"
    user_prompt += text
    return user_prompt
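
The function above passes the whole document text to the model, which may be more than you want to send (and pay for) with a long PDF. One possible variation, sketched below, caps the text at a fixed number of characters. Both `user_prompt_with_limit` and the `MAX_CHARS` value are assumptions for illustration, not part of the course code, and a character count is only a rough stand-in for the model's token limit.

```python
MAX_CHARS = 20_000  # assumed budget; tune for your model and documents

def user_prompt_with_limit(text, max_chars=MAX_CHARS):
    """Build the user prompt, cutting the document text at max_chars characters."""
    user_prompt = "Please provide a short summary of this document in markdown format.\n\n"
    if len(text) > max_chars:
        # Keep the start of the document and flag that the rest was dropped
        text = text[:max_chars] + "\n\n[Text truncated]"
    return user_prompt + text

# A tiny limit makes the truncation easy to see
print(user_prompt_with_limit("abcdefghij" * 3, max_chars=10))
```

Keeping only the start of the document is the simplest choice; whether it is a good one depends on where the important content sits in your PDFs.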

## Messages

The API from OpenAI expects to receive messages in a particular structure.
Many of the other APIs share this structure:

```
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```

In [19]:
def messages_for(pdf_path):
    text_content = extract_text_from_pdf(pdf_path)

    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(text_content)}
    ]

## Time to bring it together - the API for OpenAI is very simple!

In [34]:
def summarize(pdf_path):
    response = openai.chat.completions.create(
        model = "gpt-4o-mini",
        messages = messages_for(pdf_path)
    )
    return response.choices[0].message.content

In [35]:
summarize(r'C:\Users\Daniel\Desktop\Study\llm_engineering\input\input.pdf')

'```markdown\n# Summary of Flight Itinerary\n\n## Passenger\n- **Name:** Daniel Juan Garbosa\n- **Booking Reference:** SWEHPV\n- **Date of Booking:** 18 September 2024\n\n---\n\n## Flights\n\n### Outbound Journey\n1. **Flight QR 933** \n   - **Airline:** Qatar Airways\n   - **Departure:** Manila (Ninoy Aquino Intl), Terminal 3, 02 Nov 2024, 18:30\n   - **Arrival:** Doha (Hamad International), 02 Nov 2024, 23:30\n   - **Duration:** 10:00\n   - **Baggage Allowance:** 30kg\n   - **Meal:** Included\n   - **Aircraft:** Boeing 777-300ER\n\n2. **Flight QR 215**\n   - **Airline:** Qatar Airways\n   - **Departure:** Doha (Hamad International), 03 Nov 2024, 02:40\n   - **Arrival:** Zagreb (Franjo Tudman), 03 Nov 2024, 06:45\n   - **Duration:** 06:05\n   - **Baggage Allowance:** 30kg\n   - **Meal:** Included\n   - **Aircraft:** Airbus A320\n\n---\n\n### Return Journey\n3. **Flight QR 218**\n   - **Airline:** Qatar Airways\n   - **Departure:** Zagreb (Franjo Tudman), 16 Nov 2024, 15:05\n   - **Arr

In [36]:
def display_summary(pdf_path):
    summary = summarize(pdf_path)
    display(Markdown(summary))

In [38]:
display_summary(r'C:\Users\Daniel\Desktop\Study\llm_engineering\input\input.pdf')

```markdown
## Flight Booking Summary

**Booking Reference:** SWEHPV  
**Traveler:** GARBOSA/DANIEL JUAN MR  
**Date of Travel:** 18 September 2024  
**Travel Agency:** Avio Karte, Zagreb, Croatia  

### Itinerary

1. **Flight QR 933**
   - **Departure:** Manila (Ninoy Aquino Intl), Terminal 3  
   - **Date:** 02 November 2024, 18:30  
   - **Arrival:** Doha (Hamad International), 23:30  
   - **Duration:** 10:00  
   - **Baggage Allowance:** 30 kg  
   - **Meal Provided:** Yes  
   - **Aircraft:** Boeing 777-300ER  

2. **Flight QR 215**
   - **Departure:** Doha (Hamad International)  
   - **Date:** 03 November 2024, 02:40  
   - **Arrival:** Zagreb (Franjo Tudman), 06:45  
   - **Duration:** 06:05  
   - **Baggage Allowance:** 30 kg  
   - **Meal Provided:** Yes  
   - **Aircraft:** Airbus A320   

3. **Flight QR 218**
   - **Departure:** Zagreb (Franjo Tudman)  
   - **Date:** 16 November 2024, 15:05  
   - **Arrival:** Doha (Hamad International), 22:30  
   - **Duration:** 05:25  
   - **Baggage Allowance:** 25 kg  
   - **Meal Provided:** Yes  
   - **Aircraft:** Airbus A320  

4. **Flight QR 932**
   - **Departure:** Doha (Hamad International)  
   - **Date:** 17 November 2024, 02:40  
   - **Arrival:** Manila (Ninoy Aquino Intl), Terminal 3, 16:05  
   - **Duration:** 08:25  
   - **Baggage Allowance:** 25 kg  
   - **Meal Provided:** Yes  
   - **Aircraft:** Boeing 777-300ER   

### Environmental Information
- **Calculated Average CO2 Emissions:** 1390.02 kg/person

### Important Notes
- This is your electronic ticket; please keep it for travel.
- The airplane ticket is a non-transferable document.
- Ensure to collect information regarding necessary travel documents for your destination.
- **Check-in Times:** 90 minutes for domestic and 120 minutes for international flights.
- Travel agency 'Ulix' wishes you a pleasant journey!
```

In [39]:
display_summary(r'C:\Users\Daniel\Desktop\Study\llm_engineering\input\01chapter01.pdf')

# Summary of the Safe Motherhood Survey (SMS) Project Document

## Introduction
- The document discusses maternal health issues, emphasizing that approximately 500,000 maternal deaths occur globally each year, primarily in less developed countries.
- It highlights the need for improved measurement of maternal morbidity alongside mortality and outlines the objectives of the Safe Motherhood Survey (SMS) Project initiated in 1992, focusing on women's reproductive health.

## Philippine Context
- The Philippines, an archipelago of around 7,100 islands, has a diverse geography impacting its climate and development.
- The document discusses the political and economic history of the Philippines, including developments under the Ramos administration aimed at economic recovery and social reforms.

## Population Growth
- As of the 1990 census, the population was 60.7 million, with a growth rate of 2.35% over the previous decade. The total fertility rate showed a decline, aiming for further reductions through a government program.

## Health Policies and Programs
- The Department of Health aims to ensure equitable access to healthcare. Important health initiatives include family planning, maternal health, and child nutrition.

## Objectives of the SMS
- The SMS aims to gather data on:
  - Serious health problems in pregnant women.
  - Utilization of antenatal and postnatal care.
  - Chronic reproductive morbidities.
  - Domestic violence prevalence among women.
  - Nutritional status of ever-pregnant women.

## Methodology
- The survey was implemented by the National Statistics Office with support from the Department of Health and funding from organizations like the Rockefeller Foundation and USAID.
- Describes a two-stage sampling method for data collection from a nationally representative sample of ever-pregnant women.

## Data Collection and Processing
- Fieldwork involved trained teams and strict data processing protocols to ensure the reliability of the collected information.
- The data processing utilized the Integrated System for Survey Analysis (ISSA) for efficient management and analysis.

This document serves to support further research and policymaking related to maternal health in the Philippines, emphasizing the importance of quality data in improving health outcomes.