In [1]:
# imports

import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

# If you get an error running this cell, then please head over to the troubleshooting notebook!

In [2]:
# Load environment variables in a file called .env

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Check the key

if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook")
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")


API key found and looks good so far!


In [3]:
openai = OpenAI()

# If this doesn't work, try Kernel menu >> Restart Kernel and Clear Outputs Of All Cells, then run the cells from the top of this notebook down.
# If it STILL doesn't work (horrors!) then please see the Troubleshooting notebook in this folder for full instructions

## OK onwards with our first project

In [5]:
# A class to represent a Webpage
# If you're not familiar with Classes, check out the "Intermediate Python" notebook

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        self.text = soup.body.get_text(separator="\n", strip=True)

In [30]:
# Let's try one out. Change the website and add print statements to follow along.

ed = Website("https://www.tradera.com/category/170301?paging=3.a321.s26")
print(ed.title)
print(ed.text)

Stereohögtalare | Köp & sälj begagnat & oanvänt på Tradera
JavaScript är inaktiverat. Hemsidan kommer ha begränsad funktionalitet.
Meny
Accessoarer
Antikt & Design
Barnartiklar
Barnkläder & Barnskor
Barnleksaker
Biljetter & Resor
Bygg & Verktyg
Böcker & Tidningar
Datorer & Tillbehör
DVD & Videofilmer
Fordon, Båtar & Delar
Foto, Kameror & Optik
Frimärken
Handgjort & Konsthantverk
Hem & Hushåll
Hemelektronik
Hobby
Klockor
Kläder
Konst
Musik
Mynt & Sedlar
Samlarsaker
Skor
Skönhet
Smycken & Ädelstenar
Sport & Fritid
Telefoni, Tablets & Wearables
Trädgård & Växter
TV-spel & Datorspel
Vykort & Bilder
Övrigt
Inspiration
Tradera
Stereohögtalare
Ny annons
Så funkar det
Logga in
Skapa konto
Stereohögtalare
347 annonser
Spara sökning
Högtalare
/
Stereohögtalare
Bästa träff
Alla filter
Kategori
Status
Pris
Annons­format
Skick
Säljare
Fraktval
Ny
Bowers & Wilkins 707 s2 högtalare med tillhörande stativ B&W FS-700 S2
18 aug 19:26
9 900 kr
Eller
Köp nu
13 000 kr
Avhämtning
AUDIO PRO A48 VITA TV & mul

## Types of prompts

You may know this already - but if not, you will get very familiar with it!

Models like GPT4o have been trained to receive instructions in a particular way.

They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt** -- the conversation starter that they should reply to

In [31]:
# Define our system prompt - you can experiment with this later, changing the last sentence to 'Respond in markdown in Spanish."

system_prompt = "You are looking at a website titled {website.title}, it has different deals, give me the best 3 deals of the site. i.e the items that are most cheap comapred to what the should cost according to you"



In [16]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}, it has different outlet deals, give me the best 3 deals of the site. i.e the items that are most cheap comapred to what the should cost according to you"
    user_prompt += "\nThe contents of this website is as follows; \n\n"
    user_prompt += website.text
    return user_prompt

In [17]:
print(user_prompt_for(ed))

You are looking at a website titled Outlet - Högtalare - Elgiganten, it has different outlet deals, give me the best 3 deals of the site. i.e the items that are most cheap comapred to what the should cost according to you
The contents of this website is as follows; 

Hoppa över navigation
Kundtjänst
Privatkund
Företagskund
Mina Favoriter
Meny
Hitta butik
Logga in
Kundvagn
Sök efter produkt, kategori eller artikel
Hoppa över filter
Filter
Produkttyp
Märke
Fyndvarans skick
Fyndvarans skick
Nyskick - i originalförpackning
129
Bra skick – Mindre spår av användning
49
Nyskick - originalförpackning saknas
38
Använt skick – Synliga skador, fullt funktionell
26
Använt skick – Tydliga spår av användning
16
Pris
Pris
Färg
Färg
Svart
155
Vit
34
Blå
24
Grå
21
Beige
7
Gul
4
Rosa
2
Röd
2
Se alla
Drivs av
Drivs av
Batteri
103
Nätadapter
26
Uppladdningsbart batteri
45
Leverantörens EcoVadis Score
Leverantörens EcoVadis Score
Platinum
29
Guld
3
Silver
67
Brons
8
IP-klassificering (IP Classification)
Se

## Messages

The API from OpenAI expects to receive messages in a particular structure.
Many of the other APIs share this structure:

```python
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```
To give you a preview, the next 2 cells make a rather simple call - we won't stretch the mighty GPT (yet!)

In [None]:
messages = [
    {"role": "system", "content": "You are about to be given a couple of items in a outlet website. You should find the best value product, i.e the product which is cheaper than what it should be according to you. Write 
    the best 5 deals to me},
    {"role": "user", "content": ""}
]

In [None]:
# To give you a preview -- calling OpenAI with system and user messages:

response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)

## And now let's build useful messages for GPT-4o-mini, using a function

In [32]:
# See how this function creates exactly the format above

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

In [33]:
# Try this out, and then try for a few more websites

messages_for(ed)

[{'role': 'system',
  'content': 'You are looking at a website titled {website.title}, it has different deals, give me the best 3 deals of the site. i.e the items that are most cheap comapred to what the should cost according to you'},
 {'role': 'user',
  'content': 'You are looking at a website titled Stereohögtalare | Köp & sälj begagnat & oanvänt på Tradera, it has different outlet deals, give me the best 3 deals of the site. i.e the items that are most cheap comapred to what the should cost according to you\nThe contents of this website is as follows; \n\nJavaScript är inaktiverat. Hemsidan kommer ha begränsad funktionalitet.\nMeny\nAccessoarer\nAntikt & Design\nBarnartiklar\nBarnkläder & Barnskor\nBarnleksaker\nBiljetter & Resor\nBygg & Verktyg\nBöcker & Tidningar\nDatorer & Tillbehör\nDVD & Videofilmer\nFordon, Båtar & Delar\nFoto, Kameror & Optik\nFrimärken\nHandgjort & Konsthantverk\nHem & Hushåll\nHemelektronik\nHobby\nKlockor\nKläder\nKonst\nMusik\nMynt & Sedlar\nSamlarsaker\

## Time to bring it together - the API for OpenAI is very simple!

In [34]:
# And now: call the OpenAI API. You will get very familiar with this!

def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model = "gpt-4o-mini",
        messages = messages_for(website)
    )
    return response.choices[0].message.content

In [35]:
summarize("https://www.tradera.com/category/170301?paging=3.a321.s26")

'Based on the prices listed on the website Tradera, here are three notable deals that appear to be particularly good compared to their expected market value:\n\n1. **Högtalare vintage - 78 kr**\n   - Expected Price: Approximately 200 kr or more for a vintage speaker.\n   - This deal is significantly cheaper than the expected price for a vintage speaker, making it a great find.\n\n2. **Philips Högtalare - 100 kr**\n   - Expected Price: Around 300 - 500 kr for a functioning Philips speaker.\n   - Offered at 100 kr, this deal represents a good value compared to what one would generally pay.\n\n3. **Sony Högtalare - 400 kr**\n   - Expected Price: Typically, Sony speakers of this kind might sell for about 700 - 900 kr.\n   - At 400 kr, this is a favorable deal for a working Sony speaker.\n\nThese deals stand out for their relatively low prices compared to what similar items might fetch in the market.\n'

In [36]:
# A function to display this nicely in the Jupyter output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [38]:
display_summary("https://www.tradera.com/category/170301?paging=3.a321.s26")

Here are the best three deals from the Stereohögtalare section of Tradera, based on significant discounts compared to their usual retail price:

1. **Högtalare vintage - 78 kr**
   - This vintage speaker makes for an incredible deal at only 78 kr. Its typical pricing in good condition is likely around 300-500 kr, making this a steal.
   
2. **Bose Companion 2 Series III Högtalare - 899 kr**
   - Priced at 899 kr, these speakers usually retail for around 1,800-2,000 kr, providing substantial savings for buyers looking for quality sound at a lower price.

3. **Sony Högtalare (2st) - 250 kr**
   - Available for just 250 kr, this pair of Sony speakers typically sells for about 600-800 kr. This deal offers significant savings for those in need of a reliable audio solution.

These selections are based on the lower prices compared to typical market values for similar items.

# Let's try more websites

Note that this will only work on websites that can be scraped using this simplistic approach.

Websites that are rendered with Javascript, like React apps, won't show up. See the community-contributions folder for a Selenium implementation that gets around this. You'll need to read up on installing Selenium (ask ChatGPT!)

Also Websites protected with CloudFront (and similar) may give 403 errors - many thanks Andy J for pointing this out.

But many websites will work just fine!

In [None]:
display_summary("https://cnn.com")

In [None]:
display_summary("https://anthropic.com")

In [None]:
# Step 1: Create your prompts

system_prompt = "something here"
user_prompt = """
    Lots of text
    Can be pasted here
"""

# Step 2: Make the messages list

messages = [] # fill this in

# Step 3: Call OpenAI

response =

# Step 4: print the result

print(

## An extra exercise for those who enjoy web scraping

You may notice that if you try `display_summary("https://openai.com")` - it doesn't work! That's because OpenAI has a fancy website that uses Javascript. There are many ways around this that some of you might be familiar with. For example, Selenium is a hugely popular framework that runs a browser behind the scenes, renders the page, and allows you to query it. If you have experience with Selenium, Playwright or similar, then feel free to improve the Website class to use them. In the community-contributions folder, you'll find an example Selenium solution from a student (thank you!)

# Sharing your code

I'd love it if you share your code afterwards so I can share it with others! You'll notice that some students have already made changes (including a Selenium implementation) which you will find in the community-contributions folder. If you'd like add your changes to that folder, submit a Pull Request with your new versions in that folder and I'll merge your changes.

If you're not an expert with git (and I am not!) then GPT has given some nice instructions on how to submit a Pull Request. It's a bit of an involved process, but once you've done it once it's pretty clear. As a pro-tip: it's best if you clear the outputs of your Jupyter notebooks (Edit >> Clean outputs of all cells, and then Save) for clean notebooks.

Here are good instructions courtesy of an AI friend:  
https://chatgpt.com/share/677a9cb5-c64c-8012-99e0-e06e88afd293