In [2]:
import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

In [3]:
# Load environment variables from a file called .env

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Check the key

if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook")
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")


API key found and looks good so far!
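The same checks can be wrapped in a small function so they are easy to reuse and test. This is only a sketch: the name `check_api_key` and the shortened messages are hypothetical, not part of the notebook. It also runs the whitespace check before the prefix check, so a key with a leading space is reported as a whitespace problem rather than a wrong prefix.

```python
def check_api_key(api_key):
    """Return a short diagnostic message for an API key string (or None)."""
    if not api_key:
        return "No API key was found"
    # Check whitespace first: a leading space would also fail the prefix check
    if api_key.strip() != api_key:
        return "API key has leading or trailing whitespace"
    if not api_key.startswith("sk-proj-"):
        return "API key does not start with sk-proj-"
    return "API key found and looks good so far!"


# Example with made-up keys (not real credentials)
print(check_api_key(None))
print(check_api_key(" sk-proj-example"))
print(check_api_key("sk-proj-example"))
```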


In [4]:
openai = OpenAI()

In [5]:
# A class to represent a webpage. Some websites require proper headers when you fetch them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
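        # Some pages have no <body>; fall back to empty text instead of raising
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""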

In [8]:
webpage = Website("https://www.cnbc.com/quotes/ETH.CM=")
print(webpage.title)
print(webpage.text)

ETH.CM=: Ether/USD Coin Metrics - Stock Price, Quote and News - CNBC
Skip Navigation
Markets
Pre-Markets
U.S. Markets
Currencies
Cryptocurrency
Futures & Commodities
Bonds
Funds & ETFs
Business
Economy
Finance
Health & Science
Media
Real Estate
Energy
Climate
Transportation
Industrials
Retail
Wealth
Sports
Life
Small Business
Investing
Personal Finance
Fintech
Financial Advisors
Options Action
ETF Street
Buffett Archive
Earnings
Trader Talk
Tech
Cybersecurity
AI
Enterprise
Internet
Media
Mobile
Social Media
CNBC Disruptor 50
Tech Guide
Politics
White House
Policy
Defense
Congress
Expanding Opportunity
Video
Latest Video
Full Episodes
Livestream
Live Audio
Live TV Schedule
CNBC Podcasts
CEO Interviews
CNBC Documentaries
Digital Originals
Watchlist
Investing Club
Trust Portfolio
Analysis
Trade Alerts
Meeting Videos
Homestretch
Jim's Columns
Education
Subscribe
PRO
Pro News
Josh Brown
NEW!
My Portfolio
Livestream
Full Episodes
Stock Screener
Market Forecast
Options Investing
Chart Investi

In [9]:
# Define our system prompt

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown. Also, specifically answer this question: has this stock been spoken about in the last month?"

# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website is as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

In [10]:
print(user_prompt_for(webpage))

You are looking at a website titled ETH.CM=: Ether/USD Coin Metrics - Stock Price, Quote and News - CNBC
The contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.

Skip Navigation
Markets
Pre-Markets
U.S. Markets
Currencies
Cryptocurrency
Futures & Commodities
Bonds
Funds & ETFs
Business
Economy
Finance
Health & Science
Media
Real Estate
Energy
Climate
Transportation
Industrials
Retail
Wealth
Sports
Life
Small Business
Investing
Personal Finance
Fintech
Financial Advisors
Options Action
ETF Street
Buffett Archive
Earnings
Trader Talk
Tech
Cybersecurity
AI
Enterprise
Internet
Media
Mobile
Social Media
CNBC Disruptor 50
Tech Guide
Politics
White House
Policy
Defense
Congress
Expanding Opportunity
Video
Latest Video
Full Episodes
Livestream
Live Audio
Live TV Schedule
CNBC Podcasts
CEO Interviews
CNBC Documentaries
Digital Originals
Watchlist
Investing Club
Trust Portfolio
Analysi

In [11]:
# Define the messages to send to the OpenAI API

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
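To inspect the message structure without fetching a real page, a stand-in object with `.title` and `.text` attributes is enough. The sketch below is illustrative only: `build_messages`, `SYSTEM_PROMPT` and `fake_page` are hypothetical names, and the prompts are shortened versions of the ones above.

```python
from types import SimpleNamespace

# Shortened stand-in for the notebook's system prompt
SYSTEM_PROMPT = "You are an assistant that summarizes a website in markdown."


def build_messages(page, system_prompt=SYSTEM_PROMPT):
    """Build the [system, user] message list from any object with .title and .text."""
    user_prompt = f"You are looking at a website titled {page.title}\n\n{page.text}"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


# A fake page: no network access needed
fake_page = SimpleNamespace(title="Example Page", text="Ether is trading near recent highs.")
messages = build_messages(fake_page)
for message in messages:
    print(message["role"], "->", len(message["content"]), "characters")
```

Printing the length of each message is a quick way to see how much page text ends up in the user prompt.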

In [12]:
messages_for(webpage)

[{'role': 'system',
  'content': 'You are an assistant that analyzes the contents of a website and provides a short summary, ignoring text that might be navigation related. Respond in markdown.'},
 {'role': 'user',
  'content': "You are looking at a website titled ETH.CM=: Ether/USD Coin Metrics - Stock Price, Quote and News - CNBC\nThe contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.\n\nSkip Navigation\nMarkets\nPre-Markets\nU.S. Markets\nCurrencies\nCryptocurrency\nFutures & Commodities\nBonds\nFunds & ETFs\nBusiness\nEconomy\nFinance\nHealth & Science\nMedia\nReal Estate\nEnergy\nClimate\nTransportation\nIndustrials\nRetail\nWealth\nSports\nLife\nSmall Business\nInvesting\nPersonal Finance\nFintech\nFinancial Advisors\nOptions Action\nETF Street\nBuffett Archive\nEarnings\nTrader Talk\nTech\nCybersecurity\nAI\nEnterprise\nInternet\nMedia\nMobile\nSocial Media\nCNBC Disr

## Bring it all together

In [16]:
test_url = "https://www.cnbc.com/quotes/ETH.CM="

In [13]:
# We call the OpenAI API

def summarize4omini(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages_for(website)
    )
    return response.choices[0].message.content

In [17]:
summarize4omini(test_url)

"# Summary of ETH.CM=: Ether/USD Coin Metrics - CNBC\n\nThe CNBC page for Ether/USD Coin Metrics provides real-time market data for Ethereum (ETH) against the US Dollar (USD). As of the latest quote, Ethereum's price is $4,419.55, experiencing a decrease of 1.29% with a day range between $4,316.97 and $4,451.92.\n\n## Latest News Highlights\n- **Market Trends**: Bitcoin and Ethereum have recently lost gains following comments by Fed Chair Jerome Powell about potential rate cuts.\n- **Ethereum Performance**: Ethereum is currently hovering near its all-time highs.\n- **Stablecoin Initiatives**: Wyoming is advancing its efforts to become a leader in the cryptocurrency space with new stablecoin measures.\n- **JPMorgan Insights**: A Kinexys executive highlighted that JPMorgan is a front-runner in the blockchain sector.\n\nThis summary captures the key metrics and recent developments in the cryptocurrency market related to Ethereum."

In [19]:
# And now: call the OpenAI API. You will get very familiar with this!

def summarize5mini(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model="gpt-5-mini",
        messages=messages_for(website)
    )
    return response.choices[0].message.content

In [20]:
summarize5mini(test_url)

"# Page summary — ETH.CM=: Ether/USD Coin Metrics (CNBC)\n\n- Page title: Ether/USD Coin Metrics (ETH.CM=) — stock/quote page on CNBC.\n- Quote (timestamp 2:23 AM EDT): 4,419.55 USD, down 57.84 (−1.29%).\n- Key intraday stats shown:\n  - Open: 4,419.79\n  - Day high: 4,451.92\n  - Day low: 4,316.97\n  - Previous close: 4,477.39\n- Notes: market data described as a “real-time snapshot” and may be delayed at least 15 minutes; data also provided by third parties.\n\nNews and media:\n- The page lists recent CNBC headlines related to crypto/markets (with approximate publish times):\n  - “Tuesday's big stock stories: What’s likely to move the market in the next trading session” — 7 hours ago\n  - “Wyoming's stablecoin is just the state's latest push to be a crypto pioneer” — 10 hours ago\n  - “JPMorgan is an industry leader in blockchain space: Kinexys by JPM executive director” — 10 hours ago\n  - “Bitcoin, ether erase gains driven by Fed Chair Powell's hints of coming rate cut: CNBC Crypto

In [21]:
def display_summary(url, summary_type=4):
    """
    url: str, the URL of the page to summarize
    summary_type: int, default 4. Use 4 for "gpt-4o-mini"; any other value uses "gpt-5-mini".
    """
    summary = summarize4omini(url) if summary_type == 4 else summarize5mini(url)
    display(Markdown(summary))
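As written, any `summary_type` other than 4 silently selects "gpt-5-mini". One possible alternative, sketched here with the hypothetical names `MODEL_BY_SUMMARY_TYPE` and `model_for`, is an explicit lookup that rejects unknown values. The returned name could then be passed as `model=` to a single summarize function instead of keeping two near-identical ones.

```python
# Mapping from the notebook's summary_type values to model names
MODEL_BY_SUMMARY_TYPE = {4: "gpt-4o-mini", 5: "gpt-5-mini"}


def model_for(summary_type=4):
    """Return the model name for a summary_type, or raise ValueError if unknown."""
    try:
        return MODEL_BY_SUMMARY_TYPE[summary_type]
    except KeyError:
        allowed = sorted(MODEL_BY_SUMMARY_TYPE)
        raise ValueError(
            f"summary_type must be one of {allowed}, got {summary_type!r}"
        ) from None


print(model_for(4))
print(model_for(5))
```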

In [22]:
display_summary(test_url, summary_type=4)

# Summary of CNBC's Ether/USD Coin Metrics (ETH.CM=)

The webpage provides real-time information on the Ether to USD exchange rate. As of the latest update, the price of Ether is **$4,417.21**, a decrease of **$60.18** or **1.34%** from the previous close. The trading range for the day includes a high of **$4,451.92** and a low of **$4,316.97**.

## Latest News Highlights
- **Bitcoin and Ether Price Movement**: Both Bitcoin and Ether have lost gains that were previously fueled by remarks from Fed Chair Jerome Powell regarding potential interest rate cuts.
- **Ethereum Performance**: Ethereum's price is noted to be hovering near its all-time highs, indicating strong market interest and activity.
- **Regulatory Developments**: Wyoming is advancing its position in the cryptocurrency space with its new stablecoin initiative aimed at crypto innovation.
- **Industry Insights**: A JPMorgan executive emphasizes the bank's leadership in the blockchain sector.

Overall, the page serves as a quick reference for the current performance of Ether, alongside relevant market news affecting the cryptocurrency landscape.

In [23]:
display_summary(test_url, summary_type=5)

# ETH.CM= — Ether / USD (Coin Metrics) — CNBC

Short summary:
This CNBC quote page provides a real-time (snapshot) Ether (ETH) / USD price feed sourced from Coin Metrics, key intraday stats, and related CNBC crypto/market articles and video.

Key price data (as shown)
- Last (2:25 AM EDT): 4,418.61 (-58.78, -1.31%)
- Open: 4,419.79
- Day high: 4,451.92
- Day low: 4,316.97
- Previous close: 4,477.39

News & related content (headlines on the page)
- "Tuesday's big stock stories: What’s likely to move the market in the next trading session" — market movers and previews for the upcoming trading day.
- "Wyoming's stablecoin is just the state's latest push to be a crypto pioneer" — coverage of Wyoming’s stablecoin initiative and its crypto policy efforts.
- "JPMorgan is an industry leader in blockchain space: Kinexys by JPM executive director" — commentary on JPMorgan’s role in blockchain.
- "Bitcoin, ether erase gains driven by Fed Chair Powell's hints of coming rate cut: CNBC Crypto World" — markets (BTC/ETH) pulling back after comments from Fed Chair Powell.
- "Ethereum hovers near all-time highs" — reporting on ETH price approaching record levels.

Other notes
- The page also embeds a related 7:20 video about Wyoming’s stablecoin.
- A site note indicates "There is no recent news for this security," even though related CNBC articles are listed.
- Data is presented as a real-time snapshot but is delayed at least 15 minutes per the site disclaimer.

## An extra exercise for those who enjoy web scraping

You may notice that `display_summary("https://openai.com")` doesn't work! That's because OpenAI's website is rendered with JavaScript, which a plain `requests` fetch does not execute. There are many ways around this that some of you might be familiar with. For example, Selenium is a hugely popular framework that runs a browser behind the scenes, renders the page, and allows you to query it. If you have experience with Selenium, Playwright or similar, then feel free to improve the Website class to use them. In the community-contributions folder, you'll find an example Selenium solution from a student (thank you!).
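As a rough starting point, here is a minimal sketch of fetching rendered HTML with Selenium. It assumes Selenium 4 and a local Chrome installation; the function name `fetch_rendered_html` and the fixed wait are illustrative choices, and this is not the community-contributions solution.

```python
def fetch_rendered_html(url, wait_seconds=2.0):
    """Return the page HTML after a headless Chrome instance has loaded the URL."""
    # Imported lazily so this cell still loads when Selenium is not installed
    import time
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        time.sleep(wait_seconds)  # crude fixed wait so scripts can populate the page
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if loading fails
```

The returned HTML string could be passed to `BeautifulSoup(html, 'html.parser')` in place of `response.content` inside the Website class. A fixed sleep is the simplest option; an explicit wait on a specific element would be more reliable for slow pages.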

# Sharing your code

I'd love it if you share your code afterwards so I can share it with others! You'll notice that some students have already made changes (including a Selenium implementation) which you will find in the community-contributions folder. If you'd like to add your changes to that folder, submit a Pull Request with your new versions in that folder and I'll merge your changes.

If you're not an expert with git (and I am not!) then GPT has given some nice instructions on how to submit a Pull Request. It's a bit of an involved process, but once you've done it once it's pretty clear. As a pro-tip: it's best if you clear the outputs of your Jupyter notebooks (Edit >> Clear Outputs of All Cells, and then Save) for clean notebooks.

Here are good instructions courtesy of an AI friend:  
https://chatgpt.com/share/677a9cb5-c64c-8012-99e0-e06e88afd293