## Summarize the news

This exercise is about summarization with a frontier LLM from OpenAI, specifically gpt-4.1-mini. I first fetch the latest news about a company through the yfinance API. The result is a list of dictionaries (JSON-like structured data), and each entry holds the URL of an article about the company. I can then scrape that page, feed its text into the LLM, and ask it to summarize the content.

The URL can be extracted from the news attribute of a yfinance ticker. The attribute is a list with one entry per article, so an entry has to be picked first:

ticker = yfinance.Ticker('AAPL')

news = ticker.news

url = news[0]['content']['canonicalUrl']['url']
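The lookup can be wrapped in a small helper that walks the whole list and skips entries that have no URL. This is only a minimal sketch: it assumes every entry follows the content → canonicalUrl → url layout described here, `extract_article_links` is a hypothetical name, and the sample list is hand-written rather than real yfinance output.

```python
def extract_article_links(news, limit=None):
    """Return (title, url) pairs from a yfinance-style news list.

    Assumes each entry looks like
    {"content": {"title": ..., "canonicalUrl": {"url": ...}}}.
    Entries without a URL are skipped instead of raising KeyError.
    """
    links = []
    for item in news:
        # Stop once the requested number of links has been collected
        if limit is not None and len(links) >= limit:
            break
        content = item.get("content") or {}
        url = (content.get("canonicalUrl") or {}).get("url")
        if not url:
            continue
        links.append((content.get("title", "No title found"), url))
    return links


# Hand-written sample data (not real yfinance output)
sample_news = [
    {"content": {"title": "First article", "canonicalUrl": {"url": "https://example.com/a"}}},
    {"content": {"title": "No link here", "canonicalUrl": None}},
    {"content": {"title": "Second article", "canonicalUrl": {"url": "https://example.com/b"}}},
]

links = extract_article_links(sample_news)
print(links)
```

With real data the call would be `extract_article_links(ticker.news, limit=1)` to get only the most recent article.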

In [2]:
import yfinance as yf

In [6]:
# extracting the website content

from bs4 import BeautifulSoup
import requests


# Standard headers to fetch a website
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}


def fetch_website_contents(url):
    """
    Return the title and contents of the website at the given url
    """
    response = requests.get(url, headers=headers, timeout=10)
    soup = BeautifulSoup(response.content, "html.parser")
    title = soup.title.string if soup.title else "No title found"
    if soup.body:
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        text = soup.body.get_text(separator="\n", strip=True)
    else:
        text = ""
    return (title + "\n\n" + text)[:4_000]
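On news pages the body text often starts with a long run of menu entries, as the sample output further down shows, and these use up much of the 4,000-character budget before the article begins. One possible mitigation is a line-length heuristic applied to the extracted text. The sketch below is an assumption, not part of the original notebook: `drop_short_lines` and the `min_words` threshold are hypothetical, and the sample text is made up for illustration.

```python
def drop_short_lines(text, min_words=6):
    """Keep only lines with at least `min_words` words.

    Heuristic: menu entries are usually one or two words long, while
    article sentences are longer. The first line (the page title)
    is always kept.
    """
    lines = text.split("\n")
    kept = [lines[0]]
    kept.extend(line for line in lines[1:] if len(line.split()) >= min_words)
    return "\n".join(kept)


# Made-up sample in the same shape as the scraper's output
sample = "\n".join([
    "Apple announces F1 partnership - Yahoo Sports",
    "",
    "Search",
    "News",
    "Finance",
    "Manage your account",
    "Apple will broadcast every Formula 1 race in the U.S. for five years.",
])
cleaned = drop_short_lines(sample)
print(cleaned)
```

The filter only helps if it runs on the full text before the slice to 4,000 characters. It is a blunt tool: short headings and one-line paragraphs are dropped together with the menu entries.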

In [None]:
symbol = 'AAPL'
ticker = yf.Ticker(symbol)
news = ticker.news
print("\nRecent News:")
# Look only at the most recent article
for n in news[:1]:
    n_content = n['content']
    print(n_content['title'])
    url = n_content['canonicalUrl']['url']
    print(url)



Recent News:
Apple announces five-year partnership as U.S. broadcast partner for Formula 1
https://sports.yahoo.com/article/apple-announces-five-partnership-u-130512518.html


In [20]:
fetch_website_contents(url)

"Apple announces five-year partnership as U.S. broadcast partner for Formula 1 - Yahoo Sports\n\nSearch query\nSearch\nNews\nFinance\nSports\nMore\n-1\nManage your account\nHelp\nAdd or switch accounts\nSign out\nNFL\nNCAAF\nFantasy\nMLB\nNBA\nNHL\nWNBA\nGolf\nBetting\nRacing\nTennis\nNCAAB\nWatch\nNCAAW\nDaily Draw\nSoccer\nCombat\nMMA\nBoxing\nWrestling\nBoardroom\nCollectibles\nWomen's Sports\nCollege & High School\nFantasy Football Draft Kit\nCollege Sports\nHorse Racing\nUFL\nCycling\nOlympics\nCricket\nWhat & How To Watch\nNewsletters\nGameChannel\nWatch\nRSS\nJobs\nHelp\nMore News\nBoardroom\nNetwork with Rich Kleiman\nEpisode 7: CC Sabathia\nEpisode 6: Donovan Mitchell\nEpisode 5: Karl-Anthony Towns\nEpisode 4: Mark Cuban\nCover Story: Issa Rae\nCover Story: Aryna Sabalenka\nSee All Boardroom\nExplore Boardroom, where Yahoo and Boardroom Sports cover the business and culture behind the biggest sports stories.\nCombat\nMMA\nBoxing\nWrestling\nThe Ariel Helwani Show\nThe Boys in 

In [21]:
import os
from dotenv import load_dotenv
from IPython.display import Markdown, display
from openai import OpenAI

In [None]:
# Load the API keys from the .env file into environment variables.
# override=True makes the values in the .env file take priority over variables that are already set.

load_dotenv(override=True)

True
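`load_dotenv` returning `True` only means that a .env file was found and loaded, not that the key inside it is usable. A quick sanity check can catch the common mistakes early. The sketch below assumes the key is stored under `OPENAI_API_KEY` (the variable the OpenAI client reads by default) and that keys start with `sk-`; `describe_api_key` is a hypothetical helper, and the prefix check is a heuristic rather than an official validation rule.

```python
import os


def describe_api_key(api_key):
    """Return a short diagnostic message for an API key string."""
    if not api_key:
        return "No API key found; check the .env file."
    if api_key != api_key.strip():
        return "The API key has leading or trailing whitespace."
    # Assumption: OpenAI keys start with "sk-"
    if not api_key.startswith("sk-"):
        return "The API key does not start with 'sk-'; it may be the wrong value."
    return "API key found and it looks plausible."


print(describe_api_key(os.getenv("OPENAI_API_KEY")))
```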

In [26]:
system_prompt = """
You are a helpful assistant that analyzes the contents of a website,
and provides a short, snarky, humorous summary, ignoring text that might be navigation related.
Respond in markdown. Do not wrap the markdown in a code block - respond just with the markdown.
"""

user_prompt_prefix = """
Here are the contents of a website.
Provide a short summary of this website.
If it includes news or announcements, then summarize these too.

Title of the article:


"""

In [27]:
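def messages_create(website, title):
    # Separate the title from the page text so the two do not run together
    user_prompt = user_prompt_prefix + title + "\n\nContents of the website:\n\n" + website
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]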

In [28]:
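def summarize(url, title):
    """Fetch the article at `url` and return the model's summary as markdown text."""
    website = fetch_website_contents(url)
    # `openai` is the OpenAI client instance created in the final cell
    response = openai.chat.completions.create(
        model="gpt-4.1-mini",
        messages=messages_create(website, title)
    )
    return response.choices[0].message.content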
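`summarize` depends on a module-level name `openai` that is only created in the last cell, so it fails with a `NameError` if the cells run in a different order. One way to make the dependency explicit is to pass the client in as a parameter, which also lets the function be tried without a network call. The following is a sketch under that assumption: `summarize_text` and `make_fake_client` are hypothetical names, and the fake client only imitates the `chat.completions.create` call shape used in this notebook.

```python
from types import SimpleNamespace


def summarize_text(client, messages, model="gpt-4.1-mini"):
    """Send `messages` through `client` and return the reply text.

    `client` is any object exposing chat.completions.create(...),
    such as an OpenAI() instance.
    """
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content


def make_fake_client(reply):
    """Build a stand-in client that always answers with `reply` (for testing only)."""
    def create(model, messages):
        message = SimpleNamespace(content=reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

    completions = SimpleNamespace(create=create)
    return SimpleNamespace(chat=SimpleNamespace(completions=completions))


fake = make_fake_client("A short summary.")
result = summarize_text(fake, [{"role": "user", "content": "Summarize this."}])
print(result)
```

In the notebook the real call would look like `summarize_text(OpenAI(), messages_create(website, title))`.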

In [29]:
def display_summary(url, title):
    summary = summarize(url, title)
    display(Markdown(summary))

In [None]:
openai = OpenAI()

tickers = ["AAPL"]
for symbol in tickers:
    ticker = yf.Ticker(symbol)
    news = ticker.news
    # Summarize only the most recent article for each ticker
    for n in news[:1]:
        n_content = n['content']
        title = n_content['title']
        url = n_content['canonicalUrl']['url']
        display_summary(url, title)

Apple just revved up the streaming game by snagging exclusive U.S. broadcast rights for Formula 1 for the next five years—starting next season, all F1 races will zoom only on Apple TV. This partnership follows the smashing success of "F1 The Movie," the highest-grossing sports flick ever. Expect full coverage of practices, qualifiers, sprints, and grands prix, with some content even free on the Apple TV app. Plus, Apple’s going all-in by boosting F1 content across Apple News, Maps, Music, Fitness+, and their Sports app, turning your iPhone into the ultimate race pit. So if you thought Apple was just about phones and fruit, think again—they’re now your go-to for speed and drama on the track! 🏎️🍏