In [2]:
# imports

import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

# If you get an error running this cell, then please head over to the troubleshooting notebook!

# Connecting to OpenAI

The next cells load the environment variables from your `.env` file and create the OpenAI client.


In [4]:
# Load environment variables in a file called .env

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Check the key

if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif api_key.strip() != api_key:
    # Check for stray whitespace first, otherwise a leading space is misreported as a wrong prefix
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")


API key found and looks good so far!
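
If you want to reuse these checks elsewhere, they can be wrapped in a small function. This is only a sketch: `check_api_key` and its return labels are hypothetical names, not part of `python-dotenv` or the OpenAI library. It looks for stray whitespace before checking the prefix, so a key with a leading space is reported as a whitespace problem.

```python
def check_api_key(api_key):
    """Return a short diagnostic label for an API key value (hypothetical helper)."""
    if not api_key:
        return "missing"
    if api_key.strip() != api_key:
        return "whitespace"
    if not api_key.startswith("sk-proj-"):
        return "unexpected prefix"
    return "ok"

# Placeholder values only; none of these are real keys
for candidate in [None, " sk-proj-example", "abc123", "sk-proj-example"]:
    print(repr(candidate), "->", check_api_key(candidate))
```

In the notebook you would call it as `check_api_key(os.getenv('OPENAI_API_KEY'))`.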


In [5]:
openai = OpenAI()

# If this doesn't work, try Kernel menu >> Restart Kernel and Clear Outputs Of All Cells, then run the cells from the top of this notebook down.
# If it STILL doesn't work (horrors!) then please see the Troubleshooting notebook in this folder for full instructions

# Let's make a quick call to a model to get started

In [6]:
# To give you a preview -- calling OpenAI with these messages is this easy. Any problems, head over to the Troubleshooting notebook.

message = "Hello, GPT! This is my first ever message to you! Hi!"
response = openai.chat.completions.create(model="gpt-4o-mini", messages=[{"role":"user", "content":message}])
print(response.choices[0].message.content)

Hello! Welcome! I'm glad you're here. How can I assist you today?


In [7]:
print(response)

ChatCompletion(id='chatcmpl-C6es3A7o3gnFjmB4CgGxHXirKhSdc', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Hello! Welcome! I'm glad you're here. How can I assist you today?", refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1755703111, model='gpt-4o-mini-2024-07-18', object='chat.completion', service_tier='default', system_fingerprint='fp_560af6e559', usage=CompletionUsage(completion_tokens=16, prompt_tokens=22, total_tokens=38, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)))
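
The printed object is dense, so here is a rough sketch of pulling out the fields you are most likely to care about. The attribute names are the ones visible in the output above; `describe_response` is a hypothetical helper, and the stand-in object only mimics the response so the example runs without an API call.

```python
from types import SimpleNamespace

def describe_response(response):
    """Collect a few commonly inspected fields from a chat completion response."""
    choice = response.choices[0]
    return {
        "model": response.model,
        "finish_reason": choice.finish_reason,
        "content": choice.message.content,
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "total_tokens": response.usage.total_tokens,
    }

# Stand-in mirroring the values printed above (not a real API response)
fake_response = SimpleNamespace(
    model="gpt-4o-mini-2024-07-18",
    choices=[
        SimpleNamespace(
            finish_reason="stop",
            message=SimpleNamespace(
                content="Hello! Welcome! I'm glad you're here. How can I assist you today?"
            ),
        )
    ],
    usage=SimpleNamespace(prompt_tokens=22, completion_tokens=16, total_tokens=38),
)

summary = describe_response(fake_response)
print(summary["model"], summary["finish_reason"], summary["total_tokens"])
```

With the real client, `describe_response(response)` should work the same way on the object returned by `openai.chat.completions.create`.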


## Onwards with our first project

In [8]:
# A class to represent a Webpage
# If you're not familiar with Classes, check out the "Intermediate Python" notebook

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
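        response = requests.get(url, headers=headers, timeout=30)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            # Drop elements that carry no readable text before extracting it
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            # Some responses have no <body> at all (e.g. error pages or non-HTML content)
            self.text = ""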
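
To see what "strip the scripts and styles, keep the text" means without fetching anything, here is a minimal offline sketch. Note the swap: it uses the standard library's `html.parser.HTMLParser` rather than BeautifulSoup, so it is not how the `Website` class works internally, and `TextExtractor` and `extract_text` are hypothetical names. Unlike the class above, it does not restrict itself to the `<body>` element.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = None
        self.chunks = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title = text
        elif self._skip_depth == 0:
            self.chunks.append(text)

def extract_text(html):
    """Return (title, text) for an HTML string, one text chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.title or "No title found", "\n".join(parser.chunks)

sample = (
    "<html><head><title>Demo</title><style>p{}</style></head>"
    "<body><script>var x = 1;</script><p>Hello</p><p>World</p></body></html>"
)
title, text = extract_text(sample)
print(title)
print(text)
```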

In [11]:
# Let's try one out. Change the website and add print statements to follow along.

bhp = Website("https://team-bhp.com")
print(bhp.title)
print(bhp.text)

Team-BHP - India's Most Trusted Car Reviews & News
Skip to main content
About Us
|
Advertise
|
Contact Us
Forum
Hot Threads
News
Reviews
Photos
Buy Used Car
Spare Parts
Windshield Experts
Classifieds
Store
2025 Suzuki Access 125 Review
News
Latest News
Member Content
More images: Mahindra XUV700 facelift spied inside and out
Ather EL low-cost electric scooter concept teased
Royal Enfield Motoverse 2025 registrations open
Tata Motors re-enters the South African market after 6 years
Xiaomi plans to launch EVs in Europe by 2027
Citroen announces Basalt, Aircross & C3 Drive range for fleet operators
2025 Hero Glamour X 125 launched at Rs 89,999
Korea turns driving into a game; sparks safer driving habits
Ola electric car, rickshaw & LCV based on Gen 4 platform teased
View All News
How to clean the dust settlement in engine bay due to dried up 3M coat?
2022 Toyota Hyryder Hybrid efficiency drop: Is E20 fuel to blame?
1,23,456km with my 2016 Volkswagen Polo; Still feels new after 9 years!
Th

In [13]:
# Define our system prompt

system_prompt = "You are a snarky assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [14]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website is as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

In [15]:
print(user_prompt_for(bhp))

You are looking at a website titled Team-BHP - India's Most Trusted Car Reviews & News
The contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.

Skip to main content
About Us
|
Advertise
|
Contact Us
Forum
Hot Threads
News
Reviews
Photos
Buy Used Car
Spare Parts
Windshield Experts
Classifieds
Store
2025 Suzuki Access 125 Review
News
Latest News
Member Content
More images: Mahindra XUV700 facelift spied inside and out
Ather EL low-cost electric scooter concept teased
Royal Enfield Motoverse 2025 registrations open
Tata Motors re-enters the South African market after 6 years
Xiaomi plans to launch EVs in Europe by 2027
Citroen announces Basalt, Aircross & C3 Drive range for fleet operators
2025 Hero Glamour X 125 launched at Rs 89,999
Korea turns driving into a game; sparks safer driving habits
Ola electric car, rickshaw & LCV based on Gen 4 platform teased
View All News
How to 

In [16]:
messages = [
    {"role": "system", "content": "You are an angry assistant"},
    {"role": "user", "content": "What is 2 * 2?"}
]

In [17]:
# To give you a preview -- calling OpenAI with system and user messages:

response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)

Seriously? It's 4. Everyone knows that.


## And now let's build useful messages for GPT-4o-mini, using a function

In [18]:
# See how this function creates exactly the format above

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
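
Before sending a list like this to the API, it can help to sanity-check its shape. The following is a sketch under assumptions: `validate_messages` is a hypothetical helper, and it only covers the simple form used in this notebook (a list of dicts with a `role` of `system`, `user` or `assistant` and a string `content`). The API itself accepts more than this.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}

def validate_messages(messages):
    """Return a list of problems found in a messages list (empty if none)."""
    problems = []
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {index} is not a dict")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {index} has unexpected role {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {index} has non-string content")
    return problems

good = [
    {"role": "system", "content": "You are an angry assistant"},
    {"role": "user", "content": "What is 2 * 2?"},
]
bad = [{"role": "sytem", "content": "typo in the role"}, {"role": "user"}]

print(validate_messages(good))
print(validate_messages(bad))
```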

In [19]:
# Try this out, and then try for a few more websites

messages_for(bhp)

[{'role': 'system',
  'content': 'You are a snarky assistant that analyzes the contents of a website and provides a short summary, ignoring text that might be navigation related. Respond in markdown.'},
 {'role': 'user',
  'content': 'You are looking at a website titled Team-BHP - India\'s Most Trusted Car Reviews & News\nThe contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.\n\nSkip to main content\nAbout Us\n|\nAdvertise\n|\nContact Us\nForum\nHot Threads\nNews\nReviews\nPhotos\nBuy Used Car\nSpare Parts\nWindshield Experts\nClassifieds\nStore\n2025 Suzuki Access 125 Review\nNews\nLatest News\nMember Content\nMore images: Mahindra XUV700 facelift spied inside and out\nAther EL low-cost electric scooter concept teased\nRoyal Enfield Motoverse 2025 registrations open\nTata Motors re-enters the South African market after 6 years\nXiaomi plans to launch EVs in Europe by 2027\n

## Time to bring it together - the API for OpenAI!

In [20]:
# And now: call the OpenAI API. You will get very familiar with this!

def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model = "gpt-4o-mini",
        messages = messages_for(website)
    )
    return response.choices[0].message.content
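
If you'd like to experiment with this call pattern without spending tokens, one option is to pass the client in as a parameter and substitute a stand-in while testing. This is a sketch, not the notebook's approach: `summarize_messages`, `FakeCompletions` and `fake_client` are hypothetical names, and the fake only imitates the `chat.completions.create(...)` call shape used above.

```python
from types import SimpleNamespace

def summarize_messages(client, messages, model="gpt-4o-mini"):
    """Send messages through the given client and return the reply text."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content

class FakeCompletions:
    """Offline stand-in that echoes what it was asked instead of calling a model."""
    def create(self, model, messages):
        reply = f"[{model}] received {len(messages)} messages"
        message = SimpleNamespace(content=reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

fake_client = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))

demo_messages = [
    {"role": "system", "content": "You are a snarky assistant"},
    {"role": "user", "content": "Summarize this page"},
]
print(summarize_messages(fake_client, demo_messages))
```

In the notebook itself you would pass the real `openai` client object in place of `fake_client`.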

In [21]:
summarize("https://team-bhp.com")

'# Team-BHP Summary\n\n**Website Overview**:  \nTeam-BHP is a comprehensive online platform dedicated to car reviews, news, and discussions, focused on the Indian automotive market. It emphasizes community engagement through forums where members share experiences and advice related to cars and motorcycles. \n\n---\n\n## Key Features:\n- **Reviews**: In-depth reviews of various car and motorcycle models, including recent reviews of vehicles like the 2025 Suzuki Access 125 and others.\n- **News**: Latest updates in the automotive industry, including sneak peeks at upcoming models and announcements from manufacturers.\n\n### Recent News Highlights:\n- **Mahindra XUV700**: Facelifted model spotted.\n- **Ather**: New low-cost electric scooter concept teased.\n- **Royal Enfield**: Registrations for Motoverse 2025 now open.\n- **Tata Motors**: Re-entering the South African market after a 6-year hiatus.\n- **Citroen**: Launches new range targeting fleet operators including the Basalt and Aircr

In [22]:
# A function to display this nicely in the Jupyter output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [24]:
display_summary("https://team-bhp.com")

# Team-BHP - India's Most Trusted Car Reviews & News

Team-BHP is a comprehensive platform dedicated to car enthusiasts in India, providing a plethora of resources, including **car reviews, news, forums**, and **classifieds** for buying and selling vehicles.

## Latest News Highlights
- **2025 Suzuki Access 125 Review** released.
- **Mahindra XUV700 facelift** spotted in test drives.
- **Ather EL** electric scooter concept teased.
- **Tata Motors** makes a comeback to the South African market after 6 years.
- **Xiaomi** plans to launch electric vehicles in Europe by 2027.
- **Citroen** introduces vehicles tailored for fleet operators.
- **2025 Hero Glamour X 125** launched at ₹89,999.

## Community Engagement
The site features a robust **forum** with discussions ranging from ownership experiences to technical advice. Popular hot threads include ownership reviews of various models and discussions about driving habits and EV adoption in India. 

## Reviews and Comparisons
Team-BHP hosts detailed reviews for a wide array of vehicles, from luxury cars to motorcycles, giving users insights into experiences and performance.

Overall, Team-BHP serves as a reliable source for automotive information, fostering an engaged community among car lovers.

# Let's try more websites

Note that this will only work on websites that can be scraped using this simplistic approach.

Websites that are rendered with JavaScript, like React apps, won't show up. See the community-contributions folder for a Selenium implementation that gets around this. You'll need to read up on installing Selenium (ask ChatGPT!).

Also, websites protected with CloudFront (and similar services) may return 403 errors. Many thanks to Andy J for pointing this out.

But many websites will work just fine!
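
A rough way to tell these failure cases apart is to look at the HTTP status code and at how much text was actually extracted. The sketch below is a heuristic only: `diagnose_fetch`, its labels, and the `min_chars` threshold of 200 are assumptions for illustration, and a short page is not proof that a site is JavaScript-rendered.

```python
def diagnose_fetch(status_code, visible_text, min_chars=200):
    """Guess why a scrape may have failed (hypothetical helper, heuristic only)."""
    if status_code == 403:
        return "blocked"  # e.g. bot protection in front of the site
    if status_code >= 400:
        return "http_error"
    if len(visible_text.strip()) < min_chars:
        return "possibly_js_rendered"  # very little text came back in the raw HTML
    return "ok"

print(diagnose_fetch(403, ""))
print(diagnose_fetch(200, "Loading..."))
print(diagnose_fetch(200, "x" * 500))
```

To use it with the `Website` class you would need to keep the `response.status_code` from `requests.get`, which the class above does not currently store.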

In [25]:
display_summary("https://indianexpress.com")

# Summary of Latest News Today - The Indian Express

The **Indian Express** website, titled "Latest News Today," serves as a comprehensive news portal covering a wide array of topics including politics, business, sports, entertainment, and more, focused primarily on developments in India. 

### Key Headlines

- **Political Drama**: A TMC leader has filed a defamation case against the father of a rape-murder victim, generating significant controversy.
- **Military Movements**: Israel is mobilizing 60,000 army reservists in preparation for an offensive in Gaza City.
- **Local Incidents**: A tragic incident in Ahmedabad occurred where a Class 8 student fatally stabbed a Class 10 student, prompting protests and violence at a local school.
- **Protests and Policing**: Major protests against a power project turned violent in Madhya Pradesh, resulting in the police employing lathicharge and teargas to control the crowds, with around 300 individuals booked.
- **Entertainment**: Shah Rukh Khan made headlines at a preview event for *The Ba***ds of Bollywood*, creating quite the buzz with his charisma and interactions with fans.
- **Legislation Moves**: The Indian government is pushing forward with talks aimed at resuming trade agreements with the Russia-led EAEU bloc amidst complicated international tariff discussions.

### Miscellaneous Updates

- Mumbai continues to experience heavy rains leading to significant impacts on daily life and transportation.
- An incident involving alleged police malpractice has raised concerns regarding conditions at the Mandoli Jail, prompting a judicial reprimand.
- The RBI Governor has voiced concerns over monetary policy in light of external uncertainties affecting the Indian economy.

In essence, *The Indian Express* remains a vital source for breaking news and in-depth analysis on current events shaping India today.

# It even summarizes Hindi news in English

In [27]:
display_summary("https://bhaskar.com")

# Summary of Dainik Bhaskar Website

Dainik Bhaskar is a comprehensive Hindi news website providing the latest updates on various topics, including politics, sports, entertainment, and lifestyle. The site offers real-time news coverage, videos, and special features on trending issues.

## Key Highlights:

- **Political News**: The site discusses significant political events like recent bills introduced in the Lok Sabha, including a bill that could lead to the removal of PM or CM upon arrest. There were also incidents of opposition protests where paper balls were thrown in Parliament.

- **Crime and Legal Matters**: Reports cover alarming incidents, such as an attack on Delhi's Chief Minister and issues surrounding migrant workers in Delhi-NCR. The aftermath of natural disasters, such as fatalities caused by cloudbursts in Kishtwar, is also highlighted.

- **Sports Updates**: Cricket features prominently, including discussions about India's team composition for the Asia Cup, and controversies surrounding the rankings of players like Rohit Sharma and Virat Kohli.

- **Business News**: Developments in the business world include the potential banning of online gaming sponsors and updates related to job recruitment in banks.

- **Lifestyle Articles**: Content spans lifestyle tips and relationship advice, focusing on the well-being of readers.

- **Trending Topics**: The site reports on various trending topics like the ongoing Russia-Ukraine war, the political landscape in Maharashtra, and unique stories captivating audience interest.

Overall, Dainik Bhaskar serves as a key source of information for Hindi-speaking audiences seeking diverse news coverage from local to international events.