## Introduction

This Jupyter Notebook demonstrates how to use OpenAI's API with the GPT-4o-mini model to summarize content directly from websites. The goal of this project is to provide an automated tool that fetches the text of a webpage and generates a concise, readable summary.

Using GPT-4o-mini, the notebook processes the main text of a website, such as an article or blog post, and distills it into key points, making it easier to grasp the essential information quickly.

In this notebook, you’ll find step-by-step instructions on how to input a website URL, retrieve its content, and generate a summary. The code is designed to be simple and flexible, suitable for anyone looking to summarize web pages for research, reading, or personal use.


In [None]:
# imports

import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI


# Connecting to OpenAI

The next cell loads the environment variables from your `.env` file and checks that an OpenAI API key is present; the cell after it connects to OpenAI.


In [None]:
# Load environment variables from a file called .env

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Check the key (whitespace first, so a stray space isn't reported as a wrong prefix)

if not api_key:
    print("No API key was found - please add OPENAI_API_KEY to your .env file")
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start with sk-proj-; please check you're using the right key")
else:
    print("API key found and looks good so far!")
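
A minimal sketch of the same checks as a reusable function, using the hypothetical name `describe_api_key`, returns the message instead of printing it, so the logic can be tried on sample strings without a real key. Whitespace is checked before the prefix, so a key with a stray leading space gets the more specific hint. The prefix check is only a heuristic: project keys start with `sk-proj-`, but other kinds of OpenAI keys may use a different prefix.

```python
def describe_api_key(key):
    """Hypothetical helper: return a status message for an API key string.

    Whitespace is checked before the prefix, so " sk-proj-..." is reported
    as a whitespace problem rather than as a wrong prefix.
    """
    if not key:
        return "No API key was found"
    if key.strip() != key:
        return "The API key has leading or trailing whitespace - please remove it"
    if not key.startswith("sk-proj-"):
        return "An API key was found, but it doesn't start with sk-proj-"
    return "API key found and looks good so far!"

# Try the checks on made-up sample strings (none of these are real keys)
for candidate in [None, " sk-proj-abc", "sk-abc", "sk-proj-abc"]:
    print(repr(candidate), "->", describe_api_key(candidate))
```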


In [None]:
openai = OpenAI()
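
`OpenAI()` is created without arguments because the client reads `OPENAI_API_KEY` from the process environment, which is why the `.env` file was loaded first. `load_dotenv(override=True)` copies the `KEY=VALUE` pairs from `.env` into the environment, and `override=True` lets values from the file replace variables that are already set. To make that concrete, here is a simplified, hypothetical stand-in (`load_env_file`) written with the standard library only. It is not python-dotenv's implementation, which handles more of the `.env` syntax, such as `export` prefixes and `${VAR}` expansion.

```python
import os
import tempfile

def load_env_file(path, override=False):
    """Hypothetical, simplified stand-in for python-dotenv's load_dotenv.

    Reads KEY=VALUE lines, skipping blanks and # comments, strips one pair of
    matching surrounding quotes, and copies the values into os.environ.
    Existing variables are only replaced when override=True.
    Returns the parsed key/value pairs.
    """
    parsed = {}
    with open(path, encoding="utf-8") as env_file:
        for raw_line in env_file:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = (part.strip() for part in line.split("=", 1))
            # Drop one pair of matching surrounding quotes, if present
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                value = value[1:-1]
            parsed[key] = value
            if override or key not in os.environ:
                os.environ[key] = value
    return parsed

# Demo with a throwaway file; the value is a placeholder, not a real key
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as tmp:
    tmp.write('# comment line\nDEMO_API_KEY="sk-proj-placeholder"\n')
    env_path = tmp.name

print(load_env_file(env_path, override=True))
print(os.getenv("DEMO_API_KEY"))
os.remove(env_path)
```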

# Let's make a quick call to test the response of GPT-4o-mini

In [None]:
# To give you a preview: calling OpenAI with these messages is this easy. If you run into problems, head over to the Troubleshooting notebook.

message = "Hello, My name is Napoliyan Nelson. How are you?"
response = openai.chat.completions.create(model="gpt-4o-mini", messages=[{"role":"user", "content":message}])
print(response.choices[0].message.content)

In [None]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        # Time out instead of hanging, and fail loudly on HTTP errors such as 403 or 404
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title and soup.title.string else "No title found"
        if soup.body:
            # Remove elements that contribute no readable text to the summary
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            # Some responses have no <body> tag at all
            self.text = ""
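
The `Website` class keeps the page title and the body text while discarding `<script>`, `<style>`, `<img>` and `<input>` elements; the last two carry no text of their own, so `script` and `style` are the ones that change the extracted text. To show what that filtering does without a network call or third-party packages, here is a minimal sketch that swaps in the standard library's `html.parser.HTMLParser` for BeautifulSoup. `TextExtractor` is a hypothetical name, and because it builds no document tree it is only a rough illustration of the idea.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Hypothetical stdlib-only stand-in for the BeautifulSoup logic in Website.

    Collects the <title> text and the visible text inside <body>,
    skipping anything nested in <script> or <style>.
    """
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = None
        self.lines = []          # one entry per text fragment, like get_text(separator="\n", strip=True)
        self._in_title = False
        self._in_body = False
        self._skip_depth = 0     # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title = text
        elif self._in_body and not self._skip_depth:
            self.lines.append(text)

html = """<html><head><title>Demo page</title><style>p {color: red}</style></head>
<body><h1>Welcome</h1><script>var x = 1;</script><p>Hello <b>world</b></p></body></html>"""

parser = TextExtractor()
parser.feed(html)
print(parser.title)
print("\n".join(parser.lines))
```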

In [None]:
# Let's try one out. Change the website and add print statements to follow along.

aim = Website("https://aim.gov.in/")
print(aim.title)
print(aim.text)

In [None]:
# Define system prompt

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [None]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt
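
Because `user_prompt_for` only reads the `title` and `text` attributes, any object with those two attributes will do. A minimal sketch, assuming a hypothetical `FakePage` dataclass, shows how to check the prompt offline without fetching a real page; the function is repeated so the cell runs on its own.

```python
from dataclasses import dataclass

@dataclass
class FakePage:
    """Hypothetical stand-in exposing the two attributes user_prompt_for reads."""
    title: str
    text: str

def user_prompt_for(website):
    # Same logic as the notebook's function, repeated so this cell is self-contained
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

page = FakePage(title="Example Domain", text="This domain is for use in examples.")
print(user_prompt_for(page))
```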

In [None]:
print(user_prompt_for(aim))

## Build messages for GPT-4o-mini, using a function

In [None]:
# Build the messages in the format the Chat Completions API expects: a system message followed by a user message

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
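
Under the hood, the `openai` client sends these messages as JSON in the body of a request to the Chat Completions endpoint, together with the model name. The sketch below builds that body by hand with a hypothetical `build_request_body` helper, purely to show its shape: an ordered list in which the system message sets the assistant's behavior and the user message carries the page content. In the notebook itself, the client does this for you.

```python
import json

def build_request_body(model, system_text, user_text):
    """Hypothetical helper: the JSON body a Chat Completions request carries.

    Spelled out only to show that `messages` is an ordered list of
    role/content dictionaries sent alongside the model name.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_text},
            {"role": "user", "content": user_text},
        ],
    }

body = build_request_body(
    "gpt-4o-mini",
    "You summarize websites in markdown.",
    "You are looking at a website titled Example Domain",
)
print(json.dumps(body, indent=2))
```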

In [None]:
# Try it out

messages_for(aim)

## Putting It All Together

In [None]:
# Fetch the page, build the messages, and ask GPT-4o-mini for a summary

def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages_for(website)
    )
    return response.choices[0].message.content
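
`summarize` chains three steps: fetch the page, build the messages, and call the model. Passing each step in as a function is one way to exercise that chain without network access or an API key. This is a design sketch rather than part of the original notebook; `summarize_with` and the `fake_*` helpers are hypothetical names.

```python
from types import SimpleNamespace

def summarize_with(url, fetch_page, build_messages, complete):
    """Hypothetical variant of summarize() with each step passed in as a function.

    fetch_page(url)         -> object with .title and .text (e.g. Website)
    build_messages(page)    -> list of role/content dicts (e.g. messages_for)
    complete(messages)      -> the model's reply as a string
    """
    page = fetch_page(url)
    return complete(build_messages(page))

# Offline fakes, so the pipeline runs without network access or an API key
def fake_fetch(url):
    return SimpleNamespace(title="Example Domain", text="This domain is for use in examples.")

def fake_messages(page):
    return [{"role": "user", "content": f"Summarize: {page.title}"}]

def fake_complete(messages):
    # Echo the last message back instead of calling a model
    return "Echo -> " + messages[-1]["content"]

print(summarize_with("https://example.com/", fake_fetch, fake_messages, fake_complete))
```

With the notebook's real pieces, the call would look like `summarize_with(url, Website, messages_for, complete)`, where `complete` wraps `openai.chat.completions.create(model="gpt-4o-mini", messages=messages).choices[0].message.content`.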

In [None]:
summarize("https://aim.gov.in/")

In [None]:
# This function is to display the content nicely in the Jupyter output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [None]:
display_summary("https://aim.gov.in/")

# You can try more websites

In [None]:
display_summary("https://www.bbc.com/")
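
Some sites block scripted requests or build their content with JavaScript, which `requests` does not run, so a fetch can fail or return very little text. When trying several websites at once, a minimal sketch like the one below keeps going past a failure and records the error for that URL instead of stopping the whole loop. `summarize_many` is a hypothetical helper, and the demo uses a fake summarizer so it runs offline.

```python
def summarize_many(urls, summarize_fn):
    """Hypothetical helper: summarize several URLs, recording failures instead of stopping.

    summarize_fn(url) should return a summary string (e.g. the notebook's summarize).
    Returns a dict mapping each URL to its summary or to an error message.
    """
    results = {}
    for url in urls:
        try:
            results[url] = summarize_fn(url)
        except Exception as exc:  # e.g. connection failures or API errors
            results[url] = f"Could not summarize: {type(exc).__name__}: {exc}"
    return results

# Offline demo with a fake summarizer that fails for one URL
def fake_summarize(url):
    if "broken" in url:
        raise ConnectionError("simulated network failure")
    return f"Summary of {url}"

demo_results = summarize_many(["https://example.com/", "https://broken.example/"], fake_summarize)
for url, summary in demo_results.items():
    print(url, "=>", summary)
```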