# Website Summarizer Using the OpenAI API

This mini project uses BeautifulSoup to scrape a user-provided website and sends the extracted text to the OpenAI API as a prompt for summarization.
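The extraction step can be illustrated without network access or third-party packages. The sketch below is a dependency-free stand-in for the BeautifulSoup code in the `Website` class, built on Python's standard-library `html.parser` instead. The `VisibleTextParser` class and `extract_text` helper are hypothetical names, and only `<script>` and `<style>` are skipped here (the notebook also drops `<img>` and `<input>`, which carry no text of their own).

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect the <title> text and all other text outside <script>/<style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._in_title = False
        self.title = None
        self.lines = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return  # ignore whitespace and script/style contents
        if self._in_title:
            self.title = text
        else:
            self.lines.append(text)


def extract_text(html):
    """Return (title, newline-joined visible text) for an HTML string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.title or "No title found", "\n".join(parser.lines)


sample = """<html><head><title>Demo page</title>
<style>body {color: red}</style></head>
<body><h1>Hello</h1><script>var x = 1;</script><p>World</p></body></html>"""

title, text = extract_text(sample)
print(title)  # Demo page
print(text)
```

Like the BeautifulSoup version, the page title is kept separate from the body text, and the `"No title found"` fallback mirrors the notebook's.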

In [5]:
import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

In [11]:
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')
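`OpenAI()` reads `OPENAI_API_KEY` from the environment. A missing key makes the client constructor raise, while a malformed or whitespace-padded key typically fails only on the first request. A small check like the one below can surface those problems earlier. It is a sketch: `describe_api_key` is a hypothetical helper, and the `sk-` prefix test assumes the usual OpenAI key format rather than a documented guarantee.

```python
import os


def describe_api_key(key):
    """Return a short, non-secret status message for an API key value."""
    if not key:
        return "No API key was found; check the .env file"
    if key.strip() != key:
        return "An API key was found, but it has leading or trailing whitespace"
    # Assumption: OpenAI keys conventionally start with "sk-"
    if not key.startswith("sk-"):
        return "An API key was found, but it does not start with sk-"
    return "API key found and looks good so far"


print(describe_api_key(os.getenv("OPENAI_API_KEY")))
```

The message never echoes the key itself, so it is safe to leave in a shared notebook.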

In [7]:
# OpenAI() reads OPENAI_API_KEY from the environment loaded above
openai = OpenAI()

In [9]:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        # Fail fast on slow hosts and on 4xx/5xx responses
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title and soup.title.string else "No title found"
        # Some pages (e.g. framesets) have no <body>; keep the object usable anyway
        if soup.body is None:
            self.text = ""
            return
        # Drop tags whose content is not readable page text
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        self.text = soup.body.get_text(separator="\n", strip=True)

# Define our system prompt. You can experiment with this later, e.g. changing the last sentence to "Respond in markdown in Spanish."

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

def summarize(url):
    webpage = Website(url)
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages_for(webpage)
    )
    # display() renders the Markdown summary in the notebook and returns None
    display(Markdown(response.choices[0].message.content))
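The prompt-building functions can be exercised offline, and the language tweak suggested in the system-prompt comment can be turned into a parameter. The sketch below is a hypothetical `build_messages` that folds `system_prompt`, `user_prompt_for`, and `messages_for` into one function. It accepts any object with `title` and `text` attributes, so a `SimpleNamespace` stands in for a real `Website`. The `language` and `max_chars` parameters are assumptions; `max_chars` is a crude character cap meant to keep very long pages from producing oversized prompts, not a token count.

```python
from types import SimpleNamespace


def build_messages(website, language=None, max_chars=20_000):
    """Build system/user chat messages for a page-like object (hypothetical helper)."""
    suffix = f" in {language}" if language else ""
    system_prompt = (
        "You are an assistant that analyzes the contents of a website "
        "and provides a short summary, ignoring text that might be navigation related. "
        f"Respond in markdown{suffix}."
    )
    text = website.text[:max_chars]  # character cap, not token-aware
    user_prompt = (
        f"You are looking at a website titled {website.title}\n"
        "The contents of this website are as follows; "
        "please provide a short summary of this website in markdown. "
        "If it includes news or announcements, then summarize these too.\n\n"
        + text
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


# Stand-in for Website(url): no network request is made
page = SimpleNamespace(title="Demo page", text="Hello\nWorld")
msgs = build_messages(page, language="Spanish", max_chars=5)
print(msgs[0]["content"])
print(msgs[1]["content"])
```

With a real page, the result can be passed as `messages=build_messages(Website(url), language="Spanish")` to the same `openai.chat.completions.create` call used in `summarize`. With `language` left as `None`, the system prompt matches the notebook's original wording.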

In [10]:
summarize("https://cnn.com")

# Summary of CNN Website Contents

CNN is a leading news website that provides breaking news, in-depth analysis, and a variety of multimedia content covering a range of topics including U.S. politics, world news, business, health, entertainment, sports, science, and technology. The site features live TV broadcasts, video reports, and podcasts that delve into current events.

## Key Highlights:

- **Breaking News:**
  - AP has filed a lawsuit against Trump administration officials after being barred from presidential events.
  - Lawmakers are expressing concern over NASA's lack of transparency regarding DOGE.
  - Los Angeles DA opposes new evidence claims by the Menendez brothers.
  - Recent updates include a court ruling on Trump's push to expand power and ongoing debates in various political arenas.

- **U.S. Affairs:**
  - A federal judge has allowed the Trump administration to move forward with dismantling USAID.
  - Two individuals have been charged in connection to a transgender man's death in New York.

- **International Updates:**
  - Israel is preparing for the return of a coffin believed to contain the remains of Shiri Bibas.
  - U.S. aid freezes have put HIV-positive orphans in Kenya at risk.

- **Entertainment & Culture:**
  - Tributes are pouring in for individuals such as soul singer Jerry Butler and Notorious B.I.G.'s mother, Voletta Wallace, who recently passed away.

- **Health & Science:**
  - Insightful articles discuss the implications of ultraprocessed foods and health risks related to various conditions.

- **Sports:**
  - The New York Yankees have updated their facial hair policy, allowing beards.

CNN continues to engage its audience with live updates, podcasts, video segments, and in-depth reporting on global events and issues affecting society. The platform encourages user interaction through feedback mechanisms related to advertisements and website functionality.