In [25]:
import os
import requests
from dotenv import load_dotenv 
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import AzureOpenAI

In [26]:
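# Load Azure OpenAI settings from a local .env file (the path is machine-specific)
load_dotenv(r"C:\Users\HP\OneDrive\Documents\credentials.env")
api_key = os.environ['API_KEY']
api_base = os.environ['RESOURCE_ENDPOINT']
chat_model_id = os.environ['chat_model_id']  # passed as `model` in summarize()
chat_model = os.environ['chat_model']
api_type = "azure"  # not referenced again in this notebook
api_version = "2023-06-01-preview"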
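Indexing `os.environ` directly raises a `KeyError` at the first missing variable. A minimal sketch of checking all of them up front, assuming the same variable names as in the cell above; `missing_env_vars` is a hypothetical helper, not part of the original notebook.

```python
import os

# Variable names taken from the cell above; the helper is illustrative only.
REQUIRED_VARS = ["API_KEY", "RESOURCE_ENDPOINT", "chat_model_id", "chat_model"]

def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running this right after `load_dotenv(...)` reports every missing setting at once instead of one `KeyError` at a time.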

In [11]:
openai = AzureOpenAI(
    azure_endpoint=api_base,
    api_key=api_key, 
    api_version=api_version,
)

In [12]:
# Browser-like User-Agent; some sites reject the default one sent by requests
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
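        self.url = url
        # A timeout keeps the notebook from hanging on an unresponsive site
        response = requests.get(url, headers=headers, timeout=30)
        soup = BeautifulSoup(response.content, 'html.parser')
        has_title = soup.title and soup.title.string
        self.title = soup.title.string if has_title else "No title found"
        # Some responses have no <body>; guard so the constructor does not raise
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""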
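The parsing steps in the constructor can be tried without a network request. The sketch below runs the same BeautifulSoup calls on a small inline page; `sample_html` is a made-up document used only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical inline page, used here instead of a live request
sample_html = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <script>console.log("ignored");</script>
    <style>p { color: red; }</style>
    <h1>Hello</h1>
    <p>World</p>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")
title = soup.title.string if soup.title else "No title found"

# Calling a tag with a list of names finds all matching descendants
for irrelevant in soup.body(["script", "style", "img", "input"]):
    irrelevant.decompose()  # remove the tag and its contents from the tree

text = soup.body.get_text(separator="\n", strip=True)
print(title)
print(text)
```

Only the visible text of the heading and paragraph remains; the script and style contents are removed before the text is extracted.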

In [13]:
system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [14]:
def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website is as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt
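`user_prompt_for` appends the full page text, which can be long for large front pages. A minimal sketch of capping the text before it goes into the prompt; `truncate_text` and the 5,000-character default are assumptions for illustration, not part of the original notebook.

```python
def truncate_text(text, max_chars=5000):
    """Cut text to roughly max_chars characters, preferring a line break, and append a marker."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    last_break = cut.rfind("\n")
    # Prefer ending on a whole line so the prompt does not stop mid-sentence
    if last_break > 0:
        cut = cut[:last_break]
    return cut + "\n[truncated]"

print(truncate_text("line1\nline2\nline3", max_chars=8))
```

One way to use it would be to replace `user_prompt += website.text` with `user_prompt += truncate_text(website.text)`.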

In [15]:
def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

In [16]:
ed = Website("https://edwarddonner.com")
messages_for(ed)

[{'role': 'system',
  'content': 'You are an assistant that analyzes the contents of a website and provides a short summary, ignoring text that might be navigation related. Respond in markdown.'},
 {'role': 'user',
  'content': 'You are looking at a website titled Home - Edward Donner\nThe contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.\n\nHome\nOutsmart\nAn arena that pits LLMs against each other in a battle of diplomacy and deviousness\nAbout\nPosts\nWell, hi there.\nI’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (\nvery\namateur) and losing myself in\nHacker News\n, nodding my head sagely to things I only half understand.\nI’m the co-founder and CTO of\nNebula.io\n. We’re applying AI to a field where it can make a massive, posi

In [17]:
def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=chat_model_id, 
        messages=messages_for(website))
    return response.choices[0].message.content
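The call pattern in `summarize` can be exercised without an Azure deployment by passing the client in explicitly. In the sketch below, `FakeCompletions`, `fake_client`, `summarize_messages`, and the deployment name `demo-deployment` are all hypothetical stand-ins that only mimic the attribute path `chat.completions.create(...).choices[0].message.content` used above.

```python
from types import SimpleNamespace

class FakeCompletions:
    """Hypothetical stand-in that mimics the shape of chat.completions."""

    def create(self, model, messages):
        # Report what was received instead of calling a real model
        reply = f"Received {len(messages)} messages for model {model}"
        choice = SimpleNamespace(message=SimpleNamespace(content=reply))
        return SimpleNamespace(choices=[choice])

fake_client = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))

def summarize_messages(client, model, messages):
    """Same call pattern as summarize(), with the client passed in explicitly."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content

demo_messages = [
    {"role": "system", "content": "You summarize websites."},
    {"role": "user", "content": "Page text goes here."},
]
result = summarize_messages(fake_client, "demo-deployment", demo_messages)
print(result)
```

Passing the client as a parameter keeps the function free of global state, which makes the response handling easy to check offline.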

In [18]:
summarize("https://edwarddonner.com")

'The website titled "Home - Edward Donner" features several sections including "Home", "Outsmart", "About", "Posts", and "Get in touch". The "Home" section gives a brief introduction of the website owner Ed and his interests such as writing code, experimenting with LLMs, and enjoying music. In addition, the section mentions his professional experience as the co-founder and CTO of Nebula.io and also highlights his previous role as the founder and CEO of AI startup untapt (acquired in 2021). The "Outsmart" section includes information about an LLM battle arena that involves diplomacy and deviousness. The "About" section features a personal description of Ed Donner while the "Posts" section includes news and announcements on various topics like mastering AI and LLM engineering, choosing the right LLM, and the Outsmart LLM arena. Finally, the "Get in touch" section provides contact information for Ed Donner, including email and social media profiles, and also gives the option to subscribe 

In [19]:
# A function to display this nicely in the Jupyter output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [20]:
display_summary("https://edwarddonner.com")

This is the personal website of Edward Donner, who is the co-founder and CTO of Nebula.io, an AI company focused on talent management. The website includes information about his interests (writing code, DJing, electronic music production), his background, and his company. The homepage includes links to an "Outsmart" LLM arena and an "About" section. The "Posts" section includes announcements and resources related to AI and LLM engineering. The website also includes navigation links and a way to subscribe to Edward's newsletter.

In [21]:
display_summary("https://cnn.com")

The website, Breaking News, Latest News and Videos | CNN, offers news coverage across a variety of topics such as US and world news, politics, business, health, entertainment, science, climate, weather, and sports. The site also features video content, live TV, and a variety of podcasts. It also offers sections on travel and style and includes features such as As Equals, Call to Earth, Freedom Project, and CNN Heroes. Additionally, the website provides a link to provide feedback.

In [22]:
display_summary("https://anthropic.com")

The website is titled Home \ Anthropic and it is devoted to AI safety and research. It provides information about their intelligent AI model, Claude 3.5 Sonnet, and offers the ability to use the API to drive efficiency and create new revenue streams. Anthropic is an AI safety and research company based in San Francisco, whose interdisciplinary team generates research, creates reliable, beneficial AI systems, and works on product development. The website has a section devoted to announcements, research, and news that includes recent updates on Constitutional AI, Core Views on AI Safety, and information on available open roles.

In [23]:
display_summary("https://en.wikipedia.org/wiki/Main_Page")

# Summary of Wikipedia

Wikipedia is a free online encyclopedia that can be edited by anyone. It has 6,924,464 articles in English. Its contents are organized through a navigation menu that includes options such as Main Page, Contents, Current events, Random article, About Wikipedia, Contact us, Contribute, Help, Learn to edit, and Community portal. The website also features news announcements and notable events in the categories of In the News, On this day, and Recently featured, which showcase articles, featured pictures, and newly added content to the website. In addition, it has links to sister projects and other Wikipedias, along with their respective article count.