In [17]:
import requests
from bs4 import BeautifulSoup
from IPython.display import display, clear_output, Markdown
import ollama
import time


In [18]:
MODEL = "llama3"

In [19]:
# A class to represent a Webpage

class Website:
    """
    A utility class to represent a Website that we have scraped
    """
    url: str
    title: str
    text: str

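    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        # A browser-like User-Agent and a timeout; some sites may reject the default requests UA
        headers = {"User-Agent": "Mozilla/5.0"}
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()  # fail loudly on 4xx/5xx instead of summarizing an error page
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            # Drop tags whose contents are not readable page text
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""  # some pages (e.g. framesets) have no <body>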
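To see what the cleaning step keeps without making a network request, here is a rough, dependency-free approximation that uses the standard library's `html.parser` instead of BeautifulSoup. It is a sketch that assumes dropping `<script>` and `<style>` contents is the part that matters (`<img>` and `<input>` carry no text, so they need no special handling here); `TextExtractor` and `extract` are hypothetical names, not part of the notebook.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the page title and visible text, skipping <script> and <style> contents."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = None
        self.lines = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()  # roughly mirrors get_text(strip=True)
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title = text
        else:
            self.lines.append(text)


def extract(html):
    """Return (title, text), joining text fragments with newlines like get_text's separator."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.title or "No title found", "\n".join(parser.lines)


title, text = extract(
    "<html><head><title>Demo</title><style>p {color: red}</style></head>"
    "<body><h1>Hello</h1><script>var x = 1;</script><p>World</p></body></html>"
)
print(title)
print(text)
```

The script and style bodies never reach `text`, which is the same effect `decompose()` has in the `Website` class.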


In [20]:
# Let's try one out

ed = Website("https://edwarddonner.com")


In [21]:
# Define our system prompt - you can experiment with this later, changing the last sentence to "Respond in markdown in Spanish."

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [22]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}.\n"
    user_prompt += "The contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt
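`user_prompt_for` appends the full page text, so a long page can exceed the model's context window. A minimal sketch of one way to cap it, assuming a hypothetical `MAX_CHARS` limit and `truncate_text` helper (neither is in the original notebook; the right limit depends on the model and its context settings):

```python
# Hypothetical cap on how much page text goes into the prompt
MAX_CHARS = 20_000


def truncate_text(text: str, max_chars: int = MAX_CHARS) -> str:
    """Return text cut to at most max_chars characters, ending at the last newline if there is one."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    # Prefer to end on a whole line so the model does not see half a sentence
    last_newline = cut.rfind("\n")
    if last_newline > 0:
        cut = cut[:last_newline]
    return cut


print(truncate_text("line one\nline two\nline three", max_chars=15))  # line one
```

In `user_prompt_for`, this would mean appending `truncate_text(website.text)` instead of `website.text`.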

In [23]:
# Build the message list in the chat format that ollama.chat expects: a system message followed by a user message

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
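Because the prompt functions only read `title` and `text`, the message list can be checked without scraping anything. A minimal sketch, assuming a hypothetical `FakeWebsite` dataclass and a `build_messages` helper that mirrors `user_prompt_for` and `messages_for` in one function:

```python
from dataclasses import dataclass


@dataclass
class FakeWebsite:
    """Hypothetical stand-in for Website: same attributes, no network request."""
    url: str
    title: str
    text: str


def build_messages(system_prompt: str, website) -> list[dict[str, str]]:
    """Return [system, user] messages in the role/content format used by messages_for."""
    user_prompt = (
        f"You are looking at a website titled {website.title}.\n"
        "The contents of this website are as follows; please provide a short summary "
        "of this website in markdown. If it includes news or announcements, "
        "then summarize these too.\n\n"
        + website.text
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


msgs = build_messages("Respond in markdown.", FakeWebsite("https://example.com", "Demo", "Body text"))
for m in msgs:
    print(m["role"], repr(m["content"][:40]))
```

This makes it easy to inspect exactly what the model will receive before spending time on a real request.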

In [25]:
def summarize(url):
    website = Website(url)
    messages = messages_for(website)
    stream = ollama.chat(model=MODEL, messages=messages, stream=True)

    markdown_output = ""
    
    for chunk in stream:
        content = chunk['message']['content']
        markdown_output += content
        clear_output(wait=True)  # Clear previous output
        display(Markdown(markdown_output))  # Render Markdown
        time.sleep(0.01)  # Control speed of updates
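The loop in `summarize` does two things at once: it concatenates each chunk's `message.content` and re-renders the running text. A sketch that separates the two so the accumulation can be tested offline; `accumulate_stream` and the fake chunk list are hypothetical, shaped only to match the `chunk['message']['content']` access used above:

```python
from typing import Any, Callable, Iterable


def accumulate_stream(
    chunks: Iterable[Any],
    render: Callable[[str], None] = lambda text: None,
) -> str:
    """Concatenate the message content of each streamed chunk, calling render on the running text."""
    output = ""
    for chunk in chunks:
        # Same access pattern as summarize(): chunk['message']['content']
        output += chunk["message"]["content"]
        render(output)
    return output


# Hypothetical fake stream shaped like the chunks summarize() reads
fake_stream = [
    {"message": {"role": "assistant", "content": "# Summary\n"}},
    {"message": {"role": "assistant", "content": "A personal "}},
    {"message": {"role": "assistant", "content": "portfolio site."}},
]

snapshots = []
result = accumulate_stream(fake_stream, render=snapshots.append)
print(result)
```

In the notebook, `chunks` would be the iterator returned by `ollama.chat(..., stream=True)` and `render` a function that calls `clear_output(wait=True)` and then `display(Markdown(text))`. Returning the final string also keeps the summary available, which `summarize` currently discards.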


In [27]:
url = "https://prakhr.github.io/Portfolio-Site/"
summarize(url)

I'm looking at the V.E.R.O.N.I.C.A website, and it appears to be a personal website of a software developer and data analyst named Prakhar. The website has several sections, including:

1. My Projects: This section showcases some of Prakhar's projects, including Image Duplicate Prediction, General Duties, and more.
2. Offered Services: This section lists the services Prakhar offers, including Python programming, backend development, data analysis, and more.
3. My Skills: This section highlights Prakhar's skills in various areas, such as machine learning, deep learning, natural language processing, and computer vision.
4. Education: This section briefly mentions Prakhar's educational background, including his college projects and training in algorithms like SGD, LR, LAR, k-means clustering, MAP, ML, RBF-NNs, MLPs, ELMs, SAEs, and CNNs.
5. Contact Me: This section provides various ways to get in touch with Prakhar, including email, phone number, Twitter, Quora, Facebook, and GitHub.
6. Resume: This section links to Prakhar's resume, which is not available on this website.
7. Hire Me: This section is a call-to-action for potential employers or clients who are interested in hiring Prakhar.

The website also includes some personal touches, such as a "My Thoughts" section where Prakhar shares his opinions and experiences, and a "Future Works" section that outlines his plans and goals for the future. Overall, V.E.R.O.N.I.C.A is a professional website that showcases Prakhar's skills, projects, and services as a software developer and data analyst.