# Summarize a website's content using Ollama installed locally

Instructions are below. Please give this a try, and look in the solutions folder if you get stuck (or feel free to ask me!)

In [2]:
# imports

import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display

In [3]:
# Constants

OLLAMA_API = "http://localhost:11434/api/chat"
HEADERS = {"Content-Type": "application/json"}
MODEL = "llama3.2"
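
The `OLLAMA_API` and `HEADERS` constants point at Ollama's native chat endpoint, while the summarizing cell further down goes through the OpenAI-compatible `/v1` route instead. As a rough sketch of how the native endpoint could be called directly, assuming an Ollama server is running on the default port: `build_chat_request` and `chat_with_ollama` are hypothetical helper names, and the standard library's `urllib` is used here only to keep the sketch free of extra dependencies (a `requests.post` call with the same JSON body and headers should behave the same way).

```python
import json
import urllib.request

OLLAMA_API = "http://localhost:11434/api/chat"
HEADERS = {"Content-Type": "application/json"}
MODEL = "llama3.2"


def build_chat_request(messages, model=MODEL):
    # Hypothetical helper: build the POST request for Ollama's /api/chat endpoint.
    # "stream": False asks for one complete JSON reply instead of streamed chunks.
    payload = {"model": model, "messages": messages, "stream": False}
    return urllib.request.Request(
        OLLAMA_API,
        data=json.dumps(payload).encode("utf-8"),
        headers=HEADERS,
        method="POST",
    )


def chat_with_ollama(messages, model=MODEL, timeout=120):
    # Hypothetical helper: send the request and return the assistant's reply text.
    # Assumes a local Ollama server is reachable at OLLAMA_API.
    request = build_chat_request(messages, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["message"]["content"]
```

With the server running, `chat_with_ollama([{"role": "user", "content": "Hello"}])` would return the model's reply as a string.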

In [4]:
# Let's just make sure the model has been downloaded

!ollama pull llama3.2

pulling manifest
pulling dde5aa3fc5ff... 100% 2.0 GB
pulling 966de95ca8a6... 100% 1.4 KB
pulling fcc5a6bec9da... 100% 7.7 KB
pulling a70ff7e570d9... 100% 6.0 KB
pulling 56bb8bd477a5

In [8]:
# Step 1: Scrape the page and create your prompts
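def scrape_website(url):
    # Fetch the webpage; the timeout keeps a stalled server from hanging the notebook
    response = requests.get(url, timeout=30)

    # Check if the request was successful
    if response.status_code != 200:
        raise Exception(f"Failed to fetch the webpage: {response.status_code}")

    # Parse the HTML content
    soup = BeautifulSoup(response.content, 'html.parser')

    # Drop script and style elements explicitly so their contents never reach the text
    for tag in soup(["script", "style"]):
        tag.decompose()

    # Extract the remaining text, one line per text node
    text = soup.get_text(separator='\n')

    return text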

def user_prompt_for_website(message):
    user_prompt = "You are looking at the contents of a website."
    user_prompt += "\nThe contents of this website are as follows.\n\n"
    user_prompt += message
    user_prompt += "\n\nProvide a concise summary of the website, ignoring HTML, JavaScript and any other navigational content.\n"
    user_prompt += "If it includes news or announcements, then summarize these too.\n"
    return user_prompt

system_prompt = "You are a website summary generator."

# Step 2: Make the messages list

def messages_for_website(message):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for_website(message)}
    ]

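# Step 3: Call Ollama through its OpenAI-compatible endpoint
def summarize_website(message):
    from openai import OpenAI
    # The api_key is a placeholder: the client requires a value, but the local Ollama server does not use it
    ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')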

    response = ollama_via_openai.chat.completions.create(
        model=MODEL,
        messages=messages_for_website(message)
    )
    return response.choices[0].message.content

def display_summary_website(message):
    summary = summarize_website(message)
    display(Markdown(summary))

website_text = scrape_website("https://edition.cnn.com/")
# display_summary_website renders the summary itself and returns nothing
display_summary_website(website_text)


**Summary:**

The website appears to be a personal or corporate blog/online publication from CNN. The content is primarily news-based, covering various international and local issues.

**News and Announcements:**

* No recent breaking news or press releases are available on the website.
However, there are some archives of past news articles and announcements that can be found on certain sections of the site.

**Other Content:**

The website features a mix of:

1. **Style articles**: About fashion, design, architecture, and luxury.
2. **Travel articles**: Featuring destinations, food and drink, stay, and videos related to travel.
3. **Science articles**: Covering space, life, and unearthed topics.
4. **Climate-related content**: Including solutions and weather updates.

**Opinions and Features:**

The website includes some opinion pieces and features, such as:

1. "As Equals" - CNN's commitment to diversity and inclusion
2. "Call to Earth" - A campaign focused on sustainability and environmental issues
3. "Freedom Project" - A series of articles highlighting freedom-related topics

**Audio podcasts:**

The website promotes various audio podcasts from CNN, including:

1. "Chasing Life with Dr. Sanjay Gupta"
2. "One Thing"
3. "Tug of War"
4. "CNN Political Briefing"
5. "The Axe Files"

**Video content:**

There are various video sections, such as live TV, CNN 10, Shows A-Z, and videos featuring experts and news anchors.
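
The text that `scrape_website` returns for a large front page tends to contain many blank lines and can be long for a small local model to handle. A minimal sketch of tidying it before building the prompt is shown here; `compact_text` is a hypothetical helper, and the `max_chars` default is an arbitrary illustrative limit rather than anything tied to the model.

```python
def compact_text(text, max_chars=8000):
    # Strip each line and drop the empty ones left behind by get_text(separator='\n')
    lines = [line.strip() for line in text.splitlines()]
    compact = "\n".join(line for line in lines if line)
    # Cap the length; max_chars is an assumed limit chosen for illustration only
    return compact[:max_chars]
```

It could be applied just before summarizing, for example `display_summary_website(compact_text(website_text))`. A hard character cut may split the last line mid-sentence, which is usually acceptable for a summary but worth keeping in mind.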