# LLM Project: Webpage Descriptor using Llama 3.2

This program fetches the content of a webpage from a URL entered by the user, then prints the page title and a short summary of the page generated by an LLM (Llama 3.2).

## Step 1: Install Required Libraries
To begin, we need the following Python libraries:
- `requests`: To fetch the webpage content.
- `beautifulsoup4`: To parse and clean up the webpage HTML.
- `ollama`: To interface with the locally installed Llama 3.2 model.

Once the libraries have been installed in your environment, open a Jupyter notebook and proceed to the next steps.
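
Before moving on, it can help to confirm that the three packages are importable from the notebook's kernel. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and note that `beautifulsoup4` is imported under the module name `bs4`.

```python
import importlib.util

def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

# beautifulsoup4 installs the importable module "bs4"
required = ["requests", "bs4", "ollama"]
missing = missing_packages(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```

If anything is reported as missing, install it into the same environment the notebook kernel uses before continuing.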

## Step 2: Fetch Webpage Content
To retrieve webpage data, we use the `requests` library. The `fetch_webpage` function below:
- Takes a URL as input.
- Sends a request to fetch the webpage.
- Uses a user-agent header to mimic a real browser request.
- Returns the HTML content.

In [1]:
import requests

def fetch_webpage(url):
    try:
        # Send a browser-like User-Agent header, because some websites block
        # automated requests that arrive without one. The timeout keeps the
        # call from hanging indefinitely on an unresponsive server.
        response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)

        response.raise_for_status()  # Raise an exception for 4xx/5xx responses

        # response.text holds the entire page source as a string of raw HTML
        return response.text
        
    except requests.exceptions.RequestException as e:
        print("Error fetching the webpage:", e)
        return None

# Program initialization
url = input("Enter a website URL: ")
html_content = fetch_webpage(url)


Enter a website URL:  https://open.spotify.com/
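
Users often type an address without a scheme (for example `open.spotify.com`), which `requests.get` rejects. Below is a minimal sketch of a pre-check, assuming it is acceptable to default to `https://`. `normalize_url` is a hypothetical helper, not part of the notebook above.

```python
from urllib.parse import urlparse

def normalize_url(raw_url, default_scheme="https"):
    """Trim whitespace, add a scheme if none is present, and reject URLs without a host."""
    url = raw_url.strip()
    if "://" not in url:
        # Assumption: a bare host name should be fetched over HTTPS
        url = f"{default_scheme}://{url}"
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a usable web address: {raw_url!r}")
    return url
```

With this in place, the fetch could be written as `html_content = fetch_webpage(normalize_url(url))`.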


## Step 3: Extract Title and Content Using BeautifulSoup
We now use `BeautifulSoup` to extract the webpage **title** and **main content** while removing unnecessary HTML elements.

**Functionality:**
- Extracts the `<title>` of the webpage.
- Removes unnecessary elements like `<script>` and `<style>`.
- Collects visible text into a readable format.

In [2]:
from bs4 import BeautifulSoup

def extract_content(html):
    if not html:
        return None, None
    soup = BeautifulSoup(html, "html.parser")

    # Extract title
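    # get_text() also handles a <title> that contains nested tags, where .string would be None
    title = (soup.title.get_text(strip=True) if soup.title else "") or "No title found"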

    # Extract text content while removing scripts and styles
    for tag in soup(["script", "style"]):
        tag.decompose()  # Remove script and style tags

    # stripped_strings yields each text fragment with surrounding whitespace
    # removed and skips whitespace-only fragments; ' '.join() then combines
    # the fragments into a single string, separated by spaces.
    content = ' '.join(soup.stripped_strings)

    return title, content

# Example usage (for testing only)
title, content = extract_content(html_content)
print("Title:", title)
if content:  # content is None when the fetch in Step 2 failed
    print("Content (first 500 characters):", content[:500])  # Display only the first 500 characters


Title: Spotify - Web Player: Music for everyone
Content (first 500 characters): Spotify - Web Player: Music for everyone Popular artists Lady Gaga David Guetta Rihanna The Weeknd Eminem Billie Eilish Linkin Park Coldplay Apache 207 Taylor Swift Nina Chuba CRO Sido FiNCH Ed Sheeran Pitbull Beyoncé Kendrick Lamar Lost Frequencies AYLIVA Popular albums and singles DeBÍ TiRAR MáS FOToS HIT ME HARD AND SOFT Gesegnet Hurry Up Tomorrow Folge 231: und der Dreiäugige Schakal From Zero Regengeräusche zum Entspannung In Liebe GNX LID GOVA SABÍA QUE NO The Secret of Us (Deluxe) tau mic
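
To see the same idea (keep the title and visible text, drop `<script>` and `<style>` contents) on a tiny input, here is a dependency-free sketch. It deliberately swaps BeautifulSoup for the standard library's `html.parser`, so it illustrates the technique rather than the notebook's actual implementation. `VisibleTextParser` and `extract_visible_text` are hypothetical names.

```python
from html.parser import HTMLParser

class VisibleTextParser(HTMLParser):
    """Collect the <title> text and all text outside <script>/<style>."""
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # inside <script> or <style>: not visible text
        text = data.strip()
        if not text:
            return  # ignore whitespace-only fragments
        if self._in_title:
            self.title_parts.append(text)
        self.text_parts.append(text)

def extract_visible_text(html):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    title = " ".join(parser.title_parts) or "No title found"
    return title, " ".join(parser.text_parts)

sample = (
    "<html><head><title>Demo</title><style>p{color:red}</style></head>"
    "<body><p>Hello</p><script>var x = 1;</script><p>world</p></body></html>"
)
print(extract_visible_text(sample))
```

As in the Spotify output above, the title text also appears at the start of the collected content, because `<title>` is ordinary text as far as the extraction is concerned.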


## Step 4: Use Llama 3.2 to Generate a Summary
Now, we use the `ollama` library to process the webpage content using the locally installed **Llama 3.2** model.

**Functionality:**
- Sends the webpage title and extracted content (up to 2000 characters) to Llama 3.2.
- Requests a short summary of the content.

In [12]:
import ollama

def summarize_content(title, content):
    prompt = f"Here is a webpage titled '{title}'. Summarize its content briefly:\n\n{content[:2000]}"  # Limit to 2000 characters
    response = ollama.chat(model="llama3.2", messages=[{"role": "user", "content": prompt}])
    
    return response["message"]["content"]

''' ollama.chat() returns a dictionary-like response (a JSON-like structure).
    A typical response from the chat model looks like this:

    {
    "message": {
        "role": "assistant",
        "content": "(Response by the AI model)"}
    }

    Indexing with response["message"]["content"] extracts only the text of the model's reply.
'''

# Generating the summary of the webpage
if title and content:
    summary = summarize_content(title, content)
    print(" Website Title:\n",title)
    print("\n About the website:\n", summary)


 Website Title:
 Spotify - Web Player: Music for everyone

 About the website:
 The webpage 'Spotify - Web Player: Music for everyone' features a vast collection of popular music from various artists. It includes:

* Popular artists such as Lady Gaga, David Guetta, Rihanna, The Weeknd, Eminem, and Billie Eilish
* Albums and singles by multiple artists, including some in different languages (e.g., Spanish)
* Soundtracks for TV shows and movies like Arcane and League of Legends
* Radio stations dedicated to specific artists or genres

Overall, the webpage showcases a diverse range of music from various artists and styles.
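
The `content[:2000]` slice can cut the text in the middle of a word. Below is a small sketch of one alternative that truncates at the last whole word instead. `truncate_at_word` and `build_prompt` are hypothetical helpers, and the prompt wording is copied from the cell above.

```python
def truncate_at_word(text, max_chars=2000):
    """Return at most max_chars characters, avoiding a cut in the middle of a word."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    if text[max_chars].isspace():
        return cut.rstrip()  # the cut already falls on a word boundary
    head, sep, _ = cut.rpartition(" ")
    # If there is no space at all, fall back to the hard cut
    return head.rstrip() if sep else cut

def build_prompt(title, content, max_chars=2000):
    """Build the summarization prompt with word-aligned truncation."""
    excerpt = truncate_at_word(content, max_chars)
    return f"Here is a webpage titled '{title}'. Summarize its content briefly:\n\n{excerpt}"
```

The result of `build_prompt(title, content)` could then be passed as the `content` of the user message in `ollama.chat`.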


## Step 4b: Use OpenAI to Generate a Summary
To generate the summary with the OpenAI API instead, run the following cell. Make sure the `openai` library is installed in your environment, and note that the cell redefines `summarize_content`.


In [35]:
import openai
import os

# Load API key from environment variable (if set), otherwise ask for it
api_key = os.getenv("OPENAI_API_KEY") or input("Enter your OpenAI API key: ").strip()

# Set the API key
openai.api_key = api_key

def summarize_content(title, content):
    prompt = f"Here is a webpage titled '{title}'. Summarize its content briefly:\n\n{content[:2000]}"  # Limit content to 2000 chars
    
    try:
        response = openai.chat.completions.create(
            model="gpt-4",  # Use "gpt-3.5-turbo" if needed
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,  # Controls randomness (0 = deterministic, 0.7 = balanced, 1 = creative)
            max_tokens=300  # Limits response length
        )

        # Extract the AI's response. The client returns an object rather than
        # a plain dict, so the fields are read with attribute access.
        return response.choices[0].message.content

    except openai.OpenAIError as e:
        print("Error calling OpenAI API:", e)
        return "Error: Unable to generate summary."

# Generating the summary of the webpage
if title and content:
    summary = summarize_content(title, content)
    print(" Website Title:\n",title)
    print("\n About the website:\n", summary)


Enter your OpenAI API key:  434


Error calling OpenAI API: Error code: 401 - {'error': {'message': 'Incorrect API key provided: 434. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
Error: Unable to generate summary.
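
Because `input()` echoes whatever is typed into the cell output, a key entered this way ends up stored in the notebook. Below is a hedged sketch of an alternative using the standard library's `getpass`. `load_api_key` is a hypothetical helper, and its `env` and `prompt_fn` parameters exist only to make it easy to test.

```python
import getpass
import os

def load_api_key(env=None, prompt_fn=getpass.getpass, var_name="OPENAI_API_KEY"):
    """Read the key from the environment, or prompt for it without echoing the input."""
    env = os.environ if env is None else env
    key = (env.get(var_name) or "").strip()
    if not key:
        key = prompt_fn("Enter your OpenAI API key: ").strip()
    if not key:
        raise ValueError("No API key provided.")
    return key
```

In the cell above, this could replace the `input()` line, for example `openai.api_key = load_api_key()`.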


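## Step 4c: Use a Locally Hosted API to Generate a Summary
To generate the summary with a model served by a locally hosted API, run the following cell. The value entered at the first prompt is used as the server's base URL (for example `http://127.0.0.1:1234`), not as a secret key. The code appends `/chat/completions` to it, so include any path prefix your server expects (for instance a `/v1` prefix, which OpenAI-compatible servers commonly use).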

In [36]:
import requests
import json

# The first prompt collects the server's base URL (despite the "API key" wording);
# the second collects the name of the model to use
local_api = input("Enter your local API key for the model: ").strip()
model_name = input("Enter the exact model name: ")
BASE_URL = local_api

def summarize_content(title, content):
    prompt = f"Here is a webpage titled '{title}'. Summarize its content briefly:\n\n{content[:2000]}"  # Limit content

    # Define the API request payload
    payload = {
        "model": model_name,  
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 300
    }

    try:
        # Send the request to API
        response = requests.post(f"{BASE_URL}/chat/completions", json=payload)

        # Check if response is successful
        if response.status_code == 200:
            result = response.json()
            return result["choices"][0]["message"]["content"]  # Extract the text
        else:
            print(f"Error: {response.status_code} - {response.text}")
            return "Error: Unable to generate summary."

    except Exception as e:
        print("Error calling local API:", e)
        return "Error: Unable to generate summary."

# Example Usage
if title and content:
    summary = summarize_content(title, content)
    print(summary)


Enter your local API key for the model:  http://127.0.0.1:1234
Enter the exact model name:  mistralrp-noromaid-nsfw-mistral-7b


Error calling local API: 'choices'
Error: Unable to generate summary.
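
The run above ended with `'choices'`, which is how Python prints a `KeyError` for a missing `choices` key: the server answered with status 200, but its JSON body did not have the expected shape. Below is a minimal sketch of more defensive handling, assuming a chat-completions style response. `build_endpoint` and `extract_message` are hypothetical helpers.

```python
def build_endpoint(base_url, path="/chat/completions"):
    """Join the base URL and the route without producing a double slash."""
    return base_url.rstrip("/") + path

def extract_message(result):
    """Return the reply text from a chat-completions style JSON body.

    Raises ValueError naming the keys that were actually present when the
    expected structure is missing.
    """
    try:
        return result["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError) as exc:
        keys = sorted(result) if isinstance(result, dict) else type(result).__name__
        raise ValueError(f"Unexpected response shape; got: {keys}") from exc
```

Printing the keys that did come back (often an `error` field) usually makes it clear whether the URL, the route, or the model name is the problem.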
