# LLM Project: Webpage Descriptor using Llama 3.2

This program fetches the content of a webpage from a user-supplied URL, then outputs the page title and a short summary generated by an LLM (Llama 3.2).

## Step 1: Install Required Libraries
To begin, we need the following Python libraries:
- `requests`: To fetch the webpage content.
- `beautifulsoup4`: To parse and clean up the webpage HTML.
- `ollama`: To interface with the locally installed Llama 3.2 model (the Ollama application must be running and the model pulled, e.g. with `ollama pull llama3.2`).

Once the libraries have been installed in your environment (for example with `pip install requests beautifulsoup4 ollama`), open a Jupyter notebook and proceed to the next steps.
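Before moving on, it can help to confirm that the packages are importable from the notebook's kernel. The following is a minimal sketch using only the standard library's `importlib.util.find_spec`; the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names introduced here, not part of the original program. Note that `beautifulsoup4` is installed under that name but imported as `bs4`.

```python
import importlib.util

# Hypothetical mapping: pip package name -> module name used in `import`.
REQUIRED = {
    "requests": "requests",
    "beautifulsoup4": "bs4",  # installed as beautifulsoup4, imported as bs4
    "ollama": "ollama",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All required libraries are available.")
```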

## Step 2: Fetch Webpage Content
To retrieve the webpage, we use the `requests` library. The `fetch_webpage` function:
- Takes a URL as input.
- Sends a GET request to fetch the webpage.
- Includes a `User-Agent` header that mimics a real browser, since some websites block requests without one.
- Returns the HTML content as a string, or `None` if the request fails.

```python
import requests

def fetch_webpage(url):
    """Return the raw HTML of `url` as a string, or None if the request fails."""
    try:
        # Some websites block automated requests that lack a browser-like User-Agent.
        # The timeout prevents the call from hanging on an unresponsive server.
        response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
        response.raise_for_status()  # Raise an HTTPError for 4xx/5xx responses
        return response.text  # The entire page source as raw HTML
    except requests.exceptions.RequestException as e:
        print("Error fetching the webpage:", e)
        return None

# Program initialization
url = input("Enter a website URL: ").strip()
html_content = fetch_webpage(url)
```
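Users often type a bare domain such as `example.com`, which `requests` rejects because no scheme is supplied (the resulting exception is caught in `fetch_webpage`, which then returns `None`). A small normalization step before the fetch can avoid that. This is a sketch that assumes defaulting to `https://` is acceptable; `normalize_url` is a hypothetical helper, not part of the original program.

```python
from urllib.parse import urlparse


def normalize_url(raw):
    """Hypothetical helper: trim whitespace and default to https:// if no scheme.

    Returns None when the result does not look like an http(s) URL.
    """
    url = raw.strip()
    if "://" not in url:
        url = "https://" + url  # Assumption: HTTPS is the sensible default
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return None
    return url
```

It could be wired in as `url = normalize_url(input("Enter a website URL: "))`, calling `fetch_webpage(url)` only when `url` is not `None`.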


## Step 3: Extract Title and Content Using BeautifulSoup
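We now use `BeautifulSoup` to extract the webpage **title** and its visible **text content**, after removing `<script>` and `<style>` elements. All other text is kept, including navigation menus, as the sample output below shows.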

```python
from bs4 import BeautifulSoup

def extract_content(html):
    """Return (title, text) for an HTML string, or (None, None) if html is empty."""
    if not html:
        return None, None
    soup = BeautifulSoup(html, "html.parser")

    # Extract the title; get_text() also works when <title> contains nested markup,
    # where .string would return None
    title = soup.title.get_text(strip=True) if soup.title else "No title found"

    # Remove <script> and <style> elements so their code is not treated as text
    for tag in soup(["script", "style"]):
        tag.decompose()

    # stripped_strings yields each text fragment with surrounding whitespace removed,
    # skipping whitespace-only fragments; ' '.join() merges them into one string
    content = " ".join(soup.stripped_strings)

    return title, content

# Example usage (for testing only)
title, content = extract_content(html_content)
if content:
    print("Title:", title)
    print("Content (first 500 characters):", content[:500])  # Show only the first 500 chars
else:
    print("No content extracted.")
```


Output:

```
Title: ULKASEMI – We are integrating your ideas
Content (first 500 characters): ULKASEMI – We are integrating your ideas Primary Menu Know More About Us Partnership Advantage News, Events And Gallery Legal Management Team Global Presence QMS Policy ISMS Policy Blog FAQs Our Clients Services Offerings Custom IC Design IC Design Services Circuit Design IC Design Verification Functional Verification AMS Verification Digital Verification PCB Design Physical Design SOC Design Foundry Design Services Software Software Development Software Reseller Industry served Career Contacts 
```
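To see what the extraction step does without network access or BeautifulSoup, here is a rough standard-library equivalent built on `html.parser.HTMLParser`. It is a swapped-in technique for illustration only, not the program's actual code: like `extract_content`, it drops the text inside `<script>` and `<style>` and joins the remaining stripped text fragments with spaces (the title text is included too, matching the sample output above). The `skip_tags` parameter and the names `VisibleTextParser` and `extract_content_stdlib` are assumptions introduced here; passing extra tags such as `nav` or `footer` is one possible way to trim menu text.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects <title> text and all text outside the skipped tags."""

    def __init__(self, skip_tags=("script", "style")):
        super().__init__()
        self.skip_tags = set(skip_tags)
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.in_title = False
        self.title_parts = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.skip_tags:
            self.skip_depth += 1
        elif tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag in self.skip_tags and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        text = data.strip()  # Like stripped_strings: trim and skip blank fragments
        if not text or self.skip_depth:
            return
        if self.in_title:
            self.title_parts.append(text)
        self.text_parts.append(text)


def extract_content_stdlib(html, skip_tags=("script", "style")):
    """Hypothetical stdlib-only counterpart of extract_content."""
    if not html:
        return None, None
    parser = VisibleTextParser(skip_tags)
    parser.feed(html)
    parser.close()
    title = " ".join(parser.title_parts) or "No title found"
    return title, " ".join(parser.text_parts)
```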


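## Step 4: Use Llama 3.2 to Generate a Summary
The `summarize_content` function sends the page title and the first 2,000 characters of the extracted text to the local Llama 3.2 model through the `ollama` library and returns the model's reply.

```python
import ollama

def summarize_content(title, content):
    """Ask the local llama3.2 model for a brief summary of the page."""
    # Send only the first 2000 characters to keep the prompt short
    prompt = f"Here is a webpage titled '{title}'. Summarize its content briefly:\n\n{content[:2000]}"
    response = ollama.chat(model="llama3.2", messages=[{"role": "user", "content": prompt}])

    return response["message"]["content"]  # The model's reply text

# Example usage
if title and content:
    summary = summarize_content(title, content)
    print(summary)
```

Sample output: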


The webpage "ULKASEMI – We are integrating your ideas" appears to be a company website for ULKASEMI, a global leader in semiconductor design services. The content is structured into various sections:

1. **About Us**: Introduces ULKASEMI as a 17-year-old company with expertise in semiconductor design services.
2. **Services Offerings**: Lists the company's services, including custom IC design, circuit design, and verification, as well as software development and reseller services.
3. **Partnership Advantage**: Highlights ULKASEMI's partnerships and collaborations with clients across various industries.
4. **Global Presence**: Informally mentions that ULKASEMI has a global presence.

The primary focus of the website seems to be promoting ULKASEMI's capabilities in semiconductor design services, emphasizing its expertise in areas such as custom IC design, physical design, verification, and software development.
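`summarize_content` slices the text at exactly 2,000 characters, which can cut the last word in half. Below is a minimal sketch of a prompt builder that instead backs up to the previous space; `build_prompt` and its `limit` parameter are hypothetical, and the prompt wording is copied from the original function. Because it does not call the model, it can be tested without Ollama running.

```python
def build_prompt(title, content, limit=2000):
    """Hypothetical helper: build the summary prompt, truncating at a word boundary."""
    snippet = content[:limit]
    # If the cut lands mid-word, drop the partial word (only when a space exists)
    if len(content) > limit and not content[limit].isspace() and " " in snippet:
        snippet = snippet.rsplit(" ", 1)[0]
    return (f"Here is a webpage titled '{title}'. "
            f"Summarize its content briefly:\n\n{snippet}")
```

Inside `summarize_content`, the `prompt = ...` line could then become `prompt = build_prompt(title, content)`, leaving the `ollama.chat` call unchanged.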
