<a href="https://colab.research.google.com/github/sufiyansayyed19/LLM_Learning/blob/main/Day5_Business_Brochure_Solution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# A full business solution

## Now we will take our project from Day 1 to the next level

### BUSINESS CHALLENGE:

Create a product that builds a Brochure for a company to be used for prospective clients, investors and potential recruits.

We will be provided a company name and their primary website.

See the end of this notebook for examples of real-world business applications.

And remember: I'm always available if you have problems or ideas! Please do reach out.

All imports  
import os  
import json  
from IPython.display import Markdown, display, update_display  
from google.colab import userdata  
from openai import OpenAI   

## 1. Scraper Functions

This section defines two functions: one scrapes the visible text of a page, the other extracts its links. Together they gather the raw material for the brochure.

### Get website content

In [1]:
# 1. Install necessary libraries
!pip install beautifulsoup4 requests markdownify

import requests
from bs4 import BeautifulSoup
import re
from urllib.parse import urljoin

# --- THESE ARE THE FUNCTIONS ED HAS IN 'scraper.py' ---

def fetch_website_contents(url):
    """
    Fetches the text content of a website, stripping out scripts and styles.
    """
    print(f"Scraping: {url}...")
    try:
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}
        response = requests.get(url, headers=headers, timeout=10)
        soup = BeautifulSoup(response.content, 'html.parser')

        # Remove non-content elements (scripts, styles, navigation, footer)
        for element in soup(["script", "style", "nav", "footer"]):
            element.extract()

        # Get text
        text = soup.get_text()

        # Break into lines and remove leading/trailing space on each
        lines = (line.strip() for line in text.splitlines())
        # Split phrases separated by double spaces into one chunk each
        chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
        # Drop blank lines
        text = '\n'.join(chunk for chunk in chunks if chunk)

        return text[:5000] # Limit to 5000 chars to save tokens
    except Exception as e:
        return f"Error fetching {url}: {e}"



Collecting markdownify
  Downloading markdownify-1.2.2-py3-none-any.whl.metadata (9.9 kB)
Downloading markdownify-1.2.2-py3-none-any.whl (15 kB)
Installing collected packages: markdownify
Successfully installed markdownify-1.2.2
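
The whitespace clean-up at the end of `fetch_website_contents` can be tried on its own, without any network access. The block below is a minimal standalone sketch of that step only; `clean_text` is a hypothetical helper name and the sample string is made up for illustration.

```python
def clean_text(raw: str, limit: int = 5000) -> str:
    """Strip each line, split on double spaces, drop blanks, then truncate."""
    lines = (line.strip() for line in raw.splitlines())
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    text = "\n".join(chunk for chunk in chunks if chunk)
    return text[:limit]


# Made-up sample with stray indentation, blank lines and a wide gap
sample = "  Our Mission  \n\n\n   Build tools    Share models  \n"
print(clean_text(sample))
```

Each phrase ends up on its own line, which keeps the text compact before it is sent to the model.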


### Get website links

In [2]:
def fetch_website_links(url):
    """
    Fetches all links from a website and converts relative links to absolute.
    """
    try:
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}
        response = requests.get(url, headers=headers, timeout=10)
        soup = BeautifulSoup(response.content, 'html.parser')

        links = []
        for a_tag in soup.find_all('a', href=True):
            href = a_tag['href']
            # Convert relative links (e.g. "/about") to full links
            full_url = urljoin(url, href)
            if full_url.startswith('http'):
                links.append(full_url)

        # Remove duplicates
        return list(set(links))
    except Exception as e:
        print(f"Error fetching links: {e}")
        return []

print("Scraper functions loaded!")

Scraper functions loaded!
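
To see what the link normalization in `fetch_website_links` does, the same steps can be run on a hand-written list of `href` values. This is a sketch under assumptions: `normalize_links` is a hypothetical name, the `example.com` URLs are placeholders, and the result is sorted (the notebook returns `list(set(links))`, whose order can differ between runs).

```python
from urllib.parse import urljoin


def normalize_links(base_url, hrefs):
    """Resolve hrefs against base_url, keep http(s) links only, dedupe and sort."""
    absolute = {urljoin(base_url, href) for href in hrefs}
    return sorted(u for u in absolute if u.startswith("http"))


# Placeholder hrefs: two relative links, one duplicate, one email link
hrefs = ["/about", "careers", "https://example.com/about", "mailto:hi@example.com"]
print(normalize_links("https://example.com/", hrefs))
```

Sorting is a design choice rather than a requirement: it makes a later "first 30 links" cut-off reproducible from one run to the next.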


## 2. API Key Setup for Colab

### Add the key to Colab Secrets and check it with the code below

In [3]:
from google.colab import userdata

try:
    key = userdata.get('OPENAI_API_KEY')
    print(f"Success! Key found. It starts with: {key[:8]}...")
except Exception:
    print("Error: Could not find key. Did you turn the toggle switch ON?")

Success! Key found. It starts with: sk-or-v1...


## 3. API Call Setup

In [9]:
import os
from openai import OpenAI

# 1. Set up the API key (make sure you added OPENAI_API_KEY in Colab Secrets)
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

# 2. Setup Client (OpenRouter)
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENAI_API_KEY"],
)

# 3. Define Models
# A cheap, fast model is enough for link selection
LINK_MODEL = "openai/gpt-4o-mini"
# The brochure writer currently uses the same model; swap in a stronger one if needed
WRITER_MODEL = "openai/gpt-4o-mini"

print("Client and Models configured!")

Client and Models configured!


## 4. Prompt Design

### i. System Prompts

#### Link system prompt

In [10]:
# --- SYSTEM PROMPTS ---
link_system_prompt = """
You are provided with a list of links found on a webpage.
You are able to decide which of the links would be most relevant to include in a brochure about the company,
such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

#### Brochure system prompt

In [11]:

brochure_system_prompt = """
You are an assistant that analyzes the contents of several relevant pages from a company website
and creates a short brochure about the company for prospective customers, investors and recruits.
Respond in markdown without code blocks.
Include details of company culture, customers and careers/jobs if you have the information.
"""


### ii. User Prompts

#### User prompt to select relevant links

In [12]:
def get_links_user_prompt(url):
    raw_links = fetch_website_links(url)
    # Take only first 30 links to save tokens
    links_text = "\n".join(raw_links[:30])
    user_prompt = f"""
    Here is the list of links on the website {url} -
    Please decide which of these are relevant web links for a brochure about the company,
    respond with the full https URL in JSON format.
    Do not include Terms of Service, Privacy, email links.

    Links:
    {links_text}
    """
    return user_prompt
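
Separating prompt formatting from the network call makes the prompt easy to inspect. The sketch below assumes the links have already been fetched; `build_links_prompt` and its `max_links` parameter are hypothetical names, and the `example.com` URLs are placeholders.

```python
def build_links_prompt(url, links, max_links=30):
    """Format the link-selection prompt from an already-fetched list of links."""
    header = (
        f"Here is the list of links on the website {url} -\n"
        "Please decide which of these are relevant web links for a brochure about the company,\n"
        "respond with the full https URL in JSON format.\n"
        "Do not include Terms of Service, Privacy, email links.\n\n"
        "Links:\n"
    )
    # Only the first max_links entries are included, to save tokens
    return header + "\n".join(links[:max_links])


# 40 placeholder links; only the first 30 should appear in the prompt
links = [f"https://example.com/page{i}" for i in range(40)]
prompt = build_links_prompt("https://example.com", links)
print(prompt.count("https://example.com/page"))
```

Building the text from explicit lines also avoids the leading indentation that an indented triple-quoted f-string carries into the prompt.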

#### User prompt to create brochure

In [13]:
def get_brochure_user_prompt(company_name, url):
    print("📥 Fetching website content (this takes a moment)...")
    content = fetch_page_and_all_relevant_links(url)

    user_prompt = f"""
    You are looking at a company called: {company_name}
    Here are the contents of its landing page and other relevant pages;
    use this information to build a short brochure of the company in markdown without code blocks.\n\n
    {content[:10000]}
    """
    # (Limited to 10k chars to ensure we don't overflow context)
    return user_prompt
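
Because each page is capped at 5000 characters and the combined content at 10000, a plain `content[:10000]` slice may keep little beyond the landing page and the first linked page. One possible alternative, sketched here as an assumption rather than what the notebook does, is to give every page an equal share of the budget. `fit_sections` is a hypothetical helper and the section texts are dummy data.

```python
def fit_sections(sections, total_budget=10000):
    """Trim each (title, text) section to an equal share of total_budget characters."""
    if not sections:
        return ""
    per_section = total_budget // len(sections)
    parts = [f"### {title}\n{text[:per_section]}" for title, text in sections]
    return "\n\n".join(parts)


# Dummy pages of 5000 characters each, squeezed into a 9000-character budget
sections = [("Landing page", "x" * 5000), ("About", "y" * 5000), ("Careers", "z" * 5000)]
combined = fit_sections(sections, total_budget=9000)
print(combined.count("x"), combined.count("y"), combined.count("z"))
```

The headings add a few characters on top of the budget, so the total is approximate; the point is only that every page contributes something to the prompt.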

## 5. Content Fetching Functions

### Page and all relevant links function

In [14]:
def fetch_page_and_all_relevant_links(url):
    # 1. Get Main Page
    contents = fetch_website_contents(url)

    # 2. Get Relevant Links
    relevant_links = select_relevant_links(url)

    result = f"## Landing Page:\n\n{contents}\n## Relevant Links:\n"

    # 3. Get Content of Relevant Links
    for link in relevant_links['links']:
        print(f"   Reading: {link['type']} ({link['url']})")
        link_content = fetch_website_contents(link["url"])
        result += f"\n\n### Link: {link['type']}\n{link_content}"

    return result

### Select relevant links from all links

In [15]:
import json

def select_relevant_links(url):
    print(f"🔍 Analyzing links for {url}...")
    response = client.chat.completions.create(
        model=LINK_MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(url)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    try:
        links = json.loads(result)
        print(f"✅ Found {len(links['links'])} relevant links.")
        return links
    except (json.JSONDecodeError, KeyError, TypeError):
        # Invalid JSON, or JSON without a usable "links" list
        print("Error parsing JSON")
        return {"links": []}
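
The model's reply is not guaranteed to match the example in the system prompt, so it can help to validate each entry before scraping it. The following is a minimal sketch of such a check; `parse_links_response` is a hypothetical helper, the default type `"page"` is an assumption, and the sample replies are made up.

```python
import json


def parse_links_response(raw):
    """Return a list of {'type', 'url'} dicts, or [] if the reply is unusable."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return []
    if not isinstance(data, dict) or not isinstance(data.get("links"), list):
        return []
    cleaned = []
    for link in data["links"]:
        # Keep only entries that carry an http(s) URL string
        if isinstance(link, dict) and isinstance(link.get("url"), str) and link["url"].startswith("http"):
            cleaned.append({"type": link.get("type", "page"), "url": link["url"]})
    return cleaned


good = '{"links": [{"type": "about page", "url": "https://example.com/about"}, {"url": "mailto:x@example.com"}]}'
print(parse_links_response(good))
print(parse_links_response("not json"))
```

Returning a plain list keeps the caller simple: an empty list just means no extra pages are scraped.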

## 6. Brochure Generator Function

In [16]:
from IPython.display import Markdown, display, update_display

def stream_brochure(company_name, url):
    print(f"🚀 Generating brochure for {company_name}...")
    stream = client.chat.completions.create(
        model=WRITER_MODEL,
        messages=[
            {"role": "system", "content": brochure_system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )
    response = ""
    display_handle = display(Markdown("Wait for it..."), display_id=True)
    for chunk in stream:
        content = chunk.choices[0].delta.content or ''
        response += content
        update_display(Markdown(response), display_id=display_handle.display_id)
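
The accumulation loop in `stream_brochure` can be exercised without calling the API by feeding it stand-in chunks. In this sketch the fake chunks only imitate the attribute shape used above (`choices[0].delta.content`); `accumulate_stream` and `fake_chunk` are hypothetical names, and the guard for chunks with an empty `choices` list is an assumption about what some providers may send.

```python
from types import SimpleNamespace


def accumulate_stream(chunks):
    """Yield the growing response text after each usable chunk."""
    response = ""
    for chunk in chunks:
        if not chunk.choices:
            continue  # assumed case: a chunk that carries no choices
        response += chunk.choices[0].delta.content or ""
        yield response


def fake_chunk(text):
    """Build a stand-in object with the same attribute path as a stream chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


chunks = [fake_chunk("# Hug"), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk("ging Face")]
snapshots = list(accumulate_stream(chunks))
print(snapshots[-1])
```

In the notebook, each yielded snapshot would correspond to one `update_display(Markdown(response), ...)` call.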

## 7. Driver

In [17]:
# --- RUN IT! ---
# Use the main Hugging Face URL
stream_brochure("HuggingFace", "https://huggingface.co")

🚀 Generating brochure for HuggingFace...
📥 Fetching website content (this takes a moment)...
Scraping: https://huggingface.co...
🔍 Analyzing links for https://huggingface.co...
✅ Found 6 relevant links.
   Reading: about page (https://huggingface.co/docs)
Scraping: https://huggingface.co/docs...
   Reading: blog page (https://huggingface.co/blog)
Scraping: https://huggingface.co/blog...
   Reading: careers page (https://huggingface.co/enterprise)
Scraping: https://huggingface.co/enterprise...
   Reading: pricing page (https://huggingface.co/pricing)
Scraping: https://huggingface.co/pricing...
   Reading: models page (https://huggingface.co/models)
Scraping: https://huggingface.co/models...
   Reading: learn page (https://huggingface.co/learn)
Scraping: https://huggingface.co/learn...


# Hugging Face: The AI Community Building the Future

Welcome to **Hugging Face**, a pioneering platform designed for collaboration in the machine learning community. Our mission is to empower developers, researchers, and organizations to build and share cutting-edge AI models, datasets, and applications. With over 2 million models and numerous powerful applications, we are at the forefront of AI innovation.

## Company Culture

At Hugging Face, we foster a collaborative and open community that encourages diversity, creativity, and innovation. Our culture is built on shared knowledge and a passion for pushing the boundaries of AI technology. We believe in the influence of collective intelligence, and that’s why we prioritize open-source contributions, welcoming individuals from all backgrounds to join us in shaping the future of machine learning.

## Our Community

More than 50,000 organizations rely on Hugging Face to advance their projects, including tech giants such as Microsoft, Google, Amazon, and Meta. Our platform hosts a vibrant ecosystem where users can find and share information to accelerate their machine learning journey. The community-driven nature of our platform ensures continuous evolution, with a vast repository of state-of-the-art models and resources.

## Products and Services

- **Collaboration Platform**: Host and collaborate on unlimited public models and datasets.
- **Open Source Tools**: Access state-of-the-art libraries like Transformers for PyTorch and Diffusers.
- **Compute Solutions**: Paid options for enhanced computational power and enterprise-level support starting at just $20 per user per month.
- **Enterprise Features**: Advanced security and access controls for teams, tailored to meet the needs of organizations.

## Careers at Hugging Face

We are always on the lookout for passionate and innovative talent to join our growing team. At Hugging Face, you will have the opportunity to work with cutting-edge machine learning technologies and collaborate with industry experts. We value diversity and encourage applicants from all walks of life to apply and bring their unique perspectives to our community.

## Join Us

Ready to be a part of something groundbreaking? 

- **Explore AI Apps**  
- **Browse Models and Datasets**  
- **Contribute to Open Source**  
- **Grow your Career with Us**  

At Hugging Face, we believe the future of AI lies in collaboration and accessibility. Let’s build it together!