# A full business solution

### BUSINESS CHALLENGE:

Create a product that builds a brochure about a company, for use with prospective clients, investors and potential recruits.

### Input: the company name and its primary website
### Output: a formatted brochure


In [40]:
# imports

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [41]:
# Initialize and constants

load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')
if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [42]:
# A class to represent a Webpage

class Website:
    """
    A utility class to represent a Website that we want to scrape, with internal links
    """

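    def __init__(self, url):
        self.url = url
        # Some sites reject requests without a browser-like User-Agent; the timeout stops a slow site from hanging the notebook
        headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
        response = requests.get(url, headers=headers, timeout=30)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"

        if soup.body:
            # Remove tags that carry no readable text before extracting the page text
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""

        # Also collect the links on the page, skipping <a> tags that have no href
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]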

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"
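The link collection in `Website` keeps only `<a>` tags that have a non-empty `href`. As a dependency-free illustration of that same filtering rule, here is a sketch using the standard library's `html.parser` instead of BeautifulSoup; `LinkCollector` and `extract_links` are hypothetical names, not part of the notebook:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, skipping anchors with a missing or empty href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


sample = (
    '<body><a href="/about">About</a><a>No href</a>'
    '<a href="">Empty</a><a href="https://example.com/jobs">Jobs</a></body>'
)
print(extract_links(sample))  # ['/about', 'https://example.com/jobs']
```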

In [43]:
ed = Website("https://huggingface.co/")
print(ed.links)

['/', '/models', '/datasets', '/spaces', '/posts', '/docs', '/enterprise', '/pricing', '/login', '/join', '/deepseek-ai/DeepSeek-V3', '/deepseek-ai/DeepSeek-V3-Base', '/PowerInfer/SmallThinker-3B-Preview', '/black-forest-labs/FLUX.1-dev', '/hexgrad/Kokoro-82M', '/models', '/spaces/JeffreyXiang/TRELLIS', '/spaces/osanseviero/gemini-coder', '/spaces/lllyasviel/iclight-v2', '/spaces/Kwai-Kolors/Kolors-Virtual-Try-On', '/spaces/Qwen/QVQ-72B-preview', '/spaces', '/datasets/agibot-world/AgiBotWorld-Alpha', '/datasets/fka/awesome-chatgpt-prompts', '/datasets/PowerInfer/QWQ-LONGCOT-500K', '/datasets/HuggingFaceTB/finemath', '/datasets/O1-OPEN/OpenO1-SFT', '/datasets', '/join', '/pricing#endpoints', '/pricing#spaces', '/pricing', '/enterprise', '/enterprise', '/enterprise', '/enterprise', '/enterprise', '/enterprise', '/enterprise', '/allenai', '/facebook', '/amazon', '/google', '/Intel', '/microsoft', '/grammarly', '/Writer', '/docs/transformers', '/docs/diffusers', '/docs/safetensors', '/docs

## First step: Have GPT-4o-mini figure out which links are relevant

### Use a call to gpt-4o-mini to read the links on a webpage and respond in structured JSON.
It should decide which links are relevant and replace relative links such as "/about" with "https://company.com/about".  
We will use "one-shot prompting", in which the prompt includes an example of how the model should respond.

Trying to code this without LLMs by parsing and analyzing the webpage would be hard!
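Deciding which links matter is the part that needs the LLM; turning relative links into absolute ones is deterministic. As an optional alternative (not what this notebook does), the standard library's `urllib.parse.urljoin` can resolve them against the page URL, which would leave the model only the relevance decision:

```python
from urllib.parse import urljoin

base = "https://huggingface.co/"
relative_links = ["/models", "pricing#endpoints", "https://apply.workable.com/huggingface/"]

# urljoin resolves each href against the page URL; absolute URLs pass through unchanged
absolute_links = [urljoin(base, link) for link in relative_links]
print(absolute_links)
# ['https://huggingface.co/models', 'https://huggingface.co/pricing#endpoints', 'https://apply.workable.com/huggingface/']
```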

In [44]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [45]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
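Because the model tends to mirror the one-shot example closely, it is worth confirming that the example itself is valid JSON; a single colon where a comma belongs is enough to break it. A small check with the standard `json` module (the helper name `is_valid_json` is hypothetical):

```python
import json


def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# The example from the system prompt, with a comma between "type" and "url"
example = """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""
# The same careers entry with a colon where the comma belongs
broken = '{"type": "careers page": "url": "https://another.full.url/careers"}'

print(is_valid_json(example))  # True
print(is_valid_json(broken))   # False
```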



In [46]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [47]:
get_links_user_prompt(ed)

'Here is the list of links on the website of https://huggingface.co/ - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.\nLinks (some might be relative links):\n/\n/models\n/datasets\n/spaces\n/posts\n/docs\n/enterprise\n/pricing\n/login\n/join\n/deepseek-ai/DeepSeek-V3\n/deepseek-ai/DeepSeek-V3-Base\n/PowerInfer/SmallThinker-3B-Preview\n/black-forest-labs/FLUX.1-dev\n/hexgrad/Kokoro-82M\n/models\n/spaces/JeffreyXiang/TRELLIS\n/spaces/osanseviero/gemini-coder\n/spaces/lllyasviel/iclight-v2\n/spaces/Kwai-Kolors/Kolors-Virtual-Try-On\n/spaces/Qwen/QVQ-72B-preview\n/spaces\n/datasets/agibot-world/AgiBotWorld-Alpha\n/datasets/fka/awesome-chatgpt-prompts\n/datasets/PowerInfer/QWQ-LONGCOT-500K\n/datasets/HuggingFaceTB/finemath\n/datasets/O1-OPEN/OpenO1-SFT\n/datasets\n/join\n/pricing#endpoints\n/pricing#spaces\n/pricing\n/enterprise\n/enterprise\n/enterpr
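The link list sent to the model contains many repeats (for example, `/enterprise` appears several times), plus the kinds of links the prompt asks the model to ignore. A sketch of a hypothetical `clean_links` helper that could be applied to `website.links` before they are joined into the prompt, keeping first-seen order:

```python
def clean_links(links):
    """Drop duplicates, mailto:/tel: links and bare fragment links, keeping first-seen order."""
    seen = set()
    cleaned = []
    for link in links:
        # Email, phone and same-page fragment links are never brochure material
        if link.startswith(("mailto:", "tel:", "#")):
            continue
        if link not in seen:
            seen.add(link)
            cleaned.append(link)
    return cleaned


links = ["/", "/models", "/enterprise", "/enterprise", "#main", "mailto:hi@example.com", "/models"]
print(clean_links(links))  # ['/', '/models', '/enterprise']
```

A shorter, deduplicated list also reduces the number of input tokens per call.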

### Using OpenAI GPT-4o-mini

In [48]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)

In [49]:
get_links("https://huggingface.co/")

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'},
  {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'},
  {'type': 'blog page', 'url': 'https://huggingface.co/blog'},
  {'type': 'models page', 'url': 'https://huggingface.co/models'},
  {'type': 'datasets page', 'url': 'https://huggingface.co/datasets'},
  {'type': 'spaces page', 'url': 'https://huggingface.co/spaces'},
  {'type': 'docs page', 'url': 'https://huggingface.co/docs'},
  {'type': 'community page', 'url': 'https://discuss.huggingface.co'}]}
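JSON mode (`response_format={"type": "json_object"}`) is meant to make the reply parse as JSON, but it does not enforce the shape shown in the example. Before fetching each URL, a defensive check could drop malformed or repeated entries. A sketch with a hypothetical `valid_links` helper:

```python
from urllib.parse import urlparse


def valid_links(result):
    """Keep entries that have a 'type' and an absolute http(s) 'url', dropping repeated URLs."""
    seen = set()
    kept = []
    for item in result.get("links", []):
        if not isinstance(item, dict):
            continue
        url = item.get("url")
        if not isinstance(url, str) or not item.get("type"):
            continue
        # Relative paths and mailto: links have no http(s) scheme and cannot be fetched directly
        if urlparse(url).scheme not in ("http", "https"):
            continue
        if url not in seen:
            seen.add(url)
            kept.append(item)
    return kept


response = {"links": [
    {"type": "about page", "url": "https://huggingface.co/huggingface"},
    {"type": "careers page", "url": "https://apply.workable.com/huggingface/"},
    {"type": "about page", "url": "https://huggingface.co/huggingface"},
    {"type": "email", "url": "mailto:press@example.com"},
    {"url": "https://huggingface.co/blog"},
]}
print([item["type"] for item in valid_links(response)])  # ['about page', 'careers page']
```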

## Second step: Making the brochure!

Assemble all the details into another prompt to GPT-4o-mini.

In [50]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    # Ask the LLM which links are relevant (this makes an API call); returns {"links": [{"type": ..., "url": ...}]}
    links = get_links(url)
    for link in links.get("links", []):
        # Scrape each page the LLM selected and append its contents under its link type
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result
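`get_all_details` stops with an exception if any one of the selected pages fails to load. One way to make it more forgiving is to catch per-page errors and skip that page. A sketch with a hypothetical `gather_pages` function that takes the fetcher as a parameter, so it can be exercised here with a stand-in instead of live HTTP requests:

```python
def gather_pages(links, fetch):
    """Concatenate page contents under their link type, skipping pages that fail to load.

    `fetch` is any callable url -> str; in the notebook it could be
    lambda url: Website(url).get_contents().
    """
    sections = []
    for link in links:
        try:
            contents = fetch(link["url"])
        except Exception as err:  # e.g. a requests exception from a page that times out
            print(f"Skipping {link['url']}: {err}")
            continue
        sections.append(f"\n\n{link['type']}\n{contents}")
    return "".join(sections)


def fake_fetch(url):
    # Stand-in for a real scraper: the careers URL "fails" to show the skip path
    if "careers" in url:
        raise ConnectionError("timed out")
    return f"Contents of {url}\n"


links = [
    {"type": "about page", "url": "https://example.com/about"},
    {"type": "careers page", "url": "https://example.com/careers"},
]
print(gather_pages(links, fake_fetch))
```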

In [51]:
# get_all_details("https://huggingface.co/")

In [52]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure; this shows how easily you can change the 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."
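Instead of keeping two commented copies of the prompt, the tone could be a parameter. This sketch uses a hypothetical `make_system_prompt` function and builds the text from adjacent string literals, each ending in an explicit space, which makes a missing space between sentences easy to spot:

```python
def make_system_prompt(tone=""):
    """Build the brochure system prompt; `tone` is an optional phrase such as
    'humorous, entertaining, jokey'."""
    style = f"{tone} " if tone else ""
    return (
        "You are an assistant that analyzes the contents of several relevant pages from a company website "
        f"and creates a short {style}brochure about the company for prospective customers, investors and recruits. "
        "Respond in markdown. "
        "Include details of company culture, customers and careers/jobs if you have the information."
    )


print(make_system_prompt("humorous, entertaining, jokey"))
```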


In [53]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:20_000] # Truncate if more than 20,000 characters
    return user_prompt
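Slicing the prompt at exactly 20,000 characters can cut a page off mid-word. A hypothetical `truncate_at_line` helper backs up to the last complete line instead. Note that a character count is only a rough stand-in for the model's token limit.

```python
def truncate_at_line(text, limit=20_000):
    """Cut text to at most `limit` characters, backing up to the last newline so no line is split."""
    if len(text) <= limit:
        return text
    cut = text.rfind("\n", 0, limit)
    # Fall back to a hard cut if there is no newline inside the limit
    return text[:cut] if cut > 0 else text[:limit]


sample = "line one\nline two\nline three"
print(repr(truncate_at_line(sample, limit=15)))  # 'line one'
```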

In [54]:
# get_brochure_user_prompt("HuggingFace", "https://huggingface.co/")

In [55]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [56]:
create_brochure("HuggingFace", "https://huggingface.co/")

# Hugging Face Company Brochure

### Building the Future of AI Together

**Website:** [Hugging Face](https://huggingface.co)

---

## About Us

Hugging Face is a transformative force in the field of artificial intelligence, creating a collaborative platform dedicated to advancing machine learning (ML). Our community is committed to democratizing access to cutting-edge technologies, enabling developers, researchers, and enterprises to contribute to and benefit from open-source AI innovations.

### Mission
**"Democratizing good machine learning, one commit at a time."**

---

## Our Offerings

- **Models:** Over **400k models** encompassing text, image, video, and audio, including state-of-the-art ML solutions for frameworks like PyTorch and TensorFlow.
- **Datasets:** Access to **100k+ datasets** curated for diverse tasks in NLP, computer vision, and audio generation.
- **Spaces:** A user-friendly way to deploy and showcase ML applications with **150k+ applications** already live.
- **Enterprise Solutions:** Customized, secure platforms for organizations to leverage AI, starting at **$20/user/month**.

### Popular Models
- deepseek-ai/DeepSeek-V3
- PowerInfer/SmallThinker-3B-Preview
- black-forest-labs/FLUX.1-dev

### Pricing Overview
- **HF Hub:** Free access for collaboration.
- **Pro Account:** Unlock advanced features for **$9/month**.
- **Enterprise Hub:** All capabilities plus enhanced security at **$20/user/month**.

---

## Company Culture

At Hugging Face, we foster a culture of **collaboration, innovation, and openness**. Our team consists of over **221 members**, driven by a shared passion for making advanced AI accessible to all. We encourage our members to share their work, collaborate on projects, and contribute to the AI community, creating a supportive and enriching environment.

### Diversity and Inclusion
We believe in the power of diverse talents and perspectives. Our initiatives to create an inclusive workplace ensure everyone can thrive and contribute to groundbreaking ML advancements.

---

## Join Us

We are always on the lookout for passionate individuals who share our vision. Explore current openings and join a community that values your growth and contributions:

- **Open Positions:** We regularly update our career page with exciting roles. Check out opportunities that match your skills and interests!

### Employee Benefits
- Work in a **dynamic environment** with flexible arrangements.
- **Professional development** opportunities, including training sessions and workshops.
- A chance to work with industry leaders and cutting-edge technology.

---

## Our Clients

Over **50,000 organizations** are leveraging Hugging Face's platform, including prominent names such as:
- **Google**
- **Microsoft**
- **Amazon Web Services**
- **Intel**
- **Meta AI**

Join a network of innovative companies shaping the AI landscape!

---

### Conclusion

At Hugging Face, we believe everyone should have the ability to participate in the AI revolution. Whether you’re a developer, researcher, or organization looking to integrate AI into your operations, we provide the tools, resources, and community support to help you succeed.

**Contact us today to learn more or to get started!**

## Finally - a minor improvement

With a small adjustment, we can make the results stream back from OpenAI
with the familiar typewriter animation.

In [None]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
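        ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        # Some chunks carry no text (delta.content is None), hence the `or ''`
        response += chunk.choices[0].delta.content or ''
        # Strip the code fence the model sometimes wraps around its reply, without deleting the word "markdown" from the text itself
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)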
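The streaming loop strips the code fence the model sometimes wraps around its answer. Removing the word `markdown` with a plain string replace would also delete it from ordinary sentences in the brochure; a regular expression that only targets the fence avoids that. A sketch with a hypothetical `clean_fences` helper, run on simulated chunks rather than a live stream:

```python
import re

# Three backticks, built programmatically so this snippet can itself sit inside a Markdown code fence
TICKS = "`" * 3
# Three backticks optionally followed by "markdown" or "md" (covers opening and closing fences)
FENCE = re.compile(TICKS + r"(?:markdown|md)?")


def clean_fences(text):
    """Remove Markdown code fences without touching the word 'markdown' elsewhere in the text."""
    return FENCE.sub("", text)


# Simulated stream: the opening fence is split across two chunks, and the last chunk carries no text
chunks = [TICKS + "mark", "down\n# Brochure\n", "We render markdown ", "nicely.\n" + TICKS, None]
response = ""
for piece in chunks:
    response += piece or ""
    # Re-clean the full accumulated text each time, so a fence split across chunks is removed once complete
    shown = clean_fences(response)

print(shown)
```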

In [None]:
stream_brochure("HuggingFace", "https://huggingface.co")