# A full business solution

## Now we will take our project from Day 1 to the next level

### BUSINESS CHALLENGE:

Create a product that builds a brochure about a company, for use with prospective clients, investors and potential recruits.

We will be given a company name and its primary website.

See the end of this notebook for examples of real-world business applications.

And remember: I'm always available if you have problems or ideas! Please do reach out.

In [4]:
# imports
# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [5]:
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [6]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [7]:
ed = Website("https://edwarddonner.com")
ed.links

['https://edwarddonner.com/',
 'https://edwarddonner.com/outsmart/',
 'https://edwarddonner.com/about-me-and-about-nebula/',
 'https://edwarddonner.com/posts/',
 'https://edwarddonner.com/',
 'https://news.ycombinator.com',
 'https://nebula.io/?utm_source=ed&utm_medium=referral',
 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html',
 'https://patents.google.com/patent/US20210049536A1/',
 'https://www.linkedin.com/in/eddonner/',
 'https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/',
 'https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/',
 'https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/',
 'https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/',
 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/',
 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/',
 'https://edwarddonner.com/2024/10/16/from-soft

## First step: Have GPT-4o-mini figure out which links are relevant

### Use a call to gpt-4o-mini to read the links on a webpage, and respond in structured JSON.  
It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".  
We will use "one-shot prompting", in which we provide an example of how it should respond in the prompt.

This is an excellent use case for an LLM, because it requires nuanced understanding. Imagine trying to code this without LLMs by parsing and analyzing the webpage - it would be very hard!
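The relevance judgment is the part that needs an LLM. The other half of the task, turning "/about" into "https://company.com/about", can also be done deterministically as a cross-check. Here is a minimal sketch using the standard library; the helper name `absolutize` is hypothetical and is not part of this notebook's pipeline.

```python
from urllib.parse import urljoin

def absolutize(base_url, links):
    """Resolve each link against base_url; already-absolute links pass through unchanged."""
    return [urljoin(base_url, link) for link in links]

resolved = absolutize(
    "https://huggingface.co",
    ["/models", "/pricing#spaces", "https://github.com/huggingface"],
)
print(resolved)
```

Resolving links up front like this would leave the model with only the "which links matter" decision, at the cost of an extra step before the prompt is built.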

Sidenote: there is a more advanced technique called "Structured Outputs" in which we require the model to respond according to a spec. We cover this technique in Week 8 during our autonomous Agentic AI project.
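To give a feel for what "responding according to a spec" could mean for our links reply, here is a rough sketch of a JSON Schema for the response shape used in this notebook, plus a small hand-written check against it. Both `LINKS_SCHEMA` and `matches_links_schema` are hypothetical names made up for illustration; how a schema like this is actually passed to the model is not shown here.

```python
# Hypothetical JSON Schema describing the {"links": [{"type": ..., "url": ...}]} shape.
LINKS_SCHEMA = {
    "type": "object",
    "properties": {
        "links": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "type": {"type": "string"},
                    "url": {"type": "string"},
                },
                "required": ["type", "url"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["links"],
    "additionalProperties": False,
}

def matches_links_schema(data):
    """Hand-rolled check mirroring LINKS_SCHEMA (not a general JSON Schema validator)."""
    if not isinstance(data, dict) or set(data) != {"links"}:
        return False
    if not isinstance(data["links"], list):
        return False
    return all(
        isinstance(item, dict)
        and set(item) == {"type", "url"}
        and all(isinstance(value, str) for value in item.values())
        for item in data["links"]
    )
```

With one-shot prompting we only show the model an example and hope it follows; a spec like this makes the expected shape explicit and checkable.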

In [8]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [9]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [10]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt
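Scraped link lists tend to contain many repeats, and every repeat costs tokens in the prompt. As an optional pre-processing step, here is a small sketch that removes duplicates while keeping the original order, and drops `mailto:` links and bare `#fragment` links. The helper `clean_links` is hypothetical; the notebook itself sends `website.links` as-is.

```python
def clean_links(links):
    """Drop duplicates (keeping first-seen order), mailto: links, and bare fragments."""
    unique = dict.fromkeys(links)  # dicts preserve insertion order
    return [
        link for link in unique
        if not link.startswith("mailto:") and not link.startswith("#")
    ]

print(clean_links(["/enterprise", "/enterprise", "/pricing", "mailto:hi@example.com", "#top", "/pricing"]))
```

One possible use would be `"\n".join(clean_links(website.links))` when building the user prompt.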

In [11]:
print(get_links_user_prompt(ed))

Here is the list of links on the website of https://edwarddonner.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
https://edwarddonner.com/
https://edwarddonner.com/outsmart/
https://edwarddonner.com/about-me-and-about-nebula/
https://edwarddonner.com/posts/
https://edwarddonner.com/
https://news.ycombinator.com
https://nebula.io/?utm_source=ed&utm_medium=referral
https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html
https://patents.google.com/patent/US20210049536A1/
https://www.linkedin.com/in/eddonner/
https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/
https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/
https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/
https:/

In [12]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
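`json.loads` will raise if the reply is not valid JSON, and even valid JSON might not have the shape we asked for. Here is a defensive sketch that parses the reply and keeps only well-formed entries with absolute http(s) URLs. The helper `parse_links` is hypothetical, and returning an empty list on failure is an assumption about what is convenient downstream.

```python
import json

def parse_links(raw):
    """Parse the model's reply and keep only well-formed entries with http(s) URLs."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    if not isinstance(data, dict):
        return {"links": []}
    links = data.get("links", [])
    if not isinstance(links, list):
        return {"links": []}
    kept = [
        link for link in links
        if isinstance(link, dict)
        and isinstance(link.get("type"), str)
        and isinstance(link.get("url"), str)
        and link["url"].startswith(("http://", "https://"))
    ]
    return {"links": kept}
```

In `get_links`, this could stand in for the final `json.loads(result)` call, so that a relative URL or a malformed entry is dropped before we try to fetch it.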

In [13]:
# Anthropic has made their site harder to scrape, so I'm using HuggingFace instead.

huggingface = Website("https://huggingface.co")
huggingface.links

['/',
 '/models',
 '/datasets',
 '/spaces',
 '/posts',
 '/docs',
 '/enterprise',
 '/pricing',
 '/login',
 '/join',
 '/blog/inference-providers',
 '/deepseek-ai/DeepSeek-R1',
 '/deepseek-ai/Janus-Pro-7B',
 '/deepseek-ai/DeepSeek-V3',
 '/hexgrad/Kokoro-82M',
 '/mistralai/Mistral-Small-24B-Instruct-2501',
 '/models',
 '/spaces/deepseek-ai/Janus-Pro-7B',
 '/spaces/tencent/Hunyuan3D-2',
 '/spaces/lllyasviel/iclight-v2',
 '/spaces/deepseek-ai/deepseek-vl2-small',
 '/spaces/ReverseImageSearch/Reverse-Image-Search2',
 '/spaces',
 '/datasets/open-thoughts/OpenThoughts-114k',
 '/datasets/fka/awesome-chatgpt-prompts',
 '/datasets/cognitivecomputations/dolphin-r1',
 '/datasets/simplescaling/s1K',
 '/datasets/ServiceNow-AI/R1-Distill-SFT',
 '/datasets',
 '/join',
 '/pricing#endpoints',
 '/pricing#spaces',
 '/pricing',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/allenai',
 '/facebook',
 '/amazon',
 '/google',
 '/Intel',
 '/micros

In [14]:
get_links("https://huggingface.co")

{'links': [{'type': 'about page', 'url': 'https://huggingface.co'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'},
  {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'},
  {'type': 'blog page', 'url': 'https://huggingface.co/blog'},
  {'type': 'community discussion page',
   'url': 'https://discuss.huggingface.co'},
  {'type': 'GitHub page', 'url': 'https://github.com/huggingface'},
  {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'},
  {'type': 'LinkedIn page',
   'url': 'https://www.linkedin.com/company/huggingface/'}]}

## Second step: make the brochure!

Assemble all the details into another prompt to gpt-4o-mini.

In [15]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result
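Some of the selected links may fail to load (a timeout, a blocked scraper, a bad URL), and a single failure would abort the whole brochure. Here is a sketch of the same loop that skips pages it cannot fetch. The helper `collect_pages` and its `fetch` parameter are hypothetical; passing the fetcher in as a function is an assumption that makes the loop easy to try without network access.

```python
def collect_pages(links, fetch):
    """Fetch each link's contents with fetch(url); skip pages that raise."""
    sections = []
    for link in links:
        try:
            sections.append(f"\n\n{link['type']}\n{fetch(link['url'])}")
        except Exception as error:  # broad on purpose: any failure skips just this page
            print(f"Skipping {link['url']}: {error}")
    return "".join(sections)
```

With the classes in this notebook, the call might look like `collect_pages(links["links"], lambda url: Website(url).get_contents())`.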

In [16]:
print(get_all_details("https://huggingface.co"))

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co'}, {'type': 'company page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community discussion page', 'url': 'https://discuss.huggingface.co'}, {'type': 'github page', 'url': 'https://github.com/huggingface'}, {'type': 'linkedin page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'join page', 'url': 'https://huggingface.co/join'}]}
Landing page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up
NEW
Welcome to Inference Providers on the Hub 🔥
smolagents - a smol library to build great agents
Use models from the HF Hub in LM Studio
The AI community building the futur

In [17]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."
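Instead of commenting and uncommenting two copies of the prompt, the tone could be a parameter. This is only a sketch; the function `build_system_prompt` and its `tone` argument are hypothetical and not used elsewhere in this notebook.

```python
def build_system_prompt(tone=""):
    """Return the brochure system prompt, optionally with a tone such as 'humorous, entertaining, jokey'."""
    style = f"short {tone} brochure" if tone else "short brochure"
    return (
        "You are an assistant that analyzes the contents of several relevant pages from a company website "
        f"and creates a {style} about the company for prospective customers, investors and recruits. "
        "Respond in markdown. "
        "Include details of company culture, customers and careers/jobs if you have the information."
    )

print(build_system_prompt("humorous, entertaining, jokey"))
```

Calling `build_system_prompt()` with no argument gives the plain version.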


In [18]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
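Slicing at exactly 5,000 characters can cut a line or a word in half. Since the page text is newline-separated, one gentle refinement is to back up to the last newline before the limit. A minimal sketch follows; the helper `truncate_at_line` is hypothetical.

```python
def truncate_at_line(text, limit=5_000):
    """Cut text to at most `limit` characters, backing up to the last newline if there is one."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    last_newline = cut.rfind("\n")
    # If there is no newline to back up to, fall back to the plain character cut
    return cut[:last_newline] if last_newline > 0 else cut
```

It could replace the slice with `user_prompt = truncate_at_line(user_prompt)`. The result is never longer than the plain slice would be.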

In [19]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'github page', 'url': 'https://github.com/huggingface'}, {'type': 'linkedin page', 'url': 'https://www.linkedin.com/company/huggingface'}, {'type': 'twitter page', 'url': 'https://twitter.com/huggingface'}]}


'You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nPosts\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nNEW\nWelcome to Inference Providers on the Hub 🔥\nsmolagents - a smol library to build great agents\nUse models from the HF Hub in LM Studio\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nTrending on\nthis week\nModels\ndeepseek-ai/DeepSeek-R1\nUpdated\n7 days ago\n•\n2.15M\n•\n7.72k\ndeepseek-ai/Janus-Pro-7B\nUpdated\n7 days ago\n•\n324k\n•\n2.75k\ndeepseek-ai/DeepSeek-V3\nUpdated\n15 days ago\n•\n1.15M\n•\n3.29k\nhexgrad/Kokoro-82M\nUpdated\n7 days ago\n•\n248k\n•\n2.92k\nValueFX9507/Tifa-Deepsex-14b-CoT-GG

In [22]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [23]:
create_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'docs page', 'url': 'https://huggingface.co/docs'}, {'type': 'join page', 'url': 'https://huggingface.co/join'}, {'type': 'community discussion page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Company Brochure

## Welcome to Hugging Face! 
Hugging Face is the AI community driving innovation in machine learning. Our platform is a collaborative space where researchers and developers can create, share, and utilize models, datasets, and applications. Join us in shaping the future of AI!

---

## What We Offer

### Models
Explore our extensive library of over **1 million models**, updated weekly to ensure the latest in machine learning advancements. From deep learning to multimodal understanding, our models cater to all your AI needs.

### Datasets
Access and contribute to our collection of **250,000+ datasets**. These resources cover various domains, including text, audio, and computer vision tasks, allowing you to train models more effectively.

### Spaces
Discover and interact with **400,000+ applications** in our community. Whether it's generating images or text, our Spaces provide a platform for users to collaborate and innovate.

---

## Company Culture

At Hugging Face, we foster a vibrant and inclusive community where creativity and collaboration thrive. Our culture emphasizes open-source principles, encouraging transparency and knowledge sharing among our users. We're passionate about making machine learning accessible to everyone, from hobbyists to enterprises.

---

## Our Customers

More than **50,000 organizations** trust Hugging Face as their AI partner, including:

- **Meta**
- **Amazon Web Services**
- **Google**
- **Microsoft**
- **Grammarly**

We are committed to providing enterprise-grade solutions tailored to the diverse needs of these organizations, ensuring secure and efficient access to machine learning tools.

---

## Join Us - Careers at Hugging Face

We are constantly looking for talented individuals to join our team. Whether you are a developer, researcher, or business professional, there are opportunities for you at Hugging Face. Our team is dedicated to pushing boundaries in AI while cultivating a supportive work environment.

### Current Open Positions:

- Machine Learning Engineer
- Data Scientist
- Software Developer
- Community Manager

Explore our [Careers Page](https://huggingface.co/jobs) for more details and to apply.

---

## Get Started Today!

Ready to dive into the world of machine learning? Whether you're looking to leverage our models for your projects, explore our datasets, or join our team, **Hugging Face** is here to support you every step of the way. 

### Sign Up Now!

Visit our website to get started:
[Hugging Face - The AI community building the future.](https://huggingface.co)

--- 

For more information, don't hesitate to reach out! We’re excited to see what you can create with Hugging Face!

## Finally - a minor improvement

With a small adjustment, we can stream the results back from OpenAI, with the familiar typewriter animation.

In [24]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
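        response += chunk.choices[0].delta.content or ''
        # Strip only the code fences the model sometimes wraps around its answer,
        # rather than every occurrence of the word "markdown"
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)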
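To see the accumulate-and-redisplay pattern without calling the API, here is a small sketch that feeds a fake stream of text pieces through the same kind of loop. The generator `accumulate` and the sample pieces are made up for illustration. Note that this variant strips only the code fence markers, which is a slightly narrower cleanup than removing the word "markdown" everywhere.

```python
FENCE = "`" * 3  # three backticks, built this way so the example is easy to embed

def accumulate(deltas):
    """Yield the cleaned text-so-far after each streamed piece."""
    response = ""
    for delta in deltas:
        response += delta or ""  # a piece may be None, so fall back to ""
        yield response.replace(FENCE + "markdown", "").replace(FENCE, "")

# A fake stream: the opening fence arrives split across two pieces
pieces = [FENCE, "markdown\n# Hug", "ging Face", None, "\n" + FENCE]
snapshots = list(accumulate(pieces))
print(repr(snapshots[-1]))
```

Because the cleanup runs on the whole accumulated text each time, it still works when a fence is split across pieces.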

In [25]:
stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'company LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}



# Hugging Face Brochure

## Welcome to Hugging Face
### The AI Community Building the Future

At Hugging Face, we are pioneering the development of cutting-edge AI technology and fostering a community where machine learning enthusiasts, researchers, and enterprises can collaborate. Our platform empowers users to access a vast array of **models**, **datasets**, and **applications**, transforming the landscape of machine learning.

## What We Offer

- **Models**: Access over **1 million models** for a variety of tasks including natural language processing, computer vision, and audio analysis. Our platform showcases trending models like *DeepSeek-R1* and *Janus-Pro-7B*.
  
- **Datasets**: Browse **250,000+ datasets** geared towards enhancing machine learning capabilities. Engage with popular datasets such as *OpenThoughts* and *Dolphin-R1*.

- **Spaces**: Create and explore interactive applications. Currently hosting 400,000+ applications, enabling users to build seamlessly in a collaborative environment.

- **Enterprise Solutions**: Our enterprise service provides high-grade security, dedicated support, and optimized solutions for organizations looking to implement AI effectively. Trusted by over **50,000 organizations** including industry leaders such as Google, Microsoft, and AWS.

## Company Culture
At Hugging Face, we foster a **collaborative, innovative, and community-driven culture**. We believe that diversity of thought and experience drives excellence in AI. Our team is passionate about creating an inclusive environment where everyone can express their ideas and creativity. Emphasizing openness, we encourage contributions from the community, promoting the continuous learning and sharing of knowledge within the field.

## Careers at Hugging Face
We are always on the lookout for talented individuals who are enthusiastic about artificial intelligence and machine learning. Whether you are a developer, researcher, or a specialist in data science, there are numerous opportunities to join our team and work on pioneering projects.

- **Open Positions**: Explore our **Jobs** page for the latest openings and become part of a team that is making a profound impact in the AI community.

## Join Us
### Collaborate, Create, and Innovate
- **Sign Up**: Join the Hugging Face community to access tools, models, and resources to advance your AI projects.
- **Connect with Us**: Follow us on [Twitter](https://twitter.com/huggingface), [LinkedIn](https://www.linkedin.com/company/huggingface/), and [GitHub](https://github.com/huggingface) to stay updated with our latest developments and contributions.

**Hugging Face** is dedicated to building the foundations of machine learning with the community. Together, we can shape the future of AI.



In [26]:
# Try changing the system prompt to the humorous version when you make the Brochure for Hugging Face:

stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}]}


# Hugging Face Brochure

## Welcome to Hugging Face
**The AI community building the future.**

At Hugging Face, we are proud to be at the forefront of machine learning innovation. We offer a collaborative platform where the machine learning community can create, discover, and contribute to a vast array of models, datasets, and applications. With over **1 million models** and **250,000 datasets**, our platform serves as the home for researchers, developers, and AI enthusiasts alike.

---

## Company Overview
Hugging Face is more than just a technology company; it’s a thriving community. We believe in the power of open-source collaboration, which is why we have created tools and resources that everyone can access. Our mission is to democratize AI, making it available and effective for everyone.

---

## Culture at Hugging Face
- **Collaboration:** Our platform thrives on community contributions, encouraging collaboration through shared models and datasets.
- **Innovation:** We are committed to pushing the boundaries of what's possible in AI and machine learning.
- **Supportive Environment:** Hugging Face fosters a culture where all team members are encouraged to share ideas, grow, and enhance their skills.

---

## Our Customers
We cater to a wide range of clients—from startups to Fortune 500 companies. Over **50,000 organizations** leverage Hugging Face for their AI needs, including industry giants such as:
- Google
- Microsoft
- Amazon Web Services
- Meta
- Grammarly

Our enterprise solutions equip these organizations with cutting-edge tools for efficient machine learning implementation.

---

## Career Opportunities
Join our dynamic team at Hugging Face! We offer exciting career opportunities in various fields, including:
- Machine Learning Engineering
- Data Science
- Software Development
- Community Engagement

At Hugging Face, we value diversity and strive to create an environment where everyone feels welcomed and empowered to contribute.

---

## Why Choose Hugging Face?
- **Open Source Focus:** Contribute to or utilize state-of-the-art tools such as Transformers, Diffusers, and more.
- **Robust Training & Resources:** Access a wealth of documentation, tutorials, and community forums to enhance your knowledge and skills.
- **Enterprise Solutions:** Take advantage of our Compute and specialized enterprise-grade security features to ensure your projects are backed by the best technology.

---

For more information or to explore opportunities with us, visit our [website](https://huggingface.co).

**Join the Hugging Face community and help us build the future of AI!**