# A full business solution

## Now we will take our project from Day 1 to the next level

### BUSINESS CHALLENGE:

Create a product that builds a Brochure for a company to be used for prospective clients, investors and potential recruits.

We will be provided a company name and their primary website.

See the end of this notebook for examples of real-world business applications.

And remember: I'm always available if you have problems or ideas! Please do reach out.

In [1]:
# imports
# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [2]:
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [3]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [4]:
ed = Website("https://edwarddonner.com")
ed.links

['https://edwarddonner.com/',
 'https://edwarddonner.com/connect-four/',
 'https://edwarddonner.com/outsmart/',
 'https://edwarddonner.com/about-me-and-about-nebula/',
 'https://edwarddonner.com/posts/',
 'https://edwarddonner.com/',
 'https://news.ycombinator.com',
 'https://nebula.io/?utm_source=ed&utm_medium=referral',
 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html',
 'https://patents.google.com/patent/US20210049536A1/',
 'https://www.linkedin.com/in/eddonner/',
 'https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/',
 'https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/',
 'https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/',
 'https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/',
 'https://edwarddonner.com/2025/05/18/2025-ai-executive-briefing/',
 '

## First step: Have GPT-4o-mini figure out which links are relevant

### Use a call to gpt-4o-mini to read the links on a webpage, and respond in structured JSON.  
It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".  
We will use "one-shot prompting", in which we provide an example of how it should respond in the prompt.

This is an excellent use case for an LLM, because it requires nuanced understanding. Imagine trying to code this without LLMs by parsing and analyzing the webpage - it would be very hard!

Sidenote: there is a more advanced technique called "Structured Outputs" in which we require the model to respond according to a spec. We cover this technique in Week 8 during our autonomous Agentic AI project.
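
To make "replacing relative links" concrete, here is a minimal sketch of the same resolution done deterministically with the standard library's `urljoin`. The helper name `absolutize` is an assumption for illustration and is not part of the notebook, which leaves this step to the model.

```python
from urllib.parse import urljoin


def absolutize(base_url, links):
    """Resolve each link against base_url; links that are already absolute pass through unchanged."""
    return [urljoin(base_url, link) for link in links]


# Hypothetical sample input, mixing relative and absolute links
print(absolutize("https://company.com", ["/about", "https://other.example/jobs"]))
```

This only covers the mechanical half of the task; deciding which links are relevant to a brochure is still the part that benefits from an LLM.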

In [5]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [6]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [7]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [8]:
print(get_links_user_prompt(ed))

Here is the list of links on the website of https://edwarddonner.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
https://edwarddonner.com/
https://edwarddonner.com/connect-four/
https://edwarddonner.com/outsmart/
https://edwarddonner.com/about-me-and-about-nebula/
https://edwarddonner.com/posts/
https://edwarddonner.com/
https://news.ycombinator.com
https://nebula.io/?utm_source=ed&utm_medium=referral
https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html
https://patents.google.com/patent/US20210049536A1/
https://www.linkedin.com/in/eddonner/
https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/
https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/
https://edward

In [9]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
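
`get_links` trusts the model to return exactly the shape shown in the one-shot example. As a hedged sketch, the reply could be checked before it is used; `parse_links` is a hypothetical helper, and the rule of keeping only `http(s)` URLs is an assumption rather than something the notebook requires.

```python
import json


def parse_links(raw):
    """Parse the model's JSON reply and keep only well-formed link entries."""
    empty = {"links": []}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict) or not isinstance(data.get("links"), list):
        return empty
    valid = [
        {"type": entry["type"], "url": entry["url"]}
        for entry in data["links"]
        if isinstance(entry, dict)
        and isinstance(entry.get("type"), str)
        and isinstance(entry.get("url"), str)
        # Assumption: anything that is not a full http(s) URL is dropped
        and entry["url"].startswith(("http://", "https://"))
    ]
    return {"links": valid}
```

With this in place, a reply that is not valid JSON, or that still contains a relative link, degrades to a shorter list instead of raising later when the page is fetched.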

In [10]:
# Anthropic has made their site harder to scrape, so I'm using HuggingFace..

huggingface = Website("https://huggingface.co")
huggingface.links

['/',
 '/models',
 '/datasets',
 '/spaces',
 '/docs',
 '/enterprise',
 '/pricing',
 '/login',
 '/join',
 '/spaces',
 '/models',
 '/tencent/SRPO',
 '/baidu/ERNIE-4.5-21B-A3B-Thinking',
 '/tencent/HunyuanImage-2.1',
 '/Qwen/Qwen3-Next-80B-A3B-Instruct',
 '/Qwen/Qwen3-Next-80B-A3B-Thinking',
 '/models',
 '/spaces/enzostvs/deepsite',
 '/spaces/zerogpu-aoti/wan2-2-fp8da-aoti-faster',
 '/spaces/multimodalart/wan-2-2-first-last-frame',
 '/spaces/tencent/HunyuanImage-2.1',
 '/spaces/IndexTeam/IndexTTS-2-Demo',
 '/spaces',
 '/datasets/HuggingFaceFW/finepdfs',
 '/datasets/HuggingFaceM4/FineVision',
 '/datasets/LucasFang/FLUX-Reason-6M',
 '/datasets/fka/awesome-chatgpt-prompts',
 '/datasets/Josephgflowers/Finance-Instruct-500k',
 '/datasets',
 '/join',
 '/pricing#endpoints',
 '/pricing#spaces',
 '/pricing',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/allenai',
 '/facebook',
 '/amazon',
 '/google',
 '/Intel',
 '/microsoft',
 '/

In [11]:
get_links("https://huggingface.co")

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/about'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'company page', 'url': 'https://huggingface.co/enterprise'},
  {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'},
  {'type': 'blog page', 'url': 'https://huggingface.co/blog'},
  {'type': 'community page', 'url': 'https://discuss.huggingface.co'},
  {'type': 'GitHub page', 'url': 'https://github.com/huggingface'},
  {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'},
  {'type': 'LinkedIn page',
   'url': 'https://www.linkedin.com/company/huggingface/'}]}
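
The scraped link list for huggingface.co repeats entries such as `/enterprise` and `/models` several times, and every repeat is sent to the model as part of the prompt. A minimal sketch of removing repeats while keeping the original order follows; `unique_links` is a hypothetical helper, not something the notebook defines.

```python
def unique_links(links):
    """Drop repeated links while keeping first-seen order."""
    # dict preserves insertion order, so the first occurrence of each link wins
    return list(dict.fromkeys(links))


sample = ["/", "/models", "/enterprise", "/models", "/enterprise", "/pricing"]
print(unique_links(sample))  # ['/', '/models', '/enterprise', '/pricing']
```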

## Second step: make the brochure!

Assemble all the details into another prompt to GPT-4o-mini.

In [12]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result
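
`get_all_details` fetches every URL the model returns, so one unreachable or malformed link would stop the whole run. The sketch below shows one way to skip failures instead. It assumes a hypothetical `fetch` callable that takes a URL and returns page text; in this notebook that could be `lambda u: Website(u).get_contents()`.

```python
def collect_details(links, fetch):
    """Concatenate page contents for each link, skipping pages that fail to load.

    `fetch` is a hypothetical callable: it takes a URL and returns the page text.
    """
    parts = []
    for link in links.get("links", []):
        url = link.get("url")
        if not url:
            continue  # entry without a URL: nothing to fetch
        try:
            contents = fetch(url)
        except Exception as exc:  # broad on purpose: one bad page should not abort the run
            print(f"Skipping {url}: {exc}")
            continue
        parts.append(f"\n\n{link.get('type', 'page')}\n{contents}")
    return "".join(parts)
```

Passing the fetcher in as an argument also makes the function easy to try out with a stub, without touching the network.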

In [13]:
print(get_all_details("https://huggingface.co"))

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}]}
Landing page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Community
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ models
Trending on
this week
Models
tencent/SRPO
Updated
1 day ago
•
3.61k
•
756
baidu/ERNIE-4.5-21B-A3B-Thinking
Updated
5 days ago
•
112k
•
706
tencent/HunyuanImage-2.1
Updated
4 days ago
•
5.59k
•
600
Qwen/Qwen3-Next-80B-A3B-Instruct
Updated
1 day ago
•
305k
•
556
Qwen/Qwen3-Next-80B-A3B-Thinking
Updated
1 day ago
•
160k
•
337
Browse 1M+ m

In [14]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."


In [15]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
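
The `[:5_000]` slice keeps only the start of the prompt, so when the landing page is long, the pages appended after it (about, careers) may be cut off entirely. One alternative, sketched here under the assumption that the pages are available as separate strings, is to give each page an equal share of the budget. `truncate_sections` and its `total_budget` parameter are hypothetical names.

```python
def truncate_sections(sections, total_budget=5_000):
    """Trim each section to an equal share of the character budget, then join them."""
    if not sections:
        return ""
    share = total_budget // len(sections)
    return "".join(text[:share] for text in sections)


# Two 10-character pages with a budget of 6: each keeps its first 3 characters
print(truncate_sections(["a" * 10, "b" * 10], total_budget=6))  # aaabbb
```

An equal split is the simplest choice; weighting the landing page more heavily would be just as reasonable.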

In [16]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


'You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nCommunity\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\ntencent/SRPO\nUpdated\n1 day ago\n•\n3.61k\n•\n756\nbaidu/ERNIE-4.5-21B-A3B-Thinking\nUpdated\n5 days ago\n•\n112k\n•\n706\ntencent/HunyuanImage-2.1\nUpdated\n4 days ago\n•\n5.59k\n•\n600\nQwen/Qwen3-Next-80B-A3B-Instruct\nUpdated\n1 day ago\n•\n305k\n•\n556\nQwen/Qwen3-Next-80B-A3B-Thinking\nUpdated\n1 day ago\n•\n160k\n•\n337\nBrowse 1M+ models\nSpaces\nRunning\n13.5k\n13.5k\nDeepSite v2

In [17]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [18]:
create_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'homepage', 'url': 'https://huggingface.co'}, {'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog', 'url': 'https://huggingface.co/blog'}, {'type': 'community forum', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Brochure

## About Us
**Hugging Face** is at the forefront of the AI revolution, dedicated to building a collaborative community that shapes the future of machine learning. Our platform unites developers, researchers, and organizations to create, share, and improve upon models, datasets, and applications.

## Our Offerings
- **Models & Datasets:** Explore over **1 million models** and **250,000 datasets** to discover state-of-the-art tools in AI.
- **Spaces:** Run **400,000+ AI applications** in an optimized environment.
- **Enterprise Solutions:** Tailored services with enterprise-grade security and dedicated support starting at $20/user/month.

## Community
With a strong commitment to open-source collaboration, Hugging Face powers an ecosystem embraced by over **50,000 organizations**, including industry giants like Amazon, Google, and Microsoft. Our community-driven approach ensures that everyone, from independent developers to large enterprises, can contribute and benefit from state-of-the-art machine learning resources.

## Careers
Join our dynamic team! Hugging Face prioritizes innovation, creativity, and diversity, fostering a workplace culture where everyone can thrive. We offer various positions compatible with different career paths in AI and tech, encouraging candidates who are passionate about making a difference in the AI community.

## Company Culture
At Hugging Face, we pride ourselves on our inclusive and collaborative culture. We believe in:
- **Innovation:** Encouraging creative solutions in machine learning.
- **Community Engagement:** Building an open environment where sharing knowledge is critical.
- **Work-Life Balance:** Supporting our team in maintaining a healthy work-life integration.

## Get Involved
Explore our platform to collaborate on models and applications, contribute to datasets, or simply engage with the community. If you're interested in joining our team or learning more about our services, visit our website or reach out to us.

**Discover the future of AI with Hugging Face—where collaboration leads to innovation!** 

[Visit Us](https://huggingface.co)

---

© 2023 Hugging Face. All rights reserved.

## Finally - a minor improvement

With a small adjustment, we can change this so that the results stream back from OpenAI with the familiar typewriter animation.

In [19]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        response = response.replace("```","").replace("markdown", "")
        update_display(Markdown(response), display_id=display_handle.display_id)
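
The loop in `stream_brochure` rebuilds the full text on every chunk and removes every occurrence of the word "markdown", including any that belong to the brochure itself. As a sketch of a narrower cleanup, the helpers below strip only a wrapping code fence. Both `strip_code_fence` and `accumulate` are hypothetical names, and the list of string pieces stands in for the `delta.content` values of a real stream.

```python
FENCE = "`" * 3  # three backticks, built this way so the example is easy to embed


def strip_code_fence(text):
    """Remove a wrapping code fence (optionally labelled 'markdown') if one is present."""
    stripped = text.strip()
    for opener in (FENCE + "markdown", FENCE):
        if stripped.startswith(opener):
            stripped = stripped[len(opener):]
            break
    if stripped.endswith(FENCE):
        stripped = stripped[: -len(FENCE)]
    return stripped.strip()


def accumulate(pieces):
    """Yield the cleaned running text after each streamed piece; None pieces are ignored."""
    response = ""
    for piece in pieces:
        response += piece or ""
        yield strip_code_fence(response)


# Simulated stream: the fence label arrives split across two pieces
pieces = [FENCE + "mark", "down\n# Title\n", None, "We render markdown well.\n" + FENCE]
for partial in accumulate(pieces):
    pass
print(partial)
```

While a fence label is still arriving, an intermediate frame can briefly show a fragment such as `mark`; the final text is clean and keeps the word "markdown" where it is part of the content.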

In [20]:
stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community discussion page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Company Brochure

## **About Us**
Welcome to Hugging Face, the premier AI community dedicated to building the future of artificial intelligence. We provide an open-source platform that facilitates collaboration among machine learning enthusiasts, researchers, and professionals across the globe.

---

## **What We Offer**

- **Models**: Explore over **1 million** machine learning models, including cutting-edge solutions for various needs such as natural language processing, image generation, and more.
  
- **Datasets**: Access an extensive library of **250,000+ datasets** for every machine learning task imaginable. Collaborate with the community to create and fine-tune datasets that drive innovation.

- **Spaces**: Build and showcase **applications** using Hugging Face’s robust infrastructure. Utilize our cloud-based environment for infinite possibilities in AI.

- **Enterprise Solutions**: Our business offerings start at **$20/user/month**, tailored for teams needing enterprise-grade support, security, and access to optimized compute resources.

---

## **Our Community**
Join a vibrant community of **over 50,000 organizations**, including leading names like Google, Microsoft, and Amazon. Collaborate, share knowledge, and contribute to a rich ecosystem that is shaping the future of machine learning.

---

## **Company Culture**
At Hugging Face, we uphold values that prioritize collaboration, openness, and innovation. Our culture fosters creativity and encourages our members to share their work and ideas freely. With an emphasis on teamwork and community engagement, we empower individuals to contribute to the collective growth of AI tools and solutions.

---

## **Careers at Hugging Face**
We are always on the lookout for passionate, talented individuals to join our team. Whether you’re a machine learning engineer, community manager, or developer, we offer opportunities for you to grow and make a significant impact. Visit our careers page to explore current job openings and become a part of our mission to advance AI by fostering a collaborative environment.

---

## **Join Us**
Ready to be part of an exciting journey? Sign up today at [Hugging Face](https://huggingface.co) to dive into the world of collaboration and innovation in AI. Whether you're here to build, learn, or share, the future of AI awaits you!

---

**Contact Us**:  
[Twitter](https://twitter.com/huggingface) | [LinkedIn](https://linkedin.com/company/huggingface) | [Discord](https://discord.com/invite/huggingface)  
**Follow our progress, engage in discussions, and unlock the potential of AI with us!**

---

*Hugging Face – The AI community building the future.*

In [21]:
# Try changing the system prompt to the humorous version when you make the Brochure for Hugging Face:

stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Brochure

## Welcome to Hugging Face

### The AI Community Building the Future

Hugging Face is at the forefront of the artificial intelligence (AI) revolution. We provide a collaborative platform where the machine learning community can create, discover, and share cutting-edge models, datasets, and applications.

### Our Offerings
- **Models**: Explore over **1 million** models created by our vibrant community, including state-of-the-art architectures for various AI tasks.
- **Datasets**: Access **250,000+ datasets** to power your AI projects.
- **Spaces**: Engage with **400,000+ applications** built on our community-driven platform.
- **Enterprise Solutions**: Tailored compute and support solutions for organizations looking to scale their AI efforts.

### Notable Customers
More than **50,000 organizations** globally trust Hugging Face, including:

- **Amazon**
- **Google**
- **Microsoft**
- **Meta**
- **Intel**

### Technology Highlights
Our platform leverages an open-source stack designed for collaboration, including:
- **Transformers**: A library hosting hundreds of thousands of models for PyTorch.
- **Diffusers**: Advanced diffusion models tailored for various ML tasks.
- **Tokenizers**: Fast and efficient tokenization solutions.

Explore our tools and build your ML portfolio with ease.

### Company Culture
At Hugging Face, we foster a culture of collaboration, transparency, and community engagement. Our team is diverse and passionate about machine learning, encouraging creativity and innovation. We believe in the power of sharing knowledge and supporting one another to drive the AI landscape forward.

### Career Opportunities
Join our talented team of innovators. We are always on the lookout for dedicated individuals who want to contribute to the future of AI. If you are passionate about machine learning and technology, check out our careers page for open positions.

### Join Us on Our Journey
Become part of a thriving community dedicated to shaping the future of AI. Sign up today and start collaborating with industry leaders and fellow enthusiasts. 

### Connect with Us
For more information, visit our website at [huggingface.co](https://huggingface.co) or follow us on social media:
- GitHub
- Twitter
- LinkedIn
- Discord

---

Let’s build the future of artificial intelligence together!

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business applications</h2>
            <span style="color:#181;">In this exercise we extended the Day 1 code to make multiple LLM calls, and generate a document.

This is perhaps the first example of Agentic AI design patterns, as we combined multiple calls to LLMs. This will feature more in Week 2, and then we will return to Agentic AI in a big way in Week 8 when we build a fully autonomous Agent solution.

Generating content in this way is one of the most common use cases. As with summarization, this can be applied to any business vertical. Write marketing content, generate a product tutorial from a spec, create personalized email content, and so much more. Explore how you can apply content generation to your business, and try making yourself a proof-of-concept prototype. See what other students have done in the community-contributions folder -- so many valuable projects -- it's wild!</span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">A reminder on 3 useful resources</h2>
            <span style="color:#f71;">1. The resources for the course are available <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">here.</a><br/>
            2. I'm on LinkedIn <a href="https://www.linkedin.com/in/eddonner/">here</a> and I love connecting with people taking the course!<br/>
            3. I'm trying out X/Twitter and I'm at <a href="https://x.com/edwarddonner">@edwarddonner</a> and hoping people will teach me how it's done..  
            </span>
        </td>
    </tr>
</table>