# Day 5 Project: Brochure Builder

Now we will take our project from Day 1 to the next level.

## BUSINESS CHALLENGE:

Create a product that builds a brochure for a company, to be used with prospective clients, investors, and potential recruits.

We will be given a company name and its primary website.

### Imports

In [1]:
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

### OpenAI Setup and Initialization

In [3]:
# Load environment variables from a .env file

load_dotenv(override=True)
# os.getenv returns None instead of raising KeyError, so the checks below can run
api_key = os.getenv('OPENAI_API_KEY')

# Check the key
if not api_key:
    print("No API key found.")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start with sk-proj-.")
elif api_key.strip() != api_key:
    print("An API key was found, but it has leading or trailing whitespace.")
else:
    print("API key found and looks good so far!")


MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key found and looks good so far!
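If you want to reuse these checks elsewhere, or exercise them without a real key, they can be wrapped in a small function. This is a minimal sketch: `describe_api_key` is a hypothetical helper, and the `sk-proj-` prefix is only the heuristic used in the cell above, so a key without it is not necessarily invalid. Checking whitespace before the prefix gives a more specific message for a pasted key with a stray leading space.

```python
import os
from typing import Optional


def describe_api_key(key: Optional[str]) -> str:
    """Return a human-readable verdict on an API key string (heuristic checks only)."""
    if not key:
        return "No API key found."
    if key.strip() != key:
        return "An API key was found, but it has leading or trailing whitespace."
    if not key.startswith("sk-proj-"):
        return "An API key was found, but it doesn't start with sk-proj-."
    return "API key found and looks good so far!"


# Check whatever is currently in the environment (may be None)
print(describe_api_key(os.getenv("OPENAI_API_KEY")))
```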


### Class to Represent a Webpage (with Links)

In [4]:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """A Utility class to represent website that we have scraped, now w links"""

    def __init__(self, url):
        self.url = url
        # A timeout keeps one slow site from hanging the notebook
        response = requests.get(url, headers=headers, timeout=30)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            # Remove elements that carry no readable page text
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        # Collect every href, dropping <a> tags that have none
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"
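Because `Website` fetches pages over the network, its cleaning steps are hard to inspect in isolation. The sketch below applies the same BeautifulSoup steps to an inline HTML string, so you can see which text and links survive. `parse_page` and the sample markup are hypothetical, made up purely for illustration.

```python
from bs4 import BeautifulSoup


def parse_page(html: str):
    """Apply the same cleaning steps as Website.__init__ to an HTML string (no network)."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.string if soup.title else "No title found"
    if soup.body:
        # Remove tags whose contents are not useful page text
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        text = soup.body.get_text(separator="\n", strip=True)
    else:
        text = ""
    # Keep only <a> tags that actually carry an href attribute
    links = [a.get("href") for a in soup.find_all("a")]
    links = [link for link in links if link]
    return title, text, links


sample = """
<html><head><title>Acme Corp</title></head>
<body>
  <script>var tracking = 1;</script>
  <h1>Welcome to Acme</h1>
  <a href="/about">About</a>
  <a href="https://acme.example/careers">Careers</a>
  <a>No href here</a>
</body></html>
"""
title, text, links = parse_page(sample)
print(title)  # Acme Corp
print(links)  # ['/about', 'https://acme.example/careers']
```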

In [5]:
ed = Website("https://edwarddonner.com")
ed.links

['https://edwarddonner.com/',
 'https://edwarddonner.com/connect-four/',
 'https://edwarddonner.com/outsmart/',
 'https://edwarddonner.com/about-me-and-about-nebula/',
 'https://edwarddonner.com/posts/',
 'https://edwarddonner.com/',
 'https://news.ycombinator.com',
 'https://nebula.io/?utm_source=ed&utm_medium=referral',
 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html',
 'https://patents.google.com/patent/US20210049536A1/',
 'https://www.linkedin.com/in/eddonner/',
 'https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/',
 'https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/',
 'https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/',
 'https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/',
 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/',
 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/',
 'ht

## First step: Have GPT-4o-mini figure out which links are relevant

Use a call to gpt-4o-mini to read the links on a webpage and respond in structured JSON. It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about". We will use "one-shot prompting", in which we provide an example of how it should respond in the prompt.

This is an excellent use case for an LLM, because it requires nuanced understanding. Imagine trying to code this without LLMs by parsing and analyzing the webpage - it would be very hard!
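Of the two jobs described above, only judging relevance really needs the model; resolving relative links is mechanical. Here is a sketch using the standard library's `urljoin`, which could clean up the list before it is sent. `absolutize_links` is a hypothetical helper, and stripping `#fragment` parts is an assumption on my part (it merges `/pricing#spaces` with `/pricing`).

```python
from typing import List
from urllib.parse import urldefrag, urljoin


def absolutize_links(base_url: str, links: List[str]) -> List[str]:
    """Resolve relative hrefs against base_url and drop duplicates, preserving order."""
    seen = set()
    result = []
    for href in links:
        # urljoin leaves absolute URLs untouched and resolves "/about" against the base;
        # urldefrag splits off any "#fragment" so anchors on one page collapse together
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


resolved = absolutize_links(
    "https://huggingface.co",
    ["/", "/models", "/pricing#spaces", "/pricing", "https://github.com/huggingface"],
)
print(resolved)
```

The model still has to decide which of the resolved URLs belong in a brochure; this only removes the URL bookkeeping from its job.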

Sidenote: there is a more advanced technique called "Structured Outputs" in which we require the model to respond according to a spec. We cover this technique in Week 8 during our autonomous Agentic AI project.

### Prompts - System & User

#### Build System Prompts

In [6]:
link_system_prompt = "You are provided with a list of links found on a webpage. " \
"You are able to decide which of the links would be most relevant to include in a brochure about the company, " \
"such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [7]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
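Python joins adjacent string literals with nothing in between, so every fragment in a prompt like this needs its own trailing space. An alternative sketch builds the one-shot example from a Python object with `json.dumps`, which guarantees the embedded example is valid JSON. `link_system_prompt_alt` is a hypothetical name chosen so it doesn't overwrite the prompt above.

```python
import json

# The one-shot example as a Python object, so it is guaranteed to serialize to valid JSON
example = {
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"},
    ]
}

link_system_prompt_alt = (
    "You are provided with a list of links found on a webpage. "
    "You are able to decide which of the links would be most relevant to include in a brochure about the company, "
    "such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
    "You should respond in JSON as in this example:\n"
    + json.dumps(example, indent=4)
)
print(link_system_prompt_alt)
```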



#### Build User Prompts

In [10]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company. "
    user_prompt += "Respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links. \n"
    user_prompt += "Links (some might be relative links): \n"
    user_prompt += "\n".join(website.links)
    return user_prompt
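This user prompt sends every link, duplicates included (several post URLs appear twice in the list above). A cheap pre-filter could shorten the prompt before the model sees it. The sketch below is only that: `prefilter_links` and both skip lists are hypothetical heuristics, not part of the original project, and would need tuning per site.

```python
from typing import List

# Markers of links a brochure usually doesn't need (assumed heuristics; tune per site)
SKIP_PREFIXES = ("mailto:", "tel:", "javascript:")
SKIP_WORDS = ("privacy", "terms", "cookie", "login", "signin")


def prefilter_links(links: List[str]) -> List[str]:
    """Drop obviously irrelevant links and exact duplicates before prompting the model."""
    kept = []
    for link in links:
        lowered = link.lower()
        if lowered.startswith(SKIP_PREFIXES):  # str.startswith accepts a tuple
            continue
        if any(word in lowered for word in SKIP_WORDS):
            continue
        if link not in kept:
            kept.append(link)
    return kept


filtered = prefilter_links(["/", "/login", "mailto:hi@example.com", "/privacy", "/about", "/about"])
print(filtered)  # ['/', '/about']
```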

In [11]:
print(get_links_user_prompt(ed))

Here is the list of links on the website of https://edwarddonner.com - please decide which of these are relevant web links for a brochure about the company. Respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links. 
Links (some might be relative links): 
https://edwarddonner.com/
https://edwarddonner.com/connect-four/
https://edwarddonner.com/outsmart/
https://edwarddonner.com/about-me-and-about-nebula/
https://edwarddonner.com/posts/
https://edwarddonner.com/
https://news.ycombinator.com
https://nebula.io/?utm_source=ed&utm_medium=referral
https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html
https://patents.google.com/patent/US20210049536A1/
https://www.linkedin.com/in/eddonner/
https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/
https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/
https://edwarddonner.com/2024/12/2

### Function to build call to OpenAI

In [12]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {
                "role": "system",
                "content": link_system_prompt
            },
            {
                "role": "user",
                "content": get_links_user_prompt(website)
            }
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
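JSON mode makes the reply parse as JSON, but it does not guarantee the `links`/`type`/`url` shape the rest of the code expects. A defensive sketch of a parser follows; `parse_links_response` is hypothetical, and requiring `https://` simply mirrors the prompt's request for full https URLs. Note that it returns a plain list rather than the `{"links": [...]}` dict that `get_links` returns, so calling code would iterate over the list directly.

```python
import json
from typing import List


def parse_links_response(raw: str) -> List[dict]:
    """Parse the model's JSON reply, keeping only well-formed {"type", "url"} entries."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # treat unparseable output as "no links" instead of crashing
    links = data.get("links") if isinstance(data, dict) else None
    if not isinstance(links, list):
        return []
    valid = []
    for item in links:
        if (
            isinstance(item, dict)
            and isinstance(item.get("type"), str)
            and isinstance(item.get("url"), str)
            and item["url"].startswith("https://")
        ):
            valid.append({"type": item["type"], "url": item["url"]})
    return valid


raw = (
    '{"links": [{"type": "about page", "url": "https://huggingface.co/huggingface"},'
    ' {"type": "contact page", "url": "/contact"}, "not-an-object"]}'
)
print(parse_links_response(raw))
```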

In [13]:
huggingface = Website("https://huggingface.co")
huggingface.links

['/',
 '/models',
 '/datasets',
 '/spaces',
 '/posts',
 '/docs',
 '/enterprise',
 '/pricing',
 '/login',
 '/join',
 '/spaces',
 '/models',
 '/agentica-org/DeepCoder-14B-Preview',
 '/HiDream-ai/HiDream-I1-Full',
 '/moonshotai/Kimi-VL-A3B-Thinking',
 '/meta-llama/Llama-4-Scout-17B-16E-Instruct',
 '/deepseek-ai/DeepSeek-V3-0324',
 '/models',
 '/spaces/enzostvs/deepsite',
 '/spaces/jamesliu1217/EasyControl_Ghibli',
 '/spaces/bytedance-research/UNO-FLUX',
 '/spaces/Efficient-Large-Model/SanaSprint',
 '/spaces/HiDream-ai/HiDream-I1-Dev',
 '/spaces',
 '/datasets/nvidia/OpenCodeReasoning',
 '/datasets/openai/mrcr',
 '/datasets/agentica-org/DeepCoder-Preview-Dataset',
 '/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset',
 '/datasets/divaroffical/real_estate_ads',
 '/datasets',
 '/join',
 '/pricing#endpoints',
 '/pricing#spaces',
 '/pricing',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/allenai',
 '/facebook',
 '/amazon',


In [14]:
get_links("https://huggingface.co")

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'},
  {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'},
  {'type': 'blog page', 'url': 'https://huggingface.co/blog'},
  {'type': 'community forum', 'url': 'https://discuss.huggingface.co'},
  {'type': 'GitHub page', 'url': 'https://github.com/huggingface'},
  {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'},
  {'type': 'LinkedIn page',
   'url': 'https://www.linkedin.com/company/huggingface/'}]}

## Second Step: Make the Brochure

Assemble all the details into another prompt for GPT-4o-mini.

### Function to Get Contents from All Links

In [None]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    # get relevant links from openai
    links = get_links(url)
    print("Found links:", links)
    # get contents from each link from relevant links
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result
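As written, a single unreachable page (a timeout or connection error raised by `requests`) aborts the whole brochure. The sketch below skips failed pages instead, and takes the fetching function as a parameter so the logic can be tried offline. `gather_details` is hypothetical; it takes the list inside `links["links"]` rather than the whole dict, and in the notebook `fetch` could be `lambda u: Website(u).get_contents()`.

```python
from typing import Callable, List


def gather_details(landing_url: str, links: List[dict], fetch: Callable[[str], str]) -> str:
    """Concatenate page contents, skipping any linked page that fails to load."""
    result = "Landing page:\n" + fetch(landing_url)
    for link in links:
        try:
            contents = fetch(link["url"])
        except Exception as exc:  # e.g. timeouts or connection errors from requests
            print(f"Skipping {link['url']}: {exc!r}")
            continue
        result += f"\n\n{link['type']}\n{contents}"
    return result


# Offline demo: a dict stands in for the web; unknown URLs raise KeyError like a failed request
pages = {"https://a.example": "Home text", "https://a.example/about": "About text"}
details = gather_details(
    "https://a.example",
    [
        {"type": "about page", "url": "https://a.example/about"},
        {"type": "careers page", "url": "https://a.example/missing"},
    ],
    lambda url: pages[url],
)
print(details)
```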

In [16]:
hug_url = "https://huggingface.co"

In [17]:
print(get_all_details(hug_url))

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'models page', 'url': 'https://huggingface.co/models'}, {'type': 'datasets page', 'url': 'https://huggingface.co/datasets'}, {'type': 'spaces page', 'url': 'https://huggingface.co/spaces'}, {'type': 'contact page', 'url': 'https://huggingface.co/join'}]}
Landing page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ models
Trending

### Prompts - Brochure

#### Build System Prompts

In [18]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
 and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown.\
 Include details of company culture, customers and careers/jobs if you have the information."

#### Build User Prompts

In [19]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages;"
    user_prompt += "use this information to build a short brochure of the company in markdown. \n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt

In [20]:
get_brochure_user_prompt("HuggingFace", hug_url)

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


'You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages;use this information to build a short brochure of the company in markdown. \nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nPosts\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\nagentica-org/DeepCoder-14B-Preview\nUpdated\n6 days ago\n•\n12.7k\n•\n521\nHiDream-ai/HiDream-I1-Full\nUpdated\n2 days ago\n•\n16k\n•\n437\nmoonshotai/Kimi-VL-A3B-Thinking\nUpdated\n1 day ago\n•\n10.2k\n•\n303\nmeta-llama/Llama-4-Scout-17B-16E-Instruct\nUpdated\n6 days ago\n•\n657k\n•\n779\ndeepseek-ai/DeepSeek-V3-0324\nUpdated\n20 days ago\n•\n221k\n•\n2.6k\nBrowse 1M+ models\nSpaces\nRunning
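The scraped details run long, so the cap on prompt length matters. Slicing must be assigned back to the variable: appending the slice with `+=` would add a second copy of the first 5,000 characters instead of truncating. The sketch below takes already-gathered details as a parameter so the truncation can be checked without network access; `build_brochure_prompt` and its `limit` parameter are hypothetical.

```python
def build_brochure_prompt(company_name: str, details: str, limit: int = 5_000) -> str:
    """Assemble the brochure user prompt and cap it at `limit` characters."""
    prompt = f"You are looking at a company called: {company_name}\n"
    prompt += "Here are the contents of its landing page and other relevant pages; "
    prompt += "use this information to build a short brochure of the company in markdown.\n"
    prompt += details
    # Assigning the slice shortens the string; appending it would grow it instead
    return prompt[:limit]


demo = build_brochure_prompt("Acme", "x" * 10_000)
print(len(demo))  # 5000
```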

### Function to Build Call to OpenAI for Brochure

In [21]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {
                "role": "system",
                "content": system_prompt
            },
            {
                "role": "user",
                "content": get_brochure_user_prompt(company_name, url)
            }
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [22]:
create_brochure("HuggingFace", hug_url)

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog', 'url': 'https://huggingface.co/blog'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}]}


```markdown
# 📢 Welcome to Hugging Face! 🤗

## The AI Community Building the Future!

Are you ready to hug robots, models, and datasets? At Hugging Face, we’re crafting the future of AI with more excitement than a cat in a laser pointer factory! 🎉

---

### 🛠️ What's in the Box?
- **1M+ Models:** For every task you can think of (and some you can’t)!
- **250k+ Datasets:** Because what's AI without good data? 
- **Innovative Spaces:** Where creativity meets computational power!

### 🚀 Trending This Week:
- **DeepCoder-14B:** Matched with just a hint of sass. 
- **Kimi-VL-A3B-Thinking:** For when you need your model to ponder life's mysteries...
- **Llama-4-Scout:** Not just any llama, but the **Llama 4 Scout** 🌟

---

### 🎉 Join the Community!
Don’t just dip a toe in the AI pool, cannonball into the Hugging Face community! We've got **entries, exits, and plenty of debates about whether pineapple belongs on pizza**... and of course, all things machine learning. 

#### 🌍 Who's Using Us?
From **Meta** to **Amazon**, over **50,000 organizations** have hopped on our hug train. Don’t miss your chance to join the ranks of industry giants - now that's a cozy hug! 

---

### 🚪 Careers at Hugging Face
If you're passionate about demystifying AI and making it accessible (and a splash of fun), we want you! 🌈

#### Open Positions:
- **Machine Learning Engineers**
- **Data Scientists**
- **Robotics Enthusiasts** (yes, we’re selling open-source robots now 🤖)

**Why work for us?** Because who wouldn’t love getting paid to talk to robots and machine learning models all day? 💼 

---

### 🤔 Company Culture
At Hugging Face, we   
- **Share More than Just Hugs:** Collaboration is our middle name – well, it’s actually *AI*, but you get the picture!
- **Savor Snacks:** Coffee may fuel our creativity, but snacks are where the real power lies. 
- **Foster Innovation:** Get comfortable with failure; it’s just the foundation of future success!

---

### 🤩 Explore More!
Check out our 🤗 [Blog](https://huggingface.co/blog) to learn how AI can take things to new heights. Whether it’s ethics, novel applications, or just good ol’ community fun, we’ve got posts that will tickle your brain.

### 🎈 Ready for a Hug?
**Sign Up Now** and join the Hugging Face adventure! Whether you're a user, developer, or a future colleague, let's make AI open and friendly together. 🌍💕

---

# Let’s go make some magic in AI: one hug at a time!
```
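The reply above arrived wrapped in a ```` ```markdown ```` fence, which is why it appears as raw markdown in a code block rather than as a formatted brochure. A sketch of a helper that removes one outer fence and leaves everything else, including the word "markdown" inside the text, untouched. `strip_markdown_fence` is hypothetical; in `create_brochure` it could be used as `display(Markdown(strip_markdown_fence(result)))`.

```python
import re

# Matches a whole reply wrapped in ```markdown ... ``` (or ```md / a bare ``` fence)
_FENCE = re.compile(r"^\s*```(?:markdown|md)?\s*\n(.*?)\n?```\s*$", re.DOTALL)


def strip_markdown_fence(text: str) -> str:
    """Remove one outer code fence around the model's reply, leaving inner text intact."""
    match = _FENCE.match(text)
    return match.group(1) if match else text


print(strip_markdown_fence("```markdown\n# Hi\nWritten in markdown.\n```"))
```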


## Finally - a minor improvement

With a small adjustment, we can make the results stream back from OpenAI with the familiar typewriter animation.

In [None]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {
                "role": "system",
                "content": system_prompt
            },
            {
                "role": "user",
                "content": get_brochure_user_prompt(company_name, url)
            }
        ],
        stream=True
    )

    # To stream text only without markdown, use this instead of everything active below this:
    # for chunk in stream:
    #     print(chunk.choices[0].delta.content or '', end='')

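    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        # Strip only the code fence the model may wrap around its reply;
        # replacing "markdown" everywhere would also delete the word from the brochure
        shown = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(shown), display_id=display_handle.display_id)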
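The streaming loop relies on two details that are easy to check offline. First, a chunk's `delta.content` can be `None` (the final chunk typically carries no text), which the `or ''` guard handles. Second, the opening fence may be split across chunks, so fence removal has to run on the accumulated text rather than on each chunk. The sketch below fakes the chunk shape with `SimpleNamespace`; `fake_stream` and `accumulate` are hypothetical stand-ins, not OpenAI SDK objects.

```python
from types import SimpleNamespace


def fake_stream(pieces):
    """Yield objects shaped like streamed chat-completion chunks (assumed shape, for offline testing)."""
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def accumulate(stream) -> str:
    """Join streamed deltas, treating None content as empty."""
    response = ""
    for chunk in stream:
        response += chunk.choices[0].delta.content or ""
    return response


# The opening fence is split across chunks, and the last chunk has no content
raw = accumulate(fake_stream(["``", "`mark", "down\n# Acme\n", "Built with markdown.\n", "```", None]))
cleaned = raw.replace("```markdown", "").replace("```", "")
print(cleaned)
```

Removing only the literal fence strings keeps the word "markdown" in "Built with markdown.", whereas deleting every occurrence of "markdown" would mangle that sentence.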

In [24]:
stream_brochure("HuggingFace", hug_url)

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/about'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'github page', 'url': 'https://github.com/huggingface'}, {'type': 'twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'linkedin page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}



# Welcome to the Hugging Face Spectacle! 🎉

![Hugging Face](https://huggingface.co/logo.png)

**The AI Community Building the Future**  
At Hugging Face, we believe in a future where artificial intelligence gives you more than just a witty chatbot. Here, we're working together to unleash AI’s full potential – even if that means explaining the joke different ways for the 100th time.

---

## 🤖 What We Do

Think of us as the cozy corner café of machine learning, but instead of lattes, we serve up **1 million+ models** and **250k datasets** for your picking. Whether you're looking to generate text, images, or even 3D masterpieces, we've got just the model for you!

- **Models?** Oh, yeah! (We've got the agentica-org/DeepCoder-14B-Preview, with 12.7k downloads.)
- **Amazing Datasets?** Naturally! (Check out our nvidia/OpenCodeReasoning – only 5.7k visitors last hour!)
- **Spaces for Creativity?** We've got spaces like no other to let your app ideas run wild! 🌌 

---

## 🥳 Our Stellar Community

We're not just a company; we're more like a community of machine learning wizards (and some mighty fine wizards at that). More than **50,000 organizations**—including the titans at Google, Microsoft, and Amazon—have decided to join our adventure. Together, we tackle challenges with a sprinkle of positive vibes.

### Featured Customers:
- **AI2**: Not just a non-profit; they're practically superheroes.
- **Meta & Google**: They follow our models like the rest of us follow cat memes.
- **Grammarly**: Because who doesn't want perfect grammar alongside AI innovations?

---

## 🌟 Company Culture

### Weird and Wonderful
Our culture is vibrant, diverse, and a little quirky—kind of like a cactus wearing a sombrero. We encourage creativity, collaboration, and maybe a little harmless chaos. I mean, if you manage to make a robot sing show tunes, you've officially made it.

### Opportunities Await!
- Want to work with data? We’ve got AI roles that would make your heart race more than a caffeinated squirrel on roller-skates!
- Are you passionate about open-source? Join a platoon of tech-savvy volunteers pushing boundaries together. Check out our **current openings**!

---

## 🎤 Join Us or Just Say Hi!

Want to know more about how we’re turning future tech into a community-driven reality? Whether you're looking to snag a career with us, invest in groundbreaking tech, or simply want to talk shop, head on over to [Hugging Face Careers](https://huggingface.co/careers) or shoot us a tweet on [Twitter](https://twitter.com/huggingface). 

Join us in making the world a smarter place, one Hugging Face at a time! 

---

### Our Motto: 
> *"Good vibes, great models, and endless possibilities!"*

Let’s get your AI dreams off the ground! 🚀✨

