# Building a company brochure with an LLM

## BUSINESS CHALLENGE:
Create a product that builds a brochure for a company, aimed at prospective clients, investors and potential recruits.

We are given the company name and its primary website.

See the end of this notebook for examples of real-world business applications.

And remember: I'm always available if you have problems or ideas! Please do reach out.

## Import libraries

In [15]:
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

## Initialize environment

In [16]:
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key found!")
else:
    print("There's a problem with the API key!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key found!


In [17]:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

## Class of Website Scraper

In [18]:
class Website:
    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers, timeout=30)  # timeout so a stalled server cannot hang the notebook
        self.body = response.content
        
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found."
        if soup.body:
            for irr in soup.body(["script", "style", "img", "input"]):
                irr.decompose()
            
            self.text = soup.body.get_text(separator="\n", strip=True)
            
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]
    
    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [19]:
ed = Website("https://edwarddonner.com")
ed.links

['https://edwarddonner.com/',
 'https://edwarddonner.com/connect-four/',
 'https://edwarddonner.com/outsmart/',
 'https://edwarddonner.com/about-me-and-about-nebula/',
 'https://edwarddonner.com/posts/',
 'https://edwarddonner.com/',
 'https://news.ycombinator.com',
 'https://nebula.io/?utm_source=ed&utm_medium=referral',
 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html',
 'https://patents.google.com/patent/US20210049536A1/',
 'https://www.linkedin.com/in/eddonner/',
 'https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/',
 'https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/',
 'https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/',
 'https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/',
 'https://edwarddonner.com/2025/05/18/2025-ai-executive-briefing/',
 '

## Use GPT-4o mini to figure out which links are relevant
The model should decide which links are relevant and replace relative links such as `/about` with full URLs such as `https://company.com/about`.
We will use one-shot prompting, in which the prompt includes an example of how the model should respond.
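
The notebook leaves relative-link resolution to the model. As a complementary sketch (not part of the original notebook), the same step can be done deterministically with the standard library before the links are sent to the model. The helper name `normalize_links` is an assumption introduced here for illustration.

```python
from urllib.parse import urljoin, urlparse


def normalize_links(base_url, links):
    """Resolve relative links against base_url and drop duplicates, keeping order.

    Hypothetical helper: it is not defined anywhere in the notebook.
    """
    seen = set()
    result = []
    for link in links:
        absolute = urljoin(base_url, link)
        # Skip non-web schemes such as mailto: or javascript:
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


example = normalize_links(
    "https://huggingface.co",
    ["/", "/models", "/models", "/pricing#endpoints", "mailto:hello@example.com"],
)
print(example)
```

Doing this up front would also shorten the prompt, since scraped link lists often contain many repeated entries.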

In [20]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [21]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



### Build the user prompt for link selection

In [22]:
def get_link_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [23]:
print(get_link_user_prompt(ed))

Here is the list of links on the website of https://edwarddonner.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
https://edwarddonner.com/
https://edwarddonner.com/connect-four/
https://edwarddonner.com/outsmart/
https://edwarddonner.com/about-me-and-about-nebula/
https://edwarddonner.com/posts/
https://edwarddonner.com/
https://news.ycombinator.com
https://nebula.io/?utm_source=ed&utm_medium=referral
https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html
https://patents.google.com/patent/US20210049536A1/
https://www.linkedin.com/in/eddonner/
https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/
https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/
https://edward

### Ask the model for the relevant links

In [24]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role":"system", "content":link_system_prompt},
            {"role":"user", "content":get_link_user_prompt(website=website)}
        ],
        response_format={"type":"json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
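
`get_links` trusts the model to return exactly the shape shown in the one-shot example. A minimal sketch of a more defensive parse is shown here, assuming we want to fall back to an empty link list rather than raise. The function `parse_link_response` is a hypothetical addition, not part of the notebook.

```python
import json


def parse_link_response(raw):
    """Parse the model's JSON reply and keep only well-formed link entries.

    Hypothetical helper: returns {"links": []} when the reply is unusable.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    if not isinstance(data, dict):
        return {"links": []}
    links = data.get("links", [])
    if not isinstance(links, list):
        return {"links": []}
    valid = [
        {"type": item["type"], "url": item["url"]}
        for item in links
        if isinstance(item, dict)
        and isinstance(item.get("type"), str)
        and isinstance(item.get("url"), str)
        # Keep only absolute web URLs, since each one is fetched later
        and item["url"].startswith(("http://", "https://"))
    ]
    return {"links": valid}
```

In `get_links`, this could stand in for the bare `json.loads(result)` call, so that a malformed reply yields a brochure built from the landing page alone instead of an exception.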
    

In [25]:
huggingface = Website("https://huggingface.co")
huggingface.links

['/',
 '/models',
 '/datasets',
 '/spaces',
 '/docs',
 '/enterprise',
 '/pricing',
 '/login',
 '/join',
 '/spaces',
 '/models',
 '/tencent/SRPO',
 '/Qwen/Qwen3-Next-80B-A3B-Instruct',
 '/baidu/ERNIE-4.5-21B-A3B-Thinking',
 '/Qwen/Qwen3-Next-80B-A3B-Thinking',
 '/google/vaultgemma-1b',
 '/models',
 '/spaces/enzostvs/deepsite',
 '/spaces/zerogpu-aoti/wan2-2-fp8da-aoti-faster',
 '/spaces/multimodalart/wan-2-2-first-last-frame',
 '/spaces/IndexTeam/IndexTTS-2-Demo',
 '/spaces/tencent/HunyuanImage-2.1',
 '/spaces',
 '/datasets/HuggingFaceFW/finepdfs',
 '/datasets/HuggingFaceM4/FineVision',
 '/datasets/LucasFang/FLUX-Reason-6M',
 '/datasets/fka/awesome-chatgpt-prompts',
 '/datasets/InternRobotics/OmniWorld',
 '/datasets',
 '/join',
 '/pricing#endpoints',
 '/pricing#spaces',
 '/pricing',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/allenai',
 '/facebook',
 '/amazon',
 '/google',
 '/Intel',
 '/microsoft',
 '/grammarly',
 '/W

In [26]:
get_links("https://huggingface.co")

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/about'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'blog page', 'url': 'https://huggingface.co/blog'},
  {'type': 'company page', 'url': 'https://huggingface.co/huggingface'},
  {'type': 'join page', 'url': 'https://huggingface.co/join'}]}

## Make the brochure

### Build a function to read details of website and links

In [27]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links: ", links)
    for link in links['links']:
        result += f"\n\n{link['type']}\n"
        result += Website(link['url']).get_contents()
        
    return result
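
The model can return URLs that do not exist or that refuse connections, in which case a single failed request would abort `get_all_details`. The sketch below shows one way to skip such pages, assuming a `fetch(url)` callable that returns the page text. Both `collect_pages` and `fetch` are hypothetical names. In the notebook, `fetch` could be `lambda url: Website(url).get_contents()`.

```python
def collect_pages(links, fetch):
    """Fetch each link with fetch(url) -> str, skipping pages that raise.

    Returns the combined text and a list of (url, error message) for failures.
    """
    result = ""
    failed = []
    for link in links:
        try:
            contents = fetch(link["url"])
        except Exception as exc:  # broad on purpose: any failure skips the page
            failed.append((link["url"], str(exc)))
            continue
        result += f"\n\n{link['type']}\n{contents}"
    return result, failed


def fake_fetch(url):
    """Stand-in fetcher for demonstration; fails for one URL."""
    if url.endswith("/missing"):
        raise ConnectionError("could not connect")
    return f"contents of {url}"


text, failed = collect_pages(
    [
        {"type": "about page", "url": "https://example.com/about"},
        {"type": "careers page", "url": "https://example.com/missing"},
    ],
    fake_fetch,
)
print(text)
print(failed)
```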

In [28]:
print(get_all_details("https://huggingface.co"))

Found links:  {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company blog', 'url': 'https://huggingface.co/blog'}, {'type': 'social media', 'url': 'https://twitter.com/huggingface'}, {'type': 'linkedin page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}
Landing page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Community
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ models
Trending on
this week
Models
tencent/SRPO
Updated
2 days ago
•
3.61k
•
788
Qwen/Qwen3-Next-80B-A3B-Instruct
Updated
about 3 hours ago
•
305k
•
614
baidu/ERNIE-4.5-21B-A3B-Thinking
Updated
about 6 hours ago
•
112k
•
719
Qwen/Qwen3-Next-80B-A3B-Thinking


### System prompt for reading website

In [29]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."

In [30]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
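
Because the prompt is cut at 5,000 characters from the start, a long landing page can use up the whole budget before any linked page is included. One possible alternative, sketched here under the assumption that pages are available as `(title, text)` pairs, is to give each page an equal share of the budget. The names `truncate_sections` and `total_budget` are illustrative and do not appear in the notebook.

```python
def truncate_sections(sections, total_budget=5_000):
    """Give each (title, text) section an equal share of total_budget characters."""
    if not sections:
        return ""
    per_section = total_budget // len(sections)
    # Cut every section to its share, header included
    parts = [f"{title}\n{text}"[:per_section] for title, text in sections]
    # The newline separators can push the total slightly over, so cut once more
    return "\n".join(parts)[:total_budget]


demo = truncate_sections(
    [("Landing page", "a" * 10_000), ("about page", "b" * 10_000)],
    total_budget=100,
)
print(len(demo))
print("about page" in demo)
```

An equal split is the simplest policy. Weighting the landing page more heavily would be a reasonable variation.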

In [31]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

Found links:  {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}]}


'You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nCommunity\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\ntencent/SRPO\nUpdated\n2 days ago\n•\n3.61k\n•\n788\nQwen/Qwen3-Next-80B-A3B-Instruct\nUpdated\nabout 3 hours ago\n•\n305k\n•\n614\nbaidu/ERNIE-4.5-21B-A3B-Thinking\nUpdated\nabout 6 hours ago\n•\n112k\n•\n719\nQwen/Qwen3-Next-80B-A3B-Thinking\nUpdated\n2 days ago\n•\n160k\n•\n351\ngoogle/vaultgemma-1b\nUpdated\n5 days ago\n•\n1.28k\n•\n313\nBrowse 1M+ models\nSpaces\nRunning\n13.6k\n13.6k

## Create brochure with function and prompts

In [32]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [33]:
create_brochure("HuggingFace", "https://huggingface.co")

Found links:  {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'docs page', 'url': 'https://huggingface.co/docs'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}]}


```markdown
# Hugging Face Brochure

## Welcome to Hugging Face
### The AI Community Building the Future

Hugging Face is a collaborative platform at the forefront of artificial intelligence and machine learning (AI/ML) innovation. Our mission is to empower machine learning enthusiasts, researchers, and enterprises by providing access to cutting-edge models, datasets, and applications that transform how we interact with technology.

## Who We Are
At Hugging Face, we are more than just a technology company; we are a vibrant community of over 50,000 organizations, including notable names like Meta, Google, Amazon, and Microsoft. Our platform encourages collaboration, where users can create, discover, and share resources—fostering transparency and an open-source culture.

## What We Offer
- **Models:** Explore our extensive library of **1M+ models** including state-of-the-art AI tools for text, image, video, and audio modalities.
- **Datasets:** Access **250k+ datasets** tailored for a variety of machine learning tasks.
- **Spaces:** Engage with a variety of applications, ranging from deep learning to natural language processing, making it easy to build and deploy machine learning projects.

## Company Culture
At Hugging Face, our culture is built on collaboration, respect, and innovation. We believe in providing an inclusive environment for our team members and the larger AI community:
- **Open Source Philosophy:** We contribute to democratizing AI by enabling the sharing and development of ML tooling.
- **Community-Driven:** Our platform is designed for collaboration, where users can share knowledge, best practices, and support each other in their ML journeys.
- **Innovation:** Constantly pushing boundaries, we encourage experimentation and the development of new ideas.

## Careers at Hugging Face
Join a team of passionate individuals who are dedicated to shaping the future of artificial intelligence. We are always looking for talented and enthusiastic professionals to fill various roles in engineering, data science, and community engagement. Opportunities at Hugging Face include:
- Competitive salaries and benefits
- An inclusive work environment that values diversity
- Opportunities for professional development and growth

If you are interested in joining our mission, visit our [Jobs Page](https://huggingface.co/jobs).

## Connect With Us
Feel free to explore our offerings and join our community! For further information about our services or to start your journey with Hugging Face, visit our [Website](https://huggingface.co).

### Stay Updated
Follow us on social media to stay informed about the latest developments and community updates:
- [Twitter](https://twitter.com/huggingface)
- [LinkedIn](https://www.linkedin.com/company/hugging-face/)
- [GitHub](https://github.com/huggingface)
- [Discord](https://discord.gg/huggingface)

### Conclusion 
Hugging Face is not just about technology; it’s about building a future where everyone can contribute to and benefit from AI. Come join our thriving community today!

---

**Hugging Face – Your partner in AI/ML for today and tomorrow!**
```

In [34]:
## Try another company
create_brochure("University of Technology - VNU-HCM", "https://hcmut.edu.vn")

Found links:  {'links': [{'type': 'about page', 'url': 'https://hcmut.edu.vn/about'}, {'type': 'careers page', 'url': 'https://hcmut.edu.vn/careers'}]}


# University of Technology - VNU-HCM

## Welcome to ĐHBK HCM

The University of Technology (ĐHBK HCM), part of the Vietnam National University - Ho Chi Minh City (VNU-HCM), is a premier institution of higher education, dedicated to advancing knowledge and fostering innovation in technology and engineering.

### About Us

At ĐHBK HCM, we are committed to providing high-quality education and research opportunities that equip our students with the skills and competencies necessary to excel in the competitive global landscape. Our state-of-the-art facilities and diverse academic programs attract talented students from across Vietnam and beyond. With a strong emphasis on practical skills and research, we prepare our graduates to meet the demands of the fast-evolving job market.

### Company Culture

Our culture is built on collaboration, creativity, and a relentless pursuit of excellence. We pride ourselves on fostering an inclusive environment that encourages diverse perspectives and innovative thinking. Our faculty and staff are dedicated to mentoring our students, ensuring they are not only academically prepared but also ready to contribute positively to society.

### Customers and Community Engagement

We serve a diverse student body, ranging from local to international students. Our collaborations with industries and research institutions enhance our educational offerings and provide our students with real-world experiences. We actively engage with the community, working on projects and initiatives that address local challenges and contribute to the region's development.

### Careers at ĐHBK HCM

Join us in shaping the future! We are always on the lookout for passionate educators, researchers, and administrative professionals who share our vision of educational excellence and innovation. Here, you will find a supportive environment that values professional growth and development.

If you are interested in pursuing a fulfilling career with us, visit our careers page for current job openings and application details.

---

Join the University of Technology - VNU-HCM, where education meets innovation! For more information, visit our website [here](#).

In [None]:
# With a small adjustment, we can stream the results back from OpenAI, with the familiar typewriter animation
def stream_brochure(company, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company, url)}
        ],
        stream=True
    )
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''  # Append the new content
        # Strip code fences for display only; the raw response is kept so that
        # the ordinary word "markdown" in the brochure text is not removed
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)

In [37]:
stream_brochure("University of Technology - VNU-HCM", "https://hcmut.edu.vn")

Found links:  {'links': [{'type': 'about page', 'url': 'https://hcmut.edu.vn/about'}, {'type': 'careers page', 'url': 'https://hcmut.edu.vn/careers'}, {'type': 'company page', 'url': 'https://hcmut.edu.vn/company'}]}



# University of Technology - VNU-HCM

## About Us
The University of Technology - VNU-HCM, known in Vietnamese as Trường Đại học Bách khoa ĐHQG HCM, is a leading educational institution in Vietnam that focuses on engineering and technology. Established with a vision to advance the country’s technological capabilities and workforce, our university offers a wide range of programs emphasizing innovation, practical experience, and research excellence.

Our commitment to higher education is reflected in our state-of-the-art facilities, dedicated faculty, and a vibrant student community that encourages collaboration and knowledge exchange. 

## Company Culture
At VNU-HCM, we cultivate a culture of creativity, inclusiveness, and integrity. We thrive on collaboration and continuous improvement, ensuring that both our faculty and students can excel in their pursuits. Our approach empowers individuals to think critically and embrace challenges, preparing them for the demands of the rapidly evolving technological landscape.

We believe in fostering relationships within our community, which includes current students, alumni, and industry partners—enhancing both academic growth and career opportunities.

## Our Customers
Our primary customers include students, parents, educational partners, and the broader community. We serve students aspiring to gain knowledge and skills in various fields of engineering and technology. Our collaboration with businesses highlights our commitment to aligning our educational programs with industry needs, ensuring that our graduates are ready to contribute effectively in their chosen fields.

## Careers & Opportunities
VNU-HCM is not just a place of learning; it’s a dynamic environment for educators and support staff as well. We invite passionate individuals to be part of our mission. Openings are available across various departments, from teaching positions in engineering and technology to administrative roles that support our academic endeavors.

We value diversity and inclusivity in our hiring practices and strive to build a workforce that reflects the vibrant community we serve. Joining our team means being part of a transformative educational experience that shapes the future of technology in Vietnam.

## Join Us
Whether you are a prospective student looking to expand your horizons, an investor eager to partner in innovation, or a recruitment interested in shaping the leaders of tomorrow, University of Technology - VNU-HCM welcomes you to be part of our journey. Together, we can inspire the next generation of innovators and change-makers!

For more information, please visit our [website](#).

--- 
*This brochure aims to encapsulate the essence of the University of Technology - VNU-HCM, emphasizing our mission, culture, and the opportunities we offer. We look forward to welcoming you!*


In [38]:
stream_brochure("Công ty CP Thế giới di động", url="https://thegioididong.com")

Found links:  {'links': [{'type': 'home page', 'url': 'https://www.thegioididong.com'}, {'type': 'history of purchases page', 'url': 'https://www.thegioididong.com/lich-su-mua-hang'}, {'type': 'careers page', 'url': 'https://www.thegioididong.com/tin-tuc'}, {'type': 'online shopping page', 'url': 'https://www.thegioididong.com/mua-online-gia-re'}]}


# Brochure: Công ty CP Thế giới di động

---

## Company Overview

Công ty CP Thế giới di động, operating under the brand **Thegioididong.com**, is a leading retailer in Vietnam specializing in a wide variety of consumer electronics, including mobile phones, laptops, tablets, and related accessories. With a commitment to quality and customer satisfaction, our company offers products from top global brands like **Apple**, **Samsung**, **JBL**, and **Anker**.

---

## Product Range

At **Thegioididong.com**, we pride ourselves on providing an extensive selection of products tailored to meet the needs of modern consumers. Our product categories include:

- **Mobile Phones**: The latest models from renowned brands.
- **Laptops**: A wide array of choices for personal and professional use.
- **Accessories**: Mobile and laptop accessories, including cases, chargers, and audio devices.
- **Smartwatches**: Stylish and functional wearables.
- **Refurbished Devices**: High-quality pre-owned phones and laptops with great value.
- **Utility Services**: Bill payments, insurance, and even travel eSim purchases.

---

## Customer Experience

We aim to create a seamless shopping experience by offering services tailored to our customer locations. Promos and pricing are dynamically adjusted according to the customer’s locality, ensuring the best deals. Our online platform allows users to easily navigate and shop with confidence, while our in-store experience is designed to provide hands-on assistance to every shopper.

---

## Company Culture

At **Thế giới di động**, our culture is built on valuing our employees and fostering their growth within a collaborative environment. We emphasize:

- **Innovation**: Encouraging new ideas and adaptive solutions in a fast-evolving market.
- **Teamwork**: Working together to achieve common goals and enhance customer satisfaction.
- **Integrity**: Upholding the highest ethical standards in all our interactions.

---

## Careers at Thế giới di động

We believe our people are our greatest asset. As a rapidly growing company, we are constantly on the lookout for passionate individuals to join our team. Opportunities exist in various areas, from retail positions to IT and supply chain management. We provide competitive compensation, professional development programs, and a vibrant workplace that embraces diversity.

**Join Us!** Discover exciting career opportunities at [Thegioididong Jobs](#).

---

## Connect with Us

To learn more about our products, services, and career opportunities, visit our website at [Thegioididong.com](https://thegioididong.com). Follow us on social media for the latest updates and promotions.

---

**Công ty CP Thế giới di động**  
Your trusted source for technology in Vietnam!  
**Contact Us:** info@thegioididong.com

--- 

Thank you for considering Công ty CP Thế giới di động! We look forward to serving you and welcoming you to our team.