# Building a company brochure with an LLM

## BUSINESS CHALLENGE:
Create a product that builds a brochure for a company, to be used for prospective clients, investors and potential recruits.

We are given a company name and the URL of its primary website.

See the end of this notebook for examples of real-world business applications.

And remember: I'm always available if you have problems or ideas! Please do reach out.

## Import libraries

In [1]:
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

## Initialize environment

In [4]:
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key found!")
else:
    print("There's a problem with the API key!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key found!


In [5]:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

## Website scraper class

In [6]:
class Website:
    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found."
        if soup.body:
            for irr in soup.body(["script", "style", "img", "input"]):
                irr.decompose()
            
            self.text = soup.body.get_text(separator="\n", strip=True)
            
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]
    
    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [7]:
ed = Website("https://edwarddonner.com")
ed.links

['https://edwarddonner.com/',
 'https://edwarddonner.com/connect-four/',
 'https://edwarddonner.com/outsmart/',
 'https://edwarddonner.com/about-me-and-about-nebula/',
 'https://edwarddonner.com/posts/',
 'https://edwarddonner.com/',
 'https://news.ycombinator.com',
 'https://nebula.io/?utm_source=ed&utm_medium=referral',
 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html',
 'https://patents.google.com/patent/US20210049536A1/',
 'https://www.linkedin.com/in/eddonner/',
 'https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/',
 'https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/',
 'https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/',
 'https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/',
 'https://edwarddonner.com/2025/05/18/2025-ai-executive-briefing/',
 '

## Use GPT-4o mini to figure out which links are relevant
The model should decide which links are relevant and replace relative links such as `/about` with full URLs such as `https://company.com/about`.
We will use one-shot prompting, in which the prompt includes one example of how the model should respond.
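
The prompt leaves it to the model to expand relative links. As a complementary sketch (not part of the original notebook), relative links could also be resolved deterministically with the standard library's `urllib.parse.urljoin` before they reach the prompt. The helper name `resolve_links` is hypothetical.

```python
from urllib.parse import urljoin

def resolve_links(base_url, links):
    """Hypothetical helper: turn relative hrefs into absolute URLs."""
    # urljoin leaves links that are already absolute unchanged
    return [urljoin(base_url, link) for link in links]

resolved = resolve_links(
    "https://huggingface.co",
    ["/models", "/pricing#spaces", "https://github.com/huggingface"],
)
print(resolved)
```

Resolving links in code would make the result independent of how well the model follows the instruction, at the cost of one extra step in the scraper.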

In [9]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [10]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



### Build the user prompt for link selection

In [11]:
def get_link_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt
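
The scraped link lists in this notebook contain repeated entries, and every repeat is sent to the model. A minimal sketch of removing duplicates while keeping the original order, assuming a hypothetical helper `unique_links` that could be applied to `website.links` before joining:

```python
def unique_links(links):
    """Hypothetical helper: drop repeated links, keeping first-seen order."""
    # dict keys preserve insertion order, so this deduplicates without sorting
    return list(dict.fromkeys(links))

sample = ["/", "/models", "/enterprise", "/models", "/enterprise"]
print(unique_links(sample))
```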

In [12]:
print(get_link_user_prompt(ed))

Here is the list of links on the website of https://edwarddonner.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
https://edwarddonner.com/
https://edwarddonner.com/connect-four/
https://edwarddonner.com/outsmart/
https://edwarddonner.com/about-me-and-about-nebula/
https://edwarddonner.com/posts/
https://edwarddonner.com/
https://news.ycombinator.com
https://nebula.io/?utm_source=ed&utm_medium=referral
https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html
https://patents.google.com/patent/US20210049536A1/
https://www.linkedin.com/in/eddonner/
https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/
https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/
https://edward

### Ask the model for the relevant links

In [14]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role":"system", "content":link_system_prompt},
            {"role":"user", "content":get_link_user_prompt(website=website)}
        ],
        response_format={"type":"json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
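
`json.loads` raises an exception if the reply is not valid JSON, and later cells read `link['type']` and `link['url']` from every entry. A minimal defensive sketch, assuming a hypothetical helper `parse_links` and assuming that only entries with a string `type` and an `https://` URL are worth keeping (the prompt asks for full https URLs):

```python
import json

def parse_links(raw):
    """Hypothetical helper: parse the model's reply and keep only well-formed entries."""
    empty = {"links": []}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty
    links = data.get("links", [])
    if not isinstance(links, list):
        return empty
    valid = [
        link for link in links
        if isinstance(link, dict)
        and isinstance(link.get("type"), str)
        and isinstance(link.get("url"), str)
        and link["url"].startswith("https://")
    ]
    return {"links": valid}

# Made-up reply for illustration: one good entry, one relative URL, one missing "type"
reply = (
    '{"links": [{"type": "about page", "url": "https://a.com/about"}, '
    '{"type": "bad", "url": "/relative"}, {"url": "https://a.com/x"}]}'
)
print(parse_links(reply))
```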
    

In [15]:
huggingface = Website("https://huggingface.co")
huggingface.links

['/',
 '/models',
 '/datasets',
 '/spaces',
 '/docs',
 '/enterprise',
 '/pricing',
 '/login',
 '/join',
 '/spaces',
 '/models',
 '/tencent/SRPO',
 '/baidu/ERNIE-4.5-21B-A3B-Thinking',
 '/Qwen/Qwen3-Next-80B-A3B-Instruct',
 '/Qwen/Qwen3-Next-80B-A3B-Thinking',
 '/google/embeddinggemma-300m',
 '/models',
 '/spaces/enzostvs/deepsite',
 '/spaces/zerogpu-aoti/wan2-2-fp8da-aoti-faster',
 '/spaces/multimodalart/wan-2-2-first-last-frame',
 '/spaces/IndexTeam/IndexTTS-2-Demo',
 '/spaces/tencent/HunyuanImage-2.1',
 '/spaces',
 '/datasets/HuggingFaceFW/finepdfs',
 '/datasets/HuggingFaceM4/FineVision',
 '/datasets/LucasFang/FLUX-Reason-6M',
 '/datasets/fka/awesome-chatgpt-prompts',
 '/datasets/Josephgflowers/Finance-Instruct-500k',
 '/datasets',
 '/join',
 '/pricing#endpoints',
 '/pricing#spaces',
 '/pricing',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/allenai',
 '/facebook',
 '/amazon',
 '/google',
 '/Intel',
 '/microsoft',
 

In [16]:
get_links("https://huggingface.co")

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'},
  {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'},
  {'type': 'blog page', 'url': 'https://huggingface.co/blog'},
  {'type': 'community page', 'url': 'https://discuss.huggingface.co'},
  {'type': 'github page', 'url': 'https://github.com/huggingface'},
  {'type': 'twitter page', 'url': 'https://twitter.com/huggingface'},
  {'type': 'linkedin page',
   'url': 'https://www.linkedin.com/company/huggingface/'}]}

## Make the brochure

### Build a function to read the details of the website and its links

In [19]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links: ", links)
    for link in links['links']:
        result += f"\n\n{link['type']}\n"
        result += Website(link['url']).get_contents()
        
    return result

In [20]:
print(get_all_details("https://huggingface.co"))

Found links:  {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}
Landing page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Community
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ models
Trending on
this week
Models
tencent/SRPO
Updated
2 days ago
•
3.61k
•
779
baidu/ERNIE-4.5-21B-A3B-Thinking
Updated
6 days ago
•
112k
•
716
Qwen/Qwen3-Next-80B-A3B-Instruct
Updated
2 days ago
•
305k
•
606
Qwen/Qwen3-Next-80B-A3B-Thinking
Updated
2 days ago
•
160k
•
347
google/embeddinggemma-300m
Updated
7 days ago
•
177k
•

### System prompt for reading website

In [21]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."

In [22]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
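
A fixed character slice can cut the prompt in the middle of a line or word. A minimal sketch of truncating at the last complete line instead, assuming a hypothetical helper `truncate_at_line` and the same 5,000-character budget:

```python
def truncate_at_line(text, limit=5_000):
    """Hypothetical helper: cut text to at most `limit` characters, ending on a full line."""
    if len(text) <= limit:
        return text
    # Look for the last newline inside the allowed window
    cut = text.rfind("\n", 0, limit)
    # Fall back to a hard cut when there is no usable newline within the limit
    return text[:cut] if cut > 0 else text[:limit]

print(truncate_at_line("first line\nsecond line\nthird line", limit=25))
```

Because the scraped text is newline-separated, cutting on a line boundary keeps every remaining item whole, though the prompt may end up slightly shorter than the limit.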

In [23]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

Found links:  {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'documentation page', 'url': 'https://huggingface.co/docs'}]}


'You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nCommunity\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\ntencent/SRPO\nUpdated\n2 days ago\n•\n3.61k\n•\n779\nbaidu/ERNIE-4.5-21B-A3B-Thinking\nUpdated\n6 days ago\n•\n112k\n•\n716\nQwen/Qwen3-Next-80B-A3B-Instruct\nUpdated\n2 days ago\n•\n305k\n•\n606\nQwen/Qwen3-Next-80B-A3B-Thinking\nUpdated\n2 days ago\n•\n160k\n•\n347\ngoogle/embeddinggemma-300m\nUpdated\n7 days ago\n•\n177k\n•\n813\nBrowse 1M+ models\nSpaces\nRunning\n13.6k\n13.6k\nDeepSit

## Create brochure with function and prompts

In [24]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [25]:
create_brochure("HuggingFace", "https://huggingface.co")

Found links:  {'links': [{'type': 'home page', 'url': 'https://huggingface.co'}, {'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog', 'url': 'https://huggingface.co/blog'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'status page', 'url': 'https://status.huggingface.co/'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'Zhihu page', 'url': 'https://www.zhihu.com/org/huggingface'}]}


```markdown
# Hugging Face: The AI Community Building the Future

**Welcome to Hugging Face!**  
At Hugging Face, we are redefining the way the AI community collaborates, shares, and innovates. With a vast platform featuring over 1 million models and 250,000 datasets, we offer the tools necessary to discover, build, and deploy state-of-the-art machine learning applications.

## Our Offerings

- **Models**: Explore **1M+** machine learning models that cater to various tasks and domains. From text generation to image processing, our models are continuously updated and shared by a vibrant community.
  
- **Datasets**: Access and share more than **250,000 datasets** specially curated for diverse machine learning tasks, ensuring a rich and robust foundation for your projects.
  
- **Spaces**: Collaborate in **Spaces**, a user-friendly environment designed for creating, sharing, and experimenting with apps as well as showcasing your innovative work.
  
- **Enterprise Solutions**: Enhance your team’s productivity with enterprise-grade tools, optimized performance, and dedicated support, all while ensuring security and ease of access.

## Our Community

With over **50,000 organizations** harnessing our platform, Hugging Face is the beating heart of the AI community. Some of our notable collaborators include tech giants such as **Google, Microsoft, Amazon**, and **Meta**. Together, we are creating a collaborative ecosystem where knowledge is shared, and innovations thrive.

## Company Culture

At Hugging Face, we foster a vibrant, inclusive, and open-minded culture. Our team is passionate about AI and committed to building a supportive community where everyone can contribute their unique perspectives. We believe collaboration is key to progress in machine learning, and we encourage a culture of learning, openness, and teamwork.

## Careers at Hugging Face

Join us in building the future of AI! We are always searching for talented and motivated individuals to join our diverse team. Opportunities span various fields, from engineering to community support, and we welcome all who share our passion for innovation and collaboration.

### Open Positions
- Software Engineers
- Data Scientists
- Community Managers
- Product Specialists

Explore our current openings and be part of something incredible—help us shape the future of machine learning!

## Connect with Us

Discover more about Hugging Face, explore opportunities, or start collaborating today. Visit us at [Hugging Face](https://huggingface.co) and join the conversation on social media: 

- [GitHub](https://github.com/huggingface)
- [Twitter](https://twitter.com/huggingface)
- [LinkedIn](https://www.linkedin.com/company/huggingface)
- [Discord](https://discord.com/invite/huggingface)

### Let's build the future together!
```


In [26]:
# Try another company
create_brochure("University of Technology - VNU-HCM", "https://hcmut.edu.vn")

Found links:  {'links': [{'type': 'about page', 'url': 'https://hcmut.edu.vn/en/about'}, {'type': 'careers page', 'url': 'https://hcmut.edu.vn/en/careers'}, {'type': 'programs page', 'url': 'https://hcmut.edu.vn/en/academic-programs'}, {'type': 'news page', 'url': 'https://hcmut.edu.vn/en/news'}, {'type': 'events page', 'url': 'https://hcmut.edu.vn/en/events'}]}


# University of Technology - VNU-HCM Brochure

## Welcome to ĐHBK HCM

The University of Technology - VNU-HCM (ĐHBK HCM) stands as a leading institution in higher education, dedicated to equipping students with exceptional knowledge and skills in engineering and technology. As part of the Vietnam National University system, we are committed to fostering innovation and excellence.

### About Us

At ĐHBK HCM, we value the pursuit of knowledge and the development of creative solutions for the challenges of our time. Our faculty members are renowned experts in their fields, and our programs are designed to encourage critical thinking and practical experience.

### Our Programs

We offer a wide range of undergraduate and postgraduate programs focused on engineering, technology, and applied sciences. Our curriculum is continuously updated to reflect the latest industry trends and technological advancements, ensuring our graduates are well-prepared for the workforce.

### Our Culture

We are committed to creating an inclusive and dynamic educational environment where creativity and collaboration thrive. Our campus culture emphasizes respect, integrity, and mutual support. We encourage students to engage in extracurricular activities, fostering a balanced approach to personal and professional development.

### Our Customers

Our primary customers include ambitious students seeking a top-tier education, as well as industries looking for skilled graduates ready to tackle real-world challenges. We maintain strong partnerships with various organizations to ensure our programs meet the evolving needs of the job market.

### Careers at ĐHBK HCM

Join our team of dedicated professionals! At ĐHBK HCM, we are always looking for passionate faculty and staff who are eager to contribute to the advancement of education and research. We offer a supportive work environment and opportunities for continuous professional development.

### Connect With Us

Stay updated on our latest news and events by visiting our website regularly. We look forward to welcoming future innovators, researchers, and leaders to our vibrant academic community at ĐHBK HCM.

---

For more information, please visit our [website](http://example-url.com) or contact us directly. 

**University of Technology - VNU-HCM**
- **Location:** Ho Chi Minh City, Vietnam
- **Contact Email:** info@dhbk.edu.vn
- **Phone:** +84-28-xxxx-xxxx

---

Together, let’s shape the future through education and technology!

In [27]:
# With a small adjustment, the results can stream back from OpenAI with the familiar typewriter animation
def stream_brochure(company, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company, url)}
        ],
        stream=True
    )
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
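        response += chunk.choices[0].delta.content or ''
        # Strip only code-fence markers from a display copy; the raw text and the word "markdown" in prose stay intact
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)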
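
Fence stripping during streaming is easiest to reason about when the raw text is accumulated untouched and a cleaned copy is derived for display on every chunk. A minimal offline sketch of that idea, using a hypothetical helper `strip_fences` and made-up chunks in place of a real OpenAI stream:

```python
def strip_fences(text):
    """Hypothetical helper: remove Markdown code-fence markers, leaving other text untouched."""
    return text.replace("```markdown", "").replace("```", "")

# Simulated stream chunks (made up for illustration); the opening fence is split across chunks
chunks = ["``", "`mark", "down\n# Title\n", "Written in markdown.\n", "```"]

raw = ""
for chunk in chunks:
    raw += chunk
    shown = strip_fences(raw)  # clean a copy; keep the raw text intact

print(shown)
```

Cleaning a copy means a fence marker that arrives split across two chunks is still recognised once the full marker is present, and ordinary uses of the word "markdown" in the brochure text are preserved.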

In [28]:
stream_brochure("University of Technology - VNU-HCM", "https://hcmut.edu.vn")

Found links:  {'links': [{'type': 'about page', 'url': 'https://hcmut.edu.vn/en/about'}]}


# University of Technology - VNU-HCM Brochure

## About Us
Welcome to the **University of Technology - VNU-HCM**, one of the leading academic institutions in Vietnam. As part of the Vietnam National University, Ho Chi Minh City, we are dedicated to providing high-quality education and fostering innovation in technology and engineering.

### Mission
Our mission is to equip students with the necessary skills and knowledge to succeed in their fields, contribute to society, and drive technological advancement in Vietnam and beyond.

### Vision
We aspire to be recognized internationally for excellence in research and teaching, preparing our graduates to meet the challenges of an ever-evolving global landscape.

---

## Company Culture
At the University of Technology, we pride ourselves on a collaborative and inclusive environment that encourages creativity, critical thinking, and lifelong learning. Our faculty and staff are committed to fostering a holistic educational experience, where students feel empowered to explore their interests and passions.

### Core Values
- **Innovation**: We embrace new ideas and technologies to enhance the educational experience.
- **Integrity**: We uphold the highest standards of honesty and ethical conduct in all our endeavors.
- **Collaboration**: We believe in the power of teamwork to achieve greater outcomes.
- **Excellence**: We strive for the highest quality in teaching, research, and community engagement.

---

## Our Customers
We serve a diverse community of students from various backgrounds, industries, and regions. Our graduates are highly sought after by employers for their technical expertise and soft skills, which are developed through rigorous academic programs and real-world experiences.

### Key Stakeholders
- **Students**: Our primary focus is to nurture the next generation of technology leaders.
- **Employers**: We maintain strong partnerships with local and international businesses to align our curriculum with market needs.
- **Research Communities**: We engage in collaborative research efforts to drive innovation and address real-world problems.

---

## Careers at UOT
Joining the University of Technology means becoming part of a vibrant academic community. We are always looking for passionate educators, researchers, and administrative staff who are committed to making a difference.

### Opportunities
- **Teaching Positions**: We invite applications from experienced and highly skilled teachers in various engineering and technology disciplines.
- **Research Roles**: Researchers can engage in groundbreaking projects that contribute to our knowledge base and encourage innovation.
- **Administrative Staff**: Essential to our mission, administrative roles help maintain the university's operations and support student success.

### Benefits
- Competitive salary and benefits package
- Opportunities for professional development and continuous learning
- A collaborative atmosphere that values diversity and inclusion

---

## Join Us
Ready to be part of a dynamic institution that stands at the forefront of educational excellence? Explore our programs, faculty, and values, and discover how you can contribute to our community. 

For more information, visit our official website.

**Contact Us:**
Email: info@ut.edu.vn  
Phone: +84 123 456 789  
Address: 268 Ly Thuong Kiet, District 10, Ho Chi Minh City, Vietnam

---

**University of Technology - VNU-HCM**  
Empowering Minds, Inspiring Innovation.