### Business Problem : Company Sales Brochure Generator

- Create a product that can generate marketing brochures about a company 
    - For prospective clients
    - For investors
    - For recruitment

- Technologies we can use: the OpenAI API, one-shot prompting, and streaming results back and displaying them with formatting

In [1]:
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [2]:
# Load environment variables from .env

load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')

# Check the key

if not api_key:
    print("No API key was found")
elif api_key[:8] != "sk-proj-":
    print("An API key was found, but it doesn't start with 'sk-proj-'; please check that you're using the right API key")
else:
    print("API key found and looks good so far!")


API key found and looks good so far!


In [14]:
openai = OpenAI()

In [3]:
# A class to represent a Webpage

class Website:
    """
    A utility class to represent a website that we have scraped; it now also collects the page's links
    """
    url : str
    title : str
    body : str
    links : List[str]
    text : str

    def __init__(self, url):
        self.url = url
        response = requests.get(url, timeout=10)  # a timeout avoids hanging indefinitely on slow sites
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()

            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title : \n {self.title} \n Webpage Contents : \n {self.text} \n\n"

In [4]:
web = Website("https://en.wikipedia.org/wiki/DeepSeek")
print(web.title)
print(web.text)


DeepSeek - Wikipedia
Jump to content
Main menu
Main menu
move to sidebar
hide
Navigation
Main page
Contents
Current events
Random article
About Wikipedia
Contact us
Contribute
Help
Learn to edit
Community portal
Recent changes
Upload file
Search
Search
Appearance
Donate
Create account
Log in
Personal tools
Donate
Create account
Log in
Pages for logged out editors
learn more
Contributions
Talk
Contents
move to sidebar
hide
(Top)
1
Background
2
Development and release history
Toggle Development and release history subsection
2.1
DeepSeek Coder
2.2
DeepSeek LLM
2.3
V2
2.4
V3
2.5
R1
3
Assessment and reactions
4
Concerns
Toggle Concerns subsection
4.1
Censorship
4.2
Security and privacy
5
See also
6
Notes
7
References
8
External links
Toggle the table of contents
DeepSeek
58 languages
Afrikaans
العربية
Aragonés
অসমীয়া
Azərbaycanca
বাংলা
Български
Català
Čeština
Dansk
الدارجة
Deutsch
Ελληνικά
Español
Esperanto
Euskara
فارسی
Français
Frysk
Fulfulde
Gaeilge
Galego
한국어
Ido
Bahasa Indonesia
Ita

In [5]:
print(web.links)

['#bodyContent', '/wiki/Main_Page', '/wiki/Wikipedia:Contents', '/wiki/Portal:Current_events', '/wiki/Special:Random', '/wiki/Wikipedia:About', '//en.wikipedia.org/wiki/Wikipedia:Contact_us', '/wiki/Help:Contents', '/wiki/Help:Introduction', '/wiki/Wikipedia:Community_portal', '/wiki/Special:RecentChanges', '/wiki/Wikipedia:File_upload_wizard', '/wiki/Main_Page', '/wiki/Special:Search', 'https://donate.wikimedia.org/?wmf_source=donate&wmf_medium=sidebar&wmf_campaign=en.wikipedia.org&uselang=en', '/w/index.php?title=Special:CreateAccount&returnto=DeepSeek', '/w/index.php?title=Special:UserLogin&returnto=DeepSeek', 'https://donate.wikimedia.org/?wmf_source=donate&wmf_medium=sidebar&wmf_campaign=en.wikipedia.org&uselang=en', '/w/index.php?title=Special:CreateAccount&returnto=DeepSeek', '/w/index.php?title=Special:UserLogin&returnto=DeepSeek', '/wiki/Help:Introduction', '/wiki/Special:MyContributions', '/wiki/Special:MyTalk', '#', '#Background', '#Development_and_release_history', '#DeepSe

#### First Step : Have GPT-4o-mini figure out which links are relevant

- Use a call to gpt-4o-mini to read the links on a webpage and respond in structured JSON
- It should decide which links are relevant to a company brochure
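The scraped link list contains in-page anchors, relative paths, and duplicates, and the model has to turn all of that into full URLs. A minimal sketch of cleaning the list before it goes into the prompt, using the standard library's `urljoin`; the helper name `normalize_links` is hypothetical and is not part of the notebook:

```python
from urllib.parse import urljoin, urlparse


def normalize_links(base_url, links):
    """Resolve relative links against base_url, dropping anchors, non-web schemes and duplicates."""
    seen = set()
    cleaned = []
    for link in links:
        # Skip in-page anchors and non-web schemes such as mailto:
        if link.startswith("#") or link.startswith(("mailto:", "javascript:", "tel:")):
            continue
        absolute = urljoin(base_url, link)
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in seen:
            seen.add(absolute)
            cleaned.append(absolute)
    return cleaned


example = normalize_links(
    "https://en.wikipedia.org/wiki/DeepSeek",
    [
        "#bodyContent",
        "/wiki/Main_Page",
        "//en.wikipedia.org/wiki/Wikipedia:Contact_us",
        "/wiki/Main_Page",
        "mailto:info@example.com",
    ],
)
print(example)
```

Passing the cleaned list to the model instead of `website.links` would likely shorten the prompt and leave less URL reconstruction to the model, at the cost of one extra preprocessing step.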

In [6]:
link_system_prompt = (
    "You are provided with a list of links found on a webpage. "
    "You are able to decide which of the links would be most relevant to include in a brochure about the company, "
    "such as links to an About page, or a Company page, or Careers/Job pages.\n"
)

link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt +=  """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [8]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Job pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [12]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += (
        "please decide which of these are relevant web links for a brochure about the company, "
        "and respond with the full https URL in JSON format. "
        "Do not include Terms of Service, Privacy, or email links.\n"
    )
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [13]:
print(get_links_user_prompt(web))

Here is the list of links on the website of https://en.wikipedia.org/wiki/DeepSeek - please decide which of these are relevant web links for a brochure about the company, and respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, or email links.
Links (some might be relative links):
#bodyContent
/wiki/Main_Page
/wiki/Wikipedia:Contents
/wiki/Portal:Current_events
/wiki/Special:Random
/wiki/Wikipedia:About
//en.wikipedia.org/wiki/Wikipedia:Contact_us
/wiki/Help:Contents
/wiki/Help:Introduction
/wiki/Wikipedia:Community_portal
/wiki/Special:RecentChanges
/wiki/Wikipedia:File_upload_wizard
/wiki/Main_Page
/wiki/Special:Search
https://donate.wikimedia.org/?wmf_source=donate&wmf_medium=sidebar&wmf_campaign=en.wikipedia.org&uselang=en
/w/index.php?title=Special:CreateAccount&returnto=DeepSeek
/w/index.php?title=Special:UserLogin&returnto=DeepSeek
https://donate.wikimedia.org/?wmf_source=donate&wmf_medium=sidebar&wmf_campaign=en.wikipedia

In [23]:
# Put all of this into a function that calls the LLM
# and parses the response as JSON

def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model = "gpt-4o-mini",
        messages = [
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format= {"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)

In [25]:
get_links("https://anthropic.com")

{'links': [{'type': 'about page', 'url': 'https://www.anthropic.com/company'},
  {'type': 'careers page', 'url': 'https://www.anthropic.com/careers'},
  {'type': 'team page', 'url': 'https://www.anthropic.com/team'},
  {'type': 'research page', 'url': 'https://www.anthropic.com/research'},
  {'type': 'enterprise page', 'url': 'https://www.anthropic.com/enterprise'}]}
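Requesting a JSON object asks the model for parseable JSON, but it does not by itself guarantee the `links` / `type` / `url` shape shown in the system prompt. A minimal sketch of a defensive parser that could stand in for the bare `json.loads(result)` call; the name `parse_link_response` and the choice to fall back to an empty list are assumptions, not part of the notebook:

```python
import json


def parse_link_response(raw):
    """Parse the model's reply and keep only well-formed link entries."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    if not isinstance(data, dict):
        return {"links": []}
    entries = data.get("links", [])
    if not isinstance(entries, list):
        return {"links": []}
    # Keep entries that have a string type and a full http(s) URL
    valid = [
        {"type": entry["type"], "url": entry["url"]}
        for entry in entries
        if isinstance(entry, dict)
        and isinstance(entry.get("type"), str)
        and isinstance(entry.get("url"), str)
        and entry["url"].startswith(("http://", "https://"))
    ]
    return {"links": valid}


reply = '{"links": [{"type": "about page", "url": "https://x.com/about"}, {"type": "bad", "url": "/relative"}]}'
print(parse_link_response(reply))
```

Returning an empty list on malformed output means the later loop over `links["links"]` simply fetches nothing extra rather than raising a `KeyError`.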

#### Second Step : Make the brochure!

- Assemble all the details into another prompt to the LLM

In [26]:
def get_all_details(url):
    result = "Landing page : \n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links : ", links)
    for link in links["links"]:
        result += f"\n\n {link['type']} \n"
        result += Website(link["url"]).get_contents()
    return result

In [27]:
print(get_all_details("https://anthropic.com"))

Found links :  {'links': [{'type': 'about page', 'url': 'https://www.anthropic.com/company'}, {'type': 'careers page', 'url': 'https://www.anthropic.com/careers'}, {'type': 'team page', 'url': 'https://www.anthropic.com/team'}, {'type': 'research page', 'url': 'https://www.anthropic.com/research'}]}
Landing page : 
Webpage Title : 
 Home \ Anthropic 
 Webpage Contents : 
 Claude
Overview
Team
Enterprise
API
Pricing
Research
Company
Careers
News
Try Claude
AI
research
and
products
that put safety at the frontier
Claude.ai
Meet Claude 3.5 Sonnet
Claude 3.5 Sonnet, our most intelligent AI model, is now available.
Talk to Claude
API
Build with Claude
Create AI-powered applications and custom experiences using Claude.
Learn more
Announcements
Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku
Oct 22, 2024
Model updates
3.5 Sonnet
3.5 Haiku
Our Work
Product
Claude for Enterprise
Sep 4, 2024
Alignment
·
Research
Constitutional AI: Harmlessness from AI Feedback
Dec 15, 202

In [28]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."

In [29]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called : {company_name} \n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
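Slicing the whole prompt at 5,000 characters cuts from the end, so a long landing page can push the about and careers pages out of the prompt entirely. One possible alternative, sketched here under the assumption that each page should get an equal share of the budget; `assemble_with_budget` is a hypothetical helper, not part of the notebook:

```python
def assemble_with_budget(sections, total_budget=5_000):
    """Join (label, text) pairs so that every section gets an equal share of total_budget characters."""
    if not sections:
        return ""
    separator = "\n\n"
    # Reserve room for the separators so the result never exceeds total_budget
    available = total_budget - len(separator) * (len(sections) - 1)
    per_section = max(available // len(sections), 0)
    parts = []
    for label, text in sections:
        block = f"{label}\n{text}"
        parts.append(block[:per_section])
    return separator.join(parts)


pages = [("Landing page", "a" * 10_000), ("about page", "b" * 10_000)]
prompt_body = assemble_with_budget(pages, total_budget=100)
print(len(prompt_body))
```

An equal split is the simplest policy; weighting the landing page more heavily would be a reasonable variation if it carries most of the useful content.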

In [30]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

Found links :  {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog', 'url': 'https://huggingface.co/blog'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'docs page', 'url': 'https://huggingface.co/docs'}, {'type': 'community forum', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}]}


'You are looking at a company called : HuggingFace \nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page : \nWebpage Title : \n Hugging Face – The AI community building the future. \n Webpage Contents : \n Hugging Face\nModels\nDatasets\nSpaces\nPosts\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nNEW\nWelcome to Inference Providers on the Hub 🔥\nsmolagents - a smol library to build great agents\nUse models from the HF Hub in LM Studio\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nTrending on\nthis week\nModels\ndeepseek-ai/DeepSeek-R1\nUpdated\n1 day ago\n•\n845k\n•\n6k\ndeepseek-ai/Janus-Pro-7B\nUpdated\n1 day ago\n•\n134k\n•\n2.36k\ndeepseek-ai/DeepSeek-V3\nUpdated\n9 days ago\n•\n877k\n•\n3.01k\nmistralai/Mistral-Small-24B-Instruct-2501\nUpdated\nabout 2 hours ago\n•\n12.1k\n•\n474\nuns

In [31]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model = "gpt-4o-mini",
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [32]:
create_brochure("HuggingFace", "https://huggingface.co")


Found links :  {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}]}


# Hugging Face Brochure

## Welcome to Hugging Face!

### The AI Community Building the Future
Hugging Face is a collaborative platform where the machine learning community can come together to develop, share, and improve upon models, datasets, and applications. With over **400K+ models** and **100K+ datasets**, we aim to accelerate machine learning innovation.

---

### Our Offerings

- **Models**: Explore a wide variety of models tailored for diverse tasks, including deep learning for images, text generation, and more. 
- **Datasets**: Access a robust collection of datasets to facilitate your machine learning projects.
- **Spaces**: Create and discover applications that utilize state-of-the-art machine learning models.
- **Enterprise Solutions**: Unlock advanced tools for your team, complete with enterprise-grade security, dedicated support, and access to optimized computing resources.

---

### Community Engagement
At Hugging Face, we believe in the power of open-source collaboration. We are building foundational AI tools alongside a vibrant community of over **50,000 organizations** that trust our platform, including leading names like Amazon Web Services, Google, and Microsoft.

- **Open Source**: Engage with a variety of open-source libraries such as Transformers, Diffusers, and more, which have fostered significant contributions from users worldwide.

---

### Culture & Values

- **Collaboration**: We prioritize community efforts to ensure machine learning progresses through shared knowledge and contributions.
- **Innovation**: Hugging Face is dedicated to pushing the boundaries of what's possible in AI, encouraging creativity and cutting-edge research.
- **Supportive Environment**: We foster a culture of inclusiveness and ongoing learning, where every voice matters.

---

### Careers at Hugging Face
Join us on our mission to create a future driven by AI! We are always looking for talented individuals passionate about machine learning. 

- **Current Openings**: From research scientists to engineering roles, a variety of opportunities await those eager to innovate in the AI space.
- **Why Join Us?**: Enjoy a flexible work environment, competitive benefits, and the chance to work alongside industry-leading experts.

---

### Let's Connect
Become a part of the Hugging Face community:
- **Website**: [huggingface.co](https://huggingface.co)
- **Social Media**: Follow us on Twitter, LinkedIn, and Discord to stay connected.

---

Embark on your journey in the AI community with Hugging Face. Together, we can build the future!

#### Add a streaming response

In [33]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model= "gpt-4o-mini",
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream= True
    )

    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        response = response.replace("```", "").replace("markdown", "")
        update_display(Markdown(response), display_id=display_handle.display_id)
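The loop above deletes every occurrence of the word "markdown" and every triple backtick, including any that legitimately belong in the brochure text. A minimal sketch of a narrower cleanup that removes only one wrapping code fence; `strip_outer_fence` is a hypothetical helper, and it assumes the model wraps the whole reply in a single fence when it wraps it at all:

```python
FENCE = "`" * 3  # three backticks, built this way so the example never contains a literal fence


def strip_outer_fence(text):
    """Remove one wrapping Markdown code fence, leaving the rest of the text untouched."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        # Drops the opening fence line, with or without a language tag such as "markdown"
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines)


wrapped = FENCE + "markdown\n# Title\nWe render markdown here.\n" + FENCE
print(strip_outer_fence(wrapped))
```

Inside the streaming loop this could be applied to the accumulated `response` just before each `update_display` call; a partially received opening fence may still flash briefly until its line is complete.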
        

In [34]:
stream_brochure("HuggingFace", "https://huggingface.co")


Found links :  {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'documentation page', 'url': 'https://huggingface.co/docs'}]}


# Hugging Face Brochure

Welcome to **Hugging Face**, the AI community building the future. Our mission is to create a collaborative platform where the machine learning community can come together to contribute models, datasets, and applications.

## Company Overview

**Hugging Face** serves as a home for machine learning, offering resources for developers, researchers, and enterprises. Our platform hosts over 400,000 models and 100,000 datasets, making it a treasure cache for anyone interested in advancing their AI capabilities.

## What We Offer

- **Models**: Access to a diverse collection of state-of-the-art AI models across various domains.
- **Datasets**: Browse an extensive library with over 100k datasets tailored for machine learning tasks.
- **Spaces**: Collaborate and run ML applications seamlessly on our robust infrastructure.
- **Enterprise Solutions**: Custom solutions with enterprise-grade security and dedicated support.

### Community Engagement

We believe in the power of collaboration. Our open-source contributions allow users to:
- Build and share their ML projects publicly.
- Explore different modalities including text, image, video, audio, and even 3D.
- Create and enhance their personal ML portfolios.

## Who Uses Hugging Face?

More than **50,000 organizations** rely on Hugging Face, including industry giants like:
- Meta
- Amazon Web Services
- Google
- Microsoft
- Intel

Join a vibrant community that includes non-profits like **AI2** and innovative enterprises to drive the future of AI.

## Company Culture

At Hugging Face, we foster a culture of openness and innovation. Our community is built on transparency, collaboration, and a commitment to pushing the boundaries of what AI can achieve. We believe that great ideas come from everyone, and we encourage all members to share their insights and contributions.

### Careers at Hugging Face

We are always on the lookout for passionate individuals to join our diverse team. If you are enthusiastic about machine learning and want to contribute to a transformative field, check out our **[Jobs Page](https://huggingface.co/jobs)** for current openings and learn how you can be a part of our mission.

---

For more information, visit our website at [Hugging Face](https://huggingface.co), and join us in building the future of AI. 

**Connect with Us:**
- [GitHub](https://github.com/huggingface)
- [Twitter](https://twitter.com/huggingface)
- [LinkedIn](https://linkedin.com/company/huggingface)
- [Discord](https://discord.gg/huggingface)

Together, let’s accelerate the advancement of machine learning and facilitate a collective journey towards innovation in AI!