## BUSINESS CHALLENGE:

Create a product that builds a brochure for a company, for use with prospective clients, investors, and potential recruits.

We will be given the company name and its primary website.


In [1]:
# imports

import os
import requests
import json
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

load_dotenv()

True

In [2]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a website that we have scraped, including its links
    """

    def __init__(self, url):
        self.url = url
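        # A timeout keeps a slow or unresponsive site from hanging the notebook
        response = requests.get(url, headers=headers, timeout=10)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        # soup.title.string is None when <title> is empty or contains nested tags
        self.title = soup.title.string if soup.title and soup.title.string else "No title found"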
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"
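
The `links` attribute holds raw `href` values, so it can contain relative paths, duplicates, and non-web schemes such as `mailto:`. The notebook leaves it to the model to sort these out. As an optional pre-processing step, here is a minimal sketch that resolves them before prompting; `absolutize_links` is a hypothetical helper, not part of the original notebook.

```python
from urllib.parse import urljoin, urlparse

def absolutize_links(base_url, links):
    """Resolve hrefs against base_url; keep unique http(s) links in their original order."""
    seen = set()
    resolved = []
    for href in links:
        absolute = urljoin(base_url, href.strip())
        # Skip mailto:, javascript:, tel: and other non-web schemes
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in seen:
            seen.add(absolute)
            resolved.append(absolute)
    return resolved

# Example with made-up hrefs
print(absolutize_links("https://example.com/", ["/about", "careers", "mailto:hi@example.com", "/about"]))
```

Passing the resolved list to the model may shorten the prompt and reduce the chance of a malformed URL coming back, though the model still decides which links are relevant.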

## First step: Have GPT-4o-mini figure out which links are relevant

In [4]:
def get_links_user_prompt(website):
    return f"Here is the list of links on the website of {website.url} - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format.\n\nLinks (some might be relative links):\n{website.links}\n"

In [6]:
client = OpenAI()

def get_links(url):
    website = Website(url)
    
    with open("prompt.txt", "r", encoding="utf-8") as f:
        instructions = f.read()

    response = client.responses.create(
        model="gpt-4o-mini",
        instructions=instructions,
        input=get_links_user_prompt(website),
        text={"format": {"type": "json_object"}},
        store=False
    )
    return json.loads(response.output_text)
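
`get_links` returns whatever JSON the model produced. The contents of `prompt.txt` are not shown here, but the `Found links:` output printed later in this notebook suggests the shape `{"links": [{"type": ..., "url": ...}]}`. Assuming that shape, a small validation step could guard against malformed entries before any page is fetched; `extract_link_entries` is a hypothetical helper, not part of the original code.

```python
def extract_link_entries(payload):
    """Return well-formed {'type', 'url'} dicts from a parsed model response."""
    if not isinstance(payload, dict):
        return []
    links = payload.get("links")
    if not isinstance(links, list):
        return []
    entries = []
    for item in links:
        if not isinstance(item, dict):
            continue
        url = item.get("url")
        # Keep only absolute web URLs; relative or missing URLs are dropped
        if isinstance(url, str) and url.startswith(("http://", "https://")):
            entries.append({"type": str(item.get("type", "unknown")), "url": url})
    return entries

# Example with a hand-written payload in the assumed shape
sample = {"links": [
    {"type": "about page", "url": "https://huggingface.co/huggingface"},
    {"type": "broken", "url": "/relative/path"},
    {"type": "missing url"},
]}
print(extract_link_entries(sample))
```

Dropping bad entries silently is one design choice; logging or raising on them would be equally reasonable, depending on how strict the pipeline should be.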

## Second step: make the brochure!

In [8]:
def get_all_details(url):
    result = ""
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result

In [10]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
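
Because the truncation above cuts the concatenated prompt at 5,000 characters, pages fetched later may be dropped entirely. One alternative, sketched here under the assumption that every page should get an equal share, is to cap each section separately. `truncate_sections` and its `(title, text)` input format are hypothetical, not the notebook's approach.

```python
def truncate_sections(sections, total_budget=5_000):
    """Give each (title, text) section an equal share of total_budget characters."""
    if not sections:
        return ""
    per_section = total_budget // len(sections)
    parts = []
    for title, text in sections:
        block = f"\n\n{title}\n{text}"
        # Cut each section independently so no page is dropped outright
        parts.append(block[:per_section])
    return "".join(parts)

# Example with placeholder text
pages = [("home page", "a" * 100), ("about page", "b" * 100)]
print(len(truncate_sections(pages, total_budget=40)))
```

An equal split is the simplest policy; weighting the landing page more heavily would be a natural variation.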

In [26]:
def create_brochure(company_name, url):
    with open("prompt_2.txt", "r", encoding="utf-8") as f:
        instructions = f.read()

    response = client.responses.create(
        model="gpt-4o-mini",
        instructions=instructions,
        input=get_brochure_user_prompt(company_name, url),
        store=False
    )
    result = response.output_text
    display(Markdown(result))

In [27]:
create_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'home page', 'url': 'https://huggingface.co/'}, {'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog', 'url': 'https://huggingface.co/blog'}, {'type': 'discuss forum', 'url': 'https://discuss.huggingface.co'}, {'type': 'documentation', 'url': 'https://huggingface.co/docs'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Brochure

## **Welcome to Hugging Face**
### The AI Community Building the Future

Hugging Face is a revolutionary platform designed for the machine learning community to collaborate on models, datasets, and applications. With over 1 million models and 250,000 datasets, we empower developers and researchers to create, explore, and innovate in AI.

### **Our Mission**
Our mission is to democratize machine learning by building an accessible and open-source ecosystem. We strive to support the global AI community by providing powerful tools and resources to enhance creativity and efficiency in machine learning projects.

---

## **What We Offer**
### **Models & Datasets**
- **1M+ Models**: Discover, deploy, and collaborate on a vast library of ML models ranging from text, image, video, audio, to 3D.
- **250k+ Datasets**: Access and contribute to a diverse array of datasets for every ML task imaginable.

### **Spaces & Applications**
Explore innovative applications on our platform, with a collection of over 400,000 applications that include dialogue generation, character customization, and more.

### **Open Source Community**
Our initiatives focus on open-source tools such as:
- **Transformers**: State-of-the-art ML models for various frameworks.
- **Diffusers**: Cutting-edge diffusion models optimized for performance.
- **Accelerate & Tokenizers**: Robust libraries for efficient training and usage of ML models.

---

## **Customer Base**
Join a diverse community of over **50,000 organizations**, including tech giants like:
- **Meta**
- **Amazon**
- **Google**
- **Microsoft**

Our platform is tailored for enterprises, offering premium solutions with advanced security and dedicated support.

---

## **Company Culture**
At Hugging Face, we foster a friendly and inclusive environment where innovation thrives. Collaboration is at our core – we engage with our community through forums, discussions, and open feedback channels. We are committed to ensuring that everyone, from seasoned machine learning experts to newcomers, feels welcome and supported.

---

## **Careers at Hugging Face**
We are constantly on the lookout for talented individuals to join our team! If you are passionate about AI, machine learning, and want to contribute to a significant mission of democratizing technology, we encourage you to consider a career with us. Explore our [careers page](https://huggingface.co/jobs) for current openings.

---

## **Conclusion**
Join us at Hugging Face and be a part of the future of AI. Whether you’re looking to contribute your expertise, utilize powerful tools for your projects, or explore cutting-edge innovations, Hugging Face is the place to be. 

### **Connect with Us**
- [Twitter](https://twitter.com/huggingface)
- [LinkedIn](https://linkedin.com/company/huggingface)
- [Discord](https://discord.gg/huggingface)

**Let’s build the future together!**

## Finally - a minor improvement

With a small adjustment, we can stream the results back from OpenAI,
producing the familiar typewriter animation.

In [30]:
def stream_brochure(company_name, url):
    with open("prompt_2.txt", "r", encoding="utf-8") as f:
        instructions = f.read()

    stream = client.responses.create(
        model="gpt-4o-mini",
        instructions=instructions,
        input=get_brochure_user_prompt(company_name, url),
        store=False,
        stream=True,
    )

    # Initialize the display handle with an empty string
    display_handle = display(Markdown(""), display_id=True)
    output_text = ""

    for event in stream:
        if event.type == "response.output_text.delta":
            output_text += event.delta
            update_display(Markdown(output_text), display_id=display_handle.display_id)
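
The loop above re-renders the full accumulated text on every delta event and ignores all other event types. That accumulation logic can be tried without calling the API. The sketch below uses `SimpleNamespace` objects as stand-ins for stream events; they only mimic the `type` and `delta` attributes that the loop reads, and `accumulate_text` is a hypothetical helper.

```python
from types import SimpleNamespace

def accumulate_text(events):
    """Yield the running text after each output-text delta, skipping other event types."""
    text = ""
    for event in events:
        if event.type == "response.output_text.delta":
            text += event.delta
            yield text

# Stand-in events; a real stream carries more fields than these
fake_events = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="# Hugging"),
    SimpleNamespace(type="response.output_text.delta", delta=" Face"),
    SimpleNamespace(type="response.completed"),
]

for snapshot in accumulate_text(fake_events):
    print(repr(snapshot))
```

Each yielded snapshot corresponds to one `update_display` call in `stream_brochure`, which is what produces the typewriter effect.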

In [31]:
stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'home page', 'url': 'https://huggingface.co/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'discussion forum', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Brochure

## Welcome to Hugging Face

### The AI Community Building the Future

At Hugging Face, we are dedicated to creating a platform where the machine learning community can collaborate, innovate, and build the future of AI together. With over 1 million models and 250,000 datasets, our platform serves as the go-to resource for developers, researchers, and businesses worldwide.

---

### What We Offer

- **Models**: Access to a diverse range of models for natural language processing, computer vision, and more.
- **Datasets**: Explore a vast library of datasets tailored for various ML tasks.
- **Spaces**: Share and discover applications that use our tools and libraries.
- **Enterprise Solutions**: Tailored solutions for businesses, complete with enterprise-grade security, dedicated support, and access controls.

---

### Our Community

More than **50,000 organizations**, including industry giants like Google, Microsoft, Amazon, and Meta, utilize Hugging Face to enhance their machine learning capabilities. Our open-source projects, like Transformers and Diffusers, empower developers to innovate rapidly and efficiently.

**Key Customers**:
- AI at Meta
- Amazon
- Google
- Microsoft
- Grammarly

---

### Company Culture

At Hugging Face, we believe in the power of collaboration and inclusivity. Our community-centric culture champions open-source principles, knowledge sharing, and transparency. We foster an environment where creativity can thrive, allowing each member to contribute to the rapidly evolving field of AI.

- **Innovation-Driven**: Encouraging creative solutions and new ideas.
- **Collaborative Spirit**: Supporting teamwork within and beyond our team.
- **Inclusive Environment**: Valuing diversity and welcoming all voices.

---

### Careers at Hugging Face

Join us in our mission to democratize AI! We are always seeking talented individuals who are passionate about machine learning and software development. At Hugging Face, you can expect:

- Opportunities for growth and skill development.
- A chance to collaborate with industry leaders and innovators.
- A flexible work environment that promotes work-life balance.

**Visit our [careers page](#) to explore current job openings!**

---

### Connect With Us

Stay updated with our latest features, projects, and community news:

- [GitHub](https://github.com/huggingface)
- [Twitter](https://twitter.com/huggingface)
- [LinkedIn](https://www.linkedin.com/company/hugging-face)
- [Discord](https://discord.gg/huggingface)

Join the Hugging Face community today and help shape the future of AI!