## BUSINESS CHALLENGE:
#### Create a product that builds a brochure for a company, aimed at prospective clients, investors and potential recruits, in English, Spanish and Hindi, with an optional jokey tone.

#### We will be provided a company name and their primary website.

In [1]:
# imports
# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [2]:
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [3]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers, timeout=10)  # timeout so a slow site cannot hang the notebook
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

## First step: Have GPT-4o-mini figure out which links are relevant

### Use a call to gpt-4o-mini to read the links on a webpage, and respond in structured JSON.  
It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".  
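We will use "one-shot prompting", in which the prompt includes an example of how the model should respond. The `get_links` function further down extends this to multi-shot prompting by adding two worked examples to the message history.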

This is an excellent use case for an LLM, because it requires nuanced understanding. Imagine trying to code this without LLMs by parsing and analyzing the webpage - it would be very hard!
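
The relevance judgement is the part that needs a model. The mechanical part, expanding relative links such as "/about" into full URLs, can also be done deterministically before or after the model call. The sketch below is one way to do that with the standard library; `resolve_links` is a hypothetical helper that is not part of the original notebook.

```python
from urllib.parse import urljoin, urlparse

def resolve_links(base_url, links):
    """Turn relative hrefs into absolute URLs and drop non-web links."""
    resolved = []
    for href in links:
        absolute = urljoin(base_url, href)
        # Keep only http(s) URLs; this drops mailto:, tel:, javascript: and similar.
        if urlparse(absolute).scheme in ("http", "https"):
            resolved.append(absolute)
    return resolved

example = resolve_links(
    "https://company.com",
    ["/about", "careers", "mailto:info@company.com", "https://other.example/jobs"],
)
print(example)
```

Pre-resolving links this way would shorten the model's job to classification only, at the cost of one extra step in the pipeline.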

Sidenote: there is a more advanced technique called "Structured Outputs" in which we require the model to respond according to a spec. We cover this technique in Week 8 during our autonomous Agentic AI project.

In [5]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [6]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [7]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [8]:
def get_links(url):
    website = Website(url)
    current_prompt = get_links_user_prompt(website)

    # Multi-shot prompt
    examples = [
        {
            "user": "Links: https://companyx.com, https://companyx.com/about, https://companyx.com/careers",
            "assistant": json.dumps({
                "links": [
                    {"type": "about page", "url": "https://companyx.com/about"},
                    {"type": "careers page", "url": "https://companyx.com/careers"}
                ]
            })
        },
        {
            "user": "Links: https://site.org, https://site.org/our-story, https://site.org/jobs, https://site.org/contact",
            "assistant": json.dumps({
                "links": [
                    {"type": "about page", "url": "https://site.org/our-story"},
                    {"type": "careers page", "url": "https://site.org/jobs"}
                ]
            })
        }
    ]

    # Construct the conversation with multi-shot examples
    messages = [{"role": "system", "content": link_system_prompt}]
    for ex in examples:
        messages.append({"role": "user", "content": ex["user"]})
        messages.append({"role": "assistant", "content": ex["assistant"]})

    # Add real user prompt
    messages.append({"role": "user", "content": current_prompt})

    # Call the model
    response = openai.chat.completions.create(
        model=MODEL,
        messages=messages,
        response_format={"type": "json_object"}
    )
    
    result = response.choices[0].message.content
    return json.loads(result)
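
`get_links` trusts that the parsed JSON has the expected shape. JSON mode guarantees valid JSON but not particular keys, so a defensive check before fetching each URL may be worthwhile. The following is a minimal sketch under that assumption; `validate_links` is a hypothetical helper, not part of the original notebook.

```python
def validate_links(payload):
    """Return only well-formed link entries from a parsed model response."""
    if not isinstance(payload, dict):
        return []
    entries = payload.get("links", [])
    if not isinstance(entries, list):
        return []
    clean = []
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        url = entry.get("url")
        link_type = entry.get("type")
        # Accept only entries with a string type and an absolute http(s) URL.
        if (isinstance(link_type, str) and isinstance(url, str)
                and url.startswith(("http://", "https://"))):
            clean.append({"type": link_type, "url": url})
    return clean

sample = {"links": [
    {"type": "about page", "url": "https://companyx.com/about"},
    {"type": "careers page", "url": "/careers"},   # relative URL, rejected
    {"url": "https://companyx.com/blog"},          # missing type, rejected
]}
print(validate_links(sample))
```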


## Second step: make the brochure!

Assemble all the details into another prompt to GPT-4o-mini.

In [10]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result

In [11]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."

In [12]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
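
The hard slice at 5,000 characters can cut a word or a line in half. One possible refinement, sketched here as an assumption rather than as part of the original notebook, is to cut at the last newline before the limit. `truncate_at_line` is a hypothetical helper name.

```python
def truncate_at_line(text, limit=5_000):
    """Cut text to at most `limit` characters, preferring the last newline before the limit."""
    if len(text) <= limit:
        return text
    cut = text.rfind("\n", 0, limit)
    # Fall back to a hard cut if there is no usable newline inside the window.
    return text[:cut] if cut > 0 else text[:limit]

print(repr(truncate_at_line("abc\ndef\nghi", limit=9)))
```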

In [13]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))    
    return result

In [14]:
def translate_to_spanish(text):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "Translate the following markdown brochure to Spanish."},
            {"role": "user", "content": text}
        ]
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [16]:
def translate_to_Hindi(text):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "Translate the following markdown brochure to Hindi."},
            {"role": "user", "content": text}
        ]
    )
    result = response.choices[0].message.content
    display(Markdown(result))
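
The two translation functions differ only in the target language named in the system prompt. A minimal sketch of how the message list could be built once and parameterized by language follows; `build_translation_messages` is a hypothetical helper, and the result would still be passed to `openai.chat.completions.create` as in the functions above.

```python
def build_translation_messages(text, language):
    """Build the chat messages for translating a markdown brochure into `language`."""
    return [
        {"role": "system", "content": f"Translate the following markdown brochure to {language}."},
        {"role": "user", "content": text},
    ]

for language in ("Spanish", "Hindi"):
    messages = build_translation_messages("# Hugging Face Brochure", language)
    print(messages[0]["content"])
```

With this shape, supporting another language would only require adding its name to the tuple.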

In [17]:
# --- EXECUTION ---
brochure_md = create_brochure("Hugging Face", "https://huggingface.co")
translate_to_spanish(brochure_md)
translate_to_Hindi(brochure_md)

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'documentation page', 'url': 'https://huggingface.co/docs'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Brochure

## About Us
Welcome to **Hugging Face**, the heart of the AI community driving innovation for the future. We are a collaborative platform where machine learning enthusiasts and professionals come together to explore, create, and innovate. With over **1 million models** and **250,000 datasets**, we provide everything you need to unleash the power of AI.

## Our Offerings

### Models & Datasets
- **1M+ Models**: A revolutionary library where you can browse trending machine learning models like Nanonets-OCRs and Google Magenta.
- **250k+ Datasets**: Discover extensive datasets for various machine learning tasks.

### Spaces
Hugging Face Spaces is where creativity unfolds! Host, run, and showcase applications ranging from high-resolution 3D model generation to powerful code generation from text prompts.

### Enterprise Solutions
We offer enterprise-grade solutions to help teams accelerate their AI initiatives with dedicated support, access controls, and robust security. Starting at **$20/user/month**, our plans cater to businesses of all sizes.

## Our Community
With more than **50,000 organizations** relying on Hugging Face, including tech giants like Google, Microsoft, and Amazon, our community has an established network of users and contributors dedicated to advancing AI.

### Engaged Collaborators
- **Non-Profits**: Such as AI2, promoting ethical AI development.
- **Technology Leaders**: Collaborations with organizations like Meta, Grammarly, and Intel emphasize our commitment to innovation and quality.

## Company Culture
At Hugging Face, we pride ourselves on fostering a vibrant and inclusive company culture that values collaboration, creativity, and transparency. Our team is passionate about technological advancement and is dedicated to building open-source tools that benefit everyone in the AI ecosystem.

### Career Opportunities
Join us in our mission to revolutionize AI development! We are always looking for talent across various fields. Explore our current openings and become part of a dynamic team pushing the boundaries. 

## Get Involved
- **Explore AI Apps**: Jump right into machine learning by exploring our curated AI applications.
- **Sign Up**: Create your profile and begin building, sharing, and collaborating on your own ML projects.

For further information, visit our website and connect on social media platforms like GitHub, Twitter, LinkedIn, and Discord.

---

**Hugging Face: The AI Community Building the Future.**

# Folleto de Hugging Face

## Sobre Nosotros
Bienvenido a **Hugging Face**, el corazón de la comunidad de IA que impulsa la innovación hacia el futuro. Somos una plataforma colaborativa donde entusiastas y profesionales del aprendizaje automático se reúnen para explorar, crear e innovar. Con más de **1 millón de modelos** y **250,000 conjuntos de datos**, proporcionamos todo lo que necesitas para liberar el poder de la IA.

## Nuestras Ofertas

### Modelos y Conjuntos de Datos
- **Más de 1M de Modelos**: Una biblioteca revolucionaria donde puedes explorar modelos de aprendizaje automático de tendencia como Nanonets-OCRs y Google Magenta.
- **Más de 250k Conjuntos de Datos**: Descubre conjuntos de datos extensos para diversas tareas de aprendizaje automático.

### Espacios
¡Hugging Face Spaces es donde se despliega la creatividad! Aloja, ejecuta y muestra aplicaciones que van desde la generación de modelos 3D de alta resolución hasta la generación de código potente a partir de indicaciones de texto.

### Soluciones Empresariales
Ofrecemos soluciones de nivel empresarial para ayudar a los equipos a acelerar sus iniciativas de IA con soporte dedicado, controles de acceso y seguridad robusta. A partir de **$20/usuario/mes**, nuestros planes se adaptan a empresas de todos los tamaños.

## Nuestra Comunidad
Con más de **50,000 organizaciones** confiando en Hugging Face, incluidas gigantes tecnológicos como Google, Microsoft y Amazon, nuestra comunidad cuenta con una red establecida de usuarios y contribuidores dedicados a avanzar en la IA.

### Colaboradores Comprometidos
- **Organizaciones sin Fines de Lucro**: Como AI2, promoviendo el desarrollo ético de la IA.
- **Líderes Tecnológicos**: Colaboraciones con organizaciones como Meta, Grammarly e Intel enfatizan nuestro compromiso con la innovación y la calidad.

## Cultura Empresarial
En Hugging Face, nos enorgullecemos de fomentar una cultura empresarial vibrante e inclusiva que valora la colaboración, la creatividad y la transparencia. Nuestro equipo es apasionado por el avance tecnológico y está dedicado a construir herramientas de código abierto que beneficien a todos en el ecosistema de IA.

### Oportunidades de Carrera
¡Únete a nosotros en nuestra misión de revolucionar el desarrollo de la IA! Siempre estamos buscando talento en diversos campos. Explora nuestras ofertas actuales y conviértete en parte de un equipo dinámico que está empujando los límites.

## Involúcrate
- **Explora Aplicaciones de IA**: Sumérgete en el aprendizaje automático explorando nuestras aplicaciones de IA seleccionadas.
- **Regístrate**: Crea tu perfil y comienza a construir, compartir y colaborar en tus propios proyectos de ML.

Para más información, visita nuestro sitio web y conéctate en plataformas de redes sociales como GitHub, Twitter, LinkedIn y Discord.

---

**Hugging Face: La Comunidad de IA Construyendo el Futuro.**

# हगिंग फेस ब्रोशर

## हमारे बारे में
**हगिंग फेस** में आपका स्वागत है, जो AI समुदाय के दिल में है और भविष्य के लिए नवाचार को बढ़ावा देता है। हम एक सहयोगी मंच हैं जहां मशीन लर्निंग के शौकीन और पेशेवर एक साथ आकर खोजते हैं, बनाते हैं और नवोन्मेष करते हैं। **1 मिलियन से अधिक मॉडल** और **250,000 डेटा सेट** के साथ, हम AI की शक्ति को उजागर करने के लिए आवश्यक सभी चीजें प्रदान करते हैं।

## हमारी पेशकशें

### मॉडल और डेटा सेट
- **1M+ मॉडल**: एक क्रांतिकारी पुस्तकालय जहां आप Nanonets-OCRs और Google Magenta जैसे ट्रेंडिंग मशीन लर्निंग मॉडलों को ब्राउज़ कर सकते हैं।
- **250k+ डेटा सेट**: विभिन्न मशीन लर्निंग कार्यों के लिए विशाल डेटा सेट खोजें।

### स्पेस
हगिंग फेस स्पेस वह जगह है जहां रचनात्मकता विकसित होती है! उच्च-रिज़ॉल्यूशन 3D मॉडल जनरेशन से लेकर टेक्स्ट प्रम्पट्स से कोड जनरेशन तक के APPLICATIONS को होस्ट, रन और प्रदर्शित करें।

### एंटरप्राइज समाधान
हम एंटरप्राइज-ग्रेड समाधान प्रदान करते हैं जो टीमों को समर्पित समर्थन, पहुंच नियंत्रण और मजबूत सुरक्षा के साथ अपने AI पहलों को तेज करने में मदद करते हैं। **$20/उपयोगकर्ता/महीने** से शुरू होने वाली हमारी योजनाएं सभी आकार के व्यवसायों के लिए हैं।

## हमारा समुदाय
**50,000 से अधिक संगठन** हगिंग फेस पर निर्भर हैं, जिनमें Google, Microsoft और Amazon जैसे तकनीकी दिग्गज शामिल हैं। हमारा समुदाय उपयोगकर्ताओं और योगदानकर्ताओं का एक स्थापित नेटवर्क है जो AI को उन्नत करने के लिए समर्पित है।

### संलग्न सहयोगी
- **गैर-लाभकारी संगठन**: जैसे AI2, नैतिक AI विकास को बढ़ावा देता है।
- **तकनीकी नेता**: Meta, Grammarly, और Intel जैसी संगठनों के साथ सहयोग हमारे नवाचार और गुणवत्ता के प्रति समर्पण को उजागर करता है।

## कंपनी संस्कृति
हगिंग फेस में, हम एक जीवंत और समावेशी कंपनी संस्कृति को बढ़ावा देने पर गर्व करते हैं जो सहयोग, रचनात्मकता और पारदर्शिता को महत्व देती है। हमारी टीम तकनीकी उन्नति के प्रति उत्साही है और AI पारिस्थितिकी तंत्र में सभी के लिए फायदेमंद ओपन-सोर्स टूल बनाने के लिए समर्पित है।

### करियर के अवसर
AI विकास में क्रांति लाने के हमारे मिशन में हमारे साथ शामिल हों! हम विभिन्न क्षेत्रों में प्रतिभा की हमेशा तलाश करते हैं। हमारे वर्तमान पदों को खोजें और एक गतिशील टीम का हिस्सा बनें जो सीमाओं को खींच रही है।

## संलग्न हों
- **AI ऐप्स का अन्वेषण करें**: हमारे क्यूरेटेड AI एप्लिकेशन को खोजकर मशीन लर्निंग में तुरंत कूदें।
- **साइन अप करें**: अपने प्रोफ़ाइल को बनाएँ और अपने स्वयं के ML प्रोजेक्ट्स पर निर्माण, साझा करने और सहयोग करना शुरू करें।

अधिक जानकारी के लिए, हमारी वेबसाइट पर जाएं और GitHub, Twitter, LinkedIn और Discord जैसे सोशल मीडिया प्लेटफार्मों पर जुड़ें।

---

**हगिंग फेस: AI समुदाय जो भविष्य का निर्माण कर रहा है।**