This program creates a brochure from a company website in a language of your choice. The cells below walk through the functions involved.

In [12]:
# imports
# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [13]:
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [14]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

*First Step*
Use OpenAI to pick the links that are relevant enough to include in the brochure.

In [15]:
# SYSTEM PROMPT
# One-shot prompting: the system prompt includes a single example of the expected JSON reply
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [16]:
# USER PROMPT function
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt
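
The user prompt above warns the model that some links may be relative. As an optional pre-processing step, the links could be resolved against the page URL before they are sent. The following is a minimal sketch under that assumption; `normalize_links` is a hypothetical helper that is not part of the original notebook, and it uses only the standard library `urllib.parse` module.

```python
from urllib.parse import urljoin, urldefrag, urlparse

def normalize_links(base_url, links):
    """Hypothetical helper: resolve relative links and keep unique http(s) URLs."""
    seen = set()
    result = []
    for link in links:
        # Resolve against the page URL, then drop any #fragment
        absolute, _fragment = urldefrag(urljoin(base_url, link.strip()))
        # Skip mailto:, javascript:, tel: and other non-web schemes
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute in seen:
            continue
        seen.add(absolute)
        result.append(absolute)
    return result

sample_links = [
    "/iii/about/",
    "mailto:info@example.com",
    "https://www.cmu.edu/iii/about/",
    "about/news/index.html#x",
]
print(normalize_links("https://www.cmu.edu/iii/", sample_links))
```

Passing the normalized list to the prompt would also shorten it, since duplicates and e-mail links are removed before the model sees them.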

In [17]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)

In [None]:
get_links("https://www.cmu.edu/iii/")

*Second Step*
Fetch the contents of the landing page and of every link selected in the first step.

In [21]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result
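
`get_all_details` trusts that the model's reply has a `"links"` list whose entries all carry `"type"` and `"url"` keys. A model can occasionally return something slightly different, so one option is to filter the parsed reply first. The sketch below assumes that approach; `extract_links` is a hypothetical helper, not part of the original notebook.

```python
def extract_links(payload):
    """Hypothetical helper: keep only well-formed {"type", "url"} entries."""
    if not isinstance(payload, dict):
        return []
    entries = payload.get("links", [])
    if not isinstance(entries, list):
        return []
    valid = []
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        url = entry.get("url")
        # Only absolute http(s) URLs can be fetched directly by Website(url)
        if isinstance(url, str) and url.startswith(("http://", "https://")):
            valid.append({"type": str(entry.get("type", "page")), "url": url})
    return valid

reply = {
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "missing url"},
        {"url": "/relative/path"},
    ]
}
print(extract_links(reply))
```

With such a filter in place, the loop in `get_all_details` could iterate over `extract_links(links)` instead of `links["links"]`.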

In [None]:
*Third Step*
Call OpenAI to create the brochure from the collected details.

In [22]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

In [28]:
def get_brochure_user_prompt(company_name, url, language):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += f"\n\nBecause the audience comes from a different region, you must write this brochure in {language} so that readers from that region can understand it.\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
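
The function above truncates the whole prompt to 5,000 characters. Because the instructions come first, they survive and only the scraped text is cut. A sketch that makes this budget explicit is shown below; `build_prompt` and its `max_chars` parameter are hypothetical names used for illustration only.

```python
def build_prompt(instructions, details, max_chars=5_000):
    """Hypothetical helper: keep the instructions whole, truncate only the details."""
    # Characters left for scraped page text once the instructions are counted
    budget = max(max_chars - len(instructions), 0)
    return instructions + details[:budget]

instructions = "You are looking at a company called: Example\n"
details = "Landing page:\n" + "x" * 10_000
prompt = build_prompt(instructions, details)
print(len(prompt))
```

When the instructions are shorter than the limit, the result is the same as slicing the combined string; the difference only appears if the instructions alone exceed the limit, in which case they are kept intact.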

In [None]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co", "English")

In [30]:
# ordinary prompt
def create_brochure(company_name, url, language):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url, language)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [31]:
# with stream ON
def stream_brochure(company_name, url, language):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url, language)}
        ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
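        # delta.content can be None (for example on the final chunk), so fall back to ''
        response += chunk.choices[0].delta.content or ''
        # Strip code fences and the word "markdown" so the text renders as Markdown, not as a code block
        response = response.replace("```","").replace("markdown", "")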
        update_display(Markdown(response), display_id=display_handle.display_id)
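
The streaming loop above grows the full response chunk by chunk and redraws it each time. To see that accumulation pattern without calling the API, here is a small sketch that uses stand-in objects; `fake_stream` and `accumulate` are hypothetical names, and the `SimpleNamespace` objects only imitate the `chunk.choices[0].delta.content` shape used in the loop.

```python
from types import SimpleNamespace

def fake_stream(pieces):
    """Yield objects shaped like streamed chunks (an imitation, not the real client)."""
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

def accumulate(stream):
    """Collect the streamed text and record each intermediate state."""
    response = ""
    snapshots = []
    for chunk in stream:
        if not chunk.choices:
            continue  # defensively skip chunks that carry no choices
        response += chunk.choices[0].delta.content or ""
        snapshots.append(response)  # what would be redrawn at this step
    return response, snapshots

text, snapshots = accumulate(fake_stream(["# Bro", "chure", None, "\nHello"]))
print(text)
```

Each entry in `snapshots` corresponds to one `update_display` call in the real loop, which is why the brochure appears to be typed out progressively.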

In [36]:
stream_brochure("MIIPS", "https://www.cmu.edu/iii/graduate-programs/miips/index.html", "Tagalog")

Found links: {'links': [{'type': 'about page', 'url': 'https://www.cmu.edu/iii/about/news/index.html'}, {'type': 'graduate programs page', 'url': 'https://www.cmu.edu/iii/graduate-programs/miips.html#toolkit'}, {'type': 'admissions page', 'url': 'https://www.cmu.edu/iii/admissions/miips.html#current-students'}, {'type': 'careers page', 'url': 'https://www.cmu.edu/iii/about/news/2023/alumni-spotlight-product-manager-internal-systems.html'}, {'type': 'careers page', 'url': 'https://www.cmu.edu/iii/about/news/2022/alumni-spotlight-manchit-rajani-service-design-banking.html'}, {'type': 'news page', 'url': 'https://www.cmu.edu/iii/about/news/2023/changemaker-design-thinking-pittsburgh.html'}, {'type': 'news page', 'url': 'https://www.cmu.edu/iii/about/news/2023/healthcare-innovation-commercialization.html'}]}



# MIIPS Brochure

## Tungkol sa MIIPS
Ang **Master of Integrated Innovation for Products & Services (MIIPS)** mula sa **Integrated Innovation Institute ng Carnegie Mellon University** ay isang programa na naglalayong sanayin ang susunod na henerasyon ng mga inobador, disruptor, at mga tagapagbago sa mundo. Sa programang ito, matututuhan mong harapin ang pinakamalaking hamon ng industriya at lipunan kasabay ng pagsasanib ng teknolohiya at sangkatauhan.

## Kultura ng Kumpanya
Sa MIIPS, ang inobasyon ay nagmumula sa mga tao, hindi sa mga produkto. Ang bawat estudyante ay nakikipagtulungan kasama ang mga pinakamahusay na inhinyero, propesyonal sa negosyo, at mga designer upang lumampas sa kanilang mga disiplina at makipagtulungan sa iba’t ibang larangan. Ang aming layunin ay bumuo ng mga one-of-a-kind solutions na may positibong epekto sa komunidad.

## Mga Kliyente
Ang mga graduate ng MIIPS ay tinatanggap sa mga nangungunang kumpanya sa iba’t ibang industriya, kabilang ang:
- Disney
- JP Morgan Chase & Co
- Canon
- Honda
- Volvo

## Mga Karera at Oportunidad
Ang MIIPS ay nag-aalok ng dalawang format ng programa:
- **9 Buwan na Format:** Para sa mga may karanasang propesyonal, nagsisimula ng Agosto at nagtatapos ng Mayo.
- **16 Buwan na Format:** Para sa mga nagnanais ng karagdagang karanasan sa trabaho, nagsisimula ng Agosto at may summer internship.

### Mga Resulta sa Karera
- **Median Base Salary:** $112,000
- **88% ng mga nagtapos ay umamin ng trabaho sa loob ng anim na buwan pagkatapos ng graduation.**

### Mga Posibleng Posisyon
- Product Manager
- UX Designer
- Innovation Engineer
- Mechanical Design Engineer
- Senior Product Analyst

## Mag-apply Ngayon!
Huwag palampasin ang pagkakataong maging bahagi ng makabagong programang ito. Magsimula ng iyong aplikasyon para sa MIIPS bago ang **January 20, 2025** at maging bahagi ng isang malakas na network ng mga inobador na may kakayahang magdulot ng tunay na pagbabago.

### Makipag-ugnayan
Ikaw ba ay handa nang itulak ang hangganan ng mga posibilidad? Kumunekta sa kasalukuyang MIIPS na estudyante, dumalo sa isang virtual info session, o humiling ng higit pang impormasyon sa aming website!

**Integrated Innovation Institute - Carnegie Mellon University**

