### BUSINESS CHALLENGE:

Create a product that builds a brochure for a company, for use with prospective clients, investors and potential recruits. We are given a company name and its primary website.

In [1]:
# imports

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [2]:
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
# Specify OpenAI model
MODEL = "gpt-5-nano"

openai = OpenAI()

API key looks good so far


In [3]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        # A timeout keeps one slow or unresponsive site from hanging the notebook
        response = requests.get(url, headers=headers, timeout=10)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        
        # Fetch all links from the website
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"
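`Website` relies on BeautifulSoup to pull out three things: the page title, the visible text (minus scripts and styles) and every `href`. The same idea can be sketched with only the standard library's `html.parser.HTMLParser`, which can help when debugging what a page actually yields. `PageSketch` is a hypothetical name, and this is an approximation: it does not restrict text to `<body>` and lacks BeautifulSoup's error recovery.

```python
from html.parser import HTMLParser


class PageSketch(HTMLParser):
    """Collect the title, visible text and hrefs of an HTML page (stdlib only)."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_parts = []
        self.links = []
        self._open_tags = []  # currently open tags, used to detect <title>/<script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip <a> tags without an href, as Website does
                self.links.append(href)
        self._open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags such as <img>
        if tag in self._open_tags:
            while self._open_tags and self._open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        data = data.strip()
        if not data:
            return
        if "title" in self._open_tags:
            self.title += data
        elif not self.SKIP.intersection(self._open_tags):
            self.text_parts.append(data)

    @property
    def text(self):
        return "\n".join(self.text_parts)
```

Usage: `page = PageSketch(); page.feed(html_string)`, then read `page.title`, `page.text` and `page.links`.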

In [31]:
test_url = "https://cnn.com"
site_obj = Website(test_url)

# Print links
#site_obj.links

## Step 1: Have GPT figure out which links are relevant

Use a call to the model set in `MODEL` to read the links on a webpage and respond in structured JSON.
It should decide which links are relevant and replace relative links such as "/about" with "https://company.com/about".
We will use "one-shot prompting", in which we provide an example of how it should respond in the prompt.
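Turning "/about" into "https://company.com/about" is deterministic, so one option is to do it in Python before the links reach the model, leaving the LLM only the judgment call of which links matter. A minimal sketch using the standard library's `urllib.parse.urljoin`; `normalize_links` is a hypothetical helper, not part of this notebook.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def normalize_links(base_url, hrefs):
    """Resolve relative hrefs against base_url, drop non-web schemes and duplicates."""
    seen = set()
    result = []
    for href in hrefs:
        # urljoin resolves "/about" or "careers"; urldefrag drops "#section" anchors
        absolute, _fragment = urldefrag(urljoin(base_url, href.strip()))
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skips mailto:, tel:, javascript: and similar
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

For example, `normalize_links(site_obj.url, site_obj.links)` would give an absolute, de-duplicated list to join into the user prompt.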

In [6]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [7]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [8]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, and respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, or email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt
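The prompt asks the model to ignore Terms of Service, privacy and email links. A cheap complement is to drop such links before building the prompt, which also makes it shorter and cheaper. A sketch; `prefilter_links` and the keyword list are assumptions to tune per site, and a plain substring match can over-filter (any path containing "legal" is dropped).

```python
EXCLUDED_KEYWORDS = ("terms", "privacy", "cookie", "legal")  # assumed list, not from the notebook


def prefilter_links(hrefs, excluded=EXCLUDED_KEYWORDS):
    """Drop links the prompt tells the model to ignore, before sending them."""
    kept = []
    for href in hrefs:
        lowered = href.lower()
        if lowered.startswith("mailto:"):
            continue  # email links
        if any(word in lowered for word in excluded):
            continue  # legal / policy pages
        kept.append(href)
    return kept
```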

In [11]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
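`response_format={"type": "json_object"}` asks for syntactically valid JSON, but it does not enforce the `links` key or the shape of each entry, and the later loop indexes `link['type']` and `link["url"]` directly. One defensive option is to validate the reply before using it. A sketch; `parse_links_response` is a hypothetical stand-in for the bare `json.loads(result)` call.

```python
import json


def parse_links_response(raw):
    """Parse the model's JSON reply and keep only well-formed {"type", "url"} entries."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}  # treat an unparseable reply as "no relevant links"
    links = data.get("links", []) if isinstance(data, dict) else []
    if not isinstance(links, list):
        return {"links": []}
    clean = [
        {"type": str(item["type"]), "url": item["url"]}
        for item in links
        if isinstance(item, dict)
        and "type" in item
        and isinstance(item.get("url"), str)
        and item["url"].startswith(("http://", "https://"))  # relative URLs cannot be fetched
    ]
    return {"links": clean}
```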

In [26]:
# Get the most relevant links
get_links(test_url)

{'links': [{'type': 'about page', 'url': 'https://edition.cnn.com/about'},
  {'type': 'company profile', 'url': 'https://edition.cnn.com/profiles'},
  {'type': 'leadership',
   'url': 'https://edition.cnn.com/profiles/cnn-leadership'},
  {'type': 'careers', 'url': 'https://careers.wbd.com/cnnjobs'}]}

## Step 2: Make the brochure!

Assemble all the details into another prompt to the model set in `MODEL`.

In [27]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    
    # Get the relevant links identified by the LLM
    links = get_links(url)
    print("Found links:", links)

    # Fetch the contents of each relevant page and append it under its link type
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result
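As written, a single unreachable page (a 404, a timeout, a site that blocks scrapers) raises inside the loop and aborts the whole brochure. Note also that the landing page is fetched twice, once here and once inside `get_links`. A sketch of a more forgiving loop; `assemble_details` and its `fetch_contents` parameter are hypothetical, and the fetcher is injected so the logic can be tried without network access.

```python
def assemble_details(landing_contents, links, fetch_contents):
    """Concatenate the landing page and each linked page, skipping pages that fail.

    fetch_contents is any callable url -> str, e.g. lambda u: Website(u).get_contents()
    """
    result = "Landing page:\n" + landing_contents
    for link in links.get("links", []):
        try:
            page = fetch_contents(link["url"])
        except Exception as exc:  # network errors, timeouts, parse failures
            print(f"Skipping {link['url']}: {exc}")
            continue
        result += f"\n\n{link['type']}\n" + page
    return result
```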

In [29]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."
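Rather than keeping two near-identical prompts and commenting one out, the tone can be a parameter. A small sketch; `BASE_PROMPT` and `build_system_prompt` are hypothetical names. Writing the prompt as adjacent string literals inside parentheses also avoids backslash continuations, where a missing trailing space silently fuses two sentences.

```python
BASE_PROMPT = (
    "You are an assistant that analyzes the contents of several relevant pages from a company website "
    "and creates a short {tone}brochure about the company for prospective customers, investors and recruits. "
    "Respond in markdown. "
    "Include details of company culture, customers and careers/jobs if you have the information."
)


def build_system_prompt(tone=""):
    """Return the brochure system prompt, optionally with a tone such as 'humorous, entertaining, jokey'."""
    tone_text = f"{tone.strip()} " if tone.strip() else ""
    return BASE_PROMPT.format(tone=tone_text)
```

Usage: `system_prompt = build_system_prompt("humorous, entertaining, jokey")`.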


In [32]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate to at most 5,000 characters to limit cost
    return user_prompt
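`user_prompt[:5_000]` keeps only the first 5,000 characters of the combined text, so pages appended last may be dropped entirely and the cut can land mid-word. A slightly gentler sketch prefers to stop at a line break; `truncate_prompt` is hypothetical, and characters are only a rough proxy for tokens, which is what the API bills.

```python
def truncate_prompt(text, limit=5_000):
    """Cut text to at most `limit` characters, preferring to stop at a line break."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    last_newline = cut.rfind("\n")
    # Back off to the newline only if that keeps more than half the budget; otherwise hard-cut
    return cut[:last_newline] if last_newline > limit // 2 else cut
```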

In [33]:
brochure_user_prompt = get_brochure_user_prompt("CNN", test_url)

Found links: {'links': [{'type': 'about page', 'url': 'https://edition.cnn.com/about'}, {'type': 'careers page', 'url': 'https://careers.wbd.com/cnnjobs'}]}


In [22]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [34]:
create_brochure("CNN", test_url)

Found links: {'links': [{'type': 'about page', 'url': 'https://edition.cnn.com/about'}, {'type': 'company profile', 'url': 'https://edition.cnn.com/profiles'}, {'type': 'leadership', 'url': 'https://edition.cnn.com/profiles/cnn-leadership'}, {'type': 'careers page', 'url': 'https://careers.wbd.com/cnnjobs'}]}


# CNN Company Brochure

## About CNN

CNN (Cable News Network) is a premier global news organization committed to delivering breaking news, in-depth analysis, and comprehensive coverage of international and domestic affairs. As a trusted source of timely information, CNN serves millions of viewers worldwide across numerous digital platforms, including live TV, website, and mobile applications.

---

## Our Values

- **Accuracy & Integrity:** We prioritize factual reporting and journalistic ethics.
- **Inclusivity & Diversity:** We aim to present diverse perspectives and voices.
- **Innovation:** We continually adapt with cutting-edge technology to improve how news is delivered.
- **Responsiveness:** We value viewer feedback to enhance the quality and relevance of our content.

---

## Company Culture

At CNN, we foster a dynamic, collaborative, and innovative work environment. Our teams are passionate about journalism and committed to upholding the highest standards of integrity. We encourage creativity, continuous learning, and respectful engagement across all our departments. CNN values employee feedback and strives to create an inclusive culture that supports diversity and professional growth.

---

## Our Global Reach & Audience

CNN's global presence includes coverage of major topics such as politics, health, science, climate change, and entertainment. Whether breaking news from Ukraine, Gaza, or the latest on the US elections, CNN provides timely updates and in-depth analysis to an international audience. The platform also offers various multimedia content including videos, podcasts, and live TV to cater to diverse viewer preferences.

---

## Careers & Opportunities

Join CNN and become part of a world-class team dedicated to impactful journalism. We offer a variety of career opportunities for talented professionals in journalism, technology, production, digital media, and more. CNN promotes a culture of innovation and excellence, providing a stimulating environment for growth and development.

### Why Work with CNN?
- Be part of a global leader in news and media.
- Collaborate with passionate, talented colleagues.
- Contribute to impactful journalism that informs and influences millions.
- Enjoy ongoing professional development and career advancement opportunities.

---

## Connect With Us

- Sign in or create an account to personalize your news experience.
- Subscribe to newsletters or follow topics of interest.
- Engage with interactive content and participate in feedback surveys to shape future coverage.

---

**Discover the world with CNN — Your trusted source for news, analysis, and stories that matter.**

## Finally - a minor improvement

With a small adjustment, we can make the results stream back from OpenAI
with the familiar typewriter animation.

In [36]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
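    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        # Hide code fences the model may wrap around its reply without deleting the word "markdown" from the text;
        # cleaning a copy means a fence split across chunks is still caught on a later update
        shown = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(shown), display_id=display_handle.display_id)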
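A blanket `.replace(...)` on the streamed text can also remove legitimate content from the brochure. A narrower alternative only strips a fence that wraps the whole reply and leaves everything else alone. A sketch; `strip_markdown_fence` is a hypothetical helper that could be applied to the accumulated text before each `update_display` call.

```python
def strip_markdown_fence(text):
    """Remove a ```markdown (or bare ```) wrapper around the reply, leaving the body intact."""
    stripped = text.strip()
    if stripped.startswith("```"):
        first_newline = stripped.find("\n")
        # Drop the opening fence line; while streaming, hide a fence whose line is not complete yet
        stripped = stripped[first_newline + 1:] if first_newline != -1 else ""
    if stripped.endswith("```"):
        stripped = stripped[:-3]  # closing fence
    return stripped.strip()
```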

In [37]:
stream_brochure("CNN", test_url)

Found links: {'links': [{'type': 'about page', 'url': 'https://edition.cnn.com/about'}, {'type': 'company page', 'url': 'https://edition.cnn.com/profiles'}, {'type': 'leadership page', 'url': 'https://edition.cnn.com/profiles/cnn-leadership'}, {'type': 'careers page', 'url': 'https://careers.wbd.com/cnnjobs'}]}


# CNN: A Premier Source for Global News and Information

---

## About CNN

CNN (Cable News Network) is a leading global news organization dedicated to providing timely, relevant, and comprehensive coverage of world events. From breaking news to in-depth analysis, CNN keeps audiences informed and engaged on issues that matter most. With a commitment to journalistic integrity and innovation, CNN serves millions of viewers worldwide through TV, digital platforms, and multimedia content.

---

## Our Culture

At CNN, we foster a dynamic and inclusive environment that values feedback, innovation, and integrity. Our team is committed to delivering accurate news while embracing technological advancements to enhance storytelling. We prioritize transparency, diversity, and a proactive approach to journalism, making us a trusted name in media worldwide.

---

## Our Audience & Coverage

CNN caters to a diverse global audience, providing coverage across various sectors including:

- Politics
- Business and Economy
- Health and Science
- Entertainment and Lifestyle
- Travel and Culture
- Sports
- Climate and Environment
- Ongoing World Events such as the Ukraine-Russia War and Israel-Hamas War

Our content includes news articles, live updates, videos, in-depth analyses, and special features, ensuring comprehensive understanding of current events.

---

## Careers at CNN

Join our team and be part of a forward-thinking organization at the forefront of news media. CNN offers exciting career opportunities for professionals passionate about journalism, technology, media, and communications. We value innovation, teamwork, and a commitment to truth — qualities that help us continue delivering impactful news worldwide.

**Working at CNN means:**

- Contributing to a global news legacy
- Collaborating with industry leaders
- Embracing a culture of learning and growth
- Engaging in meaningful work that informs and influences

---

## Why Choose CNN?

- Trusted global news organization
- Cutting-edge digital and multimedia presence
- Inclusive and progressive workplace environment
- Opportunities for growth and professional development

Join CNN and be part of a legacy that informs, inspires, and impacts the world.

---

### Contact and Learn More

Visit our website and follow us on social media to stay updated with the latest careers, news, and innovations at CNN.

---

**© CNN**  
Your trusted source for news and information.