In [None]:
!pip install requests python-dotenv beautifulsoup4 openai


Collecting python-dotenv
  Downloading python_dotenv-1.1.1-py3-none-any.whl.metadata (24 kB)
Downloading python_dotenv-1.1.1-py3-none-any.whl (20 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.1.1


In [None]:
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [None]:
# Initialize and constants

import os
from openai import OpenAI
from dotenv import load_dotenv

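# Set the environment variable directly (alternative to a .env file).
# Keep real keys out of shared or version-controlled notebooks.
os.environ['OPENAI_API_KEY'] = "sk......"

# Load environment variables; override=True lets a .env value replace the one set above
load_dotenv(override=True)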
api_key = os.getenv('OPENAI_API_KEY')

# Sanity check for API key
if api_key and api_key.startswith('sk-proj-') and len(api_key) > 10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")

# Model name and OpenAI client initialization
MODEL = 'gpt-4o-mini'
openai = OpenAI()


API key looks good so far


In [None]:
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Wikipedia page,
    extracting main content, title, and relevant links.
    """

    def __init__(self, url):
        self.url = url

        try:
            response = requests.get(url, headers=headers, timeout=10)
            response.raise_for_status()
        except requests.exceptions.RequestException as e:
            print(f"[ERROR] Request failed: {e}")
            self.body = ""
            self.text = ""
            self.links = []
            self.title = "No title found"
            return

        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')

        # Page title
        self.title = soup.title.string.strip() if soup.title and soup.title.string else "No title found"

        # Extract from Wikipedia's main content section
        content_div = soup.find("div", id="mw-content-text")
        if content_div:
            # Remove irrelevant tags
            for tag in content_div(["script", "style", "img", "input", "table", "sup"]):
                tag.decompose()
            self.text = content_div.get_text(separator="\n", strip=True)
        else:
            self.text = ""

        # Collect and resolve relative links
        raw_links = [link.get('href') for link in soup.find_all('a') if link.get('href')]
        self.links = [urljoin(self.url, link) for link in raw_links if link.startswith('/wiki/')]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\n\nWebpage Contents:\n{self.text}\n\n"


In [None]:
wiki = Website("https://en.wikipedia.org/wiki/OpenAI")
print(wiki.get_contents())

Webpage Title:
OpenAI - Wikipedia

Webpage Contents:
Artificial intelligence research organization
Not to be confused with
OpenAL
,
OpenAPI
, or
Open-source artificial intelligence
.
OpenAI, Inc.
is an American
artificial intelligence
(AI) organization founded in December 2015 and headquartered in
San Francisco
, California. It aims to develop "safe and beneficial"
artificial general intelligence
(AGI), which it defines as "highly autonomous systems that outperform humans at most economically valuable work".
As a leading organization in the ongoing
AI boom
,
OpenAI is known for the GPT family of
large language models
, the
DALL-E
series of
text-to-image models
, and a
text-to-video model
named
Sora
.
Its release of
ChatGPT
in November 2022 has been credited with catalyzing widespread interest in
generative AI
.
The organization has a complex corporate structure. As of April 2025, it is led by the
non-profit
OpenAI, Inc.,
registered in Delaware
, and has multiple for-profit subsidiaries

In [None]:
wiki.links

['https://en.wikipedia.org/wiki/Main_Page',
 'https://en.wikipedia.org/wiki/Wikipedia:Contents',
 'https://en.wikipedia.org/wiki/Portal:Current_events',
 'https://en.wikipedia.org/wiki/Special:Random',
 'https://en.wikipedia.org/wiki/Wikipedia:About',
 'https://en.wikipedia.org/wiki/Help:Contents',
 'https://en.wikipedia.org/wiki/Help:Introduction',
 'https://en.wikipedia.org/wiki/Wikipedia:Community_portal',
 'https://en.wikipedia.org/wiki/Special:RecentChanges',
 'https://en.wikipedia.org/wiki/Wikipedia:File_upload_wizard',
 'https://en.wikipedia.org/wiki/Special:SpecialPages',
 'https://en.wikipedia.org/wiki/Main_Page',
 'https://en.wikipedia.org/wiki/Special:Search',
 'https://en.wikipedia.org/wiki/Help:Introduction',
 'https://en.wikipedia.org/wiki/Special:MyContributions',
 'https://en.wikipedia.org/wiki/Special:MyTalk',
 'https://en.wikipedia.org/wiki/OpenAI',
 'https://en.wikipedia.org/wiki/Talk:OpenAI',
 'https://en.wikipedia.org/wiki/OpenAI',
 'https://en.wikipedia.org/wiki/O
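
The list above repeats several URLs (for example, `Main_Page` appears twice), so the model is sent more links than it needs. A minimal sketch of one way to de-duplicate while keeping page order follows. The helper name `unique_wiki_links` is hypothetical and not part of the notebook, and dropping `#fragment` suffixes is an assumption about what counts as the same page.

```python
from urllib.parse import urljoin, urldefrag

def unique_wiki_links(base_url, hrefs):
    """Resolve /wiki/ hrefs against base_url, drop fragments, keep first occurrences."""
    seen = set()
    result = []
    for href in hrefs:
        # Same filter as the Website class: only site-relative article links
        if not href.startswith("/wiki/"):
            continue
        # urldefrag splits "page#section" into ("page", "section")
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result

sample_hrefs = ["/wiki/Main_Page", "/wiki/OpenAI", "/wiki/Main_Page",
                "#cite_note-1", "/wiki/OpenAI#History"]
print(unique_wiki_links("https://en.wikipedia.org/wiki/OpenAI", sample_hrefs))
```

Fewer repeated links also means fewer tokens in the link-selection prompt.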

In [None]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [None]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [None]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, and respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, or email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [None]:
print(get_links_user_prompt(wiki))

NameError: name 'wiki' is not defined

In [None]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)

In [None]:
wiki = Website("https://en.wikipedia.org/wiki/OpenAI")
get_links("https://en.wikipedia.org/wiki/OpenAI")

{'links': [{'type': 'about page',
   'url': 'https://en.wikipedia.org/wiki/OpenAI'},
  {'type': 'products and applications',
   'url': 'https://en.wikipedia.org/wiki/Products_and_applications_of_OpenAI'},
  {'type': 'careers page', 'url': 'https://en.wikipedia.org/wiki/OpenAI'},
  {'type': 'company page', 'url': 'https://en.wikipedia.org/wiki/OpenAI'},
  {'type': 'removal of Sam Altman',
   'url': 'https://en.wikipedia.org/wiki/Removal_of_Sam_Altman_from_OpenAI'}]}
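
In the result above, the same URL comes back under three different labels, so `get_all_details` would fetch that page three times. A minimal sketch of a post-processing step follows. The helper `clean_links` is hypothetical, and keeping only the first label for each URL is an assumption about the desired behavior.

```python
import json

def clean_links(raw_json):
    """Parse a JSON reply and keep the first entry for each well-formed https URL."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return {"links": []}
    links = data.get("links") if isinstance(data, dict) else None
    if not isinstance(links, list):
        return {"links": []}
    seen = set()
    cleaned = []
    for item in links:
        if not isinstance(item, dict):
            continue
        url = item.get("url")
        # Skip missing, non-https, or already-seen URLs
        if not isinstance(url, str) or not url.startswith("https://") or url in seen:
            continue
        seen.add(url)
        cleaned.append({"type": item.get("type", "unknown"), "url": url})
    return {"links": cleaned}

reply = json.dumps({"links": [
    {"type": "about page", "url": "https://en.wikipedia.org/wiki/OpenAI"},
    {"type": "careers page", "url": "https://en.wikipedia.org/wiki/OpenAI"},
    {"type": "products and applications",
     "url": "https://en.wikipedia.org/wiki/Products_and_applications_of_OpenAI"},
]})
print(clean_links(reply))
```

Returning an empty list on malformed input lets the caller continue with only the landing page instead of raising.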

In [None]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result

In [None]:
print(get_all_details("https://en.wikipedia.org/wiki/OpenAI"))

Streaming output truncated to the last 5000 lines.
Edge
,
Microsoft 365
and other products.
On March 3, 2023,
Reid Hoffman
resigned from his board seat, citing a desire to avoid conflicts of interest with his investments in AI companies via
Greylock Partners
, and his co-founding of the AI startup
Inflection AI
. Hoffman remained on the board of Microsoft, a major investor in OpenAI.
On March 14, 2023, OpenAI released
GPT-4
, both as an API (with a waitlist) and as a feature of ChatGPT Plus.
Altman and Sutskever at
Tel Aviv University
in 2023
On May 22, 2023, Sam Altman, Greg Brockman and Ilya Sutskever posted recommendations for the governance of
superintelligence
.
They consider that superintelligence could happen within the next 10 years, allowing a "dramatically more prosperous future" and that "given the possibility of existential risk, we can't just be reactive". They propose creating an international watchdog organization similar to
IAEA
to oversee AI systems above

In [None]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

In [None]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
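
Because the truncation is applied to the whole concatenated prompt, a landing page longer than 5,000 characters leaves no room for the linked pages. A minimal sketch of an alternative that gives each page an equal share of the budget follows. The helper `truncate_sections` and the equal-split policy are assumptions for illustration, not the notebook's method.

```python
def truncate_sections(sections, max_chars=5_000):
    """Join (title, text) pairs, giving each pair an equal share of max_chars."""
    if not sections:
        return ""
    separator = "\n\n"
    # Reserve room for the separators, then split what is left evenly
    available = max_chars - len(separator) * (len(sections) - 1)
    share = max(available // len(sections), 0)
    blocks = [f"{title}\n{text}"[:share] for title, text in sections]
    return separator.join(blocks)

pages = [("Landing page", "a" * 10_000), ("about page", "b" * 10_000)]
prompt_body = truncate_sections(pages, max_chars=100)
print(len(prompt_body))
```

With this split, every selected page contributes some text to the brochure prompt, at the cost of less detail from the landing page.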

In [None]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )

    response = ""
    display_handle = display(Markdown(""), display_id=True)
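    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        # Clean a copy for display so fence markers split across chunks are still matched,
        # and so the word "markdown" in the brochure text is left intact
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)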
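
Blanket string replacement also removes fences that belong to the brochure body. A minimal sketch of a narrower clean-up that strips only a fence wrapping the whole reply follows. The helper `strip_markdown_fence` is hypothetical, and it is meant to be applied to the raw accumulated text on each update.

```python
def strip_markdown_fence(text):
    """Remove one wrapping ``` fence (with optional language tag); keep inner text as-is."""
    stripped = text.strip()
    if stripped.startswith("```"):
        # Drop the whole opening line, e.g. "```markdown"
        first_newline = stripped.find("\n")
        stripped = stripped[first_newline + 1:] if first_newline != -1 else ""
    if stripped.endswith("```"):
        stripped = stripped[:-3]
    return stripped.strip()

raw = "```markdown\n# Title\nUses markdown.\n```"
print(strip_markdown_fence(raw))
```

While the opening fence line is still incomplete during streaming, the helper returns an empty string, so the partial tag is never displayed.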

In [None]:
stream_brochure("IBM", "https://en.wikipedia.org/wiki/IBM")

Found links: {'links': [{'type': 'about page', 'url': 'https://en.wikipedia.org/wiki/IBM'}, {'type': 'history page', 'url': 'https://en.wikipedia.org/wiki/History_of_IBM'}, {'type': 'careers page', 'url': 'https://en.wikipedia.org/wiki/List_of_IBM_CEOs'}, {'type': 'company page', 'url': 'https://en.wikipedia.org/wiki/IBM_Consulting'}, {'type': 'products page', 'url': 'https://en.wikipedia.org/wiki/List_of_IBM_products'}]}


# IBM Brochure

## Company Overview

**International Business Machines Corporation (IBM)**, commonly known as **Big Blue**, is an esteemed American multinational technology corporation headquartered in Armonk, New York. Established in 1911 as the Computing-Tabulating-Recording Company and rebranded in 1924, IBM has played a pivotal role in the technology landscape, being a leader in server solutions, software, and computer services.

- **Founded**: 1911
- **Headquarters**: Armonk, New York, USA
- **Presence**: Over 175 countries
- **Industry**: Technology and Consulting
- **Ticker Symbol**: IBM (Publicly traded, part of the Dow Jones Industrial Average)

## Innovator in Technology

IBM is renowned for pioneering technological advancements that have shaped today's digital world. As the largest industrial research organization globally, IBM holds the record for generating the most U.S. patents from 1993 to 2021, with research facilities spread across a dozen countries. 

**Notable Innovations**:
- Automated Teller Machine (ATM)
- Dynamic Random-Access Memory (DRAM)
- Floppy Disk and Hard Disk Drive
- Relational Database and SQL Programming Language
- Universal Product Code (UPC) Barcode

With a shift towards cutting-edge fields, IBM is currently focused on quantum computing, artificial intelligence, and data infrastructure solutions.

## Company Culture

IBM promotes a unique and inclusive work environment characterized by respect, innovation, and community engagement. The company is committed to maintaining a culture of continuous improvement and values diverse perspectives, encouraging employees to think critically and creatively. The famous mantra "THINK," introduced by Thomas J. Watson, Sr., emphasizes the company's dedication to innovation and excellence.

### Employee Recognition

IBM's commitment to innovation has led to numerous accolades for its employees and alumni, including:
- 6 Nobel Prizes
- 6 Turing Awards

## Customers and Impact

IBM serves a diverse clientele ranging from small businesses to Fortune 500 companies. Leveraging its extensive range of products and services, IBM enables organizations to maximize their technological potential, streamline operations, and enhance decision-making capabilities. 

## Careers at IBM

IBM is always looking for fresh talent to join its ranks in various technical, business, and research roles. Working at IBM means being part of a global team dedicated to innovation, collaboration, and impact. Employees are encouraged to pursue continuous learning and take on challenges that push the boundaries of technology.

**Potential Opportunities Include**:
- Software Development
- IT Consulting
- Data Science
- Cloud Solutions
- Research & Development

## Join Us

Explore a career that not only offers competitive benefits and professional development but also contributes to making a genuine impact on the world through technology. 

---

For more information about IBM, our services, or our career opportunities, please visit [IBM Official Website](https://www.ibm.com).

*Discover the future with IBM – where innovation meets opportunity.*

In [None]:
stream_brochure("OpenAI", "https://en.wikipedia.org/wiki/OpenAI")

Found links: {'links': [{'type': 'about page', 'url': 'https://en.wikipedia.org/wiki/OpenAI'}, {'type': 'company page', 'url': 'https://en.wikipedia.org/wiki/Products_and_applications_of_OpenAI'}, {'type': 'careers page', 'url': 'https://en.wikipedia.org/wiki/ChatGPT'}]}


# OpenAI Company Brochure

---

## Welcome to OpenAI

**Headquarters:** San Francisco, California  
**Founded:** December 2015  

OpenAI is a pioneering artificial intelligence research organization committed to developing "safe and beneficial" artificial general intelligence (AGI). Our mission is to ensure that AGI benefits all of humanity by creating highly autonomous systems that outperform humans in most economically valuable tasks.

---

## Our Innovations

At the forefront of the AI revolution, OpenAI is renowned for its groundbreaking technologies:

- **GPT Family of Language Models:** Leading the way in natural language processing.
- **DALL-E Series:** Innovative text-to-image generation models.
- **Sora:** Our cutting-edge text-to-video model.

Since the launch of **ChatGPT** in November 2022, we have seen unprecedented global interest in generative AI technologies.

---

## Our Corporate Structure

OpenAI operates with a complex corporate framework:

- **Non-Profit Arm:** OpenAI, Inc. (registered in Delaware).
- **For-Profit Subsidiaries:** OpenAI Holdings, LLC and OpenAI Global, LLC.

With a significant investment of **$13 billion from Microsoft**, we collaborate closely and leverage Microsoft's cloud computing resources through Azure.

---

## Company Culture and Values

At OpenAI, we value:

- **Collaboration:** We believe in freely sharing our research and collaborating with other institutions to advance AI in a way that is beneficial to society.
- **Innovation:** We prioritize cutting-edge research while actively addressing AI safety and ethical concerns.
- **Diversity:** We foster an inclusive environment that encourages a wide range of perspectives in our pursuit of top-tier AI solutions.

---

## Join Our Team

OpenAI is continuously seeking passionate individuals who want to contribute to the future of AI. We offer enriching career opportunities in various fields, including:

- AI Research
- Software Engineering
- Product Management
- AI Safety and Ethics

If you are interested in making a meaningful impact in the realm of artificial intelligence, explore our current job openings and be a part of our mission.

---

## Connect with Us

To learn more about our research, innovations, and career opportunities, visit our website or reach out via our social media channels.

**Together, let’s shape the future of artificial intelligence for the betterment of all humanity.**

# **THE ABOVE CREATES A NORMAL COMPANY BROCHURE; BELOW GENERATES A BROCHURE FOR INVESTORS**