# BrochureCraft Pro

This tool generates professional, humorous, or concise brochures for companies by analyzing the contents of their websites. It uses OpenAI's `gpt-4o-mini` model (a paid API, billed per use) to extract company information and format it as user-friendly Markdown aimed at prospective customers, investors, and recruits.

In [1]:
# Importing necessary modules
import os                                # Access environment variables
import requests                          # Make HTTP requests to fetch web pages
import json                              # Parse the model's JSON responses
from typing import List                  # Type hints for lists
from dotenv import load_dotenv           # Load environment variables from a .env file
from openai import OpenAI                # OpenAI API client
from bs4 import BeautifulSoup            # Parse HTML and extract text and links
from IPython.display import Markdown, display, update_display  # Render (and live-update) Markdown in Jupyter/IPython

In [2]:
# Constants
MODEL = "gpt-4o-mini"  # OpenAI model used for both link filtering and brochure generation

HEADERS = {  # Browser-like User-Agent; some sites reject requests sent with the python-requests default
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

In [3]:
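# Setting up the environment
load_dotenv()  # Load environment variables (including OPENAI_API_KEY) from a .env file

# Initialize the OpenAI client. OpenAI() would also read OPENAI_API_KEY from the
# environment by itself; passing it explicitly just makes the dependency visible.
openai = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))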
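`OpenAI()` raises an error when no key is available at all, but a wrong or badly pasted key only fails later, on the first request. A minimal sanity check that runs before any API call could look like this; `check_api_key` is a hypothetical helper, and it assumes keys use OpenAI's usual `sk-` prefix, which is a convention rather than a guarantee:

```python
import os
from typing import Optional


def check_api_key(key: Optional[str]) -> str:
    """Return a short status message for an OpenAI API key.

    Hypothetical helper. Assumes keys start with "sk-", which is OpenAI's
    usual convention but not a documented guarantee.
    """
    if not key:
        return "No API key was found; check your .env file"
    if key.strip() != key:
        return "The API key has leading or trailing whitespace; check your .env file"
    if not key.startswith("sk-"):
        return "The API key doesn't start with 'sk-'; check that it is an OpenAI key"
    return "API key found and looks plausible"


# Inspect the key loaded by load_dotenv() without sending any request
print(check_api_key(os.getenv("OPENAI_API_KEY")))
```

The check only inspects the string locally, so it costs nothing; whether the key is actually valid is still decided by the first API call.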

In [4]:
# Website class to fetch a webpage and extract its title, text, and links
class Website:
    @property
    def url(self) -> str:
        """Return the validated URL of the webpage."""
        return self._url

    @url.setter
    def url(self, value: str) -> None:
        """Validate the URL (non-empty) and store it without surrounding whitespace."""
        if not value or value.strip() == "":
            raise ValueError("Invalid URL: URL cannot be None or empty")
        self._url = value.strip()

    def __init__(self, url: str, headers: dict = None, timeout: float = 10) -> None:
        """
        Initialize the Website class by fetching and parsing the web page.

        Args:
            url (str): The URL of the website to fetch.
            headers (dict, optional): Headers to be used in the request. Defaults to HEADERS.
            timeout (float, optional): Seconds to wait for the server before giving up. Defaults to 10.
        """
        self.url = url  # Validate and store the provided URL

        # Per-instance defaults, kept if the request fails. (A list declared at class
        # level, as in `links: List[str] = []`, would be shared by every instance.)
        self.title = "No title found"          # Title of the webpage
        self.content = ""                      # Decoded HTML of the webpage
        self.text = "No body content found"    # Cleaned text content of the webpage
        self.links: List[str] = []             # Hyperlinks (raw href values) found on the webpage

        the_headers = headers if headers else HEADERS  # Use provided headers or default headers

        try:
            # Request the validated self.url; without a timeout, requests may wait indefinitely
            response = requests.get(self.url, headers=the_headers, timeout=timeout)
            response.raise_for_status()  # Raise an error for 4xx/5xx status codes
            response.encoding = response.apparent_encoding  # Detect the encoding from the body
            self.content = response.text  # Decoded with the encoding above (response.content is raw bytes and would ignore it)

            soup = BeautifulSoup(self.content, 'html.parser')  # Parse HTML using BeautifulSoup
            if soup.title and soup.title.string:
                self.title = soup.title.string.strip()  # Extract webpage title

            # Remove elements that carry no readable text (scripts, styles, images, inputs)
            if soup.body:
                for irrelevant in soup.body.find_all(["script", "style", "img", "input"]):
                    irrelevant.decompose()
                self.text = soup.body.get_text(separator="\n", strip=True)  # Extract text content

            # Extract all links from anchor tags
            self.links = [a["href"] for a in soup.find_all("a") if a.has_attr("href")]

        except requests.RequestException as e:
            # Report the failure; get_contents() falls back to the defaults set above
            print(f"Failed to fetch {self.url}: {e}")

    def get_contents(self) -> str:
        """
        Returns a formatted string containing the title and text content of the webpage.

        Returns:
            str: Formatted webpage content with title and text.
        """
        return f"- Webpage Title:\n{self.title}\n- Webpage Contents:\n{self.text}\n\n"
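
`Website.links` stores raw `href` values, so many of them are relative (`/about`) or not web pages at all (`mailto:`, `#top`), and the prompt leaves it to the model to sort that out. A sketch of doing this deterministically with the standard library's `urllib.parse` before the links reach the prompt; `resolve_links` is a hypothetical helper, not part of the original notebook:

```python
from typing import List
from urllib.parse import urldefrag, urljoin, urlparse


def resolve_links(base_url: str, hrefs: List[str]) -> List[str]:
    """Turn raw href values into absolute http(s) URLs (hypothetical helper).

    - Relative paths are resolved against base_url with urljoin.
    - Fragments (#section) are dropped so one page isn't listed twice.
    - Non-web schemes such as mailto:, tel: and javascript: are skipped.
    - Duplicates are removed while preserving the original order.
    """
    seen = set()
    resolved = []
    for href in hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href.strip()))
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in seen:
            seen.add(absolute)
            resolved.append(absolute)
    return resolved


print(resolve_links("https://appbrewery.com/",
                    ["/about", "#top", "mailto:hi@example.com", "https://appbrewery.com/about#team"]))
# ['https://appbrewery.com/about', 'https://appbrewery.com/']
```

Calling something like `resolve_links(website.url, website.links)` would give the model a shorter list that is already absolute, so the "some may be relative links" caveat in the prompt becomes less important.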


In [5]:
# Prompt used by the AI to filter relevant links for a brochure
links_system_prompt = """You are provided with a list of links from a webpage. 
Your task is to evaluate which links are most relevant for inclusion in a company brochure, such as links to an About page, Company page, or Careers/Jobs pages.

Your response should be in JSON format, exactly as shown in the example below, without any introduction, summary, or additional text:

{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

# User prompt construction for links filtering
def links_user_prompt(website):
    """
    Constructs a user prompt for filtering relevant links from a website for a brochure.

    Args:
        website (Website): The Website object representing the webpage to process.

    Returns:
        str: User prompt in string format.
    """
    user_prompt = f"You are reviewing the links found on the website {website.url}.\n"
    user_prompt += "Please identify the links that are relevant for a company brochure and respond with their full HTTPS URLs in JSON format.\n"
    user_prompt += "Exclude links such as Terms of Service, Privacy Policies, and email links.\n"
    
    if not website.links:
        # Ask for the same shape as the system prompt so callers can always read ["links"]
        user_prompt += 'No links were found on this page, so please respond with {"links": []}.\n'
        return user_prompt

    user_prompt += "Here are the links available on the website (some may be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt
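
The prompt asks the model to skip Terms of Service, privacy, and email links, but obvious cases can also be dropped in code before the request, which shortens the prompt and saves tokens. A minimal sketch; `prefilter_links` and the keyword list are hypothetical, and plain substring matching is a deliberate simplification:

```python
from typing import Iterable, List

# Hypothetical keyword list; adjust it to the sites you process.
EXCLUDED_KEYWORDS = ("privacy", "terms", "cookie", "legal", "login", "signin", "mailto:")


def prefilter_links(links: Iterable[str], excluded: Iterable[str] = EXCLUDED_KEYWORDS) -> List[str]:
    """Drop links whose lowercased text contains any excluded keyword."""
    excluded = [word.lower() for word in excluded]
    return [link for link in links if not any(word in link.lower() for word in excluded)]


print(prefilter_links(["https://a.com/about", "https://a.com/privacy-policy", "mailto:x@a.com"]))
```

Substring matching is crude: a keyword like `legal` would also drop a `/careers/legal-counsel` page, so the list is best kept short, leaving borderline cases to the model.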

In [6]:
# Function to get links using the AI model
def get_links(url):
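    """
    Fetches the page at `url` and asks the AI model which of its links belong in a brochure.

    Args:
        url (str): The URL of the website.

    Returns:
        dict: Parsed JSON, expected to look like {"links": [{"type": ..., "url": ...}]}.
              The model may also return an empty object.
    """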
    website = Website(url)  # Create a Website instance for the given URL
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": links_system_prompt},
            {"role": "user", "content": links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
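
JSON mode (`response_format={"type": "json_object"}`) makes the reply parse as JSON, but it does not guarantee the `{"links": [...]}` shape the system prompt asks for. A defensive parser, sketched under that assumption; `parse_links_response` is a hypothetical helper that returns a plain list rather than the dict `get_links` currently returns:

```python
import json
from typing import Dict, List


def parse_links_response(raw: str) -> List[Dict[str, str]]:
    """Parse the model's JSON reply into a clean list of {"type", "url"} dicts.

    Tolerates an empty object ({}) or a missing/invalid "links" value, skips
    malformed entries and non-http(s) URLs, and drops repeated URLs.
    Raises json.JSONDecodeError if raw is not JSON at all.
    """
    data = json.loads(raw)
    entries = data.get("links") if isinstance(data, dict) else None
    if not isinstance(entries, list):
        return []
    cleaned, seen = [], set()
    for entry in entries:
        if not isinstance(entry, dict):
            continue  # e.g. a bare string instead of an object
        url = str(entry.get("url", "")).strip()
        if not url.startswith(("http://", "https://")) or url in seen:
            continue  # relative, empty, or duplicate URL
        seen.add(url)
        cleaned.append({"type": str(entry.get("type", "page")), "url": url})
    return cleaned
```

If `get_links` returned `parse_links_response(result)`, callers would loop over the returned list directly instead of indexing `["links"]`.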

In [7]:
# Function to get all details (landing page and relevant links) for a given URL
def get_all_details(url):
    """
    Retrieves details (landing page and relevant links) for a given URL.

    Args:
        url (str): The URL of the website.

    Returns:
        str: Combined content of the landing page and all relevant links.
    """
    result = "Landing page:\n"
    result += Website(url).get_contents()  # Get contents of the landing page (get_links() fetches it again)
    links = get_links(url)  # Get relevant links using the AI model
    # print("Found links:", links)  # Debugging
    for link in links.get("links", []):  # The model may return {} when nothing is relevant
        result += f"\n\n{link.get('type', 'page')}\n"
        result += Website(link["url"]).get_contents()  # Get contents for each relevant link
    return result

In [8]:
# System prompt for brochure generation
brochure_system_prompt = """You are an AI assistant tasked with analyzing the contents of several relevant pages from a company website and crafting a concise, professional brochure aimed at prospective customers, investors, and recruits. Your response should be formatted in Markdown and include the following information, if available:  
- Details about the company's culture  
- Information about its customers  
- Careers or job opportunities  

Focus on creating a compelling and informative tone that highlights the company's strengths and vision.
"""

# # Alternatively, uncomment the prompt below for a more humorous tone:
# brochure_system_prompt = """You are an AI assistant tasked with analyzing the contents of several relevant pages from a company website and creating a humorous, entertaining, and lighthearted brochure for prospective customers, investors, and recruits. Your response should be formatted in Markdown and include the following information, if available:  
# - Witty insights about the company's culture  
# - Fun facts about its customers  
# - Playful descriptions of careers or job opportunities  

# Infuse your response with a cheeky tone while ensuring it remains informative and engaging.
# """
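
Switching tone currently means commenting one prompt out and the other in. A sketch that picks the system prompt from a `tone` argument instead; the first two presets paraphrase the prompts above, while the `concise` preset, `TONE_INSTRUCTIONS`, and `build_brochure_system_prompt` are all hypothetical:

```python
# Hypothetical tone presets; "professional" and "humorous" paraphrase the
# prompts above, "concise" is an illustrative addition.
TONE_INSTRUCTIONS = {
    "professional": (
        "crafting a concise, professional brochure with a compelling and "
        "informative tone that highlights the company's strengths and vision."
    ),
    "humorous": (
        "creating a humorous, entertaining, and lighthearted brochure with a "
        "cheeky tone that still stays informative and engaging."
    ),
    "concise": (
        "writing a short, to-the-point brochure that covers only the essentials."
    ),
}


def build_brochure_system_prompt(tone: str = "professional") -> str:
    """Assemble a brochure system prompt for the requested tone (hypothetical helper)."""
    if tone not in TONE_INSTRUCTIONS:
        raise ValueError(f"Unknown tone {tone!r}; expected one of {sorted(TONE_INSTRUCTIONS)}")
    return (
        "You are an AI assistant tasked with analyzing the contents of several relevant "
        "pages from a company website and "
        + TONE_INSTRUCTIONS[tone]
        + " The brochure is for prospective customers, investors, and recruits. "
        "Format your response in Markdown and include the following information, if available:\n"
        "- The company's culture\n"
        "- Its customers\n"
        "- Careers or job opportunities\n"
    )
```

Because `create_brochure()` and `stream_brochure()` read the module-level `brochure_system_prompt` when they are called, assigning `brochure_system_prompt = build_brochure_system_prompt("humorous")` beforehand would be enough to switch tone.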

# User prompt for generating a company brochure
def brochure_user_prompt(company_name, url):
    """
    Constructs a user prompt for generating a company brochure.

    Args:
        company_name (str): The name of the company.
        url (str): The URL of the company's website.

    Returns:
        str: User prompt in string format, with content limited to 5,000 characters.
    """
    user_prompt = f"You are analyzing a company called {company_name}.\n"
    user_prompt += "Below is the content from its landing page and other relevant pages. Using this information, create a concise brochure about the company in Markdown format.\n"
    user_prompt += get_all_details(url)  # Generate details for the entire URL
    user_prompt = user_prompt[:5_000]  # Truncate to 5,000 characters to limit token usage and cost (later pages may be cut off entirely)
    return user_prompt
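
Cutting the assembled prompt at 5,000 characters keeps whatever comes first, which is usually the landing page, so About or Careers pages fetched later can be dropped entirely. A sketch of an alternative that gives each page an equal share of the budget, assuming the page texts are kept as a list (for example, one string per `Website.get_contents()` call) instead of one concatenated string; `fit_sections` is a hypothetical name:

```python
from typing import List


def fit_sections(sections: List[str], budget: int) -> str:
    """Join sections, truncating each to an equal share of a character budget.

    Unused space from short sections is passed on to later ones, so the
    result never exceeds `budget` characters.
    """
    result = []
    remaining = budget
    for index, section in enumerate(sections):
        share = remaining // (len(sections) - index)  # equal split of what is left
        piece = section[:share]
        result.append(piece)
        remaining -= len(piece)
    return "".join(result)


print(fit_sections(["Landing page: ...", "About page: ...", "Careers page: ..."], 5_000))
```

The design trades completeness of the landing page for coverage of every page, which matters here because the system prompt explicitly asks for culture, customers, and careers details that usually live on subpages.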

In [9]:
# Function to create a brochure using the AI model
def create_brochure(company_name, url):
    """
    Generates a brochure for a company using the AI model.

    Args:
        company_name (str): The name of the company.
        url (str): The URL of the company's website.
    """
    try:
        response = openai.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "system", "content": brochure_system_prompt},
                {"role": "user", "content": brochure_user_prompt(company_name, url)}
            ]
        )
        result = response.choices[0].message.content
        display(Markdown(result))  # Render the brochure as Markdown in the notebook
    except Exception as e:
        print(f"An error occurred while generating the brochure for {company_name}: {e}")

In [10]:
# Function to stream the creation of a brochure dynamically
def stream_brochure(company_name, url):
    """
    Dynamically streams the creation of a brochure using the AI model.

    Args:
        company_name (str): The name of the company.
        url (str): The URL of the company's website.
    """
    try:   
        stream = openai.chat.completions.create(
            model=MODEL,
            messages=[{"role": "system", "content": brochure_system_prompt},
                      {"role": "user", "content": brochure_user_prompt(company_name, url)}],
            stream=True  # Enable streaming mode for dynamic output
        )

        response = ""
        display_handle = display(Markdown(""), display_id=True)  # Placeholder output, updated in place below
        for chunk in stream:
            response += chunk.choices[0].delta.content or ""  # Accumulate the raw reply
            # Strip the ``` / ```markdown fences the model sometimes wraps around its reply.
            # Clean a copy, so the word "markdown" elsewhere in the text is left alone.
            cleaned = response.replace("```markdown", "").replace("```", "")
            update_display(Markdown(cleaned), display_id=display_handle.display_id)  # Re-render with the latest text
    except Exception as e:
        print(f"An error occurred while streaming the brochure for {company_name}: {e}")
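
The cleanup in `stream_brochure` exists because the model sometimes wraps its whole reply in a Markdown code fence (three backticks, often followed by the word `markdown`). A regex-based sketch that removes only such an outer fence and leaves everything inside untouched; `strip_outer_fence` is a hypothetical helper:

```python
import re

# An opening fence such as ```markdown or ```md at the very start of the text,
# and a closing ``` at the very end; anything in between is left alone.
_OPENING_FENCE = re.compile(r"\A\s*```[a-zA-Z]*[ \t]*\n")
_CLOSING_FENCE = re.compile(r"\n?```\s*\Z")


def strip_outer_fence(text: str) -> str:
    """Remove a code fence wrapped around the whole reply, if there is one."""
    text = _OPENING_FENCE.sub("", text, count=1)
    return _CLOSING_FENCE.sub("", text, count=1)


print(strip_outer_fence("```markdown\n# Welcome\nWe write markdown daily.\n```"))
```

In the streaming loop it could be applied to the raw accumulated text on each update, e.g. `update_display(Markdown(strip_outer_fence(response)), display_id=display_handle.display_id)`. A partially received opening fence may show for a moment, but each update re-cleans the full text, so the final render is clean.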

## Test: Stream a Brochure for a Sample Company

In [11]:
url = "https://appbrewery.com/"
company_name = "App Brewery"

# # Alternatively, test the non-streaming version:
# create_brochure(company_name, url)

# Test stream_brochure()
stream_brochure(company_name, url)


# Welcome to The App Brewery

## Who We Are
The App Brewery is London’s highest-rated programming bootcamp, providing world-class education since 2015. With over 1.4 million students trained globally, we are dedicated to empowering the next generation of coders with the skills to thrive in the technology industry.

## Our Courses
We offer a range of industry-leading courses tailored for aspiring developers looking to master essential programming skills. Our top-rated courses include:

- **100 Days of Python Coding Bootcamp**  
  Master Python through 100 engaging projects, covering data science, automation, web development, and more.
  
- **Complete Web Development Bootcamp**  
  Transform into a Full-Stack Web Developer with one comprehensive course covering HTML, CSS, JavaScript, Node, React, MongoDB, and more.
  
- **iOS Development Bootcamp**  
  Begin your journey to becoming an iOS App Developer with an updated curriculum focusing on Swift and SwiftUI.
  
- **Flutter Crossplatform Development**  
  Learn Flutter from scratch in collaboration with the Google Flutter team, and create beautiful cross-platform apps.

## Our Culture
At The App Brewery, we foster a culture of innovation, collaboration, and continuous improvement. Our supportive environment encourages both students and instructors to engage with cutting-edge technologies and apply their skills in real-world scenarios. We pride ourselves on the success of our graduates, many of whom have secured positions with top companies like Amazon.

## Our Customers
Our diverse student body ranges from beginners to seasoned professionals seeking to upskill. We also partner with organizations to create bespoke training solutions that equip teams with the latest technology insights.

## Careers at The App Brewery
We believe in investing in our team as much as we do in our students. We are always on the lookout for passionate individuals who share our vision of empowering people through technology. If you are eager to make an impact, check our careers page for current job openings.

---

Join us at The App Brewery and take the first step toward a brighter future in tech!
Learn more at [The App Brewery](https://www.appbrewery.co.uk)


This brochure captures essential information about The App Brewery, showcasing its mission, offerings, culture, and career opportunities in a professional and engaging manner.