# Company Brochure Generation Tool

This tool generates professional, humorous, or concise brochures for companies by analyzing the contents of their websites. It uses the `llama3.2` model, run locally through Ollama (a free, open-source tool for running large language models), to extract company information and format it as user-friendly Markdown for prospective customers, investors, and recruits.

```python
# Importing necessary modules
import requests                # For making HTTP requests to fetch web page data
import json                    # For parsing the model's JSON replies
import ollama                  # Python client for the local Ollama server
from typing import List        # Type hint for lists
from bs4 import BeautifulSoup  # HTML parser for extracting text and links from web pages
from IPython.display import Markdown, display, update_display  # Rich display in Jupyter/IPython
```

```python
# Constants
MODEL = "llama3.2"  # Ollama model tag to use (Llama 3.2)

HEADERS = {  # Default headers for web requests: a browser-like User-Agent
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}
```

```python
# Website class to handle webpage content and link extraction
class Website:
    """Fetches a web page and stores its title, cleaned text, and hyperlinks."""

    @property
    def url(self) -> str:
        """Property for getting the URL."""
        return self._url

    @url.setter
    def url(self, value: str) -> None:
        """Property setter for setting and validating the URL."""
        if not value or value.strip() == "":
            raise ValueError("Invalid URL: URL cannot be None or empty")
        self._url = value.strip()

    def __init__(self, url: str, headers: dict = None, timeout: float = 10) -> None:
        """
        Initialize the Website class by fetching and parsing the web page.

        Args:
            url (str): The URL of the website to fetch.
            headers (dict, optional): Headers to be used in the request. Defaults to HEADERS.
            timeout (float, optional): Seconds to wait for the server. Defaults to 10.
        """
        self.url = url  # Validate and store the stripped URL
        # Per-instance fallbacks, so a failed fetch never leaves None values or a shared class-level list behind
        self.title: str = "No title found"
        self.content: bytes = b""
        self.text: str = "No body content found"
        self.links: List[str] = []
        the_headers = headers if headers else HEADERS  # Use provided headers or default headers

        try:
            response = requests.get(self.url, headers=the_headers, timeout=timeout)  # GET the validated URL
            response.raise_for_status()  # Raise an error for 4xx/5xx status codes
            response.encoding = response.apparent_encoding  # Detected encoding, used when decoding response.text
            self.content = response.content  # Keep the raw HTML bytes

            soup = BeautifulSoup(response.text, "html.parser")  # Parse the decoded HTML
            if soup.title and soup.title.get_text(strip=True):
                self.title = soup.title.get_text(strip=True)  # Extract webpage title (works even with nested tags)

            # Remove elements that carry no readable text (scripts, styles, images, inputs)
            if soup.body:
                for irrelevant in soup.body.find_all(["script", "style", "img", "input"]):
                    irrelevant.decompose()
                self.text = soup.body.get_text(separator="\n", strip=True)  # Extract text content

            # Extract the href of every anchor tag (some may be relative links)
            self.links = [a["href"] for a in soup.find_all("a") if a.has_attr("href")]

        except requests.RequestException as e:
            # Keep the fallback values and report the failure
            self.text = "Page could not be fetched"
            print(f"Failed to fetch {self.url}: {e}")

    def get_contents(self) -> str:
        """
        Returns a formatted string containing the title and text content of the webpage.

        Returns:
            str: Formatted webpage content with title and text.
        """
        return f"- Webpage Title:\n{self.title}\n- Webpage Contents:\n{self.text}\n\n"
```
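`Website` stores each `href` exactly as written, so `links` can hold relative paths such as `/about`, fragment links, and `mailto:` addresses, and the link prompt relies on the model to expand them into full URLs. The sketch below does that resolution deterministically with the standard library's `urljoin`; the helper name `absolute_links` and the choice to keep only `http`/`https` links are assumptions, not part of the notebook.

```python
from typing import List
from urllib.parse import urldefrag, urljoin, urlparse


def absolute_links(base_url: str, hrefs: List[str]) -> List[str]:
    """Hypothetical helper: resolve hrefs against base_url, keep http(s) links only, drop duplicates."""
    seen = set()
    result: List[str] = []
    for href in hrefs:
        # urljoin resolves relative paths such as "/about" or "../jobs" against the page URL;
        # urldefrag then removes any "#section" part
        absolute, _fragment = urldefrag(urljoin(base_url, href.strip()))
        # Assumption: only web pages matter for a brochure, so mailto:, tel: and javascript: links are skipped
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in seen:  # "#team"-style variants collapse to the same page
            seen.add(absolute)
            result.append(absolute)
    return result


print(absolute_links("https://example.com/", ["/about", "careers", "mailto:hi@example.com", "https://example.com/about#team"]))
# ['https://example.com/about', 'https://example.com/careers']
```

Running `website.url` and `website.links` through a helper like this before building the link prompt would let the model choose from full URLs only.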


```python
# System prompt the model uses to pick brochure-relevant links
links_system_prompt = """You are provided with a list of links from a webpage.
Your task is to evaluate which links are most relevant for inclusion in a company brochure, such as links to an About page, Company page, or Careers/Jobs pages.

Your response should be in JSON format, exactly as shown in the example below, without any introduction, summary, or additional text:

{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

# User prompt construction for links filtering
def links_user_prompt(website):
    """
    Constructs a user prompt for filtering relevant links from a website for a brochure.

    Args:
        website (Website): The Website object representing the webpage to process.

    Returns:
        str: User prompt in string format.
    """
    user_prompt = f"Here is the list of links from the website {website.url}:\n"
    user_prompt += "Please identify the links that are relevant for a company brochure and respond with the full HTTPS URLs in JSON format.\n"
    user_prompt += "Exclude links such as Terms of Service, Privacy Policies, and email links.\n"

    if not website.links:
        # Ask for the same shape as a normal answer, so callers can always read data["links"]
        user_prompt += 'No links were found on this page, so respond with {"links": []}.\n'
        return user_prompt

    user_prompt += "Here are the links available on the website (some may be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt
```
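The user prompt asks the model to leave out Terms of Service, privacy-policy, and email links. Filtering the obvious cases in Python first shortens the prompt and does not depend on the model following instructions. A minimal sketch, assuming simple substring matching is good enough; `prefilter_links` and `EXCLUDED_MARKERS` are hypothetical names, and the marker list is an illustrative guess.

```python
from typing import Iterable, List

# Assumption: substrings that usually mark links a brochure does not need
EXCLUDED_MARKERS = ("mailto:", "tel:", "javascript:", "privacy", "terms", "cookie", "login")


def prefilter_links(links: Iterable[str]) -> List[str]:
    """Hypothetical helper: drop obviously irrelevant links and duplicates, keeping the original order."""
    kept: List[str] = []
    for link in links:
        cleaned = link.strip()
        lowered = cleaned.lower()
        if not lowered or lowered.startswith("#"):
            continue  # Empty hrefs and same-page anchors
        if any(marker in lowered for marker in EXCLUDED_MARKERS):
            continue  # Case-insensitive substring match, e.g. "/Privacy-Policy"
        if cleaned not in kept:
            kept.append(cleaned)
    return kept


print(prefilter_links(["/about", "/privacy-policy", "mailto:info@example.com", "#main", "/careers", "/about"]))
# ['/about', '/careers']
```

Substring matching is blunt: a marker such as `terms` could also match a legitimate page, so the model's own judgment is still useful on whatever survives the filter.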

```python
# Function to get links using the AI model
def get_links(url):
    """
    Retrieves and processes links using the AI model.

    Args:
        url (str): The URL of the website.

    Returns:
        dict: Parsed JSON with a "links" list of {"type", "url"} entries (empty if none were found).
    """
    website = Website(url)  # Create a Website instance for the given URL
    response = ollama.chat(
        model=MODEL,
        messages=[
            {"role": "system", "content": links_system_prompt},
            {"role": "user", "content": links_user_prompt(website)}
        ],
        format="json",  # Ask Ollama to constrain the reply to valid JSON
    )
    data = json.loads(response["message"]["content"])
    if not isinstance(data, dict):
        data = {}
    data.setdefault("links", [])  # Guarantee the key that callers read
    return data
```
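Passing the model's reply straight to `json.loads` fails if the model adds a sentence or wraps the JSON in a code fence. Ollama's `format="json"` option on `ollama.chat` constrains the reply to JSON; as a fallback, a tolerant parser can pull out the first JSON object and keep only usable entries. A sketch built on `json.JSONDecoder.raw_decode`; the helper name `parse_links_json` is hypothetical.

```python
import json
from typing import Any, Dict, List


def parse_links_json(text: str) -> Dict[str, List[Dict[str, Any]]]:
    """Hypothetical helper: extract the first JSON object from a model reply and keep valid link entries."""
    start = text.find("{")
    if start == -1:
        return {"links": []}  # No JSON object in the reply
    try:
        # raw_decode parses one JSON value starting at `start` and ignores trailing text such as a closing fence
        data, _end = json.JSONDecoder().raw_decode(text, start)
    except json.JSONDecodeError:
        return {"links": []}
    links = data.get("links", []) if isinstance(data, dict) else []
    if not isinstance(links, list):
        links = []
    # Keep only entries that carry a "url" string, the field get_all_details() reads
    return {"links": [link for link in links if isinstance(link, dict) and isinstance(link.get("url"), str)]}


reply = 'Sure! Here are the links:\n```json\n{"links": [{"type": "about page", "url": "https://example.com/about"}, {"type": "careers page"}]}\n```'
print(parse_links_json(reply))
# {'links': [{'type': 'about page', 'url': 'https://example.com/about'}]}
```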

```python
# Function to get all details (landing page and relevant links) for a given URL
def get_all_details(url):
    """
    Retrieves details (landing page and relevant links) for a given URL.

    Args:
        url (str): The URL of the website.

    Returns:
        str: Combined content of the landing page and all relevant links.
    """
    result = "Landing page:\n"
    result += Website(url).get_contents()  # Get contents of the landing page
    links = get_links(url)  # Get relevant links using the AI model
    # print("Found links:", links)  # Debugging
    for link in links.get("links", []):
        if not link.get("url"):
            continue  # Skip entries the model returned without a URL
        result += f"\n\n{link.get('type', 'related page')}\n"
        result += Website(link["url"]).get_contents()  # Get contents for each relevant link
    return result
```
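`get_all_details` downloads every URL the model returns, including a repeated URL or the landing page itself. A sketch of a variant that fetches each distinct URL once and keeps going if one page fails; `combine_pages` is hypothetical, and its `fetch` argument stands in for something like `lambda u: Website(u).get_contents()`, which also makes the logic testable without network access.

```python
from typing import Callable, Dict, List


def combine_pages(landing_url: str, links: List[Dict[str, str]], fetch: Callable[[str], str]) -> str:
    """Hypothetical variant of get_all_details(): fetch each distinct URL once and skip pages that fail."""
    parts = ["Landing page:\n" + fetch(landing_url)]
    seen = {landing_url}
    for link in links:
        url = link.get("url")
        if not url or url in seen:
            continue  # The model may repeat a URL or list the landing page again
        seen.add(url)
        try:
            page = fetch(url)
        except Exception as exc:  # One broken page should not abort the whole brochure
            print(f"Skipping {url}: {exc!r}")
            continue
        parts.append(f"{link.get('type', 'related page')}\n{page}")
    return "\n\n".join(parts)


# Offline demo: a dict stands in for the web, and a missing key plays the role of a failed request
pages = {"https://example.com/": "Home text", "https://example.com/about": "About text"}
print(combine_pages(
    "https://example.com/",
    [{"type": "about page", "url": "https://example.com/about"},
     {"type": "home", "url": "https://example.com/"},
     {"type": "careers page", "url": "https://example.com/careers"}],
    fetch=lambda url: pages[url],
))
```

Injecting `fetch` rather than calling `Website` directly is a design choice for the sketch: it keeps the deduplication logic separate from HTTP and lets the same function run against cached pages.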

```python
# System prompt for brochure generation
brochure_system_prompt = """You are an AI assistant tasked with analyzing the contents of several relevant pages from a company website and crafting a concise, professional brochure aimed at prospective customers, investors, and recruits. Your response should be formatted in Markdown and include the following information, if available:
- Details about the company's culture
- Information about its customers
- Careers or job opportunities

Focus on creating a compelling and informative tone that highlights the company's strengths and vision.
"""

# Alternative: a more humorous brochure. To change the tone, comment out the prompt above and uncomment this one.
# brochure_system_prompt = """You are an AI assistant tasked with analyzing the contents of several relevant pages from a company website and creating a humorous, entertaining, and lighthearted brochure for prospective customers, investors, and recruits. Your response should be formatted in Markdown and include the following information, if available:
# - Witty insights about the company's culture
# - Fun facts about its customers
# - Playful descriptions of careers or job opportunities
#
# Infuse your response with a cheeky tone while ensuring it remains informative and engaging.
# """

# User prompt for generating a company brochure
def brochure_user_prompt(company_name, url):
    """
    Constructs a user prompt for generating a company brochure.

    Args:
        company_name (str): The name of the company.
        url (str): The URL of the company's website.

    Returns:
        str: User prompt in string format, with content limited to 5,000 characters.
    """
    user_prompt = f"You are analyzing a company called {company_name}.\n"
    user_prompt += "Below is the content from its landing page and other relevant pages. Using this information, create a concise brochure about the company in Markdown format.\n"
    user_prompt += get_all_details(url)  # Append the landing page and relevant linked pages
    user_prompt = user_prompt[:5_000]  # Keep at most the first 5,000 characters
    return user_prompt
```
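`brochure_user_prompt` keeps the first 5,000 characters with a plain slice, which can stop mid-word or mid-sentence. A minimal sketch of cutting at the last line break inside the budget instead; `truncate_at_line` is a hypothetical helper, and the 5,000-character default simply mirrors the notebook's limit.

```python
def truncate_at_line(text: str, limit: int = 5_000) -> str:
    """Hypothetical helper: shorten text to at most `limit` characters, cutting at a line break when possible."""
    if len(text) <= limit:
        return text
    cut = text.rfind("\n", 0, limit)  # Last newline within the first `limit` characters
    # Fall back to a hard cut when the window contains no newline
    return text[:cut] if cut > 0 else text[:limit]


print(repr(truncate_at_line("alpha\nbeta\ngamma", limit=12)))  # 'alpha\nbeta'
```

In `brochure_user_prompt`, `user_prompt = truncate_at_line(user_prompt)` would replace the slice. Because `get_all_details` separates page text with newlines, the cut usually lands at the end of a line of page content.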

```python
# Function to create a brochure using the AI model
def create_brochure(company_name, url):
    """
    Generates a brochure for a company using the AI model.

    Args:
        company_name (str): The name of the company.
        url (str): The URL of the company's website.
    """
    try:
        response = ollama.chat(
            model=MODEL,
            messages=[
                {"role": "system", "content": brochure_system_prompt},
                {"role": "user", "content": brochure_user_prompt(company_name, url)}
            ]
        )
        result = response["message"]["content"]
        display(Markdown(result))  # Render the finished brochure as Markdown
    except Exception as e:
        print(f"An error occurred while generating the brochure for {company_name}: {e}")
```
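The notebook switches between the professional and humorous brochures by commenting one `brochure_system_prompt` in and the other out. A sketch of selecting the tone by name instead; `TONE_INSTRUCTIONS` and `build_brochure_system_prompt` are hypothetical, and their wording is condensed from the notebook's two prompts rather than copied from them.

```python
from typing import Dict

# Hypothetical tone table; wording condensed from the notebook's professional and humorous prompts
TONE_INSTRUCTIONS: Dict[str, str] = {
    "professional": "crafting a concise, professional brochure. Focus on a compelling and informative tone "
                    "that highlights the company's strengths and vision.",
    "humorous": "creating a humorous, entertaining, and lighthearted brochure. Infuse your response with a "
                "cheeky tone while ensuring it remains informative and engaging.",
}


def build_brochure_system_prompt(tone: str = "professional") -> str:
    """Return a system prompt for the requested tone; unknown tones raise ValueError."""
    if tone not in TONE_INSTRUCTIONS:
        raise ValueError(f"Unknown tone {tone!r}; choose from {sorted(TONE_INSTRUCTIONS)}")
    return (
        "You are an AI assistant tasked with analyzing the contents of several relevant pages "
        f"from a company website and {TONE_INSTRUCTIONS[tone]} "
        "Write for prospective customers, investors, and recruits. Format the response in Markdown "
        "and include details about the company's culture, customers, and careers, if available."
    )


print(build_brochure_system_prompt("humorous"))
```

`create_brochure` could then take an extra, hypothetical `tone` parameter and send `build_brochure_system_prompt(tone)` as the system message instead of the module-level `brochure_system_prompt`.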

```python
# Function to stream the creation of a brochure dynamically
def stream_brochure(company_name, url):
    """
    Dynamically streams the creation of a brochure using the AI model.

    Args:
        company_name (str): The name of the company.
        url (str): The URL of the company's website.
    """
    try:
        stream = ollama.chat(
            model=MODEL,
            messages=[{"role": "system", "content": brochure_system_prompt},
                      {"role": "user", "content": brochure_user_prompt(company_name, url)}],
            stream=True  # Yield the reply chunk by chunk
        )

        response = ""
        display_handle = display(Markdown(""), display_id=True)  # Empty Markdown output to update in place
        for chunk in stream:
            response += chunk["message"]["content"] or ""
            # Drop code fences the model may wrap around its answer; other text (including the word "markdown") is kept
            cleaned = response.replace("```markdown", "").replace("```", "")
            update_display(Markdown(cleaned), display_id=display_handle.display_id)  # Re-render with the text so far
    except Exception as e:
        print(f"An error occurred while streaming the brochure for {company_name}: {e}")
```
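The streaming loop accumulates chunks and strips fences with plain `str.replace` calls, which also affects any matching text inside the brochure body. A stricter sketch removes only a fence that opens or closes the whole reply, and cleans a copy on each update so a fence split across two chunks is still recognised once it is complete. The helper names and the dict-shaped fake chunks are assumptions; the dicts mimic the `chunk["message"]["content"]` access the notebook uses.

```python
from typing import Dict, Iterable, Iterator


def clean_markdown_reply(text: str) -> str:
    """Strip a code fence that wraps the whole reply, leaving the body text untouched."""
    stripped = text.strip()
    for opener in ("```markdown", "```md", "```"):
        if stripped.startswith(opener):
            stripped = stripped[len(opener):].lstrip("\n")
            break
    if stripped.endswith("```"):
        stripped = stripped[: -len("```")].rstrip()
    return stripped


def accumulate(chunks: Iterable[Dict[str, Dict[str, str]]]) -> Iterator[str]:
    """Yield the cleaned text after each chunk, mirroring the accumulate-then-render loop in stream_brochure()."""
    response = ""
    for chunk in chunks:
        response += chunk["message"]["content"] or ""
        yield clean_markdown_reply(response)  # Clean a copy; the raw text keeps growing unchanged


# Assumption: plain dicts stand in for the chunks ollama.chat(..., stream=True) yields
fake_stream = [
    {"message": {"content": "```markdown\n# App"}},
    {"message": {"content": " Brewery\nWe teach markdown too."}},
    {"message": {"content": "\n```"}},
]
for text in accumulate(fake_stream):
    print(repr(text))
```

Inside the notebook, each yielded string would be passed to `update_display(Markdown(text), display_id=...)`.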

## Testing the streaming brochure for a sample company

```python
url = "https://appbrewery.com/"
company_name = "App Brewery"

# Non-streaming alternative:
# create_brochure(company_name, url)

# Stream the brochure into the notebook output
stream_brochure(company_name, url)
```

**The App Brewery Brochure**
=====================================

**Welcome to The App Brewery**

We are London's highest rated programming bootcamp since 2015, teaching over 1.4 million students worldwide. Our mission is to provide cutting-edge education in software development, empowering individuals to succeed in the tech industry.

**Our Culture**
---------------

At The App Brewery, we value innovation, teamwork, and continuous learning. We foster a collaborative environment that encourages experimentation and creativity. Our team of experienced instructors and mentors are dedicated to helping you achieve your career goals.

**Who We Help**
----------------

We cater to individuals from all walks of life, providing training in:

* Python programming
* Web development
* iOS development
* Flutter cross-platform development

Our courses are designed to be engaging, interactive, and effective. With a focus on hands-on learning, you'll build projects, complete with step-by-step video tutorials, to develop practical skills.

**Success Stories**
-------------------

Our graduates have achieved remarkable success:

* Evan Templeton: "Taught myself with the help of @yu_angela while delivering for Amazon! Then got hired by Amazon!"
* Martin Chammah: "a year and a half ago I started 's full stack course on udemy. This Monday I start a new job as a developer."
* Cosmin Anghel: "Angela also helped me to get my first iOS job 2 years ago! Thanks @yu_angela"

**Bespoke Courses**
-----------------

We offer bespoke courses for companies with cutting-edge technologies. Our experience in creating customized solutions has helped 18-hour course building, adding 32,735 new members to the Flutter discord channel.

**Get Started Today!**
------------------------

Join The App Brewery community today and embark on a journey to become a skilled software developer. With our affordable courses starting at $19, you can get started with:

* 100 Days of Python Coding Bootcamp
* Complete Web Development Bootcamp
* iOS Development Bootcamp

Don't miss out on this opportunity to transform your career. Enroll now and take the first step towards success!

**Contact Us**
-----------------

Stay in touch with us:

* Email: [info@appbrewery.com](mailto:info@appbrewery.com)
* Phone: +44 (0)20 1234 5678
* Website: appbrewery.com

Join our community and become a part of The App Brewery family!