## 📄 Digital Brochure Generator

An automated solution designed to generate high-impact corporate brochures from minimal data. Simply provide the company name and website, and the tool creates informative material tailored to different strategic audiences.

#### 🚀 Project Description
This tool streamlines the creation of marketing and sales materials. Using the company website as the primary source, it extracts, synthesizes, and structures key information to generate customized brochures for three main profiles:
* **Potential Clients:** Focus on products, services, and value proposition.

* **Investors:** Focus on metrics, vision, market, and business stability.

* **Potential Candidates (Recruiting):** Focus on culture, benefits, and mission.
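
The three profiles above could be turned into prompt instructions along the following lines. This is only a sketch: `AUDIENCE_FOCUS` and `audience_instruction` are hypothetical names that do not appear in the notebook, and the notebook's own prompts cover all audiences in a single brochure rather than one per profile.

```python
# Hypothetical mapping from audience to focus areas, copied from the profile list above.
AUDIENCE_FOCUS = {
    "clients": "products, services, and value proposition",
    "investors": "metrics, vision, market, and business stability",
    "candidates": "culture, benefits, and mission",
}


def audience_instruction(audience: str) -> str:
    """Return one sentence that could be appended to a brochure system prompt."""
    if audience not in AUDIENCE_FOCUS:
        raise ValueError(f"Unknown audience: {audience!r}")
    return f"Write the brochure for {audience}, focusing on {AUDIENCE_FOCUS[audience]}."


print(audience_instruction("investors"))
```

Keeping the focus areas in one dictionary means a new audience can be added without touching the prompt-building code.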

#### 🛠 Inputs and Outputs
* **Input:** Company name and website URL.
* **Output:** Structured digital brochure (PDF/Markdown).

In [132]:
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [133]:
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
openai = OpenAI()

API key looks good so far


In [None]:
# List the available models
models = openai.models.list()

for model in models:
    print(model.id)

In [134]:
MODEL = "gpt-4.1-mini"
SITE = "HuggingFace"
URL = "https://huggingface.co"

In [135]:
# Create this Website object from the given url using the BeautifulSoup library
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to scrape and parse website content.
    """
    def __init__(self, url):
        self.url = url
        # Time out instead of hanging indefinitely on an unresponsive site
        response = requests.get(url, headers=headers, timeout=10)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            # Remove irrelevant elements like scripts, styles, and images to reduce token usage
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            # Extract clean text
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        
        links = [link.get("href") for link in soup.find_all("a")]
        self.links = [link for link in links if link]
        
    def get_contents(self):
        return f"Website({self.url}, {self.title}, {self.text}, {self.links})"
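
The `links` attribute can contain relative paths and non-web schemes such as `mailto:`. The notebook leaves it to the LLM to expand them, but they could also be normalized in code first. The following is a minimal sketch, assuming a hypothetical helper `absolute_links` that is not part of the notebook; it relies only on the standard library's `urllib.parse`.

```python
from urllib.parse import urljoin, urlparse


def absolute_links(base_url, links):
    """Resolve hrefs against base_url, keep only http(s) links, and drop duplicates in order."""
    resolved = []
    for href in links:
        full = urljoin(base_url, href)  # leaves already-absolute URLs unchanged
        if urlparse(full).scheme in ("http", "https") and full not in resolved:
            resolved.append(full)
    return resolved


sample = ["/models", "mailto:hi@example.com", "https://huggingface.co/models"]
print(absolute_links("https://huggingface.co", sample))
```

Resolving links up front shortens the prompt and removes one thing the model could get wrong.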

#### First step: This is where the LLM decides which links are relevant to the brochure

In [None]:
# System Prompt using One-shot Prompting
link_system_prompt = """
You are provided with a list of links found on a webpage.
You are able to decide which of the links would be most relevant to include in a brochure about the company,
such as links to an About page, or a Company page, or Careers/Jobs pages.

You should respond in JSON as in this example:

{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [136]:
# System Prompt using Multi-shot (Few-Shot) Prompting
link_system_prompt = """
You are provided with a list of links found on a webpage.
You are able to decide which of the links would be most relevant to include in a brochure about the company,
such as links to an About page, Company page, Careers/Jobs, or specific Product pages.

You should respond in JSON. Here are examples of how to handle different scenarios:

Example 1 (Corporate Site):
Input Links: ["/home", "/about-us", "/blog/article-1", "/careers", "/privacy-policy", "/contact"]
Response:
{
    "links": [
        {"type": "about page", "url": "https://company.com/about-us"},
        {"type": "careers page", "url": "https://company.com/careers"},
        {"type": "contact page", "url": "https://company.com/contact"}
    ]
}

Example 2 (E-commerce/Product Focus):
Input Links: ["/", "/shop/men", "/shop/women", "/terms", "/returns", "/our-story"]
Response:
{
    "links": [
        {"type": "about page", "url": "https://shop.com/our-story"},
        {"type": "product category", "url": "https://shop.com/shop/men"},
        {"type": "product category", "url": "https://shop.com/shop/women"}
    ]
}

Example 3 (Service Provider):
Input Links: ["/services/consulting", "/team", "/login", "/signup", "/legal"]
Response:
{
    "links": [
        {"type": "services page", "url": "https://service.com/services/consulting"},
        {"type": "team page", "url": "https://service.com/team"}
    ]
}
"""

In [137]:
def get_links_user_prompt(url):
    site = Website(url)
    user_prompt = f"""
    Here is the list of links on the website: {site.url}
    Please decide which of these are relevant web links for a brochure about the company,
    respond with the full https URL in JSON format.
    Do not include Terms of Service, Privacy, email links.
    
    Links (some might be relative links):
    {site.links}
    """
    return user_prompt

In [None]:
site = Website(URL)
print(site.get_contents())
site.links

In [None]:
print(get_links_user_prompt(URL))

In [139]:
def get_links(url):
    """
    Uses the LLM to analyze and filter relevant links from the website.
    """
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(url)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
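
`get_links` trusts the model to return exactly the shape shown in the system prompt. A defensive parser is one way to guard against malformed replies. The sketch below assumes a hypothetical helper `parse_links` (not in the notebook) and assumes that links without a full `https://` URL should be dropped, which matches the user prompt's request for full https URLs.

```python
import json


def parse_links(raw: str) -> list:
    """Return the well-formed link entries from an LLM reply, or [] if the reply is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    links = data.get("links", [])
    if not isinstance(links, list):
        return []
    # Keep only entries that are dicts with a full https URL
    return [
        link for link in links
        if isinstance(link, dict)
        and isinstance(link.get("url"), str)
        and link["url"].startswith("https://")
    ]


reply = '{"links": [{"type": "about page", "url": "https://example.com/about"}, {"type": "bad", "url": "/relative"}]}'
print(parse_links(reply))
```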

In [None]:
get_links(URL)

#### Second step: Creating the brochure!

This section compiles the content from all selected pages and generates the final brochure.

In [140]:
def get_all_details(url):
    """
    Aggregates content from the landing page and all identified relevant sub-pages.
    """
    result = "Landing Page:\n"
    result += Website(url).get_contents()
    result += "\n\nRelevant Links:\n"
    relevant_links = get_links(url)
    # Iterate through each relevant link, scrape it, and append its content
    for link in relevant_links['links']:
        result += f"\n\n### Link: {link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result

In [None]:
print(get_all_details(URL))

In [126]:
# System prompt: Instructs the LLM to write the final marketing brochure
brochure_system_prompt = """
You are a top-tier corporate marketing consultant designed to create high-impact executive brochures.
Analyze the provided company website contents and generate a professional, polished brochure targeting prospective clients, investors, and top talent.

**Guidelines:**
- **Tone:** Formal, persuasive, corporate, and trustworthy. Avoid casual language; sound like a Fortune 500 company.
- **Format:** Clean Markdown (no code blocks).
- **Visuals:** You MUST use relevant professional emojis/icons (e.g., 🚀, 🏢, 💼, 🌟, 📈, 🤝) at the start of every section header and for key bullet points to enhance readability and visual appeal.
- **Structure:**
  - 🏢 **Company Overview:** Mission and vision.
  - 🚀 **Key Offerings:** Products/services and value proposition.
  - 📈 **For Investors:** Market position, metrics, and stability.
  - 🤝 **Culture & Careers:** Benefits and environment for prospective recruits.
"""

In [141]:
# System Prompt for Structured Corporate Brochure
brochure_system_prompt = """
You are a top-tier corporate marketing consultant designed to create high-impact executive brochures.
Analyze the provided company website contents and generate a professional brochure.

**Formatting Constraints:**
1. Use professional emojis (🚀, 🏢, 💡, 🤝) for headers.
2. Output strictly in Markdown.
3. Do not include code blocks.

**Required Sections Structure:**
You must organize the content into exactly these four sections:

1. 🏢 **Executive Summary & Mission**
   - Synthesize the company's core purpose and vision.
   
2. 🚀 **Key Solutions & Value Proposition**
   - Detail the main products or services.
   - Explain why customers choose them (USPs).

3. 📊 **Market & Investor Data**
   - Focus on growth, market position, and stability.
   - Mention partners or key clients if available.

4. 🤝 **Talent & Culture**
   - Describe the work environment and benefits for potential recruits.
"""

In [142]:
def get_brochure_user_prompt(company_name, url):
    # Build the prompt with the aggregated context from all pages
    user_prompt = f"""
    You are looking at a company called: {company_name}
    Here are the contents of its landing page and other relevant pages;
    use this information to build a short brochure of the company in markdown without code blocks.
    
    {get_all_details(url)}
    """
    # Truncate to avoid exceeding context window limits (simple safety check)
    return user_prompt[:50000]
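
Truncating the whole prompt at a fixed length can cut off the later pages entirely when the landing page is long. One alternative is to cap each page separately before joining them. The sketch below assumes a hypothetical helper `cap_pages` and an arbitrary `per_page` budget; neither comes from the notebook.

```python
def cap_pages(pages: dict, per_page: int = 5000) -> str:
    """Join page texts under headings, truncating each page to per_page characters."""
    parts = []
    for name, text in pages.items():
        parts.append(f"### {name}\n{text[:per_page]}")
    return "\n\n".join(parts)


# Tiny budget so the truncation is visible in the printed result
print(cap_pages({"about": "a" * 10, "careers": "b" * 3}, per_page=4))
```

With a per-page cap, every selected page contributes at least its opening text to the prompt.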

In [143]:
def create_brochure(company_name, url):
    """
    Generates the brochure (non-streaming version).
    """
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": brochure_system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [None]:
create_brochure(SITE, URL)

In [None]:
def stream_brochure(company_name, url):
    """
    Generates the brochure with streaming enabled for a better user experience.
    """
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": brochure_system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True # Enable streaming
    )
    
    response = ""
    # Create a placeholder for the display
    display_handle = display(Markdown(""), display_id=True)
    
    # Update the display in real-time as chunks arrive
    for chunk in stream:
        if chunk.choices[0].delta.content:
            response += chunk.choices[0].delta.content
            update_display(Markdown(response), display_id=display_handle.display_id)

In [None]:
stream_brochure(SITE, URL)

#### Additional Step: Brochure Translation

This step adds a translation stage: a function takes the brochure text generated in step 2 and makes an additional LLM call to translate it.

In [145]:
def translate_brochure(brochure_text, target_language="English"):
    """
    Translates the generated brochure into the target language using a fresh LLM call.
    """
    translation_system_prompt = f"""
    You are a professional translator specializing in corporate marketing materials.
    Translate the following markdown text into {target_language}.
    
    IMPORTANT:
    - Maintain all original Markdown formatting (headers, bolding, lists).
    - Keep all emojis/icons exactly as they are.
    - Ensure the tone remains professional and persuasive in the target language.
    """
    
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": translation_system_prompt},
            {"role": "user", "content": brochure_text}
        ],
    )
    
    translated_content = response.choices[0].message.content
    return translated_content

In [149]:
def create_brochure_text(company_name, url):
    """
    Generates the brochure text (without displaying it immediately) so we can pipeline it.
    """
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": brochure_system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    return response.choices[0].message.content

In [None]:
# --- MAIN EXECUTION FLOW ---

# 1. Generate the original brochure (in the site's language, English here)
print("Generating original brochure...")
original_brochure = create_brochure_text(SITE, URL)
display(Markdown("## 🇺🇸 Original Brochure"))
display(Markdown(original_brochure))

print("\n------------------------------------------------\n")

# 2. Translate the brochure (Third LLM Call)
print("Translating brochure to Spanish...")
translated_brochure = translate_brochure(original_brochure, target_language="Spanish")
display(Markdown("## 🇪🇸 Translated Brochure (Spanish)"))
display(Markdown(translated_brochure))

In [154]:
def call_brochure(company_name, url):
    """
    Generator version of the streaming call: yields the accumulated brochure text
    after each chunk so that a UI such as Gradio can update the output progressively.
    """
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": brochure_system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )
    
    response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content:
            response += chunk.choices[0].delta.content
            yield response
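
The generator above yields the full text so far on every chunk, not just the new fragment. That matters because Gradio replaces the output component with each yielded value. The pattern can be tried without an API call using the sketch below, where `accumulate` and `fake_deltas` are hypothetical stand-ins for the loop body and for the `chunk.choices[0].delta.content` values.

```python
def accumulate(deltas):
    """Yield the running concatenation of text deltas, skipping empty or None deltas."""
    text = ""
    for delta in deltas:
        if delta:
            text += delta
            yield text


# Simulated deltas; real streams can contain None or empty content
fake_deltas = ["# Hugging", None, " Face", ""]
for partial in accumulate(fake_deltas):
    print(repr(partial))
```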

In [153]:
call_brochure(SITE, URL)

<generator object call_brochure at 0x00000228070B7AC0>

In [161]:
import gradio as gr

input_company = gr.Textbox(label="Company Name", placeholder="e.g. HuggingFace")
input_url = gr.Textbox(label="Website URL", placeholder="https://huggingface.co")

output_markdown = gr.Markdown(label="Response:")

view = gr.Interface(
    fn=call_brochure,
    title="Digital Brochure AI 🚀", 
    inputs=[input_company, input_url], 
    outputs=[output_markdown], 
    examples=[
        ["HuggingFace", "https://huggingface.co"],
        ["OpenAI", "https://openai.com"],
        ["Anthropic", "https://www.anthropic.com"]
    ],
    flagging_mode="never"
    )
view.launch()

* Running on local URL:  http://127.0.0.1:7902
* To create a public link, set `share=True` in `launch()`.


