# AI-powered Brochure Generator

- üåç Task: Generate a company brochure using its name and website for clients, investors, and recruits.
- üß† Model: The user switches between ``OpenAI`` and ``Ollama`` by toggling ``USE_OPENAI``.
- üïµÔ∏è‚Äç‚ôÇÔ∏è Data Extraction: Scraping website content and filtering key links (About, Products, Careers, Contact).
- üìå Output Format: a Markdown-formatted brochure streamed in real-time.
- üöÄ Tools: BeautifulSoup, OpenAI API, and IPython display.
- üßë‚Äçüíª Skill Level: Intermediate to advanced.
- ‚öôÔ∏è Hardware: ‚úÖ CPU is sufficient ‚Äî no GPU required

## Workflow
1. **`main()`** initializes `BrochureGenerator` and calls `generate()`.  
2. **`generate()`** calls **`LLMClient.get_relevant_links()`** to extract relevant links using **LLM (OpenAI/Ollama)**.  
3. **`Website` scrapes the webpage**, extracting **text and links** from the given URL.  
4. **Relevant links are re-scraped** using `Website` to collect additional content.  
5. **All collected content is passed to `LLMClient.generate_brochure()`**.  
6. **`LLMClient` streams the generated brochure** using **OpenAI or Ollama**.  
7. **The final brochure is displayed in Markdown format.**

![brochure_generator_process.png](assets/brochure_generator_process.png)

![brochure_generator_process.png](assets/intermediate_reasoning.png)

## Class Structure 
This code consists of three main classes:

1. **`Website`**:  
   - Scrapes and processes webpage content.  
   - Extracts **text** and **links** from a given URL.  

2. **`LLMClient`**:  
   - Handles interactions with **OpenAI (`gpt`) or Ollama (`llama3`, `deepseek`, `qwen`)**.  
   - Uses `get_relevant_links()` to filter webpage links.  
   - Uses `generate_brochure()` to create and stream a Markdown-formatted brochure.  

3. **`BrochureGenerator`**:  
   - Uses `Website` to scrape the main webpage and relevant links.  
   - Uses `LLMClient` to filter relevant links and generate a brochure.  
   - Calls `generate()` to run the entire process.


## Class Diagram

![brochure_class_diagram.png](assets/brochure_class_diagram.png)


## Import Libraries

In [7]:
import os
import requests
import json
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import display, Markdown, update_display
from openai import OpenAI
import ollama

# Define the Model

The user can switch between OpenAI and Ollama by changing a single variable (`USE_OPENAI`). The model selection is dynamic.

In [21]:
# Load API key
load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')
if not api_key or not api_key.startswith('sk-'):
    raise ValueError("Invalid OpenAI API key. Check your .env file.")

# Define the model dynamically
USE_OPENAI = True  # True to use openai and False to use Ollama
MODEL = 'gpt-4o-mini' if USE_OPENAI else 'llama3.2'


openai_client = OpenAI() if USE_OPENAI else None


## Classes

In [22]:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to scrape and process website content.
    """
    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        self.text = self.extract_text(soup)
        self.links = self.extract_links(soup)

    def extract_text(self, soup):
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            return soup.body.get_text(separator="\n", strip=True)
        return ""

    def extract_links(self, soup):
        links = [link.get('href') for link in soup.find_all('a')]
        return [link for link in links if link and 'http' in link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [23]:
class LLMClient:
    def __init__(self, model=MODEL):
        self.model = model

    def get_relevant_links(self, website):
        link_system_prompt = """
        You are given a list of links from a company website. 
        Select only relevant links for a brochure (About, Company, Careers, Products, Contact).
        Exclude login, terms, privacy, and emails.
        
        ### **Instructions**
        - Return **only valid JSON**.
        - **Do not** include explanations, comments, or Markdown.
        - Example output:
        {
            "links": [
                {"type": "about", "url": "https://company.com/about"},
                {"type": "contact", "url": "https://company.com/contact"},
                {"type": "product", "url": "https://company.com/products"}
            ]
        }
        """
        
        user_prompt = f"""
        Here is the list of links on the website of {website.url}:
        Please identify the relevant web links for a company brochure. Respond in JSON format.
        Do not include login, terms of service, privacy, or email links.
        Links (some might be relative links):
        {', '.join(website.links)}
        """
        
        if USE_OPENAI:
            response = openai_client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": link_system_prompt},
                    {"role": "user", "content": user_prompt}
                ]
            )
            return json.loads(response.choices[0].message.content.strip())
        else:
            response = ollama.chat(
                model=self.model,
                messages=[
                    {"role": "system", "content": link_system_prompt},
                    {"role": "user", "content": user_prompt}
                ]
            )
            result = response.get("message", {}).get("content", "").strip()
            try:
                return json.loads(result)  # Attempt to parse JSON
            except json.JSONDecodeError:
                print("Error: Response is not valid JSON")
                return {"links": []}  # Return empty list if parsing fails
            

    def generate_brochure(self, company_name, content, language):
        system_prompt = """
        You are a professional translator and writer who creates fun and engaging brochures. 
        Your task is to read content from a company‚Äôs website and write a short, humorous, joky,
        and entertaining brochure for potential customers, investors, and job seekers. 
        Include details about the company‚Äôs culture, customers, and career opportunities if available. 
        Respond in Markdown format.
        """
        
        user_prompt = f"""
        Create a fun brochure for '{company_name}' using the following content:
        {content[:5000]}
        Respond in {language} only, and format your response correctly in Markdown.
        Do NOT escape characters or return extra backslashes.
        """
        
        if USE_OPENAI:
            response_stream = openai_client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_prompt}
                ],
                stream=True
            )
            response = ""
            display_handle = display(Markdown(""), display_id=True)
            for chunk in response_stream:
                response += chunk.choices[0].delta.content or ''
                response = response.replace("```","").replace("markdown", "")
                update_display(Markdown(response), display_id=display_handle.display_id)
        else:
            response_stream = ollama.chat(
                model=self.model,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_prompt}
                ],
                stream=True 
            )
            display_handle = display(Markdown(""), display_id=True)
            full_text = ""
            for chunk in response_stream:
                if "message" in chunk:
                        content = chunk["message"]["content"] or ""
                        full_text += content
                        update_display(Markdown(full_text), display_id=display_handle.display_id)
            

In [24]:
class BrochureGenerator:
    """
    Main class to generate a company brochure.
    """
    def __init__(self, company_name, url, language='English'):
        self.company_name = company_name
        self.url = url
        self.language = language
        self.website = Website(url)
        self.llm_client = LLMClient()
    
    def generate(self):
        links = self.llm_client.get_relevant_links(self.website)
        content = self.website.get_contents()
        
        for link in links['links']:
            linked_website = Website(link['url'])
            content += f"\n\n{link['type']}:\n"
            content += linked_website.get_contents()
        
        self.llm_client.generate_brochure(self.company_name, content, self.language)


In [25]:
def main():
    company_name = "Tour Eiffel"
    url = "https://www.toureiffel.paris/fr"
    language = "French"
    
    generator = BrochureGenerator(company_name, url, language)
    generator.generate()

if __name__ == "__main__":
    main()


# Visitons la Tour Eiffel! üåü

## Bienvenue au monument le plus embl√©matique du monde ! üá´üá∑

Vous √™tes-vous d√©j√† demand√© ce que √ßa fait de d√Æner avec une vue √©poustouflante sur Paris ? Ou encore, de gravir le sommet d‚Äôune structure qui a plus de filaments que le cha√Ænon d‚Äôune pelote de laine ? Bienvenue √† la Tour Eiffel, o√π chaque visite est plus unique qu‚Äôun selfie rat√© ! üì∏

### Tarifs & Horaires üö™

- **Horaires :** Tous les jours, de 09h30 √† 23h00 (enfin, nous devons aussi dormir un peu, non ?).
- **Billets :** Gagnez du temps en les achetant *en ligne* (parce qu'attendre dans la queue, c‚Äôest pour les amateurs de sensations fortes).

### D√©couvrir la Tour Eiffel üåç

#### *Un Voyage Ascensionnel Unique!*

Du parvis au sommet, pr√©parez-vous √† une aventure vertigineuse ! üò≤ Vous serez √©bloui par la vue √† 360¬∞ (n'oubliez pas de fermer la bouche, √ßa attire les mouches). Que vous soyez √† la recherche d'une journ√©e romantique ou d'une aventure familiale, la Tour a quelque chose √† vous offrir.

- **1er √âtage :** D√Æner chez **Madame Brasserie** ‚Äì pensez √† la fondue au fromage en regardant le ciel bleu, sans r√©servation, mais avec une pinc√©e de chance !
- **2√®me √âtage :** Apportez votre appareil photo ! Les souvenirs forts se construisent ici.
- **Sommet :** Des frissons garantis et une vue o√π l'on se demande si nos probl√®mes sont vraiment si grands !

### Restaurants & Boutiques üçΩÔ∏èüõçÔ∏è

Pourquoi se limiter aux selfies quand vous pouvez √©galement acheter des souvenirs uniques pour prouver √† vos amis que vous avez visit√© la Tour Eiffel ? Des objets de collection, des cadeaux chaleureux pour votre grand-m√®re (ou pour vous-m√™me, on ne jugera pas).

### √Ä ne Pas Manquer üåå

Saviez-vous que la Tour s'illumine ? Oui, elle scintille comme un disco des ann√©es 70 ! Alors, pourquoi ne pas r√©server votre visite pour une nuit√©e magique ? ‚ú® 

### Et Vous ? Travailler √† la Tour Eiffel ? üë∑‚Äç‚ôÄÔ∏è

Si vous aimez l‚Äôarchitecture plus que votre propre reflet, nous avons des opportunit√©s professionnelles qui n‚Äôattendent que vous ! √âvoluez dans un environnement o√π l'ascenseur est un alli√© et les escaliers ne sont l√† que pour les personnes en qu√™te de cardio.

---

**Pr√™t pour l'aventure?** La Tour Eiffel vous attend pour des exp√©riences inoubliables! 
Que vous soyez un visiteur, un investisseur, ou un chercheur d‚Äôemploi, la magie vous attend ici. 

üóº *N'attendez plus, venez faire un tour (ou un ascenseur) √† la Tour Eiffel !*

