# Automating Company Brochure Creation with AI

This notebook automates the creation of company brochures: it scrapes text content from a specified website, sends that content to a model served through the LM Studio API, and displays the generated brochure as Markdown.


### Import Required Libraries

In this cell, we import the necessary libraries:

- **`os`**: To manage environment variables for API keys.
- **`requests`**: To send HTTP requests to fetch the website content.
- **`json`**: To handle JSON data, particularly for API responses.
- **`List`**: To use type annotations for lists.
- **`load_dotenv`**: To load environment variables from a `.env` file.
- **`BeautifulSoup`**: To parse and extract text from HTML content.
- **`Markdown`, `display`, and `update_display`**: To format, display, and live-update output in Jupyter Notebook.
- **`OpenAI`**: The OpenAI client, used here to talk to LM Studio's OpenAI-compatible API.


In [1]:
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI


### Initialize OpenAI Client

In this cell, we initialize the OpenAI client with the base URL of the local LM Studio server and an API key. LM Studio exposes an OpenAI-compatible API, so the standard `OpenAI` client can send it our scraped content; the key `lm-studio` serves as a placeholder value.



In [2]:
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
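Since `os` and `load_dotenv` are imported above, the connection settings could be read from the environment instead of being hard-coded. The following is a minimal sketch of that idea; the variable names `LMSTUDIO_BASE_URL` and `LMSTUDIO_API_KEY` and the helper `client_settings` are hypothetical and not part of the original notebook.

```python
import os

# Defaults mirror the values hard-coded in the cell above.
DEFAULT_BASE_URL = "http://localhost:1234/v1"
DEFAULT_API_KEY = "lm-studio"

def client_settings(env=None):
    """Return keyword arguments for OpenAI(...), read from an environment mapping.

    LMSTUDIO_BASE_URL and LMSTUDIO_API_KEY are hypothetical names chosen
    for this sketch.
    """
    env = os.environ if env is None else env
    return {
        "base_url": env.get("LMSTUDIO_BASE_URL", DEFAULT_BASE_URL),
        "api_key": env.get("LMSTUDIO_API_KEY", DEFAULT_API_KEY),
    }

# Passing a plain dict makes the behaviour easy to check without touching os.environ
settings = client_settings({"LMSTUDIO_BASE_URL": "http://192.168.1.20:1234/v1"})
print(settings)
```

With the real client this would likely be used as `client = OpenAI(**client_settings())`, after calling `load_dotenv()` if the values live in a `.env` file.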


### Define the Website Class

Here, we define a class `Website` to represent a webpage. The class fetches the webpage content and extracts the title, body text, and links. The `get_contents` method provides a formatted string of the webpage's title and content.



In [3]:
class Website:
    """A scraped webpage: its title, visible text, and outgoing links."""
    url: str
    title: str
    body: bytes
    text: str
    links: List[str]

    def __init__(self, url):
        self.url = url
        response = requests.get(url)
        self.body = response.content  # raw HTML bytes
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            # Drop elements that carry no readable text
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        # Keep only anchors that actually have an href
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"
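To make the extraction steps concrete without a network call, here is a rough stand-in that uses only the standard library's `html.parser` on an inline HTML string. It is not the notebook's BeautifulSoup code; the class `PageParser` and the sample page are hypothetical, and the sketch only approximates what `Website` collects (title, visible text, links with an `href`).

```python
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Collects the title, visible text, and link targets of an HTML page."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self.chunks = []
        self._stack = []  # open <title>, <script>, or <style> tags

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP or tag == "title":
            self._stack.append(tag)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # ignore anchors without a target
                self.links.append(href)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        current = self._stack[-1] if self._stack else None
        if current == "title":
            self.title = text
        elif current not in self.SKIP:
            self.chunks.append(text)

sample_html = (
    "<html><head><title>Acme</title><style>p{color:red}</style></head>"
    "<body><p>Hello</p><script>var x=1;</script>"
    "<a href='/about'>About</a><a>no href</a></body></html>"
)
parser = PageParser()
parser.feed(sample_html)
print(parser.title, parser.links, parser.chunks)
```

Script and style contents are skipped, and the anchor without an `href` contributes text but no link, which matches the filtering done in `Website.__init__`.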



### Create Website Instance

In this cell, we create an instance of the `Website` class for a specific URL. We then print the extracted links from the webpage to verify that the class is functioning as expected.


In [4]:
ed = Website("https://www.llama.com/")
print(ed.links)



[]


### Define Link System Prompt

This cell defines a prompt for the LM Studio API. The prompt instructs the model to determine which links are most relevant for inclusion in a company brochure. The response is expected in a specific JSON format.



In [5]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""
link_system_prompt += "All links should start with https."
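Because a model tends to imitate the example it is shown, it can help to confirm that the example JSON itself parses before it goes into the prompt. The sketch below copies the example in its intended form; the helper `is_valid_json` is hypothetical and only illustrates the check.

```python
import json

# The example JSON shown to the model, copied here so the check is self-contained.
example = """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json(example))
# A colon where a comma belongs makes the object unparseable
print(is_valid_json('{"type": "careers page": "url": "https://x"}'))
```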


### Define User Prompt for Links

In this cell, we define a function `get_links_user_prompt` that creates a user prompt for the LM Studio API. The prompt includes the website URL and a list of links, asking the model to filter the relevant ones for the brochure.


In [6]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, and respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, or email links, or any other links that are not relevant to the brochure.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt
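The prompt warns the model that some links might be relative. One alternative is to resolve them before they reach the model, so it never has to guess the full URL. This is a minimal sketch using `urllib.parse`; the helper `absolute_links` and the sample URLs are hypothetical.

```python
from urllib.parse import urljoin, urlparse

def absolute_links(base_url, links):
    """Resolve links against base_url and keep unique http(s) URLs, in order."""
    seen = set()
    resolved = []
    for link in links:
        full = urljoin(base_url, link)
        if urlparse(full).scheme not in ("http", "https"):
            continue  # skips mailto:, javascript:, tel:, and similar
        if full not in seen:
            seen.add(full)
            resolved.append(full)
    return resolved

sample = ["/about", "careers", "mailto:hi@example.com", "https://example.com/about"]
print(absolute_links("https://example.com/", sample))
```

In the notebook this could be applied as `absolute_links(website.url, website.links)` before joining the links into the prompt.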


### Get Links from Website

This cell defines the function `get_links` that retrieves relevant links from a given website. It constructs a prompt for the API and processes the JSON response to return the filtered links.


In [7]:
import re

def get_links(url):
    website = Website(url)
    completion = client.chat.completions.create(
        model="model-identifier",
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)},
            {"role": "user", "content": "Output **only** the JSON object with a key 'links' that contains an array of link objects. Each object should have 'type' and 'url' fields. Do not include any explanations, comments, or additional text."}
        ]
        # response_format={"type": "json_object"}
    )
    result = completion.choices[0].message.content
    # The model may wrap the JSON in extra text, so take the span from the
    # first '{' to the last '}'
    json_match = re.search(r'\{.*\}', result, re.DOTALL)
    if json_match:
        raw_json = json_match.group(0)
        try:
            # Attempt to parse the JSON content
            return json.loads(raw_json)
        except json.JSONDecodeError:
            print("Error: Invalid JSON format.")
            return None
    else:
        print("Error: JSON object not found in response.")
        return None
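The greedy pattern in `get_links` matches from the first `{` to the last `}`, so a reply with a stray brace after the JSON fails to parse. As one possible alternative, `json.JSONDecoder.raw_decode` can parse the first complete object and ignore whatever follows. The helper name `extract_json_object` and the sample reply are hypothetical.

```python
import json

def extract_json_object(text):
    """Return the first JSON object found in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one value starting at `start` and ignores the rest
            obj, _ = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        start = text.find("{", start + 1)
    return None

reply = (
    'Sure! Here you go:\n'
    '{"links": [{"type": "about page", "url": "https://example.com/about"}]}\n'
    'Hope that helps {smile}.'
)
print(extract_json_object(reply))
```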


### Print Links Retrieved

In this cell, we call the `get_links` function with a specified URL to retrieve and print the relevant links for the company brochure.


In [8]:
get_links("https://www.llama.com/")


{'links': [{'type': 'about page', 'url': 'https://www.llama.com/about'},
  {'type': 'company page', 'url': 'https://www.llama.com/our-story'},
  {'type': 'careers page', 'url': 'https://www.llama.com/careers'}]}

### Define Brochure Details Function

This cell defines a function `get_all_details` that compiles information from the landing page and other relevant pages to create a comprehensive overview for the brochure.


In [9]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    # get_links returns None when the model's reply is not valid JSON
    if not links:
        return result
    for link in links.get("links", []):
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result
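Since the link list comes from a model, its shape is not guaranteed: the reply may be missing, lack the `links` key, or contain entries without a usable URL. The sketch below shows one way to filter the payload before fetching any page; `usable_links` is a hypothetical helper, and the `https://` check simply mirrors the instruction given in the link prompt.

```python
def usable_links(payload):
    """Return well-formed link entries from a model reply, or [] if there are none."""
    if not isinstance(payload, dict):
        return []  # covers None and non-object replies
    entries = payload.get("links")
    if not isinstance(entries, list):
        return []
    good = []
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        url = entry.get("url")
        if isinstance(url, str) and url.startswith("https://"):
            good.append({"type": str(entry.get("type", "page")), "url": url})
    return good

payload = {
    "links": [
        {"type": "about page", "url": "https://example.com/about"},
        {"type": "careers page", "url": "/careers"},  # relative, so dropped
        "not a dict",
    ]
}
print(usable_links(payload))
```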


### Print All Details

In this cell, we call the `get_all_details` function with the specified URL to print out the detailed content gathered for the company.


In [10]:
print(get_all_details("https://www.llama.com/"))


Found links: {'links': [{'type': 'about page', 'url': 'https://www.llama.com/about'}, {'type': 'careers page', 'url': 'https://www.llama.com/careers'}]}
Landing page:
Webpage Title:
Llama 3.2
Webpage Contents:




about page
Webpage Title:
Error
Webpage Contents:




careers page
Webpage Title:
Error
Webpage Contents:





### Define Brochure System Prompt

Here, we define a prompt for the LM Studio API that instructs the model to analyze the contents gathered and create a humorous, entertaining brochure about the company for potential customers, investors, and recruits.


In [11]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short, humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information, and add some emojis to the brochure."


### Define User Prompt for Brochure

This cell defines a function `get_brochure_user_prompt` that prepares the user prompt for generating the brochure, including the company name and the gathered details.


In [12]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:10_000]  # Truncate if more than 10,000 characters
    return user_prompt
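A plain slice can cut the prompt in the middle of a line or word. As a small variation, the truncation can back up to the last line break when that does not waste much of the budget. The helper `truncate_prompt` and its halfway threshold are assumptions made for this sketch, not part of the original notebook.

```python
def truncate_prompt(text, limit=10_000):
    """Cut text to at most limit characters, preferring to end on a line break."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    last_newline = cut.rfind("\n")
    # Only back up to the newline if that keeps at least half of the budget
    if last_newline >= limit // 2:
        cut = cut[:last_newline]
    return cut

print(repr(truncate_prompt("aaaa\nbbbb\ncccc", limit=12)))
```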


### Create Brochure Function

In this cell, we define the `create_brochure` function that sends a request to the LM Studio API to generate a brochure based on the user prompt. The response is displayed in a formatted Markdown style.


In [13]:
def create_brochure(company_name, url):
    response = client.chat.completions.create(
        model="model-identifier",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))


### Generate Brochure

In this cell, we call the `create_brochure` function with the company name and URL to generate and display the brochure content.


In [14]:
create_brochure("Llama", "https://www.llama.com/")


Found links: {'links': [{'type': 'about page', 'url': 'https://www.llama.com/about'}, {'type': 'company page', 'url': 'https://www.llama.com/our-story'}, {'type': 'careers page', 'url': 'https://www.llama.com/careers'}]}


**Welcome to Llama! 🐫**

We're a company that's still trying to figure out its website, but don't worry, we're working on it! 😅 Meanwhile, let us introduce you to our awesome culture, customers, and careers.

### Our Culture:

* We're a bunch of llamas (literally) who love to have fun at work. 🎉
* Our motto: "If it's not llama-tastic, why bother?" 💥

### Our Customers:

* We've got a growing list of happy customers who appreciate our... um, dedication to innovation? 😂

### Careers/Jobs:

* We're always looking for talented individuals who are willing to join our journey into the unknown. 🤯
* Currently, we have openings in:
	+ Web Development (if you can fix that pesky error page 😉)
	+ Content Creation (we need someone to write a real company bio! 💸)

### Why Join Llama?

* Be part of a pioneering team that's rewriting the book on "how not to build a website" 📚
* Enjoy our relaxed, laid-back office environment... as long as you don't mind occasional server crashes 🤯
* Get access to our legendary (at least we hope) company lunches 🍴

If you're ready for an adventure, come join us at Llama! 🎉 Contact us at [insert non-existent email address] and let's create some llama-tic history together! 💥

### Stream Brochure Function

This cell defines a function `stream_brochure` that streams the response from the LM Studio API and updates the displayed brochure as each chunk arrives, mimicking a typewriter animation. An extra user message asks the model to write the brochure in Spanish.


In [15]:
def stream_brochure(company_name, url):
    stream = client.chat.completions.create(
        model="model-identifier",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)},
            {"role": "user", "content": "Respond only in Spanish."},
        ],
        stream=True
    )

    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        # Strip code-fence markers for display only; removing the bare word
        # "markdown" would also delete it from the brochure text itself
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)
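The accumulate-and-clean loop can be tried without a running model by feeding it a list of text pieces in place of the stream. The sketch below removes only a wrapping code fence and leaves the word "markdown" in the body alone. The helpers `strip_markdown_fence` and `accumulate` and the sample pieces are hypothetical.

```python
def strip_markdown_fence(text):
    """Remove a wrapping ```markdown ... ``` fence, leaving the body untouched."""
    stripped = text.strip()
    for opener in ("```markdown", "```md", "```"):
        if stripped.startswith(opener):
            stripped = stripped[len(opener):]
            break
    if stripped.endswith("```"):
        stripped = stripped[:-3]
    return stripped.strip()

def accumulate(pieces):
    """Yield the cleaned text after each streamed piece, as a display loop would."""
    raw = ""
    for piece in pieces:
        raw += piece or ""  # a piece may be None, like delta.content
        yield strip_markdown_fence(raw)

# Stand-in for the streamed deltas; the fence opener is split across two pieces
pieces = ["```mark", "down\n# Acme\n", "We write ", None, "markdown docs.\n", "```"]
frames = list(accumulate(pieces))
print(frames[-1])
```

While an opener is only partly received, a fragment of it can appear briefly in an intermediate frame; it disappears once the next piece completes it.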


### Stream Brochure for a Different Company

In this cell, we call the `stream_brochure` function for a different company, demonstrating how the streaming functionality works with live content updates.


In [16]:
stream_brochure("Hugging Face", "https://huggingface.co/")

Found links: {'links': [{'type': 'About page', 'url': 'https://huggingface.co/about'}, {'type': 'Company page', 'url': 'https://huggingface.co'}, {'type': 'Careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'Docs page', 'url': 'https://huggingface.co/docs'}]}


**¡Bienvenidos a Hugging Face!**

Estamos emocionados de presentarte nuestro breve folleto en español sobre la compañía.

**¿Por qué elegir Hugging Face?**

* **Colaboración**: Un lugar donde los desarrolladores de inteligencia artificial pueden crear, descubrir y colaborar en modelos, datasets y aplicaciones.
* **Flexibilidad**: Explora diferentes modalidades: texto, imagen, video, audio o incluso 3D.
* **Portafolio**: Comparta su trabajo con el mundo y construya su perfil de ML.

**¿Qué nos diferencia?**

* **Comunidad global**: Más de 50,000 organizaciones ya están utilizando Hugging Face.
* **Open Source**: Estamos construyendo la base de herramientas de IA con la comunidad.
* **Diseño intuitivo**: Crea, experimenta y compartas fácilmente.

**¿Qué ofrecemos?**

* **Compute**: Ejecute modelos en nuestras Infraestructuras de Fin de Vuelta o actualice aplicaciones de Spaces a un GPU en pocos clics.
* **Enterprise**: Un lugar donde su equipo puede construir IA con seguridad empresarial, control de acceso y soporte dedicado.

**¿Qué más?**

* **Documentación**: Acceda a recursos y tutoriales para profundizar en Hugging Face.
* **Foro**: Participa en discusiones y preguntas frecuentes con nuestra comunidad global.

**¡Inscríbase ahora!**