<div style="border: 1px solid #ccc; padding: 10px; border-radius: 5px;">
    <table style="width: 100%;">
        <tr>
            <td style="width: 10%; text-align: center; vertical-align: top;">
                <img src="images/robotic_sales_assistent.png" alt="Robotic Sales Assistant" width="200">
            </td>
            <td style="width: 70%; vertical-align: top; padding-left: 15px;">
                <h3>Use Case: Sales Assistant</h3>
                <p>
                    The Robotic Sales Assistant automates sales processes and supports
                    employees in customer care. Typical use cases include:
                </p>
                <p>
                    The goal is to increase sales efficiency and enable personalized
                    customer experiences.
                </p>
            </td>
        </tr>
    </table>
</div>



In [2]:
# imports

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [3]:
# Initialize and constants

load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key. Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [4]:
# A class to represent a Webpage

class Website:
    """
    A utility class to represent a Website that I have scraped, including links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [5]:
# Test class

ed = Website("https://www.huggingface.com")
ed.links

['/',
 '/models',
 '/datasets',
 '/spaces',
 '/posts',
 '/docs',
 '/enterprise',
 '/pricing',
 '/login',
 '/join',
 '/blog/yagilb/lms-hf',
 '/Qwen/QwQ-32B-Preview',
 '/Djrango/Qwen2vl-Flux',
 '/AIDC-AI/Marco-o1',
 '/Lightricks/LTX-Video',
 '/OuteAI/OuteTTS-0.2-500M',
 '/models',
 '/spaces/PR-Puppets/PR-Puppet-Sora',
 '/spaces/Qwen/QwQ-32B-preview',
 '/spaces/akhaliq/anychat',
 '/spaces/huggingface-projects/ai-video-composer',
 '/spaces/multimodalart/logo-in-context',
 '/spaces',
 '/datasets/alpindale/two-million-bluesky-posts',
 '/datasets/HuggingFaceTB/smoltalk',
 '/datasets/O1-OPEN/OpenO1-SFT',
 '/datasets/fka/awesome-chatgpt-prompts',
 '/datasets/bluesky-community/one-million-bluesky-posts',
 '/datasets',
 '/join',
 '/pricing#endpoints',
 '/pricing#spaces',
 '/pricing',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/allenai',
 '/facebook',
 '/amazon',
 '/google',
 '/Intel',
 '/microsoft',
 '/grammarly',
 '/Writer',


## First step: Have GPT-4o-mini figure out which links are relevant

### Use a call to gpt-4o-mini to read the links on a webpage and respond in structured JSON.  
The model should decide which links are relevant and replace relative links such as "/about" with "https://company.com/about".  
I will use "one-shot prompting", in which the prompt includes an example of how the model should respond.
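
The relative-to-absolute rewriting described above can also be done deterministically, without relying on the model. A minimal sketch, assuming the standard library's `urllib.parse.urljoin`; the helper name `absolutize_links` is hypothetical and not part of this notebook:

```python
from urllib.parse import urljoin


def absolutize_links(base_url, links):
    """Resolve relative links against base_url and drop duplicates, keeping order."""
    seen = set()
    resolved = []
    for link in links:
        # urljoin leaves absolute URLs untouched and resolves "/about"-style paths
        full = urljoin(base_url, link)
        if full not in seen:
            seen.add(full)
            resolved.append(full)
    return resolved


# Illustrative input only; real input would be the scraped `links` list
print(absolutize_links("https://company.com", ["/about", "/careers", "/about"]))
```

Deduplicating first would also shrink the user prompt, since scraped link lists often repeat the same path several times.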

In [6]:
# Create the system prompt

link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [7]:
# Create user prompt

def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [8]:
# Ask the LLM for the most relevant links and return them as parsed JSON

def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
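
Because the model's reply is only expected to follow the example in the system prompt, it may help to check its shape before crawling the returned URLs. A minimal sketch under that assumption; `extract_valid_links` is a hypothetical helper, not part of this notebook:

```python
import json


def extract_valid_links(raw_json):
    """Parse the model's reply and keep only well-formed link entries."""
    try:
        data = json.loads(raw_json)
    except (json.JSONDecodeError, TypeError):
        return []  # Malformed or missing reply: treat as "no links found"
    if not isinstance(data, dict):
        return []
    entries = data.get("links", [])
    if not isinstance(entries, list):
        return []
    # Keep entries that have a string "type" and a full https URL
    return [
        entry for entry in entries
        if isinstance(entry, dict)
        and isinstance(entry.get("type"), str)
        and isinstance(entry.get("url"), str)
        and entry["url"].startswith("https://")
    ]
```

Returning an empty list on bad input lets the crawling loop run without raising a `KeyError`, though silently dropping entries is a design choice that may not suit every use.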

## Second step: make the brochure!

Assemble all the details into another prompt for gpt-4o-mini.

In [9]:
# Crawl information out of relevant links

def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result

In [10]:
# Create system prompt

system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

In [11]:
# Create user prompt

def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:20_000] # Truncate if more than 20,000 characters
    return user_prompt
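
The hard cut at 20,000 characters can drop later pages entirely when the landing page is long. One possible alternative is to give each page an equal share of the budget. A minimal sketch, assuming the pages are available as `(label, text)` pairs; `build_budgeted_prompt` is a hypothetical helper, not part of this notebook:

```python
def build_budgeted_prompt(sections, max_chars=20_000):
    """Join (label, text) sections, capping each at an equal share of max_chars."""
    if not sections:
        return ""
    per_section = max_chars // len(sections)
    parts = []
    for label, text in sections:
        block = f"{label}\n{text}"
        parts.append(block[:per_section])  # Trim each page to its share
    # The separators add a few characters, so apply the overall cap last
    return "\n\n".join(parts)[:max_chars]
```

An equal split is the simplest policy; weighting the landing page more heavily would be an equally reasonable choice.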

In [12]:
# Create the brochure

def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [None]:
create_brochure('<company_name>', '<url>')