<a href="https://colab.research.google.com/github/micah-shull/LLMs/blob/main/LLM_002_create_business_brochure.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
!pip install python-dotenv
!pip install openai

Collecting python-dotenv
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.1
Collecting openai
  Downloading openai-1.52.0-py3-none-any.whl.metadata (24 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting jiter<1,>=0.4.0 (from openai)
  Downloading jiter-0.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.2 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.6-py3-none-any.whl.metadata (21 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Downloading openai-1.52.0-py3-none-any.whl (386 kB)

# A full business solution

Create a product that builds a brochure for a company, to be used by prospective clients, investors, and potential recruits.

The inputs are a company name and the company's primary website.

See the end of this notebook for examples of real-world business applications.

And remember: I'm always available if you have problems or ideas! Please do reach out.

In [10]:
# imports

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [9]:
from dotenv import load_dotenv
import os

# Load the environment variables from the .env file
load_dotenv('/content/OPENAI_API_KEY.env')

# Get the API key from the environment
api_key = os.getenv("OPENAI_API_KEY")

# Confirm the key loaded without printing any part of the secret
if api_key:
    print("API key loaded from OPENAI_API_KEY.env")
else:
    print("No API key found; check the path to OPENAI_API_KEY.env")

# No further setup is needed here: the OpenAI() client created below
# reads OPENAI_API_KEY from the environment.

API key loaded from OPENAI_API_KEY.env


In [11]:
# Initialize and constants

load_dotenv()
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY')
MODEL = 'gpt-4o-mini'
openai = OpenAI()

In [14]:
# A class to represent a Webpage

class Website:
    url: str
    title: str
    body: bytes
    text: str
    links: List[str]

    def __init__(self, url):
        self.url = url
        response = requests.get(url)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

ed = Website("https://www.microsoft.com/en-us")
for link in ed.links:
    print(link)

## First step: Have GPT-4o-mini figure out which links are relevant

### Use a call to gpt-4o-mini to read the links on a webpage, and respond in structured JSON.  
It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".  
We will use "one shot prompting" in which we provide an example of how it should respond in the prompt.

Sidenote: there is a more advanced technique called "Structured Outputs" in which we require the model to respond according to a spec. We cover this technique in Week 8 during our autonomous Agentic AI project.
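
The prompt asks the model to expand relative links. As a hedged alternative, relative links could also be resolved deterministically before the model sees them, using `urllib.parse.urljoin` from the standard library. The helper name `absolutize_links` and the example URLs below are illustrative assumptions and are not part of the original notebook.

```python
from typing import List
from urllib.parse import urljoin


def absolutize_links(base_url: str, links: List[str]) -> List[str]:
    """Resolve each link against base_url; absolute links pass through unchanged."""
    return [urljoin(base_url, link) for link in links]


# Hypothetical inputs, for illustration only
example = absolutize_links(
    "https://company.com/",
    ["/about", "careers", "https://other.example/press"],
)
print(example)
```

Doing this in code would leave the model with only the relevance decision, at the cost of one extra preprocessing step.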

### System Prompt

### What Are System Prompts and Why Are They Needed?

**System prompts** are instructions provided to the model to establish its role, behavior, or how it should approach a particular task. In the OpenAI API, the system prompt helps define the context and rules for how the assistant should interact throughout the conversation. It sets the groundwork for the assistant's behavior before any specific tasks or questions (from the user prompt) are asked.

### Purpose of System Prompts:
1. **Set Expectations**: The system prompt defines what the model is expected to do during the interaction. For example, it tells the model that it should focus on certain content or follow specific guidelines when generating responses.
   
2. **Guide Behavior**: It helps guide the tone, structure, and style of the model’s responses. For example, the system prompt can ask the model to respond in a specific format (like JSON), maintain a professional tone, or ignore irrelevant content.

3. **Establish Context**: It provides the initial context for the task, so the model knows how to handle user queries or instructions. System prompts help ensure consistency in responses by maintaining the same behavior throughout a conversation.


#### What This System Prompt Does:
1. **Role Definition**: It clearly defines that the model’s job is to analyze a list of links and determine which are most relevant for a specific purpose (e.g., creating a brochure about a company). This frames the model’s behavior.
   
2. **Task-Specific Instructions**: It gives specific examples of the kinds of links that should be selected (e.g., About page, Careers/Jobs page), guiding the model on what to look for in the links.

3. **Response Format**: It specifies that the model should respond in a **JSON format**, ensuring that the output is structured and machine-readable.

#### Why This System Prompt Is Important:
- **Maintains Structure**: It ensures that the model returns output in the correct JSON format. Without this instruction, the model might respond with free text or unstructured data.
  
- **Focuses the Model**: By explicitly mentioning types of relevant links (e.g., About, Careers), it narrows down the model’s decision-making to select only the links relevant to the task.

- **Consistency**: System prompts are necessary to ensure that the model behaves consistently across multiple interactions or tasks, maintaining the same structure and focus throughout the conversation.

### Conclusion:
System prompts provide **high-level guidance** that sets the behavior, tone, and format of a model's responses. In this notebook, the **link_system_prompt** steers the model toward specific types of links and a clear, structured return format, which matters when the output is parsed by code and needs to be consistent across calls.
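
To make the system/user split concrete, here is a minimal sketch of the message list that a chat-style API call receives. The helper name `build_messages` and the prompt strings are assumptions made for illustration; only the `role`/`content` structure comes from the notebook's own API call.

```python
from typing import Dict, List


def build_messages(system_prompt: str, user_prompt: str) -> List[Dict[str, str]]:
    """Return the two-message list: system message first, then the user message."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


# Placeholder prompt text, for illustration only
messages = build_messages(
    "You select links that are relevant for a company brochure. Respond in JSON.",
    "Links:\n/about\n/careers",
)
for message in messages:
    print(message["role"], "->", message["content"][:40])
```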

In [28]:
link_system_prompt = """
You are provided with a list of links found on a webpage.
You can decide which of the links would be most relevant to include in a brochure about the company.
These could be links to an About page, Company page, or Careers/Jobs pages.
"""
link_system_prompt += """
You should respond in JSON format, as shown in the following example:

{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

### User Prompt

### What Are User Prompts and Why Are They Needed?

**User prompts** are the specific instructions or questions provided to the model. Unlike the system prompt, which establishes the model's role and behavior, the **user prompt** provides the actual task or query the model is expected to answer. It tells the model **what** to do, based on the context set by the system prompt.

### Purpose of User Prompts:
1. **Task-Specific Instructions**: The user prompt provides detailed instructions for the task at hand. It often contains the data (such as links or text) and clearly specifies what the user wants the model to do with that data.

2. **Clarity on Expectations**: It helps ensure the model understands the specific task, minimizing ambiguity and improving the accuracy of the response.

3. **Focus the Response**: User prompts allow the model to focus on a particular aspect of a task. They provide context and constraints (e.g., "don't include email links or terms of service").


#### What This User Prompt Does:
1. **Specifies the Task**: It clearly tells the model to evaluate the provided list of links and decide which are relevant for a company brochure.
   - Example: "Here is the list of links on the website of {website.url}" introduces the task to the model, asking it to analyze the list.

2. **Defines the Output Format**: It instructs the model to return its response in **JSON format**, which is critical for structured output that can be easily processed programmatically.
   - Example: "Respond with the full https URL in JSON format."

3. **Excludes Irrelevant Links**: The prompt adds specific constraints to exclude certain types of links, such as "Terms of Service," "Privacy," or email links. This ensures that only relevant links are considered for the brochure.
   - Example: "Do not include Terms of Service, Privacy, email links."

4. **Handles Relative Links**: It mentions that some links may be relative links (which lack the full domain name), guiding the model to handle them properly.
   - Example: "Links (some might be relative links):"

5. **Provides the Data**: The list of links from the website is appended to the prompt. This gives the model the data it needs to analyze and generate a relevant output.
   - Example: `"\n".join(website.links)` adds the actual list of links.

#### Why This User Prompt Is Important:
- **Provides Context**: It gives the model a specific task (choosing relevant links for a brochure) based on the context of the website.
  
- **Clarifies Expectations**: The prompt clearly states what types of links to exclude (e.g., Terms of Service, Privacy, email links) and what format to return the results in (JSON), helping the model deliver the right output.

- **Task-Oriented**: The user prompt ensures the model focuses on the specific task of evaluating web links for relevance to a brochure, rather than just returning all links or providing general information.

### Conclusion:
User prompts are critical because they tell the model exactly what it needs to do with the provided data. In this case, the **get_links_user_prompt** gives the model a clear task (evaluating website links), excludes certain types of irrelevant links, and asks for the output in a structured JSON format. This ensures the model understands the task and produces a relevant and structured response.

In [20]:
def get_links_user_prompt(website):
    # Step 1: Initialize the user_prompt string with an introduction and context about the website
    user_prompt = f"Here is the list of links on the website of {website.url} - "

    # Step 2: Add additional instructions to the user_prompt about what to do with the links
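    # Adjacent string literals avoid the stray indentation spaces that a
    # backslash continuation inside a string would insert into the prompt
    user_prompt += (
        "please decide which of these are relevant web links for a "
        "brochure about the company, respond with the full https URL in JSON format. "
        "Do not include Terms of Service, Privacy, email links.\n"
    )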

    # Step 3: Add more context, noting that some links may be relative
    user_prompt += "Links (some might be relative links):\n"

    # Step 4: Append the actual list of links to the user_prompt, joined by newlines
    user_prompt += "\n".join(website.links)

    # Step 5: Return the complete user_prompt
    return user_prompt



### User Prompt Code Explanation:

### Why Does It Use `+=`?

The `+=` operator in this context is used to **incrementally build the `user_prompt` string** by adding different parts of the message one after the other. This allows you to create the final string in multiple steps, which is helpful for a few reasons:

#### 1. **Readability**:
   - Breaking the string up across several lines and appending it piece by piece makes the code more **readable** and **maintainable**.
   - If all the text were written in one long string, it would be harder to follow or edit because of the length and complexity.

#### 2. **Modularity**:
   - Using `+=` allows you to easily **modify or insert** additional content at different points in the prompt. For instance, if you wanted to add more information after the list of links, it would be easy to do by appending another line.
   - It also makes it easy to understand which part of the prompt is responsible for what (i.e., instructions, list of links, exclusions).

#### 3. **Handling Dynamic Data**:
   - Parts of the string, like the `website.url` or the list of `website.links`, are **dynamic**. The prompt is being built incrementally, allowing for the insertion of these dynamic elements wherever needed.
   - It’s common to use `+=` when you are adding dynamic data because the structure of the string can be built in steps and adjusted more easily.

#### 4. **Easier Debugging**:
   - If there's a problem with the constructed string (e.g., formatting or incorrect instructions), you can easily debug and fix specific parts of the string when it's built incrementally.
   - If all the text is written in one long string, it might be more difficult to find errors.

### Could This Be Written as One Long String?

Yes, it could be written as a single string using **f-string** formatting with everything included at once. For example:

```python
def get_links_user_prompt(website):
    return f"Here is the list of links on the website of {website.url} - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.\nLinks (some might be relative links):\n" + "\n".join(website.links)
```

However, this approach:
- Would make the string **less modular** and harder to edit.
- Would become **unwieldy** if you had to change or add more sections of text.
- Could make it **more difficult to read and debug** if there are issues.

### Conclusion:
The `+=` operator is used to **incrementally** build the `user_prompt` for better readability, modularity, and ease of handling dynamic content. While it’s possible to write the entire prompt as one long string, doing so would reduce the flexibility and clarity of the code, especially when handling dynamic elements like `website.url` and `website.links`.
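
A third option, sketched here as an assumption rather than as the notebook's method, is to collect the pieces in a list and join them once. The function name `get_links_user_prompt_joined` and the stand-in `fake_site` object are hypothetical; the prompt wording is copied from the function above.

```python
from types import SimpleNamespace


def get_links_user_prompt_joined(website) -> str:
    """Build the same kind of prompt from a list of parts instead of repeated +=."""
    parts = [
        f"Here is the list of links on the website of {website.url} - "
        "please decide which of these are relevant web links for a "
        "brochure about the company, respond with the full https URL in JSON format. "
        "Do not include Terms of Service, Privacy, email links.",
        "Links (some might be relative links):",
        *website.links,
    ]
    return "\n".join(parts)


# Stand-in for a Website object; the URL and links are made up for illustration
fake_site = SimpleNamespace(url="https://company.com", links=["/about", "/careers"])
prompt = get_links_user_prompt_joined(fake_site)
print(prompt)
```

This keeps each section of the prompt on its own line in the source while still producing a single string.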

In [32]:
print(get_links_user_prompt(ed))

Here is the list of links on the website of https://www.microsoft.com/en-us - please decide which of these are relevant web links for a     brochure about the company, respond with the full https URL in JSON format.     Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
https://www.microsoft.com
https://www.microsoft.com/microsoft-365
https://www.microsoft.com/en-us/microsoft-teams/group-chat-software
https://copilot.microsoft.com/
https://www.microsoft.com/en-us/windows/
https://www.microsoft.com/en-us/surface
https://www.xbox.com/
https://www.microsoft.com/en-us/store/b/sale?icid=gm_nav_L0_salepage
https://www.microsoft.com/en-us/store/b/business
https://support.microsoft.com/en-us
https://products.office.com/en-us/home
https://www.microsoft.com/en-us/windows/
https://www.microsoft.com/en-us/surface
https://www.xbox.com/
https://www.microsoft.com/en-us/store/b/sale?icid=gm_nav_L0_salepage
https://support.microsoft.com/en-us
https://www.micros
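
The printed prompt above repeats several URLs (the Windows, Surface, and Xbox links each appear twice). A small sketch of order-preserving deduplication, which could be applied to `website.links` before building the prompt, follows. The helper name `dedupe_links` is an assumption, not part of the notebook.

```python
from typing import List


def dedupe_links(links: List[str]) -> List[str]:
    """Drop repeated links while keeping the first occurrence of each, in order."""
    return list(dict.fromkeys(links))


links = [
    "https://www.microsoft.com/en-us/windows/",
    "https://www.xbox.com/",
    "https://www.microsoft.com/en-us/windows/",
    "https://www.xbox.com/",
]
unique_links = dedupe_links(links)
print(unique_links)
```

Fewer repeated links means a shorter prompt and fewer tokens sent to the model.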

## Get Links Function

The `get_links(url)` function fetches a webpage, asks the OpenAI API which of its links are relevant, and returns the result as a Python dictionary parsed from the model's JSON reply. Here's a breakdown of each part:


### Detailed Breakdown of Each Step:

1. **`website = Website(url)`**:
   - This line creates an instance of the `Website` class by passing the `url` as an argument.
   - The `Website` class, defined earlier in this notebook, fetches the page with `requests`, parses it with BeautifulSoup, and stores the title, visible text, and list of links for the model to process.

2. **`openai.chat.completions.create(...)`**:
   - This line calls the **OpenAI API** to generate a response based on the model (`MODEL`) and the provided prompts. It sends a request to the API to analyze the extracted links and return relevant ones in **JSON format**.
   
   - **Parameters**:
     - `model=MODEL`: Specifies which model to use (here `MODEL` is `'gpt-4o-mini'`).
     - `messages`: This is a list of messages that the model will process. It contains:
       - **System Prompt**: `{"role": "system", "content": link_system_prompt}`. This defines how the model should behave (e.g., analyze the links and return relevant ones).
       - **User Prompt**: `{"role": "user", "content": get_links_user_prompt(website)}`. This is a dynamic prompt generated by the function `get_links_user_prompt()` based on the content of the website. It specifies what the model should do with the links (e.g., choose relevant ones for a company brochure).
     - `response_format={"type": "json_object"}`: Turns on JSON mode, so the reply is a single valid JSON object. The API also expects the messages themselves to ask for JSON explicitly, which the system prompt here does.

3. **`result = completion.choices[0].message.content`**:
   - The OpenAI API returns a **completion object** containing a list of choices. There is one choice unless more are requested with the `n` parameter.
   - `completion.choices[0].message.content` extracts the **content** from the first choice in the response (i.e., the result returned by the model).
   - The result is expected to be a string representing a JSON object, as requested by the `response_format` and the system prompt.

4. **`return json.loads(result)`**:
   - The `json.loads()` function is used to convert the **JSON-formatted string** (returned by the model) into a **Python dictionary**.
   - The function returns this dictionary, which contains the relevant links in the desired JSON format.

### Example Workflow:
1. **User Input**: You provide a webpage URL (e.g., `https://example.com`).
2. **Website Class**: The `Website(url)` object extracts the list of links from the given webpage.
3. **Model Interaction**: The model is given two prompts:
   - The **system prompt** (`link_system_prompt`) defines how the model should behave (e.g., analyze links, return JSON).
   - The **user prompt** (`get_links_user_prompt(website)`) provides the specific task (i.e., choose relevant links for a brochure).
4. **Model Output**: The model processes the links and returns relevant ones (e.g., About page, Careers page) in JSON format.
5. **Result Parsing**: The JSON string is parsed into a Python dictionary and returned.

### Purpose:
- This function allows you to pass a **URL** to the model and receive a **filtered list of relevant links** (e.g., for a company brochure) as a JSON object. It streamlines the process of identifying important website links and structures them in a machine-readable format.

### Improvements or Adjustments:
- **Error Handling**: You may want to add error handling to manage cases where the API response is empty, invalid, or the website doesn't have any links.
- **Flexible Output**: You could allow for other output formats (e.g., XML, plain text) by changing the system prompt and dropping `response_format={"type": "json_object"}`, which only applies to JSON; the `json.loads` step would need to change as well.
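
As a sketch of the error handling suggested above, the parsing step could be wrapped so that an empty, malformed, or unexpectedly shaped reply yields an empty link list instead of an exception. The helper name `parse_links` and its fallback behavior are assumptions; the notebook itself calls `json.loads(result)` directly.

```python
import json
from typing import Any, Dict, List


def parse_links(raw: str) -> Dict[str, List[Dict[str, Any]]]:
    """Parse the model's reply, falling back to an empty link list on bad input."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return {"links": []}
    links = data.get("links", []) if isinstance(data, dict) else []
    if not isinstance(links, list):
        return {"links": []}
    # Keep only entries that are dicts with a non-empty string URL
    valid = [
        item for item in links
        if isinstance(item, dict) and isinstance(item.get("url"), str) and item["url"]
    ]
    return {"links": valid}


good = parse_links('{"links": [{"type": "about page", "url": "https://company.com/about"}]}')
bad = parse_links("not json")
print(good)
print(bad)
```

Returning an empty list keeps downstream loops over `links["links"]` working even when the model's reply is unusable.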
  

In [33]:
MODEL = 'gpt-4o-mini'

def get_links(url):
    # Step 1: Create a Website object to extract the links from the webpage
    website = Website(url)

    # Step 2: Call the OpenAI ChatCompletion API with a system prompt and user prompt
    completion = openai.chat.completions.create(
        model=MODEL,  # The OpenAI model to use (set to 'gpt-4o-mini' above)
        messages=[
            {"role": "system", "content": link_system_prompt},  # System prompt: sets the assistant's behavior
            {"role": "user", "content": get_links_user_prompt(website)}  # User prompt: tells the model what task to do with the links
        ],
        response_format={"type": "json_object"}  # Ensures that the response is returned as a JSON object
    )

    # Step 3: Extract the result (the relevant links) from the model's response
    result = completion.choices[0].message.content

    # Step 4: Parse the result (in JSON format) and return it as a Python dictionary
    return json.loads(result)

### Get Links

In [35]:
get_links("https://www.microsoft.com/en-us")

{'links': [{'type': 'about page',
   'url': 'https://www.microsoft.com/en-us/about'},
  {'type': 'careers page', 'url': 'https://careers.microsoft.com/'},
  {'type': 'investor page',
   'url': 'https://www.microsoft.com/en-us/investor/default.aspx'},
  {'type': 'diversity page',
   'url': 'https://www.microsoft.com/en-us/diversity/'},
  {'type': 'accessibility page',
   'url': 'https://www.microsoft.com/en-us/accessibility/'},
  {'type': 'sustainability page',
   'url': 'https://www.microsoft.com/en-us/sustainability/'}]}

## Get Page Details Function

In [40]:
def get_all_details(url):
    # Step 1: Initialize the result string with the landing page label
    result = "Landing page:\n"

    # Step 2: Fetch and add the contents of the landing page to the result
    result += Website(url).get_contents()

    # Step 3: Extract relevant links from the landing page using the get_links function
    links = get_links(url)

    # Step 4: Print the links that were found (for debugging purposes)
    print("Found links:", links)

    # Step 5: For each relevant link, fetch its contents and add them to the result string
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"  # Add the type of the link (e.g., "about page")
        result += Website(link["url"]).get_contents()  # Fetch and add the contents of the linked page

    # Step 6: Return the complete result with the landing page content and all linked pages
    return result

print(get_all_details("https://www.microsoft.com/en-us"))

Found links: {'links': [{'type': 'about page', 'url': 'https://www.microsoft.com/en-us/about'}, {'type': 'careers page', 'url': 'https://careers.microsoft.com/'}, {'type': 'company page', 'url': 'https://www.microsoft.com/en-us/investor/default.aspx'}, {'type': 'diversity page', 'url': 'https://www.microsoft.com/en-us/diversity/'}, {'type': 'accessibility page', 'url': 'https://www.microsoft.com/en-us/accessibility/'}, {'type': 'sustainability page', 'url': 'https://www.microsoft.com/en-us/sustainability/'}]}
Landing page:
Webpage Title:
Your request has been blocked. This could be
                        due to several reasons.
Webpage Contents:
Skip
                                                                to main content
Microsoft
Microsoft 365
Teams
Copilot
Windows
Surface
Xbox
Deals
Small Business
Support
More
All
                                                                                                                                                Microsoft
Office
Wi

This function, `get_all_details(url)`, is designed to extract the contents of a webpage, then gather relevant links from that webpage, and finally retrieve and concatenate the content of the linked pages as well. The function constructs a single string that includes the contents of the landing page and all the linked pages categorized by their type (e.g., "About page", "Careers page").

### What the Function Does:

1. **`result = "Landing page:\n"`**:
   - Initializes a string called `result` with the label "Landing page:". This will hold the entire output, starting with the landing page's content.

2. **`result += Website(url).get_contents()`**:
   - Creates a `Website` object for the provided URL (i.e., the landing page) and calls the `get_contents()` method to extract its content. The content is appended to the `result` string.

3. **`links = get_links(url)`**:
   - Calls the `get_links()` function to select relevant links from the landing page. It returns a dictionary where `links["links"]` is a list of dictionaries, each with a `type` (e.g., "about page", "careers page") and a `url`.

4. **`print("Found links:", links)`**:
   - Prints the found links to the console for debugging purposes, allowing you to see which links were extracted from the landing page.

5. **`for link in links["links"]:`**:
   - Loops through each link in the list of relevant links.

6. **`result += f"\n\n{link['type']}\n"`**:
   - Appends the **type** of each link (e.g., "about page", "careers page") to the result, using a new line to format the output.

7. **`result += Website(link["url"]).get_contents()`**:
   - For each link, creates a new `Website` object using the link’s URL and calls `get_contents()` to retrieve the content of the linked page. This content is also appended to the result.

8. **`return result`**:
   - Returns the final string containing the contents of the landing page followed by the contents of all the linked pages.

### Improvements to the Code:

1. **Error Handling**:
   - Add error handling to gracefully handle cases where links may be broken, contents cannot be fetched, or the webpage does not have any links.

2. **Code Cleanup**:
   - Avoid constructing the `result` string in multiple steps using `+=` (which may not be the most efficient way to handle long strings in Python). Instead, use a list to accumulate strings and join them at the end for better performance with large amounts of text.

3. **Reduce Repeated Calls**:
   - Avoid creating multiple `Website` objects for the same URL by reusing the `Website` object, especially for the landing page.

### Cleaned-Up Version:

```python
def get_all_details(url):
    # Step 1: Create a list to accumulate the result contents
    result_parts = []
    
    # Step 2: Add the landing page content
    landing_page = Website(url)
    result_parts.append("Landing page:\n")
    result_parts.append(landing_page.get_contents())
    
    # Step 3: Get relevant links from the landing page
    links = get_links(url)
    print("Found links:", links)  # Debug: Print the found links

    # Step 4: Loop through each relevant link, fetch its content, and add it to the result
    for link in links["links"]:
        # Add the type of the link (e.g., "about page") and its contents
        result_parts.append(f"\n\n{link['type']}\n")
        linked_page = Website(link["url"])  # Create a Website object for each link
        result_parts.append(linked_page.get_contents())
    
    # Step 5: Join the result_parts list into a single string and return it
    return ''.join(result_parts)
```

### Key Improvements:

1. **Efficient String Handling**:
   - Instead of repeatedly appending to the string with `+=`, we accumulate parts of the result in a list (`result_parts`). At the end, we use `''.join(result_parts)` to join the list into a single string, which is more efficient for larger texts.

2. **Reusing Objects**:
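   - The landing page's `Website` object is held in a named variable (`landing_page`) rather than created inline. Note that `get_links(url)` still builds its own `Website` for the same URL, so the landing page is fetched twice; avoiding that would require `get_links` to accept an existing `Website` object.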

3. **Clarity and Maintainability**:
   - The code is now easier to maintain and expand, with each logical step clearly separated.

### Summary:
This function is designed to gather the content from a landing page and its relevant linked pages, and return the complete content as a single string. The improvements enhance the efficiency of string handling and make the code more maintainable.
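
The error-handling improvement listed earlier is not part of the cleaned-up version. A minimal sketch of it follows, assuming a hypothetical helper `collect_pages` that takes the fetching step as a callable so that one failing page does not stop the whole run. In the notebook, the callable could be something like `lambda url: Website(url).get_contents()`; here a fake fetcher stands in so the example runs without network access.

```python
from typing import Callable, Dict, List


def collect_pages(links: List[Dict[str, str]], fetch: Callable[[str], str]) -> str:
    """Fetch each linked page, skipping any page whose fetch raises an exception."""
    parts = []
    for link in links:
        try:
            contents = fetch(link["url"])
        except Exception as error:  # broad on purpose: skip any page that fails
            print(f"Skipping {link.get('url')}: {error}")
            continue
        parts.append(f"\n\n{link.get('type', 'page')}\n{contents}")
    return "".join(parts)


def fake_fetch(url: str) -> str:
    """Stand-in fetcher for illustration; fails for URLs containing 'broken'."""
    if "broken" in url:
        raise ConnectionError("could not reach host")
    return f"contents of {url}"


text = collect_pages(
    [
        {"type": "about page", "url": "https://company.com/about"},
        {"type": "careers page", "url": "https://company.com/broken"},
    ],
    fake_fetch,
)
print(text)
```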

## System Prompt

### Explanation of the Code:

1. **Defines Assistant's Role**: The assistant is instructed to act as someone who analyzes company website pages and creates a brochure that will appeal to prospective customers, investors, and recruits.
   - Example: "You are an assistant that analyzes the contents of several relevant pages from a company website."
   
2. **Specifies Output Format**: The assistant is asked to respond in **markdown**. This is useful because markdown allows formatting (e.g., headers, bullet points, bold text) and is commonly used in web content, documentation, and brochures.
   - Example: "Respond in markdown."
   
3. **Provides Task Instructions**: The assistant is specifically asked to include information about the company's culture, customers, and career opportunities (if that information is available).
   - Example: "Include details of company culture, customers and careers/jobs if you have the information."


#### Why This Code is Important:
- **System Prompts**: The system prompt defines the assistant’s role and behavior. It shapes how the assistant approaches the task, ensuring that the output matches the desired tone and structure.
- **Flexibility**: By simply changing a few words (like making it humorous or professional), the tone of the assistant’s response can change significantly. This shows the **flexibility** of system prompts and how easily they can be adjusted.
- **Specificity**: It ensures that the assistant includes key details that are likely relevant to a brochure, such as company culture and job opportunities.
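
One way to illustrate this flexibility is to generate the system prompt from a tone parameter instead of commenting variants in and out. This is a sketch only: the function name `make_system_prompt` and its `tone` argument are assumptions, and the wording is adapted from the notebook's two prompt variants.

```python
def make_system_prompt(tone: str = "") -> str:
    """Build the brochure system prompt, optionally inserting tone adjectives."""
    style = f"short, {tone}" if tone else "short"
    return (
        "You are an assistant that analyzes the contents of several relevant pages "
        "from a company website and creates a "
        f"{style} brochure about the company for prospective customers, investors, "
        "and recruits. Respond in markdown. Include details of company culture, "
        "customers, and careers/jobs if available."
    )


professional_prompt = make_system_prompt()
humorous_prompt = make_system_prompt("humorous, entertaining, and jokey")
print(professional_prompt)
print(humorous_prompt)
```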




In [43]:
# Professional tone brochure prompt
system_prompt = """
You are an assistant that analyzes the contents of several relevant pages from a company website
and creates a short brochure about the company for prospective customers, investors, and recruits.
Respond in markdown. Include details of company culture, customers, and careers/jobs if available.
"""

# Uncomment the lines below for a more humorous tone
# system_prompt = """
# You are an assistant that analyzes the contents of several relevant pages from a company website
# and creates a short, humorous, entertaining, and jokey brochure about the company for prospective
# customers, investors, and recruits. Respond in markdown. Include details of company culture,
# customers, and careers/jobs if available.
# """

## Get Brochure Prompt

### What the Code Does:
1. **Generates the User Prompt**:
   - It starts by adding a sentence introducing the company name and instructs the assistant to create a brochure based on the company’s landing page and relevant pages.
   
2. **Fetches and Appends Website Details**:
   - The function `get_all_details(url)` is called to fetch the content of the company's landing page and relevant pages (explained in the previous section). This information is appended to the `user_prompt`.

3. **Truncates the User Prompt**:
   - The prompt is cut to its first 20,000 characters as a rough guard against exceeding the model's context window and to limit cost. The real limit is measured in tokens, not characters, so this is an approximation.

4. **Returns the Prompt**:
   - The final `user_prompt` is returned, which will be passed to the assistant to generate the brochure.
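
The truncation step can be sketched as a small helper. The notebook simply slices with `[:20_000]`; the variant below, with the hypothetical name `truncate_prompt`, additionally backs up to the last line break so the prompt does not end mid-line. That refinement is an assumption added for illustration.

```python
def truncate_prompt(text: str, max_chars: int = 20_000) -> str:
    """Cut text to at most max_chars, preferring to end on a line boundary."""
    if len(text) <= max_chars:
        return text
    clipped = text[:max_chars]
    last_newline = clipped.rfind("\n")
    # Only back up to the newline if that does not discard most of the text
    if last_newline > max_chars // 2:
        clipped = clipped[:last_newline]
    return clipped


sample = "line one\nline two\nline three"
print(repr(truncate_prompt(sample, max_chars=20)))
```

Because the limit is in characters, it stays a coarse safeguard; a token-based limit would need a tokenizer for the chosen model.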




In [44]:
def get_brochure_user_prompt(company_name, url):
    # Step 1: Create a multi-line string for clarity
    user_prompt = f"""
    You are looking at a company called: {company_name}
    Here are the contents of its landing page and other relevant pages.
    Use this information to build a short brochure of the company in markdown.
    """

    # Step 2: Append the details from the landing page and relevant links
    user_prompt += get_all_details(url)

    # Step 3: Truncate the prompt if it exceeds 20,000 characters to avoid API limits
    # The OpenAI API can have limits on prompt length, so we truncate it if it's too long
    user_prompt = user_prompt[:20_000]

    # Step 4: Return the final user prompt
    return user_prompt

In [None]:
# def get_brochure_user_prompt(company_name, url):
#     user_prompt = f"You are looking at a company called: {company_name}\n"
#     user_prompt += f"Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
#     user_prompt += get_all_details(url)
#     user_prompt = user_prompt[:20_000] # Truncate if more than 20,000 characters
#     return user_prompt

## Create Brochure

### Code Explanation:

This function, `create_brochure(company_name, url)`, is designed to generate a company brochure by calling the OpenAI API. It sends both the **system prompt** and the **user prompt** to the model and then processes and displays the result as markdown. Here's a breakdown of what the code does:

### What the Code Does:

1. **Calls the OpenAI API (`openai.chat.completions.create()`)**:
   - This method makes an API call to OpenAI's language model using the specified model (`MODEL`) and a list of messages. The messages include:
     - **System Prompt**: `system_prompt` defines the behavior and role of the model (e.g., to create a professional or humorous brochure).
     - **User Prompt**: `get_brochure_user_prompt(company_name, url)` generates the user prompt based on the company’s name and webpage content.

2. **Processes the Response**:
   - The API returns a **completion object**, which contains one or more generated choices. The function retrieves the first choice from the response with `response.choices[0].message.content`.

3. **Displays the Brochure in Markdown Format**:
   - `Markdown(result)` wraps the generated text, which is already written in markdown, in a display object, and `display()` renders it. This makes the brochure visually appealing in environments that render markdown (e.g., Jupyter notebooks).
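
To make the request and response shapes concrete, here is a minimal offline sketch. It makes no API call: `fake_response` is a hypothetical object built with `SimpleNamespace` that only mimics the `choices[0].message.content` path described above, and `build_messages` is an illustrative helper that is not part of the notebook.

```python
from types import SimpleNamespace


def build_messages(system_prompt: str, user_prompt: str) -> list:
    """Return the two-message list (system, then user) passed to the chat call."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages("You write short company brochures.", "Company: Example Co")

# Hypothetical stand-in for the API's return value (assumption: only the
# attributes used in this notebook are mimicked).
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="# Example Co\nA short brochure."))]
)

# Same extraction path as in create_brochure
result = fake_response.choices[0].message.content
print([m["role"] for m in messages])
print(result)
```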

### Purpose of Each Step:

- **System and User Prompts**:
  - These are critical to guiding the model's behavior. The **system prompt** defines how the model behaves (e.g., professional, concise), and the **user prompt** tells the model what specific task to perform (i.e., creating the brochure based on the company’s website).
  
- **API Response**:
  - After calling the API, the result contains the content of the brochure, which is formatted in markdown for easy reading.

### Why This Code is Important:

1. **Prompt Structure**:
   - The system and user prompts define the behavior and task for the model, ensuring it generates a brochure based on relevant content from the company's website. The system prompt guides the model’s behavior (e.g., formal or humorous), and the user prompt provides specific content for the model to process.

2. **Markdown Formatting**:
   - Using markdown allows the brochure to be presented in a visually appealing way, especially in environments like Jupyter notebooks, where markdown can be rendered directly.

3. **API Integration**:
   - The function wraps a single call to OpenAI's API, so customized content (in this case, a company brochure) can be generated programmatically.



In [45]:
def create_brochure(company_name, url):
    # Step 1: Call the OpenAI API using the specified model and prompts
    response = openai.chat.completions.create(
        model=MODEL,  # Use the model specified (e.g., "gpt-3.5-turbo" or "gpt-4")
        messages=[
            {"role": "system", "content": system_prompt},  # Provide the system prompt
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}  # Provide the user prompt
        ],
    )

    # Step 2: Extract the generated brochure from the API response
    result = response.choices[0].message.content

    # Step 3: Display the result in markdown format for better presentation
    display(Markdown(result))

create_brochure("Microsoft", "https://www.microsoft.com/en-us")


Found links: {'links': [{'type': 'about page', 'url': 'https://www.microsoft.com/en-us/about'}, {'type': 'careers page', 'url': 'https://careers.microsoft.com/'}, {'type': 'company page', 'url': 'https://www.microsoft.com/en-us/investor/default.aspx'}, {'type': 'sustainability page', 'url': 'https://www.microsoft.com/en-us/sustainability/'}, {'type': 'diversity page', 'url': 'https://www.microsoft.com/en-us/diversity/'}, {'type': 'accessibility page', 'url': 'https://www.microsoft.com/en-us/accessibility/'}]}


# Microsoft Company Brochure

## Overview
Microsoft is a global leader in technology, committed to empowering every person and organization on the planet to achieve more. With a diverse portfolio of products and services, including Microsoft 365, Windows, Surface, and Xbox, the company provides innovative solutions that cater to both individual and business needs.

## Company Culture
At Microsoft, the company culture is built upon a foundation of **diversity and inclusion**, **innovation**, and a sense of **community**. The work environment embraces collaboration and encourages employees to bring their authentic selves to work. Microsoft strives to create a workplace where everyone feels they belong and are empowered to contribute their ideas.

### Core Values:
- **Growth Mindset:** Encouraging continuous learning and development.
- **Customer Obsession:** Placing customers at the heart of every initiative.
- **Diversity and Inclusion:** Welcoming unique perspectives to foster creativity and innovation.
- **Community Impact:** Making a positive difference through sustainability efforts and philanthropy.

## Customers
Microsoft serves a wide array of customers, ranging from **individual consumers** to **large enterprises**. Their solutions empower educators through Microsoft Teams for Education, provide essential business tools like Azure and Dynamics 365, and enhance entertainment experiences through Xbox. Microsoft’s commitment to security and user privacy is paramount, ensuring that all customers can trust in their technology landscape.

## Products & Services
- **Microsoft 365:** A suite of productivity applications such as Word, Excel, and Teams.
- **Azure:** A comprehensive cloud computing service for developing and hosting applications.
- **Xbox:** Leading gaming platforms including Xbox Live and Game Pass, tailored for gamers around the world.
- **Surface Devices:** Versatile and high-performance hardware designed for both personal and professional use.

## Careers at Microsoft
Microsoft is an exciting place to build a career. The company offers diverse roles across various fields, including software engineering, marketing, sales, and customer support. Microsoft is committed to helping employees build rewarding careers through ongoing training and development opportunities.

### Career Benefits:
- **Inclusive Work Environment:** A commitment to fostering diversity in the workplace.
- **Professional Growth:** Learning resources and programs for career advancement.
- **Work-Life Balance:** Flexible work arrangements that support personal and professional commitments.
- **Health & Wellness:** Competitive benefits that prioritize employee well-being.

### Join Us!
Explore career opportunities at Microsoft and be part of a team that is shaping the future of technology. 

For more information, visit [Microsoft Careers](https://careers.microsoft.com).

---

Transform your potential into progress with Microsoft – empower your future today!

## Finally - a minor improvement

With a small adjustment, we can make the results stream back from OpenAI,
with the familiar typewriter animation.

In [46]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
        stream=True
    )

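    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        # delta.content can be None on chunks that carry no text, hence the `or ''`
        response += chunk.choices[0].delta.content or ''
        # Strip a wrapping code fence from a copy, so the word "markdown" survives in the brochure text
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)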
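
The accumulation loop above can be tried without an API key. In this sketch, `fake_stream` is a hypothetical generator that imitates the shape of streamed chunks (`chunk.choices[0].delta.content`), including a chunk whose content is `None`; `collect_stream` is an illustrative name, and `print` stands in for `update_display`.

```python
from types import SimpleNamespace


def fake_stream(pieces):
    """Hypothetical stand-in for a streamed response: yields chunk-like objects."""
    for piece in pieces:
        yield SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=piece))])


def collect_stream(stream) -> str:
    text = ""
    for chunk in stream:
        # `or ""` guards against chunks whose delta carries no text (content is None)
        text += chunk.choices[0].delta.content or ""
    return text


full_text = collect_stream(fake_stream(["# Exam", "ple Co\n", "A short ", "brochure.", None]))
print(full_text)
```

Each chunk carries only a fragment, so the notebook re-renders the whole accumulated string on every iteration rather than appending to the display.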

In [47]:
stream_brochure("Anthropic", "https://anthropic.com")

Found links: {'links': [{'type': 'about page', 'url': 'https://anthropic.com/company'}, {'type': 'careers page', 'url': 'https://anthropic.com/careers'}]}


# Anthropic Brochure

## Company Overview
**Anthropic** is a leading AI safety and research company based in San Francisco, dedicated to building reliable, interpretable, and steerable AI systems. Our mission is to ensure that transformative AI helps individuals and society flourish. We conduct interdisciplinary research and development, treating AI safety as a systematic science while promoting collaboration across civil society, government, academia, nonprofits, and industry to foster industry-wide safety.

## Our Product: Claude
Meet **Claude 3.5 Sonnet**, our most advanced AI model developed yet. Businesses can leverage Claude through our API to streamline efficiency and explore new revenue streams. At Anthropic, we prioritize creating AI systems that are not only powerful but also trustworthy and responsible.

## Company Culture
At Anthropic, our culture is rooted in four core values:

1. **Here for the Mission**: Every member of our team is dedicated to ensuring that AI development benefits humanity.
2. **Unusually High Trust**: We foster a supportive environment where honesty, kindness, and open-mindedness are paramount. We believe that trust fosters better decision-making.
3. **One Big Team**: Collaboration is key. Our teams work together towards shared goals while encouraging innovative contributions from all members.
4. **Do the Simple Thing That Works**: We focus on practicality and empiricism, encouraging straightforward solutions over complex theories.

### Interdisciplinary Team
Our diverse team consists of researchers, engineers, policy experts, and operational leaders with backgrounds in physics, machine learning, public policy, and business. Together, we approach AI challenges with a holistic mindset that enhances our effectiveness.

## Customers
Our products aim to benefit a wide range of stakeholders, including businesses, nonprofits, and civil society organizations. We partner with various entities to drive safe and responsible AI developments through our tools and systems.

## Careers at Anthropic
We are actively seeking passionate individuals to join our team dedicated to making AI safer. Anthropic offers a collaborative and supportive work environment where ideas from various disciplines are valued. 

### What We Offer:
- **Health & Wellness**: Comprehensive health, dental, and vision insurance, inclusive fertility benefits, and a generous parental leave policy.
- **Compensation & Support**: Competitive salaries with equity options, a 401(k) plan with matching, and flexible wellness stipends.
- **Additional Perks**: Unlimited PTO, daily lunches, relocation support, and home office improvement stipends.

### Application Process
We aim for a fair and thorough hiring experience, which includes:
1. Resume submission
2. Exploratory chat
3. Skills assessment
4. Team screen
5. Interview panel
6. Final checks
7. Offer

We welcome applicants from diverse backgrounds; previous experience in machine learning is not mandatory.

## Join Us
Help shape the future of AI by bringing your unique skills to Anthropic. Whether you are interested in research, engineering, policy, or operations, we’d love to hear from you. Together, we can create powerful AI that serves humanity. 

### Connect with Us
For more details on open positions and to learn more about our work, visit our [Careers page](#) or follow us on [LinkedIn](#) and [Twitter](#).

---

**Anthropic**: Building safer AI for a brighter future.

In [48]:
stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'company profile page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}



# Welcome to Hugging Face

**Hugging Face** is at the forefront of the AI community, dedicated to building the future of machine learning innovation by democratizing and facilitating access to state-of-the-art AI tools and resources.

## Company Mission
Our mission is to democratize good machine learning, one commit at a time. We aim to create an open space where developers, researchers, and organizations can collaborate on models, datasets, and applications within the machine learning ecosystem.

## Community Engagement
Currently, **over 50,000 organizations** leverage Hugging Face technologies, including notable names like AWS, Google, Microsoft, and Meta. We provide a robust platform for sharing and improving machine learning models, making it easier for the community to engage in meaningful collaborations.

### Top Features:
- **Wide Array of Models**: Access to **400,000+ models** for various tasks including NLP, CV, and audio tasks.
- **Rich Datasets**: Browse through **100,000+ datasets** that aid in developing effective machine learning solutions.
- **Spaces for Applications**: Create and deploy applications seamlessly using our **Spaces** to demonstrate the power of your models.

## Company Culture
At Hugging Face, we cultivate a culture of inclusivity and collaboration. We believe in pushing boundaries together and inspire our team to take risks and innovate through open-source contributions. With around **218 team members**, we celebrate diversity and aim to foster an environment that encourages creativity, collaboration, and innovation.

### Work Environment
We advocate for a flexible, remote-first working environment that allows team members to balance their responsibilities while contributing to impactful AI solutions. 

## Careers
We are continuously searching for talented individuals who are passionate about advancing machine learning. If you’re interested in joining a vibrant community and making a significant impact in the AI field, explore our **[current openings](#)**!

## Join Us on Our Journey
Whether you are a researcher, a developer, or an AI enthusiast, come be part of this vibrant community. Together, we can shape the future with cutting-edge technology and collaborative spirit.

**Get in Touch**: Visit our [website](https://huggingface.co) or connect with us on [LinkedIn](#) and [GitHub](#).

---

*Hugging Face – The AI Community Building the Future.*



## Business Applications

In this exercise, we extended the Day 1 code to make multiple LLM calls and generate a document.

In terms of techniques, this is perhaps the first example of Agentic AI design patterns, as we combined multiple calls to LLMs. This will feature more in Week 2, and then we will return to Agentic AI in a big way in Week 8 when we build a fully autonomous Agent solution.
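
The idea of combining calls can be sketched with a stubbed model. This is an illustration under stated assumptions, not the notebook's implementation: `fake_llm` is a hypothetical stand-in for a chat completion call, and `pick_links` and `write_brochure` are illustrative names. The first call returns JSON naming relevant links; the second call receives that result as part of its prompt.

```python
import json


def fake_llm(system: str, user: str) -> str:
    """Hypothetical stand-in for a model call; returns canned text based on the system prompt."""
    if "links" in system:
        return json.dumps({"links": [{"type": "about page", "url": "https://example.com/about"}]})
    return "# Brochure\nBuilt from: " + user


def pick_links(url: str, llm=fake_llm) -> list:
    # Call 1: ask the model which links are relevant, then parse its JSON reply
    reply = llm("Select relevant links and answer in JSON.", f"Links found on {url}")
    return [link["url"] for link in json.loads(reply)["links"]]


def write_brochure(company: str, url: str, llm=fake_llm) -> str:
    # Call 2: feed the output of call 1 into the brochure prompt
    pages = ", ".join([url] + pick_links(url, llm))
    return llm("Write a short brochure in markdown.", f"{company} ({pages})")


print(write_brochure("Example Co", "https://example.com"))
```

Passing the model in as a callable keeps the chaining logic testable without network access; a real client call could be swapped in for `fake_llm`.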

In terms of business applications, generating content in this way is one of the most common use cases. As with summarization, this can be applied to any business vertical. Write marketing content, generate a product tutorial from a spec, create personalized email content, and so much more. Explore how you can apply content generation to your business, and try making yourself a proof-of-concept prototype.