<a href="https://colab.research.google.com/github/micah-shull/LLMs/blob/main/LLM_003_role_system_user_prompts_summarize_webpage.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# !pip install python-dotenv
# !pip install openai

### Install Libraries

### 1. **`python-dotenv`**
   - **Purpose**: The `python-dotenv` library is used for managing **environment variables** in Python applications. It allows you to store configuration values (like API keys, database URLs, or secret tokens) in a `.env` file and load them into your Python script as environment variables. This helps keep sensitive information out of your source code and allows for easy configuration.
   - **Common Uses**:
     - Loading environment variables from a `.env` file into the application’s environment.
     - Storing sensitive data (API keys, secrets, etc.) securely without hardcoding them into the codebase.

### 2. **`openai`**
   - **Purpose**: The `openai` library is the official Python client for **OpenAI’s API**. It allows you to interact with OpenAI models like **GPT-3**, **GPT-4**, **Codex**, and **DALL·E**. This library is used for performing tasks such as natural language generation, question answering, text summarization, code generation, and image creation.
   - **Common Uses**:
     - Interacting with OpenAI’s language models for text completion, code generation, summarization, translation, and more.
     - Generating images using models like **DALL·E**.
     - Building AI-powered applications like chatbots or content generation tools.





In [None]:
import os
from dotenv import load_dotenv
import openai

print(openai.__version__)

1.52.0


Storing your **OpenAI API key** in an `.env` file keeps it out of the notebook's code cells and makes it easy to reload in later sessions. The steps below create and use an `.env` file in a Colab notebook. The file is plain text, so treat it like a password and do not share or commit it.

### Step-by-Step Guide to Creating and Using an `.env` File

#### Step 1: **Create an `.env` File**

1. First, you'll create a file named `.env` that contains your API key.
2. In this file, you'll define the environment variable (in this case, `OPENAI_API_KEY`).

Here’s how you can create the `.env` file and write your API key to it programmatically in a Colab notebook:

This code:
- Creates a new env file at the location `/content/OPENAI_API_KEY.env`.
- Writes the OpenAI API key to the file in the format `OPENAI_API_KEY=your_api_key`.



In [None]:
import os

# Path to the .env file
env_file_path = '/content/OPENAI_API_KEY.env'

# Your OpenAI API key
api_key = 'key###'

# Create the .env file and write the API key
with open(env_file_path, 'w') as f:
    f.write(f"OPENAI_API_KEY={api_key}\n")

print(f"OPENAI_API_KEY.env file created at: {env_file_path}")

OPENAI_API_KEY.env file created at: /content/OPENAI_API_KEY.env


In [None]:
# A file named exactly ".env" would be hidden in the Colab file browser;
# OPENAI_API_KEY.env has no leading dot, so it shows up as a regular file.
# List all files (including hidden ones) in the /content/ folder
!ls -la /content/

total 20
drwxr-xr-x 1 root root 4096 Oct 19 13:29 .
drwxr-xr-x 1 root root 4096 Oct 19 13:26 ..
drwxr-xr-x 4 root root 4096 Oct 17 13:21 .config
-rw-r--r-- 1 root root  180 Oct 19 13:29 OPENAI_API_KEY.env
drwxr-xr-x 1 root root 4096 Oct 17 13:21 sample_data


#### Step 2: **Load the `.env` File**

Now that the `.env` file is created, you need to load the environment variables from it using the `dotenv` library.

- **`load_dotenv('/content/OPENAI_API_KEY.env')`**: This loads the environment variables from the env file into your environment.
- **`os.getenv("OPENAI_API_KEY")`**: Retrieves the API key from the environment.



In [None]:
from dotenv import load_dotenv
import os

# Load the environment variables from the .env file
load_dotenv('/content/OPENAI_API_KEY.env')

# Get the API key from the environment
api_key = os.getenv("OPENAI_API_KEY")

# Confirm the key loaded without echoing most of it into the notebook output
print(f"API Key loaded from OPENAI_API_KEY.env: {api_key[:7]}...")

# Set the API key for the OpenAI library
openai.api_key = api_key

API Key loaded from OPENAI_API_KEY.env: sk-proj...


#### Step 3: **Use the API Key with OpenAI**

Once the key is loaded and assigned to `openai.api_key`, as in the cell above, later calls such as `openai.chat.completions.create(...)` pick it up automatically.

### Common Pitfalls:
- **Make sure the API key is correct**: An `AuthenticationError` usually means the key has a typo, was revoked, or is outdated. Double-check that you're using the correct key from the [OpenAI dashboard](https://platform.openai.com/account/api-keys).
- **Ensure the `.env` file is correctly loaded**: If the file isn't being found, check that the path passed to `load_dotenv` (`/content/OPENAI_API_KEY.env` in this case) is correct.
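Both pitfalls above can be caught before the first API call. The following is a minimal sketch using only the standard library; `require_api_key` and `mask` are hypothetical helper names, not part of `python-dotenv` or `openai`, and the key value is a made-up placeholder.

```python
import os

def require_api_key(var_name="OPENAI_API_KEY"):
    """Return the key stored in the given environment variable, or raise."""
    key = os.getenv(var_name)
    if not key or not key.strip():
        # Typical cause: load_dotenv() was pointed at the wrong path
        raise RuntimeError(f"{var_name} is not set; load the .env file first")
    return key.strip()

def mask(key, visible=4):
    """Hide all but the last few characters so output never shows the full key."""
    return "*" * max(len(key) - visible, 0) + key[-visible:]

# Placeholder value for illustration only; a real key comes from the .env file
os.environ["OPENAI_API_KEY"] = "sk-test-1234"
checked_key = require_api_key()
print(f"Key loaded: {mask(checked_key)}")
```

Failing early with a clear message is usually easier to debug than an `AuthenticationError` raised deep inside a later request.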

### Recap:
1. **Create an `.env` file** with your API key in this format: `OPENAI_API_KEY=your_api_key`.
2. **Load the `.env` file** using `dotenv`.
3. **Retrieve the API key** using `os.getenv("OPENAI_API_KEY")` and use it with the OpenAI library.
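To make the recap concrete, here is a rough sketch of what loading an env file amounts to, written with the standard library only. It is an illustration of the idea, not how `python-dotenv` is implemented; `read_env_file`, the `demo.env` file name, and `DEMO_API_KEY` are all hypothetical.

```python
import os
import tempfile

def read_env_file(path):
    """Parse simple NAME=value lines, skipping blanks and # comments."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            values[name.strip()] = value.strip().strip("'\"")
    return values

# Write a throwaway env file, then read it back
with tempfile.TemporaryDirectory() as folder:
    env_path = os.path.join(folder, "demo.env")
    with open(env_path, "w") as f:
        f.write("# demo file\nDEMO_API_KEY=not-a-real-key\n")
    loaded = read_env_file(env_path)

# Copy the parsed values into the process environment
os.environ.update(loaded)
print(os.getenv("DEMO_API_KEY"))
```

In the notebook itself, `load_dotenv` does this work for you and handles more syntax than this sketch does.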



#### Download .env file

In [None]:
from google.colab import files

# Download the .env file
files.download('/content/OPENAI_API_KEY.env')


### Import Libraries

### 1. **`os`** Library in Python

The **`os`** library in Python provides a way to interact with the operating system, allowing you to perform various system-related tasks. It provides functionalities for managing files, directories, processes, and environment variables. This library is especially useful for file handling, directory manipulation, and accessing system-level information in a platform-independent manner.

### Key Features:
1. **File and Directory Operations**:
   - **`os.path`**: A submodule for common path manipulations.
   - **`os.makedirs()`**: Create directories recursively.
   - **`os.remove()`**: Delete a file.
   - **`os.listdir()`**: List all files and directories in a directory.

2. **Environment Variable Management**:
   - **`os.getenv()`**: Get the value of an environment variable.
   - **`os.environ`**: Access and modify the environment variables.

3. **Process Management**:
   - **`os.system()`**: Execute a system command.
   - **`os.getpid()`**: Get the process ID of the current process.
   - **`os._exit()`**: Terminate the current process immediately, without cleanup (`sys.exit()` is the usual way to end a script).
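A few of the calls listed above in action, as a small sketch. The variable names `DEMO_SETTING` and `DEMO_MISSING` are invented for the example.

```python
import os

# Environment variables: set one, then read it back
os.environ["DEMO_SETTING"] = "enabled"
value = os.getenv("DEMO_SETTING")

# os.getenv returns the supplied default when the variable is not defined
os.environ.pop("DEMO_MISSING", None)
missing = os.getenv("DEMO_MISSING", "fallback")

# Files and processes
entries = os.listdir(".")   # names of files and folders in the current directory
pid = os.getpid()           # ID of the running Python process

print(value, missing, len(entries), pid)
```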

### 2. **`python-dotenv`**
   - **Purpose**: The `python-dotenv` library is used for managing **environment variables** in Python applications. It allows you to store configuration values (like API keys, database URLs, or secret tokens) in a `.env` file and load them into your Python script as environment variables. This helps keep sensitive information out of your source code and allows for easy configuration.
   - **Common Uses**:
     - Loading environment variables from a `.env` file into the application’s environment.
     - Storing sensitive data (API keys, secrets, etc.) securely without hardcoding them into the codebase.

### 3. **`requests`**
   - **Purpose**: The `requests` library is one of the most popular Python libraries for making HTTP requests. It allows you to send HTTP requests to interact with web services (APIs) and retrieve data from web pages.
   - **Common Uses**:
     - Sending GET, POST, PUT, DELETE requests to APIs.
     - Handling responses from web servers.
     - Downloading files or scraping web content.

   **Example**:
   ```python
   import requests
   response = requests.get("https://api.example.com/data")
   print(response.json())
   ```

### 4. **`dotenv`**
   - **Purpose**: `dotenv` is the import name of the `python-dotenv` package described above: you install `python-dotenv` but write `from dotenv import load_dotenv`. It reads environment variables from a `.env` file, which keeps sensitive configuration data, such as API keys and secrets, out of your code.
   - **Common Uses**:
     - Loading environment variables from a `.env` file.
     - Keeping sensitive information like API keys secure.

   **Example**:
   ```python
   from dotenv import load_dotenv
   import os

   load_dotenv()  # Load environment variables from .env file
   api_key = os.getenv("API_KEY")
   ```

### 5. **`BeautifulSoup` (from `bs4`)**
   - **Purpose**: BeautifulSoup is a library used for **web scraping** and parsing HTML or XML documents. It provides Pythonic ways to search, navigate, and modify the parse tree of a webpage.
   - **Common Uses**:
     - Scraping data from websites.
     - Parsing HTML or XML files to extract information.
     - Removing unwanted tags from a page before extracting its text.

   **Example**:
   ```python
   from bs4 import BeautifulSoup
   import requests

   response = requests.get("https://example.com")
   soup = BeautifulSoup(response.content, "html.parser")
   print(soup.title.text)  # Prints the title of the webpage
   ```

### 6. **`IPython.display.Markdown`**
   - **Purpose**: This is a utility from the `IPython.display` module that allows you to display rich content, like formatted Markdown, in Jupyter or Colab notebooks.
   - **Common Uses**:
     - Displaying Markdown (formatted text) directly in a notebook.
     - Rendering other rich media through sibling classes in `IPython.display`, such as `HTML` and `Image`.

   **Example**:
   ```python
   from IPython.display import Markdown, display
   display(Markdown("# Hello, World!"))
   ```




In [None]:
import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

## Website Class

#### Brief Description of the `Website` Class:

This **`Website` class** is designed to fetch and process the content of a webpage. It performs the following key tasks:

1. **Initialization (`__init__` method)**:
   - Takes a URL as input.
   - Uses the `requests` library to make an HTTP request to the URL. If the request fails (e.g., due to network issues or an invalid URL), it handles the error gracefully and sets the webpage content to "No content".
   
2. **HTML Parsing with BeautifulSoup**:
   - After successfully retrieving the webpage, it parses the HTML content using the `BeautifulSoup` library.
   - Extracts the **title** of the webpage and stores it. If no title is found, it sets the title to "No title found".
   
3. **Cleaning Irrelevant Elements**:
   - It removes irrelevant HTML tags such as `<script>`, `<style>`, `<img>`, and `<input>` to avoid including unnecessary content like JavaScript or CSS in the text.
   
4. **Extracting the Webpage Text**:
   - The class then extracts all text from the webpage’s `<body>` tag and stores it in `self.text`.
   - If there is no `<body>` tag, it sets the content to "No content in body".
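The four steps above can be tried on an inline HTML string, without a network request. This sketch deliberately swaps BeautifulSoup for the standard library's `html.parser` so it runs with no extra packages; `TextExtractor` and the sample page are hypothetical, and the notebook's own `Website` class uses `requests` and BeautifulSoup instead.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the page title and visible body text, skipping script/style."""
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = None
        self.chunks = []
        self._skip_depth = 0
        self._in_title = False
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return  # ignore whitespace and anything inside script/style
        if self._in_title:
            self.title = text
        elif self._in_body:
            self.chunks.append(text)

sample_html = """<html><head><title>Demo page</title><style>p {color: red}</style></head>
<body><h1>Hello</h1><script>var x = 1;</script><p>Visible text</p><img src="a.png"></body></html>"""

parser = TextExtractor()
parser.feed(sample_html)
title = parser.title or "No title found"
text = "\n".join(parser.chunks) if parser.chunks else "No content in body"
print(title)
print(text)
```

Tags such as `<img>` and `<input>` carry no text of their own, so this approach simply never collects anything from them.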




---

### Limitations of the `Website` Class:
1. **Static Text Extraction**:
   - This approach only extracts plain text from the webpage. It does not understand or summarize the content. All text, whether relevant or not, is included (e.g., legal notices, footers, or advertisements might be part of the extracted text).
   
2. **No Content Understanding**:
   - The class does not distinguish between important and unimportant content, nor does it provide any insights, analysis, or summaries of the text. It only performs basic text extraction and cleaning.

---

### Comparison to a ChatGPT Summarizer:

When using **ChatGPT** to summarize a webpage’s content, it provides several key improvements over this basic text extraction:

1. **Understanding of Context**:
   - ChatGPT understands the context of the text and can identify the main ideas, key points, and important sections of the webpage.
   - Instead of extracting raw text, it can generate concise summaries that focus on the essential content.

2. **Intelligent Summarization**:
   - Unlike the `Website` class that extracts large chunks of text, ChatGPT will **reduce the length** of the content while preserving the key information, making it more digestible.

3. **Content Prioritization**:
   - ChatGPT will prioritize important content (like the main topic or key sections of an article) over irrelevant sections (like ads, headers, or footers) because of its language understanding.

4. **Customization**:
   - With ChatGPT, you can provide specific instructions (e.g., "Summarize the benefits of using `python-dotenv`") and it will tailor the output based on your needs. The `Website` class just extracts whatever text is available.





In [None]:
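# A class to represent a webpage

class Website:
    url: str
    title: str
    text: str

    def __init__(self, url):
        self.url = url
        self.title = "No title found"
        self.text = "No content"
        try:
            # Give up on network errors, timeouts, and HTTP error statuses
            response = requests.get(url, timeout=10)
            response.raise_for_status()
        except requests.exceptions.RequestException:
            return
        soup = BeautifulSoup(response.content, 'html.parser')
        if soup.title and soup.title.string:
            self.title = soup.title.string
        if soup.body is None:
            self.text = "No content in body"
            return
        # Drop tags that contribute no readable text
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        self.text = soup.body.get_text(separator="\n", strip=True)

cnn = Website("https://www.cnn.com/entertainment")
print(cnn.title, '\n')
print(cnn.text)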

Entertainment | CNN 

CNN values your feedback
1. How relevant is this ad to you?
2. Did you encounter any technical issues?
Video player was slow to load content
Video content never loaded
Ad froze or did not finish loading
Video content did not start after ad
Audio on ad was too loud
Other issues
Ad never loaded
Ad prevented/slowed the page from loading
Content moved around while ad loaded
Ad was repetitive to ads I've seen previously
Other issues
Cancel
Submit
Thank You!
Your effort and contribution in providing this feedback is much
                                        appreciated.
Close
Ad Feedback
Close icon
Entertainment
Movies
Television
Celebrity
More
Movies
Television
Celebrity
Watch
Listen
Live TV
Subscribe
Sign in
My Account
Settings
Topics You Follow
Sign Out
Your CNN account
Sign in to your CNN account
Sign in
My Account
Settings
Topics You Follow
Sign Out
Your CNN account
Sign in to your CNN account
Live TV
Listen
Watch
Edition
US
International
Arabic
Español
Edition

## Webpage ChatGPT Summarization

Let’s proceed to integrate the **ChatGPT summarizer**. This will allow us to take the extracted text from the `Website` class (or directly from the webpage) and provide a more meaningful summary.

### 1. **System Prompt**:

```python
system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."
```

#### Explanation:
- **Purpose**: The **system prompt** sets up the role of the AI (ChatGPT) and defines how it should behave throughout the interaction.
- **Function**:
  - **Role Definition**: You tell the assistant that its job is to analyze the contents of a webpage and provide a summary.
  - **Instructions**: It specifies that the assistant should ignore navigation-related text (e.g., menus, footers, sidebars), which is often irrelevant to the main content.
  - **Response Format**: You instruct the assistant to respond in **Markdown** format. Markdown is a lightweight markup language that allows for easy formatting (e.g., headers, bold text, lists) when rendering the output.

#### Why It’s Needed:
- **Guides the AI’s Behavior**: The system prompt helps set up the assistant’s context, ensuring it knows that it needs to summarize webpage content, ignore irrelevant parts, and return results in a specific format.
- **Consistency**: It ensures consistent, structured responses, especially if you want output in a particular format like Markdown.

---

### 2. **User Prompt**:

```python
def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}. "
    user_prompt += "The contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt
```

#### Explanation:
- **Purpose**: The **user prompt** provides specific information about the website that the assistant will summarize.
- **Function**:
  - **Website Title**: The prompt starts by stating the title of the webpage, helping the assistant understand what the webpage is about.
  - **Instructions**: You ask the assistant to provide a short summary in Markdown format. Additionally, you ask it to summarize any news or announcements, making the prompt more tailored to the type of content on the website.
  - **Webpage Text**: Finally, the full text extracted from the webpage (stored in `website.text`) is appended to the prompt. This is the data that the assistant will analyze to create the summary.

#### Why It’s Needed:
- **Guides the Assistant**: The **user prompt** directs the assistant by giving it the actual content it needs to summarize, as well as instructions on how to summarize it.
- **Custom Instructions**: It provides specific guidance, such as ensuring that news or announcements are summarized, which makes the summary more useful for specific types of content.

---

### How These Prompts Fit Into the ChatGPT API Workflow:

1. **System Prompt**: Sets the overall context for the conversation and helps the assistant understand its role (in this case, a summarizer).
2. **User Prompt**: Supplies the specific content and instructions for what the user wants the assistant to do (in this case, summarizing a website).

### Example Workflow:
- **System Prompt**:
   - "You are an assistant that analyzes the contents of a website and provides a short summary, ignoring text that might be navigation related. Respond in markdown."
   
- **User Prompt** (for a specific website):
   - "You are looking at a website titled 'Python Dotenv'. The contents of this website are as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too."

- **Combined Result**: ChatGPT uses the system prompt to know how to behave, and the user prompt provides the specific webpage content to analyze and summarize.
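The workflow above can be inspected without calling the API by building the message list for a stand-in page. This is a sketch: `build_messages` is a hypothetical name, and the `SimpleNamespace` object with its sample title and text only mimics the `title` and `text` attributes of the `Website` class.

```python
from types import SimpleNamespace

system_prompt = (
    "You are an assistant that analyzes the contents of a website "
    "and provides a short summary, ignoring text that might be navigation related. "
    "Respond in markdown."
)

def build_messages(page):
    """Combine the system prompt with a user prompt built from the page."""
    user_prompt = (
        f"You are looking at a website titled {page.title}. "
        "The contents of this website are as follows; "
        "please provide a short summary of this website in markdown. "
        "If it includes news or announcements, then summarize these too.\n\n"
        f"{page.text}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

# Stand-in for a fetched page (made-up content)
page = SimpleNamespace(
    title="Python Dotenv",
    text="python-dotenv reads key-value pairs from a .env file.",
)
messages = build_messages(page)
for message in messages:
    print(message["role"], "->", message["content"][:60])
```

Printing the messages before sending them is a cheap way to spot formatting slips such as missing spaces between sentences.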

### Why Prompts Are Important:
- **Contextual Understanding**: Proper prompts help ChatGPT understand the **context**, which in turn improves the quality of its responses.
- **Customization**: By customizing the user prompt, you can fine-tune the AI’s behavior for specific tasks, such as summarizing websites, answering questions, or providing technical explanations.
  
### Next Steps:
Now that you have a better understanding of the **system prompt** and **user prompt**, we can move on to integrating this into the **ChatGPT summarizer function** to provide a more robust summary of the webpage content.



### Prompts

In [None]:
system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

def user_prompt_for(website):
    # End the title sentence with ". " so it does not run into the next one
    user_prompt = f"You are looking at a website titled {website.title}. "
    # Adjacent string literals avoid embedding indentation in the prompt
    user_prompt += (
        "The contents of this website are as follows; "
        "please provide a short summary of this website in markdown. "
        "If it includes news or announcements, then summarize these too.\n\n"
    )
    user_prompt += website.text
    return user_prompt

### Messages

In [None]:
def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

### Summarize

In [None]:
def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model = "gpt-4o-mini",
        messages = messages_for(website)
    )
    print(response.choices[0].message.content)

url = "https://www.cnn.com/entertainment"
# url = "https://pypi.org/project/python-dotenv/"
summarize(url)

# Summary of CNN Entertainment

The **CNN Entertainment** section covers the latest news and features related to movies, television, and celebrity culture. Recent highlights include:

- **Liam Payne's Death**: Multiple reports and tributes, including one from One Direction, mourn the passing of the singer while investigations into the circumstances surrounding his death continue.
- **Mitzi Gaynor**: The star of "South Pacific" has passed away at the age of 93.
- **Whoopi Goldberg**: Discusses finding peace after the loss of someone close to her.
- **Upcoming Films**: Insights into upcoming projects, such as the script draft for "Spider-Man 4" read by Tom Holland and Zendaya, and James Gunn's announcement regarding Krypto in an upcoming Superman film.
- **Cultural Commentary**: Commentary on the portrayal of female rabbis and humor in contemporary comedy.

Additionally, there's a focus on Halloween-themed movie and TV recommendations as well as notable celebrity appearances and events, 

### Markdown Summary

In [None]:
def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model = "gpt-4o-mini",
        messages = messages_for(website)
    )
    # print(response.choices[0].message.content)
    return response.choices[0].message.content # return instead of print

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

display_summary(url)

# CNN Entertainment Summary

CNN Entertainment provides news and updates about various facets of the entertainment industry, including movies, television, and celebrity news. Recent highlights feature tributes and announcements regarding notable figures in the entertainment world:

- **Liam Payne**: Following unfounded reports about his death, One Direction released a group statement mourning him. Investigations into the circumstances surrounding Payne's alleged fatal fall continue.
- **Cheryl Cole**: Paid tribute to Liam Payne, condemning sensationalist reports about his condition.
- **Mitzi Gaynor**: Known for her role in *South Pacific*, she recently passed away at the age of 93.
- **Whoopi Goldberg**: Reflects on finding peace amid personal loss.
- **Tom Holland and Zendaya**: They shared their thoughts after reading a draft script for *Spider-Man 4*.
- **Cynthia Erivo**: Expresses her disappointment over changes made to the *Wicked* movie poster.
- **James Gunn**: Announced the inclusion of Krypto, Superman’s dog, in an upcoming film project.

Additionally, the website covers a range of topics related to entertainment consumption recommendations, including a spooky movie guide for the Halloween season and highlights of recent celebrity interactions.

## Building Prompts

In the **OpenAI Chat Completions API** (`openai.chat.completions` in version 1.x of the Python client), the primary inputs are **system prompts**, **user prompts**, and **assistant responses**. These inputs form a structured conversation in which each role (system, user, assistant) contributes messages.

However, you can also customize various aspects of the interaction through additional parameters. Let’s break it down:

### Primary Inputs (Core Messages):

1. **System Prompt**:
   - **Role**: `"system"`
   - **Purpose**: Provides the context or defines the role the assistant should play during the interaction. It helps set the stage for the assistant's behavior and tone.
   - **Example**:
     ```json
     {"role": "system", "content": "You are an assistant that summarizes websites."}
     ```

2. **User Prompt**:
   - **Role**: `"user"`
   - **Purpose**: This is the main input from the user, providing specific instructions or questions that the assistant will respond to. It's the query or task for the assistant to address.
   - **Example**:
     ```json
     {"role": "user", "content": "Please summarize the following webpage content: ..."}
     ```

3. **Assistant Response** (Optional):
   - **Role**: `"assistant"`
   - **Purpose**: This is the response generated by the assistant during multi-turn conversations. It's used to simulate the assistant's prior responses when continuing an ongoing conversation.
   - **Example**:
     ```json
     {"role": "assistant", "content": "Here is a summary of the website: ..."}
     ```
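To illustrate how the three roles combine in a multi-turn conversation, the sketch below grows a message list one turn at a time. `add_turn` is a hypothetical helper, and the assistant text is a placeholder rather than real model output.

```python
VALID_ROLES = {"system", "user", "assistant"}

def add_turn(history, role, content):
    """Return a new message list with one more turn appended."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return history + [{"role": role, "content": content}]

history = [{"role": "system", "content": "You are an assistant that summarizes websites."}]
history = add_turn(history, "user", "Please summarize the following webpage content: ...")
# The earlier reply is passed back so the model sees what it already said
history = add_turn(history, "assistant", "Here is a summary of the website: ...")
history = add_turn(history, "user", "Shorten that summary to one sentence.")

print([message["role"] for message in history])
```

The API is stateless between calls, so the full list is sent again on every request.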

### Example Structure of a Chat Completion Request:
You typically send a list of messages representing the interaction:

```python
messages = [
    {"role": "system", "content": "You are an assistant that summarizes websites."},
    {"role": "user", "content": "Summarize the following webpage content: ..."}
]
```

The assistant will then process these messages and respond based on the provided context and instructions.

---

### Additional Parameters (Beyond System and User Prompts):

The **Chat Completions API** (`openai.chat.completions.create` in version 1.x of the Python client) accepts several additional parameters that help control the behavior of the model:

1. **Model**:
   - **Purpose**: Specifies which OpenAI model to use. For example, you might use `"gpt-3.5-turbo"` or `"gpt-4"`.
   - **Example**:
     ```python
     model="gpt-3.5-turbo"
     ```

2. **Max Tokens**:
   - **Purpose**: Limits the number of tokens (words or subwords) in the model's output. It caps the length of the response; when the cap is hit, the text is cut off rather than condensed.
   - **Example**:
     ```python
     max_tokens=150
     ```

3. **Temperature**:
   - **Purpose**: Controls the **creativity** of the model's responses. A low temperature (e.g., `0.1`) results in more deterministic responses, while a higher temperature (e.g., `0.9`) encourages more randomness and creativity.
   - **Example**:
     ```python
     temperature=0.7
     ```

4. **Top_p**:
   - **Purpose**: This parameter controls "nucleus sampling." Instead of considering all possible words (tokens) during generation, the model chooses from a subset that represents the top `p` probability mass (e.g., `top_p=0.9` means considering only the most probable tokens until they reach 90% of the probability distribution).
   - **Example**:
     ```python
     top_p=0.9
     ```

5. **Frequency Penalty**:
   - **Purpose**: Penalizes tokens in proportion to how often they have already appeared, which discourages the model from repeating the same lines of text. A higher value reduces repetition.
   - **Example**:
     ```python
     frequency_penalty=0.5
     ```

6. **Presence Penalty**:
   - **Purpose**: Penalizes any token that has already appeared at least once, regardless of how often, which encourages the model to move on to new topics and can produce more diverse outputs.
   - **Example**:
     ```python
     presence_penalty=0.5
     ```

7. **Stop Sequences**:
   - **Purpose**: Specifies a sequence of characters where the model should stop generating text. This is useful for controlling when the response should end (e.g., stopping at a certain delimiter or end of a section).
   - **Example**:
     ```python
     stop=["\n", "End of summary"]
     ```
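The effect of `temperature` and `top_p` is easier to see on a toy distribution. The sketch below applies the textbook definitions of temperature-scaled softmax and nucleus filtering to made-up scores; it illustrates the idea only and makes no claim about how OpenAI implements sampling. The function names and numbers are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus(probs, top_p):
    """Indices of the most probable tokens whose cumulative mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
sharp = softmax_with_temperature(logits, 0.1)
flat = softmax_with_temperature(logits, 2.0)
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])

kept = nucleus([0.5, 0.3, 0.15, 0.05], top_p=0.9)
print(kept)
```

At low temperature almost all probability lands on the top-scoring token, while at high temperature the alternatives stay in play; `top_p` then trims the unlikely tail before a token is drawn.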

---

### Example Chat Completion API Request:
Here’s how you might structure a request to the ChatGPT API using the system and user prompts, plus additional parameters:

```python
response = openai.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are an assistant that summarizes websites."},
        {"role": "user", "content": "Summarize the following webpage content: ..."}
    ],
    max_tokens=150,
    temperature=0.7,
    top_p=0.9,
    frequency_penalty=0.5,
    presence_penalty=0.0,
    stop=["\n"]
)
```
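Out-of-range values are rejected by the API, so a quick local check can save a round trip. This is a sketch only: `validate_sampling_params` is a hypothetical helper, and the ranges in it reflect my understanding of the API reference at the time of writing, so treat them as assumptions and confirm against the current documentation.

```python
def validate_sampling_params(params):
    """Return a list of problems found in a dict of request parameters."""
    # Assumed ranges; verify against the current API reference
    ranges = {
        "temperature": (0.0, 2.0),
        "top_p": (0.0, 1.0),
        "frequency_penalty": (-2.0, 2.0),
        "presence_penalty": (-2.0, 2.0),
    }
    problems = []
    for name, (low, high) in ranges.items():
        if name in params and not low <= params[name] <= high:
            problems.append(f"{name}={params[name]} is outside [{low}, {high}]")
    if params.get("max_tokens", 1) < 1:
        problems.append("max_tokens must be at least 1")
    # Assumes stop is given as a list of strings
    if len(params.get("stop", [])) > 4:
        problems.append("at most 4 stop sequences are allowed")
    return problems

request_params = {
    "max_tokens": 150,
    "temperature": 0.7,
    "top_p": 0.9,
    "frequency_penalty": 0.5,
    "presence_penalty": 0.0,
    "stop": ["\n"],
}
problems = validate_sampling_params(request_params)
print(problems or "parameters look fine")
```

Note that a stop sequence of `"\n"` ends generation at the first line break, which would truncate a multi-line markdown summary.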

### Summary:
- **System Prompt**: Provides high-level instructions on the role of the assistant (e.g., "You are an assistant that summarizes websites").
- **User Prompt**: Provides the specific task or question for the assistant (e.g., "Please summarize the following webpage content").
- **Additional Parameters**: Such as `max_tokens`, `temperature`, `top_p`, and others can be used to fine-tune the behavior of the model.
  
Together, these elements give you a lot of control over how ChatGPT interacts and responds in a conversation, allowing for custom prompts and personalized behavior.

The next cell shows how this all fits into the summarization process using the website content.

In [None]:
import openai
import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display

# System prompt defines the role of the assistant
system_prompt = """You are an assistant that analyzes the contents of a website
and provides a short summary, ignoring text that might be navigation related.
Respond in markdown."""

# Function to generate the user prompt based on the website content
def user_prompt_for(website):
    # Instructions first, then the page text, with no stray indentation
    return (
        f'You are looking at a website titled "{website.title}". '
        "Please provide a short summary of this website in markdown. "
        "If it includes news or announcements, then summarize these as well. "
        "The contents of this website are as follows:\n\n"
        f"{website.text}"
    )

# Function to create the messages for the API call
def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

# Simplified Website class for fetching and parsing webpage content
class Website:
    def __init__(self, url):
        self.url = url
        self.title, self.text = self.fetch_website_content()

    def fetch_website_content(self):
        try:
            response = requests.get(self.url)
            response.raise_for_status()
        except requests.exceptions.RequestException as e:
            return "Error", f"Error fetching the URL {self.url}: {e}"

        soup = BeautifulSoup(response.content, 'html.parser')
        title = soup.title.string if soup.title else "No title found"

        # Clean and extract body text
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            body_text = soup.body.get_text(separator="\n", strip=True)
        else:
            body_text = "No content in body"

        return title, body_text

def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model = "gpt-4o-mini",
        max_tokens=500,
        # temperature=0.7,
        # top_p=0.9,
        # frequency_penalty=0.5,
        # presence_penalty=0.0,
        # stop=["\n"],
        messages = messages_for(website)
    )
    return response.choices[0].message.content

# Function to display the summary of the webpage
def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

# Example usage
url = "https://www.cnn.com/entertainment"
display_summary(url)

# Summary of CNN Entertainment

The CNN Entertainment section offers a comprehensive view of the latest trends and news in movies, television, and celebrity culture. Recent highlights include:

- **Tributes and Mournings**: Cheryl Cole honored Liam Payne while condemning reports about his death, and One Direction released a statement mourning him as well. Mitzi Gaynor, famous for her role in *South Pacific*, recently passed away at the age of 93.
  
- **Ongoing Investigations**: Questions linger regarding the investigation into Liam Payne’s fatal accident.

- **Celebrity News**:
  - Tom Holland and Zendaya have discussed a draft for *Spider-Man 4*.
  - Cynthia Erivo expressed her discontent with edited *Wicked* movie posters.
  - Mariah Carey made headlines with a humorous complaint.
  - Personal anecdotes include Jerry Seinfeld's change of perspective on comedy, and James Gunn's comments about Krypto appearing in an upcoming film.

- **Cultural Reflections**: Discussions around portrayals of Jewish culture in media, particularly by female rabbis.

- **What to Watch**: Recommendations for Halloween-themed movies and series, and insights into new projects from notable artists such as Will Ferrell and Kathy Bates.

The site is updated with a mix of serious and light-hearted content, covering both industry news and reflections on popular culture.

Now that we have a working web summarizer, we can experiment with prompts to get a better understanding of how they influence the assistant’s behavior and responses. Here are some ways you can toy around with system and user prompts to learn more about how they work.

### 1. **Experiment with System Prompts:**
The system prompt is crucial because it sets the overall behavior of the assistant. By adjusting the system prompt, you can guide the tone, style, or format of the response.

#### Examples:
- **Change the Tone or Style**:
   ```python
   system_prompt = "You are a playful assistant that summarizes websites with a humorous and lighthearted tone. Respond in markdown."
   ```
   **Effect**: The assistant might generate a summary in a more casual and humorous tone.

- **Professional Summarizer**:
   ```python
   system_prompt = "You are a professional assistant that provides highly detailed and precise summaries of websites in a formal tone. Respond in markdown."
   ```
   **Effect**: The summary will likely be more formal and precise.

- **Limiting the Summary**:
   ```python
   system_prompt = "You are an assistant that provides only a brief, one-sentence summary of websites."
   ```
   **Effect**: The assistant will aim to summarize the content in just one sentence.

#### What You Learn:
- How the **assistant’s behavior** can be influenced by setting its "role" or "tone" in the system prompt.
- The importance of being specific and clear when you want particular kinds of output.
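
To compare these variants side by side, one option is to keep them in a dictionary and build a message list for each. The sketch below is a minimal illustration: `SYSTEM_PROMPTS`, `build_messages`, and the placeholder user text are hypothetical names that do not appear in the notebook, and no API call is made.

```python
# Hypothetical helper names; this only builds message lists and calls no API.
SYSTEM_PROMPTS = {
    "playful": "You are a playful assistant that summarizes websites with a "
               "humorous and lighthearted tone. Respond in markdown.",
    "formal": "You are a professional assistant that provides highly detailed and "
              "precise summaries of websites in a formal tone. Respond in markdown.",
    "brief": "You are an assistant that provides only a brief, one-sentence "
             "summary of websites.",
}

def build_messages(system_text, user_text):
    """Return a chat message list with one system and one user message."""
    return [
        {"role": "system", "content": system_text},
        {"role": "user", "content": user_text},
    ]

# Placeholder user prompt; in the notebook this would contain the page text.
user_text = "Summarize this website: <page text goes here>"

variants = {
    name: build_messages(system_text, user_text)
    for name, system_text in SYSTEM_PROMPTS.items()
}

for name, messages in variants.items():
    print(name, "->", messages[0]["content"][:40])
```

Each list in `variants` has the same shape as the `messages` argument passed to `openai.chat.completions.create` in `summarize`, so the same page can be summarized once per system prompt and the outputs compared.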

### 2. **Experiment with User Prompts:**
User prompts provide specific instructions for the task. You can try different formulations to understand what makes for an effective user prompt.

#### Examples:
- **Asking for Key Information**:
   ```python
   user_prompt = f"Provide the three most important takeaways from this website titled {website.title}."
   ```
   **Effect**: Instead of a general summary, the assistant will focus on extracting key points or takeaways.

- **Expanding the Scope**:
   ```python
   user_prompt = "Summarize the content of this website and provide suggestions for further reading on the same topic."
   ```
   **Effect**: This encourages the assistant to go beyond summarizing and offer related information.

- **Highlighting Specific Sections**:
   ```python
   user_prompt = f"Summarize the news section of this website titled {website.title}. Ignore any unrelated content."
   ```
   **Effect**: The assistant will focus on summarizing just the relevant section and ignore other content.

#### What You Learn:
- How **different instructions** in the user prompt impact the detail, focus, and structure of the assistant’s response.
- How you can tailor user prompts to get more actionable, focused, or concise answers.

### 3. **Test Prompt Interactions:**
Prompts interact with each other. For example, the system prompt may ask the assistant to be professional, while the user prompt asks for something more casual. You can test how the two combine and which one the model prioritizes.

#### Examples:
- **Contrasting Prompts**:
   - **System Prompt**: "You are a highly formal and professional assistant."
   - **User Prompt**: "Give me a super casual, chill summary of this website."
   
   **Effect**: The model may prioritize one over the other. Experimenting like this helps you understand how the system prompt and user prompt balance each other.

#### What You Learn:
- How **conflicting instructions** impact the output.
- Which prompt seems to carry more weight depending on the context.
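
One way to test this systematically is to pair every system prompt with every user prompt and inspect each combination. This is a rough sketch: `prompt_grid` is a hypothetical helper, and the "formal" user prompt is an illustrative addition, not something from the notebook.

```python
from itertools import product

# The "formal" user prompt is an illustrative assumption.
system_prompts = {
    "formal": "You are a highly formal and professional assistant.",
    "playful": "You are a playful assistant that summarizes websites with a "
               "humorous and lighthearted tone.",
}
user_prompts = {
    "casual": "Give me a super casual, chill summary of this website.",
    "formal": "Provide a formal summary of this website.",
}

def prompt_grid(system_prompts, user_prompts):
    """Map (system_name, user_name) to a message list for that pairing."""
    grid = {}
    for (s_name, s_text), (u_name, u_text) in product(
        system_prompts.items(), user_prompts.items()
    ):
        grid[(s_name, u_name)] = [
            {"role": "system", "content": s_text},
            {"role": "user", "content": u_text},
        ]
    return grid

grid = prompt_grid(system_prompts, user_prompts)
for pairing in grid:
    print(pairing)
```

The pairings whose names differ, such as `("formal", "casual")`, are the conflicting cases. Sending each message list to the model and reading the results side by side should show which instruction tends to dominate.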

### 4. **Adding Constraints to the Output:**
You can also experiment with prompts by placing constraints on the output, such as length limits or specific formats.

#### Examples:
- **Length Constraints**:
   ```python
   user_prompt = "Provide a summary of this website in no more than 50 words."
   ```

- **Specific Formatting**:
   ```python
   user_prompt = "Summarize this website in the form of a bullet-point list."
   ```

#### What You Learn:
- How well the assistant can follow **structural constraints** like length or formatting.
- How to formulate prompts for specific kinds of responses (e.g., bulleted lists, structured formats).
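
To check whether a response actually respected a constraint, a couple of small helpers can be enough. The functions below are hypothetical (they are not part of the notebook) and use deliberately simple rules: words are split on whitespace, and a bullet list is any text whose non-empty lines all start with `- ` or `* `.

```python
def within_word_limit(text, max_words):
    """Return True if text has at most max_words whitespace-separated words."""
    return len(text.split()) <= max_words

def is_bullet_list(text):
    """Return True if every non-empty line starts with a markdown bullet."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return bool(lines) and all(line.startswith(("- ", "* ")) for line in lines)

# Stand-in responses; in practice these would come from summarize(url).
short_reply = "CNN Entertainment covers movies, television, and celebrity news."
bullet_reply = "- Movies\n- Television\n- Celebrity news"

print(within_word_limit(short_reply, 50))
print(is_bullet_list(bullet_reply))
print(is_bullet_list(short_reply))
```

Checks like these make it easier to compare several prompt wordings without reading every output by hand.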

### 5. **Modify Based on Feedback**:
Try modifying the user prompts based on how the model responds. You may want to tweak wording or provide additional details based on the output you receive.

#### Example:
- **Original User Prompt**: "Summarize the website."
- **Feedback**: The output was too general.
  
- **Improved Prompt**:
   ```python
   user_prompt = "Summarize the website in a detailed paragraph, focusing on the key arguments or points made in the text."
   ```

#### What You Learn:
- How **iterative refinement** of prompts can improve the quality of responses.

### 6. **Advanced: Test with Different Roles (Assistant, User, System)**:
So far you’ve used only the system and user roles. You can also experiment with multi-turn conversations, where earlier replies are passed back with the `assistant` role so that they influence later outputs.

#### Example:
- After the initial summary, you could create a follow-up interaction:
   ```python
   assistant_response = "The summary is: ..."
   follow_up = "Can you expand on the key arguments in more detail?"
   ```
   **Effect**: You can simulate conversations where the assistant keeps refining the output based on additional instructions.
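
In the chat message format, a follow-up like this is expressed by appending the earlier reply with the `assistant` role and then adding a new `user` message. Here is a minimal sketch, assuming a hypothetical helper `with_follow_up` and placeholder text for the first exchange:

```python
def with_follow_up(messages, assistant_reply, follow_up_text):
    """Return a new message list extended with the reply and a follow-up question."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": follow_up_text},
    ]

# Placeholder first turn; in the notebook this would come from messages_for(website).
first_turn = [
    {"role": "system", "content": "You are an assistant that summarizes websites."},
    {"role": "user", "content": "Summarize this website: <page text goes here>"},
]

# Placeholder reply; in the notebook this would be response.choices[0].message.content.
assistant_response = "The summary is: ..."

conversation = with_follow_up(
    first_turn,
    assistant_response,
    "Can you expand on the key arguments in more detail?",
)

print([message["role"] for message in conversation])
```

The extended list can be sent in a second API call. Because the model receives the whole list each time, it sees its own earlier summary alongside the new instruction.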

---

### Summary of Key Learning Areas:
1. **Tone and Style**: Experiment with how different tones (professional, casual, humorous) affect the output.
2. **Focus and Detail**: Learn how focusing on specific sections or key points changes the summary.
3. **Prompt Interaction**: Test how system and user prompts interact and whether conflicting prompts lead to different results.
4. **Output Structure**: Play around with structuring the output, such as using lists, paragraphs, or concise statements.
5. **Iterative Prompting**: Learn how refining prompts based on feedback improves responses.

By experimenting with these strategies, you can gain a deep understanding of how to craft effective prompts and get the desired output from the assistant.

### User Prompt
Here's an example of a user prompt that asks the assistant to reply in the style of a southern belle from the southeastern United States. The system prompt is unchanged.

In [None]:
import openai
import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display

# System prompt defines the role of the assistant with a "text" type field
system_prompt = {
    "type": "text",
    "text": """You are an assistant that analyzes the contents of a website
    and provides a short summary, ignoring text that might be navigation related.
    Respond in markdown.
    """
}

# Function to generate the user prompt based on the website content
def user_prompt_for(website):
    return {
        "type": "text",
        "text": f"""
        You are looking at a website titled "{website.title}".
        Please provide a short summary of this website in markdown in the style of a southern belle from
        the southeast United States. If it includes news or announcements, then summarize these as well.
        The contents of this website are as follows:
        {website.text}
        """
    }

# Function to create the messages for the API call, with "text" field
def messages_for(website):
    return [
        {"role": "system", "content": system_prompt["text"]},  # Use the "text" field for content
        {"role": "user", "content": user_prompt_for(website)["text"]}
    ]

# Simplified Website class for fetching and parsing webpage content
class Website:
    def __init__(self, url):
        self.url = url
        self.title, self.text = self.fetch_website_content()

    def fetch_website_content(self):
        try:
            response = requests.get(self.url, timeout=10)  # avoid hanging on a slow server
            response.raise_for_status()
        except requests.exceptions.RequestException as e:
            return "Error", f"Error fetching the URL {self.url}: {e}"

        soup = BeautifulSoup(response.content, 'html.parser')
        title = soup.title.string if soup.title else "No title found"

        # Clean and extract body text
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            body_text = soup.body.get_text(separator="\n", strip=True)
        else:
            body_text = "No content in body"

        return title, body_text

# Function to summarize a website using OpenAI
def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model = "gpt-4o-mini",
        max_tokens=500,
        # temperature=0.7,
        # top_p=0.9,
        # frequency_penalty=0.5,
        # presence_penalty=0.0,
        # stop=["\n"],
        messages = messages_for(website)
    )
    return response.choices[0].message.content

# Function to display the summary of the webpage
def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

# Example usage
display_summary(url)

# Sweet Summary of CNN's Entertainment Section

Well, bless your heart, darlin'! If y'all find yourselves wanderin' over to CNN's Entertainment section, you're in for a delightful treat! This splendid little corner of the internet is teemin' with the latest buzz on movies, television, and all the glitzy celebrity shenanigans we just can't get enough of.

Recently, the site has been abuzz with heartwrenchin' news, such as the tragic passin' of One Direction's beloved Liam Payne, which left fans feelin' simply devastated. There’s also chatter about Mitzi Gaynor, the star of *South Pacific*, who has crossed over at the age of 93. And bless her soul, Whoopi Goldberg is findin' her way through grief after losin' someone dear to her heart.

They’ve got the lowdown on all sorts of movin' pictures comin' our way, including a peek into Tom Holland and Zendaya's anticipation for a new *Spider-Man 4* script. Plus, get ready for some spooky viewin' with a lovely guide to Halloween-appropriate flicks.

So sugar, whether you’re lookin' for the latest in Hollywood’s happenings or want to keep up with your favorite stars, CNN Entertainment is just the place to catch that sweet gossip!

### System Prompts

In [None]:
# change the tone or style

# System prompt defines the role of the assistant with a "text" type field
system_prompt = {
    "type": "text",
    "text": """You are a playful assistant that summarizes websites with a humorous
    and lighthearted tone. Respond in markdown.
    """
}

def user_prompt_for(website):
    return {
        "type": "text",
        "text": f"""
        You are looking at a website titled "{website.title}".
        Please provide a short summary of this website in markdown. If it includes news or announcements,
        then summarize these as well. The contents of this website are as follows:
        {website.text}
        """
    }

# Function to display the summary of the webpage
def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

display_summary(url)

# 🎬 CNN Entertainment: The Scoop on the Stars!

Welcome to the glitzy, glamorous world of **CNN Entertainment**, where movies, television, and your favorite celebrities frolic together like they’re at a posh Hollywood soiree! Here’s the dish:

## **What’s Happening?**
- **Liam Payne**: Crying emoji alert! The One Direction star's tragic passing has sent ripples through the fandom—a group statement was released to mourn him. And in case you missed this, the investigation into his fatal excursion continues. **Stay tuned for updates!**
  
- **Whoopi Goldberg**: She’s finding peace amid her own grief. Centaurs? No, just a deep dive into life—don’t we all need a Whoopi in our lives?

- **Hugh Jackman**: The man’s got lungs, singing Neil Diamond like he’s trying to save a dying karaoke night. 

- **Tom Holland & Zendaya**: They’ve peeked at a *Spider-Man 4* script—cue the excited gasps—and they have opinions! Probably not suitable for the Spidey-sense.

- **Mariah Carey**: Adding some diva sass by airing her ‘perfectly diva complaint’—she’s keeping it fabulous!

## **Pop Culture Fun!**
- **Andrew Garfield** made a splash at a red carpet event with a cardboard cutout of his co-star. Talk about ‘bringing your own hype!’ 
- **Kristen Bell** lets us in on a little salty joke from *Frozen*—and yes, it’s cooler than Olaf in a snowstorm.

## **Spooky Season Alert!** 🎃
Got your Halloween plans sorted? CNN’s Spooky Season Watch Guide has your back with **13 movies and series** to ensure your chills are adequately freaky!

So, if you’re ready to laugh, cry, or just keep up with all things celebrity and entertainment, CNN Entertainment has you covered! 💖

*Note*: Don’t forget to provide some feedback—your effort is appreciated, and they promise it’s more important than which season of *Friends* you think is the best!

In [None]:
system_prompt = {
    "type": "text",
    "text": """You are a professional assistant that provides highly detailed and
    precise summaries of websites in a formal tone. Respond in markdown.
    """
}

display_summary(url)

# Summary of the CNN Entertainment Website

The **"Entertainment | CNN"** website serves as a comprehensive source for news and updates related to movies, television, and celebrity culture. It features a variety of segments, including current topics in the entertainment industry, video content, and links to live TV programming.

## Key Content Areas
- **Movies**: Coverage includes upcoming film releases, behind-the-scenes insights, and notable announcements within the film community.
- **Television**: Updates pertain to television series, special features, and trends in the TV landscape.
- **Celebrity News**: The site focuses on high-profile personalities in the entertainment sector, including tributes, personal stories, and professional updates.

## Recent Announcements and Highlights
- **Liam Payne**: The site discusses the shocking appeal of ongoing speculation regarding Liam Payne's health and recent tragic events, including tributes from fans and fellow celebrities.
- **Whoopi Goldberg**: A piece explores how Goldberg has coped with significant personal loss, resonating with audiences on themes of grief.
- **Upcoming Projects**: News on notable projects includes updates on "Spider-Man 4," where Tom Holland and Zendaya have provided insights on script drafts, and James Gunn's upcoming film featuring Superman's dog, Krypto.

## User Interaction and Feedback
CNN invites user feedback through an ad feedback section, where visitors can provide insights regarding the relevance and performance of the ads displayed, including discussions on technical issues related to video content and advertisements.

In summary, the CNN Entertainment site is a vital resource for anyone seeking the latest updates and in-depth coverage of developments in the entertainment industry, combined with user engagement opportunities to enhance the visitor experience.

### User Prompts

In [None]:
system_prompt = {
    "type": "text",
    "text": """You are a professional assistant that provides highly detailed and
    precise summaries of websites in a formal tone. Respond in markdown.
    """
}

def user_prompt_for(website):
    return {
        "type": "text",
        "text": f"""
        Provide the three most important takeaways from this website titled {website.title}.
        """
    }

display_summary(url)

# Summary of CNN's Entertainment Section

1. **Diverse Coverage of Entertainment News**: CNN’s Entertainment section offers a comprehensive array of articles covering various aspects of the entertainment industry, including updates on movies, television shows, music, and celebrity news. This diversity ensures that readers stay informed about current trends and significant events in the entertainment world.

2. **In-Depth Analysis and Features**: The content frequently includes in-depth analyses and features that explore cultural phenomena, the impact of entertainment on society, and profiles of influential figures in the industry. This analytical approach enriches the reader’s understanding of both the entertainment landscape and its broader implications.

3. **Real-Time Updates and Interviews**: The section is regularly updated with real-time news, interviews, and exclusive content with prominent entertainers. This commitment to timely reporting allows readers to remain engaged with the latest developments and insights from industry leaders and artists.

For further information, you may visit the website directly at [CNN Entertainment](https://www.cnn.com/entertainment).
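
Note that the prompt in the cell above passes only the page title, so the model has no page content to draw on, which may be why the takeaways read as generic. The variant below is a sketch that also includes the page text. It assumes an object with `title` and `text` attributes like the `Website` class, and the `takeaways_prompt` name and `max_chars` cutoff are hypothetical choices for illustration.

```python
from types import SimpleNamespace

def takeaways_prompt(website, max_chars=8000):
    """Build a user prompt that includes the page text, truncated to max_chars."""
    text = website.text[:max_chars]  # arbitrary cutoff to keep the prompt short
    return (
        "Provide the three most important takeaways from this website "
        f'titled "{website.title}".\n'
        f"The contents of this website are as follows:\n{text}"
    )

# Stand-in object for demonstration; in the notebook this would be Website(url).
page = SimpleNamespace(
    title="Example Page",
    text="First story. Second story. Third story.",
)

prompt = takeaways_prompt(page)
print(prompt)
```

Returning `{"type": "text", "text": takeaways_prompt(website)}` from `user_prompt_for` would keep it compatible with `messages_for` as defined earlier.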

**Contrasting Prompts**:
   - **System Prompt**: "You are a highly formal and professional assistant."
   - **User Prompt**: "Give me a super casual, chill summary of this website."
   
   **Effect**: The model may prioritize one over the other. Experimenting like this helps you understand how the system prompt and user prompt balance each other.

In [None]:
system_prompt = {
    "type": "text",
    "text": """You are a highly formal and professional assistant. Respond in markdown.
    """
}

def user_prompt_for(website):
    return {
        "type": "text",
        "text": f"""
        Give me a super casual, chill summary of this website. {website.title}.
        """
    }

display_summary(url)

### CNN Entertainment Summary

CNN's Entertainment section serves up the latest in movies, music, TV shows, and celebrity news. You can find everything from blockbuster reviews and juicy gossip to features on artists and trends shaping the entertainment industry. It's a go-to spot for anyone looking to stay updated on what's hot in pop culture, with a mix of serious articles and light-hearted content. Whether you're into the latest film releases or upcoming albums, CNN has you covered with engaging articles and insightful commentary. Just a relaxed way to get your entertainment fix!

### **System Prompts** and **User Prompts**

**System prompts** and **user prompts** in the OpenAI API differ in their **purpose** and **role** within a conversation. Both are entries in the `messages` list passed to the Chat Completions API (`openai.chat.completions.create` in this notebook), but they serve distinct functions.

### 1. **System Prompts**:
   - **Role**: `"system"`
   - **Purpose**: The system prompt defines the **role**, **tone**, and **behavior** of the assistant throughout the interaction. It sets up the context for how the assistant should approach the task.
   - **When It’s Used**: Typically used once at the beginning of a conversation to establish the assistant's identity or guide its behavior in the rest of the interaction.
   - **Function**: Think of the system prompt as instructions for **how** the assistant should behave, without providing task-specific content. It affects the assistant’s overall tone, personality, or conversational style.
   
   #### Example:
   ```python
   {"role": "system", "content": "You are a helpful assistant that provides concise and accurate technical information."}
   ```
   In this example, the assistant is guided to provide **concise and accurate** information during the conversation.

   #### Common Uses:
   - Defining the assistant’s tone (e.g., professional, casual, humorous).
   - Specifying the assistant’s role (e.g., acting as a teacher, summarizer, customer service agent).
   - Establishing broad behavioral guidelines (e.g., the assistant should **always** be helpful, friendly, and polite).

---

### 2. **User Prompts**:
   - **Role**: `"user"`
   - **Purpose**: The user prompt provides **task-specific** input. It tells the assistant exactly what the user wants it to do. This is where the actual questions or instructions for completing the task are provided.
   - **When It’s Used**: User prompts are the **main interaction** in the conversation. Each time the user wants the assistant to perform a task, they send a user prompt.
   - **Function**: It instructs the assistant on **what** to do, such as answering a question, summarizing content, or performing a specific task.

   #### Example:
   ```python
   {"role": "user", "content": "Please summarize the following article about machine learning."}
   ```

   #### Common Uses:
   - Asking the assistant to perform tasks (e.g., "Translate this text", "Summarize this website").
   - Asking questions (e.g., "What is the theory of relativity?").
   - Providing specific details (e.g., "Please explain this Python code step by step.").

---

### Key Differences:

| Aspect             | **System Prompt**                                    | **User Prompt**                             |
|--------------------|------------------------------------------------------|---------------------------------------------|
| **Role**           | `"system"`                                            | `"user"`                                    |
| **Purpose**        | Sets the assistant’s role, tone, and behavior globally. | Tells the assistant what specific task to do. |
| **When It's Used** | Usually provided once at the start of a conversation.  | Provided whenever the user wants to interact with the assistant. |
| **Content**        | High-level instructions for how the assistant should behave (tone, style). | Specific task-related input (e.g., questions or requests). |
| **Function**       | Guides the assistant’s overall conversational style and purpose. | Provides the content and task-specific instructions to get a result. |

---

### Example with Both System and User Prompts:

```python
messages = [
    {"role": "system", "content": "You are a friendly assistant that provides short, concise answers to any questions."},
    {"role": "user", "content": "What is quantum computing?"}
]
```

- **System Prompt**: Tells the assistant to be **friendly** and provide **short, concise** answers.
- **User Prompt**: Asks the assistant to explain **quantum computing**.

### What Happens:
- The system prompt ensures that the assistant responds in a **friendly** manner and keeps the explanation **concise**.
- The user prompt gives the specific question that the assistant will answer, based on the tone and guidelines set by the system prompt.

---

### Why They Matter:
- **System Prompts**: Shape the assistant’s overall behavior and ensure consistency in tone and approach.
- **User Prompts**: Dictate the tasks or questions you want the assistant to address.

When working together, system and user prompts allow you to fine-tune the assistant’s responses both in terms of **how** it responds (system prompt) and **what** it responds to (user prompt).


In [None]:
system_prompt = {
    "type": "text",
    "text": """You are a playful assistant that summarizes websites with a humorous and lighthearted tone.
    Respond in markdown.
    """
}

def user_prompt_for(website):
    return {
        "type": "text",
        "text": f"""
        Provide the website title and the five most important takeaways from this
        website in the form of a bullet-point list. The website is titled {website.title}.
        """
    }

display_summary(url)

# Entertainment | CNN

Welcome to the fabulous world of CNN's Entertainment section — where celebrities are bigger than life and drama unfolds faster than you can say "red carpet." Here are the top five takeaways from this star-studded extravaganza:

- **Breaking News**: Forget about world politics! CNN’s Entertainment section keeps you in the know about who’s feuding with whom, which movie bombed at the box office, and which star was caught in their pajamas. Priorities, people!

- **Gossip Central**: Get your daily dose of celebrity gossip served hot and spicy. Who needs fictional drama when real-life shenanigans are happening on Instagram? Spoiler alert: it rarely ends well!

- **Movie Madness**: From blockbusters to indie gems, CNN spills the beans on all things film. You’ll find recommendations and a handy guide to what to binge-watch next – because scrolling through endless titles is just so 2020.

- **Music to Our Ears**: Discover the latest hits and who’s breaking the charts faster than you can say “Taylor Swift.” Find out which artists are stirring up trouble and which ones are quietly sipping their tea (and we don’t mean the drink).

- **Awards & Accolades**: Ready your tuxedos and ball gowns! CNN covers all the glitz and glamour of award shows where the real winners are those who can fit into their outfits after the buffet.

So grab some popcorn and dive into the entertaining whirlwind that is CNN's Entertainment — where every day's a show, and we’re all just here for the snacks and the spectacle! 🍿✨