<a href="https://colab.research.google.com/github/micah-shull/LLMs/blob/main/LLM_003_role_system_user_prompts.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **"system," "user," and "assistant"** roles in the API code

In LLM APIs such as OpenAI's, every message carries a role. Roles matter because they tell the language model who said what, which guides the conversation flow and context. Each role and its purpose:

1. **"system" role**:
   - This sets the context for the conversation. The system message is used to provide instructions on how the assistant should behave throughout the conversation.
   - It's like setting up the "personality" or purpose of the model.
   - For example, `"role": "system", "content": "You are a helpful assistant that answers questions in a friendly and concise manner."` would make the LLM take on that friendly assistant persona. This influences all the subsequent messages and responses.

2. **"user" role**:
   - This represents the input from the user interacting with the LLM.
   - The content here is essentially the user's question, request, or input that the model will respond to.
   - For example, `"role": "user", "content": "What is the capital of France?"` tells the model what the user wants to know. The model processes this input to generate a response.

3. **"assistant" role**:
   - This role marks messages that were written by the model.
   - It is used to maintain context and keep track of what has already been said in the conversation.
   - For instance, `"role": "assistant", "content": "The capital of France is Paris."` provides the model's answer.

The **importance** of these roles lies in:

- **Structuring Conversations**: The user and assistant roles help the API differentiate between who is asking and who is responding, maintaining a coherent flow of information.
- **Maintaining State and Continuity**: These roles are also used to maintain the conversational history, allowing the LLM to generate context-aware responses as the conversation progresses.

Overall, these roles form the foundational structure for interactive, contextual communication between users and the language model. By appropriately structuring these roles, developers can make the LLM perform tasks more efficiently and maintain an interactive, context-rich conversation.
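To make the continuity point concrete, here is a minimal sketch of a multi-turn conversation stored as a list of role-tagged messages. The helper name `add_turn` is hypothetical and not part of any SDK; the only assumption is the `{"role": ..., "content": ...}` dictionary format described above.

```python
# Roles described in the list above.
VALID_ROLES = {"system", "user", "assistant"}

def add_turn(messages, role, content):
    """Return a new message list with one more role-tagged message appended."""
    if role not in VALID_ROLES:
        raise ValueError(f"Unknown role: {role!r}")
    return messages + [{"role": role, "content": content}]

# Start with the system message, then alternate user and assistant turns.
conversation = [{"role": "system", "content": "You are a helpful assistant."}]
conversation = add_turn(conversation, "user", "What is the capital of France?")
conversation = add_turn(conversation, "assistant", "The capital of France is Paris.")
# The follow-up only makes sense because the earlier turns are sent along with it.
conversation = add_turn(conversation, "user", "What is its population?")

for message in conversation:
    print(f"{message['role']:>9}: {message['content']}")
```

Because chat APIs of this kind are stateless between requests, the whole list is normally sent on every call; the earlier `assistant` message is what lets the model resolve "its" in the last question.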

### Example 1: FAQ Chatbot for Customer Support
- **System Prompt**:
  ```json
  {"role": "system", "content": "You are a helpful customer support assistant for an e-commerce store. You answer questions about orders, refunds, and products."}
  ```
- **User Prompt**:
  ```json
  {"role": "user", "content": "How can I return a product?"}
  ```
- **Assistant Response**:
  ```json
  {"role": "assistant", "content": "You can return a product within 30 days of purchase. Please visit our returns page and fill out the return request form. You'll get a prepaid label to ship the product back."}
  ```

### Example 2: Friendly Conversational Assistant
- **System Prompt**:
  ```json
  {"role": "system", "content": "You are a friendly assistant who loves to chat about movies and recommend great films."}
  ```
- **User Prompt**:
  ```json
  {"role": "user", "content": "Can you recommend a good science fiction movie?"}
  ```
- **Assistant Response**:
  ```json
  {"role": "assistant", "content": "Sure! If you're into science fiction, I recommend watching 'Interstellar' by Christopher Nolan. It's an epic story about space exploration and the survival of humanity, with stunning visuals and a great storyline."}
  ```

### Example 3: Technical Assistant for Coding Help
- **System Prompt**:
  ```json
  {"role": "system", "content": "You are a technical assistant that provides programming guidance, particularly for Python."}
  ```
- **User Prompt**:
  ```json
  {"role": "user", "content": "How do I reverse a list in Python?"}
  ```
- **Assistant Response**:
  ```json
  {"role": "assistant", "content": "You can reverse a list in Python using the `reverse()` method or by using slicing. For example:\n\n```\nmy_list = [1, 2, 3, 4, 5]\nmy_list.reverse()  # In-place reversal\nprint(my_list)  # Output: [5, 4, 3, 2, 1]\n\n# Or using slicing, which returns a new list:\nother_list = [1, 2, 3, 4, 5]\nreversed_list = other_list[::-1]\nprint(reversed_list)  # Output: [5, 4, 3, 2, 1]\n```"}
  ```

### Example 4: Encouraging Study Buddy
- **System Prompt**:
  ```json
  {"role": "system", "content": "You are an encouraging study buddy who motivates the user to keep up with their studies."}
  ```
- **User Prompt**:
  ```json
  {"role": "user", "content": "I'm feeling overwhelmed with my math homework."}
  ```
- **Assistant Response**:
  ```json
  {"role": "assistant", "content": "I totally understand! Math can be challenging, but remember, you’re capable of doing hard things. Take it one problem at a time, and don’t be afraid to ask for help if you get stuck. You've got this!"}
  ```

### Example 5: Travel Planner Assistant
- **System Prompt**:
  ```json
  {"role": "system", "content": "You are a travel planner assistant who helps users create itineraries for their trips."}
  ```
- **User Prompt**:
  ```json
  {"role": "user", "content": "Can you help me plan a 3-day trip to Paris?"}
  ```
- **Assistant Response**:
  ```json
  {"role": "assistant", "content": "Absolutely! Here’s a suggested itinerary for a 3-day trip to Paris:\n\n**Day 1**: Visit the Eiffel Tower, stroll along the Seine River, and explore the Louvre Museum.\n\n**Day 2**: Start the day at Notre-Dame Cathedral, then head to Montmartre to see the Sacré-Cœur Basilica. In the evening, enjoy a dinner cruise on the Seine.\n\n**Day 3**: Spend the morning at the Palace of Versailles, and later, visit the Musée d'Orsay. Wrap up your trip by relaxing in the Luxembourg Gardens.\n\nLet me know if you'd like more details or any adjustments!"}
  ```

### Example 6: Health & Wellness Assistant
- **System Prompt**:
  ```json
  {"role": "system", "content": "You are a knowledgeable health and wellness coach providing advice on fitness, diet, and mental health."}
  ```
- **User Prompt**:
  ```json
  {"role": "user", "content": "What's a good way to manage stress?"}
  ```
- **Assistant Response**:
  ```json
  {"role": "assistant", "content": "Managing stress effectively can involve several strategies. Deep breathing exercises, meditation, and regular physical activity like yoga or jogging are great ways to reduce stress. Make sure to get enough sleep, and consider keeping a journal to express your thoughts and emotions. Also, staying connected with supportive friends and family can make a big difference."}
  ```

### Summary of Roles:
- **System**: Sets the assistant's behavior and tone for the conversation.
- **User**: Represents the queries or instructions given by the user.
- **Assistant**: Contains the responses generated by the LLM, based on the system prompt and user inputs.

These examples should help you see how you can use different system prompts to adjust the assistant’s behavior and make interactions tailored to specific tasks or purposes.

### Install Libraries

In [None]:
# !pip install python-dotenv
# !pip install openai
# !pip install google-generativeai
# !pip install anthropic

### Import Libraries

In [None]:
import os
from dotenv import load_dotenv
from openai import OpenAI
import google.generativeai
import anthropic
from IPython.display import Markdown, display, update_display

### Load Environment Variables

In [None]:
# Load the environment variables from the .env file
load_dotenv('/content/API_KEYS.env')  # Ensure this is the correct path to your file

# Get the API keys from the environment
openai_api_key = os.getenv("OPENAI_API_KEY")
anthropic_api_key = os.getenv("ANTHROPIC_API_KEY")
google_api_key = os.getenv("GOOGLE_API_KEY")

# Check if the keys are loaded correctly and print a portion of them
if openai_api_key:
    print(f"OpenAI API Key loaded: {openai_api_key[0:10]}...")  # Only print part of the key
else:
    print("OpenAI API key not loaded correctly.")

if anthropic_api_key:
    print(f"Anthropic API Key loaded: {anthropic_api_key[0:10]}...")
else:
    print("Anthropic API key not loaded correctly.")

if google_api_key:
    print(f"Google API Key loaded: {google_api_key[0:10]}...")
else:
    print("Google API key not loaded correctly.")

OpenAI API Key loaded: sk-proj-mf...
Anthropic API Key loaded: sk-ant-api...
Google API Key loaded: AIzaSyDh3a...


### Connect to OpenAI, Anthropic and Google

In [None]:
import openai
import anthropic
import google.generativeai

# Connect to OpenAI
openai.api_key = openai_api_key  # Set OpenAI API key

# Connect to Anthropic (Claude)
claude = anthropic.Anthropic(api_key=anthropic_api_key)  # Set Anthropic API key

# Connect to Google Generative AI
google.generativeai.configure(api_key=google_api_key)  # Set Google API key

### Setting up a Connection Explained

The code above sets up access to three APIs: **OpenAI**, **Anthropic**, and **Google Generative AI**. Each line configures a connection for one API so that later cells can call its models. Here is what each line does:

### 1. **`openai.api_key = openai_api_key`**:
   - This line **configures the OpenAI SDK's default client** by setting the API key on the `openai` module.
   - After that, module-level calls such as `openai.chat.completions.create()` connect your code to OpenAI's API, allowing you to interact with models like GPT-4o and GPT-4o mini.
   - The **API key** can also be picked up from the `OPENAI_API_KEY` environment variable, or passed to an explicit client created with `OpenAI(api_key=...)`.
   - Once the key is set, you can send requests for things like **chat completions**, **embeddings**, or other tasks supported by OpenAI's models.

   **What it does**: Prepares your code to interact with the OpenAI API, which is used later to send chat completion requests.

---

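### 2. **`claude = anthropic.Anthropic(api_key=anthropic_api_key)`**:
   - This line **initializes the Anthropic API client**, used here to access Anthropic's **Claude** models.
   - `anthropic.Anthropic()` is part of Anthropic's Python SDK. The returned `claude` object is what later cells call (for example, `claude.messages.create(...)`).

---

### 3. **`google.generativeai.configure(api_key=google_api_key)`**:
   - This line **configures the Google Generative AI SDK** with your API key.
   - Unlike the Anthropic line, it does not return a client object. Models are created later with `google.generativeai.GenerativeModel(...)`.

---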

### General Flow:
1. **Initialization**: Each of these lines initializes the **API clients** for their respective services — OpenAI, Anthropic (Claude), and Google Generative AI.
2. **Access to Models**: Once these clients are initialized, you can use them to interact with their models and services. For example, you might make API calls to generate text, complete prompts, or perform other AI-driven tasks.
3. **Authentication**: Typically, behind the scenes, these clients require **API keys** or other credentials, which are often stored in environment variables or configured during the setup process. This allows your code to securely interact with the respective APIs.
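As a rough illustration of the authentication step, the sketch below reads a key from the environment and fails early if it is missing. The helper names `require_env` and `mask`, and the variable `DEMO_API_KEY` with its placeholder value, are hypothetical and used only so the example runs without real credentials.

```python
import os

def require_env(name):
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value

def mask(secret, visible=4):
    """Show only the first few characters of a secret when logging it."""
    return secret[:visible] + "..." if len(secret) > visible else "..."

# Placeholder value (not a real key), so the sketch runs anywhere.
os.environ["DEMO_API_KEY"] = "sk-demo-1234567890"
key = require_env("DEMO_API_KEY")
print(f"DEMO_API_KEY loaded: {mask(key)}")
```

Failing at startup with a clear message tends to be easier to debug than an authentication error raised later, inside the first API call.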



### Asking LLMs to tell a joke

### Main Functions of the Code:
- **System Message Setup**:
  - Defines the behavior or tone of the assistant using a `system_message`.
  - Example: `"You are an assistant that is great at telling jokes"`.

- **User Prompt Creation**:
  - Creates a specific user prompt for the assistant.
  - Example: `"Tell a bawdy joke for an audience of professional comedians"`.

- **Prompt List Creation**:
  - Combines the `system_message` and `user_prompt` into a structured list called `prompts`.
  - This list contains dictionaries with a role (`system` or `user`) and the corresponding content.
  
- **API Request**:
  - Uses the `openai.chat.completions.create()` method to generate a response from the model.
  - Specifies the model (`gpt-4o-mini`) and sends the `prompts` as the input messages.
  
- **Printing the Model's Response**:
  - Accesses the response using `completion.choices[0].message.content` to print the content generated by the model.

### Focus Points for Learning:
- **Roles (`system` and `user`)**:
  - Understand the importance of defining different roles to set up the assistant's behavior (`system`) and provide specific prompts (`user`).
  - Learn how the system prompt influences the entire conversation's tone and purpose.

- **Prompt Engineering**:
  - Practice structuring prompts to elicit desired responses from the model.
  - Learn how to use multiple roles to guide the assistant more effectively.

- **API Method Usage**:
  - Get familiar with the **`openai.chat.completions.create()`** method to interact with the model.
  - Understand how to pass the **model**, **messages**, and other relevant parameters to the API.

- **Accessing the Response**:
  - Understand how to access the generated response through `completion.choices`.
  - Learn how to interpret and extract the content from the response object to use it in your application.

These key functions and concepts will help you become more adept at structuring interactions with OpenAI's models, focusing on prompt engineering and using the API effectively.

In [None]:
system_message = "You are an assistant that is great at telling jokes"
user_prompt = "Tell a bawdy joke for an audience of professional comedians"

prompts = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_prompt}
  ]

completion = openai.chat.completions.create(
    model='gpt-4o-mini',
    messages=prompts
)

In [None]:
# Print the reply text from the first choice (the request was sent in the previous cell)

print(completion.choices[0].message.content)

Why did the scarecrow win an award?

Because he was outstanding in his field... but he still couldn’t figure out how to make hay while the sun shined on his mishaps in the barn!


#### Response Object

This is an example of the response object returned by the OpenAI API for a chat completion request. The key parts of this JSON object are explained below:

```json
{
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1677858242,
    "model": "gpt-4o-mini",
    "usage": {
        "prompt_tokens": 13,
        "completion_tokens": 7,
        "total_tokens": 20,
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    },
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "\n\nThis is a test!"
            },
            "logprobs": null,
            "finish_reason": "stop",
            "index": 0
        }
    ]
}
```

### Breakdown of Key Elements:

1. **`id`**: `"chatcmpl-abc123"`
   - This is a unique identifier for the response.
   - It can be useful for tracking or logging purposes.

2. **`object`**: `"chat.completion"`
   - This specifies the type of response object.
   - Here, it indicates that the object is a chat completion.

3. **`created`**: `1677858242`
   - This is a timestamp representing when the completion was created.
   - It’s in Unix time format, which is the number of seconds since January 1, 1970.

4. **`model`**: `"gpt-4o-mini"`
   - This specifies the model that generated the response.
   - In this case, it’s `"gpt-4o-mini"`.

5. **`usage`**: Contains token usage information.
   - **`prompt_tokens`**: `13`
     - The number of tokens used for the input, counting every message sent (system and user prompts, plus any earlier turns).
   - **`completion_tokens`**: `7`
     - The number of tokens used for the model's generated response.
   - **`total_tokens`**: `20`
     - The total number of tokens used for both the prompt and the response.
   - **`completion_tokens_details`**:
     - **`reasoning_tokens`**: `0`
       - The number of tokens the model spent on internal reasoning before answering. Reasoning models report a nonzero value here; for `gpt-4o-mini` it is `0`.

6. **`choices`**: A list of response choices.
   - This list contains one or more possible completions from the model. Each item in this list represents a different generated completion.
   - In this example, there is only one completion (index `0`).
   
   **Within each choice**:
   - **`message`**:
     - **`role`**: `"assistant"`
       - This indicates that this part of the response is from the assistant.
     - **`content`**: `"This is a test!"`
       - This is the actual response generated by the assistant.

   - **`logprobs`**: `null`
     - This field may contain log probabilities of the tokens generated, which can be used for analyzing how confident the model was about each token. In this example, it is `null`.

   - **`finish_reason`**: `"stop"`
     - This indicates why the completion ended. The value `"stop"` means the model reached a natural stopping point or hit a stop sequence supplied in the request.
     - Other values include `"length"` if the output hit the `max_tokens` limit or the model's context limit, and `"content_filter"` if content was omitted by a content filter.

   - **`index`**: `0`
     - This is the index of the completion in the list of possible choices.
     - Useful if you request multiple completions and need to differentiate between them.

### Summary:
- The JSON response provides a complete breakdown of what the model returned, including metadata about the request, token usage details, and the actual response message(s).
- The `usage` information is particularly useful for tracking the cost of using the API, as many OpenAI models charge based on the number of tokens used.
- The `choices` list contains the generated response content and other details about how and why the response was generated.

This structure is designed to be detailed, allowing you to understand exactly how the model responded and the resources used in the process.
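The fields described above can be pulled out of the response with ordinary dictionary access. The sketch below works on a trimmed copy of the sample response as a plain dictionary; the function names `summarize` and `estimate_cost` are hypothetical, and the prices passed in are made-up placeholders, not real pricing.

```python
# A trimmed copy of the sample response shown above, as a plain dictionary.
response = {
    "model": "gpt-4o-mini",
    "usage": {"prompt_tokens": 13, "completion_tokens": 7, "total_tokens": 20},
    "choices": [
        {
            "message": {"role": "assistant", "content": "\n\nThis is a test!"},
            "finish_reason": "stop",
            "index": 0,
        }
    ],
}

def summarize(resp):
    """Extract the reply text, truncation flag, and token counts from a response dict."""
    choice = resp["choices"][0]
    usage = resp["usage"]
    return {
        "text": choice["message"]["content"].strip(),
        "truncated": choice["finish_reason"] == "length",
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
    }

def estimate_cost(summary, input_price_per_1k, output_price_per_1k):
    """Estimate cost from token counts; the prices are supplied by the caller."""
    return (summary["prompt_tokens"] / 1000 * input_price_per_1k
            + summary["completion_tokens"] / 1000 * output_price_per_1k)

summary = summarize(response)
print(summary["text"])
# Hypothetical prices, for illustration only; check the provider's pricing page.
print(estimate_cost(summary, input_price_per_1k=0.001, output_price_per_1k=0.002))
```

With the SDK object itself, the same values are available as attributes (for example `completion.choices[0].message.content`), or through the dictionary returned by `completion.to_dict()`.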

In [None]:
import json  # Import the json module to pretty-print the response

completion = openai.chat.completions.create(
    model='gpt-4o-mini',
    messages=prompts
)

print(completion.choices[0].message.content)
print('\n #---------  Response Object ---------#\n')
# Convert the response to a dictionary and print it
completion_dict = completion.to_dict()
print(json.dumps(completion_dict, indent=2))

Sure, here’s a more risqué joke for a crowd of professional comedians:

Why did the comedian bring a ladder to the bar?

Because he heard the drinks were on the house—but he wanted to make sure he could reach the top shelf for some "adult" entertainment!

 #---------  Response Object ---------#

{
  "id": "chatcmpl-ALrzSQmRhYvvAvwpyMkZSRl8FS2eu",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Sure, here\u2019s a more risqu\u00e9 joke for a crowd of professional comedians:\n\nWhy did the comedian bring a ladder to the bar?\n\nBecause he heard the drinks were on the house\u2014but he wanted to make sure he could reach the top shelf for some \"adult\" entertainment!",
        "refusal": null,
        "role": "assistant"
      }
    }
  ],
  "created": 1729775790,
  "model": "gpt-4o-mini-2024-07-18",
  "object": "chat.completion",
  "system_fingerprint": "fp_482c22a7bc",
  "usage": {
    "completion_token

### OpenAI LLM APIs

To effectively work with OpenAI LLM APIs, there are several key concepts you'll want to understand. These concepts help you use the APIs effectively, control the output, and understand what the model can do. Here’s a list of important areas to focus on:

### 1. **API Basics**
   - **Endpoints**: Understand the different API endpoints available, such as the Chat Completions endpoint (`chat.completions.create` in the Python SDK) for chat models.
   - **Authentication**: Learn how to use API keys securely to authenticate your requests.
   - **API Limits and Pricing**: Be aware of rate limits, token usage, and how pricing is calculated.

### 2. **Roles and Messages**
   - **Message Structure**: Understand how messages are structured with `"role"` (e.g., `system`, `user`, `assistant`) and `"content"`. These roles help define the context of the conversation and guide the model's behavior.
   - **System, User, Assistant Roles**: Grasp how each role contributes to a structured conversation and how to use them to control the assistant's behavior.

### 3. **Parameters and Controls**
   - **Temperature**: Controls the randomness of responses. Higher values make the output more creative, while lower values make it more deterministic.
   - **Top-p (Nucleus Sampling)**: Similar to temperature, this parameter influences response generation by limiting the choice of next words to a certain cumulative probability.
   - **Max Tokens**: Understand how to control the length of responses to manage costs and ensure concise answers.
   - **Frequency Penalty & Presence Penalty**: These parameters control repetition and encourage the introduction of new ideas, helping to make responses more interesting or stay focused.

### 4. **Tokens and Usage**
   - **Tokenization**: Learn how text is broken into tokens. Tokens can be as short as one character or as long as one word. Understanding how tokenization works is crucial for managing the input and output efficiently.
   - **Prompt Tokens vs Completion Tokens**: Understand the difference between tokens used for the input prompt and those generated as a response. Both affect the API usage costs.

### 5. **Prompt Engineering**
   - **System Prompts**: Learn how to design system prompts that set up the assistant's behavior effectively for a given task.
   - **Few-Shot Learning**: Practice giving examples in the prompt to teach the model how to answer in a specific format.
   - **Prompt Design**: Understand how to craft effective user prompts to elicit desired responses from the model.
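Few-shot prompting, mentioned above, is often expressed with the same role structure: each example becomes a `user` message followed by the desired `assistant` message. This is a minimal sketch under that assumption; `build_few_shot_messages` and the sentiment examples are hypothetical.

```python
def build_few_shot_messages(system_prompt, examples, question):
    """Build a message list where each example is a user/assistant pair."""
    messages = [{"role": "system", "content": system_prompt}]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    # The real question goes last, so the model continues the established pattern.
    messages.append({"role": "user", "content": question})
    return messages

examples = [
    ("I loved this movie!", "positive"),
    ("The plot was dull and slow.", "negative"),
]
few_shot = build_few_shot_messages(
    "Classify the sentiment of each review as 'positive' or 'negative'.",
    examples,
    "The acting was superb.",
)
for m in few_shot:
    print(m["role"], "->", m["content"])
```

The resulting list can be passed as the `messages` argument in the same way as the shorter `prompts` list used elsewhere in this notebook.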

### 6. **Managing Context**
   - **Conversation History**: Learn how to maintain conversation history by passing previous messages in the `messages` parameter to create coherent, context-aware interactions.
   - **Token Limits**: Understand the context window of the model you use (for example, 8,192 tokens for the original GPT-4 and 128,000 tokens for `gpt-4o-mini`) and how to manage context within that limit.
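One common way to stay within a token limit is to drop the oldest turns while always keeping the system message. The sketch below illustrates that idea only; `trim_history` and `rough_token_count` are hypothetical helpers, and the four-characters-per-token rule is a crude approximation. A real tokenizer (such as the `tiktoken` package) would give accurate counts.

```python
def rough_token_count(text):
    """Very rough estimate: about four characters per token for English text."""
    return max(1, len(text) // 4)

def trim_history(messages, max_tokens):
    """Keep the system message plus the most recent messages that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(rough_token_count(m["content"]) for m in system)
    kept = []
    # Walk backwards so the newest messages are considered first.
    for message in reversed(rest):
        cost = rough_token_count(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "a" * 400},
    {"role": "assistant", "content": "b" * 400},
    {"role": "user", "content": "c" * 40},
]
trimmed = trim_history(history, max_tokens=120)
print([m["role"] for m in trimmed])
```

Other strategies, such as summarizing older turns instead of dropping them, trade extra API calls for better recall of early context.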

### 7. **Handling Responses**
   - **Response Parsing**: Be comfortable working with the JSON response object. Learn how to access different parts of the response, such as choices and usage statistics.
   - **Error Handling**: Learn to handle common errors (e.g., rate limits, token limit exceeded) gracefully, using retry logic or catching exceptions.
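The retry logic mentioned above is usually implemented as exponential backoff. The sketch below shows the pattern with a stand-in function instead of a real API call; `call_with_retries` and `flaky_request` are hypothetical names. In real code you would likely narrow `retry_on` to the SDK's transient error types (for the `openai` package, for example, `openai.RateLimitError`).

```python
import random
import time

def call_with_retries(func, max_attempts=4, base_delay=1.0, sleep=time.sleep,
                      retry_on=(Exception,)):
    """Call func(), retrying with exponential backoff and jitter on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            # Delay doubles each attempt (base, 2*base, 4*base, ...) plus small jitter.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1)
            sleep(delay)

# Stand-in for an API call that fails twice (e.g. rate limited) and then succeeds.
attempts = {"count": 0}

def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"

# A no-op sleep is passed in so the demo finishes instantly.
result = call_with_retries(flaky_request, sleep=lambda seconds: None)
print(result, "after", attempts["count"], "attempts")
```

Making `sleep` a parameter is a small design choice that keeps the helper easy to test without waiting for real delays.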

### 8. **Use Cases and Model Limitations**
   - **Understand Use Cases**: Be familiar with how LLMs are used—like content generation, summarization, answering questions, writing code, and more.
   - **Limitations**: Understand what the model can and cannot do. For instance, the model may generate incorrect information confidently. It’s important to know when to validate responses or use other tools.

### 9. **Streaming Responses**
   - **Response Streaming**: Learn about streaming responses (like typing indicators in chat) to improve user experience, especially for long responses.

### 10. **Rate Limits and Best Practices**
   - **Rate Limits**: Understand the rate limits associated with your API key and how to optimize requests to stay within these limits.
   - **Best Practices**: Learn best practices, such as batching requests, optimizing prompts to reduce tokens, and ensuring API keys are kept secure.

### 11. **Advanced Techniques**
   - **Fine-Tuning**: Learn about fine-tuning models on specific datasets to improve their performance for a particular task.
   - **Tool Use with Plugins**: In some advanced cases, LLMs can interact with external tools or plugins (e.g., for code execution or searching the web). Understanding how these plugins work can greatly enhance what your assistant can accomplish.

### 12. **APIs in Practice**
   - **Experimentation**: Experiment with different types of system and user prompts to see how they influence the assistant’s responses.
   - **Integration**: Learn to integrate the API into applications using libraries like Python’s `requests` or dedicated SDKs (`openai` library). Build simple projects such as chatbots, question-answering systems, or content generators to practice.

Understanding these fundamental concepts will give you the ability to use OpenAI's LLM APIs effectively, creating more intelligent and contextually-aware applications while optimizing for performance and cost.

### **Temperature**

The **`temperature`** parameter controls the randomness or creativity of the model's responses.

- **`temperature=0.7`**:
  - For OpenAI's Chat Completions API, the **temperature** value ranges from `0` to `2`; values between `0` and `1` are the most common in practice.
  - **Lower values (e.g., `0` to `0.3`)** make the model more deterministic and focused. It tends to generate more predictable, often fact-based responses.
  - **Higher values (e.g., `0.7` to `1`)** make the model more creative and diverse. It allows for more varied responses, but this can sometimes lead to less consistent or off-topic content.

- Setting **`temperature=0.7`** asks the model for responses that balance creativity and reliability: not too rigid or repetitive, but not completely unpredictable either. The cell below uses `temperature=0.9`, which leans further toward creativity, a reasonable choice for joke writing.
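To see why lower temperatures give more predictable output, it helps to look at the standard technique: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The sketch below uses made-up scores for three candidate tokens; it illustrates the general method and is not a description of any provider's internal implementation.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, dividing by temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive for this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # Subtract the max for numerical stability.
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the lowest temperature nearly all of the probability lands on the top-scoring token, while at the highest the three options move closer together, so sampling picks the less likely tokens more often.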

In [None]:
# Temperature setting controls creativity

completion = openai.chat.completions.create(
    model='gpt-4o-mini',
    messages=prompts,
    temperature=0.9
)
print(completion.choices[0].message.content)

Sure, here's one for a seasoned crowd:

Why did the scarecrow win an award?

Because he was outstanding in his field... until the crows started asking for a raise! 

(Feel free to adjust the delivery to match your style—nothing like a little playful innuendo to spice up the punchline!)


### **Top-p (Nucleus Sampling)**
The `top_p` parameter controls the diversity of the response by limiting the possible tokens to a subset based on cumulative probability. A value between `0` and `1` is used:

- **Lower Value (e.g., `top_p=0.3`)**: Limits the response to the most probable words, making the response more focused and predictable.
- **Higher Value (e.g., `top_p=0.9`)**: Allows for more diversity in the response.

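In this example, setting `top_p=0.85` means that at each step the model samples only from the smallest set of most-probable tokens whose probabilities add up to at least 85%, discarding the unlikely tail. This keeps some diversity while ruling out very improbable tokens. Note that OpenAI's documentation generally recommends adjusting either `temperature` or `top_p`, not both; the cell below sets both for demonstration.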
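The cumulative-probability cutoff can be sketched in a few lines. The probabilities below are made up, and `nucleus_filter` is a hypothetical name; the code shows the general nucleus sampling idea rather than any provider's exact implementation.

```python
def nucleus_filter(token_probs, top_p):
    """Keep the smallest set of most-probable tokens whose total reaches top_p."""
    ranked = sorted(token_probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = {}, 0.0
    for token, prob in ranked:
        kept[token] = prob
        cumulative += prob
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1 again.
    return {token: prob / cumulative for token, prob in kept.items()}

probs = {"the": 0.5, "a": 0.3, "one": 0.15, "zebra": 0.05}
print(nucleus_filter(probs, top_p=0.85))
print(nucleus_filter(probs, top_p=0.3))
```

With `top_p=0.85` the unlikely token `"zebra"` is excluded before sampling; with `top_p=0.3` only the single most likely token remains, so the choice becomes deterministic.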

In [None]:
completion = openai.chat.completions.create(
    model='gpt-4o-mini',
    messages=prompts,
    temperature=0.9,      # Controls randomness
    top_p=0.85            # Controls diversity of the response
)
print(completion.choices[0].message.content)

Sure, here’s a cheeky one for a crowd of pros:

Why did the comedian bring a ladder to the bar?

Because they heard the drinks were on the house, but they were really just trying to raise the stakes!


### **Max Tokens**
The `max_tokens` parameter caps the length of the generated response, measured in tokens (word pieces, not words or characters):

- **Lower Value (e.g., `max_tokens=50`)**: Keeps the response short.
- **Higher Value (e.g., `max_tokens=200`)**: Allows for longer, more detailed responses.

In this example, setting `max_tokens=100` caps the response at 100 tokens. The limit is a hard cutoff rather than an instruction to be brief: if the model reaches it, the output is cut off mid-thought and `finish_reason` is `"length"`. To get genuinely concise answers, ask for brevity in the prompt as well.


In [None]:
completion = openai.chat.completions.create(
    model='gpt-4o-mini',
    messages=prompts,
    temperature=0.7,      # Balanced creativity
    max_tokens=100        # Limit the response to a maximum of 100 tokens
)
print(completion.choices[0].message.content)

Sure, here’s a classic with a cheeky twist:

Why did the scarecrow win an award?

Because he was outstanding in his field... and he really knew how to bury the competition under a stack of corny puns! 

But seriously, folks, it’s hard to find good help these days. I mean, my last assistant was so bad, I had to fire him… he kept trying to plant ideas instead of seeds!


### **Frequency Penalty**
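The `frequency_penalty` parameter (from `-2.0` to `2.0` in OpenAI's API) lowers the likelihood of tokens in proportion to how often they have already appeared in the response, which discourages verbatim repetition and makes the response more varied: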

- **Lower Value (e.g., `frequency_penalty=0.0`)**: No penalty, which can lead to repetition if the model tends to be repetitive.
- **Higher Value (e.g., `frequency_penalty=1.0`)**: Encourages more diverse responses by penalizing repeated words.

In this example, setting `frequency_penalty=0.8` discourages the model from repeating itself, which can be useful when generating creative content where redundancy is undesirable.
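As a rough picture of the mechanism, the penalty can be thought of as subtracting `count * frequency_penalty` from a token's score, where `count` is how many times that token has already been generated. The sketch below follows that simplified formula with made-up scores; `apply_frequency_penalty` is a hypothetical name, and the presence penalty is left out.

```python
from collections import Counter

def apply_frequency_penalty(logits, generated_tokens, frequency_penalty):
    """Lower each token's score in proportion to how often it already appeared."""
    counts = Counter(generated_tokens)
    return {
        token: score - counts[token] * frequency_penalty
        for token, score in logits.items()
    }

# Made-up scores for candidate next tokens and a made-up partial output.
logits = {"field": 2.0, "barn": 1.5, "hay": 1.0}
so_far = ["field", "field", "hay"]
penalized = apply_frequency_penalty(logits, so_far, frequency_penalty=0.8)
print(penalized)
```

In this toy case `"field"` starts with the highest score but has already been used twice, so after the penalty the unused token `"barn"` becomes the most likely next choice.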


In [None]:
completion = openai.chat.completions.create(
    model='gpt-4o-mini',
    messages=prompts,
    temperature=0.7,            # Balanced creativity
    frequency_penalty=0.8       # Penalize repeated phrases or words
)
print(completion.choices[0].message.content)

Sure, here’s a bawdy joke for you:

Why did the scarecrow win an award?

Because he was outstanding in his field... but rumor has it, he also had a reputation for getting straw-hatted after dark! 

(Just remember to keep it playful and know your audience!)


### Putting It All Together:
You can combine these parameters to adjust the model's response characteristics based on your requirements:

- **`temperature=0.9`**: Promotes creativity.
- **`top_p=0.9`**: Adds diversity while maintaining focus.
- **`max_tokens=150`**: Caps response length.
- **`frequency_penalty=0.5`**: Encourages avoiding repetition but not overly strict.

These parameters provide a lot of control over how the model responds, allowing you to optimize it for specific use cases, whether it's more deterministic or creative, concise or elaborate.
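When several settings are combined, it can help to collect and sanity-check them in one place before making the call. The sketch below is one possible approach; `build_sampling_params` is a hypothetical helper, and the ranges it checks are the ones documented for OpenAI's Chat Completions API at the time of writing, so treat them as assumptions for other providers.

```python
def build_sampling_params(temperature=1.0, top_p=1.0, max_tokens=None,
                          frequency_penalty=0.0):
    """Validate sampling settings and return them as keyword arguments."""
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if not -2 <= frequency_penalty <= 2:
        raise ValueError("frequency_penalty must be between -2 and 2")
    params = {
        "temperature": temperature,
        "top_p": top_p,
        "frequency_penalty": frequency_penalty,
    }
    if max_tokens is not None:
        if max_tokens < 1:
            raise ValueError("max_tokens must be at least 1")
        params["max_tokens"] = max_tokens
    return params

creative = build_sampling_params(temperature=0.9, top_p=0.9, max_tokens=150,
                                 frequency_penalty=0.5)
print(creative)
# The dictionary can then be unpacked into the request, for example:
# openai.chat.completions.create(model='gpt-4o-mini', messages=prompts, **creative)
```

Catching an out-of-range value locally gives a clearer error than waiting for the API to reject the request.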

In [None]:
completion = openai.chat.completions.create(
    model='gpt-4o-mini',
    messages=prompts,
    temperature=0.9,          # Makes the response creative
    top_p=0.9,                # Allows for diverse responses
    max_tokens=150,           # Limit response length to 150 tokens
    frequency_penalty=0.5     # Avoid repeating words too frequently
)
print(completion.choices[0].message.content)

Sure, here’s a bawdy joke that should resonate with a crowd of professional comedians:

Why did the scarecrow win an award?

Because he was outstanding in his field… and had plenty of straw to wrap around his “crops”! 

(Okay, that’s not as naughty as you might expect—maybe you have to "harvest" the punchline! 😏)


In [None]:
# Claude 3.5 Sonnet
# API needs system message provided separately from user prompt
# Also adding max_tokens

message = claude.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=200,
    temperature=0.7,
    system=system_message,
    messages=[
        {"role": "user", "content": user_prompt},
    ],
)

print(message.content[0].text)

In [None]:
# Claude 3.5 Sonnet again
# Now let's add in streaming back results

result = claude.messages.stream(
    model="claude-3-5-sonnet-20240620",
    max_tokens=200,
    temperature=0.7,
    system=system_message,
    messages=[
        {"role": "user", "content": user_prompt},
    ],
)

with result as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

In [None]:
# The API for Gemini has a slightly different structure

gemini = google.generativeai.GenerativeModel(
    model_name='gemini-1.5-flash',
    system_instruction=system_message
)
response = gemini.generate_content(user_prompt)
print(response.text)

A guy walks into a bar with a tiny piano under his arm. He sits down, places the piano on the counter, and starts playing a beautiful tune. 

The bartender, impressed, asks, "Wow, where did you get that little piano?"

The guy replies, "It's a special piano. It shrinks every time I play it."

The bartender, intrigued, asks, "Really? How does that work?"

The guy leans in and whispers, "I just gotta keep my fingers crossed." 



## Stream, Reply, Display

This code snippet demonstrates how to stream responses from OpenAI's API and update the display dynamically, such as in an interactive notebook like Jupyter. Let’s break it down step by step:

### Overview
- The code streams the response from OpenAI's language model and displays it incrementally in a Jupyter-like environment.
- This approach is particularly useful when generating long responses, as it allows you to start displaying output before the full response has been generated.

### Detailed Breakdown

1. **Streaming the Response**
   ```python
   stream = openai.chat.completions.create(
       model='gpt-4o',
       messages=prompts,
       temperature=0.7,
       stream=True
   )
   ```
   - **`stream=True`**: This tells the API to send the response in chunks, as it is being generated.
   - **`stream`**: This variable represents the iterable response from the OpenAI API. Each item in `stream` contains a part of the model’s response (called a "chunk").
  
2. **Initializing `reply` and Display Handle**
   ```python
   reply = ""
   display_handle = display(Markdown(""), display_id=True)
   ```
   - **`reply = ""`**: Initializes an empty string to store the complete reply as it is constructed chunk by chunk.
   - **`display_handle = display(Markdown(""), display_id=True)`**:
     - **`display()`**: This function is used to output content to a notebook cell. The initial `Markdown("")` creates an empty Markdown output, which is later updated.
     - **`display_id=True`**: Creates a display with an identifier, allowing for updates to the same output instead of creating new outputs each time.
     - **`display_handle`**: Stores a reference to the display so it can be updated with new content as the response streams in.

3. **Streaming the Response and Updating the Display**
   ```python
   for chunk in stream:
       reply += chunk.choices[0].delta.content or ''
       reply = reply.replace("```", "").replace("markdown", "")
       update_display(Markdown(reply), display_id=display_handle.display_id)
   ```
   - **`for chunk in stream:`**: Iterates over each chunk of the streamed response.
   
   - **`reply += chunk.choices[0].delta.content or ''`**:
     - **`chunk`**: Each chunk contains a partial response.
     - **`chunk.choices[0].delta.content`**: This retrieves the content of the chunk. The `delta` object represents a partial update (like an incremental addition to the response).
     - **`or ''`**: If `content` is `None`, an empty string is appended instead. Without this, adding `None` to a string would raise a `TypeError`.
     - **`reply += ...`**: Appends the newly received content to the existing `reply` string.

   - **``reply = reply.replace("```","").replace("markdown","")``**:
     - **``.replace("```", "")``**: Removes triple-backtick fence markers, which delimit code blocks, from the streamed response.
     - **`.replace("markdown", "")`**: Removes every occurrence of the word "markdown", most likely to drop the language tag left behind when the model wraps its answer in a fence labelled `markdown`. Note that it also deletes the word wherever it appears in ordinary prose.
     - These replacements keep the output clean: without them, a reply wrapped in a code fence would be rendered as a literal code block rather than as formatted Markdown.

   - **`update_display(Markdown(reply), display_id=display_handle.display_id)`**:
     - **`Markdown(reply)`**: Converts the `reply` string to a Markdown object so it is properly formatted when displayed.
     - **`update_display()`**: Updates the existing display (referenced by `display_handle.display_id`) with the new content of `reply`. This avoids creating new outputs for every chunk and instead progressively updates the original display.
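
The accumulation step above can be tried offline, without an API key. The sketch below is a minimal simulation: `fake_chunk` and the sample texts are hypothetical stand-ins that only mimic the `chunk.choices[0].delta.content` shape, and the assumption that a chunk (here the last one) may carry `None` is what the `or ''` guard protects against.

```python
from types import SimpleNamespace


def fake_chunk(text):
    """Build a stand-in object shaped like chunk.choices[0].delta.content."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Hypothetical stream: the final chunk carries content=None
stream = [fake_chunk("Hello"), fake_chunk(", "), fake_chunk("world"), fake_chunk(None)]

reply = ""
for chunk in stream:
    # `or ''` turns a None delta into an empty string so += never fails
    reply += chunk.choices[0].delta.content or ''
    # In a notebook, this is where update_display(...) would be called
    print(repr(reply))
```

Each iteration prints the reply as built so far, which is the same string the notebook version re-renders on every chunk.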

### Summary
- The code uses streaming to receive the response in parts and constructs the full response incrementally.
- **`reply`** stores the combined response.
- **`display_handle`** allows updating the output in real-time in a notebook, providing an interactive experience.
- The loop iteratively adds each chunk to the reply and updates the display, ensuring that users see the response being built as it is generated.

Because output appears as soon as the first chunk arrives, the perceived wait is shorter, which matters most for long responses in interactive environments.
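
The `.replace("markdown", "")` step described above removes the word everywhere, including in ordinary sentences. A more targeted alternative, sketched here as an assumption rather than as the notebook's method, strips only fence markers and a language tag attached directly to them. The names `FENCE`, `FENCE_PATTERN`, and `strip_fences` are hypothetical.

```python
import re

# The fence marker is built by repetition so this example never contains a literal one
FENCE = "`" * 3

# A fence marker plus an optional attached language tag (e.g. "markdown") and newline
FENCE_PATTERN = re.compile(re.escape(FENCE) + r"[A-Za-z]*\n?")


def strip_fences(text):
    """Remove fence markers and their language tags; leave all other text alone."""
    return FENCE_PATTERN.sub("", text)


# Hypothetical model reply wrapped in a fence labelled "markdown"
sample = FENCE + "markdown\n# Title\nThis markdown guide is short.\n" + FENCE

print(strip_fences(sample))
```

When streaming, it is probably safer to keep `reply` unmodified and pass `strip_fences(reply)` to the display call, since a language tag split across two chunks would otherwise be only partly removed.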

In [None]:
# To be serious! GPT-4o with the original question

prompts = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "How do I decide if a business problem is suitable for an LLM solution?"}
]

In [None]:
# Have it stream back results in markdown

stream = openai.chat.completions.create(
    model='gpt-4o',
    messages=prompts,
    temperature=0.7,
    stream=True
)

reply = ""
display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
    reply += chunk.choices[0].delta.content or ''
    reply = reply.replace("```","").replace("markdown","")
    update_display(Markdown(reply), display_id=display_handle.display_id)

Determining whether a business problem is suitable for a language model solution, such as a large language model (LLM) like GPT-3 or similar, involves evaluating several key factors. Here are some steps and considerations to help you make that decision:

1. **Nature of the Problem:**
   - **Textual Data:** LLMs are particularly effective for problems involving natural language processing (NLP), such as text generation, summarization, translation, sentiment analysis, entity recognition, and question-answering.
   - **Creativity and Variability:** Problems that require creativity, generating varied outputs, or handling ambiguous instructions are well-suited for LLMs.

2. **Complexity of Language Understanding:**
   - **Contextual Understanding:** If the problem requires understanding context, nuances, or conversational dynamics, an LLM might be appropriate.
   - **Complex Language Tasks:** Tasks that involve complex language understanding, such as drafting detailed reports, generating conversational agents, or understanding nuanced customer feedback, can benefit from an LLM.

3. **Scalability:**
   - **Volume of Text:** If the problem involves processing large volumes of text efficiently, LLMs can be useful due to their ability to handle vast datasets.

4. **Data Availability:**
   - **Quality and Quantity of Data:** Ensure you have access to sufficient high-quality text data for training or fine-tuning if needed. Pre-trained LLMs can handle a wide range of tasks, but specific applications may require additional data.

5. **Business Value:**
   - **Impact on Business Goals:** Evaluate whether the LLM solution aligns with your business goals and can deliver tangible value, such as improving customer satisfaction, increasing efficiency, or reducing costs.
   - **Cost-Benefit Analysis:** Consider the cost of implementing an LLM solution versus the expected benefits.

6. **Ethical and Compliance Considerations:**
   - **Bias and Fairness:** Assess the potential for bias in outputs and ensure the LLM solution complies with ethical standards and regulations.
   - **Data Privacy:** Ensure that using LLMs adheres to data privacy laws and that sensitive data is protected.

7. **Technical Feasibility:**
   - **Integration with Existing Systems:** Determine if the LLM solution can be integrated into your current technology stack.
   - **Resource Requirements:** Consider the computational resources required to deploy and maintain an LLM solution.

8. **Alternatives and Complementary Solutions:**
   - **Explore Alternatives:** Assess whether simpler models or rule-based systems could solve the problem effectively.
   - **Hybrid Approaches:** Sometimes, combining LLMs with other techniques (e.g., structured data analysis, machine learning models) provides a more robust solution.

9. **Expertise and Support:**
   - **Availability of Expertise:** Ensure you have access to the necessary expertise to implement and maintain the LLM solution.
   - **Vendor Support:** Consider the level of support and documentation available from the LLM provider.

By considering these factors, you can determine if an LLM is a suitable solution for your business problem. If the problem meets most of these criteria, it is likely a good candidate for an LLM-based approach.