# End of week 1 exercise

To demonstrate your familiarity with the OpenAI API and with Ollama, build a tool that takes a technical question
and responds with an explanation. You will be able to use this tool yourself during the course!

In [3]:
# imports
import os
import requests
from dotenv import load_dotenv
from IPython.display import Markdown, display, update_display
from openai import OpenAI


In [4]:
# constants
MODEL_GPT = 'gpt-4o-mini'
MODEL_LLAMA = 'llama3.2'
OLLAMA_BASE_URL = "http://localhost:11434/v1"


In [5]:
# set up environment
load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')

if not openai_api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif not openai_api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")

# set up OpenAI client
gpt = OpenAI()

# set up Ollama client
ollama = OpenAI(base_url=OLLAMA_BASE_URL, api_key='ollama')

API key found and looks good so far!


In [6]:
system_prompt = """
    You are a helpful technical assistant that will respond with a clear, concise, and easy to understand explanation to the user's technical question.
    Respond in markdown. Do not wrap the markdown in a code block - respond just with the markdown.
"""

In [7]:
# here is the question; type over this to ask something new

question = """
Please explain what this code does and why:

stream = gpt.chat.completions.create(
        model=MODEL_GPT,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question}
        ],
        stream=True
    )
response = ""
display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
    response += chunk.choices[0].delta.content or ''
    update_display(Markdown(response), display_id=display_handle.display_id)
    
"""


In [8]:
# Get gpt-4o-mini to answer, with streaming
stream = gpt.chat.completions.create(
    model=MODEL_GPT,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question}
    ],
    stream=True
)
response = ""
display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
    response += chunk.choices[0].delta.content or ''
    update_display(Markdown(response), display_id=display_handle.display_id)


This code snippet is using the OpenAI GPT API to create a chat completion in a streaming manner. Here's a detailed breakdown of what each part does and why:

### Code Breakdown

1. **Creating a Chat Completion Stream**:
   ```python
   stream = gpt.chat.completions.create(
       model=MODEL_GPT,
       messages=[
           {"role": "system", "content": system_prompt},
           {"role": "user", "content": question}
       ],
       stream=True
   )
   ```
   - `gpt.chat.completions.create(...)`: This function is called to initiate a chat completion request.
   - `model=MODEL_GPT`: Specifies the GPT model you want to use (e.g., "gpt-3.5-turbo").
   - `messages=[...]`: This is a list of messages that establish the context for the chat. The messages include:
     - A "system" message that sets the behavior of the assistant using `system_prompt`.
     - A "user" message containing the `question` that the user is asking.
   - `stream=True`: This flag enables streaming, allowing the response to be received in chunks as they are generated rather than waiting for the entire response.

2. **Initializing the Response and Display**:
   ```python
   response = ""
   display_handle = display(Markdown(""), display_id=True)
   ```
   - `response = ""`: Initializes an empty string to accumulate the response chunks received from the API.
   - `display_handle = display(Markdown(""), display_id=True)`: Sets up a display handle for real-time updating of the response in a Markdown format. The `display_id=True` ensures that the output can be updated in place without having to clear the entire output area.

3. **Processing the Streamed Response**:
   ```python
   for chunk in stream:
       response += chunk.choices[0].delta.content or ''
       update_display(Markdown(response), display_id=display_handle.display_id)
   ```
   - `for chunk in stream`: This loop iterates over each chunk of data received from the streaming response.
   - `response += chunk.choices[0].delta.content or ''`: Here, the code appends the new content from the current chunk to the `response` string. The `delta.content` contains the newly generated text by the model.
   - `update_display(Markdown(response), display_id=display_handle.display_id)`: This updates the displayed Markdown content in real-time with the accumulated response. The use of `display_id` allows the output to be updated without creating new display outputs.

### Why This Code is Useful
- **Real-time Interaction**: Streaming allows users to see the response develop gradually, which can enhance the interactive experience.
- **Dynamic Updates**: By updating the display in real-time, the user receives immediate feedback, making it feel more conversational and responsive.
- **Structured Responses**: The use of structured messages (system and user roles) helps in maintaining context for the conversation, improving the quality of the responses generated by the model.

In summary, this code facilitates real-time interaction with a GPT model in a chat-like format, making it suitable for applications such as chatbots or interactive Q&A systems.
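The streaming loop in the cell above can be factored into a small generator, which makes the accumulation logic testable without calling the API. The following is a minimal sketch, not part of the original notebook: `accumulate_stream` and `fake_chunk` are hypothetical names, and the fake chunks only imitate the `choices[0].delta.content` shape that the loop reads. The guard for chunks with an empty `choices` list is an assumption about configurations in which such chunks can appear.

```python
from types import SimpleNamespace


def accumulate_stream(chunks):
    """Yield the accumulated response text after each streamed chunk."""
    text = ""
    for chunk in chunks:
        # Assumption: some chunks may arrive with an empty `choices` list; skip them.
        if not chunk.choices:
            continue
        # `delta.content` can be None (for example on the final chunk), hence `or ""`.
        text += chunk.choices[0].delta.content or ""
        yield text


def fake_chunk(content):
    """Hypothetical stand-in for a streamed chunk, used only for offline testing."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [fake_chunk("Hello"), fake_chunk(", world"), fake_chunk(None)]
snapshots = list(accumulate_stream(fake_stream))
```

In the notebook, each yielded snapshot could be passed to `update_display(Markdown(snapshot), display_id=display_handle.display_id)` in place of the inline `response +=` line.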

In [9]:
# Get Llama 3.2 to answer
# First check that the Ollama server is running
requests.get("http://localhost:11434").content

b'Ollama is running'
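If the Ollama server is not running, the request above raises a connection error instead of returning a status. A minimal sketch of a more forgiving check follows. It assumes the server replies with the text `Ollama is running`, as in the output above. The function name `ollama_is_running`, its `opener` parameter, and the two fake openers are hypothetical additions that let the logic be exercised without a live server. The sketch uses the standard library's `urllib` rather than `requests` so that it has no third-party dependency.

```python
import io
import urllib.error
import urllib.request


def ollama_is_running(url="http://localhost:11434", timeout=2.0,
                      opener=urllib.request.urlopen):
    """Return True if the server at `url` answers with Ollama's status text."""
    try:
        with opener(url, timeout=timeout) as reply:
            return b"Ollama is running" in reply.read()
    except OSError:
        # urllib.error.URLError subclasses OSError, so refused connections
        # and timeouts both end up here.
        return False


def _fake_up(url, timeout):
    """Hypothetical opener imitating a healthy server."""
    return io.BytesIO(b"Ollama is running")


def _fake_down(url, timeout):
    """Hypothetical opener imitating a server that is not listening."""
    raise urllib.error.URLError("connection refused")


up = ollama_is_running(opener=_fake_up)
down = ollama_is_running(opener=_fake_down)
```

Calling `ollama_is_running()` with no arguments performs a real request against the default local address.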

In [10]:
# Pull down Llama 3.2
!ollama pull llama3.2

[?2026h[?25l[1Gpulling manifest ⠋ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠙ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠹ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠸ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠼ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠴ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest [K
pulling dde5aa3fc5ff: 100% ▕██████████████████▏ 2.0 GB                         [K
pulling 966de95ca8a6: 100% ▕██████████████████▏ 1.4 KB                         [K
pulling fcc5a6bec9da: 100% ▕██████████████████▏ 7.7 KB                         [K
pulling a70ff7e570d9: 100% ▕██████████████████▏ 6.0 KB                         [K
pulling 56bb8bd477a5: 100% ▕██████████████████▏   96 B                         [K
pulling 34bb5ab01051: 100% ▕██████████████████▏  561 B                         [K
verifying sha256 digest [K
writing manifest [K
success [K[?25h[?2026l


In [11]:
# Ask Llama 3.2 the same question (no streaming; the full reply arrives at once)
response = ollama.chat.completions.create(
    model=MODEL_LLAMA,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question}
    ]
)
display(Markdown(response.choices[0].message.content))

**GPT-3 Chat Completion Code Explanation**

This code is used to initiate a chat completion session with the GPT-3 AI model through the `gpt.chat.completions` API. Here's a step-by-step breakdown of what it does:

1. **Initialization**
   - It sets up an HTTP connection to create a chat completion session. The first argument, `model=MODEL_GPT`, specifies that the conversation should use the GPT model.

2. **Preparing Chat Prompts**
   - It constructs two messages: one as "system" and another as "user". These messages are essentially predefined chat prompts.
     - In this example, the "system" message is a blank line (`Markdown("")`), and the "user" message is an unknown question (`question`). This suggests that this code may be part of a dynamic or random question generation process.
   - The format `"role": "system", "content"` and `"role": "user", "content"` indicates the AI's role in the conversation. 

3. **Starting the Conversation**
   - It sets up an HTTP connection to display data sent by the API, which is displayed below the chat window.

4. **Handling Response Chunks**
   - The loop iterates over chunks of response from the server:
     - Each chunk includes several pieces of information such as all responses ("all`).choices"). In this example, it only uses the first (most likely) one.
    - The chosen answer (`chunk.choices[0].delta.content`) is added to a string that stores the final chat response.

5. **Updating Chat Window Display**
   - Every iteration of the loop updates the display with the new responses by creating an updated Markdown instance that includes the newest addition and then displaying it using `update_display`.

This code effectively initiates a GPT-3 chat conversation, sending a predefined question at startup, waiting for the AI's response, and printing that input after which we manually type it down.
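Because both clients in this notebook are `OpenAI` instances, the non-streaming call can be wrapped in one function that works with either of them. This is a sketch under that assumption. `build_messages` and `ask` are hypothetical helper names, and the fake client below only mimics the `chat.completions.create(...)` call shape so the example runs offline.

```python
from types import SimpleNamespace


def build_messages(system_prompt, question):
    """Build the two-message list used throughout this notebook."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


def ask(client, model, system_prompt, question):
    """Send one question to an OpenAI-compatible client and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=build_messages(system_prompt, question),
    )
    return response.choices[0].message.content


def _fake_create(model, messages):
    """Hypothetical stand-in for the API call: echoes the model and the question."""
    reply = f"[{model}] {messages[-1]['content']}"
    message = SimpleNamespace(content=reply)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)
answer = ask(fake_client, "llama3.2", "Be concise.", "What is a generator?")
```

In the notebook itself, the equivalent call would be `display(Markdown(ask(ollama, MODEL_LLAMA, system_prompt, question)))`, or the same with `gpt` and `MODEL_GPT`.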