In [None]:
from openai import OpenAI
import dotenv
dotenv.load_dotenv()
import os

# Named `client` because every later cell calls client.chat / client.responses.
client = OpenAI(base_url=os.environ['BASE_URL_OLLAMA'],
                api_key=os.environ['API_KEY_OLLAMA'])

In [2]:
import numpy as np

# Legacy (Chat Completions API)

```python
response = client.chat.completions.create(
    model='gemma3:12b',
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant specializing in Python programming.'},
        {'role': 'user', 'content': 'Explain async/await'}
    ],
    temperature=0.3,
    max_tokens=2000
)

print(response.choices[0].message.content)
```

-------------------------------------------------------------------------------------------------------------------------------

Okay, let's break down `async/await` in Python. It's a powerful feature for writing concurrent code, but it can be a bit confusing at first. I'll explain it conceptually, then with code examples, and finally address common pitfalls.

**1. The Problem: Blocking Operations**

Imagine you're writing a program that needs to fetch data from a website.  A traditional, synchronous (sequential) approach would look like this:

```python
import time

def fetch_data(url):
  """Simulates fetching data from a URL (takes some time)."""
  print(f"Fetching data from {url}...")
  time.sleep(2)  # Simulate network delay
  print(f"Data fetched from {url}")
  return f"Data from {url}"

def process_data(data):
  """Processes the fetched data."""
  print(f"Processing: {data}")
  time.sleep(1)
  print("Data processed.")

def main():
  data1 = fetch_data("https://example.com/api/data1")
  process_data(data1)
  data2 = fetch_data("https://example.com/api/data2")
  process_data(data2)

main()
```

The problem is that `fetch_data` takes 2 seconds. During those 2 seconds, your program is *blocked*. It can't do anything else.  If you have multiple data fetches, the total execution time is the sum of each fetch time. This is inefficient.

**2. Enter `async/await` - Cooperative Multitasking**

`async/await` provides a way to write code that *appears* sequential (easy to read) but can actually run concurrently.  It's a form of *cooperative multitasking*.  Here's the key idea:

* **`async`:**  Declares a function as a *coroutine*. A coroutine is a special kind of function that can be paused and resumed. Think of it as a function that can yield control back to the "event loop" (more on that later).
* **`await`:**  Used *inside* an `async` function. It pauses the execution of the coroutine until an *awaitable* object (usually another coroutine or a `Future`) completes.  Crucially, while the coroutine is paused, the event loop can switch to running other coroutines.

**3.  A Simple Example with `async/await`**

```python
import asyncio
import time

async def fetch_data(url):
  """Simulates fetching data from a URL (takes some time)."""
  print(f"Fetching data from {url}...")
  await asyncio.sleep(2)  # Simulate network delay.  Crucially, this *yields* control.
  print(f"Data fetched from {url}")
  return f"Data from {url}"

async def process_data(data):
  """Processes the fetched data."""
  print(f"Processing: {data}")
  await asyncio.sleep(1)
  print("Data processed.")

async def main():
  """Runs the asynchronous tasks."""
  task1 = asyncio.create_task(fetch_data("https://example.com/api/data1"))
  task2 = asyncio.create_task(fetch_data("https://example.com/api/data2"))

  data1 = await task1
  data2 = await task2

  await process_data(data1)
  await process_data(data2)

# Run the event loop
asyncio.run(main())
```

**Explanation:**

* **`async def fetch_data(...)`:**  Defines `fetch_data` as a coroutine.
* **`await asyncio.sleep(2)`:**  This is the magic.  `asyncio.sleep(2)` is an awaitable object (a coroutine that represents a delay).  When `await` is encountered, `fetch_data` pauses, *yielding control back to the event loop*.  The event loop can then start running other coroutines (like `fetch_data` for the second URL).  After 2 seconds, `fetch_data` resumes from where it left off.
* **`asyncio.create_task(...)`:**  This creates a `Task` object.  A `Task` is a wrapper around a coroutine that allows it to be scheduled and run concurrently.  We create two tasks, one for each `fetch_data` call.
* **`await task1` and `await task2`:**  We wait for the tasks to complete.  This pauses `main` until the corresponding coroutine finishes.
* **`asyncio.run(main())`:**  This starts the asyncio event loop and runs the `main` coroutine.  The event loop is the engine that manages the execution of all the asynchronous tasks.

**Key Benefits:**

* **Concurrency:**  The two `fetch_data` calls now run *concurrently*.  The total execution time is closer to 2 seconds (the longest fetch time) rather than 4 seconds (the sum of the fetch times).
* **Readability:** The code still looks sequential, making it easier to understand than complex threading or multiprocessing code.
* **Efficiency:**  The program doesn't waste time waiting idly; it can switch between tasks while waiting for I/O operations (like network requests) to complete.

**4.  Important Concepts**

* **Event Loop:** The heart of `asyncio`. It's a single-threaded loop that monitors and manages all the asynchronous tasks. It decides which task to run next.
* **Awaitable Objects:**  Objects that can be used with `await`.  These are typically coroutines (defined with `async def`) or `Future` objects.
* **`asyncio.sleep()`:**  A simple awaitable object that pauses execution for a specified number of seconds.  It's often used to simulate I/O operations.
* **`asyncio.create_task()`:**  Creates a `Task` object from a coroutine, allowing it to be scheduled for concurrent execution.

**5. Common Pitfalls and Considerations**

* **Blocking Code:**  Avoid performing blocking operations (like CPU-bound calculations or synchronous I/O) directly within an `async` function.  These will block the entire event loop, negating the benefits of `async/await`.  Use `asyncio.to_thread()` to run blocking code in a separate thread.
* **`await` Only Inside `async` Functions:** You can only use `await` inside functions defined with `async def`.
* **Not a Silver Bullet:** `async/await` is primarily beneficial for I/O-bound operations.  For CPU-bound tasks, multiprocessing is usually a better choice.
* **Context Switching Overhead:** While concurrency is achieved, there's some overhead associated with context switching between coroutines.  Too many very short coroutines can lead to performance degradation.
* **Debugging:** Debugging asynchronous code can be tricky. Use logging and debugging tools carefully.

**6.  Example with `asyncio.to_thread()`**

Let's say you have a CPU-bound function that you need to run within an `async` function:

```python
import asyncio
import time

def cpu_bound_function(n):
  """Simulates a CPU-intensive task."""
  result = 0
  for i in range(n):
    result += i * i
  return result

async def main():
  start_time = time.time()
  result = await asyncio.to_thread(cpu_bound_function, 10000000) # Run in a separate thread
  end_time = time.time()
  print(f"Result: {result}")
  print(f"Time taken: {end_time - start_time:.2f} seconds")

asyncio.run(main())
```

`asyncio.to_thread()` runs the `cpu_bound_function` in a separate thread, preventing it from blocking the event loop.

**In Summary:**

`async/await` is a powerful tool for writing concurrent, I/O-bound Python code. It allows you to write code that looks sequential but can run efficiently, maximizing the use of your resources.  Understanding the event loop, awaitable objects, and the importance of avoiding blocking code is crucial for mastering `async/await`.



Do you want me to elaborate on any of these points, provide more examples, or discuss a specific use case?

# Responses API

# sample response body

## Example
```json
{
  "id": "resp_277036",
  "created_at": 1768784470.0,
  "error": null,
  "incomplete_details": null,
  "instructions": null,
  "metadata": null,
  "model": "gemma3:12b",
  "object": "response",
  "output": [
    {
      "id": "msg_759917",
      "content": [
        {
          "annotations": null,
          "text": "[\"Explain the concept of decorators in Python.\",\"Decorators are a powerful feature allowing you to modify or enhance functions without actually changing their code.\"]",
          "type": "output_text",
          "logprobs": null
        }
      ],
      "role": "assistant",
      "status": "completed",
      "type": "message"
    }
  ],
  "parallel_tool_calls": null,
  "temperature": null,
  "tool_choice": null,
  "tools": null,
  "top_p": null,
  "background": null,
  "max_output_tokens": null,
  "max_tool_calls": null,
  "previous_response_id": null,
  "prompt": null,
  "prompt_cache_key": null,
  "reasoning": null,
  "safety_identifier": null,
  "service_tier": null,
  "status": "completed",
  "text": null,
  "top_logprobs": null,
  "truncation": null,
  "usage": {
    "input_tokens": 89,
    "input_tokens_details": null,
    "output_tokens": 32,
    "output_tokens_details": null,
    "total_tokens": 121
  },
  "user": null
}
```

## Parameter Definitions

| Parameter | Description |
|-----------|-------------|
| **id** | Unique identifier for this specific API response object. |
| **created_at** | Unix timestamp indicating when the response was generated. |
| **error** | Contains error information if the request failed, otherwise null. |
| **incomplete_details** | Provides details if the response is incomplete or truncated. |
| **instructions** | The system instructions provided to the model for this request. |
| **metadata** | Additional metadata associated with the API request/response. |
| **model** | The specific AI model used to generate the response. |
| **object** | The type of object returned, typically "response". |
| **output** | Array containing the main content generated by the model. |
| **parallel_tool_calls** | Indicates whether parallel tool calls were enabled. |
| **temperature** | Sampling temperature parameter used for generation. |
| **tool_choice** | Configuration for how tools should be used in the response. |
| **tools** | Array of tools available to the model for this request. |
| **top_p** | Nucleus sampling parameter used for response generation. |
| **background** | Indicates if the request was processed in the background. |
| **max_output_tokens** | Maximum number of tokens allowed in the output. |
| **max_tool_calls** | Maximum number of tool calls permitted. |
| **previous_response_id** | Links to a previous response in a conversation chain. |
| **prompt** | Reference to a reusable prompt template and its variables, if one was used. |
| **prompt_cache_key** | Key used for caching and reusing prompt processing. |
| **reasoning** | Configuration options for reasoning models (for example, reasoning effort), if set. |
| **safety_identifier** | Stable identifier for the end user, used to help detect usage-policy violations. |
| **service_tier** | Indicates the service level used for this request. |
| **status** | Current status of the response (e.g., "completed", "in_progress", "failed", "incomplete"). |
| **text** | Configuration options for the text output, such as plain text or structured JSON format. |
| **top_logprobs** | Number of most likely tokens to return at each token position, each with its log probability. |
| **truncation** | Truncation strategy applied when the input exceeds the model's context window (e.g., "auto" or "disabled"). |
| **usage** | Object containing token usage statistics for the request. |
| **user** | User identifier associated with the API request. |

## Object Parameters
| Parameter | Description |
|-----------|-------------|
| **output[].id** | Unique identifier for individual message in the output. |
| **output[].content** | Array containing the actual content blocks of the response. |
| **output[].content[].annotations** | Additional metadata or annotations for the content. |
| **output[].content[].text** | The actual text content generated by the model. |
| **output[].content[].type** | Type of content (e.g., "output_text"). |
| **output[].content[].logprobs** | Token-level probabilities if detailed logging was requested. |
| **output[].role** | Role of the message sender (e.g., "assistant"). |
| **output[].status** | Status of the individual message component. |
| **output[].type** | Type of the message object. |

| Parameter | Description |
|-----------|-------------|
| **usage.input_tokens** | Number of tokens consumed by the input/prompt. |
| **usage.input_tokens_details** | Detailed breakdown of input token usage. |
| **usage.output_tokens** | Number of tokens generated in the output/response. |
| **usage.output_tokens_details** | Detailed breakdown of output token usage. |
| **usage.total_tokens** | Total tokens used (input + output). |
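
The field paths in the tables above can be followed directly on a parsed response body. The following is a minimal sketch, assuming the body has already been loaded into a Python dict (for example with `json.loads`). The helper names `extract_text` and `usage_is_consistent` are hypothetical and not part of the OpenAI SDK, and the `sample` dict is a trimmed-down body with placeholder text.

```python
def extract_text(body: dict) -> str:
    """Concatenate every output_text block found under output[].content[]."""
    parts = []
    for item in body.get("output") or []:
        if item.get("type") != "message":
            continue  # skip output items that are not messages
        for block in item.get("content") or []:
            if block.get("type") == "output_text":
                parts.append(block.get("text") or "")
    return "".join(parts)


def usage_is_consistent(body: dict) -> bool:
    """Check that usage.total_tokens equals input_tokens + output_tokens."""
    usage = body.get("usage") or {}
    expected = usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
    return usage.get("total_tokens") == expected


# Trimmed-down body with placeholder text (illustration only).
sample = {
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "Hello"}],
        }
    ],
    "usage": {"input_tokens": 89, "output_tokens": 32, "total_tokens": 121},
}

print(extract_text(sample))
print(usage_is_consistent(sample))
```

When working with the SDK object rather than raw JSON, `response.output_text` (used in the cells below) gives the same concatenated text.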

# code snippets

In [None]:
# Read the question first so it can be interpolated into the instructions.
# Stored as `question` to avoid shadowing the built-in input().
question = input("Enter your programming question: ")

instructions = f"""
You are a helpful assistant specializing in Python programming. You should provide:
1. Short explanation of the main concepts, no more than 3 sentences.
2. Code snippets demonstrating usage.
3. Common edge cases or pitfalls to watch out for.
4. Recommended best practices.

You should organize all information in a structured table with clear labels and cite sources for each section.

Answer the following question: ```{question}```
"""

response = client.responses.create(
    model="gemma3:12b",
    instructions=instructions,
    input=question,
    temperature=0.3
)

# print(response.output_text)  # output saved as 2026_01_17_openai_api_v1_prompt_output.md

In [3]:
# testing different inputs
# get just plain output

instructions = """You are a tenured coding professor at the world's top university. 
You excel at explaining complex programming concepts in simple terms. 
You are in a rapid fire tournament and must respond to each question in only one sentence. Store each question and response as a pair in a list like: [Question, Response]
"""

input = ["Explain the concept of decorators in Python.",
         "How does garbage collection work in Python?"]

for user in input:
    response = client.responses.create(
        model="gemma3:12b",
        instructions=instructions,
        input=user,
        temperature=1
    )
    print(f"{response}\n")

Response(id='resp_361545', created_at=1768786947.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='gemma3:12b', object='response', output=[ResponseOutputMessage(id='msg_859605', content=[ResponseOutputText(annotations=None, text='[“Explain the concept of decorators in Python.”, "Decorators are a way to modify or enhance functions without permanently changing their code, essentially wrapping them with additional functionality."]', type='output_text', logprobs=None)], role='assistant', status='completed', type='message')], parallel_tool_calls=None, temperature=None, tool_choice=None, tools=None, top_p=None, background=None, max_output_tokens=None, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier=None, status='completed', text=None, top_logprobs=None, truncation=None, usage=ResponseUsage(input_tokens=89, input_tokens_details=None, output_tokens=36, output_tokens_details=N

In [4]:
# testing temperature ranges
qa_list = []
for user in input:
    for i in np.arange(0, 2.25, 0.5):
        response = client.responses.create(
            model="gemma3:12b",
            instructions=instructions,
            input=user,
            temperature=i
        )
        response_list = response.output_text.strip('[]') + f',"{i}"'
        qa_list.append(response_list)

In [5]:
str(qa_list)

'[\'“Explain the concept of decorators in Python.”, "Decorators are a way to modify or enhance functions without permanently changing their code, essentially wrapping them with additional functionality.","0.0"\', \'“Explain the concept of decorators in Python.”, "Decorators are a way to modify or enhance functions without permanently changing their code, essentially wrapping them with additional functionality.","0.5"\', \'“Explain the concept of decorators in Python.”, "Decorators are a way to add functionality to existing functions without modifying their original code, essentially wrapping them in another function.","1.0"\', \'“Explain the concept of decorators in Python.”, "Decorators are a design pattern allowing you to add functionality to existing functions or classes without modifying their core structure, essentially wrapping them with enhanced behavior.","1.5"\', \'“Explain the concept of decorators in Python.”, "Decorators are a way to modify the behavior of functions or meth
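
The `strip('[]')` approach above keeps each result as one flat string, and the captured output shows the model sometimes emitting typographic quotes (`“ ”`), which are not valid JSON delimiters. The following is a hedged sketch of a more structured alternative, assuming the model returns a two-item `[question, answer]` list as requested. `parse_pair` is a hypothetical helper, and blindly replacing curly quotes is itself an assumption that breaks if the answer contains quotes of its own.

```python
import json

# Map typographic double quotes to plain ASCII quotes before parsing.
_QUOTE_MAP = str.maketrans({"\u201c": '"', "\u201d": '"'})


def parse_pair(text: str, temperature: float) -> dict:
    """Parse a '[question, answer]' string into a record; keep raw text on failure."""
    try:
        parsed = json.loads(text.translate(_QUOTE_MAP))
    except json.JSONDecodeError:
        parsed = None

    if isinstance(parsed, list) and len(parsed) == 2:
        question, answer = parsed
    else:
        # Unexpected shape: preserve the raw model output instead of guessing.
        question, answer = None, text

    return {"question": question, "answer": answer, "temperature": float(temperature)}


# Shortened example in the same shape as the captured output.
raw = '[\u201cExplain the concept of decorators in Python.\u201d, "Decorators wrap functions."]'
record = parse_pair(raw, 0.5)
print(record["question"])
print(record["answer"])
```

Converting the temperature with `float()` also turns NumPy scalars from `np.arange` into plain Python floats, which serialize cleanly later.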

In [6]:
# testing LLM to LLM

secondary_instructions = """
You are a teaching assistant helping students understand programming concepts. 
You receive a list of questions and answers, and your job is to organize them into a format that is easily understood. 
The list you receive will contain sub-lists as strings, and the strings in each sublist will be: [Question, Answer, Temperature].
Each list is delimited by a backslash.
"""

response = client.responses.create(
    model="gemma3:12b",
    instructions=secondary_instructions,
    input=str(qa_list),
    temperature=1
)

print(response.output_text)

Okay, here's the organized breakdown of the questions and answers you provided, grouped by question and showing the different explanations with their associated temperatures. This should help students see the nuances in how the concept can be explained.

**1. Question: “Explain the concept of decorators in Python.”**

*   **Temperature 0.0:** "Decorators are a way to modify or enhance functions without permanently changing their code, essentially wrapping them with additional functionality."
*   **Temperature 0.5:** "Decorators are a way to modify or enhance functions without permanently changing their code, essentially wrapping them with additional functionality." (Duplicate - this is good to notice and might indicate a data quality issue)
*   **Temperature 1.0:** "Decorators are a way to add functionality to existing functions without modifying their original code, essentially wrapping them in another function."
*   **Temperature 1.5:** "Decorators are a design pattern allowing you t
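
Passing `str(qa_list)` hands the second model a Python `repr`, so the backslashes it is told to treat as delimiters are really escape characters. One possible alternative is to send grouped JSON instead. This is only a sketch, assuming each result is stored as a dict with `question`, `answer`, and `temperature` keys (a hypothetical record shape, not what the cells above produce); `build_payload` is likewise a hypothetical helper.

```python
import json
from collections import defaultdict


def build_payload(records: list[dict]) -> str:
    """Group answers by question and serialize the result as a JSON string."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["question"]].append(
            {"temperature": rec["temperature"], "answer": rec["answer"]}
        )
    # ensure_ascii=False keeps any non-ASCII characters from the model readable.
    return json.dumps(grouped, indent=2, ensure_ascii=False)


# Placeholder records for illustration only.
records = [
    {"question": "Q1", "answer": "A at low temperature", "temperature": 0.0},
    {"question": "Q1", "answer": "A at high temperature", "temperature": 1.5},
    {"question": "Q2", "answer": "Another answer", "temperature": 0.0},
]
payload = build_payload(records)
print(payload)
```

The resulting string could be passed as `input=payload`, with the instructions describing a JSON object keyed by question rather than backslash-delimited lists.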

## Streaming output

In [4]:
stream = client.responses.create(
    model="gemma3:12b",
    instructions="You are a helpful assistant specializing in Python programming.",
    input="Explain async/await",
    temperature=0.3,
    stream=True,
)

for event in stream:
    print(event)

ResponseCreatedEvent(response=Response(id='resp_940147', created_at=None, error=None, incomplete_details=None, instructions=None, metadata=None, model=None, object='response', output=[], parallel_tool_calls=None, temperature=None, tool_choice=None, tools=None, top_p=None, background=None, max_output_tokens=None, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier=None, status='in_progress', text=None, top_logprobs=None, truncation=None, usage=None, user=None), sequence_number=0, type='response.created')
ResponseInProgressEvent(response=Response(id='resp_940147', created_at=None, error=None, incomplete_details=None, instructions=None, metadata=None, model=None, object='response', output=[], parallel_tool_calls=None, temperature=None, tool_choice=None, tools=None, top_p=None, background=None, max_output_tokens=None, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reaso
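
Printing every event, as above, is useful for inspecting the stream but buries the generated text. The following is a minimal sketch of printing only the text as it arrives, assuming the server emits `response.output_text.delta` events as the OpenAI Responses API does; whether a given Ollama version emits them is an assumption to verify. `collect_text` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the example runs without a server.

```python
from types import SimpleNamespace


def collect_text(events) -> str:
    """Print text deltas as they arrive and return the accumulated text."""
    chunks = []
    for event in events:
        # Only delta events carry generated text; all other events are skipped.
        if getattr(event, "type", None) == "response.output_text.delta":
            print(event.delta, end="", flush=True)
            chunks.append(event.delta)
    print()  # finish the line once the stream ends
    return "".join(chunks)


# Stand-in events for illustration; with a live server, pass `stream` instead.
fake_stream = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="async "),
    SimpleNamespace(type="response.output_text.delta", delta="def"),
    SimpleNamespace(type="response.completed"),
]
full_text = collect_text(fake_stream)
```

With a real connection, the call would be `collect_text(stream)` using the `stream` object created in the cell above.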