# End of week 1 exercise

To demonstrate your familiarity with the OpenAI API and with Ollama, build a tool that takes a technical question
and responds with an explanation. This is a tool that you will be able to use yourself during the course!

In [18]:
# imports
import os
import requests
from typing import List
from dotenv import load_dotenv
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [1]:
# constants

MODEL_GPT = 'gpt-4o-mini'
MODEL_LLAMA = 'llama3.2'

In [3]:
# set up environment
# Load environment variables in a file called .env

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Check the key

if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook")
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")

API key found and looks good so far!


In [4]:
openai = OpenAI()

In [5]:
# here is the question; type over this to ask something new

question = """
Please explain what this code does and why:
yield from {book.get("author") for book in books if book.get("author")}
"""

In [7]:
system_prompt = "You are a very patient and detail-oriented CS teacher who answers questions from an intermediate Python ML scientist about code she does not understand. \
Give the answer in Markdown."


In [21]:
# Get gpt-4o-mini to answer, with streaming

def get_user_prompt(question_asked):
    # Note: the worked example below refers to the sample `question` defined above, not to question_asked
    user_prompt = f"Explain the code in this question to me: {question_asked} "
    user_prompt += f"For example, if the question asked was this code: \"{question}\", \
then explain what \"yield from\" does, and specifically what \"yield\" and \"get\" do. What can you tell about the data type \
of \"books\" from this line of code? And why are curly braces being used? What would be other alternatives? What if variable sizes were very large?"
    return user_prompt

def stream_gpt_answer(user_question):
    stream = openai.chat.completions.create(
        model=MODEL_GPT,
        messages=[
           {"role":"system", "content":system_prompt},
           {"role":"user", "content":get_user_prompt(user_question)}
        ],
        stream=True
    )

    response=""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response+=chunk.choices[0].delta.content or ''
        response = response.replace("```","").replace("markdown", "")
        update_display(Markdown(response), display_id=display_handle.display_id)
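
The accumulation loop above can be tried without calling the API. The sketch below is a minimal illustration, assuming chunks shaped like the OpenAI streaming objects used above (`chunk.choices[0].delta.content`); the names `accumulate_deltas` and `fake_chunk` are hypothetical helpers introduced only for this example and are not part of the `openai` library.

```python
from types import SimpleNamespace


def accumulate_deltas(chunks):
    """Yield the growing response text after each streamed chunk."""
    response = ""
    for chunk in chunks:
        # delta.content may be None (for example on the last chunk), so fall back to ''
        response += chunk.choices[0].delta.content or ""
        yield response


def fake_chunk(text):
    # Hypothetical stand-in that mimics only the attributes the loop reads
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [fake_chunk("Hello"), fake_chunk(", world"), fake_chunk(None)]
snapshots = list(accumulate_deltas(fake_stream))
print(snapshots[-1])  # Hello, world
```

Each snapshot is what `update_display` would render at that step; the `or ""` fallback is what keeps a `None` delta from raising a `TypeError` during concatenation.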

In [22]:
stream_gpt_answer(question)

Certainly! Let's break down the question about the code snippet step by step.

### Code Explanation

The line of code you provided is:

python
yield from {book.get("author") for book in books if book.get("author")}


#### What Does This Code Do?

1. **Set Comprehension**: 
   - The part `{book.get("author") for book in books if book.get("author")}` is a set comprehension. It creates a set of unique authors from the list or collection `books`.

2. **Processing Books**: 
   - The comprehension iterates over each `book` in the `books` variable (which we will discuss later). For each `book`, it calls the method `get("author")` to retrieve the value associated with the key `"author"`.

3. **Conditional Filtering**: 
   - The `if book.get("author")` checks if the author exists. Only books that have a valid author will be included in the resulting set.

4. **Yielding Values**: 
   - The `yield from` statement is used to yield each item from the set of authors one by one rather than yielding the entire set at once. This is useful in generator functions where maintaining state is important.

### Understanding `yield from`

- **`yield`**: 
  - `yield` is a keyword in Python that transforms a function into a generator, which means it can be paused and resumed. When `yield` is called, the function returns the value and saves its state, so it can continue later from where it left off.

- **`yield from`**: 
  - `yield from` is specifically used to yield all values from an iterable. In this case, it yields each author from the set of unique authors one by one.

### What Does `get` Do?

- **`get` Method**: 
  - `get` is a method used on dictionaries in Python. It retrieves the value associated with the specified key; in this case, the key is `"author"`. If the key does not exist, it returns `None` (or a specified default value if provided).

### Data Type of `books`

- The variable `books` is expected to be a list (or another iterable collection) of dictionaries. Each dictionary likely represents a book with key-value pairs, where one of the keys is `"author"`.

### Why Use Curly Braces?

- **Curly Braces**:
  - Curly braces `{}` denote a set in Python. This structure automatically ensures that duplicate authors are not included. Each author will be unique in the resulting set.

### Alternatives to Using Sets

- Instead of using `{}`, you could use `[]` for a list comprehension, but that would not prevent duplicates:

  python
  authors = [book.get("author") for book in books if book.get("author")]
  

- However, this would require additional steps to convert the list to a set if uniqueness is desired:

  python
  unique_authors = set(authors)
  

### Considerations for Large Variable Sizes

- If the size of `books` is very large:
  - Using a set comprehension is more memory-efficient than creating a list first because it avoids duplicates directly.
  - The generator produced by `yield from` allows you to process items as they are requested rather than loading all authors into memory at once.
  - This can significantly reduce memory usage, especially if not all authors need to be processed at one time.

### Summary

The code snippet utilizes a set comprehension to retrieve unique authors from a collection of books, filtering out any books without an author and then yielding each unique author one by one using `yield from`. The `get` method safely fetches the author information, and curly braces define the structure of the data as a set.
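
To see the line from the question in action, here is a small runnable sketch. The function name `unique_authors` and the sample `books` data are assumptions made up for illustration. Note that the set comprehension is evaluated in full before `yield from` hands out the first item, so memory use grows with the number of distinct authors; only the hand-off to the caller is lazy.

```python
def unique_authors(books):
    # The set is built completely first; yield from then yields its items one by one
    yield from {book.get("author") for book in books if book.get("author")}


# Hypothetical sample data: a list of dicts, some without a usable author
books = [
    {"title": "Book A", "author": "John Smith"},
    {"title": "Book B", "author": None},        # filtered out: author is None
    {"title": "Book C"},                        # filtered out: no "author" key
    {"title": "Book D", "author": "John Smith"},  # duplicate, removed by the set
    {"title": "Book E", "author": "Jane Doe"},
]

# Sets are unordered, so sort before printing to get a stable result
authors = sorted(unique_authors(books))
print(authors)  # ['Jane Doe', 'John Smith']
```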

# Get Llama 3.2 to answer

### Solution 1

In [24]:
ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
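
Before streaming from the local model, it can help to confirm that the Ollama server is listening. The following is a rough sketch, assuming the server answers a plain GET on its root URL at the same host and port used above; `ollama_is_reachable` is a hypothetical helper written for this example, not part of any library.

```python
import urllib.error
import urllib.request


def ollama_is_reachable(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP GET on base_url succeeds with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status
        return False


print(ollama_is_reachable())
```

If this prints `False`, starting the server (for example with `ollama serve`) is the usual first thing to try.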

In [29]:
def stream_ollama_via_openai_answer(user_question):
    stream = ollama_via_openai.chat.completions.create(
        model=MODEL_LLAMA,
        messages=[
           {"role":"system", "content":system_prompt},
           {"role":"user", "content":get_user_prompt(user_question)}
        ],
        stream=True
    )

    response=""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response+=chunk.choices[0].delta.content or ''
        response = response.replace("```","").replace("markdown", "")
        update_display(Markdown(response), display_id=display_handle.display_id)

In [30]:
stream_ollama_via_openai_answer(question)

**Question**
Please explain what this code does and why:
python
yield from {book.get("author") for book in books if book.get("author")}

**Answer**

This line of code uses a combination of Python features to extract the authors from a list of books. Let's break it down step by step.

* `books`: This is the variable containing the list of books.
* `{book.get("author") for book in books if book.get("author")}`: This is a **nested generator expression**. It uses the `.get()` method (more on that later) to filter and transform the data from the `books` list.

Here's a simplified explanation:

1. The outer loop iterates over each book in the `books` list.
2. For each book, the inner expression:
	* Calls `book.get("author")`, which returns the author name associated with that book (if it exists). This value is used to filter out books without an author (more on this filtering clause).
	* Uses dictionary comprehension to create a new generator expression containing only the authors from that book.

The `(book.get("author"))` part ensures `book.get("author")` returns an empty string if no author name is found. This way, empty strings are not included in the output when filter out books with no author.

**`yield from`**
Now, let's talk about `yield from`. This is a new syntax introduced in Python 3.3 (and later). When used with a generator expression or another iterable, it allows us to delegate generation of values from one function to another.

In this case:

* The result of the nested generator expression `{...}` is an iterable containing the authors.
* `yield from` says: "instead of returning `this` iterable, *re-yield* each element of the given iterable".

**`get` method**
The `.get()` method calls the dict (key-value pair dictionary) to retrieve a value associated with a specific key. It returns:

* The **value** for that key if it exists in the dictionary.
* A default value (`None`, the second argument) if no matching value is found.

In this case, `book.get("author")` returns an author's name or `None` if that book doesn't have one.

**Data structure**
The `books` variable contains a list of dictionaries. Each dictionary represents a book and has keys (`title`, `authors`, etc.) and values (such as the book's title, publication year, authors...).

Here example:

python
books = [
    {"id": 1, "title": "Book A", "author": "John Smith"},
    {"id": 2, "title": "Book B",
     "author": None},
    {"id": 3, "title": "Book C", "author": "Jane Doon"}
]


**Curly braces**
The curly braces `{}` syntax creates a new iterable. In this case, we use it to define a nested generator expression.

Other alternatives might be:

* Using a list comprehension and then yielding from the resulting list.
* Rewriting the generator expression using `map` functions or similar constructs.

python
# List comprehension:
for author in {book['author'] for book in books if 'author' in book}:
    yield author

# Map function:
from functools import partial
yield from map(partial(dict.get, "author"), filter(None, books))


**Performance**
When working with very large datasets, using generator expressions and `yield` can be beneficial because it consumes less memory than creating lists or other data structures.

### Solution 2

In [23]:
import ollama

def stream_ollama_answer(user_question):
    # Call the native ollama client with the Llama model (not the GPT model name)
    stream = ollama.chat(
        model=MODEL_LLAMA,
        messages=[
           {"role":"system", "content":system_prompt},
           {"role":"user", "content":get_user_prompt(user_question)}
        ],
        stream=True
    )

    response=""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
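        # the ollama client streams chunks shaped like {'message': {'content': ...}}, not OpenAI-style choices
        response+=chunk['message']['content'] or ''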
        response = response.replace("```","").replace("markdown", "")
        update_display(Markdown(response), display_id=display_handle.display_id)