<a href="https://colab.research.google.com/github/nbeaudoin/CTME-llm-lecture-resources/blob/main/01_getting_started_with_llms_nbeaudoin.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Making our first LLM API call



## Setup

In [1]:
try:
    from google.colab import userdata
    print("Colab notebook detected. Installing dependencies...")
    !pip install -Uqqqq \
        openai \
        chromadb \
        sentence-transformers \
        llama-index \
        llama-index-llms-openai \
        gradio \
        datasets \
        dspy-ai

except ImportError:  # google.colab is only importable inside Colab
    print("Not in Colab. Skipping installation.")


Colab notebook detected. Installing dependencies...

In [2]:
# Basic imports
from rich import print
import gradio as gr
import numpy as np
import os
import matplotlib.pyplot as plt
from ipywidgets import interact
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix, classification_report, cohen_kappa_score, mean_absolute_error

# OpenAI
from openai import OpenAI

## Setting our `OPENAI_API_KEY` environment variable

When we use any LLM provider like OpenAI, Anthropic, or Google, we need some way to tell them who is making the request.
Today, we'll be using OpenAI.
The most straightforward way to provide this key is through the environment variable `OPENAI_API_KEY`.
The OpenAI Python client looks for this environment variable and uses it for authentication.
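
Before making any calls, it can help to confirm that the variable is actually set. The sketch below is only an illustration: `describe_api_key` is a hypothetical helper (not part of the `openai` library) that reports whether the key is present while printing only its last four characters.

```python
import os


def describe_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Report whether the key is set, without revealing the secret itself."""
    key = os.environ.get(env_var)
    if not key:
        return f"{env_var} is not set"
    # Show only the last 4 characters so the secret never lands in notebook output
    return f"{env_var} is set (ends with ...{key[-4:]})"


print(describe_api_key())
```

If this prints "is not set", the client will most likely fail with an authentication error on the first request.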

In the cell below, we load it from the Google Colab secrets manager on the left 👈.
Before running this code, make sure your API key is set as shown below:

![](https://github.com/mgfrantz/CTME-llm-lecture-resources/blob/main/images/colabSecrets.png?raw=true)

In [3]:
# Set the OPENAI_API_KEY environment variable
try:
    from google.colab import userdata # import the Colab secrets manager
    os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY') # Set the OPENAI_API_KEY environment variable
except ImportError: # Not in Colab: fall back to a local .env file
    import dotenv
    env_loaded = dotenv.load_dotenv('../.env')
    if not env_loaded:
        raise ValueError("Failed to load environment variables from .env file.")
    else:
        print("Loaded environment variables from .env file.")


## Under the hood: `curl`

Almost all of the interactions we will have with LLMs are through API calls.
Below is one of the most low-level ways we can call an LLM, using the `curl` command.
This command gives us a lot of information about how the API request is structured.
We send a JSON request body along with an `Authorization` header containing our `OPENAI_API_KEY`.
We also pass the model we want to call, the chat messages, and hyperparameters such as `temperature` that help control how text is generated.
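
To make the structure of the request concrete, here is a minimal sketch that assembles the same URL, headers, and JSON body using only the Python standard library. The helper name `build_chat_request` is hypothetical, and the request is built but not sent, so no API credit is used.

```python
import json
import os
import urllib.request


def build_chat_request(prompt: str, model: str = "gpt-4o-mini",
                       temperature: float = 0.7) -> urllib.request.Request:
    """Assemble (but do not send) a chat completions request."""
    # The JSON body: model, chat messages, and generation hyperparameters
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    # The headers: content type plus the bearer token that identifies us
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
    }
    return urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_chat_request("Say: This is a test!")
print(request.full_url)
print(json.loads(request.data))
# To actually send it: urllib.request.urlopen(request)
```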

In [4]:
# Make the API call to OpenAI and store the response in test.json
!curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{ \
     "model": "gpt-4o-mini", \
     "messages": [{"role": "user", "content": "Say: This is a test!"}], \
     "temperature": 0.7 \
    }' > test.json
# Show the output of test.json formatted nicely
!cat test.json | python -m json.tool

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   901    0   765  100   136   1297    230 --:--:-- --:--:-- --:--:--  1527
{
    "id": "chatcmpl-AchZbpIOixL9mRhQ84VHmmfESMfRu",
    "object": "chat.completion",
    "created": 1733787323,
    "model": "gpt-4o-mini-2024-07-18",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "This is a test!",
                "refusal": null
            },
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 14,
        "completion_tokens": 5,
        "total_tokens": 19,
        "prompt_tokens_details": {
            "cached_tokens": 0,
            "audio_tokens": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0,
            "audio_tokens": 0,
    

## Using the OpenAI Python client

While the `curl` command shows us how the API call is made, it's not something that we can easily use in more complex applications.
A more convenient option is the OpenAI Python client.
It makes the same request, but hides the HTTP details from the developer.
Let's see how to perform the exact same API call using the OpenAI client 👇:

In [5]:
client = OpenAI() # Create the OpenAI client

In [6]:
# Create the messages (same as above)
messages = [
    {"role": "user", "content": "Say: This is a test!"},
]

# Make the API call
chat_completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    # stream=True
)

In [7]:
# Display the output
print(chat_completion)
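
The object printed above carries the same fields as the JSON returned by `curl`. As a rough sketch of how to pull out the useful parts, the example below works on a hand-copied subset of that JSON (so it runs without an API call); `summarize` is a hypothetical helper name.

```python
# Hand-copied subset of the JSON printed by the curl cell (no API call is made here)
response = {
    "model": "gpt-4o-mini-2024-07-18",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "This is a test!", "refusal": None},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 14, "completion_tokens": 5, "total_tokens": 19},
}


def summarize(resp: dict) -> dict:
    """Extract the reply text, stop reason, and token count from a response dict."""
    first = resp["choices"][0]  # n=1 by default, so there is one choice
    return {
        "text": first["message"]["content"],
        "finish_reason": first["finish_reason"],
        "total_tokens": resp["usage"]["total_tokens"],
    }


print(summarize(response))
```

With the Python client the same fields are exposed as attributes, for example `chat_completion.choices[0].message.content` and `chat_completion.usage.total_tokens`.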

# Build our first chatbots

In this section, we'll go through several demos.
By the end of this section, you should be able to:
- Build a basic chatbot with the popular `gradio` Python library
- Understand key hyperparameters like `temperature`, `top_p`, and `top_k`
- Build an advanced chatbot with hyperparameter controls

## Exercise: Building a basic chatbot with `gradio`

User interfaces (UIs) are a great way to demo work in AI.
In the next several lessons, we will be using the `gradio` framework to demonstrate our growing skillset.
In this exercise, we will get a gentle introduction to creating chatbots with `gradio`.

Please follow the [ChatInterface](https://www.gradio.app/docs/gradio/chatinterface) documentation and the [Creating a chatbot fast](https://www.gradio.app/guides/creating-a-chatbot-fast) guide to make your first AI chatbot.
Your chatbot must:
- respond to messages

If this is too easy, try to:
- add a system prompt
- use `stream=True` in your chat function

In [8]:
%pip install --upgrade gradio



In [40]:
from time import sleep
import numpy as np

In [41]:

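def chat(message: str, history: list):
    # With type="messages", Gradio passes prior turns as a list of
    # {"role": ..., "content": ...} dicts. The system prompt is not part of
    # that history, so we prepend it on every call.
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    messages += [{"role": m["role"], "content": m["content"]} for m in history]
    messages.append({"role": "user", "content": message})

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True
    )

    resp = ""
    for chunk in response:
        sleep(np.random.random())  # artificial delay so the streaming is easy to see
        # Some chunks (e.g. the final one) carry no text, so skip empty deltas
        delta = chunk.choices[0].delta.content if chunk.choices else None
        if delta:
            resp += delta
            yield resp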


In [None]:
demo = gr.ChatInterface(fn=chat, type="messages")
demo.launch(debug=True)

Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://73a2cb623648dcf1f2.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


## Text generation hyperparameters

There are several hyperparameters we can play with that determine how text is generated.
At each decoding step, the model outputs a score for every token in its vocabulary, and those scores are normalized with the softmax function so that they sum to 1.0.
We have several options to modify this probability distribution in ways that affect the way text is generated.

### `temperature`

The softmax function is shown below:

$$
\text{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=0}^{n} e^{x_j}}
$$

The softmax function is defined in Python below:


In [None]:
def softmax(p):
    return np.exp(p) / np.sum(np.exp(p))

# Example usage
p = np.array([1, 2, 3, 4, 5])
print(softmax(p))

The `temperature` parameter allows us to make the most probable words more probable (temperature < 1) or less probable (temperature > 1) than vanilla softmax (temperature = 1).
The formula for softmax with temperature is shown below:

$$
\text{softmax}(x, T)_i = \frac{e^{x_i / T}}{\sum_{j=0}^{n} e^{x_j / T}}
$$

All you do is divide everything by T before taking the exponent; larger values of $T$ flatten the distribution, while smaller values of $T$ skew the distribution towards the most probable tokens.

### Exercise: Softmax with temperature

In [None]:
# Softmax function with temperature parameter
def softmax_with_temperature(probs, temperature):
    raise NotImplementedError("You need to implement this function!")

In [None]:
# Define a small probability distribution
probs = np.array([0.5, 0.3, 0.1, 0.05, 0.05])

# Plot the distribution with numbers on top of each bar
def plot_distribution(temperature):
    adjusted_probs = softmax_with_temperature(probs, temperature)
    plt.figure(figsize=(6, 4))
    bars = plt.bar(range(len(probs)), adjusted_probs, tick_label=['A', 'B', 'C', 'D', 'E'])
    plt.ylim(0, 1)
    plt.title(f'Softmax with Temperature = {temperature:.2f}')
    plt.ylabel('Probability')
    plt.xlabel('Tokens')

    # Add numbers on top of each bar
    for bar in bars:
        yval = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2, yval, f'{yval:.2f}', ha='center', va='bottom')

    plt.show()

# Interactive widget
interact(plot_distribution, temperature=(0.1, 2.0, 0.1));
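
One possible solution to the exercise above, offered as a minimal sketch rather than the reference answer. It uses only the standard library instead of NumPy, the name `softmax_with_temperature_sketch` is hypothetical (chosen so it does not overwrite your own function), and subtracting the maximum before exponentiating is an added numerical-stability step that does not change the result.

```python
import math


def softmax_with_temperature_sketch(scores, temperature=1.0):
    """Divide every score by T, then apply softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    # Subtracting the max leaves the result unchanged but avoids overflow in exp
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [0.5, 0.3, 0.1, 0.05, 0.05]
for t in (0.1, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature_sketch(scores, t)])
```

Lower temperatures should concentrate probability on the first (highest-scoring) token, while higher temperatures should push the output toward a uniform distribution.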

### Demo: Alter the `temperature` parameter

In this demo, we generate several messages from the same prompt.
If we lower the temperature to 0, what do you notice about the results?
What if we raise it above 1.0?

In [None]:
messages = [
    {"role": "user", "content": "Write a Python function that reverses a string. Tell a joke in the docstring!"},
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    temperature=1.0, # change this number between 0 and 2 to see the outcome
    n=3 # generate 3 choices
)

In [None]:
for choice in response.choices:
    print(choice.message.content)
    print('\n\n' + '='*50 + '\n')

### Demo: `top_k`

In top-k sampling, we choose how many of the highest-scoring tokens we are willing to sample from.
For example if `top_k = 3`, we will take the scores of the top 3 tokens and apply the softmax to only those 3 scores.
Run the code block below to see how `top_k` normalizes the scores at different values.

In [None]:
# Define a small probability distribution
probs = np.array([0.5, 0.3, 0.1, 0.05, 0.05])

# Plot the distribution with numbers on top of each bar
def plot_top_k(top_k):
    ticks = ['A', 'B', 'C', 'D', 'E']
    sorted_probs = np.sort(probs)[::-1]
    top_k_probs = sorted_probs[:top_k]
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    adjusted_probs = softmax(probs)
    ax = axes[0]
    bars = ax.bar(range(len(probs)), adjusted_probs, tick_label=ticks)
    ax.set_ylim(0, 1)
    ax.set_title('Original Probabilities')
    ax.set_ylabel('Probability')
    ax.set_xlabel('Token')

    # Add numbers on top of each bar
    for bar in bars:
        yval = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2, yval, f'{yval:.2f}', ha='center', va='bottom')

    ax = axes[1]
    # Keep only the k highest scores (sorted above) and renormalize them
    updated_probs = softmax(top_k_probs)
    bars = ax.bar(range(len(top_k_probs)), updated_probs, tick_label=ticks[:len(top_k_probs)])
    ax.set_ylim(0, 1)
    ax.set_title(f'Top {top_k} Probabilities')
    ax.set_ylabel('Probability')
    ax.set_xlabel('Token')
    for bar in bars:
        yval = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2, yval, f'{yval:.2f}', ha='center', va='bottom')

    plt.show()

# Interactive widget
interact(plot_top_k, top_k=(1, len(probs), 1));
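
The filtering step behind the plot can also be written as a standalone function. This is a minimal standard-library sketch; the name `top_k_distribution` and the toy scores are assumptions made for illustration.

```python
import math


def top_k_distribution(scores, k):
    """Softmax over the k highest scores; every other token gets probability 0."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of scores")
    # Indices of the k largest scores
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in keep)
    exps = {i: math.exp(scores[i] - m) for i in keep}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]


print(top_k_distribution([2.0, 1.0, 0.5, -1.0], k=2))
```

With `k=2`, only the first two tokens keep non-zero probability, and those two probabilities sum to 1.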

### Demo: `top_p` (aka nucleus sampling)

`top_p` is similar to `top_k`, but instead of defining the number of tokens to consider, you define a cutoff for cumulative probability.
For example, if you have a `top_p` of 0.6 and scores of ('cat', 0.4), ('dog', 0.15), ('llama', 0.1), and ('parakeet', 0.01), you would only consider 'cat', 'dog', and 'llama', because 0.4 + 0.15 = 0.55 is less than 0.6, but 0.4 + 0.15 + 0.1 = 0.65 is greater.
Because you have a probability cutoff instead of number of tokens, this may mean you have different numbers of tokens considered at each decoding step.

Play around with the `top_p` slider below 👇 to get some intuition for how it works.

In [None]:
# Define a small probability distribution (can simulate a language model's logits)
probs = np.array([0.4, 0.2, 0.15, 0.1, 0.08, 0.05, 0.02])

# Function to apply top-p filtering with a minimum of one token selected
def top_p_filter(probs, p):
    sorted_probs = np.sort(probs)[::-1]
    cumulative_probs = np.cumsum(sorted_probs)

    # Ensure at least one token is selected
    if p < sorted_probs[0]:
        cutoff = 1
    else:
        cutoff = np.argmax(cumulative_probs >= p) + 1

    filtered_probs = sorted_probs[:cutoff]
    return filtered_probs, cutoff

# Plot the distribution with top-p filtering
def plot_top_p(p):
    filtered_probs, cutoff = top_p_filter(probs, p)
    normalized_probs = filtered_probs / np.sum(filtered_probs)  # Normalize the selected probabilities

    # Create two subplots
    fig, axs = plt.subplots(1, 2, figsize=(12, 5))
    labels = ['A', 'B', 'C', 'D', 'E', 'F', 'G']

    # Plot 1: Original distribution with top-p filtering
    bars1 = axs[0].bar(range(len(probs)), np.sort(probs)[::-1], tick_label=labels)
    axs[0].set_ylim(0, 1)
    axs[0].set_title(f'Top-p Sampling (p = {p:.2f}) - Original Probabilities')
    axs[0].set_ylabel('Probability')
    axs[0].set_xlabel('Tokens')

    # Highlight selected and unselected probabilities
    for i, bar in enumerate(bars1):
        if i >= cutoff:
            bar.set_color('gray')  # Color the bars outside top-p as gray
        else:
            bar.set_color('blue')  # Highlight the selected probabilities

    # Add numbers on top of each bar for original distribution
    for bar in bars1:
        yval = bar.get_height()
        axs[0].text(bar.get_x() + bar.get_width()/2, yval, f'{yval:.2f}', ha='center', va='bottom')

    # Plot 2: Normalized probabilities of the selected tokens
    bars2 = axs[1].bar(range(len(filtered_probs)), normalized_probs, tick_label=labels[:len(filtered_probs)])
    axs[1].set_ylim(0, 1)
    axs[1].set_title(f'Normalized Probabilities of Selected Tokens (p = {p:.2f})')
    axs[1].set_ylabel('Normalized Probability')
    axs[1].set_xlabel('Selected Tokens')

    # Add numbers on top of each bar for normalized probabilities
    for bar in bars2:
        yval = bar.get_height()
        axs[1].text(bar.get_x() + bar.get_width()/2, yval, f'{yval:.2f}', ha='center', va='bottom')

    plt.tight_layout()
    plt.show()

# Interactive widget
interact(plot_top_p, p=(0.01, 1.0, 0.05));
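
The cat/dog/llama example from the text can be checked numerically. The sketch below is a small standard-library version of the same cutoff rule (keep tokens until the cumulative probability reaches `p`, then renormalize); `nucleus` is a hypothetical helper name.

```python
from itertools import accumulate


def nucleus(tokens, p):
    """tokens: list of (token, prob) pairs. Keep the smallest prefix whose
    cumulative probability reaches p, then renormalize what was kept."""
    ranked = sorted(tokens, key=lambda tp: tp[1], reverse=True)
    cumulative = accumulate(prob for _, prob in ranked)
    kept = []
    for (token, prob), cum in zip(ranked, cumulative):
        kept.append((token, prob))
        if cum >= p:
            break
    total = sum(prob for _, prob in kept)
    return [(token, prob / total) for token, prob in kept]


tokens = [("cat", 0.4), ("dog", 0.15), ("llama", 0.1), ("parakeet", 0.01)]
print(nucleus(tokens, 0.6))
```

Because the first token is always appended before the cutoff is checked, at least one token survives even for very small values of `p`.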

## Exercise: Advanced chatbot with hyperparameter controls

Now that you've learned about roles and generation hyperparameters, let's create a new chatbot that allows you to control them.
Your chatbot must:
- allow for control of at least 1 generation hyperparameter (ex: `temperature`)
- allow for user input of a system message

If this is too easy, try to:
- allow for control over `temperature` and `top_p`
- improve the UI by putting all the controls in a sidebar
- have a `Clear` button that restarts the conversation
- add documentation with markdown
- implement streaming responses
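
As a possible starting point (not a full solution), the sketch below shows one way to turn the UI inputs into arguments for the API call. `build_chat_kwargs` is a hypothetical helper; it assumes the history arrives as a list of `{"role", "content"}` dicts, as it does with `type="messages"`.

```python
def build_chat_kwargs(message, history, system_message, temperature=1.0, top_p=1.0):
    """Combine the user's controls into keyword arguments for a chat completion."""
    messages = []
    if system_message.strip():
        # Only add a system message when the user actually typed one
        messages.append({"role": "system", "content": system_message})
    # Copy only role and content from prior turns
    messages += [{"role": m["role"], "content": m["content"]} for m in history]
    messages.append({"role": "user", "content": message})
    return {
        "model": "gpt-4o-mini",
        "messages": messages,
        "temperature": temperature,
        "top_p": top_p,
    }


kwargs = build_chat_kwargs("Hi!", [], "You are a pirate.", temperature=0.2)
print(kwargs)
```

The result could then be passed along as `client.chat.completions.create(**kwargs)`. In Gradio, extra controls such as sliders and textboxes can be wired into the chat function through the `additional_inputs` argument of `gr.ChatInterface`.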

In [None]:
# Your code here

# Prompt engineering 1: zero-shot prompting

To benchmark how well each of these techniques performs, we need a baseline.
We will use zero-shot prompting to get a base level of performance on our task.

So far, we've been using the low-level `openai` library.
However, there are several very competent higher-level libraries that provide great abstractions such as `langchain` and `llama-index`.
Today, we'll be using `llama-index` to make our LLM calls a bit easier.

## Prepare our dataset

In [None]:
# Prompt engineering imports
from datasets import load_dataset, Dataset
from llama_index.core import PromptTemplate
from llama_index.core.prompts import ChatMessage
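# Note: this shadows the `OpenAI` class imported from `openai` earlier.
# The `client` object created above keeps working because it already exists.
from llama_index.llms.openai import OpenAI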
from pydantic import BaseModel, Field
from IPython.display import display
import asyncio

In [None]:
ds = load_dataset('SetFit/amazon_reviews_multi_en')

In [None]:
train_samples_per_class = 50
eval_test_samples_per_class = 10
train = Dataset.from_pandas(ds['train'].to_pandas().groupby('label').sample(train_samples_per_class, random_state=1234).reset_index(drop=True))
valid = Dataset.from_pandas(ds['validation'].to_pandas().groupby('label').sample(eval_test_samples_per_class, random_state=1234).reset_index(drop=True))
test = Dataset.from_pandas(ds['test'].to_pandas().groupby('label').sample(eval_test_samples_per_class, random_state=1234).reset_index(drop=True))

In [None]:
train.to_pandas().sample(3)

In [None]:
async def predict_and_evaluate(predict_fn):
    labels = [int(x) for x in valid['label']]
    tasks = [
        predict_fn(text)
        for text in valid['text']
    ]
    predictions = await asyncio.gather(*tasks)
    cm = ConfusionMatrixDisplay.from_predictions(labels, predictions, normalize='true')
    cr = classification_report(labels, predictions)
    kappa = cohen_kappa_score(labels, predictions, weights='quadratic')
    mae = mean_absolute_error(labels, predictions)
    return labels, predictions, kappa, mae, cm, cr
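
`predict_and_evaluate` starts every prediction at once and waits for all of them with `asyncio.gather`, which returns results in the same order as the inputs. The sketch below illustrates that pattern with a stand-in predictor; `fake_predict` and `predict_all` are hypothetical names, and no API call is made.

```python
import asyncio


async def fake_predict(text: str) -> int:
    """Stand-in for an LLM call: wait briefly, then return a 'rating' in 0-4."""
    await asyncio.sleep(0.01)
    return len(text) % 5


async def predict_all(predict_fn, texts):
    # gather runs the coroutines concurrently and preserves input order
    return await asyncio.gather(*(predict_fn(t) for t in texts))


if __name__ == "__main__":
    print(asyncio.run(predict_all(fake_predict, ["ok", "great!", "bad"])))
```

Inside a notebook an event loop is already running, so you would write `await predict_all(...)` instead of `asyncio.run(...)`.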

## Zero-shot prompt

In [None]:
prompt_tmpl_str = """\
The review text is below.
---------------------
{review}
---------------------
Given the review text and not prior knowledge, \
please attempt to predict the review score of the context.

Query: What is the rating of this review?
Answer: \
"""

prompt_tmpl = PromptTemplate(
    prompt_tmpl_str,
)

In [None]:
class Rating(BaseModel):
    rating: int = Field(..., description="Rating of the review", enum=[0, 1, 2, 3, 4])

llm = OpenAI(model="gpt-4o-mini")
zero_shot_structured_llm = llm.as_structured_llm(Rating)

In [None]:
async def zero_shot_predict(text):
    messages = [
        ChatMessage.from_str(prompt_tmpl.format(review=text))
    ]
    response = await zero_shot_structured_llm.achat(messages)
    return response.raw.rating

In [None]:
zero_shot_labels, zero_shot_predictions, zero_shot_kappa, zero_shot_mae, zero_shot_cm, zero_shot_cr = await predict_and_evaluate(zero_shot_predict)
print(f"Cohen's Kappa: {zero_shot_kappa:.04f}, MAE: {zero_shot_mae}")
print(zero_shot_cr)

# Prompt engineering 2: few-shot prompting

In the following cells, we draw random examples from the `train` split to include in the prompt as demonstrations.
The data here is Amazon product reviews.

In [None]:
train.shuffle()[:5]

In [None]:
rng = np.random.Generator(np.random.PCG64(1234))

def random_few_shot_examples_fn(**kwargs):
    random_examples = train.shuffle(generator=rng)[:5]
    result_strs = []
    for text, rating in zip(random_examples['text'], random_examples['label']):
        result_strs.append(f"Text: {text}\nRating: {rating}")
    return "\n\n".join(result_strs)

In [None]:
print(random_few_shot_examples_fn())

In [None]:
few_shot_prompt_tmpl_str = """\
The review text is below.
---------------------
{review}
---------------------
Given the review text and not prior knowledge, \
please attempt to predict the review score of the context. \
Here are several examples of reviews and their ratings:

{random_few_shot_examples}

Query: What is the rating of this review?
Answer: \
"""

few_shot_prompt_tmpl = PromptTemplate(
    few_shot_prompt_tmpl_str,
    function_mappings={"random_few_shot_examples": random_few_shot_examples_fn},
)

In [None]:
print(few_shot_prompt_tmpl.format(review='I loved this product!'))

In [None]:
class Rating(BaseModel):
    rating: int = Field(..., description="Rating of the review", enum=[0, 1, 2, 3, 4])

llm = OpenAI(model="gpt-4o-mini")
rand_few_shot_structured_llm = llm.as_structured_llm(Rating)

In [None]:
async def random_few_shot_predict(text):
    messages = [
        ChatMessage.from_str(few_shot_prompt_tmpl.format(review=text))
    ]
    response = await rand_few_shot_structured_llm.achat(messages)
    return response.raw.rating

In [None]:
random_few_shot_labels, random_few_shot_predictions, random_few_shot_kappa, random_few_shot_mae, random_few_shot_cm, random_few_shot_cr = await predict_and_evaluate(random_few_shot_predict)
print(f"Cohen's Kappa: {random_few_shot_kappa:.04f}, MAE: {random_few_shot_mae}")
print(random_few_shot_cr)

# Demo: Embeddings and vector stores

In the previous demonstration, we saw that providing several randomly selected examples to the LLM at inference time works reasonably well: the model is fairly good at predicting the review score, especially to within 1 point of the actual rating.
In a little bit, we'll see that providing better examples to the model at inference time helps improve these scores.
But we need an efficient way of searching over our `train` examples to determine which ones to use.

This is when you want to use a vector store.
Vector stores can be in-memory stores, on-disk stores, database extensions like pgvector for Postgres, or even external APIs like Pinecone.

Today, we'll use a popular open-source vector database called `chromadb`.
This tool allows us to ingest our documents and search over them effectively to determine which examples to use.
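
Conceptually, a vector store answers the question "which stored embeddings are closest to this query embedding?". The sketch below shows a brute-force version of that search with cosine similarity over toy 2-dimensional vectors. The helper names and data are made up for illustration, and real stores typically use approximate indexes rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest(query_vec, store, n_results=2):
    """Return the n_results items whose embeddings are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item["embedding"]),
        reverse=True,
    )
    return ranked[:n_results]


# Toy "collection": each item has an id, an embedding, and rating metadata
store = [
    {"id": "a", "embedding": [1.0, 0.0], "rating": 4},
    {"id": "b", "embedding": [0.9, 0.1], "rating": 3},
    {"id": "c", "embedding": [0.0, 1.0], "rating": 0},
]
print([item["id"] for item in nearest([1.0, 0.0], store)])
```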

In this demo, we'll go over the basics of how to use ChromaDB.
We will also use `sentence-transformers` for embeddings as an example of how to use open-weights embedding models.

In [None]:
# Imports
from chromadb import Client
from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

In this cell, we declare our embedding function.
We will use the small but powerful [BGE-small](https://huggingface.co/BAAI/bge-small-en-v1.5) model to embed our documents.

In [None]:
embed_fn = SentenceTransformerEmbeddingFunction('BAAI/bge-small-en-v1.5')

Next, we can create our `chromadb` client and use it to create our collection (think table).
Notice that we pass our embedding function.
That way, when we add documents to the table, the text is automatically embedded.

In [None]:
chroma_client = Client()

In [None]:
reviews = chroma_client.create_collection(
    name='reviews',
    embedding_function=embed_fn,
    get_or_create=True
)

In [None]:
reviews.add(documents=train['text'], metadatas=[{'rating': x} for x in train['label']], ids=train['id'])

Once we have created our vector store, we can search over it using plain text.
Here are 3 queries: a good review, a neutral review, and a bad review.
Let's search our train documents and observe the average rating for the closest 5 documents to each fake review.

In [None]:
queries = [
    "This product is great!",
    "This product was pretty typical - not good or bad.",
    "This product was awful",
]

In [None]:
retrievals = reviews.query(
    query_texts=queries,
    n_results=5
)

In [None]:
for query, metadatas in zip(queries, retrievals['metadatas']):
    ratings = [float(i.get('rating')) for i in metadatas]
    print(f"Review {query}")
    print(f"Avg rating of retrieved passages: {np.mean(ratings)}")

In [None]:
reviews.query(query_texts = 'hello!')['documents']

## Discussion: Using vector stores

Now that we have our data encoded this way, can anyone tell me how we might use this object to improve the way we classify reviews?

# Prompt engineering 3: dynamic few-shot prompting

In [None]:
def dynamic_few_shot_examples_fn(**kwargs):
    n_examples = kwargs.get('n_examples', 5)
    retrievals = reviews.query(
        query_texts=[kwargs['review']],
        n_results=n_examples
    )
    result_strs = []
    documents = retrievals['documents'][0]
    metadatas = retrievals['metadatas'][0]
    for document, metadata in zip(documents, metadatas):
        result_strs.append(f"Text: {document}\nRating: {metadata.get('rating')}")
    return "\n\n".join(result_strs)

In [None]:
print(dynamic_few_shot_examples_fn(review="This is the best uber ride of my life!"))

In [None]:
print(dynamic_few_shot_examples_fn(review="This is the worst uber ride of my life!", n_examples=2))

In [None]:
dynamic_few_shot_prompt_tmpl_str = """\
The review text is below.
---------------------
{review}
---------------------
Given the review text and not prior knowledge, \
please attempt to predict the review score of the context. \
Here are several examples of reviews and their ratings:

{dynamic_few_shot_examples}

Query: What is the rating of this review?
Answer: \
"""

dynamic_few_shot_prompt_tmpl = PromptTemplate(
    dynamic_few_shot_prompt_tmpl_str,
    function_mappings={"dynamic_few_shot_examples": dynamic_few_shot_examples_fn},
)

In [None]:
print(dynamic_few_shot_prompt_tmpl.format(review='I loved this product!', n_examples=1))

In [None]:
class Rating(BaseModel):
    rating: int = Field(..., description="Rating of the review", enum=[0, 1, 2, 3, 4])

llm = OpenAI(model="gpt-4o-mini")
dynamic_few_shot_structured_llm = llm.as_structured_llm(Rating)

async def dynamic_few_shot_predict(text):
    messages = [
        ChatMessage.from_str(dynamic_few_shot_prompt_tmpl.format(review=text))
    ]
    response = await dynamic_few_shot_structured_llm.achat(messages)
    return response.raw.rating

In [None]:
dynamic_few_shot_labels, dynamic_few_shot_predictions, dynamic_few_shot_kappa, dynamic_few_shot_mae, dynamic_few_shot_cm, dynamic_few_shot_cr = await predict_and_evaluate(dynamic_few_shot_predict)
print(f"Cohen's Kappa: {dynamic_few_shot_kappa:.04f}, MAE: {dynamic_few_shot_mae}")
print(dynamic_few_shot_cr)

# Exercise: Rating Reviews

In today's session, we've learned about:
- Chat models, interfaces, and `gradio`
- Zero shot prompting
- Few shot learning
- Embeddings and vector stores
- Dynamic few shot learning

It's time to combine these principles into our final exercise of the day.
Your task is to create a `gradio` app where a user can paste a review from Amazon and the app displays the predicted number of ⭐stars⭐.
To complete this task, please:
- Create a Gradio app with...
  - an input field where a user can submit text
  - a submit button and/or functionality to submit the text to the app when the user hits the return key
  - an output field to display the predicted result

If this is too easy, try to:
- Add hyperparameters like the number of examples retrieved
- Add details in markdown for how to use the app
- Display the prompt and response for inspection
- Install the [`gradio-client`](https://pypi.org/project/gradio-client/) library and make requests to your app from this notebook

If you're done, and **really** want to challenge yourself, add a dropdown for a different model.
You can follow the `Gemini API keys` button in the 👈secrets🔑 tab of Colab, or follow [this notebook guide](https://github.com/mgfrantz/CTME-llm-lecture-resources/blob/main/resources/ollama.ipynb) on a GPU Colab runtime to try inference with local LLMs like llama3. For the local route, it is recommended to restart with a GPU runtime (Runtime > Change runtime type). Local models may not work with `.as_structured_llm`; if so, check out [this low-level guide on structured outputs](https://docs.llamaindex.ai/en/stable/examples/output_parsing/llm_program/).


In [None]:
# Your code here

# Bonus: Prompt optimization

In [None]:
import dspy

In [None]:
lm = dspy.LM(model='openai/gpt-4o-mini')

In [None]:
dspy.configure(lm=lm)

In [None]:
train_examples = [
    dspy.Example(
        review=e['text'],
        rating=e['label'],
    ).with_inputs('review')
    for e in train
]

valid_examples = [
    dspy.Example(
        review=e['text'],
        rating=e['label'],
    ).with_inputs('review')
    for e in valid
]

test_examples = [
    dspy.Example(
        review=e['text'],
        rating=e['label'],
    ).with_inputs('review')
    for e in test
]


In [None]:
class FewShotResponse(dspy.Signature):
    "A rating for a review"
    review: str = dspy.InputField(description="Review text")
    examples: str = dspy.InputField(description="Examples of reviews and their ratings")
    rating: int = dspy.OutputField(description="Rating of the review. Should be 0, 1, 2, 3, or 4.", ge=0, le=4)

class FewShotLearning(dspy.Module):
    def __init__(self, collection=reviews, k=5):
        super().__init__()
        self.cot = dspy.ChainOfThought(FewShotResponse)
        self.collection = collection
        self.k = k


    def search(self, query):
        results = self.collection.query(query_texts=query, n_results=self.k)
        examples = "\n\n".join([f"Review: {doc}\nRating: {meta.get('rating')}" for doc, meta in zip(results['documents'][0], results['metadatas'][0])])
        return examples

    def forward(self, review):
        examples = self.search(review)
        return self.cot(review=review, examples=examples)

In [None]:
# calculate metrics (not async)
def dspy_predict_and_evaluate(predict_fn):
    labels = [int(x) for x in valid['label']]
    predictions = [
        predict_fn(text).rating
        for text in valid['text']
    ]
    cm = ConfusionMatrixDisplay.from_predictions(labels, predictions, normalize='true')
    cr = classification_report(labels, predictions)
    kappa = cohen_kappa_score(labels, predictions, weights='quadratic')
    mae = mean_absolute_error(labels, predictions)
    return labels, predictions, kappa, mae, cm, cr

In [None]:
def score_func(example, pred, trace=None):
    return float(example.rating == pred.rating)

In [None]:
eval_example = valid_examples[0]
score_func(eval_example, FewShotLearning()(eval_example.review))

In [None]:
# If you want to run the optimizer, set this to True.
# Otherwise, an optimized model will be downloaded for you.
DO_OPTIMIZE = False
if DO_OPTIMIZE:
    optimizer = dspy.teleprompt.MIPROv2(metric=score_func)
    optimized_few_shot = optimizer.compile(
        FewShotLearning(),
        trainset=train_examples,
        valset=valid_examples,
        max_bootstrapped_demos=2,
        max_labeled_demos=2,
        requires_permission_to_run=False
    )
else:
    # If you don't want to run the optimizer, you can download the pre-trained model below
    import os
    import requests

    # Download the pre-trained model if it doesn't exist locally
    model_path = "mipro_optimized_few_shot.json"
    if not os.path.exists(model_path):
        print("Downloading pre-trained model...")
        url = "https://raw.githubusercontent.com/mgfrantz/CTME-llm-lecture-resources/main/prototyping_ai/mipro_optimized_few_shot.json"
        response = requests.get(url)
        if response.status_code == 200:
            with open(model_path, "wb") as f:
                f.write(response.content)
            print("Model downloaded successfully")
        else:
            print(f"Failed to download model: {response.status_code}")

    optimized_few_shot = FewShotLearning()
    optimized_few_shot.load(model_path)

In [None]:
optimized_few_shot.save("mipro_optimized_few_shot.json")

In [None]:
unoptimized_labels, unoptimized_predictions, unoptimized_kappa, unoptimized_mae, unoptimized_cm, unoptimized_cr = dspy_predict_and_evaluate(FewShotLearning())
print(f"Cohen's Kappa: {unoptimized_kappa:.04f}, MAE: {unoptimized_mae}")
print(unoptimized_cr)

In [None]:
optimized_labels, optimized_predictions, optimized_kappa, optimized_mae, optimized_cm, optimized_cr = dspy_predict_and_evaluate(optimized_few_shot)
print(f"Cohen's Kappa: {optimized_kappa:.04f}, MAE: {optimized_mae}")
print(optimized_cr)