# Evaluating Model Outputs

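We can estimate how confident a model is in its output by using perplexity. Perplexity is a measure of uncertainty, calculated by exponentiating the negative of the average of the output tokens' log probabilities (logprobs). A perplexity of 1 means the model assigned probability 1 to every token it produced; higher values indicate more uncertainty.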

+ Perplexity can be used to assess the result of an individual model run.
+ It can also be used to compare the relative confidence of results between model runs. 

Low perplexity (high confidence) does not guarantee accuracy, but it can be a helpful signal when paired with other evaluation metrics.
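
To make the definition concrete, here is a minimal sketch that computes perplexity from a list of logprobs using only the standard library. The `perplexity` helper and the probability values are hypothetical and chosen for illustration; they are not taken from a real model run.

```python
import math

def perplexity(logprobs):
    """Exponentiate the negative mean of natural-log token probabilities."""
    return math.exp(-sum(logprobs) / len(logprobs))

# Made-up token probabilities, for illustration only.
confident = [math.log(p) for p in (0.9, 0.95, 0.99)]  # every token was highly probable
uncertain = [math.log(p) for p in (0.5, 0.5, 0.5)]    # every token was a coin flip

confident_ppl = perplexity(confident)
uncertain_ppl = perplexity(uncertain)
print(confident_ppl, uncertain_ppl)
```

When every token has probability 0.5, the perplexity is 2: the model was, on average, as uncertain as if it were choosing between two equally likely tokens. The confident sequence lands close to 1.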

In [1]:
%load_ext dotenv
%dotenv ../../05_src/.secrets

In [2]:
from openai import OpenAI
import numpy as np
import os 

client = OpenAI(base_url='https://k7uffyg03f.execute-api.us-east-1.amazonaws.com/prod/openai/v1', 
                api_key='any value',
                default_headers={"x-api-key": os.getenv('API_GATEWAY_KEY')})

In [3]:
prompts = [
    # Low perplexity: Clear topic, common structure, highly predictable vocabulary
    "Explain how photosynthesis works in simple terms.",
    # Medium perplexity: Narrative freedom, but familiar theme and constraints.
    "Write a short story about a traveler who realizes the journey mattered more than the destination.",
    # High perplexity: Abstract concept, creative freedom, unpredictable vocabulary
    "Describe the taste of a color that only exists for one second at dusk, using metaphors from mathematics and weather."
]

In [4]:
def get_completion(
    input: list[dict[str, str]],
    model: str = "gpt-4o-mini",
    max_tokens=500,
    temperature=0,
    tools=None,
    logprobs=None,  # if truthy, request the log probability of each output token
    top_logprobs=None,
):
    """Call the Responses API and return the full response object."""
    params = {
        "model": model,
        "input": input,
        "max_output_tokens": max_tokens,
        "temperature": temperature,
    }
    # Only send optional parameters that were actually set.
    if logprobs:
        params["include"] = ["message.output_text.logprobs"]
    if top_logprobs is not None:
        params["top_logprobs"] = top_logprobs
    if tools:
        params["tools"] = tools

    completion = client.responses.create(**params)
    return completion
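
When `logprobs` is set, each output token in the response carries its own log probability, so the same data that feeds perplexity can also show where the model was least sure. The sketch below is one possible way to surface those tokens. The `least_confident_tokens` helper is hypothetical, and it uses stand-in objects with `.token` and `.logprob` attributes (the attribute names this notebook reads from the response) and made-up values instead of a live API call.

```python
import math
from types import SimpleNamespace

def least_confident_tokens(token_logprobs, k=3):
    """Return the k lowest-probability tokens as (token, probability) pairs."""
    ranked = sorted(token_logprobs, key=lambda t: t.logprob)
    return [(t.token, math.exp(t.logprob)) for t in ranked[:k]]

# Stand-in objects mimicking the `.token` / `.logprob` attributes; values are made up.
fake_logprobs = [
    SimpleNamespace(token="Photos", logprob=-0.01),
    SimpleNamespace(token="ynthesis", logprob=-0.001),
    SimpleNamespace(token=" is", logprob=-0.7),
    SimpleNamespace(token=" the", logprob=-0.2),
]

weakest = least_confident_tokens(fake_logprobs, k=2)
print(weakest)
```

Applying `math.exp` to a logprob recovers the token's probability, which is often easier to read than the raw log value.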

In [5]:

# Column width for the row labels printed below.
max_starter_length = max(len(s) for s in ["Prompt:", "Response:", "Tokens:", "Logprobs:", "Perplexity:"])

for prompt in prompts:
    response = get_completion(
        [{"role": "user", "content": prompt}],
        model="gpt-4o-mini",
        logprobs=True,
    )
    # Assumes the first output item is the assistant message and that its
    # first content part holds the text and the per-token logprobs.
    message_content = response.output[0].content[0]
    response_text = message_content.text
    logprobs = [token.logprob for token in message_content.logprobs]
    response_text_tokens = [token.token for token in message_content.logprobs]

    # Right-align tokens and logprobs to the same width so the two rows line up.
    max_token_length = max(len(s) for s in response_text_tokens)
    formatted_response_tokens = [s.rjust(max_token_length) for s in response_text_tokens]
    formatted_lps = [f"{lp:.2f}".rjust(max_token_length) for lp in logprobs]

    # Perplexity: exponentiate the negative mean of the token logprobs.
    perplexity_score = np.exp(-np.mean(logprobs))

    print("\n\n\nPrompt:".ljust(max_starter_length), prompt)
    print("Response:".ljust(max_starter_length), response_text, "\n")
    print("Tokens:".ljust(max_starter_length), " ".join(formatted_response_tokens))
    print("Logprobs:".ljust(max_starter_length), " ".join(formatted_lps))
    print("\nPerplexity:".ljust(max_starter_length), perplexity_score, "\n")





Prompt:  Explain how photosynthesis works in simple terms.
Response:   Photosynthesis is the process that plants, algae, and some bacteria use to make their own food. Here’s how it works in simple terms:

1. **Sunlight**: Plants take in sunlight using a green pigment called chlorophyll, which is found in their leaves.

2. **Water**: Plants absorb water from the soil through their roots.

3. **Carbon Dioxide**: Plants take in carbon dioxide from the air through tiny openings in their leaves called stomata.

4. **Making Food**: Using the energy from sunlight, plants combine water and carbon dioxide to create glucose (a type of sugar) and oxygen. The glucose is used as food for energy and growth.

5. **Oxygen Release**: The oxygen produced during this process is released back into the air, which is essential for us and other living beings to breathe.

In summary, photosynthesis is how plants turn sunlight, water, and carbon dioxide into food and oxygen! 

Tokens:         Photos   ynthe