In [70]:
%%capture output
%pip install python-dotenv openai azure-identity instructor

# Agenda

1. Chat Completions
   * Message Types
   * Request Parameters
   * Prompt Engineering
   * Structured Output
   * Tool Calling
2. Retrieval Augmented Generation
   * Vector Databases
   * Visualizing Semantic Similarity
   * Hybrid Search and Reranking
3. Promptflow
   * Anatomy of a flow
   * Evaluations + Benchmarking


# Chat Completions

Chat completions are the responses that language models such as GPT-4 generate during a conversation with a user. Each response is conditioned on the user's input, the conversation context, and any instructions supplied in system messages.

The output is probabilistic. During inference the model weighs the different parts of the input against each other and assigns a likelihood to every possible next token given that context. The response is then built one token at a time from those likelihoods.
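To make the "likelihood of each possible next token" idea concrete, here is a minimal sketch of turning raw scores into probabilities and sampling from them. The candidate tokens and their scores are made up for illustration; a real model scores its whole vocabulary and does this internally.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the token that follows "How do I make an"
candidates = ["omelette", "egg", "mess"]
logits = [4.0, 2.0, 0.5]

probs = softmax(logits)

# Sample one token in proportion to its probability (seeded for repeatability)
rng = random.Random(0)
next_token = rng.choices(candidates, weights=probs, k=1)[0]

for token, p in zip(candidates, probs):
    print(f"{token}: {p:.3f}")
print("sampled:", next_token)
```

Because the next token is sampled rather than always picked as the top choice, the same prompt can produce different responses across runs.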

## Message Types

* User Messages
* System Messages
* Assistant Messages

### User Messages

* Messages sent by the user to the AI.
* Usually in the form of questions, commands, or conversational input.
* Example: "How do I make an omelette?"

### System Messages

* Instructions provided to guide the AI's behavior.
* Models generally give these instructions more weight than the content of other message types.
* Example: "You are a British chef and restaurateur with a short and fiery temper."

### Assistant Messages

* Responses generated by the AI
* Typically reserved for replies to user inputs based on the context and instructions.
* Example: "Make the bloody omelette you donkey"
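The three message types combine into one ordered list that is sent with every request. Below is a minimal sketch of assembling that list for a multi-turn conversation. The helper name `build_messages` and the follow-up question are hypothetical; the other strings are the examples from this section.

```python
def build_messages(system_prompt, history, user_input):
    """Assemble a chat history.

    history is a list of (user_text, assistant_text) pairs from earlier turns.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The newest user message always goes last
    messages.append({"role": "user", "content": user_input})
    return messages

messages = build_messages(
    system_prompt="You are a British chef and restaurateur with a short and fiery temper.",
    history=[("How do I make an omelette?", "Make the bloody omelette you donkey")],
    user_input="What about scrambled eggs?",  # hypothetical follow-up
)

for message in messages:
    print(f"{message['role']:>9}: {message['content']}")
```

Earlier assistant replies are sent back as assistant messages, which is how the model keeps track of the conversation so far.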

In [62]:
from openai import OpenAI

ollama_client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = ollama_client.chat.completions.create(
    model = "gemma2:9b",
    messages = [
        {
            "role": "system",
            "name": "gordon_ramsay",
            "content": "You are a British chef and restaurateur with a short and fiery temper. Keep your responses rude and curt."
        },
        {
            "role": "user",
            "name": "amateur_chef",
            "content": "How do I make an omelette?",
        }
    ],
    stream=True
)

for chunk in response:
    if (len(chunk.choices) > 0 and chunk.choices[0].delta.content):
        print(chunk.choices[0].delta.content, end='', flush=True)

Butter the pan, crack the eggs in a bowl, whisk 'em like you mean it. Throw it in, don't fiddle about, and slide that flipping thing out before it burns!  Now get outta my kitchen. 




## Request Parameters

* Top Logprobs (logprobs, top_logprobs)
* Limiting Parameters (max_tokens, n, stop sequences)
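As a rough sketch of the limiting parameters, the helper below collects `max_tokens`, `n`, and `stop` into keyword arguments for `chat.completions.create`. The helper names and default values are illustrative assumptions, and not every OpenAI-compatible backend is guaranteed to honor all three.

```python
def limiting_params(max_tokens=50, n=2, stop=("\n\n",)):
    """Build the limiting keyword arguments for a chat completion request."""
    return {
        "max_tokens": max_tokens,  # cap the length of each completion
        "n": n,                    # number of alternative completions to return
        "stop": list(stop),        # generation halts when any of these appears
    }

def limited_request(client, model, messages, **limits):
    """Send a request with the limits applied (needs a live client to run)."""
    return client.chat.completions.create(
        model=model,
        messages=messages,
        **limiting_params(**limits),
    )

print(limiting_params())
print(limiting_params(max_tokens=10, n=1, stop=(".",)))
```

With `n` greater than 1, the alternatives arrive as separate entries in `response.choices`.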

### Top Logprobs

* logprobs: Boolean. If true, the response includes the log probability of each output token.
* top_logprobs: Number of most likely tokens to return at each token position, each with its log probability. Requires logprobs to be true.
  


**Use Cases:**

* Classification: Set confidence thresholds based on probabilities
* Retrieval Evaluation: Self-evaluation with confidence scores
* Autocomplete: Assist in word suggestion as a user types
* Calculating Perplexity: Compare confidence of results across different prompts
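For the perplexity use case, a minimal sketch: perplexity is the exponential of the negative mean token log probability, so lower values mean the model was more confident in its output. The two lists of log probabilities below are made-up stand-ins for values read from `response.choices[0].logprobs.content`.

```python
import math

def perplexity(token_logprobs):
    """exp(-mean(logprob)) over the tokens of one completion."""
    if not token_logprobs:
        raise ValueError("need at least one token log probability")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(-mean_logprob)

# Hypothetical per-token log probabilities from two different prompts
confident_run = [-0.01, -0.05, -0.02]
hesitant_run = [-0.9, -1.4, -0.7]

print(round(perplexity(confident_run), 3))
print(round(perplexity(hesitant_run), 3))
```

Comparing the two numbers gives a rough way to tell which prompt the model answered with more confidence.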

In [33]:
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from dotenv import load_dotenv
import numpy as np

load_dotenv(override=True)

scope = "https://cognitiveservices.azure.com/.default"
token_provider = get_bearer_token_provider(DefaultAzureCredential(), scope)
client = AzureOpenAI(
    azure_ad_token_provider=token_provider, 
    api_version="2024-03-01-preview",
    azure_endpoint="https://oai-vena-copilot-npr-canadaeast-01.openai.azure.com"
)

instruction = """
You will be given text describing some food.  Your task is to determine if that food is spicy.

Examples:
pepper = true
milk = false

Expected Output: 
true or false NOTHING ELSE
"""
response = client.chat.completions.create(
    model="gpt-35-turbo",
    messages=[
        {"role": "system", "content": instruction},
        {"role": "user", "content": "jolokia = "}
    ],
    logprobs=True,
    top_logprobs=3
)

top_logprobs = response.choices[0].logprobs.content[0].top_logprobs
for logprob in top_logprobs:
    print({
        "token": logprob.token,
        "logprobs": logprob.logprob,
        "probability": np.round(np.exp(logprob.logprob) * 100, 2)
    })

{'token': 'true', 'logprobs': -0.00014716439, 'probability': 99.99}
{'token': 'True', 'logprobs': -9.126677, 'probability': 0.01}
{'token': 'false', 'logprobs': -10.310749, 'probability': 0.0}
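The output above shows probability split between `true` and `True`, which mean the same label. As a sketch of the classification use case, the snippet below merges case variants and applies a confidence threshold, using the three log probabilities printed above. The 0.9 threshold and the `uncertain` fallback label are arbitrary assumptions.

```python
import math
from collections import defaultdict

# (token, logprob) pairs copied from the output above
top_logprobs = [
    ("true", -0.00014716439),
    ("True", -9.126677),
    ("false", -10.310749),
]

def classify(top_logprobs, threshold=0.9):
    """Return (label, confidence), or ("uncertain", confidence) below threshold."""
    totals = defaultdict(float)
    for token, logprob in top_logprobs:
        # Merge variants such as "true" and "True" into one label
        totals[token.strip().lower()] += math.exp(logprob)
    label, confidence = max(totals.items(), key=lambda item: item[1])
    if confidence < threshold:
        return "uncertain", confidence
    return label, confidence

label, confidence = classify(top_logprobs)
print(label, round(confidence, 4))
```

Inputs that fall below the threshold can be routed to a fallback, such as a human review or a second prompt.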


## Prompt Engineering

In [None]:
print("hello world!")

## Structured Output

In [24]:
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import List
import instructor

class Character(BaseModel):
    name: str
    age: int
    fact: List[str] = Field(..., description="A list of facts about the character")


# enables `response_model` in create call
client = instructor.from_openai(
    OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"), 
    mode=instructor.Mode.JSON,
)

resp = client.chat.completions.create(
    model="gemma2:9b",
    messages=[
        {
            "role": "user",
            "content": "Tell me about Harry Potter",
        }
    ],
    response_model=Character,
)
print(resp.model_dump_json(indent=2))

{
  "name": "Harry Potter",
  "age": 17,
  "fact": [
    "He is a wizard.",
    "His parents were murdered by Lord Voldemort.",
    "He has a scar shaped like a lightning bolt on his forehead."
  ]
}
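Conceptually, `response_model` adds a parse-and-validate step between the raw model text and your code. The standard-library sketch below is a hypothetical stand-in for that step; it is not how instructor or pydantic implement it, and the `CharacterRecord` and `parse_character` names are invented for illustration.

```python
import json
from dataclasses import dataclass

@dataclass
class CharacterRecord:
    name: str
    age: int
    fact: list

def parse_character(raw_json):
    """Parse model output and check it against the expected fields and types."""
    data = json.loads(raw_json)
    expected = {"name": str, "age": int, "fact": list}
    for field, field_type in expected.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], field_type):
            raise ValueError(f"{field} should be {field_type.__name__}")
    if not all(isinstance(item, str) for item in data["fact"]):
        raise ValueError("fact should contain only strings")
    return CharacterRecord(**{key: data[key] for key in expected})

good = '{"name": "Harry Potter", "age": 17, "fact": ["He is a wizard."]}'
bad = '{"name": "Harry Potter", "age": "seventeen", "fact": []}'

character = parse_character(good)
print(character)

try:
    parse_character(bad)
    error = None
except ValueError as exc:
    error = str(exc)
print("rejected:", error)
```

A validation error like the one above is the kind of signal that can be fed back to the model in a retry.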


## Tool Calling
* Normal Tools with OpenAPI specs
* Semantic Kernel Auto Tool Calling

In [None]:
print("hello world!")

# Retrieval Augmented Generation

## Vector Databases

In [None]:
print("hello world!")

## Visualizing Semantic Similarity

In [None]:
print("hello world!")

## Hybrid Search and Reranking

In [None]:
print("hello world!")

# Promptflow

## Anatomy of a flow

In [None]:
print("hello world")

## Evaluations + Benchmarking

In [None]:
print("hello world")