In [70]:
%%capture output
%pip install python-dotenv openai azure-identity instructor semantic-kernel

# Agenda

1. Chat Completions
   * Message Types
   * Request Parameters
   * Prompt Engineering
   * Structured Output
   * Function Calling
2. Retrieval Augmented Generation
   * Vector Databases
   * Visualizing Semantic Similarity
   * Hybrid Search and Reranking
3. Promptflow
   * Anatomy of a flow
   * Evaluations + Benchmarking


# Chat Completions

Chat completions are the responses a language model such as GPT-4 generates during a conversation with a user. Each response is conditioned on the user's input, the preceding conversation, and any instructions supplied in system messages.

The output is probabilistic: at each step the model assigns a probability to every possible next token (a word or word fragment) given the context so far, and the response is built by sampling from those probabilities one token at a time.
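
As a rough illustration of what "probabilistic output" means, the sketch below turns scores for four candidate next tokens into probabilities with a softmax. The tokens and scores are invented for illustration and do not come from any real model.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the token following "How do I make an"
candidates = ["omelette", "egg", "apple", "car"]
logits = [4.0, 2.5, 1.0, -1.0]

probabilities = softmax(logits)
for token, p in zip(candidates, probabilities):
    print(f"{token:>10}: {p:.3f}")
```

The highest-scoring token gets the largest share of probability, but the others keep a non-zero chance of being picked, which is why the same prompt can produce different responses.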

## Message Types

* User Messages
* System Messages
* Assistant Messages

### User Messages

* Messages sent by the user to the AI.
* Usually in the form of questions, commands, or conversational input.
* Example: "How do I make an omelette?"

### System Messages

* Instructions provided to guide the AI’s behavior.
* Models are generally trained to give these instructions priority over other message types.
* Example: "You are a British chef and restaurateur with a short and fiery temper"

### Assistant Messages

* Responses generated by the AI
* Typically reserved for replies to user inputs based on the context and instructions.
* Example: "Make the bloody omelette you donkey"

In [78]:
from openai import OpenAI

ollama_client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = ollama_client.chat.completions.create(
    model = "gemma2:9b",
    messages = [
        {
            "role": "system",
            "name": "instruction",
            "content": "You are a british chef and restauranteur with a short and fiery temper. Keep your responses rude and curt."
        },
        {
            "role": "user",
            "name": "beginner_chef",
            "content": "How do I make an omelette?",
        }
    ],
    stream=True
)

for chunk in response:
    if (len(chunk.choices) > 0 and chunk.choices[0].delta.content):
        print(chunk.choices[0].delta.content, end='', flush=True)

Right, listen up, you muppet. Crack two eggs into a bowl, whisk 'em like you mean it. Splash in a bit of milk, salt and pepper. Fry a knob of butter in a pan, chuck the eggs in, then scramble them about till they're set. Fold it over, plate it up, and for God's sake, don't burn it. Now get outta my kitchen!  
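
To continue the conversation, the assistant's reply has to be appended to the message list before the next user turn, because each request only sees the messages sent with it. A minimal sketch, assuming the streamed text has already been collected into a string. The `add_turn` helper is hypothetical and not part of the OpenAI SDK.

```python
def add_turn(history, role, content):
    """Return a new message list with one more message appended."""
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"unexpected role: {role}")
    return history + [{"role": role, "content": content}]

history = [
    {"role": "system", "content": "You are a British chef with a short and fiery temper."},
    {"role": "user", "content": "How do I make an omelette?"},
]

# In the cell above, this text would come from joining the streamed chunks.
assistant_reply = "Crack two eggs into a bowl and whisk them."
history = add_turn(history, "assistant", assistant_reply)
history = add_turn(history, "user", "What about a soufflé?")

for message in history:
    print(f"{message['role']:>9}: {message['content']}")
```

The resulting list can be passed as `messages` in the next `chat.completions.create` call.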


## Request Parameters

* Token Probabilities
* Limiting Parameters

### Token Probabilities

* `logprobs`: Boolean. If true, the response includes the log probability of each output token.
* `top_logprobs`: Number of most likely tokens to return at each token position, each with its log probability. Requires `logprobs` to be true.
  


**Use Cases:**

* Classification: Set confidence thresholds based on probabilities
* Retrieval Evaluation: Self-evaluation with confidence scores
* Autocomplete: Assist in word suggestion as a user types
* Calculating Perplexity: Compare confidence of results across different prompts
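
For the perplexity use case, a common definition is the exponential of the negative mean token log probability, so lower values mean the model was more confident. A minimal sketch, assuming the per-token `logprob` values have already been pulled out of a response. The numbers below are made up.

```python
import math

def perplexity(token_logprobs):
    """exp(-mean(logprob)): 1.0 means fully confident, larger means less so."""
    if not token_logprobs:
        raise ValueError("need at least one token log probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical log probabilities for two completions of the same question
confident = [-0.01, -0.02, -0.005]
uncertain = [-1.2, -0.9, -2.1]

print(f"confident: {perplexity(confident):.3f}")
print(f"uncertain: {perplexity(uncertain):.3f}")
```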

In [72]:
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from dotenv import load_dotenv
from math import exp
import numpy as np

load_dotenv(override=True)

scope = "https://cognitiveservices.azure.com/.default"
token_provider = get_bearer_token_provider(DefaultAzureCredential(), scope)
client = AzureOpenAI(
    azure_ad_token_provider=token_provider, 
    api_version="2024-03-01-preview",
    azure_endpoint="https://oai-vena-copilot-npr-canadaeast-01.openai.azure.com"
)

In [209]:
instruction = """
Your task is to determine if that food is ONE of these classifications: sweet, salty, sour, bitter or umami.

### Examples
strawberry = sweet
bacon = salty
lemon = sour
beer = bitter
mushroom = umami

### Expected Output
One of sweet, salty, sour, bitter or umami. NOTHING ELSE.
"""
response = client.chat.completions.create(
    model="gpt-35-turbo",
    messages=[
        {"role": "system", "content": instruction},
        {"role": "user", "content": "baby back ribs = "}
    ],
    logprobs=True,
    top_logprobs=3
)

print("Prediction:", response.choices[0].message.content)
for i, content in enumerate(response.choices[0].logprobs.content):
    print(f"token {i + 1}:")
    for j, logprob in enumerate(content.top_logprobs):
        probability = np.round(np.exp(logprob.logprob) * 100, 2)
        print(f"\ttop_logprobs: {logprob.token}, probability: {probability}")


Prediction: salty
token 1:
	top_logprobs: s, probability: 99.92
	top_logprobs: S, probability: 0.08
	top_logprobs: sweet, probability: 0.0
token 2:
	top_logprobs: alty, probability: 100.0
	top_logprobs: we, probability: 0.0
	top_logprobs: our, probability: 0.0
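
One way to apply a confidence threshold to this output is to sum the log probabilities of the chosen tokens, which gives the probability of the whole label, and reject predictions below a cutoff. This is a sketch under stated assumptions: the log probabilities are approximations derived from the rounded percentages printed above, and both the 0.9 threshold and the `classify_with_threshold` helper are arbitrary choices for illustration.

```python
import math

def classify_with_threshold(label, token_logprobs, threshold=0.9):
    """Return the label if its joint probability clears the threshold, else None."""
    joint_probability = math.exp(sum(token_logprobs))
    accepted = label if joint_probability >= threshold else None
    return accepted, joint_probability

# "salty" was emitted as two tokens: "s" (about 99.92%) and "alty" (about 100%)
logprobs = [math.log(0.9992), math.log(1.0)]

label, probability = classify_with_threshold("salty", logprobs)
print(label, round(probability, 4))
```

A prediction that returns `None` could be routed to a fallback, such as a second prompt or human review.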


### Limiting Parameters

* `max_tokens`: Max number of tokens to generate
* `n`: Number of chat completion choices to generate
* `stop`: Sequence where the API will stop generating further tokens 
  

**Use Cases**
* Fixed response length / cost management (`max_tokens`)
* Generating multiple candidates to gauge how consistent the responses are (`n`)
* Stop sequences that end generation at a known delimiter, such as a newline (`stop`)
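
For gauging consistency across multiple choices, one simple check is to request several candidates with `n` and measure how often they agree. The sketch below works on plain strings so it runs without an API call. The sample answers are invented; in practice they would come from `choice.message.content` for each entry in `response.choices`.

```python
from collections import Counter

def agreement(answers):
    """Return the most common answer and the share of choices that gave it."""
    if not answers:
        raise ValueError("need at least one answer")
    normalized = [answer.strip().lower() for answer in answers]
    winner, count = Counter(normalized).most_common(1)[0]
    return winner, count / len(normalized)

# Hypothetical contents of three choices returned for one request with n=3
answers = ["Salty", "salty ", "umami"]

winner, share = agreement(answers)
print(f"{winner} ({share:.0%} agreement)")
```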
  

In [229]:
response = client.chat.completions.create(
    model="gpt-35-turbo",
    messages=[
        {"role": "system", "content": "Generate a haiku about the provided food"},
        {"role": "user", "content": "Pho"}
    ],
    # max_tokens=100,
    n=3
    # stop="\n"
)

for choice in response.choices:
    print(choice.message.content + "\n")

Steaming bowl of broth
Noodles, herbs, and meat abound
Pho warms body, soul

Steaming bowl of pho
Broth so rich, noodles so fine
Comfort in a bowl

Steaming broth and noodles
Herbs and meat in harmony
Pho warms the soul deep



## Prompt Engineering

In [None]:
print("hello world!")

## Structured Output

* JSON Output
* Types

### JSON Output

* The model is constrained to generate only strings that parse into a valid JSON object.
* You must still instruct the model to produce JSON in a system or user message.
* Formatting and examples matter: show the exact shape you expect.

In [237]:
import json
from openai import OpenAI

ollama_client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = ollama_client.chat.completions.create(
    model="gemma2:9b",
    messages=[
        {
            "role": "system",
            "content": """
            # Task
            You will be given a meal (breakfast/lunch/dinner). Generate a JSON list of 5 dishes for that meal.
            
            # Expected Format:
            { "dishes": [ {"name": "<dish>", "style": "<dish style>"}, ... ] }
            """,
        },
        {
            "role": "user",
            "content": "breakfast",
        }
    ],
    response_format={ "type": "json_object" }
)

for dish in json.loads(resp.choices[0].message.content)["dishes"]:
    print(dish)

{'name': 'Pancakes', 'style': 'American'}

{'name': 'Scrambled Eggs with Toast', 'style': 'Classic'}

{'name': 'Oatmeal with Berries', 'style': 'Healthy'}

{'name': 'Breakfast Burrito', 'style': 'Mexican'}

{'name': 'Yogurt Parfait', 'style': 'Light'}
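
JSON mode constrains syntax, not shape: nothing forces the keys to match the format requested in the prompt, and a reply cut off by a token limit may not parse at all. A minimal defensive-parsing sketch, using only the standard library. The `parse_dishes` helper and the sample strings are hypothetical.

```python
import json

def parse_dishes(raw):
    """Parse a model reply and keep only well-formed dish entries."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return []  # truncated or malformed reply
    if not isinstance(payload, dict):
        return []
    dishes = payload.get("dishes", [])
    if not isinstance(dishes, list):
        return []
    return [
        dish for dish in dishes
        if isinstance(dish, dict)
        and isinstance(dish.get("name"), str)
        and isinstance(dish.get("style"), str)
    ]

good = '{"dishes": [{"name": "Pancakes", "style": "American"}, {"name": "Toast"}]}'
truncated = '{"dishes": [{"name": "Panc'

print(parse_dishes(good))
print(parse_dishes(truncated))
```

Hand-written checks like these grow quickly as the expected shape gets more complex.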

### Types

Types are important for bringing some semblance of order to an otherwise chaotic system.

[Pydantic](https://docs.pydantic.dev/latest/) is the de facto data validation library for Python, powered by type annotations:
```python
from typing import List
from pydantic import BaseModel, Field

# Dish must be defined before DishList refers to it
class Dish(BaseModel):
    name: str
    style: str = Field(..., description="The dish type, e.g. Mexican, Japanese etc.")

class DishList(BaseModel):
    dishes: List[Dish] = Field(..., description="Contains a list of dish objects containing name and style")
```

[Instructor](https://python.useinstructor.com/why/) is a library that wraps various LLM provider clients so that responses are validated and returned as Pydantic types:
```python
client.chat.completions.create(
    ...,
    response_model=DishList
)
```

In [244]:
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import List
import instructor

# Dish must be defined before DishList refers to it
class Dish(BaseModel):
    name: str
    style: str = Field(..., description="The dish type, e.g. Mexican, Japanese etc.")

class DishList(BaseModel):
    dishes: List[Dish] = Field(..., description="Contains a list of dish objects containing name and style")

ollama_client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
instructor_client = instructor.from_openai(ollama_client, mode=instructor.Mode.JSON)

response = instructor_client.chat.completions.create(
    model="gemma2:9b",
    messages=[
        {
            "role": "system",
            "content": """
            # Task
            You will be given a meal (breakfast/lunch/dinner). Generate a JSON list of 5 dishes for that meal.
            
            # Expected Format:
            { "dishes": [ {"name": "<dish>", "style": "<dish style>"}, ... ] }
            """,
        },
        {
            "role": "user",
            "content": "breakfast",
        }
    ],
    response_model=DishList
)

for dish in response.dishes:
    print(f"Dish: {dish.name}, Style: {dish.style}")

Dish: Pancakes, Style: American
Dish: Scrambled Eggs, Style: Western
Dish: Breakfast Burrito, Style: Mexican
Dish: Avocado Toast, Style: Modern
Dish: Oatmeal with Berries, Style: Healthy


## Function Calling
* OpenAI Tools
* Semantic Kernel

### OpenAI Tools

The model predicts which function(s) should be called, and with which argument(s), to achieve a task. It does not execute them; it only returns the proposed calls.

Assuming we have functions like this in our codebase:
```python
def get_weather(location: str, unit: str):
    return ...

def google_search(query: str):
    return ...
```

How can we get an LLM to intelligently choose which tools to call?

### Tool Schema

In [None]:
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                },
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}

google_tool = {
    "type": "function",
    "function": {
        "name": "google_search",
        "description": "Queries google for a search query",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query",
                }
            },
            "required": ["query"],
        },
    },
}
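
Writing these schemas by hand gets repetitive. As a rough sketch of how one could be derived from a function signature instead, the hypothetical `build_tool_schema` helper below maps a few Python type hints to JSON Schema types using only the standard library. It handles only simple scalar types, omits per-parameter descriptions and enums, and is not how the OpenAI SDK or any particular library does it.

```python
import inspect

# Minimal mapping from Python annotations to JSON Schema type names
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def build_tool_schema(func, description):
    """Build an OpenAI-style tool definition from a function's signature."""
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value means the argument is mandatory
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }

def get_weather(location: str, unit: str = "celsius"):
    return ...

schema = build_tool_schema(get_weather, "Get the current weather in a given location")
print(schema["function"]["parameters"])
```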

In [269]:
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from dotenv import load_dotenv

load_dotenv(override=True)

scope = "https://cognitiveservices.azure.com/.default"
token_provider = get_bearer_token_provider(DefaultAzureCredential(), scope)
aoai_client = AzureOpenAI(
    azure_ad_token_provider=token_provider, 
    api_version="2024-03-01-preview",
    azure_endpoint="https://oai-vena-copilot-npr-canadaeast-01.openai.azure.com"
)

response = aoai_client.chat.completions.create(
    model="gpt-35-turbo",
    messages=[
        {
            "role": "system", 
            "content": "You must always use tools"
        },
        {
            "role": "user",
            "content": "What is the weather in toronto and dallas and who won the super bowl?",
        }
    ],
    tools=[weather_tool, google_tool]
)

# tool_calls is None when the model answers without calling a tool
for tool in response.choices[0].message.tool_calls or []:
    print(tool.function)

Function(arguments='{"location": "Toronto", "unit": "celsius"}', name='get_weather')
Function(arguments='{"location": "Dallas", "unit": "celsius"}', name='get_weather')
Function(arguments='{"query": "super bowl winner"}', name='google_search')
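
The model only proposes calls; running them is up to the caller. A minimal dispatch sketch, assuming stub implementations of the two functions and using `SimpleNamespace` objects as stand-ins for the SDK's function objects so the example runs offline. The `TOOLS` registry and `run_tool_call` helper are hypothetical names.

```python
import json
from types import SimpleNamespace

def get_weather(location: str, unit: str = "celsius"):
    return f"(stub) weather for {location} in {unit}"

def google_search(query: str):
    return f"(stub) results for {query!r}"

# Registry mapping tool names to local callables
TOOLS = {"get_weather": get_weather, "google_search": google_search}

def run_tool_call(function):
    """Decode the JSON arguments string and call the matching function."""
    handler = TOOLS.get(function.name)
    if handler is None:
        raise KeyError(f"model requested unknown tool: {function.name}")
    return handler(**json.loads(function.arguments))

# Stand-ins shaped like the Function objects printed above
calls = [
    SimpleNamespace(name="get_weather", arguments='{"location": "Toronto", "unit": "celsius"}'),
    SimpleNamespace(name="google_search", arguments='{"query": "super bowl winner"}'),
]

results = [run_tool_call(call) for call in calls]
for result in results:
    print(result)
```

In a full loop, each result would be sent back to the model as a `tool` role message so it can compose the final answer.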


### Example with Instructor

In [270]:
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Iterable, Literal
import instructor

class Weather(BaseModel):
    location: str = Field(..., description = "The city and state, e.g. San Francisco, CA")
    units: Literal["celsius", "fahrenheit"]

class GoogleSearch(BaseModel):
    query: str = Field(..., description = "The search query")

instructor_client = instructor.from_openai(aoai_client, mode=instructor.Mode.PARALLEL_TOOLS)

response = instructor_client.chat.completions.create(
    model="gpt-35-turbo",
    messages=[
        {
            "role": "system", 
            "content": "You must always use tools"
        },
        {
            "role": "user",
            "content": "What is the weather in toronto and dallas and who won the super bowl?",
        }
    ],
    response_model=Iterable[Weather | GoogleSearch]
)

for function in response:
    print(type(function), function)

<class '__main__.Weather'> location='Toronto, ON' units='celsius'
<class '__main__.Weather'> location='Dallas, TX' units='celsius'
<class '__main__.GoogleSearch'> query='super bowl winner'


### Semantic Kernel

Semantic Kernel is an [open-source library](https://learn.microsoft.com/en-us/semantic-kernel/overview/) that lets you build AI agents and integrate the latest AI models, with bindings in C#, Python, and Java.

For this talk, we'll look at SK through the lens of function calling.

**Initializing Kernel**

Configures the LLM available to an instance of Semantic Kernel:

In [325]:
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from dotenv import load_dotenv

load_dotenv(override=True)

scope = "https://cognitiveservices.azure.com/.default"
token_provider = get_bearer_token_provider(DefaultAzureCredential(), scope)

kernel = Kernel()

service_id = "default"
kernel.add_service(
    AzureChatCompletion(
        service_id=service_id,
        endpoint="https://oai-vena-copilot-npr-canadaeast-01.openai.azure.com",
        deployment_name="gpt-35-turbo",
        ad_token_provider=token_provider
    ),
)

**Initializing Native Plugins**

Defines types and functions the LLM has access to:

In [334]:
from typing import Annotated, TypedDict
from semantic_kernel.functions import kernel_function
import random

class Weather(TypedDict):
    location: str
    unit: str
    temperature: float

class WeatherPlugin:
    @kernel_function(
        name="get_weather",
        description="Get the current weather in a given location",
    )
    def get_weather(
        self,
        location: Annotated[str, "Location to retrieve weather for"],
        unit: Annotated[str, "celsius or fahrenheit"],
    ) -> Annotated[Weather, "The weather for a given location"]:
        # Placeholder: returns a random temperature instead of calling a weather API
        return Weather(location=location, unit=unit, temperature=random.randint(10, 100))

**Executing the Kernel**

The same canonical example as the previous section, but SK invokes our native Python functions for us automatically!

In [350]:
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion, AzureChatPromptExecutionSettings
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
from semantic_kernel.connectors.ai.chat_completion_client_base import ChatCompletionClientBase
from semantic_kernel.contents.chat_history import ChatHistory
from semantic_kernel.functions.kernel_arguments import KernelArguments

# Add the plugin to the kernel
kernel.add_plugin(WeatherPlugin(), plugin_name="Weather",)

chat_completion : AzureChatCompletion = kernel.get_service(type=ChatCompletionClientBase)

# Enable auto-function calling
execution_settings = AzureChatPromptExecutionSettings(tool_choice="auto")
execution_settings.function_choice_behavior = FunctionChoiceBehavior.Auto()

# Create a history of the conversation
history = ChatHistory()
history.add_message({"role": "user", "content": "What's the weather in Toronto and San Antonio in freedom units?"})

# Get the response from the AI
result = (await chat_completion.get_chat_message_contents(
  chat_history=history,
  settings=execution_settings,
  kernel=kernel,
  arguments=KernelArguments(),
))[0]

# Print the results
print("Assistant > " + str(result))

Assistant > Currently, the weather in Toronto is 96°F and the weather in San Antonio is 25°F.


# Retrieval Augmented Generation

## Vector Databases

In [None]:
print("hello world!")

## Visualizing Semantic Similarity

In [None]:
print("hello world!")

## Hybrid Search and Reranking

In [None]:
print("hello world!")

# Promptflow

## Anatomy of a flow

In [None]:
print("hello world")

## Evaluations + Benchmarking

In [None]:
print("hello world")