# Challenge 02 - OpenAI Models & Capabilities

**NOTE:** This challenge is OPTIONAL due to the retirement of several Azure OpenAI models. You can read more about this decision on the Microsoft blog [here](https://techcommunity.microsoft.com/t5/azure-ai-services-blog/announcing-updates-to-azure-openai-service-models/ba-p/3866757). However, in a world where the availability and development of models are always changing, we encourage you to understand the general concepts and material in this Challenge because the comparison techniques utilized can be applicable to scenarios where you are comparing Large Language Models.

In this challenge, you will learn about the different capabilities of OpenAI models and learn how to choose the best model for your use case.

If you deployed some of the deprecated models earlier, you can still use them until July 5, 2024. If you did not deploy those models but have GPT-4 access, you can compare GPT-3.5 to GPT-4 in this challenge. If you do not have GPT-4 access, you can still go through this challenge conceptually to understand how to pick the best model from the ones you have deployed as well as the ones in the model catalog.

Questions you will be able to answer by the end of this challenge:

* How do responses differ for each model?
* What are ways to benchmark the performance of models? 

## 1. Overview on finding the right model for you



### 1.1 Model Families

Azure OpenAI provides access to many different models, grouped by family and capability. A model family typically associates models by their intended task. 

As of _Aug 4, 2023_, the model families available in Azure OpenAI include GPT-3, Codex, and Embeddings; GPT-4 is available by application. For more information, see [Azure OpenAI Service models](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models).


**NOTE:** Some models are not available for new deployments beginning **July 6, 2023**. Deployments created prior to July 6, 2023 remain available to customers until **July 5, 2024**. You may revise the environment file and the model you deploy accordingly. Please refer to the following link for more details: [Azure OpenAI Service legacy models](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/legacy-models)

### 1.2 Model Capacities
The GPT-3 models can understand and generate natural language. The service offers four model capacities, each with a different level of power and speed suitable for different tasks. Davinci is the most capable model, while Ada is the fastest. The following list represents the latest versions of GPT-3 models, ordered by increasing capability.

- text-ada-001
- text-babbage-001
- text-curie-001
- text-davinci-003


[Azure OpenAI models](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/concepts/models)  


| | Similarity embedding | Text search embedding | Code search embedding |
| --- | --- | --- | --- |
|**Description** | These models are good at capturing **semantic similarity** between two or more pieces of text. | These models help measure whether long documents are relevant to a short search query. There are two input types supported by this family: **doc**, for embedding the documents to be retrieved, and **query**, for embedding the search query. | Similar to text search embedding models, there are two input types supported by this family: **code**, for embedding code snippets to be retrieved, and **text**, for embedding natural language search queries. |
|**Use cases** | Clustering, regression, anomaly detection, visualization | Search, context relevance, information retrieval | Code search and relevance |
|**Models** |text-similarity-ada-001 <br> text-similarity-babbage-001 <br> text-similarity-curie-001 <br> text-similarity-davinci-001 | text-search-ada-doc-001  <br> text-search-ada-query-001  <br> text-search-babbage-doc-001  <br> text-search-babbage-query-001  <br> text-search-curie-doc-001  <br>text-search-curie-query-001  <br> text-search-davinci-doc-001  <br> text-search-davinci-query-001 | code-search-ada-code-001 <br> code-search-ada-text-001 <br> code-search-babbage-code-001 <br> code-search-babbage-text-001 | 


### 1.3 Model Taxonomy  
As an example, let's choose a general text GPT-3 model with the second-highest capacity (Curie).

**Model taxonomy**: {family} - {capacity} - {input-type} - {identifier}  

{family}     --> text   (general text GPT-3 model)  
{capacity} --> curie  (curie is second most powerful in ada-babbage-curie-davinci family)  
{input-type} --> n/a    (only specified for search models)  
{identifier} --> 001    (version 001)  

model = "text-curie-001"
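
The naming pattern above can be sketched as a small parser. This is only an illustration, assuming the legacy model names listed in this challenge; `parse_model_name` and `CAPACITIES` are hypothetical helpers and are not part of the `openai` package.

```python
# Hypothetical helper for illustration; not part of the openai package.
CAPACITIES = ("ada", "babbage", "curie", "davinci")

def parse_model_name(name):
    """Split a legacy model name into its taxonomy elements."""
    parts = name.split("-")
    # Locate the capacity token; everything before it is treated as the family.
    idx = next(i for i, part in enumerate(parts) if part in CAPACITIES)
    # Anything between the capacity and the trailing identifier is the input type.
    input_type = "-".join(parts[idx + 1:-1]) or None
    return {
        "family": "-".join(parts[:idx]),
        "capacity": parts[idx],
        "input_type": input_type,
        "identifier": parts[-1],
    }

print(parse_model_name("text-curie-001"))
print(parse_model_name("text-search-ada-doc-001"))
```

Names that contain none of the four capacity tokens (for example chat models) do not follow this pattern, and the sketch raises `StopIteration` for them.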

| Element | Description |
| --- | --- |
|**{family}** | The model family of the model. For example, GPT-3 models use text, while Codex models use code.|
|**{capacity}** | The relative capacity of the model. For example, GPT-3 models include ada, babbage, curie, and davinci, with increasing capacity. |
|**{input-type}** | (Embeddings models only) The input type of the embedding supported by the model. For example, text search embedding models support doc and query. | 
|**{identifier}** |The version identifier of the model.|

While Davinci is the most capable, the other models provide significant speed advantages. We recommend starting with Davinci while experimenting, because it produces the best results and helps you validate the value that Azure OpenAI can provide. Once you have a prototype working, you can optimize your model choice for the best latency/performance balance for your application.

### 1.4 Pricing Details

For the most up-to-date information, check out the Azure OpenAI [pricing page](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/).



### 1.5 Quotas and Limits

**NOTE:** The limits below are subject to change. As you move towards production and your solution scales, you might need higher limits. If a quota increase is required, please fill out the form found here: [Azure OpenAI Service: Request for Quota Increase](https://aka.ms/oai/quotaincrease)

| Limit Name | Limit Value |
| --- | --- |
| OpenAI resources per region per Azure subscription | 3 |
| Requests per minute per model* | Davinci-models (002 and later): 120<br>ChatGPT model: 300<br>GPT-4 models: 18<br>All other models: 300 |
| Tokens per minute per model* | Davinci-models (002 and later): 40,000<br>ChatGPT model: 120,000<br>GPT-4 8k model: 10,000<br>GPT-4 32k model: 32,000<br>All other models: 120,000 |
| Max fine-tuned model deployments* | 2 |
| Ability to deploy same model to multiple deployments | Not allowed |
| Total number of training jobs per resource | 100 |
| Max simultaneous running training jobs per resource | 1 |
| Max training jobs queued | 20 |
| Max files per resource | 50 |
| Total size of all files per resource | 1 GB |
| Max training job time (job will fail if exceeded) | 720 hours |
| Max training job size (tokens in training file) x (# of epochs) | 2 billion |
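
To see how the two per-minute limits interact, here is a rough calculation. It is a sketch under stated assumptions: the numbers are copied from the table above (and may have changed since), the figure of 1,000 tokens per request is made up for the example, and `LIMITS` and `max_requests_per_minute` are hypothetical names.

```python
# Limits copied from the table above; they are subject to change.
LIMITS = {
    "davinci": {"rpm": 120, "tpm": 40_000},
    "chatgpt": {"rpm": 300, "tpm": 120_000},
    "gpt-4-8k": {"rpm": 18, "tpm": 10_000},
}

def max_requests_per_minute(model, tokens_per_request):
    """Return how many requests per minute fit under both limits."""
    limit = LIMITS[model]
    # The token budget caps how many requests of this size can be sent.
    by_tokens = limit["tpm"] // tokens_per_request
    return min(limit["rpm"], by_tokens)

# Assumption for illustration: each request consumes about 1,000 tokens.
for name in LIMITS:
    print(name, max_requests_per_minute(name, tokens_per_request=1_000))
```

With requests of this size, the tokens-per-minute limit is reached before the requests-per-minute limit for all three entries, so prompt length matters as much as call frequency when estimating throughput.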

### 1.6 Model Best Use Cases

Here is some general guidance on well-suited applications that tend to differentiate models. Note that these are not hard and fast rules, and oftentimes experimentation and benchmarking are important to making the best decision for your solution.

|Model|Use Cases|
|---|---|
|Davinci| Complex intent, cause and effect, summarization for audience|
|Curie|Language translation, complex classification, text sentiment, summarization|
|Babbage|Moderate classification, semantic search classification|
|Ada|Parsing text, simple classification, address correction, keywords|

### 1.7 Model Selection Best Practices
While Davinci is the most capable, the other models can provide significant advantages such as speed (low latency). We recommend starting with Davinci while experimenting, because it produces the best results and helps you validate the value that Azure OpenAI can provide. 

Once you have a prototype working, you can then optimize your model choice with the best latency/performance balance for your application.

## 2. Let's Start Implementation

If you don't already have the pandas, openai, python-dotenv, plotly, or scikit-learn packages installed on your compute, the following cell will install them.

In [None]:
%pip install --upgrade pandas
# The cells below use the pre-1.0 openai SDK (openai.Completion, openai.embeddings_utils)
%pip install --upgrade "openai<1"
%pip install --upgrade python-dotenv
%pip install --upgrade plotly scikit-learn

In [None]:
import openai
import os
import json
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())


Set up your environment to access your OpenAI keys. Refer to your OpenAI resource in the Azure Portal to retrieve information regarding your OpenAI endpoint and keys.

For security purposes, store your sensitive information in a .env file.

In [None]:
API_KEY = os.getenv("OPENAI_API_KEY")
assert API_KEY, "ERROR: Azure OpenAI Key is missing"
openai.api_key = API_KEY

RESOURCE_ENDPOINT = os.getenv("OPENAI_API_BASE","").strip()
assert RESOURCE_ENDPOINT, "ERROR: Azure OpenAI Endpoint is missing"
assert "openai.azure.com" in RESOURCE_ENDPOINT.lower(), "ERROR: Azure OpenAI Endpoint should be in the form: \n\n\t<your unique endpoint identifier>.openai.azure.com"

openai.api_base = RESOURCE_ENDPOINT
openai.api_type = os.getenv("OPENAI_API_TYPE")
openai.api_version = os.getenv("OPENAI_API_VERSION")

chat_model=os.getenv("CHAT_MODEL_NAME")
davinci_model=os.getenv("TEXT_DAVINCI_MODEL_NAME")
curie_model=os.getenv("TEXT_CURIE_MODEL_NAME")
babbage_model=os.getenv("TEXT_BABBAGE_MODEL_NAME")
ada_model=os.getenv("TEXT_ADA_MODEL_NAME")
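
Because `os.getenv` returns `None` for anything missing from the `.env` file, a quick check of all expected variables can save a confusing error later. The sketch below uses the variable names from the cell above; `find_missing`, `REQUIRED_VARS`, and `DEPLOYMENT_VARS` are hypothetical names introduced for this example.

```python
import os

# Variable names taken from the configuration cell above.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "OPENAI_API_BASE",
    "OPENAI_API_TYPE",
    "OPENAI_API_VERSION",
]
DEPLOYMENT_VARS = [
    "CHAT_MODEL_NAME",
    "TEXT_DAVINCI_MODEL_NAME",
    "TEXT_CURIE_MODEL_NAME",
    "TEXT_BABBAGE_MODEL_NAME",
    "TEXT_ADA_MODEL_NAME",
]

def find_missing(names, env=None):
    """Return the names that are unset or blank in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]

print("Missing settings:", find_missing(REQUIRED_VARS + DEPLOYMENT_VARS))
```

If you did not deploy one of the retired models, its deployment variable will be reported as missing; in that case skip that model in the comparisons that follow.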

### 2.0 Helper Functions
Throughout this course, we will use OpenAI's `gpt-3.5-turbo` model and the [chat completions endpoint](https://platform.openai.com/docs/guides/chat). 

These helper functions make it easier to send prompts and inspect the generated outputs.

**timer** is a decorator that helps us monitor and compare the latency of each model.

**get_completion** helps create the OpenAI response using the text completion model of your choice.

**get_chat_completion** helps create the OpenAI response using the chat model of your choice.

**get_completion_from_messages** helps create the OpenAI response using the chat model of your choice, enabling chat history.


In [None]:
import functools
import time

def timer(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start_time = time.perf_counter()
        value = func(*args, **kwargs)
        end_time = time.perf_counter()
        run_time = end_time - start_time
        print("Finished {} in {} secs".format(repr(func.__name__), round(run_time, 3)))
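        # Append the latency to the wrapped function's result, whether it
        # returns a tuple such as (text, tokens) or a single value such as an embedding.
        if isinstance(value, tuple):
            return (*value, round(run_time, 3))
        return value, round(run_time, 3)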

    return wrapper
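
A single timing can be noisy, since network conditions and service load vary between calls. One option is to repeat the call and look at the median. The sketch below is a minimal illustration; `measure_latency` is a hypothetical helper, and the built-in `sum` stands in for a real model call so the example runs without an API key.

```python
import statistics
import time

def measure_latency(func, *args, repeats=3, **kwargs):
    """Call func several times and return the list of durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return durations

# Stand-in workload; replace with a real model call when comparing deployments.
durations = measure_latency(sum, range(100_000), repeats=5)
print("median secs:", round(statistics.median(durations), 6))
```

Keep in mind that every repeat of a real model call consumes tokens and counts against the quotas listed in section 1.5.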

In [None]:
@timer
def get_completion(prompt, model=davinci_model):
    response = openai.Completion.create(
        engine=model,
        prompt=prompt,
        temperature=0, # this is the degree of randomness of the model's output
        max_tokens = 500,
        top_p = 1.0,
    )
    return response.choices[0].text, response['usage']['total_tokens']

In [None]:
@timer
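def get_chat_completion(prompt, model=chat_model):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        engine=model, # Azure OpenAI expects the deployment name as `engine`
        messages=messages,
        temperature=0, # this is the degree of randomness of the model's output
        max_tokens = 200,
        top_p = 1.0,
    )
    # Return the token count too, so the timed result is (text, tokens, time) like get_completion
    return response.choices[0].message["content"], response['usage']['total_tokens']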

In [None]:
@timer
def get_completion_from_messages(messages, model=chat_model, temperature=0):
    response = openai.ChatCompletion.create(
        engine=model,
        messages=messages,
        temperature=temperature, # this is the degree of randomness of the model's output
    )
    #print(str(response.choices[0].message))
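    # Return the token count too, so the timed result is (text, tokens, time) like get_completion
    return response.choices[0].message["content"], response['usage']['total_tokens']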
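
Chat history is just a list of messages, each with a `role` (`system`, `user`, or `assistant`) and a `content` string. The sketch below shows one way to build such a list before passing it to `get_completion_from_messages`; `add_turn` is a hypothetical helper and the conversation text is made up for illustration.

```python
def add_turn(messages, role, content):
    """Return a new message list with one more turn appended."""
    if role not in ("system", "user", "assistant"):
        raise ValueError(f"unexpected role: {role}")
    return messages + [{"role": role, "content": content}]

# The system message sets the behavior; later turns carry the conversation so far.
history = [{"role": "system", "content": "You are a helpful sports commentator."}]
history = add_turn(history, "user", "Who hosted the 2020 Summer Olympics?")
history = add_turn(history, "assistant", "Tokyo, Japan hosted the Games in 2021.")
history = add_turn(history, "user", "Why were they delayed?")

for message in history:
    print(f"{message['role']}: {message['content']}")
```

Because the earlier turns are sent along with the final question, the model can resolve "they" to the Games mentioned before. The whole list counts toward the token usage of the request.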


### 2.1 Summarize Text

In [None]:
import pandas as pd

# 'price' holds the total token count of each call; 'time' holds the latency in seconds
model_pricing = pd.DataFrame(columns=['model', 'price', 'time'])

In [None]:
text = f"""
The Olympic Games Tokyo 2020 reached a global broadcast audience of 3.05 billion people, according to independent research conducted on behalf of the International Olympic Committee (IOC). Official coverage on Olympic broadcast partners\' digital platforms alone generated 28 billion video views in total – representing a 139 per cent increase compared with the Olympic Games Rio 2016 and underlining the changing media landscape and Tokyo 2020\'s designation as the first streaming Games and the most watched Olympic Games ever on digital platforms.Sony and Panasonic partnered with NHK to develop broadcasting standards for 8K resolution television, with a goal to release 8K television sets in time for the 2020 Summer Olympics. In early 2019, Italian broadcaster RAI announced its intention to deploy 8K broadcasting for the Games. NHK broadcast the opening and closing ceremonies, and coverage of selected events in 8K. Telecom company NTT Docomo signed a deal with Finland\'s Nokia to provide 5G-ready baseband networks in Japan in time for the Games.The Tokyo Olympics were broadcast in the United States by NBCUniversal networks, as part of a US$4.38 billion agreement that began at the 2014 Winter Olympics in Sochi. The United States Olympic & Paralympic Committee asserted that a "right of abatement" clause in the contract was triggered by the delay of the Games to 2021, requiring the IOC to "negotiate in good faith an equitable reduction in the applicable broadcast rights payments" by NBC, which remains one of IOC\'s biggest revenue streams. According to NBCUniversal CEO Jeff Shell, the Tokyo games could be the most profitable Olympics in NBC\'s history. The Tokyo games were NBC\'s first Olympics broadcast under current president Susan Rosner Rovner.In Europe, this was the first Summer Olympics under the IOC\'s exclusive pan-European rights deal with Eurosport, which began at the 2018 Winter Olympics and is contracted to run through 2024. 
The rights for the 2020 Summer Olympics covered almost all of Europe; a pre-existing deal with a marketer excludes Russia. Eurosport planned to sub-license coverage to free-to-air networks in each territory, and other channels owned by Discovery, Inc. subsidiaries. In the United Kingdom, these were set to be the last Games with rights owned primarily by the BBC, although as a condition of a sub-licensing agreement due to carry into the 2022 and 2024 Games, Eurosport holds exclusive pay television rights. In France, these were the last Games whose rights are primarily owned by France Télévisions. Eurosport debuted as pay television rightsholder, after Canal+ elected to sell its pay television rights as a cost-saving measure.In Canada, the 2020 Games were shown on CBC/Radio-Canada platforms, Sportsnet, TSN and TLN. In Australia, they were aired by Seven Network. In the Indian subcontinent, they were aired by Sony Pictures Networks India (SPN).
"""
prompt = f"""
Summarize the text delimited by triple backticks into a single sentence.
```{text}```
"""

davinci_response, davinci_price, davinci_time = get_completion(prompt, model=davinci_model)
curie_response, curie_price, curie_time = get_completion(prompt, model=curie_model)
babbage_response, babbage_price, babbage_time = get_completion(prompt, model=babbage_model)
ada_response, ada_price, ada_time = get_completion(prompt, model=ada_model)
print(f"Davinci Response: {davinci_response}\n")
print(f"Curie Response: {curie_response}\n")
print(f"Babbage Response: {babbage_response}\n")
print(f"Ada Response: {ada_response}\n")

new_rows = pd.DataFrame([{'model': 'davinci', 'price': davinci_price, 'time': davinci_time},
                                       {'model': 'curie', 'price': curie_price, 'time': curie_time},
                                       {'model': 'babbage', 'price': babbage_price, 'time': babbage_time},
                                       {'model': 'ada', 'price': ada_price, 'time': ada_time}])
model_pricing = pd.concat([model_pricing, new_rows], ignore_index=True)

_**Takeaway: Davinci and Curie models are more suitable for tasks like summarization. The answer is more concise and takes less time.**_
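
The four near-identical calls above can also be written as a loop. This is a sketch, assuming the completion function returns `(text, tokens, seconds)` as `get_completion` does in the cells above; `compare_models` and `fake_complete` are hypothetical names, and the stand-in function is there only so the example runs without an API call.

```python
def compare_models(prompt, models, complete):
    """Run one prompt against several deployments and collect one row per model."""
    rows = []
    for label, deployment in models.items():
        response, tokens, seconds = complete(prompt, model=deployment)
        rows.append({"model": label, "price": tokens, "time": seconds, "response": response})
    return rows

# Stand-in for get_completion: returns a fake response, a word count, and zero latency.
def fake_complete(prompt, model):
    return f"[{model}] summary", len(prompt.split()), 0.0

# Deployment names here are placeholders; use your own deployment names.
rows = compare_models(
    "Summarize this text.",
    {"davinci": "my-davinci", "curie": "my-curie"},
    fake_complete,
)
for row in rows:
    print(row["model"], row["price"], row["time"], row["response"])
```

In the notebook you would pass `get_completion` and a dictionary built from `davinci_model`, `curie_model`, `babbage_model`, and `ada_model`, then append the rows to `model_pricing` with `pd.concat` as before.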

#### Student Task #1:
With the tactics learned in CH1, edit the prompt to get a more concise answer from the assistant. Do you notice any difference in the results?

In [None]:
# edit the prompt to get a more concise answer from the assistant

### 2.2 Summarization for a targeted audience

In [None]:
prompt = f"""
Summarize the text delimited by triple backticks into a single sentence for a 7-year-old to understand.
```{text}```
"""

davinci_response, davinci_price, davinci_time = get_completion(prompt, model=davinci_model)
curie_response, curie_price, curie_time = get_completion(prompt, model=curie_model)
babbage_response, babbage_price, babbage_time = get_completion(prompt, model=babbage_model)
ada_response, ada_price, ada_time = get_completion(prompt, model=ada_model)
print(f"Davinci Response: {davinci_response}\n")
print(f"Curie Response: {curie_response}\n")
print(f"Babbage Response: {babbage_response}\n")
print(f"Ada Response: {ada_response}\n")

new_rows = pd.DataFrame([{'model': 'davinci', 'price': davinci_price, 'time': davinci_time},
                                       {'model': 'curie', 'price': curie_price, 'time': curie_time},
                                       {'model': 'babbage', 'price': babbage_price, 'time': babbage_time},
                                       {'model': 'ada', 'price': ada_price, 'time': ada_time}])
model_pricing = pd.concat([model_pricing, new_rows], ignore_index=True)

#### Student Task #2:
Edit the prompt to summarize the text as an eye-catching newspaper headline. Compare the results across models.

In [None]:
# Edit the prompt to summarize the text as an eye-catching newspaper headline

### 2.3 Summarize Cause & Effect

In [None]:
prompt = f"""
Summarize the major event's cause and effect for the text delimited by triple backticks into a single sentence less than 50 words.
```{text}```
"""

davinci_response, davinci_price, davinci_time = get_completion(prompt, model=davinci_model)
curie_response, curie_price, curie_time = get_completion(prompt, model=curie_model)
babbage_response, babbage_price, babbage_time = get_completion(prompt, model=babbage_model)
ada_response, ada_price, ada_time = get_completion(prompt, model=ada_model)
print(f"Davinci Response: {davinci_response}\n")
print(f"Curie Response: {curie_response}\n")
print(f"Babbage Response: {babbage_response}\n")
print(f"Ada Response: {ada_response}\n")

new_rows = pd.DataFrame([{'model': 'davinci', 'price': davinci_price, 'time': davinci_time},
                                       {'model': 'curie', 'price': curie_price, 'time': curie_time},
                                       {'model': 'babbage', 'price': babbage_price, 'time': babbage_time},
                                       {'model': 'ada', 'price': ada_price, 'time': ada_time}])
model_pricing = pd.concat([model_pricing, new_rows], ignore_index=True)

#### Student Task #3: Model Comparison
Use the model comparison chart to briefly summarize your findings after comparing the output and time taken for each model, e.g. Davinci: Performance (+++), Time (+). You may also use other Python packages to visualize your findings.

|Model| Performance  |Time|
|---|---|---|
|Davinci|||
|Curie|||
|Babbage|||
|Ada|||


#### Student Task #4: Text Classification
Edit the prompt to make the models generate key topic categories for the text. Compare the performance of the different models.

In [None]:
# Edit the prompt to make the models generate key topic categories for the text

#### Student Task #5:
Edit the prompt to make the models generate more precise results. Compare different model performance.

In [None]:
# Edit the prompt to make the models generate more precise results. 

#### Student Task #6: Model Comparison

Write code to create two bar charts comparing the **price** and **time for completion** between the models. We recommend using the `matplotlib.pyplot` library for making visualizations.

Instructions for completion:

* Use the `model_pricing` dataframe to calculate the averages of price and time for each model
* Produce the pricing bar chart in a currency amount. Note that the `price` column in the `model_pricing` dataframe is in the unit of tokens. Refer to the Azure [OpenAI pricing page](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/) to convert the units.
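
The unit conversion in the second bullet is simple arithmetic: average tokens, divided by 1,000, multiplied by the price per 1,000 tokens. The sketch below shows only that step. The prices are placeholders and NOT real Azure prices, the sample rows are invented, and `average_cost` is a hypothetical helper.

```python
def average_cost(rows, price_per_1k):
    """Average the token counts per model and convert them to a currency amount."""
    tokens_by_model = {}
    for row in rows:
        tokens_by_model.setdefault(row["model"], []).append(row["price"])
    return {
        model: (sum(tokens) / len(tokens)) / 1000 * price_per_1k[model]
        for model, tokens in tokens_by_model.items()
    }

# Invented sample rows; in the notebook, model_pricing.to_dict("records") yields this shape.
rows = [
    {"model": "davinci", "price": 1000},
    {"model": "davinci", "price": 3000},
    {"model": "ada", "price": 500},
]
# Placeholder prices per 1,000 tokens; look up current values on the pricing page.
costs = average_cost(rows, {"davinci": 0.10, "ada": 0.01})
print(costs)
```

The resulting dictionary maps each model to an average cost per call, which is the quantity to plot in the first bar chart.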

In [None]:
""" STUDENT CHALLENGE """

import matplotlib.pyplot as plt

### 1. Bar chart to compare pricing


### 2. Bar chart to compare time for completion

### 2.4 Generate Nicknames

#### Student Task #7:
Use different models to create nicknames for players from example words. Compare the performance of the different models. (You can set the temperature to a higher value to increase randomness and get more inventive responses.)

Player description: The champion of Men's 100 metre freestyle swimming. Seed words: fast, strong, talented. Nicknames: Swimming Genius, Dark Horse, 100-Metre-Freestyle Killer

Player description: The champion of Women's Figure Skating. Seed words: elegant, talented, soft.
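
The two player descriptions above form a few-shot prompt: one completed example followed by an unfinished one for the model to continue. As a starting point, such a prompt can be assembled from structured data. This is a sketch; `build_nickname_prompt` is a hypothetical helper and the exact layout of the prompt is an assumption you are free to change.

```python
def build_nickname_prompt(examples, description, seed_words):
    """Assemble a few-shot prompt that ends where the model should continue."""
    lines = []
    for example in examples:
        lines.append(f"Player description: {example['description']}")
        lines.append(f"Seed words: {', '.join(example['seed_words'])}.")
        lines.append(f"Nicknames: {', '.join(example['nicknames'])}")
        lines.append("")  # blank line between examples
    lines.append(f"Player description: {description}")
    lines.append(f"Seed words: {', '.join(seed_words)}.")
    lines.append("Nicknames:")
    return "\n".join(lines)

examples = [{
    "description": "The champion of Men's 100 metre freestyle swimming.",
    "seed_words": ["fast", "strong", "talented"],
    "nicknames": ["Swimming Genius", "Dark Horse", "100-Metre-Freestyle Killer"],
}]
prompt = build_nickname_prompt(
    examples,
    "The champion of Women's Figure Skating.",
    ["elegant", "talented", "soft"],
)
print(prompt)
```

Ending the prompt with `Nicknames:` leaves the completion models an obvious place to continue, and the same string can be sent to each deployment for comparison.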

In [None]:
# your code:

#### Model Comparison
|Model| Performance  |Time|Tokens|Pricing |
|---|---|---|---|---|
|Davinci|||||
|Curie|||||
|Babbage|||||
|Ada|||||

### 2.5 Embeddings
This section focuses on how to retrieve embeddings using different embedding models, and find similarity between documents. 
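
Similarity between two embeddings is commonly measured with cosine similarity: the dot product of the two vectors divided by the product of their lengths. The sketch below computes it with the standard library on toy 3-dimensional vectors; real embeddings have far more dimensions, and `cosine_sim` is a hypothetical name chosen so it does not clash with any library function.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings.
doc_a = [1.0, 2.0, 0.0]
doc_b = [2.0, 4.0, 0.0]   # same direction as doc_a
doc_c = [0.0, 0.0, 3.0]   # orthogonal to doc_a

print(cosine_sim(doc_a, doc_b))
print(cosine_sim(doc_a, doc_c))
```

Values close to 1 mean the vectors point in nearly the same direction, while values near 0 mean they are unrelated. Scores produced by different embedding models are not directly comparable, so compare rankings within one model rather than raw numbers across models.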

#### Student Task #8:
Compare the summaries of two swimming events at the 2020 Summer Olympics using the data provided below.

See whether the results differ when you use different embedding models.

In [None]:
# get_embedding is defined below with the timer wrapper, so only cosine_similarity is imported here
from openai.embeddings_utils import cosine_similarity

In [None]:
import pandas as pd
game_summary = [
    "The mixed 100 metre medley relay event at the 2020 Summer Olympics was held in 2021 at the Tokyo Aquatics Centre. These Games marked the first time to feature a mixed-gender swimming event in the program. Each 4-person team features two male and two female swimmers in no particular order. The medals for the competition were presented by Kirsty Coventry IOC Executive Board Member, Zimbabwe; Olympian, 2 Gold Medals, 4 Silver Medals, 1 Bronze Medal, and the medalists bouquets were presented by Errol Clarke, FINA Bureau Member; Barbados.",
    "The men's 200 metre breaststroke event at the 2020 Summer Olympics was held from 27 to 29 July 2021 at the Tokyo Aquatics Centre. It was the event's twenty-sixth consecutive appearance, having been held at every edition since 1908."
]

game_highlight = [
    'The 2020 Summer Olympics featured the first ever mixed-gender swimming event, the 100 metre medley relay. Medals were presented by Kirsty Coventry and bouquets by Errol Clarke.',
    "The men's 200 metre breaststroke event was held at the 2020 Summer Olympics in Tokyo, making it the event's 26th consecutive appearance since 1908."
]

olympics_game_df = pd.DataFrame({"summary":game_summary, "qualification":game_highlight})

olympics_game_df.head()   

In [None]:
@timer
def get_embedding(text, model="text-embedding-ada"):
    response = openai.Embedding.create(
        input=text,
        engine=model
    )
    return response["data"][0]["embedding"]

In [None]:
model = "text-embedding-ada"


In [None]:
model = "text-similarity-ada"


In [None]:
model = "text-similarity-curie"


#### Model Comparison
|Model| Performance  |Time|
|---|---|---|
|text-embedding-ada-002|||
|text-similarity-ada-001|||
|text-similarity-curie-001|||