# Lab 01 - OpenAI Quickstart

# Overview  
"Large language models are functions that map text to text. Given an input string of text, a large language model tries to predict the text that will come next"(1). This quickstart notebook introduces high-level LLM concepts, the core package requirements for getting started with AML, a soft introduction to prompt design, and several short examples of different use cases.  

For more quickstart examples, refer to the official [Azure OpenAI quickstart documentation](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/quickstart?pivots=programming-language-studio).

## Table of Contents  

[Overview](#overview)  
[How to use OpenAI Service](#how-to-use-openai-service)  
[1. Creating your OpenAI Service](#1.-creating-your-openai-service)  
[2. Installation](#2.-installation)    
[3. Credentials](#3.-credentials)  

[Use Cases](#use-cases)    
[1. Summarize Text](#1.-summarize-text)  
[2. Classify Text](#2.-classify-text)  
[3. Generate New Product Names](#3.-generate-new-product-names)  
[4. Fine Tune a Classifier](#4.-fine-tune-a-classifier)  
[5. Embeddings!](#5.-embeddings!)

[References](#references)

### Getting started with Azure OpenAI Service

New customers will need to [apply for access](https://aka.ms/oai/access) to Azure OpenAI Service.  
After approval is complete, customers can log in to the Azure portal, create an Azure OpenAI Service resource, and start experimenting with models via the studio.  

[Great resource for getting started quickly](https://techcommunity.microsoft.com/t5/educator-developer-blog/azure-openai-is-now-generally-available/ba-p/3719177)


### Build your first prompt  
This short exercise provides a basic introduction to submitting prompts to an OpenAI model for a simple task: summarization.  

![](images/generative-AI-models-reduced.jpg)  


**Steps**:  
1. Install the OpenAI library in your Python environment  
2. Load standard helper libraries and set the credentials for the Azure OpenAI Service resource you've created  
3. Choose a model for your task  
4. Create a simple prompt for the model  
5. Submit your request to the model API!

### 1. Install OpenAI

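```python
# The cells below use the pre-1.0 openai interface (openai.Completion), so pin the version
%pip install "openai<1.0"
```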
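The cells in this lab call `openai.Completion.create`, an interface that was removed in version 1.0 of the `openai` Python package. As a minimal sketch, the hypothetical helper `is_legacy_openai_sdk` below checks whether an installed version string predates 1.0; it assumes a version string that begins with an integer major number (for example `0.28.1`).

```python
from importlib.metadata import PackageNotFoundError, version


def is_legacy_openai_sdk(version_string: str) -> bool:
    """Return True if the version is below 1.0, i.e. still offers openai.Completion.

    Hypothetical helper; assumes the string starts with an integer major version.
    """
    major = int(version_string.split(".")[0])
    return major < 1


if __name__ == "__main__":
    try:
        installed = version("openai")
        print(installed, "-> legacy Completion API available:", is_legacy_openai_sdk(installed))
    except PackageNotFoundError:
        print("openai is not installed in this environment")
```

If the check prints `False`, reinstall with the pinned version before running the remaining cells.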

### 2. Import helper libraries and instantiate credentials

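```python
import os
import openai  # this lab uses the pre-1.0 SDK interface (openai<1.0)

# Point the SDK at Azure OpenAI instead of api.openai.com
openai.api_type = "azure"
openai.api_version = "2022-12-01"

# Prefer an environment variable (example name) over pasting the key into the notebook
API_KEY = os.getenv("AZURE_OPENAI_KEY", "<USE_YOUR_KEY>")
# Fail early if the key is empty or still the placeholder
assert API_KEY and API_KEY != "<USE_YOUR_KEY>", "ERROR: Azure OpenAI Key is missing"
openai.api_key = API_KEY

# Endpoint of your Azure OpenAI resource (example variable name); replace the default with your own
RESOURCE_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT", "https://openai-bootcamp-mma.openai.azure.com/")
assert RESOURCE_ENDPOINT, "ERROR: Azure OpenAI Endpoint is missing"
assert "openai.azure.com" in RESOURCE_ENDPOINT.lower(), "ERROR: Azure OpenAI Endpoint should be in the form: \n\n\t<your unique endpoint identifier>.openai.azure.com"
openai.api_base = RESOURCE_ENDPOINT
```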
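The substring check on the endpoint accepts any URL that merely contains `openai.azure.com` somewhere. A stricter alternative, sketched below as a hypothetical helper, parses the URL and requires HTTPS and a host ending in `.openai.azure.com`, assuming endpoints follow the `https://<resource-name>.openai.azure.com/` form.

```python
from urllib.parse import urlparse


def is_azure_openai_endpoint(url: str) -> bool:
    """Return True if url looks like https://<resource-name>.openai.azure.com/.

    Hypothetical helper: checks the parsed scheme and hostname rather than
    searching for a substring anywhere in the URL.
    """
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()  # hostname is None when no scheme is given
    return parsed.scheme == "https" and host.endswith(".openai.azure.com")


print(is_azure_openai_endpoint("https://openai-bootcamp-mma.openai.azure.com/"))
print(is_azure_openai_endpoint("https://example.com/?next=openai.azure.com"))
```

The second URL passes the substring check but fails this one, because its host is `example.com`.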

### 3. Finding the right model  
The GPT-3 models can understand and generate natural language. The service offers four model capabilities, each with different levels of power and speed suitable for different tasks. Davinci is the most capable model, while Ada is the fastest. The following list represents the latest versions of GPT-3 models, ordered by increasing capability(1).  

* text-ada-001
* text-babbage-001
* text-curie-001
* text-davinci-003  

[Azure OpenAI models](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/concepts/models)  
![](images/a-b-c-d-models-reduced.jpg)  

| | Similarity embedding | Text search embedding | Code search embedding |
| --- | --- | --- | --- |
|**Description** | These models are good at capturing **semantic similarity** between two or more pieces of text. | These models help measure whether long documents are relevant to a short search query. There are two input types supported by this family: **doc**, for embedding the documents to be retrieved, and **query**, for embedding the search query. | Similar to text search embedding models, there are two input types supported by this family: **code**, for embedding code snippets to be retrieved, and **text**, for embedding natural language search queries. |
|**Use cases** | Clustering, regression, anomaly detection, visualization | Search, context relevance, information retrieval | Code search and relevance |
|**Models** | text-similarity-ada-001 <br> text-similarity-babbage-001 <br> text-similarity-curie-001 <br> text-similarity-davinci-001 | text-search-ada-doc-001 <br> text-search-ada-query-001 <br> text-search-babbage-doc-001 <br> text-search-babbage-query-001 <br> text-search-curie-doc-001 <br> text-search-curie-query-001 <br> text-search-davinci-doc-001 <br> text-search-davinci-query-001 | code-search-ada-code-001 <br> code-search-ada-text-001 <br> code-search-babbage-code-001 <br> code-search-babbage-text-001 |
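Semantic similarity between embeddings is commonly measured with cosine similarity: vectors pointing in similar directions score close to 1, unrelated ones close to 0. The sketch below computes it in pure Python on small made-up vectors; real embeddings returned by the service have far more dimensions, and the toy values here are illustrative only.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (made-up values, not real model output)
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]
print(cosine_similarity(cat, kitten))   # high: similar direction
print(cosine_similarity(cat, invoice))  # low: nearly orthogonal
```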




### Model Taxonomy  
Let's choose a general-purpose text GPT-3 model: Curie, the second most powerful model.

**Model taxonomy**: {family} - {capability} - {input-type} - {identifier}  

{family}     --> text   (general text GPT-3 model)  
{capability} --> curie  (curie is second most powerful in ada-babbage-curie-davinci family)  
{input-type} --> n/a    (only specified for search models)  
{identifier} --> 001    (version 001)  
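The taxonomy above can be applied mechanically. The hypothetical `parse_model_name` helper below is a minimal sketch that treats the first part naming a known capability tier as the capability, everything before it as the family, anything between it and the last part as the input type, and the last part as the identifier. It assumes names follow the patterns listed in this lab.

```python
from typing import NamedTuple, Optional

# Capability tiers in increasing order, as listed in this lab
CAPABILITIES = ("ada", "babbage", "curie", "davinci")


class ModelName(NamedTuple):
    family: str
    capability: str
    input_type: Optional[str]
    identifier: str


def parse_model_name(name: str) -> ModelName:
    """Split a name like 'text-search-ada-doc-001' into its taxonomy parts."""
    parts = name.split("-")
    # The capability is the first part that names a known tier
    idx = next((i for i, p in enumerate(parts) if p in CAPABILITIES), None)
    if idx is None or idx == 0 or idx == len(parts) - 1:
        raise ValueError(f"unrecognized model name: {name!r}")
    family = "-".join(parts[:idx])
    # Only search models carry an input type between capability and identifier
    input_type = "-".join(parts[idx + 1:-1]) or None
    return ModelName(family, parts[idx], input_type, parts[-1])


print(parse_model_name("text-curie-001"))
print(parse_model_name("text-search-ada-doc-001"))
```

Because `CAPABILITIES` is ordered, `CAPABILITIES.index(parse_model_name(name).capability)` gives a rough capability rank for comparing model names.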

`model = "text-curie-001"`

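```python
# Select the general-purpose Curie model for text.
# On Azure OpenAI, `engine` expects the name of your model *deployment*;
# change this value if your deployment is named differently.
model = "text-curie-001"
```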

### 4. Prompt Design  

"The magic of large language models is that by being trained to minimize this prediction error over vast quantities of text, the models end up learning concepts useful for these predictions. For example, they learn concepts like"(1):

* how to spell
* how grammar works
* how to paraphrase
* how to answer questions
* how to hold a conversation
* how to write in many languages
* how to code
* etc.

#### How to control a large language model  
"Of all the inputs to a large language model, by far the most influential is the text prompt."(1)

Large language models can be prompted to produce output in a few ways(1):

* **Instruction**: Tell the model what you want
* **Completion**: Induce the model to complete the beginning of what you want
* **Demonstration**: Show the model what you want, with either:
    * A few examples in the prompt
    * Many hundreds or thousands of examples in a fine-tuning training dataset
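To make the three prompting styles concrete, the sketch below builds one prompt string of each kind. The sentiment task, the example reviews, and the `build_few_shot_prompt` helper are hypothetical; any of these strings could be passed as `prompt=` to the `openai.Completion.create` call used in this lab.

```python
from typing import List, Tuple

# Instruction: tell the model what you want
instruction_prompt = (
    "Classify the sentiment of the following review as Positive or Negative.\n\n"
    "Review: The battery lasts all day.\nSentiment:"
)

# Completion: start the text you want and let the model continue it
completion_prompt = "Here are three tips for writing clear emails:\n1."


# Demonstration: show a few labelled examples, then leave the last label blank
def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Format (text, label) pairs as a few-shot prompt ending with an unlabelled query."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)


demonstration_prompt = build_few_shot_prompt(
    [("I love this phone.", "Positive"), ("The screen cracked in a week.", "Negative")],
    "The battery lasts all day.",
)
print(demonstration_prompt)
```

Ending every prompt at `Sentiment:` leaves the model exactly one slot to fill, which keeps the expected answer short and in a consistent format.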



#### There are three basic guidelines for creating prompts:

**Show and tell**. Make it clear what you want either through instructions, examples, or a combination of the two. If you want the model to rank a list of items in alphabetical order or to classify a paragraph by sentiment, show it that's what you want.

**Provide quality data**. If you're trying to build a classifier or get the model to follow a pattern, make sure that there are enough examples. Be sure to proofread your examples — the model is usually smart enough to see through basic spelling mistakes and give you a response, but it also might assume this is intentional and it can affect the response.

**Check your settings.** The temperature and top_p settings control how deterministic the model is in generating a response. If you're asking it for a response where there's only one right answer, then you'd want to set these lower. If you're looking for more diverse responses, then you might want to set them higher. The number one mistake people make with these settings is assuming that they're "cleverness" or "creativity" controls.
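As a conceptual sketch of why lower values make output more deterministic: sampling-based decoders commonly divide the model's next-token scores (logits) by the temperature before applying a softmax, and top_p (nucleus sampling) keeps only the smallest set of most likely tokens whose probabilities add up to at least top_p. The scores below are made up, and this illustrates the general technique rather than the service's internal implementation.

```python
import math
from typing import Dict


def softmax_with_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Turn raw scores into probabilities; lower temperature sharpens the distribution.

    temperature=0 is typically treated as always picking the most likely token;
    this sketch only handles positive values.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = {tok: score / temperature for tok, score in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most likely tokens until their cumulative probability reaches top_p, then renormalize."""
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


# Made-up next-token scores for illustration
logits = {"yes": 2.0, "no": 1.0, "maybe": 0.5}
for t in (0.2, 1.0, 2.0):
    print(t, softmax_with_temperature(logits, t))
print(top_p_filter(softmax_with_temperature(logits, 1.0), top_p=0.7))
```

At low temperature nearly all probability lands on the top token, so repeated calls tend to agree; raising temperature or top_p spreads probability across more tokens.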


Source: https://github.com/Azure/OpenAI/blob/main/How%20to/Completions.md

![Creating your first text prompt](images/prompt_design.jpg)  
The image above illustrates creating your first text prompt.

### 5. Submit!

```python
# Create your first prompt
text_prompt = "Should oxford commas always be used?"
```

```python
# Simple API call
openai.Completion.create(
    engine=model,
    prompt=text_prompt,
    max_tokens=60
)
```

### Repeat the same call. How do the results compare?

```python
# Same request as above
openai.Completion.create(
    engine=model,
    prompt=text_prompt,
    max_tokens=60
)
```
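One way to compare repeated calls is to collect only the generated text from each response. The hypothetical helpers below take the completion function as an argument, so they can be exercised without network access; they assume the pre-1.0 SDK's dict-like response with a `choices[0]["text"]` field. Passing a low `temperature` is one way to observe the more deterministic behaviour described under "Check your settings".

```python
from typing import Any, Callable, List


def completion_text(response: Any) -> str:
    """Return the text of the first choice in a Completions response (pre-1.0 SDK shape)."""
    return response["choices"][0]["text"].strip()


def repeat_completion(create: Callable[..., Any], n: int = 2, **kwargs: Any) -> List[str]:
    """Call `create` n times with identical keyword arguments and return each generated text."""
    return [completion_text(create(**kwargs)) for _ in range(n)]


# Example usage in this notebook (requires the credentials configured earlier):
# outputs = repeat_completion(openai.Completion.create, n=2, engine=model,
#                             prompt=text_prompt, max_tokens=60, temperature=0)
# print(outputs[0] == outputs[1])
```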

# For More Help  
[OpenAI Commercialization Team](mailto:AzureOpenAITeam@microsoft.com)  
AI Specialized CSAs: [aka.ms/airangers](https://aka.ms/airangers)

# Contributors
* Brandon Cowen
* Ashish Chauhun
* Louis Li  
