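### Sending a Request to the Azure OpenAI API Using the Python SDK

The example below passes the most **important parameters** to `openai.ChatCompletion.create()` to generate a response; more parameters are available. The example itself uses `engine`, `messages`, `max_tokens`, and `temperature`; `stop` and `n` are optional. Here's what each means:
- The **engine** parameter specifies which model deployment to use. On Azure this is the deployment name you chose, not the base model name; the deployment in this notebook runs `gpt-35-turbo`, a GPT-3.5 model
- The **messages** parameter is the conversation to respond to: a list of dictionaries, each with a `role` (`system`, `user`, or `assistant`) and its `content`
- The **max_tokens** parameter sets the maximum number of tokens the model may generate; a token is a chunk of text, on average about ¾ of an English word, not a whole word
- The **temperature** parameter controls randomness: lower values give more focused, repeatable output and higher values give more varied output
- The **stop** parameter specifies one or more strings at which the model stops generating; the stop string itself is not included in the returned text
- The **n** parameter sets how many alternative responses (choices) to generate for the same prompt
- **strip()** is not an API parameter but the Python string method often applied to the returned text; it removes leading and trailing whitespace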
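A minimal sketch of how these parameters map onto a request, assuming the pre-1.0 `openai` SDK used in this notebook. `build_chat_kwargs` is a hypothetical helper that only assembles the keyword arguments, so they can be inspected offline; the commented-out line shows where they would be passed to `openai.ChatCompletion.create()`.

```python
def build_chat_kwargs(deployment, prompt, max_tokens=125, temperature=0.7, stop=None, n=1):
    """Assemble keyword arguments for openai.ChatCompletion.create() (openai<1.0, Azure).

    Hypothetical helper for illustration; `deployment` is the Azure deployment name.
    """
    kwargs = {
        "engine": deployment,  # Azure: the deployment name, not the base model name
        "messages": [{"role": "user", "content": prompt}],  # list of role/content dicts
        "max_tokens": max_tokens,  # upper bound on generated tokens
        "temperature": temperature,  # higher values -> more varied output
        "n": n,  # number of alternative choices to return
    }
    if stop is not None:
        kwargs["stop"] = stop  # string or list of strings that end generation
    return kwargs


kwargs = build_chat_kwargs("gpt-35-turbo", "What is Power BI?", stop=["\n\n"], n=2)
print(kwargs)
# response = openai.ChatCompletion.create(**kwargs)  # requires Azure credentials
```

Only passing `stop` when it is set keeps the request identical to the notebook's call when the optional parameters are not needed.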

### Setting up the environment variables for Azure OpenAI

```python
import os

import openai  # pre-1.0 SDK (openai<1.0); openai.ChatCompletion was removed in 1.0
from dotenv import load_dotenv

# Read settings from a local .env file into the environment
load_dotenv()

# Azure OpenAI configuration values
azure_openai_api_type    = os.environ["OPENAI_API_TYPE"]     # expected to be "azure"
azure_openai_api_key     = os.environ["OPENAI_API_KEY"]
azure_openai_endpoint    = os.environ["OPENAI_API_BASE"]     # e.g. https://<your-resource>.openai.azure.com/
azure_openai_api_version = os.environ["OPENAI_API_VERSION"]
azure_openai_api_model   = os.environ["OPENAI_API_MODEL"]    # deployment name, passed as `engine`

# Temperature & tokens
azure_openai_api_temperature = 0.7
azure_openai_api_max_tokens  = 125
```

```python
user_input = """
What is powerbi?
"""

# A chat request is a list of messages, each with a role and content
messages = [
    {"role": "user", "content": user_input}
]
```

```python
openai.api_type = azure_openai_api_type
openai.api_base = azure_openai_endpoint
openai.api_version = azure_openai_api_version
openai.api_key = azure_openai_api_key

# Send request to Azure OpenAI model
print("Sending request for summary to Azure OpenAI endpoint...\n\n")
response = openai.ChatCompletion.create(
    engine=azure_openai_api_model,  # Azure deployment name
    temperature=azure_openai_api_temperature,
    max_tokens=azure_openai_api_max_tokens,
    messages=messages,
)
print(response)
```

```text
Sending request for summary to Azure OpenAI endpoint...


{
  "id": "chatcmpl-8JgdEQkpGwvMNLAgxhqX3XVBQYuYc",
  "object": "chat.completion",
  "created": 1699702800,
  "model": "gpt-35-turbo",
  "prompt_filter_results": [
    {
      "prompt_index": 0,
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ],
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Power BI is a business analytics tool developed by Microsoft. It provides interactive visualizations and business intelligence capabilities with an interface that is easy to use for end-users to cre
```

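The printed response is nested JSON, so the generated text has to be pulled out of `choices`. A small sketch, assuming dict-style access (the pre-1.0 SDK's response object is a `dict` subclass); `extract_reply` is a hypothetical helper, and `sample` is a trimmed stand-in built from the first sentence of the output above. A `finish_reason` of `"length"` means the reply hit `max_tokens` and was cut off, while `"stop"` means the model finished on its own.

```python
def extract_reply(response):
    """Return (text, truncated) for the first choice of a chat completion response."""
    choice = response["choices"][0]
    text = choice["message"]["content"].strip()  # drop leading/trailing whitespace
    truncated = choice["finish_reason"] == "length"  # generation hit the max_tokens limit
    return text, truncated


# Trimmed-down stand-in for the response printed above (illustrative only)
sample = {
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "  Power BI is a business analytics tool developed by Microsoft.\n",
            },
        }
    ]
}

text, truncated = extract_reply(sample)
print(text)       # Power BI is a business analytics tool developed by Microsoft.
print(truncated)  # False
```

With the notebook's real call, `extract_reply(response)` would be used the same way; checking `truncated` tells you whether to raise `max_tokens` or shorten the prompt.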
```python
print(response.usage)
```

```text
{
  "prompt_tokens": 13,
  "completion_tokens": 114,
  "total_tokens": 127
}
```
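Because usage is billed per token, the `usage` numbers can be turned into a rough cost estimate. A minimal sketch: `estimate_cost` is a hypothetical helper, and the per-1,000-token prices below are placeholders, not actual rates, so substitute the current figures for your model and region from the pricing pages.

```python
def estimate_cost(usage, prompt_price_per_1k, completion_price_per_1k):
    """Estimate the cost of one call from its token usage (prices are per 1,000 tokens)."""
    prompt_cost = usage["prompt_tokens"] / 1000 * prompt_price_per_1k
    completion_cost = usage["completion_tokens"] / 1000 * completion_price_per_1k
    return prompt_cost + completion_cost


# Token counts copied from the usage output above
usage = {"prompt_tokens": 13, "completion_tokens": 114, "total_tokens": 127}

# Placeholder prices: replace with the current rates for your deployment
cost = estimate_cost(usage, prompt_price_per_1k=0.0015, completion_price_per_1k=0.002)
print(f"Estimated cost: ${cost:.7f}")
```

Prompt and completion tokens are priced separately because many models charge different rates for input and output.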


### Counting the number of tokens in text (prompts)
- Manual or one-time exercise
    - [Tokenizer](https://platform.openai.com/tokenizer)
- Programmatically handle the number of tokens
    - [tiktoken](https://github.com/openai/tiktoken)

```python
!pip install tiktoken
```


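```python
import tiktoken

# Ref: https://learn.microsoft.com/en-us/answers/questions/1193969/how-to-integrate-tiktoken-library-with-azure-opena
def num_tokens_from_messages(messages, model):
    """Approximate the prompt tokens used by a list of chat messages.

    The per-message overheads follow the gpt-35-turbo (0301) chat format and may differ for other versions.
    """
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Azure deployment names may not match a known model name; use the gpt-35-turbo encoding
        encoding = tiktoken.get_encoding("cl100k_base")
    num_tokens = 0
    for message in messages:
        num_tokens += 4  # every message follows <im_start>{role/name}\n{content}<im_end>\n
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":  # if there's a name, the role is omitted
                num_tokens -= 1  # role is always required and always 1 token
    num_tokens += 2  # every reply is primed with <im_start>assistant
    return num_tokens
```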

```python
num_tokens_from_messages(messages, model=azure_openai_api_model)
```

```text
13
```
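A token count is most useful before sending a request: the prompt tokens plus `max_tokens` must fit in the model's context window, otherwise the service typically rejects the request. A minimal sketch with hypothetical helpers; the 4,096-token limit is an assumption matching the original `gpt-35-turbo` versions, so check the limit of your own deployment.

```python
def fits_context(prompt_tokens, max_tokens, context_window=4096):
    """Return True if the prompt plus the requested completion fits in the context window.

    context_window=4096 is an assumed limit (original gpt-35-turbo versions); adjust per model.
    """
    return prompt_tokens + max_tokens <= context_window


def max_completion_tokens(prompt_tokens, context_window=4096):
    """Largest max_tokens value that still fits after the prompt (never below zero)."""
    return max(context_window - prompt_tokens, 0)


prompt_tokens = 13  # value returned by num_tokens_from_messages above
print(fits_context(prompt_tokens, max_tokens=125))  # True
print(max_completion_tokens(prompt_tokens))         # 4083
```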

### [Here are some rules of thumb for understanding tokens in terms of lengths:](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them)
- 1 token ~= 4 chars in English
- 1 token ~= ¾ words
- 100 tokens ~= 75 words
- 1-2 sentences ~= 30 tokens
- 1 paragraph ~= 100 tokens
- 1,500 words ~= 2048 tokens
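The first two rules of thumb can be turned into a quick estimator when an exact count is not needed. A rough sketch for English text only; `estimate_tokens` is a hypothetical helper, and for exact counts you would use `tiktoken` as above.

```python
import math


def estimate_tokens(text):
    """Rough English-only token estimates from the rules of thumb (not an exact count)."""
    by_chars = math.ceil(len(text) / 4)              # 1 token ~= 4 characters
    by_words = math.ceil(len(text.split()) * 4 / 3)  # 1 token ~= 3/4 of a word
    return by_chars, by_words


print(estimate_tokens("Power BI is a business analytics tool developed by Microsoft."))
```

The two estimates usually disagree somewhat; treating the larger one as the budget is the safer choice when checking against a limit.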




### To manage tokens effectively in OpenAI, here are some suggestions:

- [Understand your token usage: Knowing how many tokens your prompts and completions typically use can help you manage your usage more effectively. You can use OpenAI’s interactive Tokenizer tool to calculate the number of tokens and see how text is broken into tokens.](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them)
- [Use shorter prompts: If you’re hitting token limits, consider using shorter prompts.](https://platform.openai.com/docs/guides/production-best-practices/improving-latencies)
- [Break text into smaller pieces: If your text is too long to fit within the token limit, you might need to break it down into smaller pieces.](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them)
- [Cache common queries: If you find that you’re frequently processing the same queries, consider caching these so they don’t need to be processed repeatedly.](https://platform.openai.com/docs/guides/production-best-practices/api-keys)
- Set a monthly budget: on the OpenAI platform you can set a monthly budget in your billing settings, after which OpenAI stops serving your requests. Pricing: [azure openai](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/), [openai](https://openai.com/pricing)
- Remember that usage is priced per token, so the number of tokens in each API call (prompt plus completion) directly affects cost; effective token management also helps control costs.
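For the "break text into smaller pieces" suggestion, a minimal sketch that splits text on word boundaries so each chunk stays under a token budget. To stay self-contained it approximates tokens with the ~4-characters-per-token rule of thumb; `chunk_text` is a hypothetical helper, and in practice you would count with `tiktoken` instead.

```python
def chunk_text(text, max_tokens, chars_per_token=4):
    """Split text into word-aligned chunks whose estimated token count stays within max_tokens.

    Tokens are approximated as len(chunk) / chars_per_token (a rule of thumb).
    A single word longer than the budget is kept as its own chunk rather than split.
    """
    max_chars = max_tokens * chars_per_token
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate  # word still fits in the current chunk
        else:
            if current:
                chunks.append(current)  # close the full chunk
            current = word  # start a new chunk with this word
    if current:
        chunks.append(current)
    return chunks


long_text = "Power BI is a business analytics tool developed by Microsoft. " * 20
pieces = chunk_text(long_text, max_tokens=50)
print(len(pieces), [len(p) for p in pieces])
```

Splitting on words rather than raw characters avoids cutting a word in half; splitting on sentences or paragraphs would keep even more context together.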

### Extra
- [Summary of the tokenizers](https://huggingface.co/docs/transformers/tokenizer_summary)
- [Subword Tokenization - Handling Misspellings and Multilingual Data](https://www.thoughtvector.io/blog/subword-tokenization/)
- [Subword tokenizers](https://www.tensorflow.org/text/guide/subwords_tokenizer)
- **Word tokenizers: cons**
    - Very large vocabularies are needed, and words missing from the vocabulary (out-of-vocabulary words) cannot be represented; they are typically mapped to a single unknown token
- **Character tokenizers: cons**
    - Loss of meaning within words and much longer sequences for a given input
- **Sub-word tokenizers: pros**
    - A "smart" vocabulary built from character sequences that co-occur frequently
    - More robust to novel words
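To make the sub-word idea concrete, a toy sketch of byte-pair encoding (BPE), the merge procedure behind the tokenizers used by GPT models: count adjacent symbol pairs across a word list and repeatedly merge the most frequent pair into a new symbol. The corpus is made up for illustration, each word is counted once, and ties are broken alphabetically; real tokenizers learn tens of thousands of merges over bytes from large corpora.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair (ties -> alphabetically smallest pair)."""
    pairs = Counter()
    for word in words:
        for pair in zip(word, word[1:]):
            pairs[pair] += 1
    return min(pairs.items(), key=lambda item: (-item[1], item[0]))[0]


def merge_pair(words, pair):
    """Replace each occurrence of `pair` inside every word with one merged symbol."""
    merged = []
    for word in words:
        symbols, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                symbols.append(word[i] + word[i + 1])
                i += 2
            else:
                symbols.append(word[i])
                i += 1
        merged.append(tuple(symbols))
    return merged


# Toy corpus (made up for illustration); every word starts as a tuple of characters
words = [tuple(w) for w in ["lower", "lowest", "newer", "wider"]]
merges = []
for _ in range(3):
    pair = most_frequent_pair(words)
    merges.append(pair)
    words = merge_pair(words, pair)

print(merges)  # [('e', 'r'), ('l', 'o'), ('lo', 'w')]
print(words)   # [('low', 'er'), ('low', 'e', 's', 't'), ('n', 'e', 'w', 'er'), ('w', 'i', 'd', 'er')]
```

After three merges the shared pieces `low` and `er` have become single symbols, which is why an unseen word such as "slower" could still be built from known sub-words.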