# Understanding LLM APIs

We will explore how to generate text with the OpenAI API.

<!--- @wandbcode{llmapps-intro} -->

### Setup

<details>
    <summary>What does -qq mean in pip install?</summary>

The `-qq` flag in pip install is used to minimize the output from the installation process. When you use `pip install` to install a Python package, it normally outputs a lot of information to the console, such as the names of the packages it's installing, their versions, and so on. If you don't want to see all this output, you can use the `-qq` option.

Here's what each `q` means:

- `-q`: Means "quiet". Using `-q` will provide less console output than the default. Warnings and errors will still be shown.
- `-qq`: Means "quieter". Using `-qq` will provide even less console output. Only errors will be shown. 

So if you want to minimize the output from pip as much as possible, you can use `pip install -qq`. This can be useful in scripts and other automated contexts where you don't want a lot of console output.
</details>

<details>
    <summary>Give me a brief summary of the tiktoken Python package.</summary>
    <a href="https://anaconda.org/conda-forge/tiktoken">The tiktoken Python package</a> is a fast Byte Pair Encoding (BPE) tokenizer that is designed for use with OpenAI's models. BPE is a form of subword tokenization that is commonly used in natural language processing. The tiktoken package allows you to tokenize text in a way that is compatible with OpenAI's models, which can be useful when preparing text data for these models.
</details>

<details>
    <summary>Tell me more about Byte Pair Encoding.</summary>

Byte Pair Encoding (BPE) is a type of subword tokenization method that is often used in natural language processing (NLP). It's a way of breaking down words into smaller units, which can help models handle words that aren't in their training data, among other benefits.

Here's a high-level overview of how BPE works:

1. Initialization: Start with a symbol vocabulary that contains each character in the alphabet (or each byte, if working with bytes) as a separate symbol. This vocabulary will be grown to include common combinations of symbols (which can be multi-character strings).

2. Pair Statistics Calculation: On your training corpus, count how frequently each pair of adjacent symbols appears.

3. New Symbol Addition: Find the most frequently occurring pair of symbols, and add that pair as a new symbol to your vocabulary.

4. Iteration: Repeat the pair statistics calculation and new symbol addition steps until you've reached a predefined vocabulary size or until a certain number of iterations have passed.

The result of this process is that common character sequences (which often correspond to whole words or common parts of words) end up as single symbols in your vocabulary. This allows the model to handle a wide variety of words, including words that weren't in its training data (since it can break those words down into known subwords). It also gives the model a way to handle languages with large vocabularies or many compound words, like German.

BPE has been used in several state-of-the-art NLP models, such as GPT-2 and GPT-3 from OpenAI. BERT from Google uses WordPiece, a closely related subword tokenization method.

As an example, consider the word 'lowly'. If 'low' and 'ly' were both learned as symbols from the training corpus, BPE would tokenize 'lowly' as ['low', 'ly'].

Please note that when using BPE, it's important to apply the same tokenization process to your input data when making predictions with the model, otherwise, the model might not be able to correctly interpret the input.
</details>
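The BPE steps described above can be sketched in a few lines of pure Python. This is a minimal illustration, not how tiktoken is implemented: the toy corpus, the function names, and the tie-breaking rule (highest count, then alphabetical order) are all assumptions made for this example.

```python
from collections import Counter


def get_pair_counts(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    counts = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += freq
    return counts


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges from a {word: frequency} corpus."""
    # Initialization: every word starts as a sequence of single characters.
    words = {tuple(word): freq for word, freq in corpus.items()}
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(words)
        if not counts:
            break
        # Assumed tie-break: highest count first, then alphabetical order.
        best = min(counts, key=lambda p: (-counts[p], p))
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words


# Hypothetical toy corpus: word -> number of occurrences.
toy_corpus = {"low": 6, "lowly": 2, "slowly": 1, "newly": 4, "only": 3}
merges, segmented = train_bpe(toy_corpus, num_merges=3)
print(merges)
print(segmented)
```

With this corpus, three merges are enough for 'lowly' to end up segmented as 'low' followed by 'ly', which mirrors the example given in the explanation.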

In [1]:
import os
import openai
import tiktoken
import wandb
from dotenv import load_dotenv
from getpass import getpass
from pprint import pprint
from wandb.integration.openai import autolog

_ = load_dotenv()
os.environ["WANDB_NOTEBOOK_NAME"] = "using_apis.ipynb"

You will need an OpenAI API key to run this notebook. You can get one [here](https://platform.openai.com/account/api-keys).

In [2]:
if os.getenv("OPENAI_API_KEY") is None:
    if any(["VSCODE" in x for x in os.environ.keys()]):
        print("Please enter password in the VS Code prompt at the top of your VS Code window!")
    os.environ["OPENAI_API_KEY"] = getpass("Paste your OpenAI key from: https://platform.openai.com/account/api-keys\n")

assert os.getenv("OPENAI_API_KEY", "").startswith("sk-"), "This doesn't look like a valid OpenAI API key"
openai.api_key = os.getenv("OPENAI_API_KEY", "")
print("OpenAI API key configured")

OpenAI API key configured


Let's enable W&B autologging to track our experiments.

In [3]:
# start logging to W&B
autolog({"project":"llmapps", "job_type": "introduction"})

wandb: Currently logged in as: ethan-ai. Use `wandb login --relogin` to force relogin


# Tokenization

In [4]:
encoding = tiktoken.encoding_for_model("text-davinci-003")
enc = encoding.encode("Weights & Biases is awesome!")
print(enc)
print(encoding.decode(enc))

[1135, 2337, 1222, 8436, 1386, 318, 7427, 0]
Weights & Biases is awesome!


We can also decode the tokens one by one:

In [5]:
for token_id in enc:
    print(f"{token_id}\t{encoding.decode([token_id])}")

1135	We
2337	ights
1222	 &
8436	 Bi
1386	ases
318	 is
7427	 awesome
0	!


> Note how the space before a word is part of the token that follows it (for example ` Bi` and ` awesome`).
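One consequence of the spacing behaviour is that the decoded token strings can be concatenated directly to recover the input. The short check below copies the token strings from the output above into a plain list, so it runs without tiktoken or network access.

```python
# Token strings copied from the decoded output above.
tokens = ["We", "ights", " &", " Bi", "ases", " is", " awesome", "!"]

# Each space is stored inside the token that follows it,
# so plain concatenation restores the original text.
text = "".join(tokens)
print(text)

# Tokens that begin with a space.
with_space = [t for t in tokens if t.startswith(" ")]
print(with_space)
```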

# Sampling

Let's sample some text from the model. For this, let's create a wrapper function that exposes the `temperature` parameter.
A higher temperature produces more random samples.
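To build some intuition for why temperature affects randomness, here is a small sketch of temperature-scaled softmax, the usual textbook formulation. The scores are made up, and treating temperature 0 as picking the single highest score is an assumption of this sketch; it does not claim to reproduce what the API does internally.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by `temperature`."""
    if temperature == 0:
        # Assumption: treat 0 as greedy decoding, all mass on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
for temp in [0, 0.5, 1, 2]:
    probs = softmax_with_temperature(logits, temp)
    print(temp, [round(p, 3) for p in probs])
```

As the temperature grows, the probabilities move closer together, so less likely tokens are sampled more often.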

In [6]:
def generate_with_temperature(temp):
    "Generate text with a given temperature, higher temperature means more randomness"
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt="Say something about Weights & Biases",
        max_tokens=50,
        temperature=temp,
    )
    return response.choices[0].text

In [7]:
for temp in [0, 0.5, 1, 1.5, 2]:
    pprint(f"TEMP: {temp}, GENERATION: {generate_with_temperature(temp)}")

('TEMP: 0, GENERATION: \n'
 '\n'
 'Weights & Biases is an amazing tool for tracking and analyzing machine '
 'learning experiments. It provides powerful visualizations and insights into '
 'model performance, enabling data scientists to quickly identify areas of '
 'improvement and optimize their models.')
('TEMP: 0.5, GENERATION: \n'
 '\n'
 'Weights & Biases is a powerful tool for tracking, analyzing, and visualizing '
 'machine learning experiments. It provides an easy-to-use dashboard for '
 'monitoring and comparing model performance, and it also offers a suite of '
 'features to help you')
('TEMP: 1, GENERATION: \n'
 '\n'
 'Weights & Biases is a powerful tool that helps data scientists and machine '
 'learning engineers track, compare, and analyze machine learning experiments. '
 'It allows users to visualize different aspects of model exploration, compare '
 'model performance across experiments, and track metrics to')
('TEMP: 1.5, GENERATION: \n'
 '\n'
 'Weights & Biases is an a

You can also use the [`top_p` parameter](https://platform.openai.com/docs/api-reference/completions/create#completions/create-top_p) to control the diversity of the generated text. It enables nucleus sampling: the model samples the next token only from the smallest set of most likely tokens whose cumulative probability reaches `top_p`. For example, with `top_p=0.9`, the model picks the next token from the tokens that make up the top 90% of the probability mass. The higher the `top_p`, the more low-probability tokens remain eligible, so the output becomes more diverse. OpenAI recommends adjusting either `temperature` or `top_p`, but not both.
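The filtering step behind `top_p` can be sketched as follows. This is an illustrative version of nucleus sampling with a made-up next-token distribution; the function name `top_p_filter` and the probabilities are assumptions for this example, not part of the OpenAI API.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of top tokens whose cumulative probability reaches `top_p`."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


# Hypothetical next-token distribution.
probs = {" amazing": 0.5, " powerful": 0.3, " great": 0.15, " odd": 0.05}
for top_p in [0.01, 0.5, 0.9, 1]:
    print(top_p, list(top_p_filter(probs, top_p)))
```

With a very small `top_p`, only the single most likely token survives, which is consistent with the lowest `top_p` values producing identical generations.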

In [8]:
def generate_with_topp(topp):
    "Generate text with a given top-p, higher top-p means more randomness"
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt="Say something about Weights & Biases",
        max_tokens=50,
        top_p=topp,
    )
    return response.choices[0].text

In [9]:
for topp in [0.01, 0.1, 0.5, 1]:
    pprint(f'TOP_P: {topp}, GENERATION: {generate_with_topp(topp)}')

('TOP_P: 0.01, GENERATION: \n'
 '\n'
 'Weights & Biases is an amazing tool for tracking and analyzing machine '
 'learning experiments. It provides powerful visualizations and insights into '
 'model performance, enabling data scientists to quickly identify areas of '
 'improvement and optimize their models.')
('TOP_P: 0.1, GENERATION: \n'
 '\n'
 'Weights & Biases is an amazing tool for tracking and analyzing machine '
 'learning experiments. It provides powerful visualizations and insights into '
 'model performance, enabling data scientists to quickly identify areas of '
 'improvement and optimize their models.')
('TOP_P: 0.5, GENERATION: \n'
 '\n'
 'Weights & Biases is an amazing tool for tracking and analyzing machine '
 'learning experiments. It provides powerful visualizations and metrics to '
 'help teams better understand their models and make data-driven decisions. It '
 'also offers collaboration features to help teams stay organized')
('TOP_P: 1, GENERATION: \n'
 '\n'
 'Weig

# Chat API

Let's switch to chat mode and see how the model responds to our messages. We have some control over the model's response by passing a message with the `system` role, which lets us steer the model to adhere to a certain behaviour.

> We are using `gpt-3.5-turbo`, which is faster and cheaper than `text-davinci-003`.

In [10]:
MODEL = "gpt-3.5-turbo"
response = openai.ChatCompletion.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say something about Weights & Biases"},
    ],
    temperature=0,
)

response

<OpenAIObject chat.completion id=chatcmpl-7VSviwcnw2XyhNEdpZmR6ARDyCIhk at 0x114b2e220> JSON: {
  "id": "chatcmpl-7VSviwcnw2XyhNEdpZmR6ARDyCIhk",
  "object": "chat.completion",
  "created": 1687733730,
  "model": "gpt-3.5-turbo-0301",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Weights & Biases is a machine learning platform that helps data scientists and machine learning engineers track and visualize their experiments. It provides tools for experiment management, hyperparameter tuning, and model visualization, making it easier to understand and improve machine learning models. Weights & Biases also offers integrations with popular machine learning frameworks like TensorFlow, PyTorch, and Keras."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 27,
    "completion_tokens": 74,
    "total_tokens": 101
  }
}

As you can see above, the response is a JSON object that contains the generated message along with metadata such as the model version, the finish reason, and the token usage.
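To show how these fields are laid out, the sketch below rebuilds a trimmed copy of the response as a plain Python dict and reads values from it. The name `response_dict` is introduced only for this example, and the message content is shortened with `...`; the other values are copied from the output above.

```python
# Trimmed copy of the response shown above, as a plain dict.
response_dict = {
    "object": "chat.completion",
    "model": "gpt-3.5-turbo-0301",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Weights & Biases is a machine learning platform ...",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 27, "completion_tokens": 74, "total_tokens": 101},
}

choice = response_dict["choices"][0]
usage = response_dict["usage"]

print(choice["message"]["role"])   # who produced the message
print(choice["finish_reason"])     # why generation stopped
# The total is the sum of prompt and completion tokens.
print(usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"])
```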

In [11]:
pprint(response.choices[0].message.content)

('Weights & Biases is a machine learning platform that helps data scientists '
 'and machine learning engineers track and visualize their experiments. It '
 'provides tools for experiment management, hyperparameter tuning, and model '
 'visualization, making it easier to understand and improve machine learning '
 'models. Weights & Biases also offers integrations with popular machine '
 'learning frameworks like TensorFlow, PyTorch, and Keras.')


In [12]:
wandb.finish()

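Run history:

| Metric | History |
| --- | --- |
| usage/completion_tokens | ▁▂▂▂▂▁▁▂▂█ |
| usage/elapsed_time | ▂▃▂▂▂▂▁█▃▅ |
| usage/prompt_tokens | ▁▁▁▁▁▁▁▁▁█ |
| usage/total_tokens | ▁▂▂▂▂▁▁▂▂█ |

Run summary:

| Metric | Value |
| --- | --- |
| usage/completion_tokens | 74.0 |
| usage/elapsed_time | 3.17477 |
| usage/prompt_tokens | 27.0 |
| usage/total_tokens | 101.0 |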


# References

1. [Source of this notebook](https://github.com/wandb/edu/blob/main/llm-apps-course/notebooks/01.%20Using_APIs.ipynb)