# L1 Language Models, the Chat Format and Tokens

In this first module we will look at some details of how LLMs work: how tokenizers operate and how the chat API works.

## Text Generation Process

LLMs are built using supervised learning, where the labels are simply the next words in the training text. The typical workflow is:

get labeled data -> train model on data -> deploy and call model.

An LLM is trained by repeatedly predicting the next word. Today there are two main types of LLMs: Base LLMs and Instruction Tuned LLMs. A Base LLM predicts the next word based on its training data. However, consider the prompt

> What is the capital of France?

If the training set contains lists of quiz questions about France, a Base LLM might complete the prompt with `What is France's largest city?`, `What is France's population?`, and so on, which is not what we want. To go from a Base LLM to an Instruction Tuned LLM, you follow this process:

1. Train a Base LLM on a lot of data.
2. Fine-tune the model on examples where the output follows an input instruction. Typically, people write many instruction-answer pairs, which form a new (fine-tuning) training set.
3. Obtain human ratings of the quality of different LLM outputs on criteria such as whether it is helpful, honest, and harmless.
4. Further tune the LLM to increase the probability of generating the more highly rated outputs. Typically this is done via Reinforcement Learning from Human Feedback (RLHF).
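As a rough illustration of step 2, the sketch below packages a few hand-written instruction-answer pairs as chat-formatted examples, one JSON object per line. The pairs and the `to_chat_example` helper are hypothetical; the `{"messages": [...]}` layout resembles what OpenAI documents for fine-tuning its chat models, but check the current fine-tuning guide before relying on it.

```python
import json

# Hypothetical instruction-answer pairs, as people might write them for step 2.
instruction_pairs = [
    ("What is the capital of France?", "The capital of France is Paris."),
    ("Translate 'thank you' into French.", "Merci."),
]

def to_chat_example(instruction, answer):
    """Wrap one pair in the user/assistant chat message format."""
    return {"messages": [
        {"role": "user", "content": instruction},
        {"role": "assistant", "content": answer},
    ]}

# One JSON object per line (JSON Lines), a common layout for fine-tuning files.
jsonl = "\n".join(json.dumps(to_chat_example(q, a)) for q, a in instruction_pairs)
print(jsonl)
```

Each line pairs an instruction with the answer we want, which is exactly what a Base LLM lacks.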

Training a Base LLM can take months on large supercomputers. Going from a Base LLM to an Instruction Tuned LLM can take days, on much more modest hardware and with a much smaller dataset.
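To make the next-word objective concrete, here is a minimal sketch that turns one sentence into (context, next word) training pairs, which is how a single piece of text yields many supervised examples. It splits on whitespace for simplicity; real LLMs work on tokens rather than words (see the Tokens section), and `next_word_examples` is a hypothetical helper.

```python
def next_word_examples(sentence):
    """Turn one sentence into (context, next word) supervised training pairs."""
    words = sentence.split()
    # Every prefix of the sentence is an input; the word that follows it is the label.
    return [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]

pairs = next_word_examples("My favorite food is a bagel with cream cheese")
for context, target in pairs:
    print(f"{context!r:45} -> {target!r}")
```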

## Setup
#### Load the API key and relevant Python libraries.
In this course, we've provided some code that loads the OpenAI API key for you.

```python
import os
import openai
import tiktoken
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key = os.environ['OPENAI_API_KEY']
```

#### helper function
This may look familiar if you took the earlier course "ChatGPT Prompt Engineering for Developers".

Throughout this course, we will use OpenAI's `gpt-3.5-turbo` model and the [chat completions endpoint](https://platform.openai.com/docs/guides/chat).

This helper function will make it easier to use prompts and look at the generated outputs. 

**Note**: In June 2023, OpenAI updated `gpt-3.5-turbo`. The results you see in the notebook may differ slightly from those in the video. Some of the prompts have also been slightly modified to produce the desired results.

```python
def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0, # this is the degree of randomness of the model's output
    )
    return response.choices[0].message["content"]
```

**Note**: This and all other lab notebooks in this course use version `0.27.0` of the OpenAI library.

To use version `1.0.0` or later of the OpenAI library, use the following `get_completion` function instead:

```python
client = openai.OpenAI()

def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0
    )
    return response.choices[0].message.content
```

## Prompt the model and get a completion

```python
response = get_completion("What is the capital of France?")
print(response)
```

## Tokens

If you ask GPT to reverse the letters in "lollipop", it often gets the answer wrong. The reason is tokenization: the model does not see individual letters but tokens, and tokenizers break less commonly occurring words into sub-word pieces. One trick to get the correct reversal is to separate the letters with dashes, so that each letter tends to end up in its own token. For English text, a token is on average about 4 characters, or roughly 3/4 of a word.
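The sketch below is a toy greedy longest-match tokenizer over a tiny made-up vocabulary, not GPT's actual byte-pair encoding. It shows why a model that sees `lollipop` as a few multi-letter chunks struggles to reverse its letters, while the dashed version splits into one token per character. To inspect the real split, `tiktoken` (imported in the setup cell) provides `tiktoken.encoding_for_model("gpt-3.5-turbo").encode(...)`.

```python
# Toy vocabulary (hypothetical, NOT GPT's real vocabulary): a few multi-letter
# chunks plus single characters, so every input in this example can be covered.
VOCAB = {"lol", "lip", "op", "l", "o", "i", "p", "-"}
MAX_LEN = max(len(t) for t in VOCAB)

def toy_tokenize(text):
    """Greedy longest-match tokenization against VOCAB."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shorter ones.
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in VOCAB:
                tokens.append(piece)
                i += size
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens

def reverse_letters(dashed_word):
    """Reverse a dash-separated word using its single-letter tokens."""
    letters = [t for t in toy_tokenize(dashed_word) if t != "-"]
    return "".join(reversed(letters))

print(toy_tokenize("lollipop"))                      # ['lol', 'lip', 'op']
print("".join(reversed(toy_tokenize("lollipop"))))   # opliplol (chunks reversed, not letters)
print(reverse_letters("l-o-l-l-i-p-o-p"))            # popillol
```

Reversing the chunks is not the same as reversing the letters; once each letter is its own token, the task becomes easy.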

Different models have different limits on the number of tokens in the input **context** plus the output **completion**, which share a single budget. The original `gpt-3.5-turbo`, for example, had a limit of about 4,000 tokens (4,096).
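A quick way to sanity-check a prompt against the limit is the ~4 characters per token rule of thumb. The sketch below assumes a 4,096-token limit and that the prompt and the requested `max_tokens` share that budget; the helper names are hypothetical, and for exact counts you would use `tiktoken` instead.

```python
import math

CHARS_PER_TOKEN = 4      # rule of thumb for English text, not an exact count
CONTEXT_LIMIT = 4096     # assumed limit for the original gpt-3.5-turbo

def estimate_tokens(text):
    """Rough token estimate from the ~4 characters per token heuristic."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def fits_in_context(prompt, max_tokens, limit=CONTEXT_LIMIT):
    """The prompt (context) and the completion share one token budget."""
    return estimate_tokens(prompt) + max_tokens <= limit

prompt = "What is the capital of France?"
print(estimate_tokens(prompt))                   # 8 (30 characters / 4, rounded up)
print(fits_in_context(prompt, max_tokens=500))   # True
```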

```python
response = get_completion("Take the letters in lollipop \
and reverse them")
print(response)
```

"lollipop" in reverse should be "popillol".

```python
response = get_completion("""Take the letters in \
l-o-l-l-i-p-o-p and reverse them""")
print(response)
```

## Helper function (chat format)

### System, User, and Assistant Messages

An important aspect of chat models is that every message has a role: `system`, `user`, or `assistant`.

- The **system** message sets the behavior of the assistant (for example its style or constraints).
- The **assistant** is the chat model; assistant messages are its replies.
- The **user** is you: user messages carry the instructions or questions sent to the assistant.

The helper function below, which we'll use throughout this course, takes a list of messages, each tagged with one of these roles.

```python
def get_completion_from_messages(messages, 
                                 model="gpt-3.5-turbo", 
                                 temperature=0, 
                                 max_tokens=500):
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature, # this is the degree of randomness of the model's output
        max_tokens=max_tokens, # the maximum number of tokens the model can output
    )
    return response.choices[0].message["content"]
```
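The `temperature` argument is described in the comments as the degree of randomness of the output. A common way to picture this is softmax with temperature over the model's scores for candidate next tokens. The sketch below uses made-up scores and is not a description of OpenAI's internal sampling code: at temperature 0 all probability goes to the top-scoring token, and higher temperatures flatten the distribution.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw next-token scores into probabilities, scaled by temperature."""
    if temperature == 0:
        # Limit case: all probability on the highest-scoring token (greedy choice).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next words.
logits = [2.0, 1.0, 0.1]
for t in (0, 0.5, 1, 2):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

This is why the examples that want varied, creative output pass `temperature=1`, while `temperature=0` gives the most predictable answers.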

In the examples below we only use the `system` and `user` roles, but you can also pass `assistant` messages containing the model's previous answers. Because the model does not remember earlier calls, this is how you continue a conversation.
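Because each API call is independent, the model only "remembers" what you include in `messages`. A minimal sketch, using a hypothetical `continue_conversation` helper, appends the assistant's previous reply and the user's follow-up so the whole history can be sent again:

```python
def continue_conversation(messages, assistant_reply, next_user_message):
    """Return a new message list with the assistant's previous answer and the
    user's follow-up appended, so the model sees the whole conversation."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]

history = [
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "Hi, my name is Isa."},
]
history = continue_conversation(
    history,
    "Hi Isa! How can I help you today?",
    "Can you remind me what my name is?",
)
print([m["role"] for m in history])  # ['system', 'user', 'assistant', 'user']
```

Passing `history` to `get_completion_from_messages(history)` would then let the model answer using the earlier turns.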

```python
messages =  [  
{'role':'system', 
 'content':"""You are an assistant who\
 responds in the style of Dr Seuss."""},    
{'role':'user', 
 'content':"""write me a very short poem\
 about a happy carrot"""},  
] 
response = get_completion_from_messages(messages, temperature=1)
print(response)
```

```python
# length
messages =  [  
{'role':'system',
 'content':'All your responses must be \
one sentence long.'},    
{'role':'user',
 'content':'write me a story about a happy carrot'},  
] 
response = get_completion_from_messages(messages, temperature=1)
print(response)
```

```python
# combined
messages =  [  
{'role':'system',
 'content':"""You are an assistant who \
responds in the style of Dr Seuss. \
All your responses must be one sentence long."""},    
{'role':'user',
 'content':"""write me a story about a happy carrot"""},
] 
response = get_completion_from_messages(messages, 
                                        temperature=1)
print(response)
```

```python
def get_completion_and_token_count(messages, 
                                   model="gpt-3.5-turbo", 
                                   temperature=0, 
                                   max_tokens=500):
    
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature, 
        max_tokens=max_tokens,
    )
    
    content = response.choices[0].message["content"]
    
    # token usage reported by the API for this call
    token_dict = {
        'prompt_tokens': response['usage']['prompt_tokens'],
        'completion_tokens': response['usage']['completion_tokens'],
        'total_tokens': response['usage']['total_tokens'],
    }

    return content, token_dict
```

```python
messages = [
{'role':'system', 
 'content':"""You are an assistant who responds\
 in the style of Dr Seuss."""},    
{'role':'user',
 'content':"""write me a very short poem \
about a happy carrot"""},  
] 
response, token_dict = get_completion_and_token_count(messages)
```

```python
print(response)
```

```python
print(token_dict)
```

#### Notes on using the OpenAI API outside of this classroom

To install the OpenAI Python library:
```
!pip install openai
```

The library needs to be configured with your account's secret key, which is available on the [website](https://platform.openai.com/account/api-keys). 

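You can either set it as the `OPENAI_API_KEY` environment variable in your shell before starting Python or Jupyter:
```
export OPENAI_API_KEY='sk-...'
```
(Running `!export ...` inside a notebook has no lasting effect, because each `!` command runs in its own subshell; in a notebook, use `%env OPENAI_API_KEY=sk-...` instead.)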

Or, set `openai.api_key` to its value:

```
import openai
openai.api_key = "sk-..."
```

This second approach is not recommended, because it hard-codes the secret key in your source code. A much better way is to rely on `dotenv`, which is why we wrote:

```python
import os
import openai
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())  # read the local .env file into os.environ
openai.api_key = os.environ['OPENAI_API_KEY']
```

Note that the credentials are stored in a file named `.env`, so don't also use that name for a Python virtual environment directory in the same project.

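Outside the classroom, a missing key otherwise surfaces as a bare `KeyError` from `os.environ['OPENAI_API_KEY']`. A small sketch with a hypothetical `require_env` helper fails with a clearer message instead:

```python
import os

def require_env(name="OPENAI_API_KEY"):
    """Return the value of an environment variable, or fail with a clear message.

    os.environ[name] would raise a bare KeyError; this points at the likely fix.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set: add it to your .env file or export it in your shell"
        )
    return value
```

After `load_dotenv(find_dotenv())`, you could then write `openai.api_key = require_env()`.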
#### A note about the backslash
- In the course, we use a backslash `\` to make the text fit on the screen without inserting newline `\n` characters.
- GPT-3 isn't really affected by whether you insert newline characters or not. But when working with LLMs in general, consider whether newline characters in your prompt may affect the model's performance.
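The effect of the backslash is easy to check in plain Python: inside a triple-quoted string, a backslash at the very end of a line joins it with the next line, so no `\n` ends up in the prompt. The backslash must be the last character on the line; a trailing space after it breaks the continuation.

```python
# Backslash at the end of the line: the two lines are joined, no newline kept.
with_backslash = """write me a very short poem \
about a happy carrot"""

# No backslash: the line break becomes a '\n' character inside the prompt.
without_backslash = """write me a very short poem
about a happy carrot"""

print(repr(with_backslash))
print(repr(without_backslash))
```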

Prompting is revolutionizing AI application development, but it works best with unstructured data such as text, and to a more limited extent with images (computer vision). It works much less well with structured (tabular) data.