# L1 Language Models, the Chat Format and Tokens

I'd like to share with you an overview of how LLMs, Large Language Models, work.
We'll go into how they are trained, as well as details like
- the tokenizer and
- how it can affect the output when you prompt an LLM.

We'll also take a look at the chat format for LLMs, which is a way of specifying both system and user messages, and see what you can do with that capability.

![LLM](immagini\01_LLM.png)

The main tool used to train an LLM is actually supervised learning. 

In supervised learning, a computer learns an input-output, or X-to-Y, mapping using labeled training data. For example, if you're using supervised learning to classify the sentiment of restaurant reviews, you might collect a training set of reviews (x), each labeled with its sentiment, positive or negative (y): x --> y.

The process for supervised learning is typically to get labeled data and then train an AI model on that data. After training, you can deploy the model and call it.
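To make the x --> y idea concrete, here is a toy sketch of the three steps (get labeled data, train, call the model) in plain Python. The reviews, labels, and the `train`/`predict` helpers are hypothetical, and the "model" is just word counts per label, far simpler than anything used in practice.

```python
from collections import Counter

# Hypothetical labeled training set: (x = review text, y = sentiment label)
training_set = [
    ("the pastrami sandwich is great", "positive"),
    ("service was slow and the food was bad", "negative"),
    ("great food and friendly staff", "positive"),
    ("the soup was cold and bland", "negative"),
]

def train(examples):
    """'Train' by counting how often each word appears with each label."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.split())
    return counts

def predict(model, text):
    """'Call' the model: pick the label whose training words overlap most with the input."""
    words = text.split()
    return max(model, key=lambda label: sum(model[label][w] for w in words))

model = train(training_set)
print(predict(model, "great sandwich"))  # positive
print(predict(model, "cold soup"))       # negative
```

A real model generalizes far beyond word overlap, but the workflow is the same: labeled pairs in, a learned mapping out.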

It turns out that supervised learning is a 
core building block for training Large Language Models. 
Specifically, a Large Language Model can be built by using 
supervised learning to repeatedly predict the next word. 

Let's say that your training set, a large amount of text data, contains the sentence "My favorite food is a bagel with cream cheese and lox."
This sentence is turned into a sequence of training examples: given the sentence fragment "My favorite food is a", you want the model to predict that the next word is "bagel"; given the sentence prefix "My favorite food is a bagel", the next word would be "with"; and so on.

Given a corpus of hundreds of billions of words, or sometimes even more, you can create a massive training set in which you start with part of a sentence or piece of text and repeatedly ask the language model to learn to predict the next word.
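Turning one sentence into next-word training examples can be sketched in a few lines of Python. This is a simplified, hypothetical helper: it splits on whitespace, so punctuation stays attached to words, whereas real LLMs work on tokens (see the Tokens section).

```python
def next_word_examples(sentence):
    """Turn one sentence into (prefix, next_word) supervised training pairs."""
    words = sentence.split()
    return [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]

pairs = next_word_examples("My favorite food is a bagel with cream cheese and lox.")
for prefix, target in pairs[4:6]:
    print(f"{prefix!r} -> {target!r}")
# 'My favorite food is a' -> 'bagel'
# 'My favorite food is a bagel' -> 'with'
```

An 11-word sentence yields 10 examples; applied to hundreds of billions of words, the same slicing produces the massive training set described above.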

![supervised_learning_llm](immagini\02_supervised_learning_llm.png)

So today there are broadly two major types 
of Large Language Models. The first is a "Base LLM" 
and the second, which is what is increasingly used, 
is the "Instruction Tuned LLM". 

#### - BASE LLM
So the base LLM repeatedly predicts the next 
word based on text training data.
#### - INSTRUCTION TUNED LLM
An Instruction Tuned LLM instead tries to follow 
instructions.

<span style="color:blue">How do you go from a Base LLM to an Instruction Tuned LLM?</span> 

This is what the process of training an Instruction Tuned LLM, 
like ChatGPT, looks like. 

You first train a Base LLM on a lot of data, hundreds of billions of words or maybe even more, a process that can take months on a large supercomputing system.
After you've trained the Base LLM, you further train the model by fine-tuning it on a smaller set of examples in which the output follows an input instruction.

For example, you may have **contractors** help you write many examples of an instruction together with a good response to it.
That creates a training set for this additional fine-tuning, so that the model learns to predict the next word when it is trying to follow an instruction.
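One common way to store such instruction/response pairs is JSON Lines, with each example written as a short chat conversation. The layout below mirrors the `messages` format that OpenAI's fine-tuning endpoint accepts for chat models, but the example pairs and the `to_chat_jsonl` helper are hypothetical.

```python
import json

# Hypothetical contractor-written examples: (instruction, good response)
examples = [
    ("Translate 'good morning' into French.", "Bonjour."),
    ("Give a one-word antonym of 'hot'.", "Cold."),
]

def to_chat_jsonl(pairs):
    """Serialize (instruction, response) pairs as JSON Lines, one chat per line."""
    lines = []
    for instruction, response in pairs:
        record = {"messages": [
            {"role": "user", "content": instruction},
            {"role": "assistant", "content": response},
        ]}
        lines.append(json.dumps(record))
    return "\n".join(lines)

print(to_chat_jsonl(examples))
```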

After that, to improve the **quality** of the LLM's output, a common process is to obtain human ratings of many different LLM outputs on criteria such as whether the output is **helpful, honest, and harmless.**

And you can then further tune the LLM 
to increase the probability of its **generating the 
more highly rated outputs**. And the most common technique 
to do this is RLHF, which stands for Reinforcement Learning from 
Human Feedback.

Whereas training the Base LLM can take months, going from the Base LLM to the Instruction Tuned LLM can be done in maybe days, on much more modest datasets and with much more modest computational resources.

![base_instruction_tuned_llm](immagini\03_base_instruction_tuned_llm.png)

## Setup
#### Load the API key and relevant Python libraries.
In this course, we've provided some code that loads the OpenAI API key for you.

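```python
import os
import openai    # pre-1.0 interface: openai.ChatCompletion was removed in openai 1.0
import tiktoken  # OpenAI's tokenizer library
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())  # read the local .env file into environment variables

openai.api_key = os.environ['OPENAI_API_KEY']
```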

#### helper function
This may look familiar if you took the earlier course "ChatGPT Prompt Engineering for Developers".

```python
def get_completion(prompt, model="gpt-3.5-turbo"):
    """Helper function to get a completion given a single user prompt."""
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,  # 0 makes the output as deterministic as possible
    )
    return response.choices[0].message["content"]
```

## Prompt the model and get a completion

```python
response = get_completion("What is the capital of France?")
print(response)
```

*OUTPUT*

The capital of France is Paris.

## Tokens

```python
response = get_completion("Take the letters in lollipop \
and reverse them")
print(response)
```

*OUTPUT*

The reversed letters of "lollipop" are "pillipol".

"lollipop" in reverse should be "popillol"

**Take the letters in ... and reverse them. This seems like an easy task.
So why is ChatGPT unable to do what seems like a relatively simple task?
It turns out that there's one more important detail about how a Large Language Model works: it doesn't actually repeatedly predict the next word; instead, it repeatedly predicts the NEXT TOKEN.**

What actually happens is that the LLM's tokenizer takes a sequence of characters and groups them into tokens, which are commonly occurring sequences of characters.

Because ChatGPT isn't seeing the individual letters of "lollipop" but is instead seeing three tokens, it's harder for it to correctly print out the letters in reverse order.
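To see why working with tokens makes letter reversal hard, here is a toy greedy longest-match tokenizer in Python. The vocabulary is made up so that "lollipop" splits into three multi-character pieces; it is not OpenAI's real vocabulary, and real tokenizers such as byte-pair encoding (BPE) learn their vocabularies from data. To inspect the real tokens, the `tiktoken` library imported in the setup cell can be used, e.g. `tiktoken.encoding_for_model("gpt-3.5-turbo").encode("lollipop")`, which returns a list of token ids.

```python
# Hypothetical toy vocabulary; real tokenizers learn theirs from data.
VOCAB = {"l", "o", "i", "p", "oll", "ipop"}

def greedy_tokenize(word, vocab=VOCAB):
    """Split a word by repeatedly taking the longest vocabulary entry that matches."""
    tokens, i = [], 0
    while i < len(word):
        for end in range(len(word), i, -1):  # try the longest piece first
            if word[i:end] in vocab:
                tokens.append(word[i:end])
                i = end
                break
        else:
            tokens.append(word[i])  # unknown character: fall back to a single char
            i += 1
    return tokens

tokens = greedy_tokenize("lollipop")
print(tokens)                      # ['l', 'oll', 'ipop']
print("".join(reversed(tokens)))   # ipopolll
```

Reversing the token sequence gives "ipopolll", not "popillol": reversing the pieces is not the same as reversing the letters inside them.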
 
So here's a trick you can use to fix this.
If you add dashes between the letters (spaces or other separators would work too), writing l-o-l-l-i-p-o-p, and ask the model to take those letters and reverse them, it does a much better job.

```python
response = get_completion("""Take the letters in \
l-o-l-l-i-p-o-p and reverse them""")
response  # in a notebook, a bare name at the end of a cell displays its value
```

*OUTPUT*

'p-o-p-i-l-l-o-l'
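The dash trick is easy to automate, and plain Python can check the model's answer. A minimal sketch: `spell_out` and `is_correct_reversal` are hypothetical helpers, and the check assumes the reply contains only the reversed letters, possibly with separators or quotes.

```python
def spell_out(word, sep="-"):
    """Insert a separator between letters so each letter is likely to be its own token."""
    return sep.join(word)

def is_correct_reversal(answer, word):
    """Compare only the letters in the model's answer with the true reversal."""
    letters = "".join(ch for ch in answer if ch.isalpha())
    return letters == word[::-1]

word = "lollipop"
prompt = f"Take the letters in {spell_out(word)} and reverse them"
expected = spell_out(word[::-1])
print(prompt)    # Take the letters in l-o-l-l-i-p-o-p and reverse them
print(expected)  # p-o-p-i-l-l-o-l
print(is_correct_reversal("'p-o-p-i-l-l-o-l'", word))  # True
print(is_correct_reversal("pillipol", word))           # False
```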

![Tokens](immagini\04_token.png)

For English, one token corresponds on average to roughly four characters, or about three quarters of a word.
Different Large Language Models often have different limits on the number of input plus output tokens they can accept.
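The rule of thumb above can be turned into a quick estimate. This is only a heuristic for English text, and the helper names are hypothetical; for exact counts, use the model's actual tokenizer (for example via `tiktoken`) or the `usage` field returned by the API.

```python
import math

def estimate_tokens(text):
    """Rough token estimate for English text: about 4 characters per token."""
    return math.ceil(len(text) / 4)

def estimate_tokens_from_words(num_words):
    """Rough estimate from a word count: one token is about 3/4 of a word."""
    return math.ceil(num_words * 4 / 3)

print(estimate_tokens("What is the capital of France?"))  # 30 characters -> 8
print(estimate_tokens_from_words(750))                    # 1000
```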

- **INPUT**: the message you send is called the **CONTEXT**.
- **OUTPUT**: the message the model generates is called the **COMPLETION**.

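The model gpt-3.5-turbo, for example, the most commonly used ChatGPT model, had a limit of roughly **4,000 tokens (4,096) in the INPUT + OUTPUT** when this course was recorded. Newer versions of the model accept longer contexts, so check the model documentation for the current limit.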
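Because the context and the completion share one limit, it can help to cap `max_tokens` by the room left after the prompt. A minimal sketch, assuming the 4,096-token limit of the original gpt-3.5-turbo and a prompt token count you already know (for example from the `usage` field); `safe_max_tokens` is a hypothetical helper.

```python
CONTEXT_LIMIT = 4096  # assumed limit of the original gpt-3.5-turbo; check current docs

def safe_max_tokens(prompt_tokens, requested=500, limit=CONTEXT_LIMIT):
    """Return a max_tokens value that keeps prompt + completion within the limit."""
    remaining = limit - prompt_tokens
    if remaining <= 0:
        raise ValueError(f"prompt uses {prompt_tokens} tokens, over the {limit}-token limit")
    return min(requested, remaining)

print(safe_max_tokens(37))    # 500: plenty of room left
print(safe_max_tokens(3800))  # 296: only 4096 - 3800 tokens remain
```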

## Helper function (chat format)
Here's the helper function we'll use in this course.

It uses another powerful way to call an LLM API, which involves specifying separate:
- system,
- user, and
- assistant messages.

```python
def get_completion_from_messages(messages,
                                 model="gpt-3.5-turbo",
                                 temperature=0,
                                 max_tokens=500):
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature,  # this is the degree of randomness of the model's output
        max_tokens=max_tokens,    # the maximum number of tokens the model can output
    )
    return response.choices[0].message["content"]
```

```python
messages = [
    {'role': 'system',
     'content': """You are an assistant who \
responds in the style of Dr Seuss."""},
    {'role': 'user',
     'content': """write me a very short poem \
about a happy carrot"""},
]
response = get_completion_from_messages(messages, temperature=1)
print(response)
```

*OUTPUT*

In the garden it sat, oh so bright,
A carrot that brought pure delight.
Its vibrant orange, oh so cheery,
Bringing joy to all, oh how merry!

With a crisp crunch, a happy sound,
This carrot's deliciousness did astound.
From soil to table, a journey complete,
A happy carrot, oh so sweet!

It brought smiles to children, big and small,
As they munched and chomped, having a ball.
A happy carrot, full of glee,
Bringing happiness, for all to see!

# CHAT FORMAT:

- The **system message** specifies the overall behavior and tone you want from the Large Language Model.
- The **user message** is a specific instruction that you want it to carry out, given the higher-level behavior specified in the system message.

The model then outputs an appropriate response that follows what you asked for in the USER MESSAGE and is consistent with the overall behavior set in the SYSTEM MESSAGE.
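The assistant role listed with the helper function above is how earlier model replies are fed back in a multi-turn conversation. Here is a sketch of building such a message list; `build_messages` is a hypothetical helper and the conversation content is made up.

```python
def build_messages(system, turns):
    """Build a chat-format message list.

    system: the system message that sets the overall behavior.
    turns:  list of (user_text, assistant_text) pairs; use None as the last
            assistant_text for the turn the model should answer next.
    """
    messages = [{"role": "system", "content": system}]
    for user_text, assistant_text in turns:
        messages.append({"role": "user", "content": user_text})
        if assistant_text is not None:
            messages.append({"role": "assistant", "content": assistant_text})
    return messages

messages = build_messages(
    "You are an assistant who responds in the style of Dr Seuss.",
    [("write me a very short poem about a happy carrot", "(the model's earlier poem)"),
     ("now make it two lines long", None)],
)
print([m["role"] for m in messages])  # ['system', 'user', 'assistant', 'user']
# The list could then be passed to get_completion_from_messages(messages).
```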

![message_chatbot](immagini\05_message_chatbot.png)

If you want to set the length, for example to get one-sentence outputs, you can say in the system message: "All your responses must be one sentence long."

When you run this, the model outputs a single sentence.

```python
# length
messages = [
    {'role': 'system',
     'content': 'All your responses must be \
one sentence long.'},
    {'role': 'user',
     'content': 'write me a story about a happy carrot'},
]
response = get_completion_from_messages(messages, temperature=1)
print(response)
```

*OUTPUT*

Once upon a time, there was a plump, cheerful carrot named Carl who lived in a vibrant vegetable garden.

You can also combine the style and length instructions in a single system message:

```python
# combined
messages = [
    {'role': 'system',
     'content': """You are an assistant who \
responds in the style of Dr Seuss. \
All your responses must be one sentence long."""},
    {'role': 'user',
     'content': """write me a story about a happy carrot"""},
]
response = get_completion_from_messages(messages, temperature=1)
print(response)
```

*OUTPUT*

Once upon a time, there was a carrot named Larry who was always merry.

If you are using an LLM and want to know how many tokens you are using, here's a slightly more sophisticated helper function. It gets a response from the OpenAI API endpoint and then reads the `usage` values in the response to tell you how many prompt tokens, completion tokens, and total tokens your API call used.

```python
def get_completion_and_token_count(messages,
                                   model="gpt-3.5-turbo",
                                   temperature=0,
                                   max_tokens=500):

    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )

    content = response.choices[0].message["content"]

    # the "usage" field of the response reports how many tokens the call used
    token_dict = {
        'prompt_tokens': response['usage']['prompt_tokens'],
        'completion_tokens': response['usage']['completion_tokens'],
        'total_tokens': response['usage']['total_tokens'],
    }

    return content, token_dict
```

```python
messages = [
    {'role': 'system',
     'content': """You are an assistant who responds\
 in the style of Dr Seuss."""},
    {'role': 'user',
     'content': """write me a very short poem \
about a happy carrot"""},  # nothing may follow the backslash, not even a space
]
response, token_dict = get_completion_and_token_count(messages)
```

```python
print(response)
```

*OUTPUT*

Oh, the happy carrot, so bright and orange,
Grown in the garden, a joyful forage.
With a smile so wide, from top to bottom,
It brings happiness, oh how it blossoms!

In the soil it grew, with love and care,
Nourished by sunshine, fresh air to share.
Its leaves so green, reaching up so high,
A happy carrot, oh my, oh my!

With a crunch and a munch, it's oh so tasty,
Filled with vitamins, oh so hasty.
A happy carrot, a delight to eat,
Bringing joy and health, oh what a treat!

So let's celebrate this veggie so grand,
With a happy carrot in each hand.
For in its presence, we surely find,
A taste of happiness, one of a kind!

```python
print(token_dict)
```

*OUTPUT*
```
{'prompt_tokens': 37, 'completion_tokens': 164, 'total_tokens': 201}
```
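If you make several calls in a session, you can keep a running tally of these dictionaries. A minimal sketch using `collections.Counter`; `add_usage` is a hypothetical helper, the first dictionary is the output above, and the second call's numbers are made up.

```python
from collections import Counter

def add_usage(running_total, token_dict):
    """Add one call's token counts to a running total (a Counter)."""
    running_total.update(token_dict)
    return running_total

usage = Counter()
add_usage(usage, {'prompt_tokens': 37, 'completion_tokens': 164, 'total_tokens': 201})
add_usage(usage, {'prompt_tokens': 20, 'completion_tokens': 30, 'total_tokens': 50})  # hypothetical
print(dict(usage))  # {'prompt_tokens': 57, 'completion_tokens': 194, 'total_tokens': 251}
```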

#### Notes on using the OpenAI API outside of this classroom

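To install the OpenAI Python library, along with `python-dotenv` and `tiktoken` used in the setup cell (the code in this lesson uses the pre-1.0 `openai.ChatCompletion` interface, so pin a version below 1.0):
```
!pip install "openai<1.0" python-dotenv tiktoken
```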

The library needs to be configured with your account's secret key, which is available on the [website](https://platform.openai.com/account/api-keys). 

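You can either set it as the `OPENAI_API_KEY` environment variable before using the library, for example in a terminal before launching Jupyter:
```
export OPENAI_API_KEY='sk-...'
```
Inside a notebook, `!export` runs in a separate subshell and does not reach the Python kernel; use the `%env OPENAI_API_KEY=sk-...` magic instead.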

Or, set `openai.api_key` to its value:

```
import openai
openai.api_key = "sk-..."
```

Now, I want to share with you one more tip for how to use a Large Language Model.
The OpenAI API requires an API key that's tied to either a free or a paid account, and many developers write the API key in plain text, like this, into their Jupyter notebook.
This is a less secure way of using API keys that I would not recommend, because it's too easy to share the notebook with someone else or check it into GitHub, and thus end up leaking your API key.
In contrast, what you saw me do in the Jupyter notebook was to use the `dotenv` library and call `load_dotenv(find_dotenv())` to read a local file called `.env` that contains my secret key.
This loads the key into the operating system's environment variables, and then `os.environ['OPENAI_API_KEY']` reads it into `openai.api_key`.
In this whole process, I never have to type the API key in unencrypted plain text into my Jupyter notebook.
 
So this is a relatively more secure and better way to access the API key. In fact, this is a general method for storing API keys for the many online services that you might want to call from your Jupyter notebook.
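Here is a sketch of the same idea with a clearer error when the key is missing. It assumes `load_dotenv(find_dotenv())` has already run, as in the setup cell; `get_api_key` is a hypothetical helper, and the `.env` contents shown in the comment are an example with the key elided.

```python
import os

# Example .env file, kept next to the notebook and listed in .gitignore
# so it is never committed:
#
#   OPENAI_API_KEY=sk-...

def get_api_key(var_name="OPENAI_API_KEY"):
    """Read an API key from the environment instead of hard-coding it in the notebook."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            f"{var_name} is not set: add it to your .env file "
            "(and call load_dotenv(find_dotenv()) first) or export it in your shell."
        )
    return key

# Usage in the notebook: openai.api_key = get_api_key()
```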

![api_key](immagini\06_api.png)

#### A note about the backslash
- In the course, we are using a backslash `\` to make the text fit on the screen without inserting newline '\n' characters.
- GPT-3 isn't really affected by whether you insert newline characters or not. But when working with LLMs in general, you may want to consider whether newline characters in your prompt affect the model's performance.
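The difference is easy to check in plain Python. The backslash must be the very last character on the line; if a space follows it, it is no longer a line continuation, and the string keeps a literal backslash (recent Python versions also warn about the invalid escape). Adjacent string literals in parentheses are an alternative that avoids the trailing backslash altogether.

```python
# Backslash at the end of a line inside a string literal: the newline is dropped.
joined = "Take the letters in \
lollipop and reverse them"

# Without the backslash, a triple-quoted string keeps the newline.
multiline = """Take the letters in
lollipop and reverse them"""

# Adjacent string literals inside parentheses are concatenated.
implicit = ("Take the letters in "
            "lollipop and reverse them")

print(repr(joined))     # 'Take the letters in lollipop and reverse them'
print(repr(multiline))  # 'Take the letters in\nlollipop and reverse them'
print(implicit == joined)  # True
```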