# 1- How LLMs Generate Text: Predicting the Next Token

LLMs, like the ones you will use in this course, generate text by predicting one piece at a time. These pieces are called `tokens`. The model looks at the text you have already given (the context) and tries to guess what comes next, one token at a time. This process is repeated until the model finishes its response.

Let's break down what a `token` is and how this prediction process works.

## What Is a Token?

A `token` is a small chunk of text. Depending on the language model, it can be a word, part of a word, or even just a character. LLMs do not see text as whole sentences or paragraphs. Instead, they break everything down into tokens.

For example, let's look at the phrase:

```
A bottle of water
```

Depending on the model, this might be split into tokens like:

- `A`
- `bottle`
- `of`
- `water`

Or, in some cases, `bottle` might be split into `bott` and `le` if the model uses smaller pieces. For this lesson, you can think of tokens as words or short word parts.
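To make the two splits above concrete, here is a minimal sketch. It is not how any particular model tokenizes text: `SUBWORD_VOCAB` is a hypothetical, hand-written vocabulary, and the greedy longest-match rule is a simplification chosen only for illustration.

```python
def word_tokenize(text):
    """Split text into word-level tokens on whitespace."""
    return text.split()


# Hypothetical vocabulary of allowed pieces (an assumption for this example).
SUBWORD_VOCAB = {"A", "bott", "le", "of", "water"}


def subword_tokenize(text, vocab):
    """Split each word into the longest pieces found in `vocab`."""
    tokens = []
    for word in text.split():
        start = 0
        while start < len(word):
            # Try the longest possible piece first, then shorter ones.
            for end in range(len(word), start, -1):
                piece = word[start:end]
                if piece in vocab:
                    tokens.append(piece)
                    start = end
                    break
            else:
                # No known piece matched: fall back to a single character.
                tokens.append(word[start])
                start += 1
    return tokens


print(word_tokenize("A bottle of water"))
print(subword_tokenize("A bottle of water", SUBWORD_VOCAB))
```

With this vocabulary, the word-level split keeps `bottle` whole, while the subword split breaks it into `bott` and `le`.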

Tokens are important because the model predicts text one token at a time, not one word or one sentence at a time.


## How LLMs Predict the Next Token

Now, let's see how LLMs generate text step by step. The model always looks at the context — the already written tokens — and predicts what comes next.

Let's start with a simple prompt:

```
A bottle of
```

At this point, the model has three tokens: `A`, `bottle`, and `of`. It now needs to predict the next token. The model looks at the context (`A bottle of`) and tries to guess what is most likely to come next.

Common next tokens might be:

- `water`
- `wine`
- `milk`

The model assigns each candidate a probability based on its training and picks one of the likely candidates, often the most likely. If you continue, the process repeats. For example, if the model predicts `water`, the new context is:

```
A bottle of water
```

Now, the model can predict what comes after `water`, such as a period or another word.

This process — predicting one token at a time — continues until the model decides the response is complete.
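The loop described in this section can be sketched in a few lines. This is a toy under stated assumptions: `NEXT_TOKEN_PROBS` is a hypothetical, hand-written lookup table standing in for a real model, `<end>` is an invented marker for "the response is complete", and the code always takes the single most likely token for simplicity.

```python
END = "<end>"  # hypothetical marker meaning "stop generating"

# Hypothetical probabilities for each context (not from a real model).
NEXT_TOKEN_PROBS = {
    ("A", "bottle", "of"): {"water": 0.6, "wine": 0.3, "milk": 0.1},
    ("A", "bottle", "of", "water"): {".": 0.7, "and": 0.3},
    ("A", "bottle", "of", "water", "."): {END: 1.0},
}


def generate(prompt_tokens, max_new_tokens=10):
    """Repeatedly predict the next token and append it to the context."""
    context = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # Look up candidates for the current context; stop if it is unknown.
        probs = NEXT_TOKEN_PROBS.get(tuple(context), {END: 1.0})
        next_token = max(probs, key=probs.get)  # pick the most likely token
        if next_token == END:
            break
        context.append(next_token)  # the prediction becomes part of the context
    return context


print(generate(["A", "bottle", "of"]))
```

Each predicted token is appended to the context before the next prediction is made, which is the key idea of generating one token at a time.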

## Why Context Matters

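The `context`, or the tokens that come before, is very important because it changes what the model predicts next. For instance, `A bottle of` is more likely to continue with `water` after a sentence about going to the gym, and with `wine` after a sentence about a dinner party.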
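To make this concrete, here is a small sketch with two prompts that end in the same words but differ in their earlier context. The cue words and probabilities in `CUE_TABLE` are hypothetical values written by hand; a real model learns such patterns from training data rather than from a lookup table.

```python
# Hypothetical probabilities, keyed by a cue word found in the context.
CUE_TABLE = {
    "gym": {"water": 0.8, "milk": 0.15, "wine": 0.05},
    "dinner": {"wine": 0.7, "water": 0.25, "milk": 0.05},
}
# Used when the context contains no known cue word.
DEFAULT_PROBS = {"water": 0.5, "wine": 0.3, "milk": 0.2}


def next_token_probs(context):
    """Return next-token probabilities that depend on earlier words."""
    tokens = context.lower().split()
    for cue, probs in CUE_TABLE.items():
        if cue in tokens:
            return probs
    return DEFAULT_PROBS


def most_likely(context):
    """Return the highest-probability next token for this context."""
    probs = next_token_probs(context)
    return max(probs, key=probs.get)


print(most_likely("After the gym I drank a bottle of"))
print(most_likely("At dinner we opened a bottle of"))
```

Both prompts end with `a bottle of`, yet the earlier words shift which continuation is most likely.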


## How Chat Interfaces Use LLMs as Assistants

Modern chat interfaces make it easy to use LLMs as assistants. While LLMs always predict the next token, chat interfaces add instructions or context to guide the model in acting helpfully and following your requests.

For example, when you type a question or instruction, the chat interface may add a hidden prompt like "You are a helpful assistant." This tells the LLM to answer your question and follow your instructions rather than simply continue your text.

The chat interface also keeps track of the conversation history so the LLM can give more relevant and coherent responses. This setup allows you to interact with LLMs naturally, as if chatting with an assistant.
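As a rough illustration, a chat interface could assemble the text the model actually sees like this. The `System:`/`User:`/`Assistant:` layout and the `build_prompt` helper are assumptions made up for this sketch; real chat interfaces use their own formats.

```python
# The hidden instruction mentioned above.
SYSTEM_PROMPT = "You are a helpful assistant."


def build_prompt(history, user_message):
    """Combine the hidden prompt, past turns, and the new message."""
    lines = [f"System: {SYSTEM_PROMPT}"]
    for role, text in history:
        lines.append(f"{role}: {text}")  # earlier turns keep the model on topic
    lines.append(f"User: {user_message}")
    lines.append("Assistant:")  # the model continues the text from here
    return "\n".join(lines)


history = [
    ("User", "Hi"),
    ("Assistant", "Hello! How can I help?"),
]
print(build_prompt(history, "What is a token?"))
```

The model still only predicts the next token, but because the text ends with `Assistant:`, the most natural continuation is a helpful reply.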

# 2- Understanding Model Versions

In this lesson, we will focus on understanding `model versions`. This knowledge will help you make better choices when working with different models and crafting your prompts.


### Before We Start
You have probably already noticed that an LLM's answer can differ each time you send a prompt.

This is because LLMs are probabilistic: they generate a response by predicting likely next tokens, but some randomness is involved in choosing each one. As a result, you can't be sure the model will always return exactly what you want. The answer can differ noticeably between runs, even if you use the same prompt.

This is important to remember during the practice exercises: since we can't know exactly what the LLM will return, you may get an unexpected answer. If that happens, you can try rerunning the prompt in another chat to see if you get a better result, or you can refine your prompt to guide the model more clearly.
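The randomness described above can be sketched as sampling from a probability table instead of always taking the top choice. The probabilities below are hypothetical, and the fixed seed is there only to make this example repeatable; it does not reflect how any chat interface is configured.

```python
import random

# Hypothetical probabilities for the token after "A bottle of".
NEXT_TOKEN_PROBS = {"water": 0.6, "wine": 0.3, "milk": 0.1}


def sample_next_token(probs, rng):
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so the example is repeatable
samples = [sample_next_token(NEXT_TOKEN_PROBS, rng) for _ in range(10)]
print(samples)
```

Likely tokens appear more often, but less likely ones can still be chosen, which is why the same prompt can produce different answers.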


## What Are Model Versions?
LLMs are updated and improved over time. Each update is called a `model version`. New versions are released for several reasons:

- To include more recent information in the training data.
- To fix mistakes or improve how the model understands prompts.
- To make the model faster or more accurate.

For example, you might see models named `Claude 3.7 Sonnet` and `Claude Sonnet 4`. The higher number usually means a newer version, often with more up-to-date information and better performance.


### Pay Attention

**Note**: Modern language models are typically designed to be cautious when addressing questions outside their knowledge base. If asked about something they have no information on, they often respond honestly with "I don't know." However, it's important to remember that a model's primary function is to generate plausible answers, not necessarily correct ones. As a result, they may sometimes state inaccurate or misleading information as if it were fact. This behavior is called "LLM hallucination".

While LLMs can be very helpful assistants, they should not be relied upon as definitive sources of truth.