# 1- How LLMs Generate Text: Predicting the Next Token

LLMs, like the ones you will use in this course, generate text by predicting one piece at a time. These pieces are called `tokens`. The model looks at the text you have already given (the context) and tries to guess what comes next, one token at a time. This process is repeated until the model finishes its response.

Let's break down what a `token` is and how this prediction process works.

## What Is a Token?

A `token` is a small chunk of text. Depending on the language model, it can be a word, part of a word, or even just a character. LLMs do not see text as whole sentences or paragraphs. Instead, they break everything down into tokens.

For example, let's look at the phrase:

```
A bottle of water
```

Depending on the model, this might be split into tokens like:

- `A`
- `bottle`
- `of`
- `water`

Or, in some cases, `bottle` might be split into `bott` and `le` if the model uses smaller pieces. For this lesson, you can think of tokens as words or short word parts.

Tokens are important because the model predicts text one token at a time, not one word or one sentence at a time.
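
To make the two possible splits concrete, here is a minimal toy sketch. It is not how any real tokenizer works: the vocabulary `TOY_VOCAB` and the function `toy_tokenize` are invented for this illustration, while real models learn their vocabularies from large amounts of text.

```python
# Toy illustration only: real LLM tokenizers learn their vocabulary from data
# and behave differently. The vocabulary below is invented for this example.
TOY_VOCAB = {"A", "of", "water", "bott", "le"}


def toy_tokenize(text, vocab):
    """Split text on spaces, then greedily split each word into the longest
    pieces found in vocab, falling back to single characters."""
    tokens = []
    for word in text.split():
        start = 0
        while start < len(word):
            # Try the longest possible piece first, then shorter ones.
            for end in range(len(word), start, -1):
                piece = word[start:end]
                if piece in vocab or end == start + 1:
                    tokens.append(piece)
                    start = end
                    break
    return tokens


# "bottle" is not in the toy vocabulary, so it is split into smaller pieces.
print(toy_tokenize("A bottle of water", TOY_VOCAB))
# ['A', 'bott', 'le', 'of', 'water']

# With "bottle" added to the vocabulary, it stays a single token.
print(toy_tokenize("A bottle of water", TOY_VOCAB | {"bottle"}))
# ['A', 'bottle', 'of', 'water']
```

The same phrase produces different tokens depending on the vocabulary, which is why token counts can differ from one model to another.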


## How LLMs Predict the Next Token

Now, let's see how LLMs generate text step by step. The model always looks at the context — the already written tokens — and predicts what comes next.

Let's start with a simple prompt:

```
A bottle of
```

At this point, the model has three tokens: `A`, `bottle`, and `of`. It now needs to predict the next token. The model looks at the context (`A bottle of`) and tries to guess what is most likely to come next.

Common next tokens might be:

- `water`
- `wine`
- `milk`

Based on its training, the model assigns a probability to each candidate and picks one, typically among the most likely. If you continue, the process repeats. For example, if the model predicts `water`, the new context is:

```
A bottle of water
```

Now, the model can predict what comes after `water`, such as a period or another word.

This process — predicting one token at a time — continues until the model decides the response is complete.
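
The loop described in this section can be sketched with a toy lookup table. Everything here is an assumption made for illustration: the probabilities in `TOY_MODEL` are invented, the `<end>` marker is a stand-in for however a model signals completion, and a real model computes probabilities with a neural network rather than a table.

```python
# Hypothetical probabilities, invented for illustration; a real model
# computes them from the whole context with a neural network.
TOY_MODEL = {
    ("A", "bottle", "of"): {"water": 0.5, "wine": 0.3, "milk": 0.2},
    ("A", "bottle", "of", "water"): {".": 0.7, "and": 0.3},
    ("A", "bottle", "of", "water", "."): {"<end>": 1.0},
}


def generate(context, model, max_tokens=10):
    """Repeatedly append the most likely next token until <end> is predicted."""
    tokens = list(context)
    for _ in range(max_tokens):
        candidates = model.get(tuple(tokens))
        if candidates is None:
            break  # the toy table has no entry for this context
        next_token = max(candidates, key=candidates.get)
        if next_token == "<end>":
            break  # the "model" decided the response is complete
        tokens.append(next_token)
    return tokens


print(generate(["A", "bottle", "of"], TOY_MODEL))
# ['A', 'bottle', 'of', 'water', '.']
```

Each pass through the loop uses the tokens produced so far as the new context, which is the repetition the section describes.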

## Why Context Matters

The `context`, or the tokens that come before, is very important. It changes what the model predicts next. For example, after `A bottle of`, tokens such as `water` or `wine` are likely, while after `A bouquet of`, tokens such as `flowers` or `roses` become far more likely.
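
One rough way to see the effect of context is to count which token follows which in a small body of text. The sketch below assumes a tiny made-up corpus (`CORPUS`) and a hypothetical helper `build_counts`; real models learn far richer patterns from far more data, so treat this only as an intuition aid.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus; real models are trained on vastly more text.
CORPUS = [
    "a bottle of water",
    "a bottle of wine",
    "a bottle of water",
    "a bouquet of flowers",
    "a bouquet of roses",
    "a bouquet of flowers",
]


def build_counts(sentences, context_size=2):
    """Count which token follows each run of `context_size` tokens."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(context_size, len(tokens)):
            context = tuple(tokens[i - context_size:i])
            counts[context][tokens[i]] += 1
    return counts


counts = build_counts(CORPUS)
print(counts[("bottle", "of")].most_common())   # [('water', 2), ('wine', 1)]
print(counts[("bouquet", "of")].most_common())  # [('flowers', 2), ('roses', 1)]
```

Both contexts end in `of`, yet the likely continuations are completely different because of the token that came before it.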


## How Chat Interfaces Use LLMs as Assistants

Modern chat interfaces make it easy to use LLMs as assistants. While LLMs always predict the next token, chat interfaces add instructions or context to guide the model in acting helpfully and following your requests.

For example, when you type a question or instruction, the chat interface may add a hidden prompt like "You are a helpful assistant." This tells the LLM to answer your question or follow your instructions rather than merely continue your text.

The chat interface also keeps track of the conversation history so the LLM can give more relevant and coherent responses. This setup allows you to interact with LLMs naturally, as if chatting with an assistant.
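
As a rough sketch of this idea, the snippet below assembles a hidden instruction, the earlier turns, and the new message into one piece of text for the model to continue. The plain-text `System:` / `User:` / `Assistant:` layout and the `build_prompt` helper are assumptions made for illustration; real chat products use their own templates, which are not shown here.

```python
# Hypothetical plain-text format; real chat products use their own
# templates and special tokens, which are not shown here.
SYSTEM_PROMPT = "You are a helpful assistant."


def build_prompt(history, user_message, system_prompt=SYSTEM_PROMPT):
    """Combine the hidden instructions, earlier turns, and the new message
    into one text that a next-token predictor would continue."""
    lines = [f"System: {system_prompt}"]
    for role, text in history:
        lines.append(f"{role}: {text}")
    lines.append(f"User: {user_message}")
    lines.append("Assistant:")  # the model continues from here
    return "\n".join(lines)


history = [
    ("User", "What is a token?"),
    ("Assistant", "A small chunk of text, such as a word or part of a word."),
]
print(build_prompt(history, "Can you give an example?"))
```

Because the earlier turns are part of the context, the model can tell that "an example" refers to tokens, even though the latest message never mentions them.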

# 2- Understanding Model Versions

In this lesson, we will focus on understanding `model versions`. This knowledge will help you make better choices when working with different models and crafting your prompts.


### Before We Start
You have probably already noticed that an LLM's answer can differ each time you send the same prompt.

This is because LLMs are probabilistic: they generate responses by predicting likely next tokens, and there is usually some randomness in which token gets picked. This means you can't be sure the model will always return exactly what you want. Sometimes the answer differs substantially from run to run, even if you use the same prompt.

This is important to remember during the practice exercises: since we can't know exactly what the LLM will return, you may get an unexpected answer. If that happens, you can rerun the prompt in a new chat to see if you get a better result, or refine your prompt to guide the model more clearly.
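
The randomness can be illustrated with weighted random sampling. This is a simplified sketch, not a description of any particular model: the probabilities in `NEXT_TOKEN_PROBS` are invented, and `sample_next_token` is a hypothetical helper.

```python
import random

# Hypothetical probabilities for the token after "A bottle of".
NEXT_TOKEN_PROBS = {"water": 0.5, "wine": 0.3, "milk": 0.2}


def sample_next_token(probs, rng):
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random()  # unseeded, so each run of the script can differ
samples = [sample_next_token(NEXT_TOKEN_PROBS, rng) for _ in range(10)]
print(samples)  # mostly "water", but "wine" and "milk" can appear too
```

Running the script several times usually prints different lists, which mirrors why the same prompt can produce different answers.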


## What Are Model Versions?
LLMs are updated and improved over time. Each update is called a `model version`. New versions are released for several reasons:

- To include more recent information in the training data.
- To fix mistakes or improve how the model understands prompts.
- To make the model faster or more accurate.

For example, you might see models named `Claude Sonnet 3.7` and `Claude Sonnet 4`. The higher number usually means a newer version, often with more up-to-date information and better performance.


### Pay Attention

**Note**: Modern language models are typically designed to be cautious when addressing questions outside their knowledge base. If asked about something they have no information on, they often respond honestly with "I don't know." However, it's important to remember that a model's primary function is to generate plausible answers, not necessarily correct ones. As a result, they may sometimes produce inaccurate or misleading information. This is called "LLM hallucination".

While LLMs can be very helpful assistants, they should not be relied upon as definitive sources of truth.

# 3- Understanding Reasoning Models: Guiding LLMs to Think Step by Step

This lesson is about how you can get even better answers from LLMs — especially for complex or multi-step problems — by guiding them to "think" before answering.

When you ask a model a simple question, it often gives you a direct answer. However, for more complicated tasks, such as solving a math problem or analyzing a scenario, the model can make mistakes if it tries to answer too quickly. This is where `reasoning techniques` come in. Encouraging the model to break down its thought process can help it arrive at more accurate and logical answers.


## Chain of Thought (CoT): The Step-by-Step Approach
The `Chain of Thought (CoT)` technique is a way to prompt LLMs to solve problems step by step, just like you might do on paper. Instead of asking for a final answer right away, you guide the model to show its reasoning process.

Let's see how this works, starting with a simple math problem.

Suppose you ask:

```
Q: What is 197 * 971?
```

If you ask this, the model will try to predict the next token. It can't actually do the math; it just predicts something plausible. Sometimes, it gets it right, but often, it makes a mistake, especially with large numbers or multi-step problems.

To help the model, you can add a phrase like `Think step by step` to your prompt. This tells the model to break down the problem:

```
Q: What is 197 * 971? Think step by step.
```

But you can be even more helpful by showing the model exactly how to break down the steps. Let's build this up together.

First, you can show the model how to multiply using place value:


```
Step 1: Multiply the units digit of 197 (which is 7) by 971.
Step 2: Multiply the tens digit of 197 (which is 9) by 971, and add a zero at the end.
Step 3: Multiply the hundreds digit of 197 (which is 1) by 971, and add two zeroes at the end.
Step 4: Add all the results together.
```
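
The four steps above can be checked with a few lines of code. This is a small sketch with a hypothetical helper, `partial_products`, that assumes non-negative whole numbers; multiplying by a power of ten plays the role of "adding zeroes at the end".

```python
def partial_products(a, b):
    """Return the place-value partial products of a * b, starting from
    the units digit of a (assumes non-negative integers)."""
    products = []
    for position, digit_char in enumerate(reversed(str(a))):
        digit = int(digit_char)
        # Multiplying by 10 ** position is the "add zeroes" step.
        products.append(digit * b * 10 ** position)
    return products


steps = partial_products(197, 971)
print(steps)       # [6797, 87390, 97100]
print(sum(steps))  # 191287
```

Steps 1 to 3 give 6797, 87390, and 97100, and Step 4 adds them to get 191287, which matches `197 * 971`.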

### Chain of Thought (CoT): Full Example

Now, let's put this into a full prompt, step by step:


```
Q: What's 212 * 385?

A: Step 1: Multiply 2 (units place of 212) by 385
       2 * 385 = 770
Step 2: Multiply 1 (tens place of 212) by 385 and add a zero at the end
       1 * 385 = 385 => 3850
Step 3: Multiply 2 (hundreds place of 212) by 385 and add two zeroes at the end
       2 * 385 = 770 => 77000
Step 4: Add all the results from steps 1, 2, and 3
       770 + 3850 + 77000 = 81620

Therefore, 212 * 385 = 81620

Q: What's 197 * 971?
```

In this prompt, we give the model a worked solution to a different problem and then ask it to solve the problem we actually care about. This way, we show the model how to "think" properly.

By guiding the model through each step, you help it avoid mistakes and clarify its reasoning. This is the core idea behind Chain of Thought prompting.
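
If you need many prompts of this kind, you could generate the worked example instead of typing it by hand. The sketch below is one possible way to do that, not a prescribed method: `worked_example` and `few_shot_prompt` are hypothetical helpers, the step wording is simplified compared with the prompt above, and non-negative whole numbers are assumed.

```python
def worked_example(a, b):
    """Write a * b as place-value steps (assumes non-negative integers)."""
    lines = [f"Q: What's {a} * {b}?", "", "A:"]
    partials = []
    for position, digit_char in enumerate(reversed(str(a))):
        partial = int(digit_char) * b * 10 ** position
        partials.append(partial)
        lines.append(
            f"Step {position + 1}: {digit_char} * {b}, "
            f"then add {position} zero(s) => {partial}"
        )
    total = sum(partials)
    lines.append(
        f"Step {len(partials) + 1}: "
        + " + ".join(str(p) for p in partials)
        + f" = {total}"
    )
    lines.append("")
    lines.append(f"Therefore, {a} * {b} = {total}")
    return "\n".join(lines)


def few_shot_prompt(example_pair, question_pair):
    """Put one solved example before the question we actually care about."""
    a, b = question_pair
    return worked_example(*example_pair) + f"\n\nQ: What's {a} * {b}?"


prompt = few_shot_prompt((212, 385), (197, 971))
print(prompt)
```

The solved example is computed exactly, so the model sees a correct demonstration, while the final question is left open for the model to answer in the same style.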

You can use this approach for many problems, not just math. For example, you can ask the model to explain its reasoning in logic puzzles, story analysis, or code debugging.


## How Reasoning Models Use Chain of Thought

Some advanced LLMs are trained to use Chain of Thought reasoning automatically, especially when they see specific cues in your prompt. These are called `reasoning models`. They are designed to handle multi-step problems by breaking them down internally, even if you don't explicitly ask them to.

For example, if you prompt a reasoning model with:

```
Q: What's 202*588?
```

A reasoning model might automatically start breaking down the steps, similar to the example above, and show its work before giving the final answer.

However, not all models do this by default. For regular models, you often need to guide them by providing an example or by saying, `Think step by step`. Reasoning models are more likely to follow this process independently, but you can still help them by being clear in your prompt.
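
In code, this difference might look like the small helper below. It is only a sketch: `prepare_prompt` and the `is_reasoning_model` flag are hypothetical names, and whether a given model needs the cue is something you would have to find out from its documentation or by experimenting.

```python
STEP_BY_STEP_CUE = "Think step by step."


def prepare_prompt(question, is_reasoning_model):
    """Append the step-by-step cue only for models that may need the nudge."""
    question = question.strip()
    if is_reasoning_model or question.endswith(STEP_BY_STEP_CUE):
        return question  # leave the prompt as written
    return f"{question} {STEP_BY_STEP_CUE}"


print(prepare_prompt("Q: What's 202*588?", is_reasoning_model=False))
# Q: What's 202*588? Think step by step.
print(prepare_prompt("Q: What's 202*588?", is_reasoning_model=True))
# Q: What's 202*588?
```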


## When to Use Reasoning Models vs. Regular Models
Let's examine the pros and cons of using reasoning models and when to use each type.

| Category | Reasoning Models (with CoT)                                                                                                                | Regular Models                                                                            |
| -------- | ------------------------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------- |
| **Pros** | • Better at multi-step problems (math, logic, analysis)<br>• More transparent reasoning<br>• Less likely to make mistakes on complex tasks | • Faster responses<br>• Good for simple, factual questions<br>• Uses less computing power |
| **Cons** | • Slower, longer answers<br>• May over-explain simple questions<br>• Usually more expensive to run                                         | • Can make mistakes on complex tasks<br>• Less transparent reasoning                      |



**When to use reasoning models:**

- Solving math problems with multiple steps
- Logic puzzles or riddles
- Explaining something
- Brainstorming sessions
- Analyzing stories or scenarios
- Advanced coding tasks


**When to use regular models:**

- Quick factual lookups (e.g., "What is the capital of France?")
- Simple, direct questions
- When you want to create something, e.g., generate a story or a code snippet
- When you want a short answer
- When you want to save money
