In [1]:
import torch

print(f"Pytorch version : {torch.__version__}")
if torch.cuda.is_available():
    print("CUDA GPU")
elif torch.backends.mps.is_available():
    print("Apple Silicon GPU")
else:
    print("CPU Only")

Pytorch version : 2.9.1+cu128
CUDA GPU


--> Packages used in this book

In [2]:
from importlib.metadata import version

used_libraries = [
    "reasoning_from_scratch",
    "torch",
    "tokenizers" 
]

for lib in used_libraries:
    print(f"{lib} version : {version(lib)}")

reasoning_from_scratch version : 0.1.8
torch version : 2.9.1
tokenizers version : 0.22.1
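
`version` raises `PackageNotFoundError` when a package is not installed, which would stop the loop above at the first missing library. A minimal sketch of a more forgiving check follows; the helper name `safe_version` is hypothetical and not part of `reasoning_from_scratch`.

```python
from importlib.metadata import PackageNotFoundError, version


def safe_version(package_name):
    """Return the installed version string, or None if the package is missing.

    Hypothetical helper for illustration only.
    """
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


for lib in ["reasoning_from_scratch", "torch", "tokenizers"]:
    found = safe_version(lib)
    status = found if found is not None else "not installed"
    print(f"{lib} version : {status}")
```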


### Using Tensor Cores
- If you have an NVIDIA GPU based on the Volta architecture or newer, you can take advantage of Tensor Cores, which are specialized for matrix multiplications.
To enable them, execute the following code:


In [3]:
# Allow float32 matrix multiplications to use faster, lower-precision formats (such as TensorFloat32) where the hardware supports them.
# By default torch sets this to "highest" (full float32 precision). Read more at https://docs.pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html
torch.set_float32_matmul_precision("high")
# A better way is to use AMP (Automatic Mixed Precision).
# See the AMP template: /DATA/pyare/Routine/LLM/Reasoning/reasoning-from-scratch-pyare/ch02/02_main-chapter-code/notes/single_gpu_AMP_training_template.py
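
To get a feel for what the `"high"` setting trades away, the sketch below imitates TensorFloat32 precision in pure Python. TF32 keeps the 8 exponent bits of float32 but only 10 of its 23 mantissa bits. The helper `to_tf32_like` is hypothetical, and it simply truncates the extra bits, which is an assumption made for simplicity; real hardware may round differently, so treat the numbers as an approximation.

```python
import struct


def to_tf32_like(x):
    """Approximate TF32 precision: keep sign, exponent, and top 10 mantissa bits.

    Hypothetical helper; truncation is a simplifying assumption.
    """
    # Reinterpret the value as a 32-bit float bit pattern
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Zero out the 13 lowest mantissa bits (23 - 10 = 13)
    bits &= 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


value = 0.1
print(f"float32  : {to_float32(value):.10f}")
print(f"TF32-like: {to_tf32_like(value):.10f}")
print(f"relative error: {abs(to_tf32_like(value) - value) / value:.2e}")
```

The relative error stays below 2^-10 (about 0.1%), which is usually acceptable for the matrix multiplications inside an LLM.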

## 2.4 Preparing the Input Text for LLMs

In [4]:
# Download the tokenizer file
from reasoning_from_scratch.qwen3 import download_qwen3_small
download_qwen3_small(kind="base", tokenizer_only=True, out_dir="qwen3")

The command downloads the `tokenizer-base.json` file. Now we can load the tokenizer settings from that file into the `Qwen3Tokenizer`:

In [5]:
from pathlib import Path
from reasoning_from_scratch.qwen3 import Qwen3Tokenizer

tokenizer_path = Path("qwen3") / "tokenizer-base.json"
tokenizer = Qwen3Tokenizer(tokenizer_file_path=tokenizer_path)


<img src = "/DATA/pyare/Routine/LLM/Reasoning/reasoning-from-scratch-pyare/ch02/02_main-chapter-code/notes/Notes_Images/02__image006.png" alt ="3" width=700>

In [6]:
prompt = "Explain large language models."
input_token_ids_list = tokenizer.encode(prompt)

In [7]:
text = tokenizer.decode(input_token_ids_list)
print(text)

Explain large language models.


In [8]:
# Look at the individual token IDs and the text each one decodes to
for i in input_token_ids_list:
    print(f"{[i]} --> {tokenizer.decode([i])}")

[840] --> Ex
[20772] --> plain
[3460] -->  large
[4128] -->  language
[4119] -->  models
[13] --> .


- "Explain" is split into "Ex" and "plain". How a word is split depends on the tokenizer algorithm; here it is Byte Pair Encoding (BPE), a subword-based method.
- BPE can represent both common and rare words using a mix of full words and subword units.
- Spaces are often included in tokens (for example, " large" with a leading space), which helps the LLM detect word boundaries.
- `Qwen3Tokenizer` has a vocabulary of about 151,000 tokens, which is considered relatively large as of this writing (for comparison, the early GPT-2 has a vocabulary size of approximately 50,000 tokens and Llama 3 has a vocabulary size of approximately 128,000 tokens).
- A larger vocabulary increases the model size. The number of tokens in a text also matters for compute: doubling the number of tokens approximately doubles the computational cost of running the model, since it needs to generate more tokens to complete the response.
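
The notes above can be made concrete with a toy BPE implementation. This is a minimal sketch for illustration, not the Qwen3 tokenizer: the function names (`learn_bpe_merges`, `apply_merge`, `tokenize`) and the tiny corpus are made up, and production tokenizers operate on bytes with far larger training data.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Replace each adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    # Start with every word split into single characters
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single token
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(apply_merge(list(symbols), best))] += freq
        vocab = new_vocab
    return merges


def tokenize(word, merges):
    """Split a word into characters, then apply the merges in learned order."""
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


# Toy corpus; the leading space is part of each word, as in the output above
corpus = [" large"] * 5 + [" language"] * 3 + [" plain"] * 2
merges = learn_bpe_merges(corpus, num_merges=20)

print(tokenize(" large", merges))    # [' large']
print(tokenize(" plainly", merges))  # [' plain', 'l', 'y']
```

Words seen during training end up as single tokens that include the leading space, while the unseen word " plainly" falls back to a known subword plus single characters.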

### Exercise 2.1: Encoding unknown words
Experiment with the tokenizer to see if and how it handles unknown words. For this, get creative and make up words that don't exist. Also, if you speak multiple languages, try to encode words in a different language than English.

In [None]:
## Later

## 2.5 Loading pre-trained models
- In this chapter we will use the pre-trained 0.6B Qwen3 model as the base model.
Why Qwen3?
- `Qwen3 0.6B` is more memory-efficient than `Llama 3 1B` and `OLMo 2 1B`.
- The code here is a custom reimplementation of Qwen3 that is compatible with the original pre-trained Qwen3 model weights.
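
A rough back-of-the-envelope estimate shows where the memory advantage comes from. The sketch below assumes the nominal parameter counts implied by the model names (0.6 billion and 1 billion) and 2 bytes per parameter (as with bfloat16); both are assumptions, and the estimate covers the weights only, not activations or caches. The helper `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bytes_per_param=2):
    """Estimate the memory needed for model weights, in gigabytes.

    Hypothetical helper; bytes_per_param=2 assumes bfloat16 or float16.
    """
    return num_params * bytes_per_param / 1e9


# Nominal sizes taken from the model names, not exact parameter counts
nominal_sizes = {
    "Qwen3 0.6B": 0.6e9,
    "Llama 3 1B": 1.0e9,
    "OLMo 2 1B": 1.0e9,
}

for name, num_params in nominal_sizes.items():
    print(f"{name}: ~{weight_memory_gb(num_params):.1f} GB")
```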