<table style="width:100%">
<tr>
<td style="vertical-align:middle; text-align:left;">
<font size="2">
Supplementary code for the <a href="http://mng.bz/orYv">Build a Large Language Model From Scratch</a> book by <a href="https://sebastianraschka.com">Sebastian Raschka</a><br>
<br>Code repository: <a href="https://github.com/rasbt/LLMs-from-scratch">https://github.com/rasbt/LLMs-from-scratch</a>
</font>
</td>
<td style="vertical-align:middle; text-align:left;">
<a href="http://mng.bz/orYv"><img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/cover-small.webp" width="100px"></a>
</td>
</tr>
</table>

## FLOPS Analysis

- FLOPs (floating-point operations) measure the computational cost of a neural network model by counting the number of floating-point operations it executes; this is distinct from FLOPS (floating-point operations per second), which describes hardware throughput
- Higher FLOPs indicate more intensive computation and, in turn, higher energy consumption
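
As a minimal illustration of how such a count is obtained, the sketch below tallies the operations in a single matrix multiplication by hand. The function name `matmul_flops` and the example shapes are assumptions chosen for illustration and are not part of the book's code; the convention of counting one multiply-accumulate (MAC) as two FLOPs is the one used in the profiling cell of this notebook.

```python
def matmul_flops(m, k, n):
    """Return (macs, flops) for multiplying an (m x k) matrix by a (k x n) matrix."""
    # Each of the m * n output elements is a dot product of length k,
    # which takes k multiply-accumulate (MAC) operations
    macs = m * n * k
    # Convention: one MAC = one multiply + one add = 2 FLOPs
    flops = 2 * macs
    return macs, flops


# Hypothetical example: 1024 token vectors of width 768 projected to width 768
macs, flops = matmul_flops(1024, 768, 768)
print(f"MACs: {macs:,}  FLOPs: {flops:.1e}")
```

The count grows with the product of all three dimensions, which is why wider and deeper models quickly become more expensive to run.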

In [2]:
! pip install -r requirements-extra.txt

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting thop (from -r requirements-extra.txt (line 1))
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/bb/0f/72beeab4ff5221dc47127c80f8834b4bcd0cb36f6ba91c0b1d04a1233403/thop-0.1.1.post2209072238-py3-none-any.whl (15 kB)
Installing collected packages: thop
Successfully installed thop-0.1.1.post2209072238


In [2]:
from importlib.metadata import version

import matplotlib
import torch

print("thop version:", version("thop"))
print("torch version:", version("torch"))
print(torch.version.cuda)

thop version: 0.1.1-2209072238
torch version: 2.0.1
None


In [2]:
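# Try to import the GPU-related package (torch.cuda)
# Note: this import also succeeds on CPU-only builds and prints the torch version;
# use torch.cuda.is_available() to check for a usable GPU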
try:
    import torch.cuda
    print("torch cuda version:", version("torch"))
except ImportError:
    print("no torch cuda")

torch cuda version: 2.0.1


Understanding how CUDA, the GPU, and the NVIDIA driver relate to one another explains how applications use NVIDIA GPUs for computation, particularly in deep learning and scientific computing:

1. **GPU (Graphics Processing Unit):** This is the hardware component designed to accelerate graphics rendering and computational tasks. NVIDIA's GPUs are widely used for both gaming and compute-intensive applications, such as machine learning, data analysis, and scientific simulations.

2. **CUDA (Compute Unified Device Architecture):** CUDA is a parallel computing platform and programming model invented by NVIDIA. It allows developers to use NVIDIA GPUs for general purpose processing (an approach known as GPGPU, General-Purpose computing on Graphics Processing Units). CUDA provides a direct way to interact with the GPU's virtual instruction set and parallel computational elements, for executing compute kernels.

3. **NVIDIA Driver:** The NVIDIA driver is software that operates at the system level to enable communication between the operating system and the GPU hardware. It includes the necessary components to interface with CUDA applications, manage GPU resources, and execute the compiled CUDA kernels on the GPU.

**Relationship:**

- **CUDA and GPU:** CUDA is designed specifically for programming NVIDIA GPUs. It provides APIs and a runtime environment that let developers offload work to the GPU. CUDA runs only on CUDA-capable GPUs, which NVIDIA designs to support parallel computing tasks.

- **CUDA and NVIDIA Driver:** The NVIDIA driver includes the CUDA Driver API, which is necessary for executing applications developed with CUDA. The driver must be compatible with the version of CUDA used to develop an application. For example, newer versions of CUDA may require an updated NVIDIA driver that understands the latest CUDA features and instructions.

- **GPU and NVIDIA Driver:** The NVIDIA driver is essential for the operating system to recognize and utilize the GPU hardware. It translates high-level commands into low-level instructions for the GPU and manages resource allocation and scheduling for compute tasks.

In summary, the NVIDIA GPU is the hardware capable of accelerating computational tasks. CUDA is the software layer that allows developers to write programs that leverage the GPU for parallel computing. The NVIDIA driver acts as the intermediary, enabling the operating system and CUDA applications to communicate with the GPU hardware.
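
The three layers can be inspected separately, which helps when a GPU is installed but PyTorch still reports the CPU. The following is a rough sketch rather than a definitive diagnostic: the helper names `query_nvidia_driver` and `report_gpu_stack` are hypothetical, and it assumes the `nvidia-smi` command-line tool that ships with the NVIDIA driver is on the `PATH`. It falls back gracefully when neither the driver nor PyTorch is present.

```python
import shutil
import subprocess


def query_nvidia_driver():
    """Return a list of (gpu_name, driver_version) tuples, or [] if unavailable."""
    exe = shutil.which("nvidia-smi")  # installed together with the NVIDIA driver
    if exe is None:
        return []
    try:
        out = subprocess.run(
            [exe, "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=10,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return []
    rows = []
    for line in out.strip().splitlines():
        # Split on the last comma so the GPU name stays intact
        parts = [field.strip() for field in line.rsplit(",", 1)]
        if len(parts) == 2:
            rows.append(tuple(parts))
    return rows


def report_gpu_stack():
    """Collect driver-level and PyTorch-level information in one dictionary."""
    info = {
        "driver": query_nvidia_driver(),  # hardware + driver layer
        "torch_cuda_build": None,         # CUDA version PyTorch was built with
        "cuda_usable": False,             # whether PyTorch can actually use a GPU
    }
    try:
        import torch
        info["torch_cuda_build"] = torch.version.cuda
        info["cuda_usable"] = torch.cuda.is_available()
    except ImportError:
        pass  # PyTorch is not installed; report the driver layer only
    return info


print(report_gpu_stack())
```

If `driver` lists a GPU but `torch_cuda_build` is `None`, a CPU-only PyTorch build is the likely cause; if both are set but `cuda_usable` is `False`, a mismatch between the driver and the CUDA version is worth checking.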

In [2]:
import torch
from thop import profile

from previous_chapters import GPTModel


BASE_CONFIG = {
    "vocab_size": 50257,     # Vocabulary size
    "context_length": 1024,  # Context length
    "drop_rate": 0.0,        # Dropout rate
    "qkv_bias": True         # Query-key-value bias
}

model_configs = {
    "gpt-small (124M)": {"emb_dim": 768, "n_layers": 12, "n_heads": 12},
    "gpt-medium (355M)": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
    "gpt-large (774M)": {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
    "gpt-xl (1558M)": {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
}

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Device: {device}")
input_tensor = torch.randint(0, 50257, (2, 1024)).to(device)

for size in model_configs:
    BASE_CONFIG.update(model_configs[size])
    
    model = GPTModel(BASE_CONFIG).bfloat16()
    model.to(device)

    # MACS = multiply-accumulate operations
    # MACS are typically counted as two FLOPS (one multiply and one accumulate)
    macs, params = profile(model, inputs=(input_tensor,), verbose=False)
    flops = 2*macs
    print(f"{size:18}: {flops:.1e} FLOPS")
    
    del model
    torch.cuda.empty_cache()

Device: cuda
gpt-small (124M)  : 5.1e+11 FLOPS
gpt-medium (355M) : 1.4e+12 FLOPS
gpt-large (774M)  : 3.2e+12 FLOPS
gpt-xl (1558M)    : 6.4e+12 FLOPS
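
The reported values can be sanity-checked by hand. The sketch below is not part of the book's code: the function `linear_layer_flops` is a hypothetical helper, and it assumes that the count is dominated by linear layers, with each transformer block containing query, key, and value projections, an attention output projection, and a feed-forward network with a 4x hidden expansion, followed by a single output head. Attention score computations, layer normalization, biases, and embedding lookups are ignored.

```python
def linear_layer_flops(emb_dim, n_layers, vocab_size, batch_size, context_length):
    """Estimate forward-pass FLOPs from the linear layers of a GPT-style model."""
    # MACs per transformer block and per token:
    #   query, key, value projections:       3 * emb_dim^2
    #   attention output projection:         1 * emb_dim^2
    #   feed-forward (assumed 4x expansion): 2 * 4 * emb_dim^2
    macs_per_block = 12 * emb_dim**2
    # Output head projecting emb_dim -> vocab_size
    macs_out_head = emb_dim * vocab_size
    macs_per_token = n_layers * macs_per_block + macs_out_head
    n_tokens = batch_size * context_length
    # One MAC is counted as two FLOPs
    return 2 * macs_per_token * n_tokens


# (emb_dim, n_layers) taken from model_configs above
configs = {
    "gpt-small (124M)": (768, 12),
    "gpt-medium (355M)": (1024, 24),
    "gpt-large (774M)": (1280, 36),
    "gpt-xl (1558M)": (1600, 48),
}

estimates = {}
for name, (emb_dim, n_layers) in configs.items():
    estimates[name] = linear_layer_flops(
        emb_dim, n_layers, vocab_size=50257, batch_size=2, context_length=1024
    )
    print(f"{name:18}: {estimates[name]:.1e} FLOPS")
```

Under these assumptions, the estimates round to the same values as the profiler output above, which suggests that the linear layers account for nearly all of the counted operations.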


In [5]:
import torch

# Check if CUDA is available
if torch.cuda.is_available():
    # Print the number of GPUs available
    print(f'Number of GPUs available: {torch.cuda.device_count()}')
    # Loop through and print each GPU's name
    for i in range(torch.cuda.device_count()):
        print(f'GPU {i}: {torch.cuda.get_device_name(i)}')
else:
    print('CUDA is not available. PyTorch is using CPU.')

CUDA is not available. PyTorch is using CPU.


In [1]:
import torch
print(torch.__version__)
print(torch.cuda.is_available())  # Whether CUDA is available
torch.cuda.device_count()  # Returns the number of GPUs
torch.cuda.get_device_name(0)  # Returns the GPU name; device indices start at 0 by default


2.0.1+cu118
True


'NVIDIA GeForce RTX 4060 Laptop GPU'