In [None]:
# For tips on running notebooks in Google Colab, see
# https://docs.pytorch.org/tutorials/beginner/colab
%matplotlib inline


Build the Neural Network
========================

Neural networks are composed of layers/modules that perform operations on
data. The [torch.nn](https://pytorch.org/docs/stable/nn.html) namespace
provides all the building blocks you need to build your own neural
network. Every module in PyTorch subclasses
[nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html).
A neural network is itself a module that consists of other modules
(layers). This nested structure makes it easy to build and manage complex
architectures.
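To make the nesting concrete, here is a minimal sketch with two hypothetical modules, `TinyBlock` and `TinyNet` (names invented for illustration). Assigning a module as an attribute registers it as a child, and `named_modules()` walks the resulting tree:

```python
import torch
from torch import nn

# Hypothetical inner module: a linear layer followed by ReLU.
class TinyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 3)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.fc(x))

# Hypothetical outer module that contains TinyBlock as a submodule.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.block = TinyBlock()    # a module nested inside another module
        self.head = nn.Linear(3, 2)

    def forward(self, x):
        return self.head(self.block(x))

net = TinyNet()
# named_modules() yields the root (empty name) first, then every submodule recursively
names = [name for name, _ in net.named_modules()]
print(names)  # ['', 'block', 'block.fc', 'block.act', 'head']
```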

In the following sections, we'll build a neural network to classify
images in the FashionMNIST dataset.


In [1]:
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

Get Device for Training
=======================

We want to be able to train our model on an
[accelerator](https://pytorch.org/docs/stable/torch.html#accelerators)
such as CUDA, MPS, MTIA, or XPU. If the current accelerator is
available, we will use it. Otherwise, we use the CPU.

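# **How the following code works:**
This code uses the **hardware abstraction layer** that newer PyTorch releases provide through the `torch.accelerator` module. Its purpose is to make the code **device-agnostic**: the same code runs unchanged on an NVIDIA GPU, an Apple-silicon GPU, or the CPU.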

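Its execution breaks down into three steps:

### 1. Check availability (`is_available`)

The program first evaluates the condition after the `if`: `torch.accelerator.is_available()`.

* This function probes the current system environment.
* It checks not only for **CUDA** (NVIDIA) but also for other hardware accelerators such as **MPS** (Apple silicon) and **XPU** (Intel GPU).
* **Logic:** if any supported accelerator is found, it returns `True`; otherwise it returns `False`.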

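### 2. Get the accelerator type (`current_accelerator().type`)

If the previous step returns `True`, the program evaluates the expression before the `if`:

* `torch.accelerator.current_accelerator()`: returns the current accelerator as a `torch.device`.
* `.type`: reads that device's type as a string.
* **Logic:**
  * On a machine with an NVIDIA GPU, it returns the string **`"cuda"`**.
  * On a MacBook with Apple silicon (M1/M2/M3), it returns **`"mps"`**.
  * On an Intel discrete GPU, it may return **`"xpu"`**.

### 3. Fallback (`else "cpu"`)

If the first step returns `False` (no GPU/NPU accelerator was detected):

* **Logic:** `device` is simply assigned the string **`"cpu"`**.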

---

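### Why not just use `torch.cuda.is_available()`?

The older idiom was `device = "cuda" if torch.cuda.is_available() else "cpu"`, but it has two drawbacks:

1. **Limited to one platform:** it only recognizes NVIDIA GPUs. On a Mac it skips the capable GPU and falls back to the CPU.
2. **Hard to maintain:** as hardware diversifies (for example Huawei Ascend or Intel GPUs), the code accumulates long `if-elif-else` chains.

**Benefits of `torch.accelerator`:**
PyTorch introduced it to unify the interface by abstracting over all accelerators. Whatever the underlying hardware, your code gets the correct backend name from `.type`, which greatly improves **portability**.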

---

### How to verify

You can print the variable directly in your environment:

```python
import torch

# Run the same device-selection logic
device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"

print(f"Running on: {device}")

```

> **Note:** If your PyTorch version predates the `torch.accelerator` module, this code fails with an `AttributeError`. In that case, fall back to `torch.cuda.is_available()` or `torch.backends.mps.is_available()`.
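A version-tolerant variant can be sketched as follows, assuming you want one code path for releases with and without `torch.accelerator`. The helper name `pick_device` is hypothetical; its fallbacks are exactly the `torch.cuda.is_available()` and `torch.backends.mps.is_available()` checks mentioned in the note.

```python
import torch

def pick_device() -> str:
    """Hypothetical helper: prefer torch.accelerator when this build provides it."""
    accel = getattr(torch, "accelerator", None)
    # Newer releases: use the unified accelerator API, same logic as above.
    if accel is not None and hasattr(accel, "is_available"):
        if accel.is_available():
            return accel.current_accelerator().type
        return "cpu"
    # Older releases: probe the individual backends explicitly.
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

device = pick_device()
print(f"Using {device} device")
```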


In [2]:
device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"
print(f"Using {device} device")

Using cpu device


Define the Class
================

We define our neural network by subclassing `nn.Module`, and initialize
the neural network layers in `__init__`. Every `nn.Module` subclass
implements the operations on input data in the `forward` method.


In [3]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

We create an instance of `NeuralNetwork`, move it to the `device`, and
print its structure.


In [4]:
model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
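One detail worth knowing about `.to(device)`: on a module it moves the parameters in place and returns the module itself, while on a tensor it returns a tensor that you need to keep. A small sketch, assuming `device` holds one of the strings chosen above (it is hard-coded to `"cpu"` here so the snippet runs anywhere):

```python
import torch
from torch import nn

device = "cpu"  # assumption: replace with the device string selected earlier

model = nn.Linear(4, 2)
moved = model.to(device)   # Module.to modifies the module in place and returns self
print(moved is model)      # True

x = torch.rand(1, 4)
x = x.to(device)           # Tensor.to returns a tensor; reassign to keep the result
out = model(x)             # parameters and input now live on the same device
print(next(model.parameters()).device.type, out.device.type)
```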


To use the model, we pass it the input data. This executes the model's
`forward`, along with some [background
operations](https://github.com/pytorch/pytorch/blob/270111b7b611d174967ed204776985cefca9c144/torch/nn/modules/module.py#L866).
Do not call `model.forward()` directly!
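Forward hooks are one example of the background work that calling the model performs and `model.forward()` skips. In this sketch (the `calls` list exists only for illustration), a hook registered with `register_forward_hook` fires on the regular call but not on the direct `forward` call:

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)
calls = []

# The hook receives (module, inputs, output); returning None leaves the output unchanged.
handle = layer.register_forward_hook(lambda mod, inp, out: calls.append("hook"))

x = torch.rand(1, 4)
layer(x)          # goes through nn.Module.__call__, so the hook runs
layer.forward(x)  # bypasses __call__, so the hook is skipped
handle.remove()

print(calls)  # ['hook']
```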

Calling the model on the input returns a 2-dimensional tensor: dim=0
indexes the samples in the batch (one output per input), and dim=1 holds
the 10 raw predicted values (logits) for each sample, one per class. We
get the prediction probabilities by passing it through an instance of
the `nn.Softmax` module.
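With a batch of several images, the two dimensions are easier to tell apart. This sketch uses a hypothetical stand-in model (a single `nn.Linear` after `nn.Flatten`) instead of the `NeuralNetwork` class, so it runs on its own:

```python
import torch
from torch import nn

torch.manual_seed(0)
# Stand-in for NeuralNetwork: any model mapping 28x28 images to 10 logits
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

X = torch.rand(4, 28, 28)   # a batch of 4 images
logits = model(X)
print(logits.shape)         # torch.Size([4, 10]): dim=0 -> sample, dim=1 -> class

probs = nn.Softmax(dim=1)(logits)
print(probs.sum(dim=1))     # each row sums to 1 (up to float rounding)
```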

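# **Explaining argmax():**
`argmax` stands for **arguments of the maxima**. In short, it **ignores what the largest value is and only reports where it is (its index)**.

In deep-learning classification tasks, the model typically outputs a tensor of probabilities (such as `pred_probab`), and we use `argmax` to find the index of the class with the highest probability.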

---

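### 1. The argument: what does the `1` mean?

In `pred_probab.argmax(1)`, the argument `1` is the **dimension (axis)**.

Suppose `pred_probab` is a 2-D tensor of shape `[batch_size, num_classes]` (for example `[64, 10]`: 64 images, each with probabilities for 10 classes).

* **`dim=1` (search across each row):** look for the maximum along the class dimension.
  * **Result:** for each row (each sample), the index of the highest probability.
  * **Output shape:** `[batch_size]` (for example, 64 predicted class indices).
  * **This is the most common usage**, because we need to know which class each sample is assigned to.
* **`dim=0` (search down each column):** look for the maximum along the batch dimension.
  * **Result:** for each class, which of the 64 samples has the highest probability for it.
  * **Output shape:** `[num_classes]`.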


---

### 2. A concrete example

Suppose we have 3 samples and 3 classes (cat, dog, pig):

```python
import torch

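# Each row holds one sample's predicted probabilities for [cat, dog, pig]
pred_probab = torch.tensor([
    [0.1, 0.8, 0.1],  # sample 0: highest probability 0.8 at index 1 (dog)
    [0.7, 0.2, 0.1],  # sample 1: highest probability 0.7 at index 0 (cat)
    [0.3, 0.3, 0.4]   # sample 2: highest probability 0.4 at index 2 (pig)
])

# Apply argmax along dim=1
y_pred = pred_probab.argmax(1)

print(y_pred)
# Output: tensor([1, 0, 2])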

```

---

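### 3. How it differs from `max()`

This is an easy point to confuse:

* **`torch.max(x, dim)`**: returns both the maximum values and their indices, as a `(values, indices)` pair. Called without `dim`, `torch.max(x)` returns only the overall maximum value.
* **`torch.argmax()`**: returns only the indices, which keeps the code more concise when the indices are all you need.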
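Using the same three-sample probabilities as the example above, the difference looks like this:

```python
import torch

pred_probab = torch.tensor([
    [0.1, 0.8, 0.1],
    [0.7, 0.2, 0.1],
    [0.3, 0.3, 0.4],
])

values, indices = torch.max(pred_probab, dim=1)  # named tuple (values, indices)
print(indices)                    # tensor([1, 0, 2])
print(pred_probab.argmax(dim=1))  # tensor([1, 0, 2]), the same indices only
print(torch.max(pred_probab))     # without dim: only the overall maximum value
```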

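### 4. Mathematical definition

For a vector $\mathbf{x} = (x_0, x_1, \dots, x_{n-1})$, `argmax` is defined as:

$$
\operatorname{argmax}(\mathbf{x}) = \arg\max_{i \in \{0, \dots, n-1\}} x_i
$$

that is, among all indices $i$, the one at which $x_i$ is largest.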
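The definition translates almost word for word into plain Python. This is an illustrative helper, not PyTorch's implementation; it resolves ties by returning the first index that reaches the maximum:

```python
def argmax(xs):
    """Return the index i that maximizes xs[i]; ties go to the first such index."""
    if not xs:
        raise ValueError("argmax of an empty sequence")
    best = 0
    for i in range(1, len(xs)):
        if xs[i] > xs[best]:  # strict '>' keeps the earliest index on ties
            best = i
    return best

print(argmax([0.1, 0.8, 0.1]))  # 1
print(argmax([0.3, 0.3, 0.4]))  # 2
```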

---

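### Summary

In your code, `y_pred = pred_probab.argmax(1)`:

1. **Input:** the model's predicted probability distribution for this batch.
2. **Operation:** find the position of the highest probability in each row.
3. **Output:** the predicted class label for each sample in the batch.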



In [12]:
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
print(logits)
print(logits.shape)
print(logits.shape[1])
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")
print(f"Predicted class: {y_pred.item()}")

tensor([[-0.0192,  0.0665,  0.0887,  0.0481,  0.0901,  0.1000,  0.0050,  0.0464,
         -0.0874,  0.0204]], grad_fn=<AddmmBackward0>)
torch.Size([1, 10])
10
Predicted class: tensor([5])
Predicted class: 5


------------------------------------------------------------------------


Model Layers
============

Let's break down the layers in the FashionMNIST model. To illustrate
them, we will take a sample minibatch of 3 images of size 28x28 and see
what happens to it as we pass it through the network.


In [13]:
input_image = torch.rand(3,28,28)
print(input_image.size())

torch.Size([3, 28, 28])


nn.Flatten
==========

We initialize the
[nn.Flatten](https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html)
layer to convert each 2D 28x28 image into a contiguous array of 784
pixel values (**the minibatch dimension (at dim=0) is maintained**).
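`nn.Flatten` keeps the batch dimension because its default `start_dim` is 1. A short sketch comparing it with a manual `reshape`, and with flattening from dim 0 instead:

```python
import torch
from torch import nn

x = torch.rand(3, 28, 28)
flat = nn.Flatten()(x)                 # default start_dim=1 keeps dim 0 (the batch)
same = x.reshape(x.shape[0], -1)       # equivalent manual reshape
print(flat.shape)                      # torch.Size([3, 784])
print(torch.equal(flat, same))         # True

# Flattening from dim 0 merges the batch into one long vector of 3 * 784 values.
all_flat = nn.Flatten(start_dim=0)(x)
print(all_flat.shape)                  # torch.Size([2352])
```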


In [14]:
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

torch.Size([3, 784])


nn.Linear
=========

The [linear
layer](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html)
is a module that applies a linear transformation on the input using its
stored weights and biases.
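Concretely, `nn.Linear` computes `x @ W.T + b`, where the stored weight `W` has shape `(out_features, in_features)`. The sketch below recomputes the layer's output by hand and compares the two (the tolerance absorbs float rounding):

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(in_features=784, out_features=20)
x = torch.rand(3, 784)

# Weight is stored as (out_features, in_features), so it is transposed in the product
manual = x @ layer.weight.T + layer.bias
print(layer.weight.shape, layer.bias.shape)          # torch.Size([20, 784]) torch.Size([20])
print(torch.allclose(layer(x), manual, atol=1e-6))   # True
```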


In [15]:
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

torch.Size([3, 20])


nn.ReLU
=======

Non-linear activations are what create the complex mappings between the
model's inputs and outputs. They are applied after linear
transformations to introduce *nonlinearity*, helping neural networks
learn a wide variety of phenomena.

In this model, we use
[nn.ReLU](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html)
between our linear layers, but there are other activations that
introduce non-linearity in your model.
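ReLU keeps positive values and replaces everything else with zero, i.e. ReLU(x) = max(0, x). A small check against `clamp(min=0)`, which computes the same element-wise maximum:

```python
import torch
from torch import nn

x = torch.tensor([-1.5, -0.2, 0.0, 0.3, 2.0])
relu_out = nn.ReLU()(x)

# Negative entries become 0; non-negative entries pass through unchanged
print(relu_out)
print(torch.equal(relu_out, x.clamp(min=0)))  # True
```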


In [16]:
print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")

Before ReLU: tensor([[-0.0371, -0.3446, -0.0345, -0.2609, -0.0826, -0.4072, -0.0081,  0.4677,
          0.0743, -0.8458, -0.2360,  0.0428,  0.2078, -0.5927, -0.1950,  0.3243,
          0.0107, -0.0573, -0.1963,  0.0134],
        [ 0.1939, -0.1306,  0.0989, -0.4317, -0.0335, -0.4360, -0.0767,  0.2482,
          0.6045, -0.0702, -0.2597, -0.2193,  0.4714, -0.8357, -0.2376,  0.1359,
          0.1311, -0.2352,  0.0890,  0.2184],
        [ 0.1240, -0.4474, -0.0571, -0.3469,  0.2320, -0.6487,  0.4066,  0.4426,
          0.3283, -0.4445, -0.3286, -0.2075,  0.4123, -0.8525, -0.1893, -0.0849,
          0.1082, -0.2444,  0.4329, -0.3370]], grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.4677, 0.0743,
         0.0000, 0.0000, 0.0428, 0.2078, 0.0000, 0.0000, 0.3243, 0.0107, 0.0000,
         0.0000, 0.0134],
        [0.1939, 0.0000, 0.0989, 0.0000, 0.0000, 0.0000, 0.0000, 0.2482, 0.6045,
         0.0000, 0.0000, 0.0000, 0.4714, 0.0000, 0.00

nn.Sequential
=============

[nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html)
is an ordered container of modules. The data is passed through all the
modules in the same order as defined. You can use sequential containers
to put together a quick network like `seq_modules`.
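Because `nn.Sequential` simply applies its children in order, calling it gives the same result as looping over the modules yourself. The container also supports indexing. This sketch rebuilds a network shaped like `seq_modules` so it runs on its own:

```python
import torch
from torch import nn

torch.manual_seed(0)
seq_modules = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 20),
    nn.ReLU(),
    nn.Linear(20, 10),
)
x = torch.rand(3, 28, 28)

# Applying each child in order reproduces what the container does
out = x
for layer in seq_modules:
    out = layer(out)

print(torch.equal(out, seq_modules(x)))  # True
print(seq_modules[1])                    # the first nn.Linear, accessed by position
```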


In [18]:
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)

nn.Softmax
==========

The last linear layer of the neural network returns `logits`, raw
values in (-∞, ∞), which are passed to the
[nn.Softmax](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html)
module. The logits are scaled to values in [0, 1] representing the
model's predicted probabilities for each class. The `dim` parameter
indicates the dimension along which the values must sum to 1.
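Softmax computes softmax(z)_i = exp(z_i) / sum_j exp(z_j) over the chosen `dim`. The sketch below compares `nn.Softmax(dim=1)` with that formula written out by hand; note that a row of equal logits yields a uniform distribution:

```python
import torch
from torch import nn

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 0.5, 0.5]])

# Exponentiate, then normalize each row (dim=1) so it sums to 1
manual = logits.exp() / logits.exp().sum(dim=1, keepdim=True)
probs = nn.Softmax(dim=1)(logits)

print(torch.allclose(probs, manual))  # True
print(probs.sum(dim=1))               # each row sums to 1
```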


In [19]:
softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)

Model Parameters
================

Many layers inside a neural network are *parameterized*, i.e. have
associated weights and biases that are optimized during training.
Subclassing `nn.Module` automatically tracks all fields defined inside
your model object, and makes all parameters accessible using your
model's `parameters()` or `named_parameters()` methods.

In this example, we iterate over each parameter, and print its size and
a preview of its values.
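The same methods make it easy to count parameters. This sketch rebuilds the layer stack of `NeuralNetwork` on its own (so its parameter names start with `0.` rather than `linear_relu_stack.0.`); the ReLU layers contribute no parameters:

```python
import torch
from torch import nn

# Same layer sizes as NeuralNetwork's linear_relu_stack, rebuilt so the snippet runs alone
stack = nn.Sequential(
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

names = [name for name, _ in stack.named_parameters()]
total = sum(p.numel() for p in stack.parameters())
trainable = sum(p.numel() for p in stack.parameters() if p.requires_grad)

print(names)             # weights and biases of layers 0, 2 and 4 only
print(total, trainable)  # 669706 669706
```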


In [20]:
print(f"Model structure: {model}\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

Model structure: NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[-0.0026,  0.0315, -0.0333,  ...,  0.0063,  0.0201, -0.0243],
        [-0.0347, -0.0070,  0.0072,  ..., -0.0080, -0.0139, -0.0081]],
       grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([-0.0006,  0.0286], grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[-0.0352,  0.0243,  0.0417,  ..., -0.0022, -0.0087, -0.0109],
        [-0.0344,  0.0099, -0.0364,  ..., -0.0310,  0.0173, -0.0073]],
       grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.bias | 

------------------------------------------------------------------------


Further Reading
===============

-   [torch.nn API](https://pytorch.org/docs/stable/nn.html)
