In [2]:
# For tips on running notebooks in Google Colab, see
# https://docs.pytorch.org/tutorials/beginner/colab
%matplotlib inline


Quickstart
==========

This section runs through the API for common tasks in machine learning.
Refer to the links in each section to dive deeper.

Working with data
-----------------

PyTorch has two [primitives to work with
data](https://pytorch.org/docs/stable/data.html):
`torch.utils.data.DataLoader` and `torch.utils.data.Dataset`. `Dataset`
stores the samples and their corresponding labels, and `DataLoader`
wraps an iterable around the `Dataset`.


In [3]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

PyTorch offers domain-specific libraries such as
[TorchText](https://pytorch.org/text/stable/index.html),
[TorchVision](https://pytorch.org/vision/stable/index.html), and
[TorchAudio](https://pytorch.org/audio/stable/index.html), all of which
include datasets. For this tutorial, we will be using a TorchVision
dataset.

The `torchvision.datasets` module contains `Dataset` objects for many
real-world vision datasets such as CIFAR and COCO ([full list
here](https://pytorch.org/vision/stable/datasets.html)). In this
tutorial, we use the FashionMNIST dataset. Every TorchVision `Dataset`
accepts two arguments, `transform` and `target_transform`, to modify the
samples and labels respectively.


In [4]:
# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)
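
To make the role of `transform` and `target_transform` concrete, here is a plain-Python sketch of a map-style dataset. `TinyDataset`, `scale_pixels`, and `one_hot` are hypothetical names invented for this illustration; this is not how torchvision implements FashionMNIST, only the general shape of the `__len__` / `__getitem__` protocol with the two hooks applied on access.

```python
class TinyDataset:
    """Hypothetical stand-in for a map-style dataset (not torchvision code)."""

    def __init__(self, samples, labels, transform=None, target_transform=None):
        self.samples = samples
        self.labels = labels
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        sample, label = self.samples[idx], self.labels[idx]
        if self.transform is not None:
            sample = self.transform(sample)        # touches the sample only
        if self.target_transform is not None:
            label = self.target_transform(label)   # touches the label only
        return sample, label


def scale_pixels(pixels):
    """Scale 0-255 integer pixels to floats in [0, 1], one of the things ToTensor does."""
    return [p / 255 for p in pixels]


def one_hot(label, num_classes=10):
    """Turn an integer label into a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


ds = TinyDataset(
    samples=[[0, 128, 255], [255, 0, 0]],
    labels=[9, 0],
    transform=scale_pixels,
    target_transform=one_hot,
)
sample, label = ds[0]
print(len(ds), sample, label)
```

Because the hooks run inside `__getitem__`, the stored data stays untouched and the conversion happens each time a sample is read.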

In [5]:
# Map the numeric labels (0-9) to the corresponding clothing category names.
label_map = {
    0: 'T-shirt/top',
    1: 'Trouser',
    2: 'Pullover',
    3: 'Dress',
    4: 'Coat',
    5: 'Sandal',
    6: 'Shirt',
    7: 'Sneaker',
    8: 'Bag',
    9: 'Ankle boot'
}

# Take a sample, inspect its label, and convert it to a category name.
image, label = training_data[0]  # first sample
print(f"Numeric label: {label}")
print(f"Clothing category: {label_map[label]}")
image, label = training_data[1]  # second sample
print(f"Numeric label: {label}")
print(f"Clothing category: {label_map[label]}")

Numeric label: 9
Clothing category: Ankle boot
Numeric label: 0
Clothing category: T-shirt/top


We pass the `Dataset` as an argument to `DataLoader`. This wraps an
iterable over our dataset, and supports automatic batching, sampling,
shuffling and multiprocess data loading. Here we define a batch size of
64, i.e. each element in the dataloader iterable will return a batch of
64 features and labels.


In [6]:
batch_size = 64

# Create data loaders. A DataLoader splits the Dataset into mini-batches.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in train_dataloader:
    # N: number of samples in the batch (the batch size)
    # C: number of channels (1 for grayscale, 3 for color)
    # H: image height (28 pixels per image)
    # W: image width (28 pixels per image)
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
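
The batch shape above raises a practical question: how many batches does one pass over the data produce, and how large is the last one? The sketch below works this out in plain Python for the FashionMNIST split sizes (60,000 training and 10,000 test samples). `batch_sizes` is a hypothetical helper, not a PyTorch function; its `drop_last` argument mirrors the `DataLoader` option of the same name, which defaults to `False`.

```python
import math


def batch_sizes(num_samples, batch_size, drop_last=False):
    """Sizes of the batches a sequential loader would yield in one pass."""
    full, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    if remainder and not drop_last:
        sizes.append(remainder)  # final, smaller batch
    return sizes


train_sizes = batch_sizes(60_000, 64)
test_sizes = batch_sizes(10_000, 64)
print(len(train_sizes), train_sizes[-1])  # 938 32
print(len(test_sizes), test_sizes[-1])    # 157 16
print(len(train_sizes) == math.ceil(60_000 / 64))  # True
```

So `len(train_dataloader)` should be 938 here, and code that assumes every batch holds exactly 64 samples would be wrong for the last one.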


Read more about [loading data in PyTorch](data_tutorial.html).


------------------------------------------------------------------------


Creating Models
===============

To define a neural network in PyTorch, we create a class that inherits
from
[nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html).
We define the layers of the network in the `__init__` function and
specify how data will pass through the network in the `forward`
function. To accelerate operations in the neural network, we move it to
an
[accelerator](https://pytorch.org/docs/stable/torch.html#accelerators)
such as CUDA, MPS, MTIA, or XPU. If an accelerator is available, we use
it; otherwise, we fall back to the CPU.


In [7]:
device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"
print(f"Using {device} device")

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

Using cuda device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
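
As a quick sanity check on the printed architecture, the number of trainable parameters can be computed by hand. The sketch below assumes only what the printout shows: three `Linear` layers with bias, each holding `in_features * out_features` weights plus `out_features` biases. `linear_params` is a hypothetical helper name.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


# (in_features, out_features) for each Linear layer in the printout
layer_sizes = [(28 * 28, 512), (512, 512), (512, 10)]
counts = [linear_params(i, o) for i, o in layer_sizes]
total = sum(counts)
print(counts, total)  # [401920, 262656, 5130] 669706
```

With the model in memory, `sum(p.numel() for p in model.parameters())` should report the same total.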


Read more about [building neural networks in
PyTorch](buildmodel_tutorial.html).


------------------------------------------------------------------------


In [8]:
# Create a random tensor with shape (1, 28, 28) to represent a single grayscale image
X = torch.rand(1, 28, 28, device=device)
print(X.size())
# print(X) would show a 1 x 28 x 28 random tensor, for example:
# tensor([[[0.6170, 0.7235, 0.7325, 0.5641, 0.6598, 0.0372, 0.2379, 0.1680,
        #   0.0598, 0.6594, 0.6945, 0.2031, 0.8482, 0.1765, 0.0518, 0.5201,
        #   0.0914, 0.0186, 0.0645, 0.3898, 0.5518, 0.9396, 0.9768, 0.8926,
        #   0.3076, 0.5498, 0.5773, 0.6735], ...
logits = model(X)
print(logits)

# logits: a 1 x 10 tensor of raw scores, one per class
pred_probab = nn.Softmax(dim=1)(logits)

print(pred_probab)
# pred_probab: a 1 x 10 tensor of probabilities, one per class

y_pred = pred_probab.argmax(1)  # index of the largest value
print(f"Predicted class: {y_pred}")

torch.Size([1, 28, 28])
tensor([[ 0.0767, -0.0650,  0.0573, -0.0169,  0.0626, -0.0712, -0.1471,  0.0768,
         -0.0158,  0.1026]], device='cuda:0', grad_fn=<AddmmBackward0>)
tensor([[0.1070, 0.0929, 0.1049, 0.0974, 0.1055, 0.0923, 0.0855, 0.1070, 0.0976,
         0.1098]], device='cuda:0', grad_fn=<SoftmaxBackward0>)
Predicted class: tensor([9], device='cuda:0')
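
To see what `nn.Softmax` computed here, the sketch below reapplies the softmax formula in plain Python to the ten logits printed above (copied at their displayed four-decimal precision, so the results can differ from the printed probabilities in the last digit). The max-subtraction step is a common numerical-stability trick and an assumption of this sketch, not a claim about PyTorch internals.

```python
import math


def softmax(logits):
    """exp(x_i) / sum_j exp(x_j), shifted by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Logits copied from the printed output above (rounded to 4 decimals).
logits = [0.0767, -0.0650, 0.0573, -0.0169, 0.0626,
          -0.0712, -0.1471, 0.0768, -0.0158, 0.1026]
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)

print([round(p, 4) for p in probs])
print(sum(probs))
print(predicted)  # 9
```

The probabilities are all close to 0.1 because the model is untrained: its logits are near zero, so no class stands out yet.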


In [9]:
# a sample minibatch of 3 images of size 28x28
input_image = torch.rand(3,28,28)
print(input_image.size())

torch.Size([3, 28, 28])


In [10]:
# convert each 2D 28x28 image into a contiguous array of 784 pixel values
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

torch.Size([3, 784])


In [11]:
# The linear layer is a module that applies a linear transformation on the
# input using its stored weights and biases.
# Create a linear layer that maps 784 input features to 20 output features.
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

torch.Size([3, 20])
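
The transformation a linear layer applies is `y = x @ W.T + b`, where `nn.Linear` stores `W` with shape `(out_features, in_features)`. The plain-Python sketch below spells this out on a tiny example with 3 input features and 2 output features; the `linear` function and all the numbers are made up for illustration.

```python
def linear(x, weight, bias):
    """y[j] = sum_i x[i] * weight[j][i] + bias[j]."""
    return [sum(xi * wji for xi, wji in zip(x, row)) + b
            for row, b in zip(weight, bias)]


x = [1.0, 2.0, 3.0]                 # one sample, in_features = 3
weight = [[0.5, 0.0, -1.0],         # shape (out_features, in_features) = (2, 3)
          [1.0, 1.0, 1.0]]
bias = [0.5, -1.0]

y = linear(x, weight, bias)
print(y)  # [-2.0, 5.0]

# A batch is handled row by row, which is why [3, 784] becomes [3, 20] above.
batch = [x, [0.0, 0.0, 0.0]]
print([linear(row, weight, bias) for row in batch])  # [[-2.0, 5.0], [0.5, -1.0]]
```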


In [12]:
# Non-linear activations are what create the complex mappings between the model’s
# inputs and outputs. They are applied after linear transformations to introduce
# nonlinearity, helping neural networks learn a wide variety of phenomena.
# nn.ReLU activation function
print(f"Before ReLU: {hidden1[0][0:5]}\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1[0][0:5]}")

Before ReLU: tensor([-0.4391, -0.0891,  0.3867,  0.4496,  0.5437], grad_fn=<SliceBackward0>)

After ReLU: tensor([0.0000, 0.0000, 0.3867, 0.4496, 0.5437], grad_fn=<SliceBackward0>)
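
ReLU is simply `max(0, x)` applied element by element. As a small check, the sketch below applies that rule in plain Python to the five "Before ReLU" values printed above; `relu` is a hypothetical helper.

```python
def relu(values):
    """Replace negative entries with zero and keep the rest unchanged."""
    return [max(0.0, v) for v in values]


before = [-0.4391, -0.0891, 0.3867, 0.4496, 0.5437]  # copied from the output above
after = relu(before)
print(after)  # [0.0, 0.0, 0.3867, 0.4496, 0.5437]
```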


In [13]:
# nn.Sequential is an ordered container of modules. The data is passed through 
# all the modules in the same order as defined.
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)
print(logits.size())

torch.Size([3, 10])


In [14]:
# Softmax is typically used as the final layer of a classification model
# to convert the non-normalized output logits into probabilities that sum to one.
# The last linear layer of the neural network returns logits: raw values in (-inf, inf).
softmax = nn.Softmax(dim=1)
print(logits[0][0:5])
pred_probab = softmax(logits)  # returns a new tensor of probabilities
# softmax does not modify its input, so logits keeps its shape and values
print(logits.size())
print(logits[0][0:5])

tensor([-0.0804, -0.2295, -0.2537, -0.0164,  0.1159], grad_fn=<SliceBackward0>)
torch.Size([3, 10])
tensor([-0.0804, -0.2295, -0.2537, -0.0164,  0.1159], grad_fn=<SliceBackward0>)


Optimizing the Model Parameters
===============================

To train a model, we need a [loss
function](https://pytorch.org/docs/stable/nn.html#loss-functions) and an
[optimizer](https://pytorch.org/docs/stable/optim.html).


In [15]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
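
For a single sample, cross-entropy on raw logits is `-log(softmax(logits)[target])`. The plain-Python sketch below evaluates that formula with a log-sum-exp; `cross_entropy` here is a hypothetical helper for one sample, whereas `nn.CrossEntropyLoss` also averages over the batch by default. All logits are made up.

```python
import math


def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]


uniform = cross_entropy([0.0] * 10, target=3)            # no class preferred
confident = cross_entropy([0.0] * 9 + [5.0], target=9)   # correct and confident
wrong = cross_entropy([0.0] * 9 + [5.0], target=0)       # confident but wrong

print(uniform, math.log(10))
print(confident, wrong)
```

An untrained 10-class model that spreads probability almost evenly should therefore start near `ln(10) ≈ 2.30`, which is consistent with the first loss value logged during training further down.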

In a single training loop, the model makes predictions on the training
dataset (fed to it in batches), and backpropagates the prediction error
to adjust the model's parameters.


In [16]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
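
The `optimizer.step()` call above applies the SGD update rule. With the default settings of `torch.optim.SGD` (no momentum, no weight decay) that rule is `p <- p - lr * grad`. The sketch below runs it in plain Python on a toy one-parameter loss, `(w - 3)^2`, chosen purely for illustration; `sgd_step`, `toy_loss`, and `toy_grad` are hypothetical names.

```python
def sgd_step(params, grads, lr):
    """One plain SGD update (no momentum): p <- p - lr * g."""
    return [p - lr * g for p, g in zip(params, grads)]


def toy_loss(w):
    return (w - 3.0) ** 2


def toy_grad(w):
    return 2.0 * (w - 3.0)  # derivative of toy_loss


lr = 1e-3  # same learning rate as the optimizer above
params = [0.0]
losses = [toy_loss(params[0])]
for _ in range(1000):
    params = sgd_step(params, [toy_grad(params[0])], lr)
    losses.append(toy_loss(params[0]))

print(losses[0], losses[1], losses[-1])
print(params[0])
```

With such a small learning rate each step moves the parameter only slightly, so the loss falls steadily but slowly. PyTorch also accumulates gradients across `backward()` calls by default, which is why the training loop calls `optimizer.zero_grad()` after each step.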

We also check the model's performance against the test dataset to
ensure it is learning.


In [17]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
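
The two numbers reported by `test` are aggregated differently: accuracy divides the correct predictions by the number of samples, while the loss is a mean of per-batch means. The plain-Python sketch below reproduces that bookkeeping on made-up batch statistics and compares it with a per-sample average; `summarize` and all the numbers are hypothetical.

```python
def summarize(batches):
    """batches: list of (mean_loss_of_batch, num_correct, batch_size)."""
    size = sum(n for _, _, n in batches)
    num_batches = len(batches)
    avg_loss = sum(loss for loss, _, _ in batches) / num_batches  # mean of batch means
    accuracy = sum(c for _, c, _ in batches) / size               # per-sample fraction
    return accuracy, avg_loss


# Hypothetical statistics: two full batches of 64 and a final batch of 16.
batches = [(0.5, 60, 64), (0.7, 50, 64), (1.5, 8, 16)]
accuracy, avg_loss = summarize(batches)

# Weighting each batch by its size gives the true per-sample mean loss.
per_sample_loss = sum(l * n for l, _, n in batches) / sum(n for _, _, n in batches)

print(f"accuracy={accuracy:.4f} avg_loss={avg_loss:.4f} per_sample={per_sample_loss:.4f}")
```

The two loss averages differ only when the last batch is smaller than the rest, and in practice the gap is usually small, but it explains why the reported value is not exactly the per-sample mean.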

The training process is conducted over several iterations (*epochs*).
During each epoch, the model learns parameters to make better
predictions. We print the model's accuracy and loss at each epoch;
we'd like to see the accuracy increase and the loss decrease with every
epoch.


In [18]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.316953  [   64/60000]
loss: 2.299897  [ 6464/60000]
loss: 2.281999  [12864/60000]
loss: 2.269393  [19264/60000]
loss: 2.249248  [25664/60000]
loss: 2.216794  [32064/60000]
loss: 2.230864  [38464/60000]
loss: 2.195065  [44864/60000]
loss: 2.201070  [51264/60000]
loss: 2.151703  [57664/60000]
Test Error: 
 Accuracy: 30.7%, Avg loss: 2.157954 

Epoch 2
-------------------------------
loss: 2.179088  [   64/60000]
loss: 2.163750  [ 6464/60000]
loss: 2.108271  [12864/60000]
loss: 2.114677  [19264/60000]
loss: 2.071590  [25664/60000]
loss: 2.010280  [32064/60000]
loss: 2.033781  [38464/60000]
loss: 1.960065  [44864/60000]
loss: 1.975223  [51264/60000]
loss: 1.878955  [57664/60000]
Test Error: 
 Accuracy: 56.2%, Avg loss: 1.891744 

Epoch 3
-------------------------------
loss: 1.936095  [   64/60000]
loss: 1.896116  [ 6464/60000]
loss: 1.785518  [12864/60000]
loss: 1.809897  [19264/60000]
loss: 1.714134  [25664/60000]
loss: 1.667183  [32064/60000]
loss: 1.666209  [38464/60000]
loss: 1.582073  [44864/60000]
loss: 

Read more about [Training your model](optimization_tutorial.html).


------------------------------------------------------------------------


Saving Models
=============

A common way to save a model is to serialize the internal state
dictionary (containing the model parameters).


In [19]:
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth
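
For orientation, the state dictionary of this model should hold six tensors, one weight and one bias per `Linear` layer. The plain-Python sketch below lists the key names and shapes one would expect from the printed architecture, and estimates the raw payload assuming the default 32-bit floats. The layout is an assumption derived from the printout, so compare it against `model.state_dict().keys()`. The byte count ignores serialization overhead.

```python
import math

# Expected layout (assumption based on the printed model):
# nn.Linear stores weight as (out_features, in_features).
expected_state = {
    "linear_relu_stack.0.weight": (512, 784),
    "linear_relu_stack.0.bias": (512,),
    "linear_relu_stack.2.weight": (512, 512),
    "linear_relu_stack.2.bias": (512,),
    "linear_relu_stack.4.weight": (10, 512),
    "linear_relu_stack.4.bias": (10,),
}

BYTES_PER_FLOAT32 = 4
num_values = sum(math.prod(shape) for shape in expected_state.values())
approx_bytes = num_values * BYTES_PER_FLOAT32
print(num_values, approx_bytes)  # 669706 2678824
```

The `Flatten` and `ReLU` modules do not appear because they have no parameters, so `model.pth` should be a little over 2.6 MB.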


Loading Models
==============

The process for loading a model includes re-creating the model structure
and loading the state dictionary into it.


In [20]:
model = NeuralNetwork().to(device)
model.load_state_dict(torch.load("model.pth", weights_only=True))

<All keys matched successfully>

This model can now be used to make predictions.


In [21]:
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    x = x.to(device)
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"
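
The prediction above takes `argmax` directly on the raw logits, without applying softmax first. That works because softmax preserves the ordering of its inputs, so the largest logit is also the largest probability. The plain-Python sketch below checks this on made-up logits; `argmax` and `softmax` are hypothetical helpers.

```python
import math

classes = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
           "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one image.
logits = [-2.1, -3.0, -1.4, -1.9, -1.6, 1.8, -1.0, 2.2, 0.4, 3.1]
same_index = argmax(logits) == argmax(softmax(logits))
print(same_index)               # True
print(classes[argmax(logits)])  # Ankle boot
```

Softmax is only needed when the probabilities themselves matter, for example to report a confidence score.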


Read more about [Saving & Loading your
model](saveloadrun_tutorial.html).
