# Optimizing the model parameters

Now that we have a model and data it's time to train, validate and test our model by optimizing its parameters on our data. Training a model is an iterative process; in each iteration (called an *epoch*). The model makes a guess about the output, calculates the error in its guess (*loss*), collects the derivatives of the error with respect to its parameters (as we saw in the previous module), and **optimizes** these parameters using gradient descent. 

现在我们有了模型和数据，是时候通过优化我们的数据参数来训练、验证和测试我们的模型了。 训练模型是一个迭代过程； 在每次迭代中（称为 *epoch*）。 该模型对输出进行猜测，计算其猜测中的误差 (*loss*)，收集关于其参数的误差的导数（正如我们在上一个模块中看到的），并**优化**这些参数 使用梯度下降。

## Prerequisite code 

We will load the code from the previous modules on **Datasets & DataLoaders** and **Build Model**

我们将在**数据集和数据加载器**和**建立模型**上加载前面模块的代码。

In [1]:
%matplotlib inline
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Using downloaded and verified file: data/FashionMNIST/raw/train-images-idx3-ubyte.gz
Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Using downloaded and verified file: data/FashionMNIST/raw/train-labels-idx1-ubyte.gz
Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Using downloaded and verified file: data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz
Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Using downloaded and verified file: data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting data

## Setting hyperparameters 

Hyperparameters are adjustable parameters that let you control the model optimization process. 
Different hyperparameter values can impact model training and the level of accuracy.

We define the following hyperparameters for training:
 - **Number of Epochs** - the number times the entire training dataset is pass through the network. 
 - **Batch Size** - the number of data samples seen by the model in each epoch. Iterates are the number of batches needs to compete an epoch.
 - **Learning Rate** - the size of steps the model match as it searchs for best weights that will produce a higher model accuracy. Smaller values means the model will take a longer time to find the best weights, while larger values may result in the model step over and misses the best weights which yields unpredictable behavior during training.


超参数是可调整的参数，可让您控制模型优化过程。
不同的超参数值会影响模型训练和准确度水平。

我们定义了以下用于训练的超参数：
  - **Number of Epochs** - 整个训练数据集通过网络的次数。
  - **Batch Size** - 模型在每个时期看到的数据样本数。 迭代次数是竞争一个纪元所需的批次数。
  - **Learning Rate** - 模型在搜索可产生更高模型精度的最佳权重时匹配的步长。 较小的值意味着模型将花费更长的时间来找到最佳权重，而较大的值可能会导致模型跳过并错过最佳权重，从而在训练期间产生不可预测的行为。

In [2]:
learning_rate = 1e-3
batch_size = 64
epochs = 5

## Add an optimization loop

Once we set our hyperparameters, we can then train and optimize our model with an optimization loop. Each 
iteration of the optimization loop is called an **epoch**. 

Each epoch consists of two main parts:
 - **The Train Loop** - iterate over the training dataset and try to converge to optimal parameters.
 - **The Validation/Test Loop** - iterate over the test dataset to check if model performance is improving.

Let's briefly familiarize ourselves with some of the concepts used in the training loop. Jump ahead to 
see the `full-impl-label` of the optimization loop.

一旦我们设置了超参数，我们就可以使用优化循环来训练和优化我们的模型。 每个
优化循环的迭代称为 **epoch**。

每个opoch由两个主要部分组成：
  - **训练阶段** - 迭代训练数据集并尝试收敛到最佳参数。
  - **验证/测试循环** - 遍历测试数据集以检查模型性能是否正在提高。

让我们简要地熟悉一下训练循环中使用的一些概念。 跳转到
请参阅优化循环的``full-impl-label``。

### Add a loss function

当出现一些训练数据时，我们未经训练的网络可能不会给出正确的
回答。 **损失函数**衡量得到的结果与目标值的相异程度，
它是我们希望在训练过程中最小化的损失函数。 为了计算损失，我们做了一个
使用给定数据样本的输入进行预测，并将其与真实数据标签值进行比较。

Common loss functions include:
- `nn.MSELoss` (Mean Square Error) used for regression tasks
- `nn.NLLLoss` (Negative Log Likelihood) used for classification
- `nn.CrossEntropyLoss` combines `nn.LogSoftmax` and `nn.NLLLoss`

We pass our model's output logits to `nn.CrossEntropyLoss`, which will normalize the logits and compute the prediction error.

常见的损失函数包括：
- `nn.MSELoss`（均方误差）用于回归任务
- `nn.NLLLoss`（负对数似然）用于分类
- `nn.CrossEntropyLoss` 结合了 `nn.LogSoftmax` 和 `nn.NLLLoss`

我们将模型的输出对数传递给 nn.CrossEntropyLoss，它将对对数进行归一化并计算预测误差。

In [3]:
# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()

### Optimization pass

Optimization is the process of adjusting model parameters to reduce model error in each training step. **Optimization algorithms** define how this process is performed (in this example we use Stochastic Gradient Descent).
All optimization logic is encapsulated in  the ``optimizer`` object. Here, we use the SGD optimizer; additionally, there are many different optimizers
available in PyTorch such as `ADAM' and 'RMSProp`, that work better for different kinds of models and data.

We initialize the optimizer by registering the model's parameters that need to be trained, and passing in the learning rate hyperparameter.



优化是在每个训练步骤中调整模型参数以减少模型误差的过程。 **优化算法**定义了这个过程是如何执行的（在这个例子中我们使用随机梯度下降）。
所有优化逻辑都封装在``optimizer``对象中。 在这里，我们使用 SGD 优化器； 此外，还有许多不同的优化器
在 PyTorch 中可用，例如``ADAM``和``RMSProp``，它们更适用于不同类型的模型和数据。

我们通过注册需要训练的模型参数并传入学习率超参数来初始化优化器。

In [4]:
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

Inside the training loop, optimization happens in three steps:
 * Call `optimizer.zero_grad()` to reset the gradients of model parameters. Gradients by default add up; to prevent double-counting, we explicitly zero them at each iteration.
 * Back-propagate the prediction loss with a call to `loss.backwards()`. PyTorch deposits the gradients of the loss w.r.t. each parameter. 
 * Once we have our gradients, we call ``optimizer.step()`` to adjust the parameters by the gradients collected in the backward pass.

在训练循环中，优化分三步进行：
  * 调用 `optimizer.zero_grad()` 重置模型参数的梯度。 默认情况下渐变相加； 为了防止重复计算，我们在每次迭代时明确地将它们归零。
  * 通过调用 `loss.backwards()` 反向传播预测损失。 PyTorch 存储损失 w.r.t. 的梯度。 每个参数。
  * 一旦我们有了梯度，我们就调用``optimizer.step()``通过在反向传播中收集的梯度来调整参数。

## Full implementation

We define `train_loop` that loops over our optimization code, and `test_loop` that 
evaluates the model's performance against our test data.

我们定义了 ``train_loop``来循环我们的优化代码，而 ``test_loop``则根据我们的测试数据来评估模型的性能。
对照测试数据评估模型的性能。

In [5]:
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):        
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)
        
        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    test_loss, correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
            
    test_loss /= size
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

We initialize the loss function and optimizer, and pass it to `train_loop` and `test_loop`.
Feel free to increase the number of epochs to track the model's improving performance.

我们初始化损失函数和优化器，并将其传递给`train_loop`和`test_loop`。
随意增加epochs的数量来跟踪模型的改进性能。

In [7]:
!nvidia-smi

Sun Feb  5 14:49:04 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla K80           On   | 00000001:00:00.0 Off |                    0 |
| N/A   32C    P8    34W / 149W |      7MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+---------------------------------------------------------------------------

In [6]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.297787  [    0/60000]
loss: 2.293754  [ 6400/60000]
loss: 2.282526  [12800/60000]
loss: 2.282340  [19200/60000]
loss: 2.287089  [25600/60000]
loss: 2.272902  [32000/60000]
loss: 2.275425  [38400/60000]
loss: 2.265132  [44800/60000]
loss: 2.245488  [51200/60000]
loss: 2.233117  [57600/60000]
Test Error: 
 Accuracy: 38.6%, Avg loss: 0.035080 

Epoch 2
-------------------------------
loss: 2.236138  [    0/60000]
loss: 2.222393  [ 6400/60000]
loss: 2.186252  [12800/60000]
loss: 2.200873  [19200/60000]
loss: 2.193500  [25600/60000]
loss: 2.165872  [32000/60000]
loss: 2.193122  [38400/60000]
loss: 2.167030  [44800/60000]
loss: 2.151676  [51200/60000]
loss: 2.108763  [57600/60000]
Test Error: 
 Accuracy: 44.3%, Avg loss: 0.033319 

Epoch 3
-------------------------------
loss: 2.130956  [    0/60000]
loss: 2.094978  [ 6400/60000]
loss: 2.028166  [12800/60000]
loss: 2.069986  [19200/60000]
loss: 2.043037  [25600/60000]
loss: 2.007254  [32000/600

You may have noticed that the model is initially not very good (that's OK!). Try running the loop for more `epochs` or adjusting the `learning_rate` to a bigger number. It might also be the case that the model configuration we chose might not be the optimal one for this kind of problem (it isn't). Later courses will delve more into the model shapes that work for vision problems.


你可能已经注意到，这个模型最初不是很好（没关系！）。试着在循环中运行更多的 ``epochs ``或将 ``learning_rate``调整到一个更大的数字。也可能是我们选择的模型配置可能不是这种问题的最佳配置（它不是）。以后的课程将更多地深入研究对视觉问题有效的模型形状。

Saving Models
-------------

When you are satisfied with the model's performance, you can use `torch.save` to save it. PyTorch models store the learned parameters in an internal state dictionary, called `state_dict`. These can be persisted wit the `torch.save` method:

当你对模型的性能感到满意时，你可以使用`torch.save`来保存它。PyTorch模型将学到的参数存储在内部状态字典中，称为 ``state_dict``。这些参数可以通过`torch.save`方法进行持久化。


In [8]:
torch.save(model.state_dict(), "data/model.pth")

print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth
