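## Multi-class classification (MNIST)
* 10 outputs, one per digit class
* The outputs must form a probability distribution: each is greater than 0 and they sum to 1
* The last layer is a Softmax layer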
$$
P(y=i)=\frac{e^{z_i}}{\sum_{j=0}^{K-1}e^{z_j}},\quad i\in \{0,\dots,K-1\}
$$
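
The formula can be sketched in NumPy as follows. The helper name `softmax` and the max-shift are assumptions added here for illustration: subtracting the largest score leaves the result unchanged but avoids overflow in `np.exp` for large inputs.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis, shifted by the max for numerical stability."""
    z = np.asarray(z, dtype=float)
    shifted = z - z.max(axis=-1, keepdims=True)  # the shift cancels in the ratio
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

p = softmax([0.2, 0.1, -0.1])
print(p, p.sum())  # every entry is positive and the entries sum to 1
```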

* Loss function: the labels are one-hot encoded
* NLLLoss (Negative Log Likelihood Loss): $-Y\log\hat{Y}$, summed over the classes
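
Written out per class, and assuming $c$ denotes the index of the true class (a symbol introduced here for illustration), the one-hot label keeps only one term of the sum:

$$
-\sum_{i=0}^{K-1} Y_i\log\hat{Y}_i = -\log\hat{Y}_c
$$

So the loss depends only on the probability the model assigns to the correct class.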

In [1]:
import numpy as np
y=np.array([1,0,0])
z=np.array([0.2,0.1,-0.1])
y_pred=np.exp(z)/np.exp(z).sum()
loss=(-y*np.log(y_pred)).sum()
print(loss)

0.9729189131256584


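In PyTorch, the loss is computed directly from z and y with torch.nn.CrossEntropyLoss(). Here z is the raw, pre-softmax output and y holds class indices rather than one-hot vectors.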

In [8]:
import numpy as np
import torch
y=torch.LongTensor([2,0,1]) # class indices as a long (64-bit integer) tensor

y_pred1 = torch.Tensor([[0.1,0.2,0.9],
                      [1.1,0.1,0.2],
                      [0.2,2.1,0.1]])

y_pred2 = torch.Tensor([[0.8,0.2,0.3],
                      [0.2,0.3,0.5],
                      [0.2,0.2,0.5]])

criterion = torch.nn.CrossEntropyLoss()

l1 = criterion(y_pred1,y)
l2 = criterion(y_pred2,y)

print(l1.data,l2.data)

tensor(0.4966) tensor(1.2389)
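
As a quick check of what CrossEntropyLoss does internally, the sketch below reproduces the first loss value in two other ways: as log-softmax followed by NLL loss, and by picking out the log-probability of the true class by hand. The variable names are chosen here for illustration.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[0.1, 0.2, 0.9],
                       [1.1, 0.1, 0.2],
                       [0.2, 2.1, 0.1]])
target = torch.tensor([2, 0, 1])  # class indices, not one-hot vectors

ce = F.cross_entropy(logits, target)                     # softmax + log + NLL in one call
log_probs = F.log_softmax(logits, dim=1)
nll = F.nll_loss(log_probs, target)                      # same result in two steps
manual = -log_probs[torch.arange(3), target].mean()      # -log of the true-class probability

print(ce.item(), nll.item(), manual.item())
```

All three values agree, which is why the model should output raw scores and not softmax probabilities when this loss is used.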


In [4]:
# MNIST classification
import torch
from torchvision import transforms
from torchvision import datasets
from torch.utils.data import DataLoader
import torch.nn.functional as F
import torch.optim as optim
import torchvision

# 1. Prepare the data
batch_size = 64

transform = transforms.Compose([transforms.ToTensor(),
                               transforms.Normalize((0.1307,),(0.3081,))]) # per-channel mean (0.1307,) and standard deviation (0.3081,) of MNIST
train_dataset = torchvision.datasets.MNIST(root = "./dataset/mnist", train=True, download=True,transform=transform)
train_loader = DataLoader(train_dataset,shuffle=True,batch_size=batch_size)
test_dataset = torchvision.datasets.MNIST(root = "./dataset/mnist", train=False, download=True,transform=transform)
test_loader = DataLoader(test_dataset,shuffle=False,batch_size=batch_size)

# 2. Build the model
class Model(torch.nn.Module):
    def __init__(self):
        super(Model,self).__init__() # call the parent constructor; this is required
        self.linear1 = torch.nn.Linear(784,512) # a Linear object holds the weight and bias tensors and computes wx+b
        self.linear2 = torch.nn.Linear(512,256)
        self.linear3 = torch.nn.Linear(256,128)
        self.linear4 = torch.nn.Linear(128,64)
        self.linear5 = torch.nn.Linear(64,10)
        
    def forward(self,x):
        x = x.view(-1,784)
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))
        x = F.relu(self.linear3(x))
        x = F.relu(self.linear4(x))
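        # No activation on the output layer: CrossEntropyLoss applies log-softmax itself and expects raw scores.
        # The accuracy log below was recorded with F.relu applied here, which probably explains the plateau near 69%.
        return self.linear5(x)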

model = Model()

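# 3. Build the loss function and the optimizer
# The loss is cross-entropy
criterion = torch.nn.CrossEntropyLoss() # takes the raw scores z and applies softmax, log, and -Y log(Y_hat) internally

# torch.optim.SGD also accepts weight_decay, which adds an L2 penalty (w^T w) to the objective
optimizer = torch.optim.SGD(model.parameters(),lr=0.01,momentum=0.5) # model.parameters() tells the optimizer which tensors to update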

# 4. Training loop
# One epoch is wrapped in a function
def train(epoch):
    running_loss = 0
    for batch_idx,data in enumerate(train_loader,0):
        inputs,target = data
        optimizer.zero_grad()
        
        outputs = model(inputs)
        loss = criterion(outputs,target)
        loss.backward()
        optimizer.step()
        
        running_loss +=loss.item()
        
def test():
    correct = 0
    total = 0
    with torch.no_grad(): # disables autograd tracking, which speeds up inference and saves memory
        for data in test_loader:
            images,labels = data
            outputs = model(images)
            _,predicted = torch.max(outputs,dim=1) # returns the max values and their indices; the index is the predicted class
            total +=labels.size(0)
            correct += (predicted == labels).sum().item()
    print(f"Accuracy on test set:{100*correct/total}%")
    
for epoch in range(100):
    train(epoch)
    test()

Accuracy on test set:51.47%
Accuracy on test set:55.54%
Accuracy on test set:56.86%
Accuracy on test set:68.06%
Accuracy on test set:68.72%
Accuracy on test set:68.82%
Accuracy on test set:68.99%
Accuracy on test set:68.98%
Accuracy on test set:69.09%
Accuracy on test set:69.16%
Accuracy on test set:69.11%
Accuracy on test set:69.15%
Accuracy on test set:69.15%
Accuracy on test set:69.13%
Accuracy on test set:69.21%
Accuracy on test set:69.2%
Accuracy on test set:69.12%
Accuracy on test set:69.15%
Accuracy on test set:69.24%
Accuracy on test set:69.27%
Accuracy on test set:69.21%
Accuracy on test set:69.27%
Accuracy on test set:69.19%
Accuracy on test set:69.27%
Accuracy on test set:69.25%
Accuracy on test set:69.2%
Accuracy on test set:69.24%
Accuracy on test set:69.22%
Accuracy on test set:69.19%
Accuracy on test set:69.22%
Accuracy on test set:69.22%
Accuracy on test set:69.26%
Accuracy on test set:69.25%
Accuracy on test set:69.23%
Accuracy on test set:69.2%
Accuracy on test set:69

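Images:
* Stored as H×W×C (height × width × channels); PyTorch expects C×H×W
* Pixel values lie in [0,255] and are usually scaled to [0,1]
* transforms.ToTensor() performs both conversions, and transforms.Compose chains it with Normalize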
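
The conversion can be sketched by hand with plain tensor operations. This is only an illustration of the steps, not the torchvision implementation; the random image and the variable names are made up for the example.

```python
import torch

# Hypothetical 28x28 grayscale image stored as H x W x C with uint8 pixels in [0, 255].
img_hwc = torch.randint(0, 256, (28, 28, 1), dtype=torch.uint8)

# Move channels first and scale to [0, 1], as ToTensor does for uint8 input.
img_chw = img_hwc.permute(2, 0, 1).float() / 255.0

# Standardize with the MNIST mean and standard deviation used in the transform above.
normalized = (img_chw - 0.1307) / 0.3081

print(img_chw.shape, img_chw.min().item(), img_chw.max().item())
```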

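Each image must be flattened from a matrix into a vector with x.view(-1,784), where 28*28=784 and -1 tells PyTorch to infer that dimension (here the batch size).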
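
A minimal sketch of the flattening step, using zero tensors as stand-ins for real batches. With 60000 training images and a batch size of 64, the last batch holds only 32 images, which is why -1 is used in place of a fixed batch size.

```python
import torch

full_batch = torch.zeros(64, 1, 28, 28)   # N x C x H x W
last_batch = torch.zeros(32, 1, 28, 28)   # 60000 % 64 == 32 images remain

flat_full = full_batch.view(-1, 784)      # -1 is inferred as 64
flat_last = last_batch.view(-1, 784)      # -1 is inferred as 32

print(flat_full.shape, flat_last.shape)
```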

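With momentum, the update v applied to x at each step is the sum of the current gradient step, - dx * lr, and the previous update v scaled by a factor momentum in [0,1]: v = - dx * lr + v * momentum. x is then updated as x = x + v.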
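
The update rule can be sketched in plain Python on a toy problem. The function name `sgd_momentum_step` and the objective f(x) = x^2 are assumptions chosen for illustration.

```python
def sgd_momentum_step(x, v, grad, lr=0.01, momentum=0.5):
    """One update: v = -grad * lr + v * momentum, then x = x + v."""
    v = -lr * grad + momentum * v
    return x + v, v

# Minimize f(x) = x^2, whose gradient is 2x, starting from x = 5.
x, v = 5.0, 0.0
for _ in range(50):
    grad = 2.0 * x
    x, v = sgd_momentum_step(x, v, grad, lr=0.1, momentum=0.5)

print(x)  # close to the minimum at 0
```

torch.optim.SGD stores the accumulated gradient in a buffer and multiplies by lr when updating the parameter, which gives the same trajectory as this form when lr is constant and dampening is left at its default of 0.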