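#### Jupyter Notebook shortcuts currently supported in cell command mode
- Enter : switch to edit mode
- Shift-Enter : run the current cell, then select the next cell (or insert one if this is the last cell)
- Ctrl-Enter : run the current cell
- Alt-Enter : run the current cell and insert a new cell below
- Y : change the cell to code
- M : change the cell to Markdown (R, the raw cell type, is not yet supported)
- Up : select the cell above
- K : select the cell above
- Down : select the cell below
- J : select the cell below
- A : insert a new cell above
- B : insert a new cell below
- D,D : delete the selected cell
- L : toggle line numbers
- Shift-Space : scroll up
- Space : scroll down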
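#### VS Code shortcuts supported in cell edit mode (editing-related shortcuts only)
- Ctrl + X : cut the selection / cut the line (when nothing is selected)
- Ctrl + C : copy the selection / copy the line (when nothing is selected)
- Ctrl + Delete / Backspace : delete the word to the right / left
- Alt + ↑ / ↓ : move the line up / down
- Shift + Alt + ↑ / ↓ : copy the line up / down
- Ctrl + Shift + K : delete the line
- Ctrl + Shift + \ : jump to the matching bracket
- Ctrl + ] / [ : indent / outdent the line
- Ctrl + ← / → : move the cursor to the start / end of the word
- Ctrl + / : toggle line comment
- Shift + Alt + A : toggle block comment
- Ctrl + H : find / replace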

# learning base

## loss function

### CrossEntropyLoss
The official formula is:
$$
\begin{aligned}
\ell(x,y) &= L = \{ l_1, \dots, l_N\}^T \\
l_n &= -w_{y_n}\log \frac{\exp(x_{n,y_n})}{\sum_{c=1}^{C} \exp(x_{n,c})}
\end{aligned}
$$
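The examples below use the default `reduction='mean'` with no class weights, so $w_{y_n}=1$ and the per-sample losses $l_n$ are averaged over the batch, which amounts to an effective factor of $1/N$.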
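
To make the role of $w_{y_n}$ and the reduction concrete, here is a minimal pure-Python sketch of the formula for a small batch. It is not PyTorch's implementation; the function name `cross_entropy` and its parameters are illustrative, and the rule that `mean` divides by the summed target weights is my reading of the PyTorch documentation.

```python
import math

def cross_entropy(logits, targets, weights=None, reduction="mean"):
    """Pure-Python sketch of the CrossEntropyLoss formula for a batch."""
    num_classes = len(logits[0])
    if weights is None:
        weights = [1.0] * num_classes  # assumption: unweighted classes
    losses = []
    for row, y in zip(logits, targets):
        log_sum_exp = math.log(sum(math.exp(v) for v in row))
        # l_n = -w_{y_n} * (x_{n,y_n} - log(sum_c exp(x_{n,c})))
        losses.append(-weights[y] * (row[y] - log_sum_exp))
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    # "mean": divide by the total weight of the targets (equals N when unweighted)
    return sum(losses) / sum(weights[y] for y in targets)

# Two copies of the same logits, labelled as class 0 and class 1
batch_logits = [[0.8, 0.1, 0.5], [0.8, 0.1, 0.5]]
batch_targets = [0, 1]
per_sample = cross_entropy(batch_logits, batch_targets, reduction="none")
mean_loss = cross_entropy(batch_logits, batch_targets)
print(per_sample)
print(mean_loss)
```

With a single sample in the batch, as in the test that follows, `mean` and `sum` give the same number, so the reduction only matters once $N > 1$.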

#### base test

In [2]:
import torch
import torch.nn as nn

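# Assume the model outputs raw scores (logits) for three classes;
# CrossEntropyLoss applies log-softmax internally, so these are not probabilities
# model_output.shape = [n, c]: one sample, three classes
model_output = torch.tensor([[0.8, 0.1, 0.5]])

# Try each class index (0-2) in turn as the true label
for i in range(3):
    true_label_index = torch.tensor([i])

    # Compute the loss with CrossEntropyLoss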
    criterion = nn.CrossEntropyLoss()
    loss = criterion(model_output, true_label_index)

    print(f'Loss: {loss.item()}')

Loss: 0.8053160905838013
Loss: 1.5053160190582275
Loss: 1.105316162109375


$x_{n,y_n}$ is the logit of sample $n$ at its true class $y_n$. I read the index $n$ as the position within the batch, so here the $x_{n,c}$ are 0.8, 0.1, 0.5.

In [16]:
import math
l1 = -math.log(math.exp(0.8)/(math.exp(0.8)+math.exp(0.1)+math.exp(0.5)))
l2 = -math.log(math.exp(0.1)/(math.exp(0.8)+math.exp(0.1)+math.exp(0.5)))
l3 = -math.log(math.exp(0.5)/(math.exp(0.8)+math.exp(0.1)+math.exp(0.5)))
print("l1:{}\nl2:{}\nl3:{}".format(l1,l2,l3))

l1:0.8053160526833752
l2:1.5053160526833753
l3:1.1053160526833754


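So, more concretely, the code computes:
$$
\begin{aligned}
l_n&=-w_{y_n}\log \frac{\exp(x_{n,y_n})}{\sum_{c=1}^{C} \exp(x_{n,c})}\\
&=-w_{y_n}\left[x_{n,y_n}-\log\sum_{c=1}^{C}\exp(x_{n,c})\right]\\
&=-w_{y_n}\,\mathrm{LogSoftmax}(x_n)_{y_n}\\
&=\mathrm{NLLLoss}(\mathrm{LogSoftmax}(x_n),\,y_n)
\end{aligned}
$$
In other words, `nn.CrossEntropyLoss` is equivalent to `nn.LogSoftmax` followed by `nn.NLLLoss`.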
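
The decomposition can be sketched in plain Python as two small steps. The helpers `log_softmax` and `nll_loss` below are hypothetical stand-ins written for illustration, not the PyTorch modules themselves; the max-shift inside `log_softmax` is a common numerical-stability trick that I am assuming here, not something taken from the notebook.

```python
import math

def log_softmax(row):
    """Log-probabilities for one sample, using the max-shift trick for stability."""
    shift = max(row)
    log_sum_exp = shift + math.log(sum(math.exp(v - shift) for v in row))
    return [v - log_sum_exp for v in row]

def nll_loss(log_probs, target, weights=None):
    """Negative log-likelihood: pick out the target entry and negate it."""
    w = 1.0 if weights is None else weights[target]
    return -w * log_probs[target]

logits = [0.8, 0.1, 0.5]  # same values as model_output in the base test
losses = [nll_loss(log_softmax(logits), target) for target in range(3)]
print(losses)
```

The three values should agree with `l1`, `l2`, `l3` from the manual computation up to floating-point rounding.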

#### test with linear model

In [11]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

# Define the network architecture (two linear layers, no activation in between)
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        return x

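# Define the model hyperparameters
input_size = 1  # number of input features (matches the dataset)
hidden_size = 4  # hidden layer width
output_size = 1  # number of outputs (a single regression value)
learning_rate = 0.01
batch_size = 1  # unused: the whole dataset is fed as one batch
epochs = 1000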

# Create the model instance
model = SimpleNet(input_size, hidden_size, output_size)
# Define the loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Prepare the data
x = np.array([1, 2, 3, 4, 1.1, 2.2, 3.3])
y_true = x**2
inputs = torch.FloatTensor(x).view(-1, 1)
labels = torch.FloatTensor(y_true).view(-1, 1)

# Train the model
for epoch in range(epochs):
    total_loss = 0.0
    optimizer.zero_grad() # reset gradients
    outputs = model(inputs) # forward pass
    loss = criterion(outputs, labels) # compute the loss
    loss.backward() # backpropagate
    optimizer.step() # update the weights
    total_loss += loss.item()
    if (epoch + 1) % 100 == 0:
        # MSELoss already averages over the samples, so this prints MSE / len(x)
        print(f'Epoch {epoch+1}/{epochs}, Loss: {total_loss/len(x)}')

Epoch 100/1000, Loss: 1.116616862160819
Epoch 200/1000, Loss: 0.44855516297476633
Epoch 300/1000, Loss: 0.1374076179095677
Epoch 400/1000, Loss: 0.11943644285202026
Epoch 500/1000, Loss: 0.11935197455542428
Epoch 600/1000, Loss: 0.1193519915853228
Epoch 700/1000, Loss: 0.11935187237603324
Epoch 800/1000, Loss: 0.11935187237603324
Epoch 900/1000, Loss: 0.11935187237603324
Epoch 1000/1000, Loss: 0.11935187237603324


In [12]:
model.fc1.weight.T  # shape(4, 1)

tensor([[-1.4887, -1.0781,  0.7716,  1.7271]], grad_fn=<PermuteBackward0>)

In [13]:
model.fc2.weight  # shape(1, 4)

Parameter containing:
tensor([[-1.4106, -1.0018,  0.1997,  0.8675]], requires_grad=True)

In [14]:
print(model.fc1.bias)  # shape(4)
print(model.fc2.bias)  # shape(1)

Parameter containing:
tensor([ 3.4327, -0.2063, -0.0263,  0.0605], requires_grad=True)
Parameter containing:
tensor([-0.1655], requires_grad=True)


In [19]:
# model inference
model.eval()  # no dropout or batch norm here, so eval mode behaves the same as train mode
model(torch.FloatTensor([[4],]))

tensor([[14.5754]], grad_fn=<AddmmBackward0>)

In [47]:
# reproduce the forward pass by hand to check the arithmetic
input_data = 4
out = 0
hidden_unit = []
for i in range(4):
    hidden_unit.append(input_data*model.fc1.weight[i][0]+model.fc1.bias[i])
    out += hidden_unit[i]*model.fc2.weight[0][i]
out = out + float(model.fc2.bias)
out

tensor(14.5754, grad_fn=<AddBackward0>)

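The code above shows that a linear (fully connected) layer really is just a weighted sum plus a bias.  
During training, the loss plateaus after roughly 400 epochs.  
To show why a non-linear activation matters, the two-layer network is replaced below with a single linear layer: without an activation, stacked linear layers collapse into one linear map, so the single layer should converge to the same loss.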
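
The collapse of two stacked linear layers into one can be checked by hand. The sketch below copies the weights and biases printed in cells 12 to 14 (rounded to four decimals, so the match is approximate) and folds them into a single slope and intercept; the names `two_layer` and `collapsed` are illustrative.

```python
# Weights and biases copied from the printed parameters of the trained two-layer model
# (rounded to 4 decimals, so results match the notebook only approximately).
fc1_weight = [-1.4887, -1.0781, 0.7716, 1.7271]
fc1_bias = [3.4327, -0.2063, -0.0263, 0.0605]
fc2_weight = [-1.4106, -1.0018, 0.1997, 0.8675]
fc2_bias = -0.1655

# fc2(fc1(x)) = sum_i w2_i * (w1_i * x + b1_i) + b2 = slope * x + intercept
slope = sum(w2 * w1 for w2, w1 in zip(fc2_weight, fc1_weight))
intercept = sum(w2 * b1 for w2, b1 in zip(fc2_weight, fc1_bias)) + fc2_bias

def two_layer(x):
    """Forward pass through both linear layers, one hidden unit at a time."""
    hidden = [w1 * x + b1 for w1, b1 in zip(fc1_weight, fc1_bias)]
    return sum(w2 * h for w2, h in zip(fc2_weight, hidden)) + fc2_bias

def collapsed(x):
    """The equivalent single linear map."""
    return slope * x + intercept

print(slope, intercept)
print(two_layer(4.0), collapsed(4.0))
```

Both functions should return roughly the 14.5754 that the model produced for an input of 4, which suggests the four hidden units add no expressive power on their own.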

In [51]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(input_size, output_size)
    
    def forward(self, x):
        x = self.fc1(x)
        return x

input_size = 1 
hidden_size = 1  # unused: the network now has a single layer
output_size = 1
learning_rate = 0.01
batch_size = 1
epochs = 5000  # convergence is slower, so train for more epochs

model = SimpleNet(input_size, hidden_size, output_size)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

x = np.array([1, 2, 3, 4, 1.1, 2.2, 3.3])
y_true = x**2
inputs = torch.FloatTensor(x).view(-1, 1)
labels = torch.FloatTensor(y_true).view(-1, 1)

for epoch in range(epochs):
    total_loss = 0.0
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    total_loss += loss.item()
    if (epoch + 1) % 1000 == 0:
        print(f'Epoch {epoch+1}/{epochs}, Loss: {total_loss/len(x)}')

Epoch 1000/5000, Loss: 0.34475255012512207
Epoch 2000/5000, Loss: 0.12732360192707606
Epoch 3000/5000, Loss: 0.1193715078490121
Epoch 4000/5000, Loss: 0.1193520086152213
Epoch 5000/5000, Loss: 0.11935194901057652
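
Both runs level off at the same value. One way to check whether that plateau is simply the best a straight line can do on this data is to compute the ordinary least-squares fit in closed form. This is a sketch that is not part of the original notebook; it uses only the data defined above and the textbook formulas for slope and intercept.

```python
xs = [1, 2, 3, 4, 1.1, 2.2, 3.3]
ys = [x ** 2 for x in xs]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Ordinary least squares for y = slope * x + intercept
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Mean squared error of the best-fit line
mse = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / n
print(slope, intercept)
print(mse, mse / n)  # the training loops print MSE divided by len(x) once more
```

Here `mse / n` comes out close to the 0.1194 printed by both training runs, so neither network beats the best-fit line; fitting $y = x^2$ more closely would need a non-linear activation between the layers.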
