<a href="https://colab.research.google.com/github/ndsoi/ndsoi/blob/main/%E6%89%8B%E5%8A%A8%E7%BC%96%E5%86%99%E6%A8%A1%E5%9E%8B.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

手动实现向前传播、反向传播调参

In [27]:
import tensorflow as tf
import math
import numpy as np

我们来实现一个简单的Python类NaiveDense，它创建了两个TensorFlow变量W和b，并定义了一个__call__()方法供外部调用，以实现上述变换。

In [14]:
class NavieDense:
  def __init__(self,input_size,output_size,activation):
    self.activation = activation
    self.W = tf.Variable(tf.random.uniform((input_size,output_size),minval=0,maxval=1e-1))    # 为什么选用的是random.uniform，可以是zeros吗
    self.b = tf.Variable(tf.zeros(output_size,))
  def __call__(self,inputs):
    return self.activation(tf.matmul(inputs,self.W)+self.b)

  # 为什么会有下一行注释?
  @property
  def weights(self):
    return [self.W,self.b]


简单的Sequential类
下面我们创建一个NaiveSequential类，将这些层链接起来。它封装了一个层列表，并定义了一个__call__()方法供外部调用。这个方法将按顺序调用输入的层。它还有一个weights属性，用于记录该层的参数。

In [15]:
class NavieSequential:
  def __init__(self,layers):
    self.layers = layers

  def __call__(self,inputs):
    re = inputs   #自己写的是 tf.Variable(tf.zeros((inputs.shape)))
    for layer in self.layers:
      re = layer(re)
    return re

  @property
  def weights(self):
    weights = []
    for layer in self.layers:
      weights += layer.weights
    return weights



利用这个NaiveDense类和NaiveSequential类，我们可以创建一个与Keras类似的模型。

In [16]:
# model = NavieSequential(input,[NavieDense(512,"relu"),NavieDense(10,"softmax")])

model = NavieSequential([
    NavieDense(input_size=28*28,output_size=512,activation=tf.nn.relu),NavieDense(input_size=512,output_size=10,activation=tf.nn.softmax)])


批量生成器
接下来，我们需要对MNIST数据进行小批量迭代。这很简单。

In [17]:
class BatchGenerator:
  def __init__(self,image,label,batch_size=128):
    self.image = image
    self.label = label
    self.batch_num = math.ceil(len(label)/batch_size)
    self.curindex = 0

  def next(self):
    if self.curindex >= self.batch_num:
      self.curindex = 0
    st = self.curindex
    ed = self.curindex+128
    self.curindex+=128
    return self.image[st:ed],self.label[st:ed]

  def batch_num(self):
    return self.batch_num


2.5.2　完成一次训练步骤
最难的一步就是“训练步骤”，即在一批数据上运行模型后更新模型权重。我们需要做到以下几点。
(1)计算模型对图像批量的预测值。
(2)根据实际标签，计算这些预测值的损失值。
(3)计算损失相对于模型权重的梯度。
(4)将权重沿着梯度的反方向移动一小步。

In [18]:
def train_one_step(image,label,model):
  with tf.GradientTape() as tape:
    prediction = model(image)
    per_sample_losses = tf.keras.losses.sparse_categorical_crossentropy(label,prediction)
    average_loss = tf.reduce_mean(per_sample_losses)
    gradients = tape.gradient(average_loss,model.weights)
    update_weights(gradients,model.weights)
    return average_loss

learning_rate = 1e-3

def update_weights(gradients,weights):
  for g,w in zip(gradients,weights):
    w.assign_sub(g*learning_rate)


一轮训练就是对训练数据的每个批量都重复上述训练步骤，而完整的训练循环就是重复多轮训练。

In [23]:
def fit(images,labels,model,epoches=5,batch_size=128):
  batch = BatchGenerator(images,labels,batch_size)
  for epoch in range(epoches):
    for num in range(batch.batch_num):
      image_batch,label_batch = batch.next()
     # for i in range(batch_size): 我以为模型只能接收一张图像的输入,但没关系，输入数据将会是(128,28*28)的维度，所以模型只需关注input_size=28*28
      loss = train_one_step(image_batch,label_batch,model)

      if num % 100 == 0:
        print(f"loss at batch {num}:{loss:.2f}")



运行

In [30]:
from tensorflow.keras.datasets import mnist
(train_images,train_labels),(test_images,test_labels) = mnist.load_data()

train_images = train_images.reshape((60000,28*28))
train_images = train_images.astype("float32")/255

test_images = test_images.reshape((10000,28*28))
test_images = test_images.astype("float32")/255

fit(train_images,train_labels,model,epoches=10)

loss at batch 0:0.83
loss at batch 100:0.84
loss at batch 200:0.82
loss at batch 300:0.80
loss at batch 400:0.78
loss at batch 0:0.84
loss at batch 100:0.82
loss at batch 200:0.80
loss at batch 300:0.78
loss at batch 400:0.77
loss at batch 0:0.66
loss at batch 100:0.64
loss at batch 200:0.63
loss at batch 300:0.61
loss at batch 400:0.60
loss at batch 0:0.73
loss at batch 100:0.72
loss at batch 200:0.71
loss at batch 300:0.70
loss at batch 400:0.68
loss at batch 0:0.57
loss at batch 100:0.56
loss at batch 200:0.55
loss at batch 300:0.54
loss at batch 400:0.53
loss at batch 0:0.59
loss at batch 100:0.58
loss at batch 200:0.58
loss at batch 300:0.57
loss at batch 400:0.56
loss at batch 0:0.47
loss at batch 100:0.46
loss at batch 200:0.45
loss at batch 300:0.45
loss at batch 400:0.44
loss at batch 0:0.56
loss at batch 100:0.55
loss at batch 200:0.55
loss at batch 300:0.54
loss at batch 400:0.53
loss at batch 0:0.43
loss at batch 100:0.43
loss at batch 200:0.42
loss at batch 300:0.42
loss a

训练完成，评估模型

In [31]:

predictions = model(test_images)

predictions = predictions.numpy()  # 转化为numpy张量

predicted_labels = np.argmax(predictions,axis=1)  # 最终输出的是一个(10000,10)的张量 axis=1指的就是查看概率

matches = predicted_labels == test_labels
print(f'accuracy:{matches.mean():.2f}')

accuracy:0.81
