# 3.7 Concise Implementation of Softmax Regression

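In Section 3.3 (Concise Implementation of Linear Regression) we saw how convenient TensorFlow 2.0 makes it to implement a model. Here we use TensorFlow 2.0 again to implement a softmax regression model. First, import the required packages and modules.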


In [2]:
import tensorflow as tf
from tensorflow import keras


## 3.7.1 Obtaining and Reading the Data

We again use the Fashion-MNIST dataset and the batch size set in the previous section.

In [3]:
fashion_mnist = keras.datasets.fashion_mnist
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

Preprocess the data by normalizing the pixel values to the range [0, 1], which makes training easier.

In [4]:
x_train = x_train / 255.0
x_test = x_test / 255.0

In [7]:
x_train.shape, y_train.shape

((60000, 28, 28), (60000,))

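## 3.7.2 Defining and Initializing the Model

As mentioned in Section 3.4 (Softmax Regression), the output layer of softmax regression is a fully connected layer, so we add a fully connected layer with 10 outputs. The first layer, Flatten, flattens each 28 × 28 image into a vector of shape (784,). The second layer is a Dense layer; because this is a multiclass classification problem, it uses softmax as its activation function.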

In [8]:
model = keras.Sequential([keras.layers.Flatten(input_shape=(28, 28)), 
                         keras.layers.Dense(10, activation=tf.nn.softmax)])
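
To make the two layers concrete, the following is a minimal NumPy sketch of the computation they perform: reshape each image to a vector, apply an affine map, then normalize with softmax. The random inputs, the parameter names `W` and `b`, and the initialization scale are assumptions for illustration; they are not the values Keras uses internally.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Subtracting the row-wise max leaves the result unchanged but avoids overflow in exp().
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical stand-ins: a batch of 4 random "images" and randomly initialized parameters.
X = rng.random((4, 28, 28))
W = rng.normal(scale=0.01, size=(784, 10))
b = np.zeros(10)

X_flat = X.reshape(X.shape[0], -1)  # the Flatten step: (4, 28, 28) -> (4, 784)
probs = softmax(X_flat @ W + b)     # the Dense + softmax step: (4, 784) -> (4, 10)

print(X_flat.shape, probs.shape)
print(probs.sum(axis=1))            # each row sums to 1, up to rounding
```

Each row of `probs` is a probability distribution over the 10 classes, which is what the loss function in the next subsection consumes.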

In [9]:
help(keras.layers.Flatten)

Help on class Flatten in module tensorflow.python.keras.layers.core:

class Flatten(tensorflow.python.keras.engine.base_layer.Layer)
 |  Flatten(data_format=None, **kwargs)
 |  
 |  Flattens the input. Does not affect the batch size.
 |  
 |  If inputs are shaped `(batch,)` without a channel dimension, then flattening
 |  adds an extra channel dimension and output shapes are `(batch, 1)`.
 |  
 |  Arguments:
 |    data_format: A string,
 |      one of `channels_last` (default) or `channels_first`.
 |      The ordering of the dimensions in the inputs.
 |      `channels_last` corresponds to inputs with shape
 |      `(batch, ..., channels)` while `channels_first` corresponds to
 |      inputs with shape `(batch, channels, ...)`.
 |      It defaults to the `image_data_format` value found in your
 |      Keras config file at `~/.keras/keras.json`.
 |      If you never set it, then it will be "channels_last".
 |  
 |  Example:
 |  
 |  ```python
 |  model = Sequential()
 |  model.add(Convol

## 3.7.3 Softmax and the Cross-Entropy Loss Function

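If you did the exercises in the previous section, you may have noticed that defining the softmax operation and the cross-entropy loss function separately can cause numerical instability. The Keras API in TensorFlow 2.0 therefore lets us select a built-in loss through the loss argument, which has better numerical stability.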

In [10]:
loss = 'sparse_categorical_crossentropy'
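
The instability mentioned above can be reproduced in a few lines of NumPy. The sketch below compares a naive "softmax, then log" computation with a fused log-sum-exp form. The function names and the example logits are hypothetical, and this illustrates the general technique rather than the code Keras actually runs.

```python
import numpy as np

def naive_cross_entropy(logits, labels):
    # Softmax first, then log: exp() overflows for large logits.
    e = np.exp(logits)
    probs = e / e.sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

def stable_cross_entropy(logits, labels):
    # Fused form: log softmax(z)_j = z_j - logsumexp(z), with the row max subtracted first.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

labels = np.array([0, 1])
small = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
large = small + 1000.0  # same softmax probabilities, but exp(1000) overflows

with np.errstate(over="ignore", invalid="ignore"):
    naive_small = naive_cross_entropy(small, labels)
    naive_large = naive_cross_entropy(large, labels)
stable_small = stable_cross_entropy(small, labels)
stable_large = stable_cross_entropy(large, labels)

print(naive_small, stable_small)  # the two forms agree on moderate logits
print(naive_large, stable_large)  # the naive form breaks down, the fused form does not
```

In Keras, a comparable effect is usually obtained by letting the last layer output raw logits and constructing the loss with `from_logits=True`. Whether that changes the results for this particular model depends on the TensorFlow version, so treat it as an option to try rather than a required change.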

## 3.7.4 Defining the Optimization Algorithm

We use minibatch stochastic gradient descent with a learning rate of 0.1 as the optimization algorithm.

In [11]:
optimizer = tf.keras.optimizers.SGD(0.1)
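
As a rough illustration of what a plain SGD step does, here is a NumPy sketch of the update rule theta <- theta - lr * gradient on a toy one-parameter problem. The helper `sgd_step` and the toy data are hypothetical and are not part of the Keras API.

```python
import numpy as np

def sgd_step(params, grads, lr):
    # Plain SGD: theta <- theta - lr * gradient, applied to each parameter in place.
    for p, g in zip(params, grads):
        p -= lr * g

# Hypothetical toy problem: minimize the mean of 0.5 * (w * x - y)^2 over one minibatch.
w = np.array([0.0])
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

for _ in range(50):
    grad_w = np.array([np.mean((w * x - y) * x)])  # gradient averaged over the minibatch
    sgd_step([w], [grad_w], lr=0.1)

print(w)  # approaches 2.0, the slope that fits the data exactly
```

Note that the gradient here is averaged over the minibatch before the update, so the learning rate does not need to be rescaled by the batch size.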

## 3.7.5 Training the Model

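Next, we train the model. Instead of the training function defined in the previous section, we first use the Keras compile and fit methods with a batch size of 256.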

In [14]:
model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, batch_size=256)

Train on 60000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7f4bef0616d0>

Next, we evaluate the model's performance on the test dataset.

In [15]:
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test Acc:',test_acc)

Test Acc: 0.8386
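
The accuracy metric reported above is the fraction of examples whose highest-probability class equals the true label. A minimal NumPy sketch follows, using a hypothetical `accuracy` helper and made-up predictions for three samples.

```python
import numpy as np

def accuracy(probs, labels):
    # Fraction of rows whose highest-probability class matches the label.
    return float(np.mean(np.argmax(probs, axis=1) == labels))

# Made-up predicted probabilities for 3 samples over 3 classes.
probs = np.array([[0.9, 0.05, 0.05],
                  [0.5, 0.3, 0.2],
                  [0.05, 0.01, 0.94]])
labels = np.array([0, 1, 2])

print(accuracy(probs, labels))  # 2 of the 3 predictions are correct
```

The second sample is misclassified because class 0 has the highest probability while the true label is 1.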


Method 2: training with a custom loop

In [None]:
# SparseCategoricalCrossentropy already averages over the batch, so SGD does not need to divide by batch_size again
# https://stackoverflow.com/questions/58159154/how-to-calculate-categorical-cross-entropy-by-hand
# The default reduction is SUM_OVER_BATCH_SIZE: the scalar sum divided by the number of elements in losses.
cross_entropy = keras.losses.SparseCategoricalCrossentropy()

In [3]:
import numpy as np
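# Predicted probabilities for 3 samples over 3 classes
y_hat = np.array([[.9, .05, .05], [.5, .3, .2], [.05, .01, .94]])
# True labels of the 3 samples
y = np.array([0, 1, 2], dtype='int32')
# Mean log-probability of the true classes; the cross-entropy loss is the negative of this value
tf.reduce_sum(tf.math.log(tf.boolean_mask(y_hat, tf.one_hot(y, depth=3))))/3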

<tf.Tensor: shape=(), dtype=float64, numpy=-0.4570695745672833>

In [4]:
cce = tf.keras.losses.SparseCategoricalCrossentropy()
loss = cce(
  tf.convert_to_tensor([0, 1, 2]),
  tf.convert_to_tensor([[.9, .05, .05], [.5, .3, .2], [.05, .01, .94]]))
print('Loss: ', loss.numpy())

Loss:  0.45706955


In [None]:
# Number of epochs and learning rate
num_epochs, lr = 5, 0.1

# train_iter, test_iter, batch_size and evaluate_accuracy are not defined in this
# section; they are assumed to be available from the previous section.
def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params=None, lr=None, trainer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            # Record only the forward pass and the loss on the tape
            with tf.GradientTape() as tape:
                y_hat = net(X, training=True)
                l = loss(y, y_hat)
            grads = tape.gradient(l, net.trainable_variables)
            # tf.keras.optimizers.SGD applies plain SGD: theta(t+1) = theta(t) - learning_rate * gradient
            trainer.apply_gradients(zip(grads, net.trainable_variables))

            # l is already the batch mean, so weight it by the batch size before accumulating
            train_l_sum += l.numpy() * y.shape[0]
            train_acc_sum += tf.reduce_sum(tf.cast(tf.argmax(y_hat, axis=1) == tf.cast(y, dtype=tf.int64), dtype=tf.int64)).numpy()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f' % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))

trainer = tf.keras.optimizers.SGD(lr)
train_ch3(model, train_iter, test_iter, cross_entropy, num_epochs, batch_size, lr=lr, trainer=trainer)
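
The loop above iterates over `train_iter` once per epoch, so whatever is passed in must be re-iterable and yield `(X, y)` minibatches. The class below is a hypothetical NumPy stand-in that satisfies this contract. It is not the iterator from the previous section, and the class name, arguments, and synthetic data are assumptions for illustration.

```python
import numpy as np

class MiniBatchIterator:
    """Hypothetical re-iterable minibatch source that reshuffles on every pass."""

    def __init__(self, features, labels, batch_size, shuffle=True, seed=0):
        self.features, self.labels = features, labels
        self.batch_size, self.shuffle = batch_size, shuffle
        self.rng = np.random.default_rng(seed)

    def __iter__(self):
        idx = np.arange(len(self.features))
        if self.shuffle:
            self.rng.shuffle(idx)
        for start in range(0, len(idx), self.batch_size):
            batch = idx[start:start + self.batch_size]
            yield self.features[batch], self.labels[batch]

# Small synthetic stand-in for (x_train, y_train).
X = np.arange(10 * 4, dtype="float32").reshape(10, 2, 2)
y = np.arange(10)
train_iter = MiniBatchIterator(X, y, batch_size=4)

sizes = [len(yb) for _, yb in train_iter]
print(sizes)  # [4, 4, 2]: the last batch holds the remaining samples
```

Because `__iter__` starts a fresh pass each time it is called, the same object can be reused across epochs, unlike a plain generator, which would be exhausted after the first epoch.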

## Summary

- The functions provided by TensorFlow 2.0 often have better numerical stability.
- TensorFlow 2.0 lets us implement softmax regression more concisely.