In [1]:
import tensorflow as tf
from tensorflow import keras

print(tf.__version__)
print(keras.__version__)

2.4.1
2.4.0


## Loading the dataset

In [2]:
fashion_mnist = keras.datasets.fashion_mnist
(x_train_full, y_train_full), (x_test, y_test) = fashion_mnist.load_data()

The Fashion MNIST dataset consists of 28×28 grayscale images whose pixel values are integers between 0 and 255.

In [3]:
x_train_full.shape

(60000, 28, 28)

In [4]:
x_train_full.dtype

dtype('uint8')

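The data is already split into a training set and a test set, but there is no validation set, so we create one below. Because the model is trained with gradient descent, the input features must be scaled. Here the pixel values are scaled to the 0-1 range: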

In [8]:
x_valid, x_train = x_train_full[:5000] / 255.0, x_train_full[5000:] / 255.0
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
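
As a quick sanity check of the split-and-scale step, here is a minimal sketch on synthetic data. The random arrays, the sample count of 100, and the validation size `n_valid = 20` are assumptions made for illustration only; the notebook itself holds out the first 5000 real images.

```python
import numpy as np

# Synthetic stand-in for x_train_full / y_train_full (assumption: random data
# with the same dtype and image size as Fashion MNIST, but only 100 samples).
rng = np.random.default_rng(0)
x_full = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
y_full = rng.integers(0, 10, size=100, dtype=np.uint8)

n_valid = 20  # hypothetical validation size; the notebook uses 5000

# Hold out the first n_valid samples for validation and scale pixels to [0, 1].
x_valid, x_train = x_full[:n_valid] / 255.0, x_full[n_valid:] / 255.0
y_valid, y_train = y_full[:n_valid], y_full[n_valid:]

print(x_valid.shape, x_train.shape)  # (20, 28, 28) (80, 28, 28)
print(x_train.dtype)                 # float64
print(x_train.min() >= 0.0, x_train.max() <= 1.0)  # True True
```

Dividing a `uint8` array by `255.0` produces floating-point values, so the scaled arrays no longer have the original integer dtype.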

The list below maps each output value (class index) to its label:

In [6]:
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

In [9]:
class_names[y_train[0]]

'Coat'

## Building the model

Next we create a classification MLP with two hidden layers.

In [11]:
model = keras.models.Sequential()  # create a Sequential model

model.add(keras.layers.Flatten(input_shape=[28, 28]))  # flatten each input image into a 1D array
model.add(keras.layers.Dense(300, activation='relu'))
model.add(keras.layers.Dense(100, activation='relu'))
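model.add(keras.layers.Dense(10, activation='softmax'))

The same model can also be built by passing the list of layers directly to the `Sequential` constructor:
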
In [12]:
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation='relu'),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
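
To make the role of the `Flatten` layer concrete, the following is a minimal NumPy sketch of the reshaping it performs. It is not the Keras implementation; the batch of 3 images is a hypothetical example.

```python
import numpy as np

# A hypothetical batch of 3 images of 28x28 pixels.
batch = np.arange(3 * 28 * 28, dtype=np.float32).reshape(3, 28, 28)

# Flatten keeps the batch dimension and collapses the rest into one axis.
flat = batch.reshape(len(batch), -1)

print(flat.shape)  # (3, 784)

# Row-major order: the pixel at row r, column c lands at index r * 28 + c.
print(flat[0, 1 * 28 + 2] == batch[0, 1, 2])  # True
```

This is why the first `Dense` layer sees 784 input features per image.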

In [13]:
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_1 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 300)               235500    
_________________________________________________________________
dense_4 (Dense)              (None, 100)               30100     
_________________________________________________________________
dense_5 (Dense)              (None, 10)                1010      
=================================================================
Total params: 266,610
Trainable params: 266,610
Non-trainable params: 0
_________________________________________________________________


As the summary shows, the model has a large number of parameters.
The first hidden layer alone has 784 × 300 connection weights plus 300 bias terms: 784 × 300 + 300 = 235,500.
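
The same arithmetic applies to every `Dense` layer in the summary. As a small sketch, the helper `dense_param_count` below (a hypothetical name, not part of Keras) reproduces the per-layer and total parameter counts from the layer widths:

```python
# Layer widths taken from the model above: 784 inputs, then 300, 100, 10 units.
layer_sizes = [784, 300, 100, 10]


def dense_param_count(n_inputs, n_units):
    """Connection weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


counts = [dense_param_count(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

print(counts)       # [235500, 30100, 1010]
print(sum(counts))  # 266610
```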

A model's layers can be retrieved by index or by name:

In [14]:
model.layers

[<tensorflow.python.keras.layers.core.Flatten at 0x1620028e460>,
 <tensorflow.python.keras.layers.core.Dense at 0x1620028e2b0>,
 <tensorflow.python.keras.layers.core.Dense at 0x1620028e820>,
 <tensorflow.python.keras.layers.core.Dense at 0x162002c9f40>]

- Look up a layer by index

In [15]:
hidden1 = model.layers[1]

In [16]:
hidden1.name

'dense_3'

In [17]:
model.get_layer('dense_3') is hidden1

True

All of a layer's parameters can be read with `get_weights()` and written with `set_weights()`. For a `Dense` layer, these are the connection weights and the bias terms:

In [19]:
weights, biases = hidden1.get_weights()
weights

array([[ 0.04637101,  0.00495404,  0.06134693, ..., -0.05147616,
        -0.04291172,  0.03060553],
       [-0.00050663,  0.03336366, -0.02447746, ..., -0.03111519,
         0.06637718,  0.01661943],
       [ 0.06311549, -0.01302806,  0.04543936, ...,  0.01381277,
         0.01091377, -0.06899497],
       ...,
       [-0.0547167 ,  0.01363814, -0.05577764, ..., -0.02544192,
        -0.07414022, -0.03448113],
       [ 0.02192867, -0.02372253,  0.00198369, ..., -0.06862842,
        -0.03765351, -0.05261379],
       [ 0.02422819,  0.00155612, -0.04348344, ...,  0.02211551,
         0.06717317, -0.01400331]], dtype=float32)

In [20]:
weights.shape

(784, 300)

In [21]:
biases

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
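       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)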

In [22]:
biases.shape

(300,)
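
The cells above only read the parameters. A minimal sketch of the write direction with `set_weights()` follows; the stand-alone layer with 4 inputs and 3 units is a hypothetical example chosen to keep the arrays small, not the layer from the model above.

```python
import numpy as np
from tensorflow import keras

# A small stand-alone Dense layer (hypothetical sizes: 4 inputs, 3 units).
layer = keras.layers.Dense(3)
layer.build(input_shape=(None, 4))  # creates the kernel and bias variables

weights, biases = layer.get_weights()
print(weights.shape, biases.shape)  # (4, 3) (3,)

# set_weights expects a list of arrays with the same shapes, in the same order
# that get_weights returns them: first the kernel, then the bias.
new_weights = np.ones_like(weights)
new_biases = np.full_like(biases, 0.5)
layer.set_weights([new_weights, new_biases])

restored_weights, restored_biases = layer.get_weights()
print(np.array_equal(restored_weights, new_weights))  # True
print(np.array_equal(restored_biases, new_biases))    # True
```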

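A `Dense` layer initializes its connection weights randomly and its biases to 0. To use a different initialization method, set `kernel_initializer` (kernel is another name for the connection weight matrix) or `bias_initializer` when creating the layer.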
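
As an illustration, the sketch below creates a `Dense` layer with non-default initializers. The choice of `"he_normal"` for the kernel and a constant of 0.1 for the biases is an assumption made for this example, not a recommendation from the original notebook.

```python
from tensorflow import keras

layer = keras.layers.Dense(
    300,
    activation="relu",
    kernel_initializer="he_normal",                     # replaces the default random initializer
    bias_initializer=keras.initializers.Constant(0.1),  # replaces the default zeros
)
layer.build(input_shape=(None, 784))  # creates the variables using the initializers

weights, biases = layer.get_weights()
print(weights.shape)  # (784, 300)
print(biases[:3])     # [0.1 0.1 0.1]
```

Initializers can be given either as a string name or as an initializer object; the object form is needed when the initializer takes arguments, as `Constant` does here.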