## 过拟合与欠拟合

- 过拟合：就是模型的只对训练的数据敏感，对于其他未见的样本表现不好，模型泛化能力较弱；
- 欠拟合：模型对训练集和未见过的数据表现都差

![](imgs/01.png)


## 数据集的划分

模型中三种数据类型：
- 训练集：用来训练模型的数据
- 验证集：在训练的过程中用来测试训练模型的精度，也是可以是训练集的一部分
- 测试集：用来测试模型，不会放到训练集中，用来检验模型的泛化能力

验证集的作用：
- 根据验证集的性能表现来调整学习率、权值衰减系数
- 根据验证集的性能表现来重新调整网络拓扑结构
- 根据验证集的性能表现判断是否过拟合和欠拟合

验证集在模型中使用方式：


In [25]:
import tensorflow as tf
from tensorflow.keras import datasets,layers,optimizers,Sequential,metrics

def preprocess(x,y):
    x = tf.cast(x,dtype=tf.float32)/255.
    x =tf.reshape(x,[28*28])
    y =tf.cast(y,dtype=tf.int32)
    y = tf.one_hot(y,depth=10)
    return x,y

In [12]:
# 创建训练数据
batch_size = 128
(x,y),(x_val,y_val) = datasets.mnist.load_data()
print('datase:',x.shape,y.shape,x.min(),x.max(),y.min(),y.max())

datase: (60000, 28, 28) (60000,) 0 255 0 9


In [14]:
db = tf.data.Dataset.from_tensor_slices((x,y))
# 变换前
sample = next(iter(db))
print(sample[0].shape,sample[1].shape)

db = db.map(preprocess).shuffle(60000).batch(batch_size)

# 变换后
sample2 = next(iter(db))
print(sample2[0].shape,sample2[1].shape)

db_val = tf.data.Dataset.from_tensor_slices((x_val,y_val))
db_val = db_val.map(preprocess).batch(batch_size)


(28, 28) ()
(128, 784) (128, 10)


In [16]:
network = Sequential([layers.Dense(256,activation='relu'),layers.Dense(128,activation='relu'),layers.Dense(64,activation='relu'),layers.Dense(32,activation='relu'),layers.Dense(10,activation='relu')])

network.build(input_shape=(None,28*28))
network.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_5 (Dense)              (None, 256)               200960    
_________________________________________________________________
dense_6 (Dense)              (None, 128)               32896     
_________________________________________________________________
dense_7 (Dense)              (None, 64)                8256      
_________________________________________________________________
dense_8 (Dense)              (None, 32)                2080      
_________________________________________________________________
dense_9 (Dense)              (None, 10)                330       
Total params: 244,522
Trainable params: 244,522
Non-trainable params: 0
_________________________________________________________________


In [17]:
network.compile(optimizer=optimizers.Adam(lr=0.01),loss=tf.losses.CategoricalCrossentropy(from_logits=True),metrics=['accuracy'])

In [18]:
# fit 里面可以设置验证集
# 一般
network.fit(db,epochs=5,validation_data=db_val,validation_freq=2)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x18796cb6390>

In [19]:
# 利用验证集来测试模型的准确性
network.evaluate(db_val)



[0.14019764959812164, 0.9634000062942505]

In [21]:
sample = next(iter(db_val))
x = sample[0]
y = sample[1]

pred = network.predict(x)

y = tf.argmax(y,axis=1)
pred = tf.argmax(pred,axis=1)

print(pred)
print(y)

tf.Tensor(
[7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7
 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9
 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6 9 6 0 5 4 9 9 2 1 9 4 8
 7 3 9 7 4 4 4 9 2 5 4 7 6 7 4 0 5], shape=(128,), dtype=int64)
tf.Tensor(
[7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7
 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9
 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6 9 6 0 5 4 9 9 2 1 9 4 8
 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5], shape=(128,), dtype=int64)


### 验证集的使用方式

- 将数据集分割成三份
    - train
    - validation
    - test

In [23]:
# 例如上面的数据创建
(x,y),(x_test,y_test) = datasets.mnist.load_data()
print('datasets',x.shape,y.shape,x_test.shape,y_test.shape)

datasets (60000, 28, 28) (60000,) (10000, 28, 28) (10000,)


In [27]:
# 再继续分割训练集成两份
idx = tf.range(x.shape[0])
idx = tf.random.shuffle(idx)

# 创建训练集
x_train,y_train = tf.gather(x,idx[:50000]),tf.gather(y,idx[:50000])
# 创建验证集
x_val,y_val = tf.gather(x,idx[50000:]),tf.gather(y,idx[50000:])

db_train = tf.data.Dataset.from_tensor_slices((x_train,y_train))
db_train = db_train.map(preprocess).shuffle(50000).batch(batch_size)

db_val = tf.data.Dataset.from_tensor_slices((x_val,y_val))
db_val = db_val.map(preprocess).shuffle(10000).batch(batch_size)

db_test = tf.data.Dataset.from_tensor_slices((x_test,y_test))
db_test = db_test.map(preprocess).batch(batch_size)