Estimator is a high-level API available in TensorFlow that lets you quickly build, train, and evaluate a model, run predictions with it, and save and load it.

In [1]:
import os
import time
import numpy as np
import tensorflow as tf

In [2]:
# Print the name of the GPU device
tf.test.gpu_device_name()

'/device:GPU:0'

In [3]:
# Print the TensorFlow version
tf.__version__

'2.10.0'

In [3]:
gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
cpus = tf.config.experimental.list_physical_devices(device_type='CPU')
print(gpus, cpus)

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')] [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]


In [4]:
print(tf.__version__)
print(tf.config.experimental.list_physical_devices('GPU'))

2.10.0
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


In [4]:
LABEL_DIMENSIONS = 10
# Load the Fashion-MNIST training and test sets
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.fashion_mnist.load_data()
TRAINING_SIZE = len(train_images)
TEST_SIZE = len(test_images)

train_images = np.asarray(train_images, dtype=np.float32) / 255.0  # scale training pixels to [0, 1]
train_images = train_images.reshape([TRAINING_SIZE, 28, 28, 1])

test_images = np.asarray(test_images, dtype=np.float32) / 255.0  # scale test pixels to [0, 1]
test_images = test_images.reshape([TEST_SIZE, 28, 28, 1])

train_labels = tf.keras.utils.to_categorical(train_labels, LABEL_DIMENSIONS)
test_labels = tf.keras.utils.to_categorical(test_labels, LABEL_DIMENSIONS)

# Convert the one-hot labels to float32
train_labels = train_labels.astype(np.float32)
test_labels = test_labels.astype(np.float32)
print(train_images.shape, train_labels.shape, test_images.shape, test_labels.shape)

(60000, 28, 28, 1) (60000, 10) (10000, 28, 28, 1) (10000, 10)
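
As a rough illustration of the label preprocessing above, the following sketch reproduces one-hot encoding with plain NumPy. The helper name `one_hot` and the sample labels are made up for this example; it mirrors the result that `tf.keras.utils.to_categorical` gives for integer class labels, but it is not that function's actual implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: turn integer class ids into one-hot float32 rows."""
    labels = np.asarray(labels, dtype=np.int64)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[labels]

sample_labels = [9, 0, 3]  # made-up Fashion-MNIST class ids
encoded = one_hot(sample_labels, num_classes=10)
print(encoded.shape)           # (3, 10)
print(encoded.argmax(axis=1))  # [9 0 3]
```

Taking `argmax` along the last axis recovers the original integer labels, which is a quick way to sanity-check the encoding.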


Build a convolutional model with the tf.keras functional API

In [5]:
inputs = tf.keras.Input(shape=(28, 28, 1))
x = tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu')(inputs)
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
x = tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu')(x)
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(units=128, activation='relu')(x)
predictions = tf.keras.layers.Dense(units=LABEL_DIMENSIONS, activation='softmax')(x)
model = tf.keras.Model(inputs=inputs, outputs=predictions)
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 conv2d_2 (Conv2D)           (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 13, 13, 32)       0         
 2D)                                                             
                                                                 
 conv2d_3 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 flatten_1 (Flatten)         (None, 1600)              0     
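
The output shapes and parameter counts in the summary rows shown above can be checked by hand. The sketch below assumes the Keras defaults used in the model code (3x3 kernels, stride 1, 'valid' padding, and 2x2 non-overlapping pooling); the helper names are hypothetical and exist only for this calculation.

```python
def conv2d_valid(size, channels_in, filters, kernel=3):
    """Output side length and parameter count of a 'valid', stride-1 Conv2D."""
    out_size = size - kernel + 1
    params = (kernel * kernel * channels_in + 1) * filters  # +1 is the bias
    return out_size, params

def max_pool(size, pool=2):
    """Output side length of a non-overlapping max pool (floor division)."""
    return size // pool

size, channels = 28, 1
size, conv1_params = conv2d_valid(size, channels, filters=32)  # 28 -> 26
channels = 32
size = max_pool(size)                                          # 26 -> 13
size, conv2_params = conv2d_valid(size, channels, filters=64)  # 13 -> 11
channels = 64
size = max_pool(size)                                          # 11 -> 5
flat_units = size * size * channels                            # 5 * 5 * 64

print(conv1_params, conv2_params, flat_units)  # 320 18496 1600
```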

Compile the model

In [6]:
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
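
For intuition about the `categorical_crossentropy` loss chosen above, here is a simplified NumPy sketch of the quantity it measures for one-hot targets. The function, the clipping constant `eps`, and the sample arrays are assumptions made for illustration; this is not Keras's implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    per_example = -np.sum(y_true * np.log(y_pred), axis=-1)
    return per_example.mean()

# Two made-up examples with three classes each.
y_true = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
y_pred = np.array([[0.1, 0.8, 0.1], [0.5, 0.25, 0.25]])
loss = categorical_crossentropy(y_true, y_pred)
print(round(float(loss), 4))  # 0.4581
```

Only the probability assigned to the true class contributes to each example's loss, so the result here is the mean of -ln(0.8) and -ln(0.5).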

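Define a distribution strategy so that the model trains on the GPU (MirroredStrategy, which here uses the single available GPU), then pass the Keras model and the run configuration to tf.keras.estimator.model_to_estimator() to obtain an Estimator.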

In [7]:
strategy = tf.distribute.MirroredStrategy()
config = tf.estimator.RunConfig(train_distribute=strategy)
estimator = tf.keras.estimator.model_to_estimator(keras_model=model, config=config)

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
INFO:tensorflow:Initializing RunConfig with distribution strategies.
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Using the Keras model provided.


  "`tf.keras.backend.set_learning_phase` is deprecated and "


INFO:tensorflow:Using config: {'_model_dir': 'C:\\Users\\23668\\AppData\\Local\\Temp\\tmp9rowxr56', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': <tensorflow.python.distribute.mirrored_strategy.MirroredStrategy object at 0x00000220A5C43DA0>, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas

Define the input function for training and testing with tf.data

In [13]:
def input_fn(images, labels, batch_size, epochs):
    dataset = tf.data.Dataset.from_tensor_slices((images, labels))

    SHUFFLE_SIZE = 5000
    # Shuffle within a 5000-element buffer, repeat for the requested epochs, then batch
    dataset = dataset.shuffle(buffer_size=SHUFFLE_SIZE).repeat(epochs).batch(batch_size)
    dataset = dataset.prefetch(tf.data.AUTOTUNE)  # prefetch batches so input preparation overlaps training
    return dataset
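
Because `repeat` is applied before `batch` in the input function above, batches are cut from one continuous stream of examples and can straddle epoch boundaries. The pure-Python stand-in below sketches that ordering on a toy list; it leaves out shuffling so the output is deterministic, and `repeat_then_batch` is a hypothetical helper, not part of tf.data.

```python
from itertools import chain, islice

def repeat_then_batch(items, epochs, batch_size):
    """Pure-Python stand-in for dataset.repeat(epochs).batch(batch_size)."""
    # Concatenate `epochs` passes over the data into a single stream.
    stream = chain.from_iterable(items for _ in range(epochs))
    while True:
        batch = list(islice(stream, batch_size))
        if not batch:
            return
        yield batch

batches = list(repeat_then_batch([0, 1, 2, 3, 4], epochs=2, batch_size=4))
print(batches)  # [[0, 1, 2, 3], [4, 0, 1, 2], [3, 4]]
```

The second batch mixes the end of the first pass with the start of the second, and only the very last batch is smaller than `batch_size`.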

In [14]:
# Train the model
BATCH_SIZE = 512
EPOCHS = 50
estimator_train_result = estimator.train(input_fn=lambda: input_fn(train_images, train_labels,
                                         batch_size=BATCH_SIZE,
                                         epochs=EPOCHS))
print(estimator_train_result)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Reduce to /replica:0/task:0/device:CPU:0 then broadcast to ('/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /replica:0/task:0/device:CPU:0 then broadcast to ('/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /replica:0/task:0/device:CPU:0 then broadcast to ('/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /replica:0/task:0/device:CPU:0 then broadcast to ('/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /replica:0/task:0/device:CPU:0 then broadcast to ('/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /replica:0/task:0/device:CPU:0 then broadcast to ('/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /replica:0/task:0/device:CPU:0 then broadcast to ('/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /replica:0/task:0/device:CPU:0 then broadcast to ('/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /replica:0/task:0/device:CPU:0 then broadcas
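
The number of training steps follows from the input pipeline: with 60000 training images, 50 epochs, and a batch size of 512, the repeated stream is cut into the batches counted below. This is a small arithmetic sketch using the values from the cells above; the result agrees with the checkpoint name `model.ckpt-5860` that appears in the evaluation log.

```python
import math

TRAINING_SIZE = 60000  # number of training images, from the shapes printed earlier
BATCH_SIZE = 512
EPOCHS = 50

total_examples = TRAINING_SIZE * EPOCHS
steps = math.ceil(total_examples / BATCH_SIZE)
# Every batch is full except possibly the last one.
last_batch = total_examples - (steps - 1) * BATCH_SIZE
print(steps, last_batch)  # 5860 192
```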

In [15]:
estimator.evaluate(input_fn=lambda: input_fn(test_images, test_labels, epochs=1, batch_size=BATCH_SIZE))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2024-06-23T20:02:23
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\23668\AppData\Local\Temp\tmp9rowxr56\model.ckpt-5860
INFO:tensorflow:Running local_init_op.


  updates = self.state_updates


INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 1.42013s
INFO:tensorflow:Finished evaluation at 2024-06-23-20:02:24
INFO:tensorflow:Saving dict for global step 5860: accuracy = 0.8441, global_step = 5860, loss = 0.42820936
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 5860: C:\Users\23668\AppData\Local\Temp\tmp9rowxr56\model.ckpt-5860


{'accuracy': 0.8441, 'loss': 0.42820936, 'global_step': 5860}