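TensorFlow ships with a very convenient tool for visualizing the training process: TensorBoard. Visualizing training helps a great deal in understanding a model. In TF 2.0 we can use `tf.summary.create_file_writer`, and the high-level Keras API also provides a callback interface. Let's see how to use both.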

In [1]:
import tensorflow as tf
import numpy as np

In [2]:
tf.__version__

'2.0.0'

Specify the hardware resources to use; for details, see this [reference](https://hackmd.io/@shaoeChen/ryWIV4vkL).

In [3]:
gpus = tf.config.experimental.list_physical_devices(device_type='GPU')

In [4]:
gpus 

[PhysicalDevice(name=u'/physical_device:GPU:0', device_type=u'GPU'),
 PhysicalDevice(name=u'/physical_device:GPU:1', device_type=u'GPU')]

In [5]:
tf.config.experimental.set_visible_devices(devices=gpus[1], device_type='GPU')

In [6]:
tf.config.experimental.set_memory_growth(device=gpus[1], enable=True)

We drop the class used in the earlier notes and let it go; as before, we load the MNIST dataset and normalize it.

In [7]:
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = np.expand_dims(x_train / 255., -1)
x_test = np.expand_dims(x_test / 255., -1)

In [8]:
x_train.shape, x_test.shape

((60000, 28, 28, 1), (10000, 28, 28, 1))

Pass the images and labels to `tf.data` as arguments.

In [9]:
datasets = tf.data.Dataset.from_tensor_slices((x_train, y_train))

In [10]:
datasets

<TensorSliceDataset shapes: ((28, 28, 1), ()), types: (tf.float64, tf.uint8)>
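The dataset summary above reports the images as `tf.float64`, because dividing a `uint8` array by `255.` promotes it to NumPy's default float type. Keras layers default to `float32`, which is what the casting warning printed during training refers to. A minimal sketch of the difference, using a small random array as a stand-in for MNIST (the names `fake_images`, `as_float64`, and `as_float32` are illustrative and not part of the original notebook):

```python
import numpy as np

# Random uint8 "images" standing in for a few MNIST samples (assumption).
fake_images = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Same expression as the notebook: uint8 / Python float gives float64.
as_float64 = np.expand_dims(fake_images / 255., -1)

# Casting first keeps the data in float32, matching the Keras default dtype.
as_float32 = np.expand_dims(fake_images.astype(np.float32) / 255., -1)

print(as_float64.dtype, as_float32.dtype)
print(as_float32.shape)
```

Casting to `float32` before building the dataset should avoid the implicit cast inside the first layer and halves the memory used by the images.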

In [11]:
datasets = datasets.shuffle(buffer_size=128, seed=10).batch(128)
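A note on `buffer_size`: `shuffle` only draws from a sliding buffer, so a buffer of 128 over 60,000 samples mixes elements locally rather than across the whole dataset. For a uniform shuffle the buffer needs to be at least as large as the dataset. The sketch below shows the shuffle-then-batch pattern on a toy dataset of ten integers (the names `toy`, `full_shuffle`, and `batches` are illustrative only):

```python
import tensorflow as tf

# Toy dataset 0..9 standing in for the MNIST samples (assumption).
toy = tf.data.Dataset.range(10)

# Buffer covers the whole dataset, so every element can land anywhere.
full_shuffle = toy.shuffle(buffer_size=10, seed=10).batch(4)

# Collect the batches as plain Python lists for inspection.
batches = [batch.numpy().tolist() for batch in full_shuffle]
print(batches)
```

In the notebook the equivalent change would be `buffer_size=len(x_train)`, at the cost of a longer buffer fill before the first batch.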

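With the dataset prepared, we can set up the tool that records the training process.

First, create a folder named `tensorboard` under the working directory (any name that suits you will do).

Then point `tf.summary.create_file_writer` at that directory.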

In [16]:
from datetime import datetime

In [17]:
stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
logdir = 'tensorboard/%s' % stamp
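The timestamped path gives every run its own subfolder, so TensorBoard can show runs side by side. A small sketch of the same idea wrapped in a helper (the function name `make_logdir` and its parameters are hypothetical, introduced only for illustration):

```python
import os
from datetime import datetime


def make_logdir(root="tensorboard", now=None):
    """Return a per-run log directory such as tensorboard/20200102-030405."""
    now = now or datetime.now()
    # os.path.join keeps the separator correct on every platform.
    return os.path.join(root, now.strftime("%Y%m%d-%H%M%S"))


logdir = make_logdir()
print(logdir)
```

Calling `make_logdir()` again before a new training run yields a fresh folder, so earlier logs do not need to be deleted.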

In [14]:
summary_writer = tf.summary.create_file_writer(logdir)

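Next, we train inside the `summary_writer` context.

Let's first build the model with the same simple architecture as before. I won't explain it again; we are only walking through the workflow.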

In [12]:
model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(filters=6, kernel_size=(5, 5), padding='valid', activation='tanh'),
    tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
    tf.keras.layers.Conv2D(filters=16, kernel_size=(5, 5), padding='valid', activation='tanh'),
    tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation='tanh'),
    tf.keras.layers.Dense(84, activation='tanh'),
    tf.keras.layers.Dense(10, activation='softmax'),
])


In [13]:
loss_object = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy('train_accuracy')

During training, the `summary_writer` defined above serves as the context in which the records are written.

In [17]:
@tf.function
def train_step(images, labels, epoch):
    with tf.GradientTape() as tape:
        predictions = model(images)
        loss = loss_object(labels, predictions)
        
    # The summary writer defined earlier
    with summary_writer.as_default():
        tf.summary.scalar(name="loss", data=loss, step=epoch)
        
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    train_loss(loss)
    train_accuracy(labels, predictions)
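
Two details of the step argument are worth knowing. Passing a Python integer such as `epoch` into a `tf.function` makes TensorFlow trace a new graph for each distinct value, and using the epoch as `step` places every batch of that epoch at the same x position in TensorBoard. One possible alternative, shown here as a sketch rather than the notebook's method, is to pass a per-batch step as an `int64` tensor. The names `demo_logdir`, `demo_writer`, `log_loss`, and `fake_losses` are made up for this example, and the loss values are placeholders:

```python
import os
import tempfile

import tensorflow as tf

# Throwaway log directory so the sketch does not touch real logs.
demo_logdir = tempfile.mkdtemp()
demo_writer = tf.summary.create_file_writer(demo_logdir)


@tf.function
def log_loss(loss, step):
    # `step` arrives as an int64 tensor, so new values reuse the same graph.
    with demo_writer.as_default():
        tf.summary.scalar(name="loss", data=loss, step=step)


# Placeholder loss values; in the notebook these would come from train_step.
fake_losses = [0.9, 0.5, 0.3]
for step, value in enumerate(fake_losses):
    log_loss(tf.constant(value), tf.constant(step, dtype=tf.int64))

demo_writer.flush()
event_files = [f for f in os.listdir(demo_logdir) if "tfevents" in f]
print(event_files)
```

With a per-batch step the loss curve gets one point per batch instead of one stacked column per epoch.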

In [18]:
%%time
# Training loop
for epoch in range(5):
    for images, labels in datasets:
        train_step(images, labels, epoch)    
    print(train_loss.result())
    print(train_accuracy.result())



To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

tf.Tensor(0.28371805, shape=(), dtype=float32)
tf.Tensor(0.9185, shape=(), dtype=float32)
tf.Tensor(0.18340261, shape=(), dtype=float32)
tf.Tensor(0.94654167, shape=(), dtype=float32)
tf.Tensor(0.14149855, shape=(), dtype=float32)
tf.Tensor(0.95845556, shape=(), dtype=float32)
tf.Tensor(0.11691186, shape=(), dtype=float32)
tf.Tensor(0.96554166, shape=(), dtype=float32)
tf.Tensor(0.10028773, shape=(), dtype=float32)
tf.Tensor(0.97042334, shape=(), dtype=float32)
CPU times: user 15.2 s, sys: 1.72 s, total: 16.9 s
Wall time: 13.7 s


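After running the training code above, you will find that some files have appeared in the `tensorboard` folder. Just start TensorBoard and it will read the information we set up earlier.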

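Following the [official documentation](https://www.tensorflow.org/tensorboard/tensorboard_in_notebooks), we can view it inside Jupyter Notebook or open it in a separate browser tab, whichever suits you. To view it inside Jupyter Notebook, run the two notebook magics `%load_ext tensorboard` and `%tensorboard --logdir tensorboard`.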

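TensorBoard shows only a loss curve because loss is the only thing we recorded. Any other information from the training process can be added in the same way when needed.

To retrain, either delete the existing log files or create another subfolder and point the writer at the new path.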

TensorBoard's convenience does not stop there. We can also record graph and profile information, which is done during training with `tf.summary.trace_on`.

In [18]:
tf.summary.trace_on(graph=True, profiler=True)

In [19]:
%%time
# Training loop
for epoch in range(1):
    for images, labels in datasets:
        train_step(images, labels, epoch)    
    print(train_loss.result())
    print(train_accuracy.result())
    
    with summary_writer.as_default():
        tf.summary.trace_export(name='model_info',step=epoch, profiler_outdir=logdir)

     



To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

tf.Tensor(0.28394422, shape=(), dtype=float32)
tf.Tensor(0.9206333, shape=(), dtype=float32)
CPU times: user 1min 53s, sys: 20.2 s, total: 2min 13s
Wall time: 2min 11s


In [20]:
tf.summary.trace_off()   

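In practice, though, execution seemed quite slow when using `tf.summary.trace_export`, and I'm not sure why; maybe I configured something wrong? One likely contributor is that `profiler=True` stays enabled for the whole epoch, and the `tf.summary.trace_on` documentation warns that the profiler may incur a high memory overhead.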
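If only the graph view is needed, a lighter option is to trace a single call with the profiler left off and export right after it. The following is a sketch under that assumption, not a verified fix for the slowdown above; `affine`, `trace_logdir`, and `trace_writer` are illustrative names, and a small matrix multiply stands in for `train_step`:

```python
import os
import tempfile

import tensorflow as tf

# Throwaway log directory for the exported trace.
trace_logdir = tempfile.mkdtemp()
trace_writer = tf.summary.create_file_writer(trace_logdir)


@tf.function
def affine(x, w, b):
    # Small stand-in computation whose graph gets recorded.
    return tf.matmul(x, w) + b


# Record the graph only; the profiler argument keeps its default (off).
tf.summary.trace_on(graph=True)
y = affine(tf.ones((2, 3)), tf.ones((3, 4)), tf.zeros((4,)))

# Export the trace collected for the single call above.
with trace_writer.as_default():
    tf.summary.trace_export(name="affine_trace", step=0)
trace_writer.flush()

trace_files = [f for f in os.listdir(trace_logdir) if "tfevents" in f]
print(y.shape, trace_files)
```

The exported graph then shows up under the Graphs tab when TensorBoard is pointed at the same log directory.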

If you train with the built-in Keras `fit` or `fit_generator`, you can add logging through a callback instead.

Compile the model:

In [14]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
    loss=tf.keras.losses.sparse_categorical_crossentropy,
    metrics=['accuracy']
)

Before training, define the callback.

In [23]:
callback_tensorboard = tf.keras.callbacks.TensorBoard(log_dir=logdir, histogram_freq=1)

After that, all that is left is to pass the callback to `model.fit`.

In [24]:
datasets = datasets.repeat()

In [25]:
%%time
model.fit(datasets,
          epochs=5, 
          callbacks=[callback_tensorboard],
          steps_per_epoch=int(len(x_train)/128))

Train for 468 steps
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
CPU times: user 19.3 s, sys: 1.72 s, total: 21 s
Wall time: 14.9 s


<tensorflow.python.keras.callbacks.History at 0x7f3c80157150>

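One thing I noticed here: when training with `model.fit` I had to call `repeat()` on the dataset, while the `GradientTape` loop did not need it. The most likely reason is that `steps_per_epoch` is set, in which case `fit` keeps drawing from the same iterator across epochs instead of restarting it, so a single pass of data runs out early. Without `steps_per_epoch`, a finite dataset should be iterated once per epoch and need no `repeat()`. I will verify this separately when I have time.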
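To see why `repeat()` matters once `steps_per_epoch` is given, it helps to work out the batch counts. The arithmetic below uses the notebook's numbers (60,000 samples, batch size 128, 5 epochs); the variable names are illustrative:

```python
import math

num_samples = 60000
batch_size = 128
epochs = 5

# What int(len(x_train) / 128) evaluates to in the fit call.
steps_floor = num_samples // batch_size
# Number of batches one pass over the non-repeated dataset actually yields.
steps_ceil = math.ceil(num_samples / batch_size)
# Size of the final, partial batch.
last_batch = num_samples % batch_size
# Total batches fit requests when steps_per_epoch is fixed.
total_steps = steps_floor * epochs

print(steps_floor, steps_ceil, last_batch, total_steps)
```

One pass supplies 469 batches, while five epochs of 468 steps request 2,340, so a non-repeated dataset would be exhausted during the second epoch.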

In any case, once training is finished you can use TensorBoard to inspect how it went, which is very convenient.