<a href="https://colab.research.google.com/github/footprinthere/DeepLearningWithTensorflow/blob/main/2021_07_14_part2_lab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NN model on MNIST dataset

In [20]:
import numpy as np
import tensorflow as tf

In [21]:
# load dataset
def load_mnist():
    (train_data, train_labels), (test_data, test_labels) = tf.keras.datasets.mnist.load_data()
    # add channel dimension
    train_data = np.expand_dims(train_data, axis=-1)
    test_data = np.expand_dims(test_data, axis=-1)
    # normalize
    train_data = train_data.astype(np.float32) / 255.0
    test_data = test_data.astype(np.float32) / 255.0
    # one-hot encoding
    train_labels = tf.keras.utils.to_categorical(train_labels, 10)
    test_labels = tf.keras.utils.to_categorical(test_labels, 10)

    return train_data, train_labels, test_data, test_labels

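The data is normalized as it is loaded: pixel values are cast to float32 and divided by 255 so that they fall in the range [0, 1].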

The last dimension of the input data (train_data, test_data) must be reserved as the channel dimension. np.expand_dims() is used to add it.
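
As a quick check of the shapes involved, here is a minimal sketch on a small random array. `fake_images` is a hypothetical stand-in for the real MNIST array, and the same two steps as in load_mnist() are applied to it.

```python
import numpy as np

# Hypothetical stand-in for MNIST images: 4 grayscale 28x28 images with uint8 pixels.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Append a trailing channel axis: (4, 28, 28) -> (4, 28, 28, 1).
with_channel = np.expand_dims(fake_images, axis=-1)

# Cast to float32 and scale pixel values into [0, 1].
normalized = with_channel.astype(np.float32) / 255.0

print(with_channel.shape)
print(normalized.dtype, normalized.min(), normalized.max())
```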

tf.keras.utils.to_categorical() converts the integer label data to one-hot encoding.
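
To make the one-hot format concrete, the following NumPy sketch builds the same kind of matrix by hand. It is an illustration of the encoding, not the library's implementation, and the three labels are made-up values.

```python
import numpy as np

# Hypothetical integer class labels for three samples.
labels = np.array([3, 0, 9])
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype=np.float32)[labels]

print(one_hot.shape)
print(one_hot[0])
# argmax along the last axis recovers the original integer labels.
print(np.argmax(one_hot, axis=-1))
```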

In [22]:
# create Dense object
def dense(units, weight_init):
    return tf.keras.layers.Dense(
        units=units, activation='relu', use_bias=True, kernel_initializer=weight_init
    )

Define a helper function that makes it easy to create tf.keras.layers.Dense() objects. ReLU is set as the activation function.
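
For intuition, a dense layer with ReLU computes relu(x @ kernel + bias). The sketch below does this in plain NumPy. `dense_relu_forward` is a hypothetical helper and the weights are hand-picked toy values, not anything taken from the model.

```python
import numpy as np

def dense_relu_forward(x, kernel, bias):
    """Hypothetical helper: affine transform followed by ReLU."""
    return np.maximum(x @ kernel + bias, 0.0)

x = np.array([[1.0, -2.0]])              # one sample with 2 features
kernel = np.array([[1.0, -1.0, 0.5],
                   [0.5,  1.0, 1.0]])    # 2 inputs -> 3 units
bias = np.array([1.0, 0.0, 2.0])

# Pre-activation is [1.0, -3.0, 0.5]; ReLU clips the negative entry to 0.
out = dense_relu_forward(x, kernel, bias)
print(out)
```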

In [23]:
# create sequential NN model
def create_model(label_dim):
    # sequential model with Flatten
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten())
    # 2 hidden layers
    weight_init = tf.keras.initializers.glorot_uniform()    # Xavier initializer
    for i in range(2):
        model.add(dense(256, weight_init))
        model.add(tf.keras.layers.Dropout(rate=0.5))    # dropout
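    # output layer: no activation, because loss_func expects raw logits (from_logits=True)
    model.add(tf.keras.layers.Dense(units=label_dim, use_bias=True, kernel_initializer=weight_init))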

    return model

Define a function that builds the Sequential model.

The Xavier initializer (tf.keras.initializers.glorot_uniform()) is used as the weight initializer.
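
Glorot (Xavier) uniform initialization draws weights from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). The NumPy sketch below illustrates this rule for the first hidden layer, assuming 784 flattened input pixels and 256 units. `glorot_uniform_sample` is a hypothetical helper, not the Keras implementation.

```python
import numpy as np

def glorot_uniform_sample(fan_in, fan_out, rng):
    """Hypothetical helper: sample a (fan_in, fan_out) kernel from U(-limit, limit)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    weights = rng.uniform(-limit, limit, size=(fan_in, fan_out))
    return weights, limit

rng = np.random.default_rng(0)
# 28 * 28 = 784 inputs after Flatten, 256 units in the hidden layer.
weights, limit = glorot_uniform_sample(784, 256, rng)

print(weights.shape, limit)
```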

A tf.keras.layers.Dropout() object is added after each hidden layer to apply dropout with rate=0.5.
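
The following is a minimal NumPy sketch of inverted dropout, the scheme Keras uses: during training a fraction `rate` of the units is zeroed and the survivors are scaled by 1 / (1 - rate), while at inference the input passes through unchanged. `dropout_forward` is a hypothetical helper written for illustration.

```python
import numpy as np

def dropout_forward(x, rate, training, rng):
    """Hypothetical helper: inverted dropout."""
    if not training:
        return x  # inference: identity
    keep_mask = rng.random(x.shape) >= rate   # keep each unit with probability 1 - rate
    return x * keep_mask / (1.0 - rate)       # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
activations = np.ones((1000, 256), dtype=np.float32)

train_out = dropout_forward(activations, 0.5, True, rng)
eval_out = dropout_forward(activations, 0.5, False, rng)

# With rate=0.5, surviving units are doubled, so the mean stays close to 1.
print(np.unique(train_out), train_out.mean())
```

Because the two modes behave differently, the `training` flag passed to the model matters when measuring accuracy.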

In [24]:
# loss function
def loss_func(model, images, labels):
    logits = model(images, training=True)
    loss = tf.reduce_mean(
        tf.keras.losses.categorical_crossentropy(y_true=labels, y_pred=logits, from_logits=True)
    )
    return loss

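Define the function that computes the loss. It uses cross-entropy, with from_logits=True so that the softmax is applied to the model output inside the loss.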
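
As an illustration of what a cross-entropy on logits computes, here is a NumPy sketch: a log-softmax over the logits followed by the negative log-probability of the true class. `softmax_cross_entropy` is a hypothetical helper, not the Keras function itself.

```python
import numpy as np

def softmax_cross_entropy(labels, logits):
    """Hypothetical helper: per-sample cross-entropy from one-hot labels and raw logits."""
    shifted = logits - logits.max(axis=-1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(labels * log_probs).sum(axis=-1)

# All-zero logits give a uniform distribution over the 10 classes.
logits = np.zeros((2, 10))
labels = np.eye(10)[[3, 7]]

loss = softmax_cross_entropy(labels, logits).mean()
print(loss)  # ln(10), about 2.3026
```

A freshly initialized network predicts close to uniformly, which is why the first loss value in the training log is near 2.30.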

In [34]:
# accuracy function
def accuracy_func(model, images, labels):
    logits = model(images, training=False)    # disable dropout when measuring accuracy
    prediction = tf.equal(tf.argmax(logits, -1), tf.argmax(labels, -1))
    accuracy = tf.reduce_mean(tf.cast(prediction, tf.float32))
    return accuracy

# gradient function
def grad_func(model, images, labels):
    with tf.GradientTape() as tape:
        loss = loss_func(model, images, labels)
    return tape.gradient(loss, model.variables)

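Define one function that computes accuracy and another that computes gradients. Accuracy compares the argmax of the logits with the argmax of the one-hot labels, and the gradients of the loss are recorded with tf.GradientTape().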
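
The accuracy computation can be reproduced on a toy example in NumPy. The logits and labels below are made-up values for three classes.

```python
import numpy as np

# Hypothetical logits for 4 samples and 3 classes.
logits = np.array([[2.0, 0.1, -1.0],
                   [0.3, 0.2,  1.5],
                   [0.0, 3.0,  0.5],
                   [1.0, 0.9,  0.8]])
# One-hot labels for the true classes 0, 2, 1, 2.
labels = np.eye(3)[[0, 2, 1, 2]]

# A prediction is correct when the largest logit sits at the true class index.
correct = np.argmax(logits, axis=-1) == np.argmax(labels, axis=-1)
accuracy = correct.astype(np.float32).mean()

print(correct, accuracy)  # three of four predictions match, so accuracy is 0.75
```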

In [26]:
""" Dataset """
train_x, train_y, test_x, test_y = load_mnist()

""" Hyper-parameters """
learning_rate = 0.001
batch_size = 126

training_epochs = 1
training_iterations = len(train_x) // batch_size

label_dim = 10

Load the MNIST dataset and set the model's hyper-parameters.
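
The iteration count follows directly from these values. A small arithmetic sketch, assuming the standard MNIST training split of 60,000 images:

```python
num_train_samples = 60000   # size of the standard MNIST training split
batch_size = 126

# Integer division: only full batches are counted.
iterations_per_epoch = num_train_samples // batch_size
# Samples that do not fit into a full batch.
leftover = num_train_samples - iterations_per_epoch * batch_size

print(iterations_per_epoch, leftover)  # 476 24
```

The value 476 matches the iteration total printed in the training log.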

In [27]:
""" Graph Input using Dataset API """
train_dataset = tf.data.Dataset.from_tensor_slices((train_x, train_y)).\
    shuffle(buffer_size=100000).\
    prefetch(buffer_size=batch_size).\
    batch(batch_size, drop_remainder=True)
test_dataset = tf.data.Dataset.from_tensor_slices((test_x, test_y)).\
    shuffle(buffer_size=100000).\
    prefetch(buffer_size=len(test_x)).\
    batch(batch_size=len(test_x))

batch() is configured so that the training set is loaded and processed batch_size samples at a time.

shuffle() randomizes the order of the dataset, and prefetch() keeps part of the data ready for the next step while training is in progress.

The test set is configured so that it is loaded and used all at once, as a single batch.
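
The shuffle-then-batch behaviour can be imitated in a few lines of NumPy. This is only a sketch: `shuffled_batches` is a hypothetical helper, it assumes the shuffle buffer is at least as large as the dataset (so the shuffle is a full permutation), and it ignores prefetching entirely.

```python
import numpy as np

def shuffled_batches(features, labels, batch_size, rng, drop_remainder=True):
    """Hypothetical helper: yield aligned (features, labels) batches in random order."""
    order = rng.permutation(len(features))
    # Optionally cut off the trailing samples that do not fill a whole batch.
    end = len(order) - (len(order) % batch_size if drop_remainder else 0)
    for start in range(0, end, batch_size):
        idx = order[start:start + batch_size]
        yield features[idx], labels[idx]

rng = np.random.default_rng(0)
features = np.arange(10).reshape(10, 1)   # toy data: feature i belongs to label i
labels = np.arange(10)

batches = list(shuffled_batches(features, labels, batch_size=4, rng=rng))
for batch_x, batch_y in batches:
    print(batch_x[:, 0], batch_y)
```

With 10 samples and a batch size of 4, two full batches are produced and the last 2 samples are dropped, which mirrors drop_remainder=True.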

In [35]:
""" Model """
network = create_model(label_dim)

""" Train """
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

for epoch in range(training_epochs):
    for idx, (train_input, train_label) in enumerate(train_dataset):
        grads = grad_func(network, train_input, train_label)
        optimizer.apply_gradients(zip(grads, network.variables))
        # train loss and accuracy
        train_loss = loss_func(network, train_input, train_label)
        train_accuracy = accuracy_func(network, train_input, train_label)
        # test accuracy
        for test_input, test_label in test_dataset:
            test_accuracy = accuracy_func(network, test_input, test_label)
        # print result
        print("epoch {:2}({:5}/{:5}) | loss {:.8f} | train accuracy {:.4f} | test accuracy {:.4f}".\
              format(epoch, idx, training_iterations, train_loss, train_accuracy, test_accuracy))

epoch  0(    0/  476) | loss 2.30697656 | train accuracy 0.1190 | test accuracy 0.1245
epoch  0(    1/  476) | loss 2.27788711 | train accuracy 0.1032 | test accuracy 0.1520
epoch  0(    2/  476) | loss 2.21571159 | train accuracy 0.1111 | test accuracy 0.1770
epoch  0(    3/  476) | loss 2.16436219 | train accuracy 0.2381 | test accuracy 0.2108
epoch  0(    4/  476) | loss 2.16342974 | train accuracy 0.1825 | test accuracy 0.2360
epoch  0(    5/  476) | loss 2.11502790 | train accuracy 0.2222 | test accuracy 0.2736
epoch  0(    6/  476) | loss 2.09807229 | train accuracy 0.2937 | test accuracy 0.3073
epoch  0(    7/  476) | loss 2.04900336 | train accuracy 0.3571 | test accuracy 0.3291
epoch  0(    8/  476) | loss 2.03467417 | train accuracy 0.4365 | test accuracy 0.3642
epoch  0(    9/  476) | loss 1.92839897 | train accuracy 0.4921 | test accuracy 0.3968
epoch  0(   10/  476) | loss 1.94261265 | train accuracy 0.3889 | test accuracy 0.4232
epoch  0(   11/  476) | loss 1.81530690 | t

The model is trained with the Adam optimizer.
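
For reference, a single Adam update in its textbook form can be sketched in NumPy as follows. `adam_step` is a hypothetical helper, the default values for beta1, beta2, and eps are assumptions based on commonly cited defaults, and the Keras implementation may differ in details such as where epsilon is applied.

```python
import numpy as np

def adam_step(param, grad, m, v, step, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """Hypothetical helper: one textbook Adam update (step counts from 1)."""
    m = beta1 * m + (1.0 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad ** 2    # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** step)            # bias correction for the zero start
    v_hat = v / (1.0 - beta2 ** step)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

param = np.array([1.0, -2.0])
grad = np.array([0.5, -4.0])   # made-up gradient values
m = np.zeros_like(param)
v = np.zeros_like(param)

new_param, m, v = adam_step(param, grad, m, v, step=1)
print(new_param)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly the learning rate regardless of the gradient's magnitude.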