In [2]:
# https://github.com/dragen1860/TensorFlow-2.x-Tutorials/tree/master/03-Play-with-MNIST

## Play with MNIST
A detailed MNIST walk-through!

Let's start by loading MNIST from keras.datasets and scaling the pixel values to the range [0, 1]. The 28x28 images are flattened into 784-dimensional vectors later, inside the training loop.

In [3]:
import tensorflow as tf
from tensorflow.keras import datasets, layers, optimizers, Sequential, metrics

(xs, ys),_ = datasets.mnist.load_data()
print('datasets:', xs.shape, ys.shape, xs.min(), xs.max(), type(xs), type(ys))

xs = tf.convert_to_tensor(xs, dtype=tf.float32) / 255.
db = tf.data.Dataset.from_tensor_slices((xs,ys))
db = db.batch(32).repeat(10)



Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
datasets: (60000, 28, 28) (60000,) 0 255 <class 'numpy.ndarray'> <class 'numpy.ndarray'>
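
The `batch(32).repeat(10)` call determines how many steps the training loop below will run. As a rough check, the arithmetic can be sketched in plain Python, assuming the 60000 training images reported above (the variable names here are illustrative and not part of the notebook):

```python
import math

num_examples = 60000   # training images, from the printed shape (60000, 28, 28)
batch_size = 32        # db.batch(32)
num_repeats = 10       # db.repeat(10)

# batch() runs before repeat(), so every pass over the data yields
# ceil(num_examples / batch_size) batches.
steps_per_pass = math.ceil(num_examples / batch_size)
total_steps = steps_per_pass * num_repeats
print(steps_per_pass, total_steps)
```

This prints `1875 18750`: since 60000 is divisible by 32, every batch is full, and the loop runs for 18750 steps in total.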



Now let's build our network as a **keras.Sequential** model and instantiate a stochastic gradient descent optimizer from **keras.optimizers**.

In [4]:
network = Sequential([layers.Dense(256, activation='relu'),
                     layers.Dense(256, activation='relu'),
                     layers.Dense(256, activation='relu'),
                     layers.Dense(10)])
network.build(input_shape=(None, 28*28))
# https://www.tensorflow.org/guide/keras/custom_layers_and_models
network.summary()

optimizer = optimizers.SGD(learning_rate=0.01)
acc_meter = metrics.Accuracy()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 256)               200960    
_________________________________________________________________
dense_1 (Dense)              (None, 256)               65792     
_________________________________________________________________
dense_2 (Dense)              (None, 256)               65792     
_________________________________________________________________
dense_3 (Dense)              (None, 10)                2570      
Total params: 335,114
Trainable params: 335,114
Non-trainable params: 0
_________________________________________________________________
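
The parameter counts in the summary can be checked by hand: a Dense layer holds one weight per input-output pair plus one bias per output unit. A small sketch of that arithmetic, where `dense_params` is a hypothetical helper and not a Keras function:

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# Input size followed by the width of each Dense layer in the network.
layer_sizes = [28 * 28, 256, 256, 256, 10]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer, sum(per_layer))
```

The per-layer values come out to 200960, 65792, 65792, and 2570, which add up to the 335,114 total reported by `network.summary()`.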


Finally, we can iterate through our dataset and train our model. In this example, we use tf.GradientTape to manually compute the gradients of the loss with respect to our network's trainable variables. GradientTape is just one of many ways to perform gradient steps in TensorFlow 2.0:

- **tf.GradientTape:** Records the operations executed inside its context manager and computes the gradients of the loss with respect to the given variables. This is the most flexible way to perform optimizer steps, as we can work directly with gradients and don't need a predefined Keras model or loss function.
- **Model.fit():** Keras's built-in method for iterating through a dataset and fitting a keras.Model on it. This is often the best choice for training a Keras model and comes with options for progress bar displays, validation splits, multiprocessing, and generator support.
- **Optimizer.minimize():** Computes the gradients of a given loss and applies one update step to reduce it. This method is easy to use, and can be conveniently slapped onto an existing computation to make a working optimization step.

In [5]:
for step, (x,y) in enumerate(db):

    with tf.GradientTape() as tape:
        # [b, 28, 28] => [b, 784]
        x = tf.reshape(x, (-1, 28*28))
        # [b, 784] => [b, 10]
        out = network(x)
        # [b] => [b, 10]
        y_onehot = tf.one_hot(y, depth=10)
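        # [b, 10]: element-wise squared error
        loss = tf.square(out-y_onehot)
        # [b, 10] => scalar: summed squared error divided by the batch size (32)
        loss = tf.reduce_sum(loss) / 32

    # Accuracy.update_state expects (y_true, y_pred)
    acc_meter.update_state(y, tf.argmax(out, axis=1))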

    grads = tape.gradient(loss, network.trainable_variables)
    optimizer.apply_gradients(zip(grads, network.trainable_variables))


    if step % 200 == 0:
        print(step, 'loss:', float(loss), 'acc:', acc_meter.result().numpy())
        acc_meter.reset_states()



0 loss: 1.4643440246582031 acc: 0.0625
200 loss: 0.4284134805202484 acc: 0.6846875
400 loss: 0.3961877226829529 acc: 0.841875
600 loss: 0.3652478754520416 acc: 0.8653125
800 loss: 0.2641378343105316 acc: 0.8946875
1000 loss: 0.3198693096637726 acc: 0.89703125
1200 loss: 0.3060196340084076 acc: 0.9040625
1400 loss: 0.22825348377227783 acc: 0.91734374
1600 loss: 0.20560911297798157 acc: 0.913125
1800 loss: 0.2043185979127884 acc: 0.92859375
2000 loss: 0.20112140476703644 acc: 0.9440625
2200 loss: 0.1445087492465973 acc: 0.9290625
2400 loss: 0.20331504940986633 acc: 0.92859375
2600 loss: 0.2115728110074997 acc: 0.9375
2800 loss: 0.12368302047252655 acc: 0.9375
3000 loss: 0.223783478140831 acc: 0.9323437
3200 loss: 0.15414217114448547 acc: 0.9359375
3400 loss: 0.1362285017967224 acc: 0.93609375
3600 loss: 0.11636339873075485 acc: 0.93890625
3800 loss: 0.16760046780109406 acc: 0.9571875
4000 loss: 0.1829649806022644 acc: 0.95203125
4200 loss: 0.14509813487529755 acc: 0.93953127
4400 loss: 0