## Reproducible results with TensorFlow?




This notebook was tested with Python 3.6 and with TensorFlow 2.0 and 2.1. It demonstrates an example of non-deterministic behaviour in training. The example was tested with these containers:  
nvcr.io/nvidia/tensorflow:19.12-tf2-py3     
nvcr.io/nvidia/tensorflow:20.01-tf2-py3     
nvcr.io/nvidia/tensorflow:20.02-tf2-py3    


Start like this (using your current directory):     
docker run -it --gpus all -p 8888:8888  --rm -v \`pwd\`:\`pwd\` -w \`pwd\`  nvcr.io/nvidia/tensorflow:19.12-tf2-py3      
or     
docker run -it --gpus all -p 8888:8888  --rm -v \`pwd\`:\`pwd\` -w \`pwd\`  nvcr.io/nvidia/tensorflow:20.02-tf2-py3     

To start a Jupyter notebook server inside the container, use

jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root     

and open the notebook in a browser on your local machine at

localhost:8888 



In [5]:
import json
import pprint
import tensorflow as tf
import numpy as np
print(tf.version.VERSION)
print(tf.keras.__version__)


2.1.0
2.2.4-tf


In [6]:
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0


A typical source of non-determinism is the random initialization of model weights. It is therefore good practice to define a single variable that holds a static random seed and to use it across your pipeline.
The example below shows the effect of not using a static seed. It creates random numbers in two functions, f and g, and prints them. As expected, the values returned by f differ from those returned by g.


In [2]:
@tf.function
def f():
  a = tf.random.uniform([1])
  b = tf.random.uniform([1])
  return a, b

@tf.function
def g():
  a = tf.random.uniform([1])
  b = tf.random.uniform([1])
  return a, b

print(f())  # prints '(A1, A2)'
print(g())  # prints '(A3, A4)'

(<tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.7615539], dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.7131593], dtype=float32)>)
(<tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.87904346], dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.7187301], dtype=float32)>)


The global random seed can also be set to a fixed number, such as 123, so that the same sequence of random numbers is generated each time the code is run.

Reusing this number gives you the same random numbers in your model again. With the global seed set, the two functions e and h are expected to return exactly the same values for A1 and A2.

In [3]:
tf.random.set_seed(123)
# TensorFlow 1.x equivalent: tf.set_random_seed(123)
@tf.function
def e():
  a = tf.random.uniform([1])
  b = tf.random.uniform([1])
  return a, b

@tf.function
def h():
  a = tf.random.uniform([1])
  b = tf.random.uniform([1])
  return a, b

print(e())  # prints '(A1, A2)'
print(h())  # prints '(A1, A2)'

(<tf.Tensor: id=62, shape=(1,), dtype=float32, numpy=array([0.31000066], dtype=float32)>, <tf.Tensor: id=63, shape=(1,), dtype=float32, numpy=array([0.96672165], dtype=float32)>)
(<tf.Tensor: id=83, shape=(1,), dtype=float32, numpy=array([0.31000066], dtype=float32)>, <tf.Tensor: id=84, shape=(1,), dtype=float32, numpy=array([0.96672165], dtype=float32)>)
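
The advice to keep one static seed and reuse it across the pipeline could be sketched as follows. The names `SEED` and `seed_everything` are hypothetical and not part of this notebook. NumPy and TensorFlow are seeded only if they can be imported, so the helper also runs in plain-Python preprocessing steps.

```python
import random

SEED = 123  # hypothetical name: the single static seed for the whole pipeline


def seed_everything(seed=SEED):
    """Seed every available random number generator with the same value.

    Returns the names of the libraries that were seeded.
    """
    random.seed(seed)  # Python's built-in generator
    seeded = ["random"]
    try:
        import numpy as np
        np.random.seed(seed)  # NumPy's legacy global generator
        seeded.append("numpy")
    except ImportError:
        pass
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)  # TensorFlow 2.x global seed
        seeded.append("tensorflow")
    except ImportError:
        pass
    return seeded


print(seed_everything())
```

Calling the helper once at the top of the notebook, before any model is built, keeps all generators tied to the same number.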



Taking these tactical measures will get you part of the way to reproducibility, but to have full visibility into your experiments you’ll need to keep a much more detailed log of them.


### Determinism for a convolutional network using cuDNN

Similarly, some TensorFlow operations can behave non-deterministically. One example is the backward pass of broadcasting on a GPU. The reduction ops on GPU use asynchronous atomic adds, so the order of the additions can change from run to run. Because floating-point addition is not associative, these ops are fundamentally non-deterministic for floating-point values.

Further reading:
https://people.inf.ethz.ch/omutlu/pub/automatic-fast-portable-parallel-reduction-on-GPUs_cgo19.pdf
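
Why the order of atomic adds matters can be seen without a GPU: floating-point addition is not associative, so summing the same numbers in a different order can change the last bits of the result. The following is a CPU-only sketch in plain Python that imitates a varying accumulation order by shuffling. It is not the GPU code path itself, and the helper name `sum_in_random_order` is made up for illustration.

```python
import random

# The same three numbers, grouped differently.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False


def sum_in_random_order(values, rng):
    """Add up the values in a shuffled order, like threads finishing in arbitrary order."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    total = 0.0
    for v in shuffled:
        total += v
    return total


rng = random.Random(0)
values = [rng.uniform(-1.0, 1.0) for _ in range(100000)]
totals = {sum_in_random_order(values, rng) for _ in range(10)}
print(len(totals), "distinct sums from 10 orderings")
```

The differences are tiny for a single sum, but during training they feed back into the weights at every step and can grow into visibly different accuracies.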



In [10]:
import tensorflow as tf
import datetime


In [11]:
# Trains a simple convnet on the MNIST dataset.
#
# Gets to 99.25% test accuracy after 12 epochs
# 2 seconds per epoch on a V100 GPU.
# Note: Keras is integrated into TensorFlow 2.x (tf.keras).

from __future__ import print_function
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
import tensorflow.keras.backend as K
import time

batch_size = 128
num_classes = 10
epochs = 12

# input image dimensions
img_rows, img_cols = 28, 28

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

start_time=time.time()


model.compile(loss=tf.keras.losses.categorical_crossentropy,
              optimizer=tf.keras.optimizers.Adadelta(),
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
elapsedtime = time.time() - start_time
print('Test loss:', score[0])
print('Test accuracy:', score[1])
print('Elapsed Time:', elapsedtime)

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
 7808/60000 [==>...........................] - ETA: 1s - loss: 1.4004 - accuracy: 0.6445

KeyboardInterrupt: 

Typically, the accuracy differs slightly from run to run, as in these three tests:


Test 1
accuracy: 0.8365
Test loss: 0.7739880109786987
Test accuracy: 0.8365
Elapsed Time: 29.10216975212097

Test 2
accuracy: 0.8492
Test loss: 0.701254560661316
Test accuracy: 0.8492
Elapsed Time: 28.91641116142273

Test 3
accuracy: 0.8356
Test loss: 0.7402863544464111
Test accuracy: 0.8356
Elapsed Time: 28.80294704437256
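
The run-to-run variation in the three tests above can be quantified with a few lines of plain Python. The accuracies are copied from the results above; the variable names are only illustrative.

```python
# Test accuracies of the three runs listed above.
accuracies = [0.8365, 0.8492, 0.8356]

spread = max(accuracies) - min(accuracies)
mean = sum(accuracies) / len(accuracies)
reproducible = len(set(accuracies)) == 1  # True only if all runs match exactly

print("mean accuracy: {:.4f}".format(mean))
print("spread (max - min): {:.4f}".format(spread))
print("identical runs:", reproducible)
```

For a fully deterministic setup the spread would be exactly zero, which makes this a simple check to repeat after changing seeds or determinism settings.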


## General remarks

NGC TensorFlow containers, starting with version 19.06, implement GPU-deterministic TensorFlow functionality. In Python code running inside the container, this can be enabled as follows:
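
A minimal sketch, assuming the `TF_DETERMINISTIC_OPS` environment variable that NVIDIA documents for these containers (stock TensorFlow 2.1 is understood to honor it as well). It should be set before TensorFlow executes any GPU op, so the top of the notebook is the safest place.

```python
import os

# Assumption: TF_DETERMINISTIC_OPS is the switch documented by NVIDIA for the
# NGC TensorFlow containers. Set it before TensorFlow runs any GPU op.
os.environ['TF_DETERMINISTIC_OPS'] = '1'

# import tensorflow as tf  # afterwards, build and train the model as usual
print(os.environ['TF_DETERMINISTIC_OPS'])
```

This switch only concerns the determinism of GPU ops; the random seeds still have to be fixed separately with `tf.random.set_seed`.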

In the past, tf.math.reduce_sum and tf.math.reduce_mean operated non-deterministically when running on a GPU. This was resolved before TensorFlow version 1.12. These ops now function deterministically by default when running on a GPU.    

Beyond atomicAdd, there are ten other CUDA atomic functions whose use could lead to the injection of non-determinism, such as atomicCAS (the most generic, atomic compare and swap). Note also that the word 'atomic' was present in 167 files in the TensorFlow repo and some of these may be related to the use of CUDA atomic operations. It's important to remember that it's possible to use CUDA atomic operations without injecting non-determinism, and that, therefore, when CUDA atomic operations are present in op code, it doesn't guarantee that the op injects non-determinism into the computation.

Recommended blog posts:      

https://determined.ai/blog/reproducibility-in-ml/      
https://suneeta-mall.github.io/2019/12/22/Reproducible-ml-tensorflow.html      
https://medium.com/@ODSC/properly-setting-the-random-seed-in-ml-experiments-not-as-simple-as-you-might-imagine-219969c84752     

NVIDIA GTC 2019 talk by Duncan Riach: https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9911-determinism-in-deep-learning.pdf


