# Reproducibility in Deep Learning
> Want to test your ideas in deep learning? You need reproducibility (PyTorch, TF 2.x)
- toc: true
- branch: master
- badges: true
- comments: true
- image: images/reproducibility.jpg
- author: Sajjad Ayoubi
- categories: [tips]

# Reproducibility ?!
- Deep learning training is stochastic by nature. During development it is often useful to get the same results from run to run, so that you can tell whether a change in performance comes from an actual model or data modification rather than from chance. The same holds when comparing alternatives or evaluating new tricks and ideas: for that, we need to train our neural nets in a deterministic way.
- Training a neural network uses randomness at several stages, for example:

  - random initialization of the network weights before training starts.
  - regularization such as dropout, which randomly drops nodes in the network while training.
  - optimization: SGD and Adam are usually fed mini-batches drawn in a random (shuffled) order.

- We will see how to use the frameworks in a deterministic way.
- Note: deterministic training is usually somewhat slower than nondeterministic training.
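
As a rough illustration of the sources of randomness listed above, here is a minimal sketch in plain Python with no framework involved. The helper names `init_weights`, `dropout_mask` and `batch_order` are made up for this example and only imitate what a framework does internally; the point is that a fixed seed makes all three steps repeat exactly.

```python
import random

def init_weights(n_in, n_out, rng):
    """Draw a small n_out x n_in weight matrix from a uniform distribution."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def dropout_mask(n, p, rng):
    """Return a 0/1 mask that drops each unit with probability p."""
    return [0 if rng.random() < p else 1 for _ in range(n)]

def batch_order(n_samples, rng):
    """Return shuffled sample indices, similar to what a shuffling data loader does."""
    order = list(range(n_samples))
    rng.shuffle(order)
    return order

def one_run(seed=None):
    # seed=None lets Python choose a fresh seed, so every call differs
    rng = random.Random(seed)
    return init_weights(3, 2, rng), dropout_mask(8, 0.5, rng), batch_order(10, rng)

seeded_a = one_run(seed=0)
seeded_b = one_run(seed=0)
unseeded_a = one_run()
unseeded_b = one_run()

print("seeded runs identical:  ", seeded_a == seeded_b)
print("unseeded runs identical:", unseeded_a == unseeded_b)
```

The two seeded runs match element for element, while the unseeded ones should differ in practice.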

# PyTorch
- MNIST classification with reproducibility
> from PyTorch Team: Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seeds. Also, deterministic operations are often slower than nondeterministic operations.



- The following should work with all models (maybe not LSTMs; I didn't check that)

In [1]:
import numpy as np
import random, os

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

- Create the dataloader

In [None]:
# note: ToTensor already scales pixels to [0, 1]; Normalize then divides by 255 once more
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0,), (255,))])
train_ds = datasets.MNIST(root='./data', train=True, download=True, transform=transform)

# if you use random augmentations, pass a callable as worker_init_fn that seeds each worker
# (for example a function that calls random.seed(0)), or set num_workers=0 in the dataloader
train_dl = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True, num_workers=4)
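
For loaders with worker processes, the PyTorch reproducibility notes suggest a `worker_init_fn` that reseeds NumPy and `random` in each worker, plus a seeded `torch.Generator` for the shuffle order. Below is a minimal sketch of that pattern, assuming a toy `TensorDataset` in place of MNIST; `make_loader` and `epoch_order` are hypothetical helper names.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id):
    # only called when num_workers > 0: each worker derives its seed
    # from the base seed PyTorch assigned to it
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

def make_loader(seed=0, num_workers=0):
    # toy dataset standing in for MNIST: 16 samples whose value is their index
    ds = TensorDataset(torch.arange(16))
    g = torch.Generator()
    g.manual_seed(seed)  # controls the shuffle order
    return DataLoader(ds, batch_size=4, shuffle=True, num_workers=num_workers,
                      worker_init_fn=seed_worker, generator=g)

def epoch_order(loader):
    # collect the sample indices of every batch in one epoch
    return [batch[0].tolist() for batch in loader]

order_a = epoch_order(make_loader(seed=0))
order_b = epoch_order(make_loader(seed=0))
print("same batch order:", order_a == order_b)
```

Passing an explicit generator keeps the shuffle order independent of whatever else consumes the global PyTorch random stream.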

- The seeding function below works with all models

In [4]:
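def torch_seed(seed=0):
    # seed Python, NumPy and PyTorch (CPU) random number generators
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # ask cuDNN for deterministic kernels and disable its auto-tuner
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # seed the generators of all GPUs
    torch.cuda.manual_seed_all(seed)
    # only affects Python processes started after this point
    os.environ['PYTHONHASHSEED'] = str(seed)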
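
Why the seed has to be set before the model is created can be checked on CPU in a few lines. This is only a sketch: `build_model` and `same_params` are hypothetical helpers, and only `torch.manual_seed` is used here rather than the full seeding function.

```python
import torch
import torch.nn as nn

def build_model():
    # same architecture as the MNIST classifier used in this post
    return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.GELU(), nn.Linear(128, 10))

def same_params(m1, m2):
    # True only if every parameter tensor matches exactly
    return all(torch.equal(p1, p2) for p1, p2 in zip(m1.parameters(), m2.parameters()))

torch.manual_seed(0)
model_a = build_model()

torch.manual_seed(0)      # reseed: initialization starts from the same random stream
model_b = build_model()

model_c = build_model()   # no reseed: continues the stream, so weights differ

print("a vs b (reseeded):    ", same_params(model_a, model_b))
print("a vs c (not reseeded):", same_params(model_a, model_c))
```

Two models only start from identical weights when the generator is reset right before each one is built.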




In [8]:
def train(reproducibility=True, n_run=2, device='cuda'):

    for n in range(n_run):
        print('run number: ', n+1)

        # set the seed before creating the model
        if reproducibility:
            torch_seed(seed=0)
        # build model, loss and optimizer
        model = nn.Sequential(nn.Flatten(), nn.Linear(28*28, 128), nn.GELU(), nn.Linear(128, 10)).to(device)
        loss_fn = nn.CrossEntropyLoss().to(device)
        optimizer = optim.AdamW(model.parameters(), lr=0.005, weight_decay=0.0)
        # training loop
        loss_avg = 0.0
        for i, data in enumerate(train_dl):
            inputs, labels = data
            optimizer.zero_grad()
            outputs = model(inputs.to(device))
            loss = loss_fn(outputs, labels.to(device))
            # running mean of the loss; .item() stores a plain float instead of a graph-attached tensor
            loss_avg = (loss_avg * i + loss.item()) / (i+1)
            loss.backward()
            optimizer.step()
            if i%850==0:
                print('[%d, %4d] loss: %.4f' %(i+1, len(train_dl), loss_avg))

In [9]:
train(reproducibility=False)

run number:  1
[1, 1875] loss: 2.2943
[851, 1875] loss: 0.8099
[1701, 1875] loss: 0.5946
run number:  2
[1, 1875] loss: 2.2945
[851, 1875] loss: 0.8078
[1701, 1875] loss: 0.5921


In [10]:
train(reproducibility=True)

run number:  1
[1, 1875] loss: 2.2983
[851, 1875] loss: 0.8051
[1701, 1875] loss: 0.5927
run number:  2
[1, 1875] loss: 2.2983
[851, 1875] loss: 0.8051
[1701, 1875] loss: 0.5927


- If you test new ideas like I do,
- you should always measure the overhead of your implementation.
- To get the actual time in PyTorch we use `synchronize`, because CUDA kernels run asynchronously.

In [None]:
%%timeit
# wait until all work queued on the GPU is done
torch.cuda.synchronize()
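
Outside a notebook, the same idea can be wrapped in a small timing helper. This is a sketch: the name `timed` and the matmul workload are assumptions for illustration, and on a machine without CUDA it falls back to CPU and skips the synchronization.

```python
import time

import torch

def timed(fn, *args, n_iter=10):
    """Average wall-clock seconds per call of fn(*args), including queued GPU work."""
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.synchronize()  # finish earlier kernels before starting the clock
    start = time.perf_counter()
    for _ in range(n_iter):
        fn(*args)
    if use_cuda:
        torch.cuda.synchronize()  # wait for the kernels launched inside the loop
    return (time.perf_counter() - start) / n_iter

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(256, 256, device=device)
seconds = timed(torch.matmul, x, x)
print(f"{seconds * 1e3:.3f} ms per matmul on {device}")
```

Without the second `synchronize`, the clock could stop while kernels are still running, and the measured time would be too short.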

# Keras & TF 2.X
- MNIST classification with reproducibility
> from Keras Team: when running on a GPU, some operations have non-deterministic outputs, in particular tf.reduce_sum(). This is due to the fact that GPUs run many operations in parallel, so the order of execution is not always guaranteed. Due to the limited precision of floats, even adding several numbers together may give slightly different results depending on the order in which you add them. You can try to avoid the non-deterministic operations, but some may be created automatically by TensorFlow to compute the gradients, so it is much simpler to just run the code on the CPU. For this, you can set the CUDA_VISIBLE_DEVICES environment variable to an empty string
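
The floating-point effect mentioned in the quote can be seen without any GPU. The sketch below uses plain Python floats; `naive_sum` is a hypothetical helper that adds numbers strictly left to right, standing in for one particular execution order.

```python
import random

# the same three numbers, grouped differently
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False
print(a, b)    # 0.6000000000000001 0.6

def naive_sum(values):
    """Add the values one by one, in the order given."""
    total = 0.0
    for v in values:
        total += v
    return total

rng = random.Random(0)
values = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]

forward = naive_sum(values)
backward = naive_sum(reversed(values))
# the two orders usually disagree in the last few digits
print("difference between orders:", forward - backward)
```

On a GPU the order of a parallel reduction is not fixed, so tiny differences like these can appear from run to run and then grow over many training steps.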

- They say Keras reproducibility works only on CPUs,
- but we need GPUs.
- After a week of searching I found a possible way on GPUs:
  - it is based on the [TensorFlow Determinism](https://github.com/NVIDIA/framework-determinism) work from `NVIDIA`
  - now we can run Keras reproducibly on GPUs :)

- Note: it works only for `TF >= 2.3`
  - it also works fine with `tf.data`
  - but you have to watch out (especially with prefetch)

- Let's check this out

In [13]:
import random, os
import numpy as np

import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten

In [14]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [16]:
def tf_seed(seed=0):
    os.environ['PYTHONHASHSEED'] = str(seed)
    # For working on GPUs, from "TensorFlow Determinism";
    # this is an on/off flag, so it must be '1' and not the seed value
    os.environ["TF_DETERMINISTIC_OPS"] = '1'
    np.random.seed(seed)
    random.seed(seed)
    tf.random.set_seed(seed)

In [17]:
def train(reproducibility=True, n_run=2):

    for n in range(n_run):
        print('run number: ', n+1)

        # set the seed before creating the model
        if reproducibility:
            tf_seed(seed=0)

        # compile model
        model = tf.keras.models.Sequential([Flatten(input_shape=(28, 28)), Dense(128, activation='gelu'), Dense(10)])
        loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
        model.compile(optimizer='adam', loss=loss_fn)
        # training
        model.fit(x_train, y_train, epochs=1)

In [18]:
train(reproducibility=False)

run number:  1
run number:  2


In [19]:
train(reproducibility=True)

run number:  1
run number:  2


- If you want to run it on CPUs, use this:

In [None]:
def tf_seed(seed=0):
    os.environ['PYTHONHASHSEED'] = str(seed)
    # if your machine has GPUs, hide them with the following line
    # (set it before TensorFlow initializes the GPU)
    os.environ['CUDA_VISIBLE_DEVICES'] = ''
    np.random.seed(seed)
    random.seed(seed)
    tf.random.set_seed(seed)
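
Hiding the GPUs is most reliable when the variable is set before TensorFlow is imported. Here is a small sketch of that ordering; `hide_gpus` is a hypothetical helper, and the TensorFlow check is skipped when the package is not installed.

```python
import os

def hide_gpus():
    """Hide all CUDA devices from frameworks that are initialized after this call."""
    os.environ["CUDA_VISIBLE_DEVICES"] = ""

hide_gpus()
print("CUDA_VISIBLE_DEVICES =", repr(os.environ["CUDA_VISIBLE_DEVICES"]))

# import TensorFlow only after the variable is set
try:
    import tensorflow as tf
    # with the GPUs hidden, this list is expected to be empty
    print("visible GPUs:", tf.config.list_physical_devices("GPU"))
except ImportError:
    print("TensorFlow is not installed; only the environment variable was set")
```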