# Training a Hierarchical DivNoising network for intrinsically noisy Convallaria data
This notebook shows how to train a Hierarchical DivNoising Ladder VAE on intrinsically noisy data containing both pixel noise and structured noise. Training requires a noise model (a model of the imaging noise) for the pixel noise, which can either be measured from calibration data or bootstrapped from the raw noisy images themselves. If you haven't done so, please first run '1-CreateNoiseModel.ipynb', which will download the data and create a noise model.

In [None]:
import warnings
warnings.filterwarnings('ignore')
# We import all our dependencies.
import numpy as np
import torch
import sys
sys.path.append('../../../')
from models.lvae import LadderVAE
from lib.gaussianMixtureNoiseModel import GaussianMixtureNoiseModel
from boilerplate import boilerplate
import lib.utils as utils
import training
from tifffile import imread
from matplotlib import pyplot as plt
from tqdm import tqdm

In [None]:
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

### Specify ```path``` to load training data
Your data should be stored in the directory indicated by ```path```.

In [None]:
path="./data/Struct_Convallaria/"
observation= imread(path+"flower.tif")

# Training Data Preparation

Before training, the data must go through a few preprocessing steps.

We first split the data into training and validation sets, with 85% of the images allocated to the training set and the rest to the validation set. We then augment the training data 8-fold using 90-degree rotations and flips.

In [None]:
train_data = observation[:int(0.85*observation.shape[0])]
val_data= observation[int(0.85*observation.shape[0]):]
print("Shape of training images:", train_data.shape, "Shape of validation images:", val_data.shape)
train_data = utils.augment_data(train_data) 
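
The 8-fold augmentation described above can be illustrated with plain NumPy. This is only a minimal sketch of the idea (four 90-degree rotations, each with and without a mirror flip); the function name `augment_8fold` is hypothetical and the actual `utils.augment_data` may order or implement the variants differently. It assumes square images, since rotating a non-square image changes its shape.

```python
import numpy as np

def augment_8fold(images):
    """Return the 4 rotations of every image plus the mirrored version of each rotation.

    images: array of shape (N, H, W) with H == W.
    """
    # Rotate the whole stack by 0, 90, 180 and 270 degrees in the image plane.
    rotations = [np.rot90(images, k, axes=(1, 2)) for k in range(4)]
    rotated = np.concatenate(rotations, axis=0)
    # Mirror every rotated image left-right to obtain the remaining 4 variants.
    flipped = np.flip(rotated, axis=2)
    return np.concatenate([rotated, flipped], axis=0)

demo = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
augmented = augment_8fold(demo)
print(augmented.shape)  # (16, 4, 4)
```

Only the training split is augmented in this notebook; the validation images are left unchanged.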

In [None]:
### We extract overlapping patches of size ```patch_size x patch_size``` from training and validation images.
### Usually 64x64 patches work well for most microscopy datasets
patch_size = 64

In [None]:
img_width = observation.shape[2]
img_height = observation.shape[1]
num_patches = int(float(img_width*img_height)/float(patch_size**2)*1)
train_images = utils.extract_patches(train_data, patch_size, num_patches)
val_images = utils.extract_patches(val_data, patch_size, num_patches)
val_images = val_images[:1000] # We limit validation patches to 1000 to speed up training but it is not necessary
test_images = val_images[:100]
img_shape = (train_images.shape[1], train_images.shape[2])
print("Shape of training images:", train_images.shape, "Shape of validation images:", val_images.shape)
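
To make the patch count concrete: `num_patches` is the image area divided by the patch area, so a 256x256 image with 64x64 patches would give 16. The sketch below crops that many patches at random positions from each image. It is an illustration under assumptions, not the implementation of `utils.extract_patches`: the helper `random_patches` is hypothetical, and it assumes the count is applied per image and that positions are sampled uniformly (so patches may overlap).

```python
import numpy as np

def random_patches(images, patch_size, patches_per_image, seed=0):
    """Crop `patches_per_image` random square patches from every image of an (N, H, W) stack."""
    rng = np.random.default_rng(seed)
    n, height, width = images.shape
    patches = np.empty((n * patches_per_image, patch_size, patch_size), dtype=images.dtype)
    idx = 0
    for img in images:
        # Top-left corners are chosen so that each patch lies fully inside the image.
        ys = rng.integers(0, height - patch_size + 1, size=patches_per_image)
        xs = rng.integers(0, width - patch_size + 1, size=patches_per_image)
        for y, x in zip(ys, xs):
            patches[idx] = img[y:y + patch_size, x:x + patch_size]
            idx += 1
    return patches

stack = np.random.default_rng(1).normal(size=(3, 256, 256)).astype(np.float32)
patches_per_image = (256 * 256) // (64 ** 2)  # 16 patches per image
demo_patches = random_patches(stack, 64, patches_per_image)
print(demo_patches.shape)  # (48, 64, 64)
```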

# Configure Hierarchical DivNoising model

<code>model_name</code> specifies the name under which the model weights will be saved and later loaded for prediction.<br>
<code>directory_path</code> specifies the directory where the model weights and the intermediate denoising and generation results will be saved. <br>
<code>gaussian_noise_std</code> is only applicable if dataset is synthetically corrupted with Gaussian noise of known std. For real datasets, it should be set to ```None```.<br>
<code>noiseModel</code> specifies a noise model for training. If noisy data is generated synthetically using Gaussian noise, set it to None. Else set it to the GMM based noise model (.npz file)  generated from '1-CreateNoiseModel.ipynb'.<br>
<code>batch_size</code> specifies the batch size used for training. The default batch size of $64$ works well for most microscopy datasets.<br>
<code>virtual_batch</code> specifies the virtual batch size used for training. It divides the <code>batch_size</code> into smaller mini-batches of size <code>virtual_batch</code>. Decrease this if batches do not fit in memory.<br>
<code>test_batch_size</code> specifies the batch size used for testing every $1000$ training steps. Decrease this if test batches do not fit in memory. It has no effect on training and is only used for intermediate visual debugging.<br>
<code>lr</code> specifies the learning rate.<br>
<code>max_epochs</code> specifies the total number of training epochs. Around $150-200$ epochs work well generally.<br>
<code>steps_per_epoch</code> specifies how many steps to take per epoch of training. Around $400-500$ steps work well for most datasets.<br>
<code>num_latents</code> specifies the number of stochastic layers. The default setting of $6$ works well for structured noise removal on most microscopy datasets, but quite good results can also be obtained with as few as $4$ layers. More stochastic layers may improve performance for some datasets at the cost of increased training time.<br>
<code>z_dims</code> specifies the number of bottleneck dimensions (latent space dimensions) at each stochastic layer per pixel. The default setting of $32$ works well for most datasets.<br>
<code>blocks_per_layer</code> specifies how many residual blocks to use per stochastic layer. Usually, setting it to be $4$ or more works well. However, more residual blocks improve performance at the cost of increased training time.<br>
<code>batchnorm</code> specifies whether batch normalization is used. Setting it to True is recommended.<br>
<code>free_bits</code> specifies the threshold below which KL loss is not optimized for. This prevents the [KL-collapse problem](https://arxiv.org/pdf/1511.06349.pdf%3Futm_campaign%3DRevue%2520newsletter%26utm_medium%3DNewsletter%26utm_source%3Drevue). The default setting of $1.0$ works well for most datasets.<br>
<code>use_uncond_mode_at</code> specifies which layers of the network can be selectively deactivated (refer to the paper to understand the motivation). Usually, for structured noise removal, setting it to $[0,1]$ works well. This corresponds to selectively deactivating the contributions from the bottom two layers in the hierarchy, with the bottom-most layer denoted as layer $0$ and the second layer from the bottom denoted as layer $1$. If no deactivation is desired, just set the parameter ```use_uncond_mode_at=[]```.
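
As a rough illustration of what a <code>free_bits</code> threshold does, the sketch below clamps each layer's KL term from below before summing. A layer whose KL is already under the threshold contributes a constant, so the optimizer has no incentive to push it further towards zero. The helper `free_bits_kl` is hypothetical, and applying the threshold per layer is an assumption; the repository may apply it at a different granularity.

```python
import numpy as np

def free_bits_kl(kl_per_layer, free_bits):
    """Clamp each layer's (batch-averaged) KL from below at `free_bits`, then sum over layers."""
    kl_per_layer = np.asarray(kl_per_layer, dtype=np.float64)
    return float(np.maximum(kl_per_layer, free_bits).sum())

kl_values = [0.25, 0.5, 1.5, 3.0]  # made-up KL values for four layers
print(free_bits_kl(kl_values, free_bits=1.0))  # 6.5
print(free_bits_kl(kl_values, free_bits=0.0))  # 5.25
```

With a threshold of $1.0$, the first two layers are counted as $1.0$ each, so lowering their KL further would not reduce the loss.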

**__Note:__** With these settings, training takes approximately $24$ hours on a Tesla P100/Titan Xp GPU and needs about 6 GB of GPU memory. We optimized the code to run with less GPU memory. For faster training, consider increasing ```virtual_batch```; however, we have not tested different settings of ```virtual_batch``` and do not yet know how this affects results. To reduce training time, also consider reducing either ```num_latents``` or ```blocks_per_layer``` to $4$. These settings bring the training time down to around $12-15$ hours while still giving good results.
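
The idea behind splitting a batch into smaller virtual batches can be sketched with gradient accumulation. The NumPy example below uses a toy linear model (not the Ladder VAE) and assumes gradients from the mini-batches are averaged before a single update; whether the repository accumulates in exactly this way is an assumption. For a loss that is a mean over samples, the accumulated gradient matches the full-batch gradient, so only memory use changes.

```python
import numpy as np

def mse_gradient(weights, inputs, targets):
    """Gradient of the mean squared error of a linear model with respect to its weights."""
    residual = inputs @ weights - targets
    return 2.0 * inputs.T @ residual / len(targets)

rng = np.random.default_rng(0)
batch_size, virtual_batch = 64, 8
inputs = rng.normal(size=(batch_size, 5))
targets = rng.normal(size=batch_size)
weights = rng.normal(size=5)

# Gradient computed on the full batch in one pass.
full_gradient = mse_gradient(weights, inputs, targets)

# Same gradient accumulated over 8 mini-batches of 8 samples each.
accumulated = np.zeros_like(weights)
num_chunks = batch_size // virtual_batch
for start in range(0, batch_size, virtual_batch):
    chunk = slice(start, start + virtual_batch)
    accumulated += mse_gradient(weights, inputs[chunk], targets[chunk]) / num_chunks

print(np.allclose(full_gradient, accumulated))  # True
```

This equivalence does not hold exactly once batch normalization is involved, because its statistics are computed per mini-batch. That is one reason the effect of changing ```virtual_batch``` on results is not obvious.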

In [None]:
model_name = "convallaria"
directory_path = "./Trained_model/" 

# Data-specific
gaussian_noise_std = None
noise_model_params= np.load("data/GMMNoiseModel_convallaria_3_2_calibration.npz")
noiseModel = GaussianMixtureNoiseModel(params = noise_model_params, device = device)

# Training-specific
batch_size=64
virtual_batch = 8
lr=3e-4
max_epochs = 500
steps_per_epoch = 400
test_batch_size=100

# Model-specific
num_latents = 6
z_dims = [32]*int(num_latents)
blocks_per_layer = 5
batchnorm = True
free_bits = 1.0
use_uncond_mode_at=[0,1]
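
Before starting a long training run, it can help to check that the settings above are mutually consistent. The helper below is a hypothetical sketch and not part of the repository. In particular, requiring `batch_size` to be a multiple of `virtual_batch` is an assumption based on the description of virtual batches above.

```python
def check_config(num_latents, z_dims, use_uncond_mode_at, batch_size, virtual_batch):
    """Hypothetical helper: raise ValueError if the settings contradict each other."""
    # One latent dimensionality is expected per stochastic layer.
    if len(z_dims) != num_latents:
        raise ValueError("z_dims needs one entry per stochastic layer")
    # Layers are indexed from 0 (bottom-most) to num_latents - 1.
    bad_layers = [i for i in use_uncond_mode_at if not 0 <= i < num_latents]
    if bad_layers:
        raise ValueError(f"use_uncond_mode_at refers to missing layers: {bad_layers}")
    # Assumption: the batch is split evenly into virtual batches.
    if batch_size % virtual_batch != 0:
        raise ValueError("batch_size should be a multiple of virtual_batch")
    return True

check_config(num_latents=6, z_dims=[32] * 6, use_uncond_mode_at=[0, 1],
             batch_size=64, virtual_batch=8)
```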

# Train network

In [None]:
train_loader, val_loader, test_loader, data_mean, data_std = boilerplate._make_datamanager(train_images,val_images,
                                                                                           test_images,batch_size,
                                                                                           test_batch_size)

# Move the model to the device selected earlier, so the notebook also runs without a GPU.
model = LadderVAE(z_dims=z_dims,blocks_per_layer=blocks_per_layer,data_mean=data_mean,data_std=data_std,noiseModel=noiseModel,
                  device=device,batchnorm=batchnorm,free_bits=free_bits,img_shape=img_shape,
                  use_uncond_mode_at=use_uncond_mode_at).to(device)

model.train() # Model set in training mode

training.train_network(model=model,lr=lr,max_epochs=max_epochs,steps_per_epoch=steps_per_epoch,
                           directory_path=directory_path,train_loader=train_loader,val_loader=val_loader,
                           test_loader=test_loader,virtual_batch=virtual_batch,
                           gaussian_noise_std=gaussian_noise_std,model_name=model_name, val_loss_patience=30)

# Plotting losses

In [None]:
trainHist=np.load(directory_path+"model/train_loss.npy")
reconHist=np.load(directory_path+"model/train_reco_loss.npy")
klHist=np.load(directory_path+"model/train_kl_loss.npy")
valHist=np.load(directory_path+"model/val_loss.npy")

In [None]:
plt.figure(figsize=(18, 3))
plt.subplot(1,3,1)
plt.plot(trainHist,label='training')
plt.plot(valHist,label='validation')
plt.xlabel("epochs")
plt.ylabel("loss")
plt.legend()

plt.subplot(1,3,2)
plt.plot(reconHist,label='training')
plt.xlabel("epochs")
plt.ylabel("reconstruction loss")
plt.legend()

plt.subplot(1,3,3)
plt.plot(klHist,label='training')
plt.xlabel("epochs")
plt.ylabel("KL loss")
plt.legend()
plt.show()