# CS231a PSET 3 Problem 3: Supervised Monocular Depth Estimation
In this problem we will train a large deep learning model to estimate depth from a single RGB image (monocular depth estimation).

**Using a GPU**. Make sure to first change your runtime to use a GPU: click Runtime -> Change runtime type -> Hardware Accelerator -> GPU and your Colab instance will automatically be backed by GPU compute.

First, upload the files in the 'p3/code' directory, along with the empty 'checkpoints' directory, to a location of your choosing in Drive. You will also need to put the [CLEVR-D](https://drive.google.com/file/d/1IM6gWAZSxae6iVUeEz9BeT73rvr8RwWf/view?usp=sharing) dataset in the same folder by clicking 'Add shortcut to Drive' and moving the shortcut to that folder. Alternatively, if you have the space needed, you can download the zip file and upload it to that folder.

Then, run the following:

In [None]:
from google.colab import drive

drive.mount('/content/drive', force_remount=True)

# enter the foldername in your Drive where you have saved the .py files in 'p3/code' needed for this problem
# e.g. 'cs231a/monocular_depth_estimation':
FOLDERNAME = 'cs231a/monocular_depth_estimation'

assert FOLDERNAME is not None, "[!] Enter the foldername."

%ls .
# use an absolute path so the cell can be re-run safely from any working directory
%cd /content/drive/MyDrive/$FOLDERNAME

If all is set up correctly, you should be able to go to drive/MyDrive/{path to your folder} in the file explorer on the right and see the code files and shortcut to the zip file. 

# Checking out the data

Let's start by having a look at what's in our CLEVR-D dataset. For that, finish the marked sections in data.py, and then run the following code:


In [None]:
import data
import torch
import torchvision
import matplotlib.pyplot as plt
from importlib import reload 
reload(data)
plt.rcParams['figure.figsize'] = [8,10]
plt.rcParams['figure.dpi'] = 100 

train_data_loader, test_data_loader = data.get_data_loaders(
    "cs231a-clevr-rgbd.zip",
    is_mono=True,
    batch_size=16,  # small batch size to conserve memory
    train_test_split=0.8,
    pct_dataset=0.5,  # half of dataset to keep things fast
)
test_data_iter = iter(test_data_loader)
data_sample = next(test_data_iter)
print("\nMean, min and max of RGB image - %.3f %.3f %.3f"%(
                                          torch.mean(data_sample['rgb']),
                                          torch.min(data_sample['rgb']),
                                          torch.max(data_sample['rgb'])))

print("Mean, min and max of depth image - %.3f %.3f %.3f\n"%(
                                          torch.mean(data_sample['depth']),
                                          torch.min(data_sample['depth']),
                                          torch.max(data_sample['depth'])))

rgb_tensor_to_image, depth_tensor_to_image = data.get_tensor_to_image_transforms()
fig, axs = plt.subplots(3, 2)
axs[0,0].set_title('RGB', size='large')
axs[0,1].set_title('Depth', size='large')
for i in range(3):
    axs[i, 0].imshow(rgb_tensor_to_image(data_sample['rgb'][i]))
    axs[i, 1].imshow(depth_tensor_to_image(data_sample['depth'][i]), cmap='gray')
    axs[i, 0].axis('off')
    axs[i, 1].axis('off')

# Training the model

Next, we can go ahead and train the model once you complete the appropriate parts of losses.py and training.py. 

Before we run training, let's set up [Tensorboard](https://www.tensorflow.org/tensorboard) to visualize the training progress. When you run the following, you should see the 'scalars' tab showing the loss gradually going down once training starts. In the 'images' tab, you should also be able to observe the 'Ours' images getting better over time, with the 'Diff' images showing less difference from the ground truth. Hit the refresh icon on the top right once you get training going in the next part, and the plots and images should show up:

In [None]:
!pip install tensorboardX
%load_ext tensorboard
%tensorboard --logdir "/content/drive/MyDrive/$FOLDERNAME/runs"

Let's first initialize the model to pass into the training function and confirm that, given an RGB image, it outputs a depth image.

In [None]:
import model
from utils import colorize

dense_depth_model = model.DenseDepth(encoder_pretrained=False)
dense_depth_model = dense_depth_model.to('cuda')
sample_image = next(test_data_iter)
with torch.no_grad():
    model_out = dense_depth_model(sample_image['rgb'].to('cuda')) 
fig, axs = plt.subplots(1, 2)
axs[0].imshow(rgb_tensor_to_image(sample_image['rgb'][0]))
axs[1].imshow(depth_tensor_to_image(model_out[0]),cmap='gray')
axs[0].axis('off')
axs[1].axis('off')
del sample_image
del model_out       

When initializing the model, we could use feature transfer from prior training on [ImageNet](https://en.wikipedia.org/wiki/ImageNet), which is common practice, by setting encoder_pretrained=True. However, since our data is quite different from ImageNet's, that would actually hinder learning rather than improve it. You can try it if you'd like and see how it impacts the final results!

We can also make sure the model is correctly loaded onto the GPU and check its size. Under Memory-Usage, you should see that it takes up a few gigabytes of the GPU's memory, making it a fairly large model:



In [None]:
!nvidia-smi 
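
The Memory-Usage figure reported by nvidia-smi covers everything the process holds on the GPU (the CUDA context, cached activations, and so on), not only the weights. As a rough complement, the sketch below counts parameters directly in PyTorch. `parameter_stats` and `toy_model` are hypothetical names used for illustration, and the small stand-in network is there only to keep the snippet self-contained; it is not the DenseDepth model.

```python
import torch.nn as nn


def parameter_stats(module: nn.Module):
    """Return (number of parameters, size of those parameters in MiB)."""
    n_params = sum(p.numel() for p in module.parameters())
    n_bytes = sum(p.numel() * p.element_size() for p in module.parameters())
    return n_params, n_bytes / 1024**2


# Hypothetical stand-in network, used only so the example runs on its own.
toy_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=3),
)

n_params, size_mib = parameter_stats(toy_model)
print(f"{n_params} parameters, {size_mib:.4f} MiB of weights")
```

In this notebook, one would presumably call `parameter_stats(dense_depth_model)` instead, assuming the model is a regular `nn.Module`.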

Finally, once you finish up the TODO parts in losses.py and training.py you can start training!

In [None]:
import training
import torch

training = reload(training)  # reload when debugging to pick up updated code
torch.cuda.empty_cache()  # release cached GPU memory that is no longer in use
training.train(5, train_data_loader, test_data_loader, lr=0.0001, model=dense_depth_model)

Yay! If you implemented everything correctly, the loss went down and you saw the model work well. There should now be files in the checkpoints directory, which hold the model weights at different points in training. We won't be using these, but it's standard practice in deep learning to save them so they can later be loaded to run the model in experiments or for further fine-tuning. We can now take another look at the model's output and see what it does on test set inputs:

In [None]:
# skip ahead a few batches to pick a nice set of images
for _ in range(5):  # feel free to change this to see other outputs
    sample_image = next(test_data_iter)
with torch.no_grad():
    model_out = dense_depth_model(sample_image['rgb'].to('cuda')) 
fig, axs = plt.subplots(3, 3)
axs[0,0].set_title('RGB', size='large')
axs[0,1].set_title('Predicted Depth', size='large')
axs[0,2].set_title('True Depth', size='large')
depth_inverse_normalize = data.get_inverse_transforms()[1]
for i in range(3):
    axs[i, 0].imshow(rgb_tensor_to_image(sample_image['rgb'][i]))
    axs[i, 1].imshow(depth_inverse_normalize(model_out[i]).data.cpu().numpy()[0], cmap='gray')
    axs[i, 2].imshow(depth_tensor_to_image(sample_image['depth'][i]), cmap='gray')
    axs[i, 0].axis('off')
    axs[i, 1].axis('off')
    axs[i, 2].axis('off')

We can see that the model is largely doing the right thing, although if you look closely it's also evident that it does not capture some of the finer depth differences between objects.
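
To go beyond visual inspection, one option is to compute a few error metrics that are commonly reported for depth estimation: absolute relative error, RMSE, and the fraction of pixels whose prediction is within a factor of 1.25 of the true depth. The sketch below is not part of the assignment code. `depth_metrics` is a hypothetical helper, and it assumes both arrays hold strictly positive depths in the same (de-normalized) units.

```python
import numpy as np


def depth_metrics(pred, target, eps=1e-6):
    """Compare predicted and true depth maps (assumed positive, same units)."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # Ignore pixels without a valid ground-truth depth.
    valid = target > eps
    pred = np.maximum(pred[valid], eps)  # guard against division by zero
    target = target[valid]

    abs_rel = np.mean(np.abs(pred - target) / target)
    rmse = np.sqrt(np.mean((pred - target) ** 2))
    ratio = np.maximum(pred / target, target / pred)
    delta1 = np.mean(ratio < 1.25)  # fraction of pixels within a factor of 1.25
    return {"abs_rel": float(abs_rel), "rmse": float(rmse), "delta1": float(delta1)}


# Tiny made-up example: a 2x2 "depth map" and a slightly wrong prediction.
target = np.array([[1.0, 2.0], [4.0, 8.0]])
pred = np.array([[1.0, 2.2], [3.0, 8.0]])
metrics = depth_metrics(pred, target)
print(metrics)
```

With the notebook's tensors, the inputs would first need to be mapped back to depth units and converted to NumPy arrays, for example along the lines of the `depth_inverse_normalize(...).data.cpu().numpy()` call used in the plotting cell.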

# Conclusion

That's it! You have now trained a model for monocular depth estimation. As noted in the PDF, you now just need to download this notebook to submit alongside your Python files.

Credits: this assignment was adapted from [this](https://github.com/pranjaldatta/DenseDepth-Pytorch) code base.


# (Extra credit) Representation learning with an autoencoder

As we did in Problem 2, representation learning can be used to (possibly) speed up and improve learning. An approach we could try for this problem is to use an [autoencoder](https://www.jeremyjordan.me/autoencoders/). Modifying the code to do this is actually not too challenging: we just need to modify the dataset to output a greyscale version of each RGB image and train a model to go from RGB images to greyscale images, using the same setup as before. The DenseNet model that acts as the encoder already outputs a 'bottleneck' representation, so we'd just want to train it on the RGB->greyscale conversion and then use the resulting features as the starting point for the RGB->depth training. The codebase is not currently set up to do this, but as per the PDF you can feel free to tweak it and see what you get!
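
As a minimal sketch of that two-stage idea, and not the assignment's implementation, the snippet below builds greyscale targets from RGB tensors, pretrains a small encoder-decoder on RGB->greyscale, and then copies only the encoder weights into a fresh model intended for depth training. Everything named here is hypothetical: `rgb_to_grey`, `ToyEncoderDecoder`, and the `encoder`/`decoder` attributes are illustrative, the greyscale weights are the standard ITU-R BT.601 luma coefficients, and the inputs are assumed to be un-normalized RGB values in [0, 1]. Whether DenseDepth exposes its encoder in a similar way should be checked against model.py.

```python
import torch
import torch.nn as nn


def rgb_to_grey(rgb):
    """Convert (N, 3, H, W) RGB tensors to (N, 1, H, W) greyscale (BT.601 weights)."""
    weights = torch.tensor([0.299, 0.587, 0.114], dtype=rgb.dtype, device=rgb.device)
    return (rgb * weights.view(1, 3, 1, 1)).sum(dim=1, keepdim=True)


class ToyEncoderDecoder(nn.Module):
    """Hypothetical stand-in for an encoder-decoder with a single-channel output."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(8, 1, kernel_size=3, padding=1))

    def forward(self, x):
        return self.decoder(self.encoder(x))


torch.manual_seed(0)
rgb = torch.rand(4, 3, 16, 16)  # random stand-in for a batch of RGB images

# Stage 1: pretrain on the RGB -> greyscale task.
pretrain_model = ToyEncoderDecoder()
optimizer = torch.optim.Adam(pretrain_model.parameters(), lr=1e-2)
for _ in range(20):
    optimizer.zero_grad()
    loss = nn.functional.l1_loss(pretrain_model(rgb), rgb_to_grey(rgb))
    loss.backward()
    optimizer.step()

# Stage 2: start the depth model from the pretrained encoder only;
# its decoder keeps a fresh random initialization.
depth_model = ToyEncoderDecoder()
depth_model.encoder.load_state_dict(pretrain_model.encoder.state_dict())
```

From here, `depth_model` would be trained on RGB->depth as in the main part of the problem, so any benefit comes only from the encoder's starting point.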