# CS231a PSET 3 Problem 3: Monocular Depth Estimation and Representation Learning

In this problem we will train a deep learning model to do monocular depth estimation, i.e. predicting a per-pixel depth map from a single RGB image.

**Using a GPU**. Make sure to first change your runtime to use a GPU: click Runtime -> Change runtime type -> Hardware Accelerator -> GPU and your Colab instance will automatically be backed by GPU compute.

First, upload the files in the 'code/p3' directory to a location of your choosing in Drive, then run the following cell to mount Drive and change into that folder. Next, run the cell after it to download the data:

In [None]:
from google.colab import drive

drive.mount('/content/drive', force_remount=True)

# Enter the foldername in your Drive where you have saved the unzipped
# '.py' files from the p3 folder and the "cs231a-clevr-rgbd.zip" file
# e.g. 'cs231a/pset3/p3'
FOLDERNAME = 'cs231a/pset3/p3'

assert FOLDERNAME is not None, "[!] Enter the foldername."

%ls .
%cd drive/MyDrive
%cd $FOLDERNAME

In [None]:
!pip install gdown
!gdown 1IM6gWAZSxae6iVUeEz9BeT73rvr8RwWf


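If everything is set up correctly, the ~1 GB dataset (cs231a-clevr-rgbd.zip) should now be in your working directory. Because the first cell changed into your Drive folder, the zip is saved to Drive; if you instead download it to the runtime's local disk, you'll need to redownload it whenever you reconnect to a fresh runtime.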
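Before moving on, it can be worth confirming that the download actually finished, since a dropped connection can leave a truncated zip behind. Below is a minimal sketch that uses only the standard library. `check_dataset_zip` is a hypothetical helper that is not part of the provided code, and the size threshold you pass is your own choice:

```python
import zipfile
from pathlib import Path


def check_dataset_zip(path, min_bytes=0):
    """Return (num_entries, size_bytes) for a zip archive, raising if it looks broken.

    min_bytes is a caller-chosen sanity threshold for catching truncated downloads.
    """
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"{p} not found; did the download cell finish?")
    size = p.stat().st_size
    if size < min_bytes:
        raise ValueError(f"{p} is only {size} bytes; the download may be incomplete")
    # ZipFile itself raises BadZipFile if the archive's central directory is missing.
    with zipfile.ZipFile(p) as zf:
        bad = zf.testzip()  # name of the first member with a bad CRC, or None
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt member: {bad}")
        return len(zf.namelist()), size
```

For example, call `check_dataset_zip("cs231a-clevr-rgbd.zip", min_bytes=900_000_000)` from the dataset folder. `testzip()` reads and CRC-checks every member, so on a ~1 GB archive it takes a little while.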

# Checking out the data

Let's start by having a look at what's in our CLEVR-D dataset. For that, finish the marked sections in data.py, and then run the following code:


In [None]:
import data
import torch
import torchvision
import matplotlib.pyplot as plt
from importlib import reload
reload(data)  # pick up your latest edits to data.py
plt.rcParams['figure.figsize'] = [8, 10]
plt.rcParams['figure.dpi'] = 100

train_data_loader, test_data_loader = data.get_data_loaders("cs231a-clevr-rgbd.zip",
                                                            is_mono=True,
                                                            batch_size=16,
                                                            train_test_split=0.8,
                                                            pct_dataset=0.2)  # use 20% of the dataset to keep things fast
test_data_iter = iter(test_data_loader)
data_sample = next(test_data_iter)
print("\nMean, min and max of RGB image - %.3f %.3f %.3f"%(
                                          torch.mean(data_sample['rgb']),
                                          torch.min(data_sample['rgb']),
                                          torch.max(data_sample['rgb'])))

print("Mean, min and max of depth image - %.3f %.3f %.3f\n"%(
                                          torch.mean(data_sample['depth']),
                                          torch.min(data_sample['depth']),
                                          torch.max(data_sample['depth'])))

rgb_tensor_to_image, depth_tensor_to_image = data.get_tensor_to_image_transforms()
fig, axs = plt.subplots(3, 2)
axs[0,0].set_title('RGB', size='large')
axs[0,1].set_title('Depth', size='large')
for i in range(3):
    axs[i, 0].imshow(rgb_tensor_to_image(data_sample['rgb'][i]))
    axs[i, 1].imshow(depth_tensor_to_image(data_sample['depth'][i]), cmap='gray')
    axs[i, 0].axis('off')
    axs[i, 1].axis('off')
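The printed statistics show whether data.py normalizes the tensors. Normalized RGB values are centered roughly around 0 and can go negative, whereas raw values lie in [0, 1]. In either case, the tensor-to-image transforms must undo any normalization before plotting. Below is a minimal NumPy sketch of a per-channel normalize/inverse pair. It assumes mean/std normalization in the style of torchvision's `transforms.Normalize`, and the ImageNet constants are placeholders, not necessarily what data.py uses:

```python
import numpy as np

# Placeholder constants (the common ImageNet statistics); check data.py for the real ones.
RGB_MEAN = (0.485, 0.456, 0.406)
RGB_STD = (0.229, 0.224, 0.225)


def normalize(x, mean, std):
    """Per-channel (x - mean) / std on a (C, H, W) array; mean/std may be scalars."""
    mean = np.asarray(mean, dtype=x.dtype).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=x.dtype).reshape(-1, 1, 1)
    return (x - mean) / std


def inverse_normalize(y, mean, std):
    """Undo normalize(), i.e. return y * std + mean.

    Implemented as normalize() with mean' = -mean / std and std' = 1 / std,
    the usual trick for building an inverse out of a Normalize-style transform.
    """
    mean = np.asarray(mean, dtype=y.dtype)
    std = np.asarray(std, dtype=y.dtype)
    return normalize(y, -mean / std, 1.0 / std)


rgb = np.random.default_rng(0).random((3, 4, 4)).astype(np.float32)  # fake image in [0, 1)
z = normalize(rgb, RGB_MEAN, RGB_STD)
print(np.allclose(inverse_normalize(z, RGB_MEAN, RGB_STD), rgb, atol=1e-5))  # True
```

The same pair works for a single-channel depth map by passing scalar `mean`/`std`.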

# Training the model

Next, complete the marked sections of losses.py and training.py; once that's done, we can train the model.
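For reference, the DenseDepth paper trains with a weighted sum of three terms: a point-wise L1 depth loss scaled by λ = 0.1, an L1 loss on image gradients, and a structural-similarity term (1 - SSIM) / 2. The NumPy sketch below illustrates that formulation on a single (H, W) depth map. It is not the required content of losses.py, which works on batched torch tensors and whose terms and weights may differ. It also simplifies SSIM by using a uniform window instead of the usual Gaussian one. `sliding_window_view` requires NumPy 1.20 or newer:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def image_gradients(img):
    """Forward differences along rows (dy) and columns (dx); last row/col is zero."""
    dy = np.zeros_like(img)
    dx = np.zeros_like(img)
    dy[:-1, :] = img[1:, :] - img[:-1, :]
    dx[:, :-1] = img[:, 1:] - img[:, :-1]
    return dy, dx


def ssim(pred, target, data_range=1.0, win=7):
    """Mean SSIM with a uniform win x win window (simplification of the Gaussian one)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    # Each output location sees one win x win patch ("valid" positions only).
    p = sliding_window_view(pred, (win, win))
    t = sliding_window_view(target, (win, win))
    mu_p = p.mean(axis=(-2, -1))
    mu_t = t.mean(axis=(-2, -1))
    var_p = p.var(axis=(-2, -1))
    var_t = t.var(axis=(-2, -1))
    cov = (p * t).mean(axis=(-2, -1)) - mu_p * mu_t
    num = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    den = (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    return float(np.mean(num / den))


def depth_loss(pred, target, lam=0.1, data_range=1.0):
    """lam * L1 depth + L1 on gradients + (1 - SSIM) / 2 for one (H, W) depth map."""
    l_depth = np.mean(np.abs(pred - target))
    dy_p, dx_p = image_gradients(pred)
    dy_t, dx_t = image_gradients(target)
    l_grad = np.mean(np.abs(dy_p - dy_t) + np.abs(dx_p - dx_t))
    l_ssim = np.clip((1.0 - ssim(pred, target, data_range)) / 2.0, 0.0, 1.0)
    return float(lam * l_depth + l_grad + l_ssim)


rng = np.random.default_rng(0)
gt = rng.random((32, 32))
print(depth_loss(gt, gt))        # ~0 for a perfect prediction
print(depth_loss(0.5 * gt, gt))  # > 0
```

The small weight on the L1 term reflects the paper's choice to let the gradient and SSIM terms, which emphasize edges and structure, drive most of the learning.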

Before we run training, let's set up [TensorBoard](https://www.tensorflow.org/tensorboard) to visualize training progress. Once training starts, the Scalars tab should show the loss gradually going down. In the Images tab you should also see the 'Ours' images getting better over time, with the 'Diff' images showing a smaller difference from the ground truth. After you start training a few cells below, hit the refresh icon at the top right and the plots should show up:

In [None]:
!pip install tensorboardX
%load_ext tensorboard
# clear logs from earlier runs; the * must be outside the quotes so the shell expands it
!rm -rf "/content/drive/MyDrive/$FOLDERNAME/runs/"*
%tensorboard --logdir "/content/drive/MyDrive/$FOLDERNAME/runs"
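The exact images logged are up to training.py, but a 'Diff' panel of this kind is typically the per-pixel absolute error between prediction and ground truth. Below is a minimal NumPy sketch of such a map, where `diff_map` is a hypothetical helper. Note the `vmax` argument: if each map is stretched by its own maximum, a shrinking error still looks equally bright, so a fixed scale is what makes improvement over time visible:

```python
import numpy as np


def diff_map(pred, target, vmax=None):
    """Per-pixel |pred - target| scaled to [0, 1] for display as a grayscale image.

    With vmax=None each map is divided by its own maximum; passing a fixed vmax
    keeps the brightness comparable across training steps.
    """
    diff = np.abs(np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64))
    scale = diff.max() if vmax is None else float(vmax)
    if scale <= 0:  # perfect prediction (or a non-positive vmax): all black
        return np.zeros_like(diff)
    return np.clip(diff / scale, 0.0, 1.0)


target = np.array([[0.2, 0.4], [0.6, 0.8]])
pred = np.array([[0.2, 0.5], [0.6, 0.6]])
print(diff_map(pred, target, vmax=0.4))  # fixed scale: errors of 0.1 and 0.2 map to 0.25 and 0.5
```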

Let's first initialize the model we will pass to the training function and confirm that, given an RGB image, it outputs a depth image.

In [None]:
import model
from utils import colorize
# if you get a cuda out of memory error here, you need to restart the runtime 
# and re-run everything
dense_depth_model = model.DenseDepth()
dense_depth_model = dense_depth_model.to('cuda')
sample_image = next(test_data_iter)
with torch.no_grad():
    model_out = dense_depth_model(sample_image['rgb'].to('cuda'))
fig, axs = plt.subplots(1, 2)
axs[0].imshow(rgb_tensor_to_image(sample_image['rgb'][0]))
axs[1].imshow(depth_tensor_to_image(model_out[0].cpu()), cmap='gray')  # move to CPU before plotting
axs[0].axis('off')
axs[1].axis('off')
# free the batch and the output so they don't hold GPU memory during training
del sample_image
del model_out

We can also make sure the model is correctly loaded onto the GPU and check its size. Under Memory-Usage, you should see that it takes up approximately 3.5 GB of this GPU's memory, making it a modestly large model:

In [None]:
!nvidia-smi 
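nvidia-smi reports everything the process holds on the GPU. That includes the CUDA context, memory PyTorch's caching allocator keeps from the forward pass, and the weights themselves. To see how much of it is parameters, count them with `sum(p.numel() for p in dense_depth_model.parameters())`, and compare against `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()`. The plain-Python sketch below turns a parameter count into a rough lower bound for training memory. It assumes fp32 weights and, in case training.py uses Adam, two extra state tensors per parameter. Activations are not included, and for image models they often dominate:

```python
def mib(num_bytes):
    """Bytes -> mebibytes (2**20 bytes)."""
    return num_bytes / 2**20


def training_memory_mib(num_params, bytes_per_param=4, optimizer_states=2):
    """Rough memory for weights, gradients and optimizer state, in MiB.

    bytes_per_param=4 assumes fp32. optimizer_states=2 matches Adam (first and
    second moment estimates); use 1 for SGD with momentum, 0 for plain SGD.
    Activations, the CUDA context and allocator caching are NOT included.
    """
    per_tensor = num_params * bytes_per_param
    parts = {
        "weights": mib(per_tensor),
        "grads": mib(per_tensor),
        "optimizer": mib(optimizer_states * per_tensor),
    }
    parts["total"] = sum(parts.values())
    return parts


# Hypothetical count; substitute sum(p.numel() for p in dense_depth_model.parameters()).
print(training_memory_mib(10_000_000))
```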

With that done, let's get training!

In [None]:
import training
import torch
# if you get a cuda out of memory error here, you need to restart the runtime
# and re-run everything
torch.cuda.empty_cache()  # release cached GPU memory left over from earlier cells
training = reload(training)  # reload when debugging to pick up your latest edits
training.train(5, train_data_loader, test_data_loader, lr=0.0001, model=dense_depth_model)

Yay! If you implemented everything correctly, the loss should have gone down and the model's predictions should have improved. We can now take another look at its output, this time on test set inputs:

In [None]:
# advance the iterator a few batches to pick a nice set of images
for _ in range(3):  # feel free to change this to see other outputs
    sample_image = next(test_data_iter)
dense_depth_model.eval()  # inference behavior for layers such as batch norm
with torch.no_grad():
    model_out = dense_depth_model(sample_image['rgb'].to('cuda'))
fig, axs = plt.subplots(3, 3)
axs[0,0].set_title('RGB', size='large')
axs[0,1].set_title('Predicted Depth', size='large')
axs[0,2].set_title('True Depth', size='large')
depth_inverse_normalize = data.get_inverse_transforms()[1]
for i in range(3):
    axs[i, 0].imshow(rgb_tensor_to_image(sample_image['rgb'][i]))
    axs[i, 1].imshow(depth_inverse_normalize(model_out[i]).data.cpu().numpy()[0], cmap='gray')
    axs[i, 2].imshow(depth_tensor_to_image(sample_image['depth'][i]), cmap='gray')
    axs[i, 0].axis('off')
    axs[i, 1].axis('off')
    axs[i, 2].axis('off')

We can see that the model is roughly doing the right thing, but because we trained it for only 5 epochs on a small subset of the data, the result is rather blurry. Feel free to try increasing the number of epochs and pct_dataset to see whether it improves!
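Blurriness is easy to eyeball, but numbers help when comparing runs with more epochs or a larger pct_dataset. Below is a NumPy sketch of metrics commonly reported for monocular depth estimation: mean absolute relative error, RMSE, mean log10 error, and the threshold accuracies δ < 1.25, 1.25², 1.25³. `depth_metrics` is a hypothetical helper that assumes both maps are in the same positive depth units. Predictions would therefore first go through the inverse normalization (e.g. `depth_inverse_normalize`) and be moved to the CPU:

```python
import numpy as np


def depth_metrics(pred, gt, eps=1e-6):
    """Common monocular-depth metrics for positive depth maps of matching shape."""
    # Clamp to a small positive value so ratios and logs are defined.
    pred = np.maximum(np.asarray(pred, dtype=np.float64), eps)
    gt = np.maximum(np.asarray(gt, dtype=np.float64), eps)
    ratio = np.maximum(pred / gt, gt / pred)  # >= 1; equals 1 for a perfect pixel
    return {
        "abs_rel": float(np.mean(np.abs(pred - gt) / gt)),
        "rmse": float(np.sqrt(np.mean((pred - gt) ** 2))),
        "log10": float(np.mean(np.abs(np.log10(pred) - np.log10(gt)))),
        "delta1": float(np.mean(ratio < 1.25)),
        "delta2": float(np.mean(ratio < 1.25 ** 2)),
        "delta3": float(np.mean(ratio < 1.25 ** 3)),
    }


gt = np.array([1.0, 2.0, 4.0])
print(depth_metrics(gt, gt))        # abs_rel, rmse, log10 = 0; all deltas = 1
print(depth_metrics(1.1 * gt, gt))  # every ratio is 1.1 < 1.25, so delta1 = 1
```

Lower is better for the error metrics, and higher is better for the δ accuracies.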

# Conclusion

That's it, you are done! 

Credits: this assignment was adapted from [this](https://github.com/pranjaldatta/DenseDepth-Pytorch) code base.
