# CS231a PSET 3 Problem 4: Monocular Depth Estimation

The last problem introduced the idea of learning representations that are useful for downstream tasks. In this problem, you will see how that idea applies to monocular depth estimation.

**Using a GPU**. Make sure to first change your runtime to use a GPU: click Runtime -> Change runtime type -> Hardware Accelerator -> GPU and your Colab instance will automatically be backed by GPU compute.
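To confirm that the runtime change took effect, a quick check along these lines may help. This is a minimal sketch, assuming PyTorch is the framework in use (as the later cells suggest); `describe_device` is a hypothetical helper name, not part of the assignment code.

```python
def describe_device():
    """Return a short description of the compute device PyTorch would use."""
    try:
        import torch  # assumption: PyTorch is installed, as it is on Colab by default
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        # Index 0 is the first (and on Colab, usually the only) visible GPU.
        return "GPU: " + torch.cuda.get_device_name(0)
    return "CPU only (no GPU visible to PyTorch)"


print(describe_device())
```

If this reports "CPU only", revisit the runtime settings above before starting the long training run.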

Now, let's download the [NYU Depth dataset](https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html) we'll be working with. First, you should upload the 'problem4' directory as well as the 'checkpoints' and 'examples' directories onto a location of your choosing in Drive and run the following to have access to the code within it:

In [None]:
from google.colab import drive

drive.mount('/content/drive', force_remount=True)

# enter the foldername in your Drive where you have saved the unzipped
# 'problem4' folder containing the '.py' files needed for this problem
# e.g. '/content/drive/MyDrive/cs231a/monocular_depth_estimation'
FOLDERNAME = None

assert FOLDERNAME is not None, "[!] Enter the foldername."

%cd drive/My\ Drive
%cp -r $FOLDERNAME/problem4/download_data.py ../../
%cd ../../

!python download_data.py

%cd drive/My\ Drive
%cd $FOLDERNAME

If everything is set up correctly, the 4.4 GB dataset should now be stored in this Colaboratory runtime. Note that you'll need to redownload this data whenever you reconnect to a fresh runtime!
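Since the download has to be repeated on every fresh runtime, it can be handy to check whether the archive is already present before rerunning the cell above. The sketch below assumes the archive lives at the path used by the data-loading cell later in this notebook; `dataset_ready` and the 1 GiB size threshold are hypothetical choices made for illustration, not part of the assignment code.

```python
import os

# Path taken from the get_data_loaders call later in this notebook.
DATA_PATH = "/content/data/nyu_depth.zip"


def dataset_ready(path=DATA_PATH, min_bytes=1024**3):
    """Return True if the archive exists and is at least min_bytes in size.

    The default threshold (1 GiB) is a deliberately loose assumption, meant
    only to catch a missing or obviously truncated download.
    """
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes


if dataset_ready():
    print("Dataset found; no need to rerun download_data.py.")
else:
    print("Dataset missing or incomplete; rerun the download cell above.")
```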

# Checking out the data

Let's start by having a look at what's in the NYU dataset. For that, finish the marked sections in data.py, and then run the following code:


In [None]:
from problem4.data import get_data_loaders 
import matplotlib.pyplot as plt
import torchvision
import gc

plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['figure.dpi'] = 100 

tensorToImage = torchvision.transforms.ToPILImage()
trainloader, testloader = get_data_loaders("/content/data/nyu_depth.zip", 
                                           batch_size=4)
dataiter = iter(trainloader)
fig, axs = plt.subplots(3, 2)
for i in range(3):
    data = next(dataiter)
    axs[i, 0].imshow(tensorToImage(data['image'][0]))
    axs[i, 1].imshow(tensorToImage(data['depth'][0]), cmap='gray')
    axs[i, 0].axis('off')
    axs[i, 1].axis('off')

# Training the model

Once you have completed the appropriate parts of losses.py and training.py, we can train the model. Let's train for just one epoch first (this will take around 3 hours!).

Before we run training, let's set up [TensorBoard](https://www.tensorflow.org/tensorboard) to visualize the training progress. When you run the following, you should see the 'scalars' tab showing the loss gradually going down once training starts. In the 'images' tab, you should also be able to observe the 'Ours' images getting better over time, with the 'Diff' images showing smaller differences from the ground truth:

In [None]:
!pip install tensorboardX
%load_ext tensorboard
%tensorboard --logdir runs/

In [None]:
import problem4.training 
from importlib import reload  
problem4.training = reload(problem4.training)  # reload when debugging to pick up updated code
problem4.training.train(1, trainloader, testloader, lr=0.0001, pretrained=True)

# Testing the trained model

Now that the model has trained (for only one epoch!), we can take a look at how good it is at predicting depth given RGB images. Run the following:

In [None]:
import problem4.testing
problem4.testing = reload(problem4.testing)

problem4.testing.test('checkpoints/ckpt_0_pretrained.pth')
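Beyond inspecting the predicted depth maps visually, prediction quality can also be summarized numerically. The sketch below computes three error measures that are commonly reported for depth estimation (absolute relative error, RMSE, and the fraction of pixels whose ratio to the ground truth is below 1.25). It is an illustration only: `depth_metrics` is a hypothetical helper operating on flat lists of positive depth values, and it is not necessarily what `problem4.testing.test` computes.

```python
import math


def depth_metrics(pred, target):
    """Compute common depth-error metrics over paired positive depth values."""
    assert len(pred) == len(target) and len(pred) > 0
    n = len(pred)
    pairs = list(zip(pred, target))
    # Mean absolute error relative to the ground-truth depth.
    abs_rel = sum(abs(p - t) / t for p, t in pairs) / n
    # Root mean squared error in the same units as the depths.
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in pairs) / n)
    # Fraction of values whose ratio to ground truth (either way) is below 1.25.
    delta1 = sum(max(p / t, t / p) < 1.25 for p, t in pairs) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}


# Toy values, not real model output.
metrics = depth_metrics([1.0, 2.0, 4.0], [1.0, 2.5, 2.0])
print(metrics)
```

Lower `abs_rel` and `rmse` and higher `delta1` indicate better predictions, which gives a compact way to compare the two training runs in this notebook.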

# Training without feature transfer

Now let's see what happens if we train without transferring over features. We will once again load up TensorBoard and then start training, so that you can compare the loss curves and image quality between the two ways of training:

In [None]:
%tensorboard --logdir runs/

In [None]:
problem4.training = reload(problem4.training)  # reload when debugging to pick up updated code
problem4.training.train(1, trainloader, testloader, lr=0.0001, pretrained=False)

In [None]:
problem4.testing.test('checkpoints/ckpt_0_not_pretrained.pth')

# Conclusion

That's it! You have now trained a model for monocular depth estimation and seen how transferring learned features can result in better convergence compared to learning from scratch. As noted in the PDF, you now just need to download this notebook to submit alongside your Python files.

Credits: this assignment was adapted from the [DenseDepth-Pytorch](https://github.com/pranjaldatta/DenseDepth-Pytorch) code base.
