<a href="https://colab.research.google.com/github/gtbook/robotics/blob/main/S76_drone_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%pip install -q -U gtbook

Note: you may need to restart the kernel to use updated packages.


In [2]:
import plotly

import torch
import torch.nn as nn
import torch.optim as optim

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")


# Neural Radiance Fields for Drones

<img src="Figures7/S76-Autonomous_camera_drone-07.jpg" alt="Splash image with a drone-like robot, steampunk style" width="40%" align=center style="vertical-align:middle;margin:10px 0px">

## What is a NeRF?

A **neural radiance field** or "NeRF" is a neural representation of a 3D scene, and hence could be very useful for drones to help with motion planning, obstacle avoidance, or even simply simulation of drone flights. NeRFs were introduced in the field of computer vision in 2020 by a team of researchers from Berkeley and Google, and have since seen an explosion of interest. The reasons are two-fold:

- their proposed scheme of learning a neural representation of the 3D scene was very simple
- the resulting NeRFs were capable of generating very realistic "renderings" of the learned scene

In a nutshell, given a large set of images taken of a 3D scene, the original NeRF trained a large (but simple) neural network to predict the value of every pixel in every image. By doing so, the neural network could then also *generate* new images that were not in the original training set. What is more, the neural network can also be used to predict the 3D structure of the underlying scene, making it possible to do much more than simple view synthesis.

The method in the original NeRF paper (Mildenhall et al., ECCV 2020) was rather slow, because of the large neural network used. Since then, however, faster *voxel-based* versions have been developed. In this chapter, we first introduce a 1D version of this basic scheme, then move on to 3D (voxels), and finally show how it can be used to create a neural *radiance* field.

## Differentiable interpolation in 1D

A neural radiance field will predict density and color in 3D, but we can start off simpler by just interpolating functions in 1D. The key is to create a *differentiable* interpolation scheme that we can then *train* using samples from the function we want to interpolate. The class `LineGrid` below does just that. It is initialized with two parameters:
 
 - `size` is how many cells the 1D grid has
 - `features` is the dimensionality of the function we want to interpolate
 
Being able to learn multi-dimensional functions is crucial, as in a NeRF we will have to approximate RGB colors.

In PyTorch you also have to define a `forward` method when defining a module, after which you can "call" the module. In our case, we will simply interpolate the grid values defined at the cell boundaries, for any value inside the grid. We can give any value $x\in[0,\mathrm{size})$, or even a tensor of them. The upper bound is excluded because at $x=\mathrm{size}$ the right-hand neighbor would fall outside the grid.

In [3]:
class LineGrid(nn.Module):
    def __init__(self, size, features=1):
        super(LineGrid, self).__init__()
        self.grid = nn.Parameter(nn.init.normal_(torch.empty(size + 1, features)))

    def forward(self, x):
        X = torch.floor(x).long()  # index of the grid node to the left of x
        a = x - X  # blending weights in [0, 1) (same size as x)

        # Blend the two neighboring grid values; unsqueeze broadcasts over features
        return self.grid[X] * (1.0 - a).unsqueeze(-1) + self.grid[X+1] * a.unsqueeze(-1)

Here is an example of how to initialize a line grid and call the forward method:

In [4]:
grid_module = LineGrid(size=5, features=2)

x = torch.Tensor([1.5, 2.7, 3.6])
print("Interpolated Output:", grid_module(x))

Interpolated Output: tensor([[ 0.4488,  0.1907],
        [ 0.4798, -0.9854],
        [ 0.6225,  0.4794]], grad_fn=<AddBackward0>)


In the example above, the shape of the output is $3\times 2$ because we asked to interpolate a 2D function (`features` is 2) at 3 different locations (`x` has 3 elements). The output looks rather random, however, because the grid was initialized with random values in the constructor.
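
To see what the interpolation computes, it helps to replace the random grid with known values. The sketch below is a stand-alone re-implementation of the same blending rule (the helper name `lerp_1d` and the grid values are made up for illustration). It also checks that only the grid nodes next to a query receive a gradient, which is what makes the grid trainable.

```python
import torch

def lerp_1d(grid, x):
    """Linear interpolation of grid values (one row per node) at positions x."""
    X = torch.floor(x).long()          # index of the node to the left of x
    a = (x - X).unsqueeze(-1)          # blending weight in [0, 1)
    return grid[X] * (1.0 - a) + grid[X + 1] * a

# Hypothetical grid with 3 cells (4 nodes) and one feature: node i stores 10*i.
grid = torch.tensor([[0.0], [10.0], [20.0], [30.0]], requires_grad=True)
x = torch.tensor([0.5, 1.25])

y = lerp_1d(grid, x)
print(y.squeeze(-1))                   # values 5.0 and 12.5

# Backpropagate the sum of the outputs: the gradient of each grid node is the
# total blending weight it received from the queries.
y.sum().backward()
print(grid.grad.squeeze(-1))           # values 0.5, 1.25, 0.25, 0.0
```

The last node is never touched by either query, so its gradient is zero: samples only update the grid values in their own cell.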

To "learn" a function we need to provide training data, and minimize a loss function. As an example, maybe we can learn a sine and cosine function at the same time? Let us create some training data by creating 500 samples of these two functions:

In [5]:
grid_size = 20 # we use a grid size of 20, allowing for x values between 0 and 20
num_samples = 500 # we use 500 samples to train our model
x_samples = torch.rand((num_samples,)) * grid_size
y_samples = torch.stack([torch.sin(x_samples * 2 * torch.pi / grid_size) + 0.1 * torch.randn((num_samples, )), 
                         torch.cos(x_samples * 2 * torch.pi / grid_size) + 0.1 * torch.randn((num_samples, ))], dim=1)
print(y_samples.shape)

torch.Size([500, 2])


The training code below is a standard way of training a neural network using PyTorch, which we will abuse here to optimize instead for the parameters of a `LineGrid`. That is possible because all the operations inside our `LineGrid` class are differentiable, so stochastic gradient descent (SGD) will just work. The loss we will minimize is the **Mean-Squared Error** (MSE) loss function, which measures the average squared difference between the predicted values and the training data values. This is the standard loss function for so-called *regression* problems, where we are trying to fit a continuous function.
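
As a small check on what `nn.MSELoss` computes, the snippet below evaluates the loss by hand on made-up predictions and targets. With the default `reduction='mean'`, it is the mean of the squared differences over all elements, including the feature dimension.

```python
import torch
import torch.nn as nn

# Made-up predictions and targets for 2 samples with 2 features each.
pred = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
target = torch.tensor([[1.0, 0.0], [0.0, 4.0]])

manual = ((pred - target) ** 2).mean()   # (0 + 4 + 9 + 0) / 4
builtin = nn.MSELoss()(pred, target)
print(manual.item(), builtin.item())     # both 3.25
```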

Inside the training loop below, you'll find the typical sequence of operations: zeroing gradients, performing a forward pass to get predictions, computing the loss, doing a backward pass to compute the gradients, and taking an optimizer step to update the model's parameters. Try to understand the code, as we will use it as is later to learn 3D neural radiance fields, and in fact this same training loop is at the core of most deep learning architectures. Now, let's take a closer look at the code itself, which is extensively documented for clarity:

In [6]:
def train(model, x_samples, y_samples, learning_rate=0.3, num_epochs=601, checkpoint_freq=100):
    # Initialize Stochastic Gradient Descent optimizer
    optimizer = optim.SGD(model.parameters(), lr=learning_rate)
    
    # Initialize the built-in Mean-Squared Error loss function
    mse = nn.MSELoss()

    # Loop over the dataset multiple times (each loop is an epoch)
    for epoch in range(num_epochs):
        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass: Compute predicted y by passing x_samples through the model
        output = model(x_samples)

        # Compute loss using built-in MSE loss function
        loss = mse(output, y_samples)

        # Backward pass: Compute gradient of the loss with respect to model parameters
        loss.backward()

        # Update model parameters using optimizer
        optimizer.step()

        # Print loss at specified checkpoint frequencies
        if epoch % checkpoint_freq == 0:
            print(f'Loss at epoch {epoch}: {loss.item()}')
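
To make the update performed by `optimizer.step()` concrete, the sketch below applies one step of plain SGD (no momentum or weight decay) to a made-up two-element parameter and compares it with the textbook rule $w \leftarrow w - \eta \nabla_w L$. The parameter values and the quadratic loss are chosen only for illustration.

```python
import torch
import torch.optim as optim

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.3)

optimizer.zero_grad()
loss = (w ** 2).sum()                   # gradient of this loss is 2 * w
loss.backward()

expected = w.detach() - 0.3 * w.grad    # plain gradient-descent update
optimizer.step()
print(w.detach(), expected)             # both are [0.4, -0.8]
```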


Using this code is rather trivial: we initialize a grid with random values, and call `train`:

In [7]:
# Initialize model
model = LineGrid(size=grid_size, features=2)  # features=2 as we are regressing both sin and cos

# Run the training loop
train(model, x_samples, y_samples)

Loss at epoch 0: 1.3888176679611206
Loss at epoch 100: 0.14943578839302063
Loss at epoch 200: 0.03693899139761925
Loss at epoch 300: 0.017958099022507668
Loss at epoch 400: 0.012759515084326267
Loss at epoch 500: 0.010972117073833942
Loss at epoch 600: 0.010292686522006989
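
The loss levels off near 0.01 rather than approaching zero. One plausible explanation is the noise in the training data: it was generated with standard deviation 0.1, so even a perfect fit of the underlying sine and cosine would leave a mean squared error of about $0.1^2 = 0.01$. The sketch below checks this by regenerating data the same way as above (with an arbitrary seed) and measuring the MSE of the noise-free functions against the noisy samples.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make this check repeatable
grid_size, num_samples, noise_std = 20, 500, 0.1

x = torch.rand((num_samples,)) * grid_size
clean = torch.stack([torch.sin(x * 2 * torch.pi / grid_size),
                     torch.cos(x * 2 * torch.pi / grid_size)], dim=1)
noisy = clean + noise_std * torch.randn_like(clean)

# MSE of the true functions against the noisy samples: roughly noise_std**2
noise_floor = torch.mean((clean - noisy) ** 2)
print(noise_floor.item())
```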


We can then perform inference and plot the result against the training data, and we see that we get decent approximations of sin and cos, even with noisy training data:

In [8]:
y_pred = model(x_samples).detach().numpy()

fig = plotly.graph_objects.Figure()
fig.add_scatter(x=x_samples, y=y_samples[:, 0], mode='markers', name='sin')
fig.add_scatter(x=x_samples, y=y_samples[:, 1], mode='markers', name='cos')
fig.add_scatter(x=x_samples, y=y_pred[:, 0], mode='markers', name='predicted sin')
fig.add_scatter(x=x_samples, y=y_pred[:, 1], mode='markers', name='predicted cos')
fig.show()


## A Differentiable Voxel Grid

The scheme above can be easily extended to 3D voxel grids, although the interpolation function has to be changed from simple linear interpolation over an interval, to **trilinear interpolation** on voxels. The code below implements this, and it really is not that complicated: the values on the *eight* corners of each voxel are combined using three blending weights, depending on where the queried point lies within that voxel:

In [9]:
def interpolate(v0, v1, alpha):
    # Interpolate between v0 and v1 using alpha, using unsqueeze to properly handle batches
    return v0 * (1 - alpha.unsqueeze(-1)) + v1 * alpha.unsqueeze(-1)

class VoxelGrid(nn.Module):
    def __init__(self, shape, features=1):
        super(VoxelGrid, self).__init__()
        self.grid = nn.Parameter(nn.init.normal_(torch.empty(*shape, features)))


    def forward(self, p):
        x, y, z = p[..., 0], p[..., 1], p[..., 2]
        X, Y, Z = torch.floor(x).long(), torch.floor(y).long(), torch.floor(z).long()
        a, b, c = x - X, y - Y, z - Z # blending weights along each axis

        c00 = interpolate(self.grid[Z, Y, X, :], self.grid[Z, Y, X + 1, :], a)
        c01 = interpolate(self.grid[Z, Y + 1, X, :], self.grid[Z, Y + 1, X + 1, :], a)
        c10 = interpolate(self.grid[Z + 1, Y, X, :], self.grid[Z + 1, Y, X + 1, :], a)
        c11 = interpolate(self.grid[Z + 1, Y + 1, X, :], self.grid[Z + 1, Y + 1, X + 1, :], a)

        c0 = interpolate(c00, c01, b)
        c1 = interpolate(c10, c11, b)
        
        return interpolate(c0, c1, c).squeeze(-1)
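
In equation form, trilinear interpolation can be written as follows. The notation is introduced here for illustration: $a$, $b$, $c$ are the fractional offsets along $x$, $y$, $z$, and $g_{kji}$ is the grid value at the voxel corner with offsets $k, j, i \in \{0, 1\}$ along $z$, $y$, $x$.

$$
\begin{aligned}
c_{00} &= (1-a)\,g_{000} + a\,g_{001}, & c_{01} &= (1-a)\,g_{010} + a\,g_{011},\\
c_{10} &= (1-a)\,g_{100} + a\,g_{101}, & c_{11} &= (1-a)\,g_{110} + a\,g_{111},\\
c_{0} &= (1-b)\,c_{00} + b\,c_{01}, & c_{1} &= (1-b)\,c_{10} + b\,c_{11},\\
f(x,y,z) &= (1-c)\,c_{0} + c\,c_{1}.
\end{aligned}
$$

Each of the four edges along $x$ is blended with weight $a$, the two resulting pairs are blended along $y$ with weight $b$, and the final pair is blended along $z$ with weight $c$.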

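The `VoxelGrid` effectively defines a parameterized function in 3D. When we query it, we need to provide 3D coordinates. Because `forward` also reads the neighbor at index + 1, each coordinate must lie in $[0, n-1)$ for a grid with $n$ nodes along that axis. For example, the code below initializes a `VoxelGrid` with random values, and then evaluates a scalar function at a 3D point: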

In [10]:
voxel_grid_module = VoxelGrid(shape = (6, 6, 6), features=1)

point = torch.Tensor([1.5, 2.7, 3.4])
output = voxel_grid_module(point)
print("Interpolated Output:", output)

Interpolated Output: tensor(0.9033, grad_fn=<SqueezeBackward1>)
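
A useful sanity check for any trilinear interpolation routine is that it reproduces a function that is linear in $x$, $y$ and $z$ exactly. The sketch below is a stand-alone scalar version of the same scheme (the name `trilinear` and the test function are chosen here for illustration), using the same `grid[Z, Y, X]` index order as `VoxelGrid`.

```python
import torch

def trilinear(grid, p):
    """Trilinear interpolation of grid[Z, Y, X] at points p = (x, y, z)."""
    x, y, z = p[..., 0], p[..., 1], p[..., 2]
    X, Y, Z = x.floor().long(), y.floor().long(), z.floor().long()
    a, b, c = x - X, y - Y, z - Z      # blending weights along each axis
    lerp = lambda v0, v1, t: v0 * (1 - t) + v1 * t
    c00 = lerp(grid[Z, Y, X], grid[Z, Y, X + 1], a)
    c01 = lerp(grid[Z, Y + 1, X], grid[Z, Y + 1, X + 1], a)
    c10 = lerp(grid[Z + 1, Y, X], grid[Z + 1, Y, X + 1], a)
    c11 = lerp(grid[Z + 1, Y + 1, X], grid[Z + 1, Y + 1, X + 1], a)
    return lerp(lerp(c00, c01, b), lerp(c10, c11, b), c)

# Fill a 6x6x6 grid with the linear function f = x + 10*y + 100*z at its nodes.
idx = torch.arange(6, dtype=torch.float32)
zz, yy, xx = torch.meshgrid(idx, idx, idx, indexing="ij")
grid = xx + 10 * yy + 100 * zz

p = torch.tensor([[1.5, 2.7, 3.4], [2.3, 4.6, 1.1]])
expected = p[:, 0] + 10 * p[:, 1] + 100 * p[:, 2]
print(trilinear(grid, p), expected)    # the two tensors should agree
```

A check like this is sensitive to corner mix-ups: swapping any pair of corner values breaks the agreement.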


However, the code is much more powerful than this. As an example, below we create a grid with a 4D function, and evaluate it at a 2x2 batch of 3D points:

In [11]:
voxel_grid_module = VoxelGrid(shape = (6, 6, 6), features=4)

points = torch.Tensor([[[1.5, 2.7, 3.4], [2.3, 4.6, 1.1]], [[2.3, 4.6, 1.1], [2.3, 4.6, 1.1]]])
output = voxel_grid_module(points)
print("Interpolated Output:", output.shape)
print("Interpolated Output:", output)

Interpolated Output: torch.Size([2, 2, 4])
Interpolated Output: tensor([[[-0.5342, -0.0963, -0.1275, -0.4976],
         [-0.3181,  0.1425,  0.1531,  0.1384]],

        [[-0.3181,  0.1425,  0.1531,  0.1384],
         [-0.3181,  0.1425,  0.1531,  0.1384]]], grad_fn=<SqueezeBackward1>)


Being able to handle large batches of points is crucial when training with stochastic gradient descent, and especially when training a NeRF, which we finally get to in the section below.

## Differentiable Rendering