# Neural Radiance Fields

#### What problem are we trying to solve?
- View synthesis: generating images of a scene from new, continuously varying viewpoints

### What are Neural Radiance Fields?

- A neural network encoding of a 3D scene that supports continuous viewpoints; the network is deliberately overfit to a single scene
- The weights of the neural network represent the scene


#### Representation of a scene as a continuous 5D function (NN)

***Input:***
- $$(x,y,z,\theta, \phi)$$
    - $$(x,y,z)$$: spatial coordinates
    - $$(\theta, \phi)$$: viewing direction

***Output:***
- $$(r,g,b,\sigma)$$
    - $$(r,g,b)$$: color channels
    - $$\sigma$$: volume density
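
The code later in this notebook passes the viewing direction to the network as a 3D vector rather than as two angles. The sketch below shows one way to convert between the two. The function name `direction_from_angles` is hypothetical, and the angle convention (polar angle from +z, azimuth from +x) is an assumption, since these notes do not fix one.

```python
import math

def direction_from_angles(theta, phi):
    """Convert spherical angles (radians) to a unit direction vector.

    Assumed convention: theta is the polar angle measured from +z,
    phi is the azimuth in the xy-plane measured from +x.
    """
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )

# Under the assumed convention this direction points along +x.
d = direction_from_angles(math.pi / 2, 0.0)
```

Because the result always has unit length, the two angles and the 3D vector carry the same information.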
    
#### Techniques used
- Volume rendering (to accumulate samples from a scene representation along rays):

1) March camera rays through the scene to generate a sampled set of 3D points.
2) Use those points and their corresponding 2D viewing directions as input to the neural network to produce an output set of colors and densities.
3) Use classical volume rendering techniques to accumulate those colors and densities into a 2D image.

- This pipeline is naturally differentiable, so gradient descent can optimize the model by minimizing the error between each observed image and the corresponding view rendered from the representation.
- Minimizing this error across multiple views encourages the network to predict a coherent model of the scene by assigning high volume densities and accurate colors to the locations that contain the true underlying scene content.
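
The accumulation step can be sketched for a single ray in plain Python. This is a minimal illustration that mirrors the quantities used by `render_rays` later in this notebook (`alpha`, transmittance, `weights`); the function name `composite_ray` and the toy inputs are made up for the example.

```python
import math

def composite_ray(sigmas, colors, deltas):
    """Accumulate per-sample densities and colors into one pixel color."""
    transmittance = 1.0          # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha           # contribution of this sample
        weights.append(weight)
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha             # light left for later samples
    return pixel, weights

# Toy ray: empty space, then a dense red surface, then green behind it.
sigmas = [0.0, 50.0, 50.0]
colors = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
deltas = [1.0, 1.0, 1.0]
pixel, weights = composite_ray(sigmas, colors, deltas)
```

The empty sample gets zero weight, the red surface absorbs essentially all the light, and the green sample behind it is occluded. Every operation is differentiable in `sigma` and `color`, which is what lets the rendering error propagate back to the network.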

#### Training the Neural network

- A basic implementation of optimizing a neural radiance field for a complex scene does not converge to a sufficiently high-resolution representation and is inefficient in the number of samples required per camera ray.

- Two changes address these issues: a **positional encoding** of the input 5D coordinates enables the MLP to represent higher-frequency functions, and a hierarchical sampling procedure reduces the number of queries needed to adequately sample this high-frequency scene representation.
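
A plain-Python sketch of the positional encoding, following the variant used by `NerfModel.positional_encoding` in this notebook: the raw input is kept, and each frequency `2**j` contributes a sine and a cosine per coordinate. The sample point is arbitrary.

```python
import math

def positional_encoding(p, num_freqs):
    """Encode a list of coordinates as [p, sin(2^j p), cos(2^j p)] for j < num_freqs."""
    out = list(p)
    for j in range(num_freqs):
        out.extend(math.sin(2 ** j * v) for v in p)
        out.extend(math.cos(2 ** j * v) for v in p)
    return out

# A 3D position with 10 frequencies yields 3 + 6 * 10 = 63 features.
enc = positional_encoding([0.1, -0.4, 0.7], num_freqs=10)
```

The length `3 + 6 * num_freqs` is where the `embedding_dim_pos * 6 + 3` input size of the first linear layer comes from.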

#### Advantages
- Neural radiance fields can represent complex real-world geometry and appearance and are well suited for gradient-based optimization using projected images.
- They overcome the prohibitive storage costs of discretized voxel grids when modeling complex scenes at high resolutions.
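
To get a rough feel for the storage argument, the sketch below counts the parameters of an MLP with the same layer sizes as the `NerfModel(hidden_dim=256)` trained in this notebook and compares that to a dense voxel grid. The grid resolution (512 per axis, 4 float32 channels per voxel) is an assumption chosen only for illustration.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

def nerf_mlp_params(pos_freqs=10, dir_freqs=4, hidden=256):
    pos_dim = pos_freqs * 6 + 3   # encoded position size
    dir_dim = dir_freqs * 6 + 3   # encoded direction size
    layers = [
        # block1
        (pos_dim, hidden), (hidden, hidden), (hidden, hidden), (hidden, hidden),
        # block2 (skip connection re-injects the encoded position)
        (pos_dim + hidden, hidden), (hidden, hidden), (hidden, hidden), (hidden, hidden + 1),
        # block3 and block4 (color head)
        (dir_dim + hidden, hidden // 2),
        (hidden // 2, 3),
    ]
    return sum(linear_params(i, o) for i, o in layers)

BYTES_PER_FLOAT32 = 4
mlp_bytes = nerf_mlp_params() * BYTES_PER_FLOAT32
# Hypothetical dense grid: 512^3 voxels, each storing (r, g, b, sigma).
voxel_bytes = 512 ** 3 * 4 * BYTES_PER_FLOAT32

print(f"MLP:        {mlp_bytes / 2**20:.1f} MiB")
print(f"Voxel grid: {voxel_bytes / 2**30:.1f} GiB")
```

Under these assumptions the network is roughly three orders of magnitude smaller than the grid, and the grid grows cubically with resolution while the network size stays fixed.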

##### 3 main contributions
- An approach for representing continuous scenes with complex geometry and materials as 5D neural radiance fields, parameterized as basic MLP networks.
- A differentiable rendering procedure based on classical volume rendering techniques, which we use to optimize these representations from standard RGB images. This includes a hierarchical sampling strategy to allocate the MLP’s capacity towards space with visible scene content.
- A positional encoding to map each input 5D coordinate into a higher dimensional space, which enables us to successfully optimize neural radiance fields to represent high-frequency scene content.
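
The hierarchical sampling strategy named in the contributions can be sketched as inverse-transform sampling: weights from a coarse pass define a piecewise-constant distribution along the ray, and a second set of depths is drawn from it. The code in this notebook uses only a single stratified pass, so this is an illustration of the idea and not part of the implementation; `sample_pdf`, the bin edges, and the weights are hypothetical.

```python
import bisect
import random

def sample_pdf(bin_edges, weights, n_samples, rng=None):
    """Draw depths from the piecewise-constant PDF defined by `weights`."""
    rng = rng or random.Random(0)
    eps = 1e-5                                  # keeps empty bins from having zero width in the CDF
    total = sum(w + eps for w in weights)
    cdf = [0.0]
    for w in weights:
        cdf.append(cdf[-1] + (w + eps) / total)
    samples = []
    for _ in range(n_samples):
        u = rng.random()
        # Find the bin whose CDF interval contains u, then interpolate inside it.
        i = min(bisect.bisect_right(cdf, u) - 1, len(weights) - 1)
        frac = (u - cdf[i]) / (cdf[i + 1] - cdf[i])
        samples.append(bin_edges[i] + frac * (bin_edges[i + 1] - bin_edges[i]))
    return sorted(samples)

# Coarse pass found most of the visible content between depths 4 and 5.
bin_edges = [2.0, 3.0, 4.0, 5.0, 6.0]
coarse_weights = [0.0, 0.05, 0.9, 0.05]
fine_depths = sample_pdf(bin_edges, coarse_weights, n_samples=1000)
```

Most of the new depths land in the bin with the largest weight, so the network is queried mainly where the scene content is.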

## Neural Scene Representation

### Neural 3D shape representation
- Implicit representation of continuous 3D shapes as level sets: deep networks are optimized to map xyz coordinates to signed distance functions or occupancy fields
    - However, these models are limited by their requirement of access to ground truth 3D geometry typically obtained from synthetic 3D shape datasets such as ShapeNet
    
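- Relaxed models: later work drops the ground-truth 3D requirement by using differentiable rendering, so the shape representation can be optimized from 2D images alone
    - 3D occupancy fields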
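
To make the level-set idea from the list above concrete, here is an analytic signed distance function for a sphere. In the learned setting a network stands in for this closed-form function; the name `sphere_sdf` and the sphere itself are chosen only for illustration.

```python
import math

def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside."""
    return math.dist(point, center) - radius

inside = sphere_sdf((0.0, 0.0, 0.0))
on_surface = sphere_sdf((1.0, 0.0, 0.0))
outside = sphere_sdf((2.0, 0.0, 0.0))
```

The surface is the set of points where the function equals zero, which is the "level set" the notes refer to.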
    
### View synthesis and image-based rendering

- Simple light field sample interpolation techniques (require dense sampling of views)
- One popular class of approaches uses mesh-based representations of scenes with either diffuse or view-dependent appearance (works with sparser view sampling)
    - However, gradient-based mesh optimization based on image reprojection is often difficult, likely because of local minima or poor conditioning of the loss landscape. Furthermore, this strategy requires a template mesh with fixed topology to be provided as an initialization before optimization, which is typically unavailable for unconstrained real-world scenes.

#### Volumetric Representation
- Input: RGB images. Volumetric approaches can realistically represent complex shapes and materials, are well suited for gradient-based optimization, and tend to produce less visually distracting artifacts than mesh-based methods.
- Voxel coloring


## Import Statements

In [None]:
import torch
from torch.utils.data import DataLoader
import numpy as np
from tqdm.notebook import tqdm, trange
import torch.nn as nn
import matplotlib.pyplot as plt
import warnings
import logging
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)
warnings.filterwarnings("ignore")

## Defining the Neural network structure

In [None]:
class NerfModel(nn.Module):
    def __init__(self, embedding_dim_pos=10, embedding_dim_direction=4, hidden_dim=128):
        super(NerfModel, self).__init__()

        self.block1 = nn.Sequential(nn.Linear(embedding_dim_pos * 6 + 3, hidden_dim), nn.ReLU(),
                                    nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                    nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                    nn.Linear(hidden_dim, hidden_dim), nn.ReLU(), )

        self.block2 = nn.Sequential(nn.Linear(embedding_dim_pos * 6 + hidden_dim + 3, hidden_dim), nn.ReLU(),
                                    nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                    nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                    nn.Linear(hidden_dim, hidden_dim + 1), )

        self.block3 = nn.Sequential(nn.Linear(embedding_dim_direction * 6 + hidden_dim + 3, hidden_dim // 2), nn.ReLU(), )
        self.block4 = nn.Sequential(nn.Linear(hidden_dim // 2, 3), nn.Sigmoid(), )

        self.embedding_dim_pos = embedding_dim_pos
        self.embedding_dim_direction = embedding_dim_direction
        self.relu = nn.ReLU()

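    @staticmethod
    def positional_encoding(x, L):
        # Map each coordinate to [x, sin(2^j x), cos(2^j x)] for j = 0..L-1,
        # which gives 3 + 6L features for a 3D input.
        out = [x]
        for j in range(L):
            out.append(torch.sin(2 ** j * x))
            out.append(torch.cos(2 ** j * x))
        return torch.cat(out, dim=1)

    def forward(self, o, d):
        # o: [N, 3] sample positions, d: [N, 3] viewing directions
        emb_x = self.positional_encoding(o, self.embedding_dim_pos)
        emb_d = self.positional_encoding(d, self.embedding_dim_direction)
        h = self.block1(emb_x)
        # Skip connection: re-inject the encoded position halfway through the MLP
        tmp = self.block2(torch.cat((h, emb_x), dim=1))
        # The last channel is the density; ReLU keeps it non-negative
        h, sigma = tmp[:, :-1], self.relu(tmp[:, -1])
        # Color depends on the viewing direction, density does not
        h = self.block3(torch.cat((h, emb_d), dim=1))
        c = self.block4(h)  # RGB in [0, 1] via the final Sigmoid
        return c, sigma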

## Functions for sampling Rays
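
`render_rays` in this section picks one depth per bin and jitters it randomly, which is known as stratified sampling. The plain-Python sketch below reproduces that bin construction for a single ray so the indexing is easier to follow; `stratified_depths` is a hypothetical helper and is not used by the notebook.

```python
import random

def stratified_depths(near, far, n_bins, rng=None):
    """One jittered depth per bin between `near` and `far` (n_bins >= 2)."""
    rng = rng or random.Random(0)
    # Evenly spaced depths, as produced by torch.linspace(near, far, n_bins)
    t = [near + (far - near) * i / (n_bins - 1) for i in range(n_bins)]
    # Bin boundaries are the midpoints between neighbouring depths
    mid = [(a + b) / 2 for a, b in zip(t[:-1], t[1:])]
    lower = [t[0]] + mid
    upper = mid + [t[-1]]
    # Uniform jitter inside each bin
    return [lo + (hi - lo) * rng.random() for lo, hi in zip(lower, upper)]

depths = stratified_depths(2.0, 6.0, 8)
```

Because the bins do not overlap, the jittered depths stay ordered along the ray, and over many training iterations every depth in the range gets sampled.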

In [None]:
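def compute_accumulated_transmittance(alphas):
    # Input is (1 - alpha_i) per sample. The cumulative product shifted right by one
    # gives T_i: the fraction of light that survives all samples in front of sample i.
    at = torch.cumprod(alphas, 1)
    return torch.cat((torch.ones((at.shape[0], 1), device=alphas.device), at[:, :-1]), dim=1)

def render_rays(nerf_model, ray_origins, ray_directions, hn=0, hf=0.5, nb_bins=192):
    device = ray_origins.device
    # Evenly spaced depths between the near (hn) and far (hf) bounds, one row per ray
    t = torch.linspace(hn, hf, nb_bins, device=device).expand(ray_origins.shape[0], nb_bins)

    # Stratified sampling: jitter each depth uniformly within its bin
    mid = (t[:, :-1] + t[:, 1:]) / 2
    lower = torch.cat((t[:, :1], mid), -1)
    upper = torch.cat((mid, t[:, -1:]), -1)
    u = torch.rand(t.shape, device=device)
    t = lower + (upper - lower) * u  # [batch_size, nb_bins]

    # Distance between adjacent samples; the last one is treated as infinite
    delta = torch.cat((t[:, 1:] - t[:, :-1], torch.tensor([1e10], device=device).expand(ray_origins.shape[0], 1)), -1)

    # 3D sample positions along each ray: [batch_size, nb_bins, 3]
    x = ray_origins.unsqueeze(1) + t.unsqueeze(2) * ray_directions.unsqueeze(1)
    # Repeat each ray direction for every sample on that ray
    ray_directions = ray_directions.expand(nb_bins, ray_directions.shape[0], 3).transpose(0, 1)

    colors, sigma = nerf_model(x.reshape(-1, 3), ray_directions.reshape(-1, 3))
    colors = colors.reshape(x.shape)
    sigma = sigma.reshape(x.shape[:-1])

    # Alpha compositing: weight_i = T_i * alpha_i
    alpha = 1 - torch.exp(-sigma * delta)
    weights = compute_accumulated_transmittance(1 - alpha).unsqueeze(2) * alpha.unsqueeze(2)
    c = (weights * colors).sum(dim=1)
    weight_sum = weights.sum(-1).sum(-1)

    # Composite onto a white background: rays that hit nothing render as white
    return c + 1 - weight_sum.unsqueeze(-1)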
    
    

## Function for training and sampling the model

In [None]:
import os

def train(nerf_model, optimizer, scheduler, data_loader, device='cpu', hn=0, hf=1, nb_epochs=int(1e5), nb_bins=192, H=400, W=400, render_each_epoch=False):
    training_loss = []
    for _ in trange(nb_epochs):
        for batch in tqdm(data_loader):
            # Each row holds a ray origin (3), a ray direction (3), and the target pixel color
            ray_origins = batch[:, :3].to(device)
            ray_directions = batch[:, 3:6].to(device)
            ground_truth_px_values = batch[:, 6:].to(device)

            regenerated_px_values = render_rays(nerf_model, ray_origins, ray_directions, hn=hn, hf=hf, nb_bins=nb_bins)
            # Sum of squared errors over the batch
            loss = ((ground_truth_px_values - regenerated_px_values) ** 2).sum()

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            training_loss.append(loss.item())
        scheduler.step()

        if render_each_epoch:
            print('Saving New Viewpoints')
            # Relies on the notebook-level `testing_dataset` defined in the loading cell
            for img_index in range(200):
                test(hn, hf, testing_dataset, img_index=img_index, nb_bins=nb_bins, H=H, W=W, device=device, nerf_model=nerf_model)

    return training_loss

@torch.no_grad()
def test(hn, hf, dataset, chunk_size=10, img_index=0, nb_bins=192, H=400, W=400, device='cpu', nerf_model=None):
    # Fall back to the notebook-level `model` so calls that omit `nerf_model` keep working
    if nerf_model is None:
        nerf_model = model

    # The dataset stores H*W rays per image, one image after another
    ray_origins = dataset[img_index * H * W: (img_index + 1) * H * W, :3]
    ray_directions = dataset[img_index * H * W: (img_index + 1) * H * W, 3:6]

    data = []

    # Render `chunk_size` image rows at a time to limit memory use
    for i in range(int(np.ceil(H / chunk_size))):
        ray_origins_ = ray_origins[i * W * chunk_size: (i + 1) * W * chunk_size].to(device)
        ray_directions_ = ray_directions[i * W * chunk_size: (i + 1) * W * chunk_size].to(device)
        regenerated_px_values = render_rays(nerf_model, ray_origins_, ray_directions_, hn=hn, hf=hf, nb_bins=nb_bins)
        data.append(regenerated_px_values)

    img = torch.cat(data).cpu().numpy().reshape(H, W, 3)

    # Create the output directory if needed so savefig does not fail
    os.makedirs('novel_views', exist_ok=True)
    plt.figure()
    plt.imshow(img)
    plt.savefig(f'novel_views/img_{img_index}.png', bbox_inches='tight')
    plt.close()

## Load and Train the model
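
The training cell in this section halves the learning rate at epochs 2, 4, and 8 through `MultiStepLR`. The sketch below recomputes that schedule in plain Python to show which rate each of the 16 epochs uses. It is a re-derivation from the scheduler's documented behaviour, not PyTorch's own code, and `multistep_lr` is a hypothetical helper.

```python
from bisect import bisect_right

def multistep_lr(base_lr, milestones, gamma, epoch):
    """Learning rate in effect during `epoch` (0-based), with one scheduler step per epoch."""
    # Each milestone already reached multiplies the rate by gamma once.
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)

schedule = [multistep_lr(5e-4, [2, 4, 8], 0.5, epoch) for epoch in range(16)]
for epoch, lr in enumerate(schedule):
    print(epoch, lr)
```

With these settings the last eight epochs all run at one eighth of the initial rate.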

In [None]:
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"

print(device)

training_dataset = torch.from_numpy(np.load('./data/training_data.pkl', allow_pickle=True))
testing_dataset = torch.from_numpy(np.load('./data/testing_data.pkl', allow_pickle=True))

model = NerfModel(hidden_dim=256).to(device)

model_optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(model_optimizer, milestones=[2,4,8], gamma=0.5)

data_loader = DataLoader(training_dataset, batch_size=1024, shuffle=True)
train(model, model_optimizer, scheduler, data_loader, nb_epochs=16, device=device, hn=2, hf=6, nb_bins=192, H=400, W=400, render_each_epoch=False)


## Sample the model to generate a new View Point

In [None]:
for img_index in range(200):
    test(hn=2, hf=6, dataset=testing_dataset, chunk_size=10, img_index=img_index, nb_bins=192, H=400, W=400, device=device)