# Equivariant Neural Rendering

## Imports
- Import the standard libraries and the modules from the cloned `equiv-neural-rendering` repository

In [None]:
import random, os, sys
import matplotlib.pyplot as plt
from matplotlib import image as mpimg
%matplotlib inline
import imageio
import torch
import torchvision
from torchvision.transforms import ToTensor


sys.path.append('/content/equiv-neural-rendering/')
from models.neural_renderer import *


### Loading and plotting the original image

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


def plot_img_tensor(img, nrow=4):
    """Helper function to plot image tensors.
    
    Args:
        img (torch.Tensor): Image or batch of images of shape 
            (batch_size, channels, height, width).
    """
    img_grid = torchvision.utils.make_grid(img, nrow=nrow)
    plt.imshow(img_grid.cpu().numpy().transpose(1, 2, 0))

# Load trained chairs model
model = load_model('/content/equiv-neural-rendering/trained-models/chairs.pt').to(device)

# You can also try loading other examples (e.g. 'chair1.png')
img = imageio.imread('/content/equiv-neural-rendering/imgs/example-data/chair4.png')
# Visualize image
plt.imshow(img)

### Rendering the scene representation without rotation and translation

In [None]:
# Convert image to tensor and add batch dimension
img_source = ToTensor()(img)
img_source = img_source.unsqueeze(0).to(device)

# Infer scene representation
scene = model.inverse_render(img_source)

# We can render the scene representation without rotating it
rendered = model.render(scene)

plot_img_tensor(rendered.detach())

org = rendered.detach().clone()
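
To put a number on how closely a render matches its source (or, later, how much a transformed render differs from `org`), a per-pixel error can be computed. The sketch below is a minimal illustration on NumPy arrays. `image_mse` and `image_psnr` are hypothetical helper names that are not part of the repository, and the code assumes both images are float arrays scaled to [0, 1] with identical shapes.

```python
import numpy as np


def image_mse(a, b):
    """Mean squared error between two images with values in [0, 1]."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return float(np.mean((a - b) ** 2))


def image_psnr(a, b, max_value=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = image_mse(a, b)
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Toy data standing in for a source image and its render (3 x 64 x 64).
rng = np.random.default_rng(0)
source = rng.random((3, 64, 64))
render = np.clip(source + 0.05, 0.0, 1.0)

print(image_mse(source, source))   # 0.0 for identical images
print(image_psnr(source, render))  # finite value in dB
```

With the tensors from this notebook, one would pass `img_source.cpu().numpy()` and `rendered.detach().cpu().numpy()`, provided their shapes match.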

### Rotating and translating the scene representation and rendering a novel view

In [None]:
# As a rotation matrix can feel a little abstract, we can also reason in terms of 
# camera azimuth and elevation. The initial coordinate at which the source image
# is observed is given by the following azimuth and elevation. Note that these
# are not necessary to generate novel views (as shown above), we just use them 
# for convenience to generate rotation matrices
azimuth_source = torch.Tensor([0.]).to(device)
elevation_source = torch.Tensor([0.]).to(device)
translations_source = torch.Tensor([0., 0., 0.]).to(device)

# You can set these to any value you like!
# Positive (negative) values correspond to moving camera to the right (left)
azimuth_shift = torch.Tensor([0.]).to(device)  
# Positive (negative) values correspond to moving camera up (down)
elevation_shift = torch.Tensor([0.]).to(device)
# Translation values
translations_shift = torch.Tensor([0., -0.5, 0.]).to(device)

azimuth_target = azimuth_source + azimuth_shift
elevation_target = elevation_source + elevation_shift
translations_target = translations_source + translations_shift

# Transform (rotate and translate) the scene to match the target camera pose
rotated_scene = model.rotate_source_to_target(
    scene, 
    azimuth_source, elevation_source, translations_source,
    azimuth_target, elevation_target, translations_target
)

# Render the transformed scene
rendered = model.render(rotated_scene)

plot_img_tensor(rendered.detach())
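
The cell above passes azimuth and elevation to `model.rotate_source_to_target` rather than a rotation matrix. How the repository converts these angles internally is not shown in this notebook, so the sketch below uses an assumed convention purely for illustration: angles in degrees, y as the up axis, elevation as a rotation about z and azimuth as a rotation about y. All function names are hypothetical.

```python
import numpy as np


def rotation_y(angle_deg):
    """Rotation about the y-axis by `angle_deg` degrees."""
    t = np.deg2rad(angle_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def rotation_z(angle_deg):
    """Rotation about the z-axis by `angle_deg` degrees."""
    t = np.deg2rad(angle_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


def camera_rotation(azimuth_deg, elevation_deg):
    """Assumed convention: elevation about z first, then azimuth about y."""
    return rotation_y(azimuth_deg) @ rotation_z(elevation_deg)


def relative_rotation(az_src, el_src, az_tgt, el_tgt):
    """Rotation that maps the source camera orientation to the target one."""
    r_src = camera_rotation(az_src, el_src)
    r_tgt = camera_rotation(az_tgt, el_tgt)
    # The inverse of a rotation matrix is its transpose.
    return r_tgt @ r_src.T


# Source view at (0, 0); target view shifted by 30 degrees azimuth, 10 elevation.
r = relative_rotation(0.0, 0.0, 30.0, 10.0)
print(np.round(r, 3))
```

Whatever convention is used, the relative rotation reduces to the identity when source and target angles coincide, which is a convenient sanity check.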

## 1. Introduction

*analysis of key components*

Part 1: Intro
- brief introduction on paper:
The paper by Dupont et al. introduces an approach for inferring implicit, equivariant 3D scene representations from 2D images. The authors argue that a scene representation need not be explicit, as long as transformations applied to it act in an equivariant manner. Their model is trained on images of scenes related by known camera rotations and learns to produce novel views from a single image of a scene.

- brief motivation of paper (equivariant representations)
  - Implicit representations & View synthesis (Figure 2)
- define key goal: Learning equivariant scene representations from data

  
Part 2: Methodology
- Model design
- Transformations 
- equivariance, loss-definition (figure 4/5)
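
For reference, the equivariance property named in this outline can be written compactly. The notation is our own shorthand and an assumption rather than a quotation from the paper: $f$ maps an image to a scene representation, $g$ renders a scene back to an image, and $T^X$ and $T^Z$ denote the action of the same transformation on images and on scenes, respectively.

$$
f(T^X x) = T^Z f(x)
$$

Given two views $x_1$ and $x_2$ of one scene related by a known transformation, one way to turn this property into a training signal is to transform each inferred scene into the other view and compare the renders with the observed images:

$$
\mathcal{L} = \lVert x_2 - g\big(T^Z f(x_1)\big) \rVert + \lVert x_1 - g\big((T^Z)^{-1} f(x_2)\big) \rVert
$$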

Part 3: Experiments of paper


Part 4: Datasets
- Their dataset
- Our focus (building new one)

## 2. Response 

Much of the success of deep learning can be attributed to effective representation learning. Such representations need not be human-interpretable; they can also be abstract. The original authors proposed an implicit 3D representation of the scene instead of an explicit one such as a mesh or a point cloud. By removing the need for an explicit 3D representation, they developed a model that requires no 3D supervision: it only needs 2D images together with the camera rotation between them. Their model can generate a novel view from a single image. The qualitative results of their model motivated us to extend their research.

In the original paper, the authors use 3D rotations to generate novel views, meaning that the camera moves on a sphere around the scene. 3D rotations do not act transitively on 3D space: a rotation about the origin preserves the distance to the origin, so the camera can never leave its sphere. We therefore propose to extend their model to roto-translations, with translations only as an intermediate proof-of-concept step. The objective is a model that can generate a novel view for any camera position in 3D space, within a reasonable range of movement.
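
The lack of transitivity can be checked numerically. The following sketch is a small illustration and not part of the model code: it applies many random rotations to one camera position and shows that the distance to the origin never changes, whereas a translation does change it. The radius 1.5 and the shift `[0, -0.5, 0]` mirror values used elsewhere in this notebook; `random_rotation` is a hypothetical helper.

```python
import numpy as np


def random_rotation(rng):
    """Random 3D rotation from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # make the decomposition unique
    if np.linalg.det(q) < 0:     # flip one axis to get a proper rotation
        q[:, 0] = -q[:, 0]
    return q


rng = np.random.default_rng(0)
p = np.array([1.5, 0.0, 0.0])  # camera position on a sphere of radius 1.5

rotated = np.array([random_rotation(rng) @ p for _ in range(1000)])
radii = np.linalg.norm(rotated, axis=1)
print(radii.min(), radii.max())  # both stay at 1.5 up to rounding

# A translation, in contrast, changes the distance to the origin.
shifted = p + np.array([0.0, -0.5, 0.0])
print(np.linalg.norm(shifted))
```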

## 3. Novel Contribution
- Describe your novel contribution.
* Methodology/theory for translation & rototranslations
  - justify the group representation (homogeneous coordinates for translations and the order of matrix multiplication)

- Support your contribution with actual code and experiments (hence the colab format!)
  - Demotime
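
The point about homogeneous coordinates and multiplication order can be sketched as follows. This is an illustrative example under our own assumptions, not the model's implementation: a rotation and a translation are each embedded in a 4x4 matrix, so that roto-translations compose by plain matrix multiplication, and the two possible orders give different results. The helper `homogeneous` is hypothetical.

```python
import numpy as np


def homogeneous(rotation=None, translation=None):
    """Build a 4x4 rigid transform [[R, t], [0, 1]]."""
    m = np.eye(4)
    if rotation is not None:
        m[:3, :3] = rotation
    if translation is not None:
        m[:3, 3] = translation
    return m


# 90-degree rotation about the y-axis and a unit shift along x.
r_y = np.array([[0.0, 0.0, 1.0],
                [0.0, 1.0, 0.0],
                [-1.0, 0.0, 0.0]])
R = homogeneous(rotation=r_y)
T = homogeneous(translation=[1.0, 0.0, 0.0])

# Matrices act right to left: T @ R rotates first, then translates.
rotate_then_translate = T @ R
translate_then_rotate = R @ T

origin = np.array([0.0, 0.0, 0.0, 1.0])
print(rotate_then_translate @ origin)  # origin ends up at (1, 0, 0)
print(translate_then_rotate @ origin)  # origin ends up at (0, 0, -1)
```

Because the two products differ, any implementation has to fix one order and use it consistently for both the dataset generation and the scene transformation.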







### 3.1 Datasets

The authors present datasets consisting of rotational transformations. However, they do not provide instructions or tools for generating further data. To address this limitation, we developed a new pipeline using Blender for producing images of 3D models under rotations, translations, and roto-translations. Our pipeline can be used to increase the size of the training data or to extend it to new transformation groups.

The following section demonstrates the practical application of our data-production pipeline, which enables the generation of new training data for translation- and roto-translation-equivariant rendering models.

#### 3.1.1 Selecting 3D models
Like the authors, we perform experiments on the ShapeNet benchmark. In particular, we download the [ShapeNetCore](https://shapenet.org/download/shapenetcore) subset, whose objects are already normalized and consistently aligned. From this subset we extract 2637 models.

#### 3.1.2 Build dataset with blender 

The pipeline below can be adapted to any 3D object data that Blender can process. What follows is a brief demonstration of the pipeline; the cells below install Blender 3.5.1.


In [None]:
""" Detect local path """
local_path = !pwd
local_path = local_path[0]

_Run the subsequent cells once to install Blender with wget_



In [None]:
""" Install / Load wget """
%pip install wget
import wget

""" Install blender """
# Download blender 3.5.1
!wget https://ftp.nluug.nl/pub/graphics/blender/release/Blender3.5/blender-3.5.1-linux-x64.tar.xz

# Unpack 
!tar -xvf blender-3.5.1-linux-x64.tar.xz
!rm {local_path}/blender-3.5.1-linux-x64.tar.xz

# Move and rename for shorter commands
!mv {local_path}/blender-3.5.1-linux-x64 {local_path}/data_prep/demo/blender


_Run render demo_

In [None]:
""" Run Demo"""
!{local_path}/data_prep/demo/blender/blender -b --python data_prep/demo/render_blender.py -- --scene_name data --rotation --translation --scene_folder /data_prep/demo/data/model_1 --local_path {local_path}
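
In the command above, Blender itself stops processing options at the standalone `--` and leaves everything after it for the Python script to read from `sys.argv`. The contents of `render_blender.py` are not shown in this notebook, so the following is only a sketch of how such a script could parse the flags used in the demo. The flag names come from the command above; their types and defaults, the function name `parse_script_args`, and the example value `/content` for `--local_path` are assumptions.

```python
import argparse
import sys


def parse_script_args(argv):
    """Parse only the arguments that follow the '--' separator."""
    script_args = argv[argv.index("--") + 1:] if "--" in argv else []
    parser = argparse.ArgumentParser(description="Render a scene (sketch).")
    parser.add_argument("--scene_name", type=str, required=True)
    parser.add_argument("--scene_folder", type=str, required=True)
    parser.add_argument("--local_path", type=str, default=".")
    # Assumed to be boolean switches selecting the transformation types.
    parser.add_argument("--rotation", action="store_true")
    parser.add_argument("--translation", action="store_true")
    return parser.parse_args(script_args)


# Example command line mirroring the demo cell; inside Blender one would
# call parse_script_args(sys.argv) instead.
example_argv = [
    "blender", "-b", "--python", "data_prep/demo/render_blender.py", "--",
    "--scene_name", "data", "--rotation", "--translation",
    "--scene_folder", "/data_prep/demo/data/model_1",
    "--local_path", "/content",
]
args = parse_script_args(example_argv)
print(args.scene_name, args.rotation, args.translation)
```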

_Display visual demonstration of roto-translation dataset_

In [None]:
""" Display random sample outputs """
# Load 3 distinct random images from the output directory
# (random.sample raises ValueError if fewer than 3 files exist)
path = os.path.join(local_path, "data_prep", "demo", "output", "rot_trans_dataset", "data")
random_files = random.sample(os.listdir(path), 3)
images = [mpimg.imread(os.path.join(path, name)) for name in random_files]

# Plot the sample set side by side
fig, axs = plt.subplots(1, 3, figsize=(10, 3))
for ax, image in zip(axs, images):
    ax.imshow(image)
plt.show()



#### 3.1.3 New datasets

We first reproduce a rotation dataset and train a model on it to verify the reproducibility of the authors' original results.

Subsequently, we produce two new datasets incorporating translations and roto-translations. 

| Split | # Scenes | # Images per scene | Resolution | # Data points |
|---|---|---|---|---|
| Train | 2306 | 50 | 64 x 64 | 115300 |
| Validation | 331 | 50 | 64 x 64 | 16550 |
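
The numbers in the table are consistent with a split at the scene level: 2306 + 331 = 2637 models, each rendered 50 times. The sketch below illustrates such a split and checks the arithmetic. It assumes that no scene appears in both splits, and the function `split_scenes`, the seed, and the placeholder scene ids are hypothetical rather than taken from our pipeline.

```python
import random


def split_scenes(scene_ids, num_validation, seed=0):
    """Shuffle scene ids reproducibly and hold out `num_validation` of them."""
    ids = sorted(scene_ids)
    random.Random(seed).shuffle(ids)
    return ids[num_validation:], ids[:num_validation]


# Placeholder ids standing in for the 2637 extracted ShapeNet models.
scene_ids = [f"scene_{i:04d}" for i in range(2637)]
train_ids, val_ids = split_scenes(scene_ids, num_validation=331)

images_per_scene = 50
print(len(train_ids), len(train_ids) * images_per_scene)  # 2306 115300
print(len(val_ids), len(val_ids) * images_per_scene)      # 331 16550
```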

For the rotations, camera positions are sampled uniformly on a sphere of radius 1.5.

We generate 50 images per object by applying rotations, translations, or roto-translations, depending on the dataset.
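
Uniform sampling on a sphere can be done by normalising Gaussian vectors, as sketched below. This is one standard technique and not necessarily the one used inside our Blender script; the function names, the seed, and the choice of y as the up axis for the angle conversion are assumptions made for illustration.

```python
import numpy as np


def sample_on_sphere(num_points, radius=1.5, seed=0):
    """Sample points uniformly on a sphere by normalising Gaussian vectors."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=(num_points, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return radius * v


def to_azimuth_elevation(points):
    """Convert positions to angles in degrees, assuming y is the up axis."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    radius = np.linalg.norm(points, axis=1)
    azimuth = np.degrees(np.arctan2(z, x))
    # Clip guards against rounding slightly outside [-1, 1].
    elevation = np.degrees(np.arcsin(np.clip(y / radius, -1.0, 1.0)))
    return azimuth, elevation


cameras = sample_on_sphere(50)  # 50 views per scene
azimuth, elevation = to_azimuth_elevation(cameras)
print(cameras.shape, np.linalg.norm(cameras, axis=1)[:3])
```

Sampling azimuth and elevation uniformly instead would cluster cameras near the poles, which is why the Gaussian construction is often preferred.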



The following section demonstrates the practical application of our pipeline for data production, enabling the generation of new training data for future research purposes.

Include:
- blender
- further applications using framework 
- demo of production of new datasets 

## 4. Conclusion

- Some preliminary results (working model)

## 5. Contributions 

Close the notebook with a description of each student's contribution.