# DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation

**Presentation**: Wednesday, July 24th 16:54

**Report and Code**: August 14th, 23:55

Dataset:
- Objaverse (https://objaverse.allenai.org)

Modifications:
- Explore ways of improving generalization across different categories, e.g., adding class embeddings or text feature descriptors.
- Add a test-time optimization loss for better shape fitting, especially for objects with thin structures.
- Add a color code to the shape latent code, and learn a view-independent color field from colored meshes.

## 3.0. Running this notebook
We recommend running this notebook on a local CUDA-compatible GPU. You can also run training on a CPU; it will just take longer.

You have three options for running this exercise on a GPU. Choose one of them, then start the exercise in the "Imports" section below:
1. Locally on your own GPU
2. On our dedicated compute cluster
3. On Google Colab

We describe every option in more detail below:

---

### (a) Local Execution

If you run this notebook locally, you first have to install the Python dependencies again. They are the same as for exercise 1, so you can reuse the environment you used last time. If you use [poetry](https://python-poetry.org), you can also simply reinstall everything (`poetry install`) and then run this notebook via `poetry run jupyter notebook`.

If you are working with an RTX 3000-series GPU, you need to install a patched version of PyTorch:

In [None]:
#%pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113

### (b) Compute Cluster

We provide access to a small compute cluster for the exercises and projects, consisting of a login node and 4 compute nodes with one dedicated RTX 3090 GPU each.
Please send us a short email with your name and preferred username so we can add you as a user.

We uploaded a PDF to Moodle with detailed information on how to access and use the cluster.

Since the cluster contains RTX 3000-series GPUs, you will need to install a patched version of PyTorch:

In [None]:
#%pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113

### (c) Google Colab

If you don't have access to a GPU and don't want to use our cluster, you can also use Google Colab. Be aware, however, that inline visualization of shapes and inline images did not work on Colab in our experience.
Alternatively, you can train the networks on Colab, download the checkpoint, and visualize inference locally.

If you're using Google Colab, upload the exercise folder (containing `exercise_2.ipynb`, the directory `exercise_2`, and the file `requirements.txt`) as `3d-machine-learning` to Google Drive (make sure you don't upload extracted dataset files).
Then open the notebook `exercise_2.ipynb` in Colab using `File > Open Notebook > Upload`.

Next you'll need to run these two cells for setting up the environment. Before you do that make sure your instance has a GPU.

In [None]:
#import os
#from google.colab import drive
#drive.mount('/content/drive', force_remount=True)

# We assume you uploaded the exercise folder in root Google Drive folder

#!cp -r /content/drive/MyDrive/3d-machine-learning 3d-machine-learning/
#os.chdir('/content/3d-machine-learning/')
#print('Installing requirements')
#%pip install -r requirements.txt

# Make sure you restart runtime when directed by Colab

Run this cell after restarting your colab runtime

In [None]:
#import os
#import sys
#import torch
#os.chdir('/content/3d-machine-learning/')
#sys.path.insert(1, "/content/3d-machine-learning/")
#print('CUDA availability:', torch.cuda.is_available())

### Imports

The following imports should work regardless of whether you are using Colab or local execution.

In [None]:
!pip install --upgrade --quiet objaverse
%load_ext autoreload
%autoreload 2
from pathlib import Path
import numpy as np
import matplotlib.pyplot as plt
import k3d
import trimesh
import torch
import skimage
from PIL import Image
import shutil
import objaverse
import random

Use the next cell to test whether a GPU was detected by pytorch.

In [None]:
torch.cuda.is_available()

## 3.1 DeepSDF


Here, we will take a look at 3D-reconstruction using [DeepSDF](https://arxiv.org/abs/1901.05103). We recommend reading the paper before attempting the exercise.

DeepSDF is an auto-decoder based approach that learns a continuous SDF representation for a class of shapes. Once trained, it can be used for shape representation, interpolation, and shape completion. We'll look at each of these applications.

<img src="exercise_3/images/deepsdf_teaser.png" alt="deepsdf_teaser" style="width: 800px;"/>

During training, the auto-decoder optimizes both the network parameters and the latent codes representing each of the training shapes. Once trained, a shape is reconstructed from its SDF observations by optimizing a latent code while keeping the network parameters fixed, so that the optimized code gives the lowest error against the observed SDF values.
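
The quantity minimized during training and inference can be sketched in plain Python. This is a minimal illustration, not the code in `train_deepsdf`: the clamping distance `delta = 0.1` follows the value reported in the DeepSDF paper, the default regularization weight mirrors the `lambda_code_regularization` entry used in the configs of this notebook, and the function names and the mean reduction are assumptions.

```python
def clamp(value, delta):
    """Restrict an SDF value to the interval [-delta, delta]."""
    return max(-delta, min(delta, value))


def deepsdf_loss(predicted_sdf, target_sdf, latent_code, delta=0.1, lambda_code=1e-4):
    """Clamped L1 reconstruction loss plus an L2 penalty on the latent code.

    Assumption: mean reduction over the sampled points; the exercise code may
    use a different reduction or scaling.
    """
    l1_terms = [
        abs(clamp(p, delta) - clamp(t, delta))
        for p, t in zip(predicted_sdf, target_sdf)
    ]
    reconstruction = sum(l1_terms) / len(l1_terms)
    code_penalty = sum(z * z for z in latent_code)
    return reconstruction + lambda_code * code_penalty


# The second point is far from the surface: both values clamp to delta,
# so it contributes nothing to the loss.
loss = deepsdf_loss(
    predicted_sdf=[0.05, 0.50, -0.02],
    target_sdf=[0.03, 0.30, -0.02],
    latent_code=[0.0, 0.0],
)
print(f"{loss:.4f}")
```

Clamping concentrates the network's capacity on the region near the surface, and the code penalty keeps latent codes close to the origin.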

An advantage that implicit representations have over voxel/grid based approaches is that they are not tied to a particular grid resolution, and can be evaluated at any resolution once trained.
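
To illustrate the point about resolution, here is a small sketch in which an analytic sphere SDF stands in for a trained decoder (an assumption made purely for illustration; `sphere_sdf` and `sample_grid` are hypothetical helpers, not part of the exercise code). The same function is evaluated on a coarse and on a fine grid without any change to the "model".

```python
import math


def sphere_sdf(x, y, z, radius=0.5):
    """Analytic signed distance to a sphere centered at the origin."""
    return math.sqrt(x * x + y * y + z * z) - radius


def sample_grid(sdf_fn, resolution, low=-1.0, high=1.0):
    """Evaluate sdf_fn at the nodes of a resolution^3 grid spanning [low, high]^3."""
    step = (high - low) / (resolution - 1)
    coords = [low + i * step for i in range(resolution)]
    return [[[sdf_fn(x, y, z) for z in coords] for y in coords] for x in coords]


# The same continuous function is queried at two different resolutions.
coarse = sample_grid(sphere_sdf, 8)
fine = sample_grid(sphere_sdf, 32)
print(len(coarse), len(fine))
```

A voxel-based model would have to be retrained or upsampled to produce the finer grid; an implicit model only needs more query points.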

As in the previous exercise, we'll first download the processed dataset, look at the implementation of the dataset, the model, and the trainer, try out overfitting and generalization over the entire dataset, and finally run inference on unseen samples.

### (a) Downloading the data

Whereas volumetric models output entire 3D shape representations, implicit models like DeepSDF work on a per-point basis. The network takes in a 3D coordinate (together with the latent vector) and outputs the SDF value at the queried point. To train such a model, we therefore need, for each training shape, a set of points with their corresponding SDF values for supervision. Points are sampled more densely near the surface of the object because we want to capture a more detailed SDF there. For those curious, data preparation is described in more detail in section 5 of the paper.
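
The near-surface sampling idea can be sketched on a shape whose SDF is known in closed form. The example below uses a unit sphere instead of a mesh and an illustrative noise scale `noise_std`; both are assumptions, and the function name is hypothetical. Splitting the samples into `pos` and `neg` lists mirrors the keys of the `sdf.npz` files described below.

```python
import math
import random


def sample_near_unit_sphere(num_points, noise_std=0.05, seed=0):
    """Sample points on a unit sphere, perturb them with Gaussian noise,
    and return (point, sdf) pairs split by the sign of the SDF."""
    rng = random.Random(seed)
    pos, neg = [], []
    for _ in range(num_points):
        # A normalized Gaussian vector is uniformly distributed on the sphere.
        direction = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in direction))
        surface_point = [c / norm for c in direction]
        # A small perturbation keeps most samples close to the surface.
        point = [c + rng.gauss(0.0, noise_std) for c in surface_point]
        sdf = math.sqrt(sum(c * c for c in point)) - 1.0
        (pos if sdf > 0 else neg).append((point, sdf))
    return pos, neg


pos, neg = sample_near_unit_sphere(1000)
print(len(pos), len(neg))
```

For real meshes the SDF has no closed form and has to be computed from the geometry, which is why the samples are precomputed and stored on disk.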

We'll be using the Objaverse Chairs class for the experiments in this project. 

In [None]:
download_path = Path("./objaverse/")

lvis_annotations = objaverse.load_lvis_annotations()
random.seed(42)
chairs_uids = lvis_annotations['chair']
chairs_uids = random.sample(chairs_uids, 100)

objects = objaverse.load_objects(
    uids=chairs_uids,
)

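chairs_dir = download_path / "chairs"
chairs_dir.mkdir(parents=True, exist_ok=True)

for objaverse_id, file_path in objects.items():
    # Skip shapes that were already moved or processed in an earlier run
    already_moved = (chairs_dir / Path(file_path).name).exists()
    already_processed = (chairs_dir / objaverse_id).exists()
    if not (already_moved or already_processed):
        shutil.move(file_path, chairs_dir)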

For each shape, the downloaded chair .glb will be converted into a coloured mesh and stored in the same directory with its corresponding sdf file:
- `mesh.obj` representing the mesh representation of the shape
- `sdf.npz` file containing a large number of points sampled on and around the mesh together with their SDF values; it holds numpy arrays under the keys "pos" and "neg", containing the points with positive and negative SDF values respectively

```
# contents of exercise_3/data/objaverse/chairs
1faa4c299b93a3e5593ebeeedbff73b/                    # shape 0
    ├── mesh.obj                                    # shape 0 mesh
    ├── sdf.npz                                     # shape 0 sdf
    ├── surface.obj                                 # shape 0 surface
1fde48d83065ef5877a929f61fea4d0/                    # shape 1
1fe1411b6c8097acf008d8a3590fb522/                   # shape 2
:
```
The processing is performed in the cells below.

In [None]:
import os
print(os.getcwd())

In [None]:
data_path = Path("exercise_3/data/")

# Get the list of files in the sdf_chairs directory
sdf_files = list((data_path / "objaverse"/ "sdf_chairs").iterdir())

# For each file in data/objaverse/chairs, create a directory with the same name
chairs_dir = data_path / "objaverse" / "chairs"
for file in list(chairs_dir.iterdir()):
    if file.is_file():
        file_name = file.stem
        new_dir = chairs_dir / file_name
        new_dir.mkdir(parents=True, exist_ok=True)

        # Then convert the .glb to a .obj mesh file
        mesh = trimesh.load(file, force="mesh")
        _ = mesh.export(new_dir / f"{file_name}.obj")

        # Remove the original .glb file
        file.unlink()

        # Finally, move the corresponding sdf file from data/objaverse/sdf_chairs to the same directory
        sdf_file = next(f for f in sdf_files if f.stem == file_name)
        shutil.move(sdf_file, new_dir / sdf_file.name)

In [None]:
# Rename all the sdf files to sdf.npz and all the obj files to mesh.obj
for chair_dir in (data_path / "objaverse"/ "chairs").iterdir():
    for file in chair_dir.iterdir():
        if file.suffix == ".obj":
            file.rename(chair_dir / "mesh.obj")
        elif file.suffix == ".npz":
            file.rename(chair_dir / "sdf.npz")

### (b) Dataset

We randomly generate train/test splits based on the ratio used in the exercise for ShapeNet.

In [None]:
# Generate random train/test/overfit splits modelled after the split ratios in the ShapeNet case
with open(data_path / "splits" / "sofas" / "train.txt", "r") as f:
    train_sofas = f.read().splitlines()
print(f"Number of sofas in the ShapeNet training set: {len(train_sofas)}")

# Read file splits/sofas/val.txt
with open(data_path / "splits" / "sofas" / "val.txt", "r") as f:
    test_sofas = f.read().splitlines()
print(f"Number of sofas in the ShapeNet validation set: {len(test_sofas)}")

with open(data_path / "splits" / "sofas" / "overfit.txt", "r") as f:
    overfit_sofas = f.read().splitlines()
print(f"Number of sofas in the ShapeNet overfit set: {len(overfit_sofas)}")

val_pct = round(100 * len(test_sofas) / (len(train_sofas) + len(test_sofas)))
print(f"val/train ratio: {val_pct}:{100 - val_pct}")

In [None]:
# We want to emulate the ratio of the ShapeNet dataset
val_ratio = round(100 * len(test_sofas) / (len(train_sofas) + len(test_sofas)))
train_ratio = 100-val_ratio

# Get a list of all chairs in the objaverse/chairs directory
all_chairs = list((data_path / "objaverse"/ "chairs").iterdir())

# Randomly shuffle the list
random.shuffle(all_chairs)

# Remove one chair to be used for overfitting
overfit_chair = all_chairs.pop()

# Split the list into training and validation sets
train_chairs = all_chairs[:int(len(all_chairs) * (train_ratio / 100))]
val_chairs = all_chairs[int(len(all_chairs) * (train_ratio / 100)):]

# Store the names of the chairs in corresponding text files
(data_path / "splits" / "chairs").mkdir(parents=True, exist_ok=True)
with open(data_path / "splits" / "chairs" / "train.txt", "w") as f:
    f.write("\n".join([c.stem for c in train_chairs]))

with open(data_path / "splits" / "chairs" / "val.txt", "w") as f:
    f.write("\n".join([c.stem for c in val_chairs]))

with open(data_path / "splits" / "chairs" / "overfit.txt", "w") as f:
    f.write(overfit_chair.stem)

In [None]:
from ml43dg.data.objaverse import Objaverse

num_points_to_samples = 40000
train_dataset = Objaverse(num_points_to_samples, "train")
val_dataset = Objaverse(num_points_to_samples, "val")
overfit_dataset = Objaverse(num_points_to_samples, "overfit")

# len() calls the dataset's __len__ method; each value should match the number of
# entries in the corresponding file under splits/chairs
print(f'Length of train set: {len(train_dataset)}')
print(f'Length of val set: {len(val_dataset)}')
print(f'Length of overfit set: {len(overfit_dataset)}')  # expected output: 1

Let's take a look at the points sampled for a particular shape.

In [None]:
from ml43dg.util.visualization import visualize_mesh, visualize_pointcloud

uid = train_dataset[0]['name']
points = train_dataset[0]['points']
sdf = train_dataset[0]['sdf']

# sampled points inside the shape
inside_points = points[sdf[:, 0] < 0, :].numpy()

# sampled points outside the shape
outside_points = points[sdf[:, 0] > 0, :].numpy()

In [None]:
mesh = Objaverse.get_mesh(uid)
print('Mesh')
visualize_mesh(mesh.vertices, mesh.faces, flip_axes=True)

In [None]:
print('Sampled points with negative SDF (inside)')
visualize_pointcloud(inside_points, 0.025, flip_axes=True)

In [None]:
print('Sampled points with positive SDF (outside)')
visualize_pointcloud(outside_points, 0.025, flip_axes=True)

You'll notice that more points are sampled close to the surface rather than away from the surface.

### (c) Model

The DeepSDF auto-decoder architecture is visualized below:

<img src="exercise_3/images/deepsdf_architecture.png" alt="deepsdf_arch" style="width: 640px;"/>

Things to note:

- The network takes in the latent code for a shape concatenated with the 3D query coordinate, making up a vector of length 259 (assuming a latent code length of 256).
- The network consists of a sequence of weight-normed linear layers, each followed by a ReLU and a dropout. For weight norming a layer, check out `torch.nn.utils.weight_norm`. Each of these linear layers outputs a 512-dimensional vector, except the 4th layer, which outputs a 253-dimensional vector.
- The output of the 4th layer is concatenated with the input, making the input to the 5th layer a 512-dimensional vector.
- The final layer is a simple linear layer without any norm, dropout or non-linearity, with a single dimensional output representing the SDF value.
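
As a quick cross-check of the dimensions listed above, the parameter count can be worked out by hand. The sketch below assumes eight weight-normed layers followed by the final plain linear layer (the total number of layers is read off the architecture figure, so treat it as an assumption) and that weight normalization adds one learnable gain per output unit, as `torch.nn.utils.weight_norm` does with its default settings. `linear_params` is a hypothetical helper.

```python
def linear_params(in_features, out_features, weight_normed):
    """Parameters of one linear layer: weights + biases, plus one gain per
    output unit when weight normalization is applied."""
    count = in_features * out_features + out_features
    if weight_normed:
        count += out_features
    return count


latent_size = 256
input_size = latent_size + 3  # latent code concatenated with (x, y, z)

# Assumption: 8 weight-normed layers followed by one plain linear layer.
layer_shapes = [
    (input_size, 512),
    (512, 512),
    (512, 512),
    (512, 512 - input_size),  # 4th layer: 253 outputs
    (512, 512),               # 5th layer: 253 + 259 = 512 inputs after the skip concat
    (512, 512),
    (512, 512),
    (512, 512),
]
total = sum(linear_params(i, o, weight_normed=True) for i, o in layer_shapes)
total += linear_params(512, 1, weight_normed=False)  # final SDF output layer
print(f"{total / 1e6:.2f}M parameters")
```

Under these assumptions the total comes to about 1.84M parameters, in line with the roughly 1.8M quoted in the sanity check below.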

Implement this architecture in file `exercise_3/model/deepsdf.py`.

Here are some basic sanity tests once you're done with your implementation.

In [None]:
from ml43dg.model.deepsdf import DeepSDFDecoder
from ml43dg.util.model import summarize_model

deepsdf = DeepSDFDecoder(latent_size=256)
print(summarize_model(deepsdf))

# input to the network is a concatenation of point coordinates (3) and the latent code (256 in this example);
# here we use a batch of 4096 points
input_tensor = torch.randn(4096, 3 + 256)
predictions = deepsdf(input_tensor)

print('\nOutput tensor shape: ', predictions.shape)  # expected output: 4096, 1

num_trainable_params = sum(p.numel() for p in deepsdf.parameters() if p.requires_grad) / 1e6
print(f'\nNumber of trainable params: {num_trainable_params:.2f}M')  # expected output: ~1.8M

### (d) Training script and overfitting to a single shape

In [None]:
from ml43dg.training import train_deepsdf

overfit_config = {
    'experiment_name': '0_objaverse_deepsdf_overfit',
    'device': 'cpu',  # change this to 'cuda:0' if you have a GPU
    'is_overfit': True,
    'num_sample_points': 4096,
    'latent_code_length': 256,
    'batch_size': 1,
    'resume_ckpt': None,
    'learning_rate_model': 0.0005,
    'learning_rate_code': 0.001,
    'lambda_code_regularization': 0.0001,
    'max_epochs': 2000,
    'print_every_n': 50,
    'visualize_every_n': 250,
}

train_deepsdf.main(overfit_config)

Let's visualize the overfitted shape reconstruction to check if it looks reasonable.

In [None]:
# Load and visualize GT mesh of the overfit sample
gt_mesh = Objaverse.get_mesh('90dfb9e99ddd4b4ca414be7599ea6469')
print('GT')
visualize_mesh(gt_mesh.vertices, gt_mesh.faces, flip_axes=True)

# Load and visualize reconstructed overfit sample; it's okay if they don't look visually exact, since we don't run
# the training too long and have a learning rate decay while training
mesh_path = "ml43dg/runs/0_objaverse_deepsdf_overfit/meshes/01999_000.obj"
overfit_output = trimesh.load(mesh_path)
print('Overfit')
visualize_mesh(overfit_output.vertices, overfit_output.faces, flip_axes=True)

### (e) Training over entire train set

Once overfitting works, we can train on the entire train set.

Note: This training will take a few hours on a GPU (took ~3 hrs for 500 epochs on our 2080Ti, which already gave decent results). Please make sure to start training early enough before the submission deadline.

In [None]:
from ml43dg.training import train_deepsdf

generalization_config = {
    'experiment_name': '3_1_deepsdf_generalization',
    'device': 'cuda:0',  # run this on a gpu for a reasonable training time
    'is_overfit': False,
    'num_sample_points': 4096, # you can adjust this such that the model fits on your gpu
    'latent_code_length': 256,
    'batch_size': 1,
    'resume_ckpt': None,
    'learning_rate_model': 0.0005,
    'learning_rate_code': 0.001,
    'lambda_code_regularization': 0.0001,
    'max_epochs': 1000,  # 2000 epochs are not necessary if you're short on time; at 500 epochs you should start to see reasonable results
    'print_every_n': 50,
    'visualize_every_n': 5000,
}

train_deepsdf.main(generalization_config)

### (f) Inference using the trained model on observed SDF values

Fill in the inference script `exercise_3/inference/infer_deepsdf.py`. Note that it's not simply a forward pass, but an optimization of the latent code such that we have lowest error on observed SDF values.
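
The idea of optimizing a latent code against a frozen decoder can be shown on a toy problem. In the sketch below the "decoder" is a fixed analytic function whose single latent value is the radius of a 1D ball, and plain gradient descent on a squared error replaces the optimizer and clamped L1 loss of the real setup. All of this is an illustrative assumption and not the solution to `infer_deepsdf.py`.

```python
def decoder(latent, x):
    """Frozen toy decoder: SDF of a 1D 'ball' whose radius is the latent code."""
    return abs(x) - latent


def fit_latent(points, observed_sdf, steps=200, learning_rate=0.1):
    """Gradient descent on the latent code only; the decoder is never updated."""
    latent = 0.0  # initial code
    for _ in range(steps):
        residuals = [decoder(latent, x) - s for x, s in zip(points, observed_sdf)]
        # Gradient of the mean squared error w.r.t. the latent code;
        # the derivative of the decoder output w.r.t. the latent code is -1.
        grad = sum(-2.0 * r for r in residuals) / len(residuals)
        latent -= learning_rate * grad
    return latent


points = [-1.0, -0.5, 0.0, 0.25, 0.9]
observed_sdf = [abs(x) - 0.7 for x in points]  # observations of a shape with radius 0.7
latent = fit_latent(points, observed_sdf)
print(round(latent, 4))
```

In the actual exercise the latent code is a 256-dimensional tensor and the gradient comes from backpropagation through the trained network, but the structure of the loop is the same: only the code receives updates.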

In [None]:
from ml43dg.inference.infer_deepsdf import InferenceHandlerDeepSDF

device = torch.device('cuda:0')  # change this to cpu if you're not using a gpu

inference_handler = InferenceHandlerDeepSDF(256, "exercise_3/runs/3_1_deepsdf_generalization", device)

First, we try inference on a shape from the validation set, for which we have a complete observation of SDF values. This is an easier problem than shape completion, since all the information is already in the input.

Let's visualize the observations.

In [None]:
# get observed data
points, sdf = Objaverse.get_all_sdf_samples("b351e06f5826444c19fb4103277a6b93")

inside_points = points[sdf[:, 0] < 0, :].numpy()
outside_points = points[sdf[:, 0] > 0, :].numpy()

# visualize observed points; you'll observe that the observations are very complete
print('Observations with negative SDF (inside)')
visualize_pointcloud(inside_points, 0.025, flip_axes=True)
print('Observations with positive SDF (outside)')
visualize_pointcloud(outside_points, 0.025, flip_axes=True)

Reconstruction on these observations with the trained model:

In [None]:
# reconstruct
vertices, faces = inference_handler.reconstruct(points, sdf, 800)
# visualize
visualize_mesh(vertices, faces, flip_axes=True)

Next, we can try the shape completion task, i.e., inference on a shape from validation set, for which we do not have a complete observation of sdf values. The observed points are visualized below:

In [None]:
# get observed data
points, sdf = Objaverse.get_all_sdf_samples("b351e06f5826444c19fb4103277a6b93_incomplete")

inside_points = points[sdf[:, 0] < 0, :].numpy()
outside_points = points[sdf[:, 0] > 0, :].numpy()

# visualize observed points; you'll observe that the observations are incomplete,
# making this a shape completion task
print('Observations with negative SDF (inside)')
visualize_pointcloud(inside_points, 0.025, flip_axes=True)
print('Observations with positive SDF (outside)')
visualize_pointcloud(outside_points, 0.025, flip_axes=True)

Shape completion using the trained model:

In [None]:
# reconstruct
vertices, faces = inference_handler.reconstruct(points, sdf, 800)
# visualize
visualize_mesh(vertices, faces, flip_axes=True)

### (g) Latent space interpolation

The latent space learned by DeepSDF is interpolatable, meaning that decoding latent codes from this space produces meaningful shapes. Given two latent codes, a linearly interpolatable latent space will decode each of the intermediate codes to some valid shape. Let's see if this holds for our trained model.
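
Linear interpolation between two latent codes can be sketched as follows. The function name is hypothetical, and whether both endpoints are included in the output is an assumption; the `interpolate` method you implement may follow a different convention.

```python
def interpolate_codes(code_a, code_b, num_steps):
    """Return num_steps + 1 latent codes on the line segment from code_a to code_b."""
    codes = []
    for step in range(num_steps + 1):
        t = step / num_steps  # interpolation weight running from 0 to 1
        codes.append([(1.0 - t) * a + t * b for a, b in zip(code_a, code_b)])
    return codes


codes = interpolate_codes([0.0, 1.0], [1.0, -1.0], num_steps=4)
for code in codes:
    print(code)
```

Each intermediate code is then passed through the decoder like any other latent code to obtain the corresponding shape.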

We'll pick two shapes from the train set as visualized below.

In [None]:
from ml43dg.data.shape_implicit import ShapeImplicit
from ml43dg.util.visualization import visualize_mesh

mesh = Objaverse.get_mesh("494fe53da65650b8c358765b76c296")
print('GT Shape A')
visualize_mesh(mesh.vertices, mesh.faces, flip_axes=True)

mesh = Objaverse.get_mesh("5ca1ef55ff5f68501921e7a85cf9da35")
print('GT Shape B')
visualize_mesh(mesh.vertices, mesh.faces, flip_axes=True)

Implement the missing parts in `exercise_3/inference/infer_deepsdf.py` such that it interpolates two given latent vectors, and run the code fragment below once done.

In [None]:
from ml43dg.inference.infer_deepsdf import InferenceHandlerDeepSDF

inference_handler = InferenceHandlerDeepSDF(256, "exercise_3/runs/3_1_deepsdf_generalization", torch.device('cuda:0'))
# interpolate; also exports interpolated meshes to disk
inference_handler.interpolate('494fe53da65650b8c358765b76c296', '5ca1ef55ff5f68501921e7a85cf9da35', 60)

Visualize the interpolation below. If everything works out correctly, you should see a smooth transformation between the shapes, with all intermediate shapes being valid chairs.

In [None]:
from ml43dg.util.mesh_collection_to_gif import  meshes_to_gif
from ml43dg.util.misc import show_gif

# create list of meshes (just exported) to be visualized
interpolation_dir = Path("exercise_3/runs/3_1_deepsdf_generalization/interpolation")
mesh_paths = sorted(
    [x for x in interpolation_dir.iterdir() if int(x.name.split('.')[0].split("_")[1]) == 0],
    key=lambda x: int(x.name.split('.')[0].split("_")[0]),
)
mesh_paths = mesh_paths + mesh_paths[::-1]

# create a visualization of the interpolation process
meshes_to_gif(mesh_paths, "exercise_3/runs/3_1_deepsdf_generalization/latent_interp.gif", 20)
show_gif("exercise_3/runs/3_1_deepsdf_generalization/latent_interp.gif")

## Submission

This is the end of exercise 3 🙂. Please create a zip containing all files we provided, everything you modified, your visualization images/gif (no need to submit generated OBJs), including your checkpoints. Name it with your matriculation number(s) as described in exercise 1. Make sure this notebook can be run without problems. Then, submit via Moodle.

**Note**: The maximum submission file size limit for Moodle is 100M. You do not need to submit your overfitting checkpoints; however, the generalization checkpoint will be >200M. The easiest way to still be able to submit that one is to split it with zip like this: `zip -s 100M model_best.ckpt.zip model_best.ckpt` which creates a `.zip` and a `.z01`. You can then submit both files alongside another zip containing all your code and outputs.

**Submission Deadline**: 11.06.2024, 23:55

## References

[1] Park, Jeong Joon, et al. "DeepSDF: Learning continuous signed distance functions for shape representation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.