# DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation

**Presentation**: Wednesday, July 24th 16:54

**Report and Code**: August 14th, 23:55

Dataset:
- Objaverse (https://objaverse.allenai.org)

Modifications:
- Explore ways of improving generalization across different categories, e.g., adding a class embedding and text feature descriptors.
- Add a test-time optimization loss for better shape fitting, especially for objects with thin structures.
- Add a color code alongside the shape latent code, and learn a view-independent color field from colored meshes.

### Imports

The following imports should work regardless of whether you are using Colab or local execution.

In [8]:
#!pip install --upgrade --quiet objaverse
%load_ext autoreload
%autoreload 2
import os
import random
import shutil
from pathlib import Path

import k3d
import matplotlib.pyplot as plt
import numpy as np
import objaverse
import skimage
import torch
import trimesh
from PIL import Image

Use the next cell to test whether PyTorch detects a GPU.

In [None]:
torch.cuda.is_available()

## 1. DeepSDF


Here, we will take a look at 3D-reconstruction using [DeepSDF](https://arxiv.org/abs/1901.05103). We recommend reading the paper before attempting the exercise.

DeepSDF is an auto-decoder-based approach that learns a continuous SDF representation for a class of shapes. Once trained, it can be used for shape representation, interpolation, and shape completion. We'll look at each of these applications.
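Interpolation in this setting amounts to blending two latent codes and decoding each blended code as if it were a shape of its own. The snippet below is a minimal sketch of that blending step only; `interpolate_latents` is a hypothetical helper, and the random vectors stand in for codes that would normally come from training.

```python
import numpy as np

def interpolate_latents(z_a, z_b, num_steps):
    """Return num_steps codes blending linearly from z_a to z_b (both inclusive)."""
    weights = np.linspace(0.0, 1.0, num_steps)[:, None]  # (num_steps, 1)
    return (1.0 - weights) * z_a[None, :] + weights * z_b[None, :]

rng = np.random.default_rng(0)
z_a = rng.normal(scale=0.01, size=256)  # stand-in for the latent code of shape A
z_b = rng.normal(scale=0.01, size=256)  # stand-in for the latent code of shape B

codes = interpolate_latents(z_a, z_b, num_steps=5)
print(codes.shape)  # (5, 256)
```

Each row of `codes` would then be passed to the decoder together with the query points to obtain an intermediate shape.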

<img src="ml43dg/images/deepsdf_teaser.png" alt="deepsdf_teaser" style="width: 800px;"/>

During training, the auto-decoder optimizes both the network parameters and the latent codes representing each of the training shapes. Once trained, to reconstruct a shape from its SDF observations, a latent code is optimized while the network parameters are kept fixed, so that the optimized code yields the lowest error against the observed SDF values.
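The objective behind both stages can be sketched as a clamped L1 data term plus a penalty on the norm of the latent code, following the formulation in the DeepSDF paper. The function names below are hypothetical, and the clamp distance `delta`, the mean reduction, and the weight `lambda_reg` are illustrative assumptions rather than the settings used by the training code in this project.

```python
import numpy as np

def clamped_l1_loss(pred_sdf, target_sdf, delta=0.1):
    """L1 error after clamping both SDFs to [-delta, delta].

    Clamping focuses the loss on the region near the surface.
    """
    pred = np.clip(pred_sdf, -delta, delta)
    target = np.clip(target_sdf, -delta, delta)
    return np.abs(pred - target).mean()

def auto_decoder_loss(pred_sdf, target_sdf, latent_code, lambda_reg=1e-4, delta=0.1):
    """Data term plus a squared-norm penalty that keeps latent codes small."""
    data_term = clamped_l1_loss(pred_sdf, target_sdf, delta)
    reg_term = lambda_reg * np.sum(latent_code ** 2)
    return data_term + reg_term

pred = np.array([0.05, -0.30, 0.20])
target = np.array([0.00, -0.20, 0.50])

loss_zero_code = auto_decoder_loss(pred, target, np.zeros(4))
loss_unit_code = auto_decoder_loss(pred, target, np.ones(4))
print(loss_zero_code, loss_unit_code)
```

In this toy input only the first point contributes to the data term: the other two predictions and targets fall outside the clamp range on the same side, so they clamp to identical values.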

An advantage that implicit representations have over voxel/grid based approaches is that they are not tied to a particular grid resolution, and can be evaluated at any resolution once trained.
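To make the resolution argument concrete, the sketch below evaluates one and the same implicit function on a coarse and on a fine grid. An analytic sphere SDF stands in for the trained network, and `evaluate_on_grid` is a hypothetical helper, not part of the project code.

```python
import numpy as np

def sphere_sdf(points, radius=0.5):
    """Analytic SDF of a sphere centred at the origin (negative inside)."""
    return np.linalg.norm(points, axis=-1) - radius

def evaluate_on_grid(sdf_fn, resolution, extent=1.0):
    """Query sdf_fn on a regular resolution^3 grid spanning [-extent, extent]^3."""
    axis = np.linspace(-extent, extent, resolution)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    values = sdf_fn(grid.reshape(-1, 3))
    return values.reshape(resolution, resolution, resolution)

coarse = evaluate_on_grid(sphere_sdf, 17)
fine = evaluate_on_grid(sphere_sdf, 65)
print(coarse.shape, fine.shape)  # (17, 17, 17) (65, 65, 65)
```

Nothing about the function changes between the two calls; only the set of query points does. A mesh could then be extracted from either grid with an isosurface method such as marching cubes.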

As in the previous exercise, we'll first download and preprocess the dataset, look at the implementation of the dataset, the model, and the trainer, try out overfitting and generalization over the entire dataset, and finally run inference on unseen samples.

### (a) Downloading the data

Whereas volumetric models output entire 3D shape representations, implicit models like DeepSDF work on a per-point basis. The network takes a 3D coordinate (along with the latent vector) and outputs the SDF value at the queried point. To train such a model, we therefore need, for each training shape, a set of points with their corresponding SDF values for supervision. Points are sampled more densely near the surface of the object because we want to capture the SDF in more detail there. For those curious, data preparation is described in more detail in section 5 of the paper.
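The sampling idea can be illustrated on a shape whose SDF is known in closed form. The sketch below is not the project's preprocessing: it uses a sphere instead of a mesh, and the noise scale and sample counts are illustrative assumptions. Most samples are surface points perturbed by small Gaussian noise, with a smaller set drawn uniformly from the bounding cube.

```python
import numpy as np

rng = np.random.default_rng(0)
radius = 0.5

# Uniform samples on the sphere surface (normalised Gaussian directions)
directions = rng.normal(size=(10000, 3))
surface = radius * directions / np.linalg.norm(directions, axis=1, keepdims=True)

# Near-surface samples: surface points perturbed by small Gaussian noise
near_surface = surface + rng.normal(scale=0.05, size=surface.shape)

# A smaller set of samples spread uniformly over the bounding cube
uniform = rng.uniform(-1.0, 1.0, size=(1000, 3))

points = np.concatenate([near_surface, uniform])
sdf = np.linalg.norm(points, axis=1) - radius  # analytic SDF of the sphere

fraction_near = np.mean(np.abs(sdf) < 0.1)
print(f"fraction of samples within 0.1 of the surface: {fraction_near:.2f}")
```

For a real mesh the SDF has no closed form, so the preprocessing step has to compute distances and signs from the geometry instead.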

We'll be using four Objaverse classes (tables, sofas, chairs, and vases) for the experiments in this project.

In [None]:
download_path = Path("./ml43dg/data/objaverse/")

# Load the LVIS annotations once; they map category names to lists of object uids
lvis_annotations = objaverse.load_lvis_annotations()

for class_name in ['sofa', 'table', 'vase']:
    class_dir = download_path / (class_name + "s")
    print("Downloading " + class_dir.name)
    class_dir.mkdir(parents=True, exist_ok=True)

    # Fixed seed so that the same 5 objects are sampled on every run
    random.seed(42)
    uids = random.sample(lvis_annotations[class_name], 5)

    objects = objaverse.load_objects(uids=uids)

    for objaverse_id, file_path in objects.items():
        # Skip files already moved by a previous run; shutil.move would fail on them
        if not (class_dir / Path(file_path).name).exists():
            shutil.move(file_path, class_dir)

For each shape, the downloaded .glb is converted into a coloured mesh and stored in the same directory as its corresponding SDF file:
- `mesh.obj`: the mesh representation of the shape
- `sdf.npz`: a large number of points sampled on and around the mesh together with their SDF values; contains numpy arrays under the keys "pos" and "neg", holding the points with positive and negative SDF values respectively
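Reading such a file and drawing a balanced subsample could look like the sketch below. It assumes each array has one row per point laid out as `[x, y, z, sdf]`, which is the layout of the original DeepSDF data; the files produced in this project may carry extra columns such as colour. `sample_sdf_points` is a hypothetical helper, and the file is synthesised from a sphere so the example runs on its own.

```python
import tempfile
from pathlib import Path

import numpy as np

def sample_sdf_points(npz_path, num_points, rng):
    """Draw num_points rows, half from "pos" and half from "neg"."""
    with np.load(npz_path) as data:
        pos, neg = data["pos"], data["neg"]
    half = num_points // 2
    pos_idx = rng.integers(0, len(pos), size=half)
    neg_idx = rng.integers(0, len(neg), size=num_points - half)
    samples = np.concatenate([pos[pos_idx], neg[neg_idx]])
    return samples[:, :3], samples[:, 3:4]  # coordinates, SDF values

# Build a tiny synthetic sdf.npz from a sphere of radius 0.5
rng = np.random.default_rng(0)
xyz = rng.uniform(-1.0, 1.0, size=(2000, 3))
sdf = np.linalg.norm(xyz, axis=1, keepdims=True) - 0.5
rows = np.hstack([xyz, sdf]).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "sdf.npz")
    np.savez(path, pos=rows[rows[:, 3] > 0], neg=rows[rows[:, 3] < 0])
    points, sdf_values = sample_sdf_points(path, 512, rng)

print(points.shape, sdf_values.shape)  # (512, 3) (512, 1)
```

Sampling the same number of points from each side keeps inside and outside observations balanced even though far fewer raw samples fall inside the shape.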

```
# contents of ml43dg/data/objaverse
1faa4c299b93a3e5593ebeeedbff73b/                    # shape 0
    ├── mesh.obj                                    # shape 0 mesh
    ├── sdf.npz                                     # shape 0 sdf
    ├── surface.obj                                 # shape 0 surface
1fde48d83065ef5877a929f61fea4d0/                    # shape 1
1fe1411b6c8097acf008d8a3590fb522/                   # shape 2
:
```
The processing is performed in the cells below.

In [6]:
classes = ["tables", "sofas", "chairs", "vases"]

In [None]:
from ml43dg.data.preprocessing import preprocess

class_directories = {}
for class_name in classes:
    files = os.listdir("./ml43dg/data/objaverse/"+class_name)
    class_directories[class_name] = []
    for file in files:
        filename = Path(file).stem
        class_directories[class_name].append(filename)

for class_name in class_directories:
    random.seed(42)
    random.shuffle(class_directories[class_name])
    chunk = {class_name: class_directories[class_name][:80]}
    print(chunk)
    preprocess(
        data_dir="./ml43dg/data/preprocessed/",
        source_dir="./ml43dg/data/objaverse/",
        source_name="objaverse",
        class_directories=chunk,
        number_of_points=200000,
        extension=".obj",
    )

In [12]:
# Remove any existing sdf.npz files in the objaverse directory
for class_name in classes:
    directories = os.listdir("./ml43dg/data/objaverse/"+class_name)
    for file in directories:
        print(file)
        if file.endswith(".npz"):
            os.remove("./ml43dg/data/objaverse/"+class_name+"/"+file)

In [13]:
# Copy the preprocessed sdf.npz in each shape directory to the corresponding directory in the objaverse dataset
for class_name in classes:
    files = os.listdir("./ml43dg/data/preprocessed/objaverse/"+class_name)
    for file in files:
        filename = Path(file).stem
        shutil.copy("./ml43dg/data/preprocessed/objaverse/"+class_name+"/"+filename+"/sdf.npz", "./ml43dg/data/objaverse/"+class_name+"/"+filename+"/sdf.npz")

### (b) Dataset

We randomly generate train/validation splits and save the corresponding shape ids in .txt files. Then, we can create our train and validation datasets.

In [31]:
# Using a 90-10 split for training and validation
train_ratio = 90
data_path = Path("./ml43dg/data/")

# List all the subdirectories in the objaverse directory
shapes = ["tables", "sofas", "chairs", "vases"]

all_shapes = []
for shape in shapes:
    all_shapes += list((data_path / "preprocessed" / "objaverse" / shape).iterdir())

# Randomly shuffle the list
random.shuffle(all_shapes)

# Remove 1 shape to be used for overfitting
overfit_shape = all_shapes.pop()

# Split the list into training and validation sets
train_shapes = all_shapes[:int(len(all_shapes) * (train_ratio / 100))]
val_shapes = all_shapes[int(len(all_shapes) * (train_ratio / 100)):]

# Store the names of the objects in the corresponding text files,
# using the identifier string class_label/uid
with open(data_path / "splits" / "objaverse" / "train.txt", "w") as f:
    f.write("\n".join([c.parent.stem+"/"+c.stem for c in train_shapes]))

with open(data_path / "splits" / "objaverse" / "val.txt", "w") as f:
    f.write("\n".join([c.parent.stem+"/"+c.stem for c in val_shapes]))

with open(data_path / "splits" / "objaverse" / "overfit.txt", "w") as f:
    f.write(overfit_shape.parent.stem+"/"+overfit_shape.stem)

In [1]:
from ml43dg.data.objaverse import Objaverse

num_points_to_samples = 40000
train_dataset = Objaverse(num_points_to_samples, "train")
val_dataset = Objaverse(num_points_to_samples, "val")
overfit_dataset = Objaverse(num_points_to_samples, "overfit")

# len() calls the dataset's __len__ method
print(f'Length of train set: {len(train_dataset)}')  # expected output: 232
print(f'Length of val set: {len(val_dataset)}')  # expected output: 26
print(f'Length of overfit set: {len(overfit_dataset)}')  # expected output: 1

Length of train set: 232
Length of val set: 26
Length of overfit set: 1


Let's take a look at the points sampled for a particular shape.

In [2]:
from ml43dg.util.visualization import visualize_mesh, visualize_pointcloud

uid = train_dataset[0]['name']
class_name = train_dataset[0]['class_label']
points = train_dataset[0]['points']
sdf = train_dataset[0]['sdf']

# sampled points inside the shape
inside_points = points[sdf[:, 0] < 0, :].numpy()

# sampled points outside the shape
outside_points = points[sdf[:, 0] > 0, :].numpy()



In [3]:
mesh = Objaverse.get_mesh(class_name+"/"+uid)
print('Mesh')
visualize_mesh(mesh.vertices, mesh.faces, flip_axes=True)

Mesh




Output()

In [4]:
print('Sampled points with negative SDF (inside)')
visualize_pointcloud(inside_points, 0.025, flip_axes=True)

Sampled points with negative SDF (inside)


Output()

In [5]:
print('Sampled points with positive SDF (outside)')
visualize_pointcloud(outside_points, 0.025, flip_axes=True)

Sampled points with positive SDF (outside)


Output()

You'll notice that the samples are much denser close to the surface than far away from it.

### (c) Model

The DeepSDF auto-decoder architecture is visualized below:

<img src="ml43dg/images/deepsdf_architecture.png" alt="deepsdf_arch" style="width: 640px;"/>

Our modified DeepSDF auto-decoder architecture is visualised below. It additionally takes a one-hot encoded class embedding and a viewing direction as input, and predicts colour in an additional output head:

<img src="ml43dg/images/modified_architecture.png" alt="deepsdf_arch" style="width: 640px;"/>
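As a rough illustration of the conditioning, the sketch below builds a one-hot class vector and repeats the per-shape vectors for every query point before concatenation. The class order, the `one_hot` helper, and the choice of which inputs are concatenated are assumptions made for this example; how the decoder actually distributes its inputs between the SDF and colour branches is defined in `ml43dg.model.deepsdf`.

```python
import numpy as np

classes = ["tables", "sofas", "chairs", "vases"]  # assumed class order

def one_hot(class_name, class_names):
    """Return a float vector with a single 1 at the index of class_name."""
    vec = np.zeros(len(class_names), dtype=np.float32)
    vec[class_names.index(class_name)] = 1.0
    return vec

num_points = 4096
rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(num_points, 3)).astype(np.float32)
shape_code = rng.normal(scale=0.01, size=256).astype(np.float32)  # one code per shape
label = one_hot("chairs", classes)

# Per-shape vectors are repeated so every point row carries the same conditioning
decoder_input = np.concatenate(
    [
        points,
        np.broadcast_to(shape_code, (num_points, 256)),
        np.broadcast_to(label, (num_points, 4)),
    ],
    axis=1,
)
print(label, decoder_input.shape)  # [0. 0. 1. 0.] (4096, 263)
```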



In [18]:
from ml43dg.model.deepsdf import DeepSDFDecoder
from ml43dg.util.model import summarize_model

deepsdf = DeepSDFDecoder(shape_latent_size=256, colour_latent_size=128, one_hot_size=4)
print(summarize_model(deepsdf))

# The input to the network is a concatenation of:
#   - point coordinates (3) 
#   - viewing direction as Euler angles (2)
#   - shape latent code (256 in this example)
#   - colour latent code (128 in this example)
#   - and the one-hot encoding of the class label (4).
# Here we use a batch of 4096 points
input_points = torch.randn(4096, 3)
input_view = torch.randn(4096, 2)
shape_latent = torch.randn(4096, 256)
colour_latent = torch.randn(4096, 128)
class_label = torch.randn(4096, 4)
sdf_predictions, colour_predictions  = deepsdf(input_points, input_view, shape_latent, colour_latent, class_label)

print('\nOutput tensor shape for sdf: ', sdf_predictions.shape)  # expected output: 4096, 1
print('Output tensor shape for colour: ', colour_predictions.shape)  # expected output: 4096, 3

num_trainable_params = sum(p.numel() for p in deepsdf.parameters() if p.requires_grad) / 1e6
print(f'\nNumber of trainable params: {num_trainable_params:.2f}M')  # expected output: ~2.17M

   | Name      | Type           | Params 
-----------------------------------------------
0  | wnll1     | Linear         | 135680 
1  | wnll2     | Linear         | 263168 
2  | wnll3     | Linear         | 263168 
3  | wnll4     | Linear         | 127986 
4  | wnll5     | Linear         | 263168 
5  | wnll6     | Linear         | 263168 
6  | wnll7     | Linear         | 263168 
7  | wnll8     | Linear         | 263168 
8  | wnll5_col | Linear         | 329728 
9  | fc        | Linear         | 513    
10 | fc_col    | Linear         | 1539   
11 | sigmoid   | Sigmoid        | 0      
12 | relu      | ReLU           | 0      
13 | dropout   | Dropout        | 0      
14 | TOTAL     | DeepSDFDecoder | 2174454
colour_in: torch.Size([4096, 642])

Output tensor shape for sdf:  torch.Size([4096, 1])
Output tensor shape for colour:  torch.Size([4096, 3])

Number of trainable params: 2.17M


### (d) Training script and overfitting to a single shape

In [50]:
from ml43dg.training import train_deepsdf

overfit_config = {
    'experiment_name': '0_objaverse_deepsdf_overfit',
    'device': 'cuda:0',  # change this to cpu if you do not have a GPU
    'is_overfit': True,
    'num_sample_points': 4096,
    'latent_code_length': 256,
    'color_latent_code_length': 128,
    'batch_size': 1,
    'resume_ckpt': None,
    'learning_rate_model': 0.0005,
    'learning_rate_code': 0.001,
    'lambda_code_regularization': 0.0001,
    'max_epochs': 5000,
    'print_every_n': 150,
    'visualize_every_n': 500,
}

train_deepsdf.main(overfit_config)

Using device: cuda:0




[149/00000] train_loss: 0.176446
[299/00000] train_loss: 0.094961
[449/00000] train_loss: 0.070677
Saved mesh to ml43dg/runs/0_objaverse_deepsdf_overfit/meshes/00499_000.obj
[599/00000] train_loss: 0.058884
[749/00000] train_loss: 0.054229
[899/00000] train_loss: 0.051151
Saved mesh to ml43dg/runs/0_objaverse_deepsdf_overfit/meshes/00999_000.obj
[1049/00000] train_loss: 0.048560
[1199/00000] train_loss: 0.046724
[1349/00000] train_loss: 0.045809
[1499/00000] train_loss: 0.044688
Saved mesh to ml43dg/runs/0_objaverse_deepsdf_overfit/meshes/01499_000.obj
[1649/00000] train_loss: 0.043698
[1799/00000] train_loss: 0.043143
[1949/00000] train_loss: 0.042574
Saved mesh to ml43dg/runs/0_objaverse_deepsdf_overfit/meshes/01999_000.obj
[2099/00000] train_loss: 0.042090
[2249/00000] train_loss: 0.041670
[2399/00000] train_loss: 0.041434
Saved mesh to ml43dg/runs/0_objaverse_deepsdf_overfit/meshes/02499_000.obj
[2549/00000] train_loss: 0.041222
[2699/00000] train_loss: 0.041232
[2849/00000] train_

Let's visualize the overfitted shape reconstruction to check if it looks reasonable.

In [51]:
import trimesh

from ml43dg.util.visualization import visualize_mesh
from ml43dg.data.objaverse import Objaverse

# Load and visualize GT mesh of the overfit sample
gt_mesh = Objaverse.get_mesh('chairs/4fb3bab820de495599b2987e82c6e6e8')
print('GT')
visualize_mesh(gt_mesh.vertices, gt_mesh.faces, flip_axes=True)

# Load and visualize reconstructed overfit sample; it's okay if they don't look visually exact, since we don't run
# the training too long and have a learning rate decay while training
mesh_path = "ml43dg/runs/0_objaverse_deepsdf_overfit/meshes/04999_000.obj"
overfit_output = trimesh.load(mesh_path, process=True)
print('Overfit')
visualize_mesh(overfit_output.vertices, overfit_output.faces, flip_axes=True)

GT


Output()

Overfit


Output()

In [52]:
gt_mesh.show()

In [53]:
overfit_output.show()

### (e) Training over entire train set

Once overfitting works, we can train on the entire train set.

In [54]:
from ml43dg.training import train_deepsdf

generalization_config = {
    'experiment_name': '1_deepsdf_generalization',
    'device': 'cuda:0',  # run this on a gpu for a reasonable training time
    'is_overfit': False,
    'num_sample_points': 4096, # you can adjust this such that the model fits on your gpu
    'latent_code_length': 256,
    'color_latent_code_length': 128,
    'batch_size': 1,
    'resume_ckpt': None,
    'learning_rate_model': 0.0005,
    'learning_rate_code': 0.001,
    'lambda_code_regularization': 0.0001,
    'max_epochs': 1000,  # the full schedule is 2000 epochs; if you're short on time, reasonable results should start to appear at around 500
    'print_every_n': 50,
    'visualize_every_n': 1000,
}

train_deepsdf.main(generalization_config)

Using device: cuda:0
[000/00049] train_loss: 0.219487
[000/00099] train_loss: 0.193114
[000/00149] train_loss: 0.197943
[000/00199] train_loss: 0.183336
[001/00017] train_loss: 0.178592
[001/00067] train_loss: 0.173316
[001/00117] train_loss: 0.169131
[001/00167] train_loss: 0.169131
[001/00217] train_loss: 0.168121
[002/00035] train_loss: 0.161517
[002/00085] train_loss: 0.166053
[002/00135] train_loss: 0.153779
[002/00185] train_loss: 0.154392
[003/00003] train_loss: 0.161785
[003/00053] train_loss: 0.160138
[003/00103] train_loss: 0.150745
[003/00153] train_loss: 0.151398
[003/00203] train_loss: 0.160857
[004/00021] train_loss: 0.146157
[004/00071] train_loss: 0.153997
Saved mesh to ml43dg/runs/1_deepsdf_generalization/meshes/00999_000.obj
Saved mesh to ml43dg/runs/1_deepsdf_generalization/meshes/00999_001.obj
Saved mesh to ml43dg/runs/1_deepsdf_generalization/meshes/00999_002.obj
Saved mesh to ml43dg/runs/1_deepsdf_generalization/meshes/00999_003.obj
Saved mesh to ml43dg/runs/1_dee

KeyboardInterrupt: 

### (f) Inference using the trained model on observed SDF values

In [58]:
from ml43dg.inference.infer_deepsdf import InferenceHandlerDeepSDF

device = torch.device('cuda:0')  # change this to cpu if you're not using a gpu

inference_handler = InferenceHandlerDeepSDF(256, 128, 4, "ml43dg/runs/1_deepsdf_generalization", device)

First, we try inference on a shape from the validation set, for which we have a complete observation of SDF values. This is an easier problem than shape completion, since all the information is already in the input.
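Inference in an auto-decoder means fitting a fresh latent code to the observations while the decoder stays frozen. The sketch below shows that loop on a toy problem: the decoder is a fixed function that is linear in the latent code, the loss is a squared error with a norm penalty (swapped in for simplicity), and the gradient is written out by hand. All names and sizes are illustrative assumptions and do not reflect `InferenceHandlerDeepSDF`.

```python
import numpy as np

rng = np.random.default_rng(0)
num_points, latent_size, lambda_reg = 500, 8, 1e-4

# Frozen toy decoder parameters; they are never updated below
A = rng.normal(size=(3, latent_size))
c = rng.normal(size=latent_size)

def decode(points, z):
    """Toy decoder, linear in the latent code: sdf_i = x_i^T A z + c^T z."""
    return points @ (A @ z) + c @ z

# Observations generated from a hidden code that the optimization should explain
points = rng.uniform(-1.0, 1.0, size=(num_points, 3))
observed_sdf = decode(points, rng.normal(scale=0.1, size=latent_size))

M = points @ A + c  # Jacobian of the predictions with respect to z

def loss_and_grad(z):
    residual = decode(points, z) - observed_sdf
    loss = np.mean(residual ** 2) + lambda_reg * np.sum(z ** 2)
    grad = 2.0 * M.T @ residual / num_points + 2.0 * lambda_reg * z
    return loss, grad

# Step size 1/L, with L the Lipschitz constant of the gradient of this quadratic
step = 1.0 / (2.0 * np.linalg.norm(M, 2) ** 2 / num_points + 2.0 * lambda_reg)

z = np.zeros(latent_size)  # start from the zero code
initial_loss, _ = loss_and_grad(z)
for _ in range(800):
    _, grad = loss_and_grad(z)
    z -= step * grad  # only the latent code changes
final_loss, _ = loss_and_grad(z)
print(f"loss: {initial_loss:.6f} -> {final_loss:.6f}")
```

With a neural decoder the gradient would come from automatic differentiation and the update from an optimizer such as Adam, but the structure is the same: the observations and the decoder stay fixed, and only the code moves.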

Let's visualize the observations.

In [59]:
from ml43dg.data.objaverse import Objaverse
from ml43dg.util.visualization import visualize_mesh, visualize_pointcloud

# get observed data
points, sdf, colors, viewing_dirs = Objaverse.get_all_sdf_samples("vases/2c5969f0fa024e9d9629bf91df6c8baf")

inside_points = points[sdf[:, 0] < 0, :].numpy()
inside_colors = colors[sdf[:, 0] < 0, :].numpy()
outside_points = points[sdf[:, 0] > 0, :].numpy()
outside_colors = colors[sdf[:, 0] > 0, :].numpy()

def pack_rgb(colors):
    """Pack RGB colours in [0, 1] into one 0xRRGGBB integer per point."""
    rgb = (np.clip(colors, 0.0, 1.0) * 255).astype(np.uint32)
    return (rgb[:, 0] << 16) | (rgb[:, 1] << 8) | rgb[:, 2]

# convert rgb colors to packed integer colors
inside_hex_colors = pack_rgb(inside_colors)
outside_hex_colors = pack_rgb(outside_colors)

# visualize observed points; you'll observe that the observations are very complete
print('Observations with negative SDF (inside)')
visualize_pointcloud(inside_points, 0.025, inside_hex_colors, flip_axes=True)
print('Observations with positive SDF (outside)')
visualize_pointcloud(outside_points, 0.025, outside_hex_colors, flip_axes=True)

Observations with negative SDF (inside)


Output()

Observations with positive SDF (outside)


Output()

Reconstruction on these observations with the trained model:

In [62]:
# reconstruct
vertices, faces, vertex_colours = inference_handler.reconstruct(points, sdf, colors, viewing_dirs, "vase", 800)

# convert vertex_colours in [0, 1] to packed 0xRRGGBB integer values
rgb = (np.clip(np.asarray(vertex_colours), 0.0, 1.0) * 255).astype(np.uint32)
hex_colours = (rgb[:, 0] << 16) | (rgb[:, 1] << 8) | rgb[:, 2]

# visualize
visualize_mesh(vertices, faces, hex_colours, flip_axes=True)

[00000] optim_loss: 0.231289
[00050] optim_loss: 0.077205
[00100] optim_loss: 0.069713
[00150] optim_loss: 0.065336
[00200] optim_loss: 0.065782
[00250] optim_loss: 0.065428
[00300] optim_loss: 0.063650
[00350] optim_loss: 0.059568
[00400] optim_loss: 0.062740
[00450] optim_loss: 0.060236
[00500] optim_loss: 0.058977
[00550] optim_loss: 0.060745
[00600] optim_loss: 0.057602
[00650] optim_loss: 0.059198
[00700] optim_loss: 0.057530
[00750] optim_loss: 0.060585
Optimization complete.




Output()

## References

[1] Park, J. J., Florence, P., Straub, J., Newcombe, R., & Lovegrove, S. (2019). DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation. In CVPR.

[2] Mildenhall, B., Srinivasan, P. P., Tancik, M., Barron, J. T., Ramamoorthi, R., & Ng, R. (2020). NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In ECCV.