3D data is usually heterogeneous: the meshes in a single mini-batch may have different numbers of vertices and faces. Processing such data efficiently on a GPU is not trivial, and writing the bookkeeping code for heterogeneous mini-batches by hand is tedious.

Luckily, PyTorch3D handles heterogeneous mini-batches efficiently.

In this example, a depth camera observes three objects whose ground-truth mesh models are known. Assuming the orientation of the camera is known, **let's estimate the unknown location of the depth camera** from its measurements.

In [2]:
import open3d
import os
import torch
import numpy as np

In [3]:
from pytorch3d.io import load_objs_as_meshes
from pytorch3d.structures.meshes import join_meshes_as_batch
from pytorch3d.ops import sample_points_from_meshes
from pytorch3d.loss import chamfer_distance

In [4]:
# Defining a torch device
if torch.cuda.is_available():
    device = torch.device("cuda:0")
else:
    device = torch.device("cpu")
    print("WARNING: CPU only, this will be slow!")



In [10]:
data_path = 'data/'
mesh_names = ['cube.obj', 'diamond.obj', 'dodecahedron.obj'] # the camera observes three objects in the scene 
                                                             # and we know the ground-truth mesh models 

In [6]:
# Having a look at the obj meshes
for mesh_name in mesh_names:
    mesh = open3d.io.read_triangle_mesh(os.path.join(data_path, mesh_name))
    open3d.visualization.draw_geometries([mesh], 
                                         mesh_show_wireframe = True, 
                                         mesh_show_back_face = True)



In [11]:
# Loading the same meshes with PyTorch3D, building mesh_list
mesh_list = list()

for mesh_name in mesh_names:
    mesh = load_objs_as_meshes([os.path.join(data_path, mesh_name)], device=device)
    mesh_list.append(mesh)

In [12]:
# Creating PyTorch3D mini-batch of meshes
mesh_batch = join_meshes_as_batch(mesh_list, include_textures = False)

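There are three ways to represent vertices and faces in each PyTorch3D mini-batch:
- List format: a Python list with one tensor per mesh, each with its own number of rows.
- Padded format: a single tensor with one slice per mesh, where every mesh is padded to the size of the largest mesh in the batch (vertices are padded with zeros, as the output below shows).
- Packed format: a single tensor in which the vertices (or faces) of all meshes are concatenated along the first dimension, with no padding.

The PyTorch3D API converts between the formats efficiently.
Each representation has trade-offs: the list format is easy to inspect but awkward for batched tensor operations, the padded format supports batched operations at the cost of memory spent on padding, and the packed format is compact but needs index bookkeeping to tell the meshes apart.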
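To make the relationship between the three layouts concrete, here is a minimal pure-Python sketch that uses nested lists in place of tensors. The helper names (`to_padded`, `to_packed`, `pack_faces`, `packed_to_list`) and the toy meshes are hypothetical and chosen for illustration; this is not how PyTorch3D implements the conversions internally.

```python
# Illustrative sketch only: nested lists stand in for tensors.

def to_padded(vert_list, pad_value=0.0):
    """Pad every mesh to the largest vertex count in the batch."""
    max_v = max(len(v) for v in vert_list)
    return [
        [list(p) for p in verts] + [[pad_value] * 3 for _ in range(max_v - len(verts))]
        for verts in vert_list
    ]

def to_packed(vert_list):
    """Concatenate all meshes and record the index where each one starts."""
    packed, first_idx = [], []
    for verts in vert_list:
        first_idx.append(len(packed))
        packed.extend(list(p) for p in verts)
    return packed, first_idx

def pack_faces(face_list, first_idx):
    """Shift face indices so that they point into the packed vertex array."""
    return [
        [i + offset for i in face]
        for faces, offset in zip(face_list, first_idx)
        for face in faces
    ]

def packed_to_list(packed, first_idx):
    """Recover the list format from the packed format."""
    bounds = first_idx + [len(packed)]
    return [packed[a:b] for a, b in zip(bounds[:-1], bounds[1:])]

# Two toy "meshes": a triangle (3 vertices) and a square made of 2 triangles (4 vertices).
vert_list = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0], [0.0, 1.0, 1.0]],
]
face_list = [
    [[0, 1, 2]],
    [[0, 1, 2], [0, 2, 3]],
]

padded = to_padded(vert_list)
packed, first_idx = to_packed(vert_list)
packed_faces = pack_faces(face_list, first_idx)

print('rows per mesh in padded format:', [len(v) for v in padded])
print('rows in packed format:', len(packed), ', first index of each mesh:', first_idx)
print('packed faces:', packed_faces)
```

The start offsets kept by `to_packed` are the bookkeeping that the packed format needs: without them, the concatenated array cannot be split back into individual meshes.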

In [13]:
# Returning vertices and faces in a list format
vertex_list = mesh_batch.verts_list()
face_list = mesh_batch.faces_list()

print('vertex_list: \n', vertex_list)
print('face_list: \n', face_list)

vertex_list: 
 [tensor([[-0.5000, -0.5000,  0.5000],
        [-0.5000, -0.5000, -0.5000],
        [-0.5000,  0.5000, -0.5000],
        [-0.5000,  0.5000,  0.5000],
        [ 0.5000, -0.5000,  0.5000],
        [ 0.5000, -0.5000, -0.5000],
        [ 0.5000,  0.5000, -0.5000],
        [ 0.5000,  0.5000,  0.5000]]), tensor([[  0.,   0.,  78.],
        [ 45.,  45.,   0.],
        [ 45., -45.,   0.],
        [-45., -45.,   0.],
        [-45.,  45.,   0.],
        [  0.,   0., -78.]]), tensor([[-0.5774, -0.5774,  0.5774],
        [ 0.9342,  0.3568,  0.0000],
        [ 0.9342, -0.3568,  0.0000],
        [-0.9342,  0.3568,  0.0000],
        [-0.9342, -0.3568,  0.0000],
        [ 0.0000,  0.9342,  0.3568],
        [ 0.0000,  0.9342, -0.3568],
        [ 0.3568,  0.0000, -0.9342],
        [-0.3568,  0.0000, -0.9342],
        [ 0.0000, -0.9342, -0.3568],
        [ 0.0000, -0.9342,  0.3568],
        [ 0.3568,  0.0000,  0.9342],
        [-0.3568,  0.0000,  0.9342],
        [ 0.5774,  0.5774, -0.5774]

In [14]:
# Returning vertices and faces in the padded format
vertex_padded = mesh_batch.verts_padded()
face_padded = mesh_batch.faces_padded()

print('vertex_padded: \n', vertex_padded)
print('face_padded: \n', face_padded)

vertex_padded: 
 tensor([[[ -0.5000,  -0.5000,   0.5000],
         [ -0.5000,  -0.5000,  -0.5000],
         [ -0.5000,   0.5000,  -0.5000],
         [ -0.5000,   0.5000,   0.5000],
         [  0.5000,  -0.5000,   0.5000],
         [  0.5000,  -0.5000,  -0.5000],
         [  0.5000,   0.5000,  -0.5000],
         [  0.5000,   0.5000,   0.5000],
         [  0.0000,   0.0000,   0.0000],
         [  0.0000,   0.0000,   0.0000],
         [  0.0000,   0.0000,   0.0000],
         [  0.0000,   0.0000,   0.0000],
         [  0.0000,   0.0000,   0.0000],
         [  0.0000,   0.0000,   0.0000],
         [  0.0000,   0.0000,   0.0000],
         [  0.0000,   0.0000,   0.0000],
         [  0.0000,   0.0000,   0.0000],
         [  0.0000,   0.0000,   0.0000],
         [  0.0000,   0.0000,   0.0000],
         [  0.0000,   0.0000,   0.0000]],

        [[  0.0000,   0.0000,  78.0000],
         [ 45.0000,  45.0000,   0.0000],
         [ 45.0000, -45.0000,   0.0000],
         [-45.0000, -45.0000,   0.0000

In [15]:
# Getting vertices and faces in the packed format
vertex_packed = mesh_batch.verts_packed()
face_packed = mesh_batch.faces_packed()
print('vertex_packed: \n', vertex_packed)
print('face_packed: \n', face_packed)

num_vertices = vertex_packed.shape[0]
print('num_vertices = ', num_vertices)

vertex_packed: 
 tensor([[ -0.5000,  -0.5000,   0.5000],
        [ -0.5000,  -0.5000,  -0.5000],
        [ -0.5000,   0.5000,  -0.5000],
        [ -0.5000,   0.5000,   0.5000],
        [  0.5000,  -0.5000,   0.5000],
        [  0.5000,  -0.5000,  -0.5000],
        [  0.5000,   0.5000,  -0.5000],
        [  0.5000,   0.5000,   0.5000],
        [  0.0000,   0.0000,  78.0000],
        [ 45.0000,  45.0000,   0.0000],
        [ 45.0000, -45.0000,   0.0000],
        [-45.0000, -45.0000,   0.0000],
        [-45.0000,  45.0000,   0.0000],
        [  0.0000,   0.0000, -78.0000],
        [ -0.5774,  -0.5774,   0.5774],
        [  0.9342,   0.3568,   0.0000],
        [  0.9342,  -0.3568,   0.0000],
        [ -0.9342,   0.3568,   0.0000],
        [ -0.9342,  -0.3568,   0.0000],
        [  0.0000,   0.9342,   0.3568],
        [  0.0000,   0.9342,  -0.3568],
        [  0.3568,   0.0000,  -0.9342],
        [ -0.3568,   0.0000,  -0.9342],
        [  0.0000,  -0.9342,  -0.3568],
        [  0.0000,  -0.

In [17]:
# Simulating a noisy and displaced version of the three meshes
mesh_batch_noisy = mesh_batch.clone()  # clone the ground-truth mesh models

# Zero-mean Gaussian noise with variance 0.1 (standard deviation 0.1**0.5), one 3D offset per vertex
noise = (0.1**0.5) * torch.randn(mesh_batch_noisy.verts_packed().shape, device=device)

motion_gt = torch.as_tensor(np.array([3, 4, 5]))  # ground-truth displacement
print('motion ground truth = ', motion_gt)

motion_gt = motion_gt[None, :].to(device)  # shape (1, 3), broadcasts over all vertices
offsets = noise + motion_gt  # per-vertex offset: common displacement plus noise

mesh_batch_noisy = mesh_batch_noisy.offset_verts(offsets).detach()

motion ground truth =  tensor([3, 4, 5])
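The displacement can be recovered from the noisy observation because the noise is zero-mean: averaged over many points, it cancels out and only the common offset remains. The following standard-library sketch illustrates this. The seed and the point count `num_points` are assumptions made for the illustration, not values from the notebook.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

motion = [3.0, 4.0, 5.0]  # common displacement, as in the cell above
sigma = 0.1 ** 0.5        # standard deviation of the per-vertex noise
num_points = 10000        # hypothetical point count, chosen for illustration

# Per-point offset = common displacement + zero-mean Gaussian noise.
offsets = [
    [m + random.gauss(0.0, sigma) for m in motion]
    for _ in range(num_points)
]

# Averaging over many points suppresses the noise and leaves the displacement.
mean_offset = [sum(o[k] for o in offsets) / num_points for k in range(3)]
print('mean offset = ', [round(m, 2) for m in mean_offset])
```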


In [18]:
# Estimating the unknown displacement between the camera and the origin

# Defining optimisation variable
motion_estimate = torch.zeros(motion_gt.shape, 
                              device=device, 
                              requires_grad=True)


# Defining Torch optimizer
optimizer = torch.optim.SGD([motion_estimate], lr=0.1, momentum=0.9)


# Running SGD for 200 iterations 
for i in range(0, 200):
    optimizer.zero_grad()
    current_mesh_batch = mesh_batch.offset_verts(motion_estimate.repeat(num_vertices,1))
    
    # randomly sample 5,000 surface points from each mesh in both batches,
    # then compute the Chamfer distance between the two sets of point clouds
    sample_trg = sample_points_from_meshes(current_mesh_batch, 5000)
    sample_src = sample_points_from_meshes(mesh_batch_noisy, 5000)
    loss, _ = chamfer_distance(sample_trg, sample_src)

    loss.backward()
    optimizer.step()
    print('i = ', i, ', motion_estimation = ', motion_estimate)

i =  0 , motion_estimation =  tensor([[0.7660, 1.0591, 1.3091]], requires_grad=True)
i =  1 , motion_estimation =  tensor([[1.9890, 2.7370, 3.4198]], requires_grad=True)
i =  2 , motion_estimation =  tensor([[3.2689, 4.4703, 5.6283]], requires_grad=True)
i =  3 , motion_estimation =  tensor([[4.3714, 5.9458, 7.5484]], requires_grad=True)
i =  4 , motion_estimation =  tensor([[5.0222, 6.7549, 8.7200]], requires_grad=True)
i =  5 , motion_estimation =  tensor([[5.0668, 6.7020, 8.8755]], requires_grad=True)
i =  6 , motion_estimation =  tensor([[4.5661, 5.8858, 8.0746]], requires_grad=True)
i =  7 , motion_estimation =  tensor([[3.7233, 4.6184, 6.6561]], requires_grad=True)
i =  8 , motion_estimation =  tensor([[2.8137, 3.3175, 5.1016]], requires_grad=True)
i =  9 , motion_estimation =  tensor([[2.0189, 2.2299, 3.7003]], requires_grad=True)
i =  10 , motion_estimation =  tensor([[1.4702, 1.6002, 2.6817]], requires_grad=True)
i =  11 , motion_estimation =  tensor([[1.2992, 1.5865, 2.2981]]

i =  96 , motion_estimation =  tensor([[3.0023, 3.9722, 5.0977]], requires_grad=True)
i =  97 , motion_estimation =  tensor([[3.0045, 3.9813, 5.0887]], requires_grad=True)
i =  98 , motion_estimation =  tensor([[3.0012, 3.9843, 5.0799]], requires_grad=True)
i =  99 , motion_estimation =  tensor([[2.9956, 3.9824, 5.0734]], requires_grad=True)
i =  100 , motion_estimation =  tensor([[2.9900, 3.9747, 5.0663]], requires_grad=True)
i =  101 , motion_estimation =  tensor([[2.9822, 3.9634, 5.0600]], requires_grad=True)
i =  102 , motion_estimation =  tensor([[2.9740, 3.9506, 5.0566]], requires_grad=True)
i =  103 , motion_estimation =  tensor([[2.9671, 3.9365, 5.0589]], requires_grad=True)
i =  104 , motion_estimation =  tensor([[2.9606, 3.9244, 5.0633]], requires_grad=True)
i =  105 , motion_estimation =  tensor([[2.9556, 3.9160, 5.0698]], requires_grad=True)
i =  106 , motion_estimation =  tensor([[2.9539, 3.9104, 5.0779]], requires_grad=True)
i =  107 , motion_estimation =  tensor([[2.9575

i =  191 , motion_estimation =  tensor([[2.9900, 3.9303, 5.0820]], requires_grad=True)
i =  192 , motion_estimation =  tensor([[2.9895, 3.9327, 5.0836]], requires_grad=True)
i =  193 , motion_estimation =  tensor([[2.9862, 3.9374, 5.0873]], requires_grad=True)
i =  194 , motion_estimation =  tensor([[2.9804, 3.9409, 5.0871]], requires_grad=True)
i =  195 , motion_estimation =  tensor([[2.9768, 3.9443, 5.0863]], requires_grad=True)
i =  196 , motion_estimation =  tensor([[2.9750, 3.9462, 5.0861]], requires_grad=True)
i =  197 , motion_estimation =  tensor([[2.9747, 3.9483, 5.0866]], requires_grad=True)
i =  198 , motion_estimation =  tensor([[2.9725, 3.9494, 5.0888]], requires_grad=True)
i =  199 , motion_estimation =  tensor([[2.9735, 3.9480, 5.0914]], requires_grad=True)
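The loss driving this optimisation is the Chamfer distance between two point clouds. As a rough illustration of what it measures, the pure-Python sketch below assumes the symmetric form that averages squared nearest-neighbour distances in both directions. The function `chamfer_distance_sketch` and the toy point sets are hypothetical; the PyTorch3D implementation is batched and runs on the GPU.

```python
def chamfer_distance_sketch(points_a, points_b):
    """Symmetric Chamfer distance between two small 3D point sets (assumed form)."""
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_sided(src, dst):
        # Mean over src of the squared distance to the nearest point in dst.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_sided(points_a, points_b) + one_sided(points_b, points_a)

def translate(points, t):
    """Shift every point by the same 3D offset t."""
    return [[p[k] + t[k] for k in range(3)] for p in points]

# Toy "model" points and an observation displaced by [3, 4, 5] (no noise here).
square = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
observed = translate(square, [3.0, 4.0, 5.0])

loss_at_zero = chamfer_distance_sketch(translate(square, [0.0, 0.0, 0.0]), observed)
loss_at_truth = chamfer_distance_sketch(translate(square, [3.0, 4.0, 5.0]), observed)

print('loss with zero displacement = ', loss_at_zero)
print('loss with true displacement = ', loss_at_truth)
```

The loss is smallest when the candidate displacement matches the true one, which is why minimising it with respect to `motion_estimate` moves the estimate towards the ground truth.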


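Main outcome: the optimization converges to the ground-truth location [3, 4, 5]. The estimate first overshoots, peaking near [5.07, 6.70, 8.88] at iteration 5 (most likely an effect of the momentum term), and in the iterations shown from 96 onward it stays within about 0.1 of the ground truth in every coordinate. The small residual fluctuation is to be expected given the vertex noise and the random point sampling at each iteration.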
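The early overshoot visible in the log is typical of gradient descent with momentum. The sketch below applies the heavy-ball update (velocity = momentum × velocity + gradient, then parameter -= lr × velocity) with the same `lr=0.1` and `momentum=0.9` to a one-dimensional quadratic. The quadratic objective is an assumption chosen for illustration; the real Chamfer loss is not quadratic, so the numbers will not match the log.

```python
target = 3.0     # plays the role of one ground-truth coordinate
lr = 0.1         # same learning rate as in the notebook
momentum = 0.9   # same momentum as in the notebook

x = 0.0          # initial estimate, as with torch.zeros(...)
velocity = 0.0
trajectory = []

for step in range(200):
    grad = 2.0 * (x - target)            # gradient of the toy loss (x - target)**2
    velocity = momentum * velocity + grad
    x = x - lr * velocity
    trajectory.append(x)

print('largest value reached = ', max(trajectory))
print('final estimate        = ', trajectory[-1])
```

The trajectory shoots past the target during the first few steps and then oscillates with shrinking amplitude. A smaller momentum or learning rate would reduce the overshoot at the cost of slower initial progress.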