# Efficient 3D Deep Learning with torchsparse and SPVNAS

In this tutorial, we show how to efficiently process LiDAR point clouds with `torchsparse`, our efficient 3D sparse computation library, and SPVNAS, our newly proposed 3D AutoML framework.
<img src="https://hanlab.mit.edu/projects/spvnas/figures/overview.png" width="720">

Let's clone the codebase first.

In [None]:
!git clone https://github.com/mit-han-lab/e3d.git
import os
os.chdir('e3d/tutorial')
print(os.getcwd())

Let's install the libraries used in this tutorial. Note: `pip install` can be time-consuming; it takes about 5 minutes on Google Colab.

In [None]:
# Google's sparse hash project. Used in torchsparse.
!sudo apt-get install libsparsehash-dev

In [None]:
# The library used for plotting.
!pip install plotly --upgrade 1>/dev/null
# torchsparse is our high-performance 3D sparse convolution library.
!pip install --upgrade git+https://github.com/mit-han-lab/torchsparse.git 1>/dev/null

Let's import the libraries used in this tutorial.

In [None]:
# numpy
import numpy as np
# PyTorch
import torch
import torch.nn as nn
# torchsparse is our high-performance 3D sparse convolution library.
import torchsparse
import torchsparse.nn as spnn
from torchsparse import SparseTensor
from torchsparse.utils import sparse_quantize, sparse_collate_fn

## Efficient 3D Sparse Computation with torchsparse

We now start the first part of this tutorial, where we show how to use our library `torchsparse` to load input point cloud data, define networks, and train them. `torchsparse` is a high-performance computing library that accelerates sparse computation in 3D, in particular the sparse convolution operation.

<img src="https://hanlab.mit.edu/projects/spvnas/figures/sparseconv_illustration.gif" width="720">

The major advantage of this library is that all computation runs on the GPU, including kernel map construction, which is done on the CPU in the latest [MinkowskiEngine](https://github.com/NVIDIA/MinkowskiEngine) release (V0.4.3). In addition, we support more general 3D modules such as [Sparse Point-Voxel Convolution](https://arxiv.org/abs/2007.16100), presented in our SPVNAS project.
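A kernel map records, for every kernel offset, which input voxel contributes to which output voxel; the convolution then accumulates `W[offset] @ x[input]` into each paired output row. The pure-Python sketch below builds one for a stride-1 (submanifold) convolution, assuming the output voxels are exactly the input voxels, and uses a plain `dict` as the hash table. The helper `build_kernel_map` is hypothetical and only illustrates the idea; it is not how `torchsparse` implements this step on the GPU.

```python
from itertools import product


def build_kernel_map(coords, kernel_size=3):
    """Toy kernel map for a stride-1 sparse convolution (hypothetical helper).

    coords: list of integer (x, y, z) tuples, one per active voxel.
    Returns {offset: [(input_idx, output_idx), ...]}.
    """
    # hash table from voxel coordinate to row index (a dict stands in for a GPU hash map)
    table = {tuple(c): i for i, c in enumerate(coords)}
    r = kernel_size // 2
    kernel_map = {}
    for offset in product(range(-r, r + 1), repeat=3):
        pairs = []
        for out_idx, c in enumerate(coords):
            # the output voxel at c gathers from the input voxel at c + offset, if it exists
            neighbor = (c[0] + offset[0], c[1] + offset[1], c[2] + offset[2])
            in_idx = table.get(neighbor)
            if in_idx is not None:
                pairs.append((in_idx, out_idx))
        kernel_map[offset] = pairs
    return kernel_map


coords = [(0, 0, 0), (1, 0, 0), (5, 5, 5)]
kmap = build_kernel_map(coords)
print(kmap[(0, 0, 0)])  # [(0, 0), (1, 1), (2, 2)]
print(kmap[(1, 0, 0)])  # [(1, 0)]
```

Only offsets that hit an occupied voxel produce pairs, which is why sparse convolution skips the empty space that dominates LiDAR scans.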

Here we show how to process an input point cloud with the `torchsparse` library. The core idea is to downsample the input into a sparse volumetric representation with the function `sparse_quantize` (which we imported from `torchsparse.utils` above). To enable batching, we convert each input to a `torchsparse.SparseTensor`, which is composed of features (`F`) and coordinates (`C`).
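To make the quantization step concrete, here is a NumPy-only sketch of what it computes: round each point to a voxel, keep one representative point per occupied voxel, and remember which voxel every original point fell into. The helper `voxelize` is hypothetical and is not `torchsparse`'s `sparse_quantize`, whose signature has changed across library versions; it mirrors only the rounding and shifting done in `process_point_cloud`.

```python
import numpy as np


def voxelize(points, voxel_size=0.05):
    """Hypothetical stand-in for sparse quantization: one representative point per voxel."""
    # integer voxel coordinates, shifted so the minimum is 0 (as in process_point_cloud)
    coords = np.round(points[:, :3] / voxel_size).astype(np.int64)
    coords -= coords.min(axis=0, keepdims=True)
    # inds: first point of each unique voxel; inverse: voxel index of every point
    _, inds, inverse = np.unique(coords, axis=0, return_index=True, return_inverse=True)
    return coords[inds], inds, inverse.reshape(-1)


points = np.array([[0.00, 0.0, 0.0, 0.1],
                   [0.01, 0.0, 0.0, 0.2],   # lands in the same voxel as the first point
                   [0.20, 0.0, 0.0, 0.3]])
voxel_coords, inds, inverse = voxelize(points)
print(voxel_coords.tolist())  # [[0, 0, 0], [4, 0, 0]]
print(inverse.tolist())       # [0, 0, 1]
```

The `inverse` array plays the role of `inverse_map` in the tutorial: it lets voxel-level results be copied back to every original point.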

In [None]:
def process_point_cloud(input_point_cloud, input_labels, 
                        voxel_size=0.05, ignore_label=19):
    # get rounded voxel coordinates and shift them so that the minimum is 0
    pc_ = np.round(input_point_cloud[:, :3] / voxel_size)
    pc_ -= pc_.min(0, keepdims=True)
    labels_ = input_labels
    feat_ = input_point_cloud
    # filter out unlabeled points
    out_pc = input_point_cloud[labels_ != ignore_label, :3]
    pc_ = pc_[labels_ != ignore_label]
    feat_ = feat_[labels_ != ignore_label]
    labels_ = labels_[labels_ != ignore_label]
        
    # sparse quantization: filter out duplicate points after downsampling
    inds, labels, inverse_map = sparse_quantize(pc_,
                                                feat_,
                                                labels_,
                                                return_index=True,
                                                return_invs=True)
    # construct members as sparse tensor so that they can be collated
    pc = pc_[inds]
    feat = feat_[inds]
    labels = SparseTensor(
        labels_[inds], pc
    )
    lidar = SparseTensor(
        feat, pc
    )
    targets_mapped = SparseTensor(
        labels_, out_pc
    )
    inverse_map = SparseTensor(
        inverse_map, out_pc
    )
    out_pc = SparseTensor(
        out_pc, out_pc
    )
    # construct the feed_dict
    feed_dict = {
        'pc': out_pc,
        'lidar': lidar,
        'targets': labels,
        'targets_mapped': targets_mapped,
        'inverse_map': inverse_map
    }
    
    return feed_dict

Next, we show how to perform batching in `torchsparse`. Assume we receive a list of `feed_dict`s from the input preprocessor. All we need to do is call the `sparse_collate_fn` function (imported from `torchsparse.utils`), and the library handles the batching for us.

In [None]:
def generate_random_batch(batch_size=2, pc_size=100000, num_classes=10):
    lis = []
    for i in range(batch_size):
        dummy_pc = np.random.randn(pc_size, 4) * 10
        dummy_label = np.random.choice(num_classes, pc_size)
        feed_dict = process_point_cloud(
            dummy_pc,
            dummy_label
        )
        lis.append(feed_dict)
    return sparse_collate_fn(lis)
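Conceptually, batching sparse inputs means stacking the voxels of all samples into one tensor while tagging each voxel with the sample it came from, so that identical coordinates from different scans stay distinct. The NumPy sketch below shows that idea with a hypothetical `collate_coords` helper. Appending the batch index as the last coordinate column is an assumption for illustration; the exact layout used by `sparse_collate_fn` differs between `torchsparse` versions.

```python
import numpy as np


def collate_coords(coord_list, feat_list):
    """Hypothetical batching helper: stack voxels and add a batch-index column."""
    coords, feats = [], []
    for b, (c, f) in enumerate(zip(coord_list, feat_list)):
        # extra column holding the sample index b for every voxel of this sample
        batch_col = np.full((c.shape[0], 1), b, dtype=c.dtype)
        coords.append(np.hstack([c, batch_col]))
        feats.append(f)
    return np.vstack(coords), np.vstack(feats)


c0 = np.array([[0, 0, 0], [1, 0, 0]])
c1 = np.array([[0, 0, 0]])  # same location as a voxel in sample 0
f0 = np.ones((2, 4))
f1 = np.zeros((1, 4))
C, F = collate_coords([c0, c1], [f0, f1])
print(C.tolist())  # [[0, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
print(F.shape)     # (3, 4)
```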


Great! Now you are familiar with the input processing pipeline of `torchsparse`. For further reading, you are welcome to check out our [SPVNAS codebase](https://github.com/mit-han-lab/e3d/blob/master/spvnas/core/datasets/semantic_kitti.py) for real-world dataset processing.

We continue by showing how to define models with `torchsparse`. This part closely mirrors `PyTorch`: by convention, `torchsparse.nn` (`spnn`) corresponds to `torch.nn` (`nn`), and `torchsparse.nn.functional` (`spf`) corresponds to `torch.nn.functional` (`F`). The module `spnn.Conv3d` is a sparse 3D convolution, and you can set its input channels, output channels, kernel size and stride as shown below. You can also specify whether it is a transposed convolution (*i.e.* one used to upsample the input feature map). Non-spatial operations such as `spnn.BatchNorm` and `spnn.ReLU` behave like batch normalization and ReLU in 2D CNNs.
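A strided sparse convolution changes not only the features but also the set of active voxels. The sketch below illustrates one common rule, assumed here for illustration: snap every input coordinate onto the coarser grid (`coords // stride * stride`) and keep each occupied cell once. `downsample_coords` is a hypothetical helper, not a `torchsparse` function, and the library's internal convention may differ in detail.

```python
import numpy as np


def downsample_coords(coords, stride=2):
    """Hypothetical output-coordinate rule for a strided sparse convolution."""
    # snap every active voxel onto the coarser grid, then keep each cell once
    snapped = (coords // stride) * stride
    return np.unique(snapped, axis=0)


coords = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 1], [3, 0, 0], [6, 2, 0]])
print(downsample_coords(coords).tolist())
# [[0, 0, 0], [2, 0, 0], [6, 2, 0]]
```

Five active voxels collapse to three at stride 2. A transposed convolution with the same stride goes the other way, producing features on the finer coordinates again, which is why the `dummy_model` below pairs a stride-2 `Conv3d` with a stride-2 transposed one.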

In [None]:
# define the device to run inference / training
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
print('We will run training on', device)

num_classes = 10

dummy_model = nn.Sequential(
    spnn.Conv3d(4, 32, kernel_size=3, stride=1),
    spnn.BatchNorm(32),
    spnn.ReLU(True),
        
    spnn.Conv3d(32, 64, kernel_size=2, stride=2),
    spnn.BatchNorm(64),
    spnn.ReLU(True),
        
    spnn.Conv3d(64, 64, kernel_size=2, stride=2, transpose=True),
    spnn.BatchNorm(64),
    spnn.ReLU(True),
        
    spnn.Conv3d(64, 32, kernel_size=3, stride=1),
    spnn.BatchNorm(32),
    spnn.ReLU(True),
        
    spnn.Conv3d(32, num_classes, kernel_size=1)
).to(device)

print(dummy_model)

Now we run a dummy training loop. Its structure is very similar to conventional 2D CNN training, but pay special attention to the model outputs. In the `dummy_model` defined above, the output is a `SparseTensor`, so we convert it to a `torch.Tensor` via `outputs.F` before feeding it to the `criterion`. When the model output is already a `torch.Tensor`, this conversion is unnecessary.
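Once converted, nothing about the loss is sparse: `outputs.F` is simply an `(N, num_classes)` matrix with one row of logits per active voxel, and the targets are `N` class indices. For intuition, the NumPy sketch below recomputes what `nn.CrossEntropyLoss` does with its default mean reduction on such a matrix. The function `voxel_cross_entropy` and the toy logits are illustrative only.

```python
import numpy as np


def voxel_cross_entropy(logits, targets):
    """Mean cross-entropy over voxels: logits is (N, num_classes), targets is (N,)."""
    # subtract the row max for numerical stability, then take log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # pick the log-probability of each voxel's true class and average
    return -log_probs[np.arange(len(targets)), targets].mean()


# stand-in for outputs.F: 3 voxels, 2 classes
logits = np.array([[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]])
targets = np.array([0, 1, 1])
value = voxel_cross_entropy(logits, targets)
print(round(value, 4))  # 0.3157
```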

In [None]:
def dummy_train(model, device, num_classes=10):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    criterion = nn.CrossEntropyLoss().to(device)
    model.train()

    print('Starting dummy training...')
    for i in range(10):
        # draw random labels from [0, num_classes) to match the model's output channels
        feed_dict = generate_random_batch(
            batch_size=2,
            pc_size=50000,
            num_classes=num_classes
        )
        inputs = feed_dict['lidar'].to(device)
        targets = feed_dict['targets'].F.to(device).long()
        outputs = model(inputs)
        optimizer.zero_grad()
        # outputs is a SparseTensor: compute the loss on its feature matrix outputs.F
        loss = criterion(outputs.F, targets)
        loss.backward()
        optimizer.step()
        print('[step %d] loss = %f.' % (i, loss.item()))
    print('Finished dummy training!')


dummy_train(dummy_model, device, num_classes)

## Fast and Accurate 3D Deep Learning with SPVNAS

Congratulations! You are now familiar with `torchsparse`, our efficient 3D sparse computation library! We now move on to the second part of this tutorial, which covers SPVNAS, our newly proposed method presented at ECCV 2020.

[SPVNAS](https://arxiv.org/abs/2007.16100) is the **first** AutoML method for efficient 3D scene understanding. In this work, we first adapt [Point-Voxel Convolution](https://arxiv.org/abs/1907.03739) (NeurIPS 2019) to large-scale outdoor LiDAR scans by introducing Sparse Point-Voxel Convolution (SPVConv):

<img src="https://hanlab.mit.edu/projects/spvnas/figures/spvconv.png" width="720">
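The SPVConv idea can be sketched in a few lines of NumPy: average point features into voxels, run a voxel branch, copy the voxel results back to the points, and add a point-wise branch that preserves fine detail. This is a simplified illustration, not the SPVNAS implementation: `spvconv_sketch` is hypothetical, the voxel branch is any callable standing in for a sparse convolution, and devoxelization here uses a nearest (same-voxel) lookup in place of the interpolation used in the paper.

```python
import numpy as np


def spvconv_sketch(point_feats, inverse, num_voxels, voxel_branch, point_weight):
    """Hypothetical point-voxel fusion: voxelize, voxel branch, devoxelize, plus point branch."""
    # voxelize: average the features of all points falling into the same voxel
    sums = np.zeros((num_voxels, point_feats.shape[1]))
    np.add.at(sums, inverse, point_feats)
    counts = np.maximum(np.bincount(inverse, minlength=num_voxels), 1)[:, None]
    voxel_feats = sums / counts
    # voxel branch: a sparse convolution in SPVConv, any callable here
    voxel_out = voxel_branch(voxel_feats)
    # devoxelize: each point reads back the feature of its own voxel
    devoxelized = voxel_out[inverse]
    # point branch: a per-point linear layer on the original features
    point_out = point_feats @ point_weight
    return devoxelized + point_out


feats = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
inverse = np.array([0, 0, 1])  # points 0 and 1 share voxel 0
out = spvconv_sketch(feats, inverse, 2, voxel_branch=lambda v: v, point_weight=np.eye(2))
print(out.tolist())  # [[3.0, 0.0], [5.0, 0.0], [0.0, 4.0]]
```

Points 0 and 1 receive the same voxel feature but different final outputs, which is exactly the resolution the point branch is meant to recover.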

We then apply 3D Neural Architecture Search (3D-NAS) to automatically search for the best architectures built from SPVConv under efficiency constraints.

<img src="https://hanlab.mit.edu/projects/spvnas/figures/3dnas.png" width="720">
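To illustrate what "search under efficiency constraints" means, the toy below samples per-stage channel widths, rejects candidates whose cost exceeds a budget, and keeps the best-scoring survivor. Everything here is hypothetical: the search space, the MACs-like `cost` proxy, and the accuracy-like `score` proxy are made up, and plain random search is used for brevity rather than the search procedure described in the SPVNAS paper.

```python
import random

# hypothetical search space: one channel width per stage
WIDTH_CHOICES = [16, 24, 32, 48, 64]
NUM_STAGES = 4


def cost(widths):
    # hypothetical MACs proxy: products of consecutive stage widths
    return sum(a * b for a, b in zip(widths, widths[1:]))


def score(widths):
    # hypothetical accuracy proxy; a real search would evaluate each candidate on data
    return sum(widths) ** 0.5


def constrained_random_search(budget, num_samples=1000, seed=0):
    rng = random.Random(seed)
    best, best_score = None, float('-inf')
    for _ in range(num_samples):
        cand = [rng.choice(WIDTH_CHOICES) for _ in range(NUM_STAGES)]
        if cost(cand) > budget:
            continue  # reject candidates that violate the efficiency constraint
        s = score(cand)
        if s > best_score:
            best, best_score = cand, s
    return best


best = constrained_random_search(budget=3000)
print(best, cost(best))
```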

Let's define some helpers for visualization (`color_map` and `label_map`) and select the device to run inference on.

In [None]:
from utils import create_label_map
# color map for visualization
color_map = np.array(['#f59664', '#f5e664', '#963c1e', 
             '#b41e50', '#ff0000', '#1e1eff', 
             '#c828ff', '#5a1e96', '#ff00ff', 
             '#ff96ff', '#4b004b', '#4b00af', 
             '#00c8ff', '#3278ff', '#00af00', 
             '#003c87', '#50f096', '#96f0ff', 
             '#0000ff', '#ffffff'])

# label map maps SemanticKITTI labels to [0,19]
label_map = create_label_map()
# define the device to run inference / training
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
print('We will run inference on', device)
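A label map of this kind is typically a lookup table: an array indexed by the raw label id whose entries are training ids, so a whole scan is remapped with one indexing operation (`label_map[label]` in the next cell). The sketch below builds such a table with a hypothetical `make_lut` helper. It is not the implementation of `create_label_map` in `utils`; the two-entry mapping is a made-up partial example, and the upper bound of 259 for raw SemanticKITTI ids is an assumption.

```python
import numpy as np

IGNORE = 19  # matches the ignore_label default of process_point_cloud


def make_lut(mapping, max_raw_label=259, ignore=IGNORE):
    # every raw id defaults to the ignore class; listed ids get a training id
    lut = np.full(max_raw_label + 1, ignore, dtype=np.int64)
    for raw, train in mapping.items():
        lut[raw] = train
    return lut


# hypothetical partial mapping, for illustration only
lut = make_lut({10: 0, 40: 8})
raw = np.array([10, 40, 0, 10])
print(lut[raw].tolist())  # [0, 8, 19, 0]
```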

Let's load some real data and process it with the aforementioned `process_point_cloud` and `sparse_collate_fn` functions for inference.

In [None]:
# load prepared data
point_cloud = np.fromfile('./sample_data/000000.bin', dtype=np.float32).reshape(-1,4)
label = np.fromfile('./sample_data/000000.label', dtype=np.int32) & 0xFFFF
label = label_map[label]
# use sparse_collate_fn to create batch
feed_dict = sparse_collate_fn([process_point_cloud(point_cloud, label)])
print('Created feed dict with keys:', feed_dict.keys())
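The `& 0xFFFF` above comes from the SemanticKITTI label format: each point's label is a 32-bit value whose lower 16 bits hold the semantic class and whose upper 16 bits hold the instance id (the `.bin` scan stores four float32 values per point: x, y, z and remission). The snippet below writes a tiny toy `.label` file and decodes it both ways; the file and its values are made up for illustration.

```python
import os
import tempfile

import numpy as np

# pack toy labels: upper 16 bits = instance id, lower 16 bits = semantic class
semantic = np.array([10, 40, 10], dtype=np.uint32)
instance = np.array([1, 0, 2], dtype=np.uint32)
packed = (instance << 16) | semantic

path = os.path.join(tempfile.mkdtemp(), 'toy.label')
packed.tofile(path)

raw = np.fromfile(path, dtype=np.uint32)
print((raw & 0xFFFF).tolist())  # semantic: [10, 40, 10]
print((raw >> 16).tolist())     # instance: [1, 0, 2]
```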

Now we import the pretrained SPVNAS model from our model zoo to run inference.

In [None]:
# import the SPVNAS model from the model zoo
from model_zoo import spvnas_specialized
model = spvnas_specialized('SemanticKITTI_val_SPVNAS@65GMACs').to(device)
model.eval()
# run inference without tracking gradients
inputs = feed_dict['lidar'].to(device)
with torch.no_grad():
    outputs = model(inputs)
predictions = outputs.argmax(1).cpu().numpy()
# map predictions from downsampled sparse voxels back to the original points
predictions = predictions[feed_dict['inverse_map'].F.int().cpu().numpy()]
print(model)
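The last two steps of the cell above are plain array operations: `argmax(1)` turns per-voxel logits into one class per voxel, and indexing with the inverse map gives every original point the class of the voxel it was quantized into. A toy NumPy version with made-up logits:

```python
import numpy as np

# stand-in for model outputs: logits for 3 voxels and 4 classes
voxel_logits = np.array([[0.1, 2.0, 0.0, 0.0],
                         [3.0, 0.0, 0.0, 0.0],
                         [0.0, 0.0, 0.0, 1.5]])
voxel_pred = voxel_logits.argmax(1)  # one class per voxel: [1, 0, 3]
# inverse_map[i] is the voxel that original point i was quantized into
inverse_map = np.array([0, 0, 2, 1, 2])
point_pred = voxel_pred[inverse_map]
print(point_pred.tolist())  # [1, 1, 3, 0, 3]
```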

Finally, we visualize the predictions from SPVNAS in an interactive window. Please run the following cell and enjoy! (Note that rendering the interactive window may take some time.)

In [None]:
%matplotlib inline
# Import dependencies
import plotly
import plotly.graph_objs as go
from IPython.display import display, HTML


def configure_plotly_browser_state():
    # Load require.js and register the Plotly CDN path so the figure can render inline.
    display(HTML('''
        <script src="/static/components/requirejs/require.js"></script>
        <script>
          requirejs.config({
            paths: {
              base: '/static/base',
              plotly: 'https://cdn.plot.ly/plotly-latest.min.js?noext',
            },
          });
        </script>
        '''))


pc = feed_dict['pc'].F.cpu().numpy()
# Configure the trace: one marker per labeled point, colored by its predicted class.
trace = go.Scatter3d(
    x=pc[:, 0],
    y=pc[:, 1],
    z=pc[:, 2],
    mode='markers',
    marker={
        'size': 1,
        'opacity': 0.8,
        'color': color_map[predictions].tolist(),
    }
)
# Configure Plotly to be rendered inline in the notebook.
configure_plotly_browser_state()
plotly.offline.init_notebook_mode(connected=False)
# Configure the layout.
layout = go.Layout(
    margin={'l': 0, 'r': 0, 'b': 0, 't': 0},
    scene=dict(aspectmode="manual", aspectratio=dict(x=1, y=1, z=0.2))
)

data = [trace]

plot_figure = go.Figure(data=data, layout=layout)

# Render the plot.
plotly.offline.iplot(plot_figure)

Congratulations! You've finished the entire notebook on **Efficient 3D Deep Learning with torchsparse and SPVNAS**!

If you want to learn more, here are some papers and repos for your reference:

[1] Z. Liu, H. Tang, Y. Lin and S. Han. Point-Voxel CNN for Efficient 3D Deep Learning. In NeurIPS 2019, spotlight. [[paper]](https://arxiv.org/abs/1907.03739)

[2] H. Tang, Z. Liu, S. Zhao, Y. Lin, J. Lin, H. Wang and S. Han. Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution. In ECCV 2020. [[paper]](https://arxiv.org/abs/2007.16100)

[3] H. Tang, Z. Liu, S. Zhao, Y. Lin, J. Lin, H. Wang and S. Han. [GitHub Repo](https://github.com/mit-han-lab/e3d) of Efficient 3D Deep Learning Methods.