# SPVNAS

- Utilities
    1. Coordinate Hashing
    2. Voxelization
- Modules
    0. Sparse Tensor
    1. Sparse Convolution
    2. Sparse Point-Voxel Convolution (SPVConv)

In [1]:
import random

import torch
import torch.nn as nn
import numpy as np
print(torch.__version__)

1.12.1


#### Install PyTorch Scatter Library (Linux, macOS, and Windows are supported)
- Installation guide: https://github.com/rusty1s/pytorch_scatter

In [2]:
!pip install torch-scatter -f https://data.pyg.org/whl/torch-1.12.1+cpu.html

Looking in links: https://data.pyg.org/whl/torch-1.12.1+cpu.html

In [3]:
import torch_scatter

#### Several libraries support sparse convolution:
- MinkowskiEngine: https://github.com/NVIDIA/MinkowskiEngine
- TorchSparse: https://github.com/mit-han-lab/torchsparse
- SpConv: https://github.com/traveller59/spconv

This notebook uses MinkowskiEngine.

#### Install MinkowskiEngine Library (Only Ubuntu is supported)
- Installation guide: https://github.com/NVIDIA/MinkowskiEngine
- Since this notebook does not use GPU acceleration, the CPU-only version is sufficient.

#### Full commands I used to set up the environment:
On Ubuntu 18.04,
```
$ conda create -n fastcampus python=3.7 -y
$ conda activate fastcampus
(fastcampus) $ conda install pytorch torchvision torchaudio cpuonly -c pytorch -y
(fastcampus) $ conda install openblas-devel -c anaconda -y
(fastcampus) $ pip install torch-scatter -f https://data.pyg.org/whl/torch-1.12.1+cpu.html
(fastcampus) $ pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps --install-option="--blas_include_dirs=${CONDA_PREFIX}/include" --install-option="--cpu_only"
(fastcampus) $ pip install notebook
```

In [4]:
import MinkowskiEngine as ME

It is recommended to set it below 24.
If you want to compile with CUDA support, make sure `torch.cuda.is_available()` is True when you install MinkowskiEngine.


## Utilities

### 1. Coordinate Hashing (NumPy)
- Input: voxel indices (N, 3), dtype: non-negative int
- Output: hash keys (N,), dtype: uint64
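The loop in `ravel_hash` is a mixed-radix (row-major) encoding of each index, written out here for the 3D case. $D_d$ denotes the extent of axis $d$ (`max_index[d]` in the code, i.e. the largest index along that axis plus one):

$$
\mathrm{key}(i_0, i_1, i_2) = (i_0 D_1 + i_1)\,D_2 + i_2 = i_0 D_1 D_2 + i_1 D_2 + i_2, \qquad 0 \le i_d < D_d .
$$

Because every $i_d$ is smaller than $D_d$, distinct indices map to distinct keys, as long as $D_0 D_1 D_2$ fits in a uint64.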

In [5]:
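def ravel_hash(voxel_indices):
    # Cast to uint64 so the in-place arithmetic below never mixes signed and
    # unsigned integers (NumPy refuses to add int64 into a uint64 array in place).
    # This assumes all indices are non-negative.
    voxel_indices = voxel_indices.astype(np.uint64)

    # 1. Find the extent (maximum index + 1) of each axis.
    max_index = np.max(voxel_indices, axis=0) + 1
    
    # 2. Hashing: row-major (mixed-radix) flattening of each index
    keys = np.zeros(len(voxel_indices), dtype=np.uint64)
    for d in range(voxel_indices.shape[1] - 1): # loop over dimensions
        keys += voxel_indices[:, d]
        keys *= max_index[d + 1]
    keys += voxel_indices[:, -1]
    
    return keys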
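As a sanity check, the hashing above should agree with NumPy's own row-major flattening, `np.ravel_multi_index`, when the grid shape is taken to be the per-axis extents. The sketch below repeats the function so it runs on its own; the random seed and index range are arbitrary choices for illustration.

```python
import numpy as np

def ravel_hash(voxel_indices):
    # Same mixed-radix hashing as above (non-negative integer indices assumed)
    voxel_indices = voxel_indices.astype(np.uint64)
    max_index = np.max(voxel_indices, axis=0) + 1
    keys = np.zeros(len(voxel_indices), dtype=np.uint64)
    for d in range(voxel_indices.shape[1] - 1):
        keys += voxel_indices[:, d]
        keys *= max_index[d + 1]
    keys += voxel_indices[:, -1]
    return keys

rng = np.random.default_rng(0)
idx = rng.integers(low=0, high=50, size=(100, 3))  # signed int64 indices

# np.ravel_multi_index computes the C-order flat index for a grid of shape `dims`
dims = tuple(int(x) for x in idx.max(axis=0) + 1)
reference = np.ravel_multi_index(tuple(idx.T), dims)

print(np.array_equal(ravel_hash(idx), reference))  # True
```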

In [6]:
N = 100

voxel_indices = np.random.randint(low=0, high=50, size=(N, 3), dtype=np.uint64)
keys = ravel_hash(voxel_indices)
print(keys.shape)
print(keys) # Identical voxel indices share a hash key; distinct indices never collide (barring uint64 overflow).

(100,)
[ 80124 101522  27085 104077  39531  83395 114710  75434  52978  98661
  78653   9868  12972   2216 123514  29779  19242  80672  81327 114496
 108399 109561  40284  32324 124171  96369   3259  36842 109757 118175
 120817  12114  17455  96765  34064  43451  94597 124493  61401  31135
  37963 118940 123780 115904  55155  17586  80579  65816   7538  38793
  54235 121742  95221 110037  64789 100083 111279  49505  46166  87573
  74684  31817  77589  92719  87408  18746  92182 124149 100910  74573
  18776  23072 117475  81395  43131  32169  75392  44292 119008 112761
 110113  21827 118341  73797 107387  36226  38249  42403  14755 107365
   8556  24736  14855 112135  97791  98972  25940  21286  72179 112503]


### 2. Voxelization (NumPy)
- Input: points (N, 3), voxel size
- Output: origin (3,), voxel indices (M, 3), unique mapping (M,), inverse mapping (N,)

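The unique mapping (M,) gives, for each unique voxel, the index of a representative point (the first point that falls into it), so `voxel_indices[unique_mapping]` yields the M unique voxel indices. \
The inverse mapping (N,) goes the other way: for each point, it gives the index of the voxel that the point falls into.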
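Both mappings come straight from `np.unique` with `return_index=True` and `return_inverse=True`. The toy example below uses five made-up hash keys, where points 0, 2 and 4 share a voxel, to show what each array contains.

```python
import numpy as np

# Hypothetical hash keys for 5 points; points 0, 2 and 4 fall into the same voxel
keys = np.array([5, 3, 5, 7, 5], dtype=np.uint64)
unique_keys, unique_mapping, inverse_mapping = np.unique(
    keys, return_index=True, return_inverse=True
)

print(unique_keys)      # [3 5 7]      -> M = 3 voxels, sorted by key
print(unique_mapping)   # [1 0 3]      -> first point that falls into each voxel
print(inverse_mapping)  # [1 0 1 2 1]  -> voxel of each point

# The two mappings let you move between points and voxels in both directions
print(np.array_equal(keys[unique_mapping], unique_keys))   # True
print(np.array_equal(unique_keys[inverse_mapping], keys))  # True
```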

In [7]:
def voxelize(points, voxel_size):
    # 1. Shift the points so that all coordinates are non-negative
    origin = np.min(points, axis=0)
    points = points - origin
    
    # 2. Make the voxel indices and hash keys
    voxel_indices = np.floor(points / voxel_size).astype(np.uint64)
    keys = ravel_hash(voxel_indices)
    
    # 3. Find the unique voxel indices and the mappings.
    _, unique_mapping, inverse_mapping = np.unique(keys, return_index=True, return_inverse=True)
    unique_voxel_indices = voxel_indices[unique_mapping]
    
    return origin, unique_voxel_indices, unique_mapping, inverse_mapping

In [8]:
N = 100
l = 1. # voxel size

points = np.random.randn(N, 3)
origin, voxels, unique_map, inverse_map = voxelize(points, l)
print(origin)
print(voxels.shape)
print(unique_map.shape)
print(inverse_map.shape)

[-2.49306876 -2.29741528 -2.54250384]
(51, 3)
(51,)
(100,)
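A few invariants follow from the definitions and make a quick self-check of `voxelize`: indexing the unique voxels with the inverse mapping recovers every point's voxel index, and each voxel's representative point maps back to that voxel. The sketch below re-declares both functions so it is self-contained; the seed is arbitrary.

```python
import numpy as np

def ravel_hash(voxel_indices):
    voxel_indices = voxel_indices.astype(np.uint64)
    max_index = np.max(voxel_indices, axis=0) + 1
    keys = np.zeros(len(voxel_indices), dtype=np.uint64)
    for d in range(voxel_indices.shape[1] - 1):
        keys += voxel_indices[:, d]
        keys *= max_index[d + 1]
    keys += voxel_indices[:, -1]
    return keys

def voxelize(points, voxel_size):
    origin = np.min(points, axis=0)
    voxel_indices = np.floor((points - origin) / voxel_size).astype(np.uint64)
    keys = ravel_hash(voxel_indices)
    _, unique_mapping, inverse_mapping = np.unique(keys, return_index=True, return_inverse=True)
    return origin, voxel_indices[unique_mapping], unique_mapping, inverse_mapping

rng = np.random.default_rng(0)
points = rng.standard_normal((100, 3))
l = 1.0
origin, voxels, unique_map, inverse_map = voxelize(points, l)

# (1) voxels[inverse_map] reproduces the voxel index of every point
ok1 = np.array_equal(voxels[inverse_map], np.floor((points - origin) / l).astype(np.uint64))
# (2) The representative point of voxel k belongs to voxel k
ok2 = np.array_equal(inverse_map[unique_map], np.arange(len(voxels)))
# (3) Every voxel contains at least one point, so M <= N
print(ok1, ok2, len(voxels) <= len(points))  # True True True
```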


#### Why is `origin` needed?

In [9]:
# The origin converts voxel indices back to Euclidean coordinates.
# Voxel index (0, 0, 0) has its minimum corner at `origin`;
# its center therefore lies at origin + (0.5, 0.5, 0.5) * l.

voxel_index = voxels[0]
print(voxel_index)
voxel_center = (voxel_index + 0.5) * l + origin
print(origin)
print(voxel_center)

[0 0 0]
[-2.49306876 -2.29741528 -2.54250384]
[-1.99306876 -1.79741528 -2.04250384]
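The same formula holds for every voxel, which gives an easy check: each point should lie within half a voxel of its voxel center along every axis. The sketch below computes voxel indices directly with `np.floor`, without hashing; the voxel size `0.5` and the seed are illustrative values only.

```python
import numpy as np

rng = np.random.default_rng(1)
points = rng.standard_normal((100, 3))
l = 0.5                                    # voxel size (illustrative value)

origin = points.min(axis=0)
voxel_index = np.floor((points - origin) / l)
voxel_center = (voxel_index + 0.5) * l + origin

# Each point lies inside its voxel: at most l/2 from the center along every axis
# (a tiny tolerance absorbs floating-point rounding)
max_offset = np.abs(points - voxel_center).max()
print(max_offset <= l / 2 + 1e-12)  # True
```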


#### The (pooled) voxel features can be calculated from point-wise features using `torch_scatter`!

In [10]:
N = 100
C = 4 # the dimension of point-wise features
l = 1. # voxel size

points = torch.randn(N, 3)
features = torch.randn(N, C)

# First, voxelize the points.
origin, voxels, unique_map, inverse_map = voxelize(points.numpy(), l)
M = len(voxels)
print(f"[voxelization] {N} points -> {M} voxels")

# Then, calculate voxel features.
# Option 1: MaxPool
voxel_features, _ = torch_scatter.scatter_max(features, torch.from_numpy(inverse_map), dim=0, dim_size=M)
print(voxel_features.shape)

# Option 2: AvgPool
voxel_features = torch_scatter.scatter_mean(features, torch.from_numpy(inverse_map), dim=0, dim_size=M)
print(voxel_features.shape)

[voxelization] 100 points -> 53 voxels
torch.Size([53, 4])
torch.Size([53, 4])
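To see what `scatter_mean` and `scatter_max` compute, here is a NumPy-only sketch of the same pooling using `np.add.at` and `np.maximum.at`. The helper names `scatter_mean_np` and `scatter_max_np` are hypothetical, and empty voxels are not handled specially because voxelization never produces them.

```python
import numpy as np

def scatter_mean_np(features, index, dim_size):
    # Sum the features of all points that share a voxel, then divide by the point count
    sums = np.zeros((dim_size, features.shape[1]), dtype=features.dtype)
    np.add.at(sums, index, features)
    counts = np.bincount(index, minlength=dim_size)
    return sums / np.maximum(counts, 1)[:, None]

def scatter_max_np(features, index, dim_size):
    # Element-wise maximum over the points of each voxel
    out = np.full((dim_size, features.shape[1]), -np.inf, dtype=features.dtype)
    np.maximum.at(out, index, features)
    return out

rng = np.random.default_rng(0)
features = rng.standard_normal((6, 4))
inverse_map = np.array([0, 1, 0, 2, 1, 0])  # voxel of each point (M = 3)

mean = scatter_mean_np(features, inverse_map, 3)
maxv = scatter_max_np(features, inverse_map, 3)
print(np.allclose(mean[0], features[[0, 2, 5]].mean(axis=0)))  # True
print(np.allclose(maxv[1], features[[1, 4]].max(axis=0)))      # True
```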


## Modules

### 0. Sparse Tensor
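- A sparse tensor is the basic data structure for sparse convolution and sparse point-voxel convolution: it stores only the occupied voxels, as integer coordinates plus per-voxel features.
- MinkowskiEngine and TorchSparse both provide a `SparseTensor` class built on this idea, although their APIs differ in detail.
- The following is a common way to convert an input point cloud with features into a sparse tensor.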

In [11]:
N = 100
C = 4 # the dimension of point-wise features
l = 1. # voxel size

points = torch.randn(N, 3)
features = torch.randn(N, C)

# Make a TensorField (the setup for sparse quantization). You can think of it as a point cloud.
tfield = ME.TensorField(
    features=features,
    coordinates=ME.utils.batched_coordinates([points / l], dtype=torch.float32)
)
# TensorField -> Sparse Tensor
stensor = tfield.sparse()

print(stensor)

SparseTensor(
  coordinates=tensor([[ 0, -3,  0, -1],
        [ 0,  0, -1,  1],
        [ 0,  0, -1, -1],
        [ 0, -2,  0,  2],
        [ 0, -1, -2, -1],
        [ 0,  0,  0, -1],
        [ 0,  0, -1,  0],
        [ 0,  0, -2, -2],
        [ 0,  0, -2,  0],
        [ 0, -1, -1, -2],
        [ 0, -2,  0,  0],
        [ 0,  0,  0,  1],
        [ 0, -1,  0,  0],
        [ 0, -1, -2, -2],
        [ 0, -2, -1, -1],
        [ 0,  0,  0,  2],
        [ 0, -1, -2,  0],
        [ 0, -1,  0, -1],
        [ 0,  0,  0,  0],
        [ 0,  0,  1,  0],
        [ 0, -1, -1,  1],
        [ 0, -2,  0, -1],
        [ 0, -1, -1, -1],
        [ 0,  0, -2, -1],
        [ 0, -2, -1,  0],
        [ 0,  2, -2,  0],
        [ 0, -2, -2, -2],
        [ 0,  1, -2, -1],
        [ 0, -1,  1,  1],
        [ 0, -3, -1,  0],
        [ 0, -1, -1,  0],
        [ 0, -1,  0,  2],
        [ 0,  0, -1, -2],
        [ 0, -1, -1,  2],
        [ 0,  1,  0,  0],
        [ 0, -1,  0, -2],
        [ 0,  0, -1, -3],
        [ 
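The printed coordinates have a batch index in the first column followed by integer voxel coordinates, which look like the floor of `points / l`. The NumPy sketch below mimics that coordinate layout under two assumptions: floor quantization and lexicographic de-duplication. `batched_floor_quantize` is a hypothetical helper, not part of MinkowskiEngine. It ignores how MinkowskiEngine aggregates the features of points that share a voxel and does not reproduce its row order.

```python
import numpy as np

def batched_floor_quantize(point_sets, voxel_size):
    """Hypothetical helper: prepend a batch index and floor-quantize each point.
    Returns the unique (batch, i, j, k) rows and, for every input point, its row."""
    rows = []
    for b, pts in enumerate(point_sets):
        ijk = np.floor(pts / voxel_size).astype(np.int64)     # continuous -> voxel index
        batch = np.full((len(pts), 1), b, dtype=np.int64)      # batch index column
        rows.append(np.hstack([batch, ijk]))
    coords = np.vstack(rows)
    unique_coords, inverse = np.unique(coords, axis=0, return_inverse=True)
    return unique_coords, inverse.reshape(-1)                  # keep the inverse 1-D

batch0 = np.array([[0.2, 0.3, 0.4], [0.7, 0.1, 0.9], [-0.5, 0.0, 0.0]])
batch1 = np.array([[0.1, 0.1, 0.1]])
coords, inverse = batched_floor_quantize([batch0, batch1], voxel_size=1.0)
print(coords)   # rows: [0 -1 0 0], [0 0 0 0], [1 0 0 0]
print(inverse)  # [1 1 0 2]
```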

### 1. Sparse Convolution
- Both MinkowskiEngine and TorchSparse support sparse convolution.

In [13]:
sparse_conv = ME.MinkowskiConvolution(C, 2*C, kernel_size=3, stride=1, dimension=3)

out_stensor = sparse_conv(stensor)
print(out_stensor.C.shape) # voxel indices: batch_idx + ijk
print(out_stensor.F.shape) # voxel features: 2*C
print(out_stensor)

torch.Size([46, 4])
torch.Size([46, 8])
SparseTensor(
  coordinates=tensor([[ 0, -3,  0, -1],
        [ 0,  0, -1,  1],
        [ 0,  0, -1, -1],
        [ 0, -2,  0,  2],
        [ 0, -1, -2, -1],
        [ 0,  0,  0, -1],
        [ 0,  0, -1,  0],
        [ 0,  0, -2, -2],
        [ 0,  0, -2,  0],
        [ 0, -1, -1, -2],
        [ 0, -2,  0,  0],
        [ 0,  0,  0,  1],
        [ 0, -1,  0,  0],
        [ 0, -1, -2, -2],
        [ 0, -2, -1, -1],
        [ 0,  0,  0,  2],
        [ 0, -1, -2,  0],
        [ 0, -1,  0, -1],
        [ 0,  0,  0,  0],
        [ 0,  0,  1,  0],
        [ 0, -1, -1,  1],
        [ 0, -2,  0, -1],
        [ 0, -1, -1, -1],
        [ 0,  0, -2, -1],
        [ 0, -2, -1,  0],
        [ 0,  2, -2,  0],
        [ 0, -2, -2, -2],
        [ 0,  1, -2, -1],
        [ 0, -1,  1,  1],
        [ 0, -3, -1,  0],
        [ 0, -1, -1,  0],
        [ 0, -1,  0,  2],
        [ 0,  0, -1, -2],
        [ 0, -1, -1,  2],
        [ 0,  1,  0,  0],
        [ 0, -1,  0, -
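With `stride=1`, the visible part of the output coordinate listing matches the input listing: the convolution only computes outputs at already-occupied voxels. Each output sums, over the kernel offsets, the weight for that offset applied to the feature of the neighboring voxel, when that neighbor exists. The sketch below is a naive NumPy reference for that rule, not MinkowskiEngine's implementation. The kernel-offset ordering and the weight layout `(kernel_size**3, C_in, C_out)` are assumptions of this sketch and need not match MinkowskiEngine's internals.

```python
import itertools
import numpy as np

def sparse_conv3d(coords, feats, weights, kernel_size=3):
    """Naive stride-1 sparse convolution (output coordinates = input coordinates).
    coords: (M, 3) int, feats: (M, C_in), weights: (kernel_size**3, C_in, C_out).
    Assumes an odd kernel_size."""
    r = kernel_size // 2
    offsets = list(itertools.product(range(-r, r + 1), repeat=3))
    table = {tuple(c): i for i, c in enumerate(coords.tolist())}   # coordinate -> row
    out = np.zeros((len(coords), weights.shape[2]), dtype=feats.dtype)
    for i, c in enumerate(coords.tolist()):
        for k, off in enumerate(offsets):
            j = table.get((c[0] + off[0], c[1] + off[1], c[2] + off[2]))
            if j is not None:                     # only occupied neighbors contribute
                out[i] += feats[j] @ weights[k]
    return out

coords = np.array([[0, 0, 0], [1, 0, 0]])
feats = np.array([[1.0, 0.0], [0.0, 1.0]])
W = np.random.default_rng(0).standard_normal((27, 2, 3))
out = sparse_conv3d(coords, feats, W)
print(out.shape)  # (2, 3)
```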

### 2. Sparse Point-Voxel Convolution

In [16]:
class SPVConv(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size):
        super(SPVConv, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        
        # voxel branch (sparse convolution)
        self.sparse_conv = ME.MinkowskiConvolution(in_channels, out_channels, kernel_size=kernel_size, dimension=3)
        # point branch (shared MLP)
        self.mlp = nn.Sequential(
            ME.MinkowskiLinear(in_channels, out_channels, bias=False),
            ME.MinkowskiBatchNorm(out_channels),
            ME.MinkowskiReLU(True),
            ME.MinkowskiLinear(out_channels, out_channels)
        )
        
    def forward(self, tfield: ME.TensorField):
        # 1. Voxelization
        stensor = tfield.sparse()
        
        # 2. Feed-forward: voxel branch and point branch
        out_stensor = self.sparse_conv(stensor)
        out_tfield = self.mlp(tfield)
        
        # 3. Devoxelize: interpolate the voxel-branch features at every point's
        #    continuous coordinates so they can be fused with the point branch.
        interp_features, _, interp_map, interp_weights = ME.MinkowskiInterpolationFunction.apply(
            out_stensor.F, out_tfield.C, out_stensor.coordinate_key, out_stensor.coordinate_manager
        )
        
        # 4. Fuse the outputs.
        out = out_tfield.F + interp_features
        
        return out
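The devoxelization step interpolates voxel features back onto the points before adding them to the point-branch output. The NumPy sketch below shows trilinear interpolation from a sparse set of voxels. It assumes voxel features sit at integer grid coordinates and that missing neighbors are simply dropped without renormalizing the weights. These conventions are assumptions of the sketch; it does not claim that `MinkowskiInterpolationFunction` uses exactly the same ones. `trilinear_devoxelize` is a hypothetical name.

```python
import itertools
import numpy as np

def trilinear_devoxelize(points, voxel_coords, voxel_feats):
    """Interpolate features stored at integer voxel coordinates onto continuous points.
    Missing neighbors contribute nothing (their weight is dropped, not renormalized)."""
    table = {tuple(c): i for i, c in enumerate(voxel_coords.tolist())}
    base = np.floor(points).astype(np.int64)       # lower corner of the enclosing cell
    frac = points - base                           # position inside the cell, in [0, 1)
    out = np.zeros((len(points), voxel_feats.shape[1]), dtype=voxel_feats.dtype)
    for n in range(len(points)):
        for corner in itertools.product((0, 1), repeat=3):
            j = table.get(tuple((base[n] + corner).tolist()))
            if j is None:
                continue
            # Trilinear weight: product over axes of frac (upper corner) or 1 - frac (lower)
            w = np.prod(np.where(corner, frac[n], 1.0 - frac[n]))
            out[n] += w * voxel_feats[j]
    return out

points = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]])
voxel_coords = np.array([[0, 0, 0], [1, 0, 0]])
voxel_feats = np.array([[1.0, 2.0], [3.0, 4.0]])
interp = trilinear_devoxelize(points, voxel_coords, voxel_feats)
print(interp)  # first row [1, 2]; second row is the midpoint [2, 3]
```

In SPVConv, the result would then be added to the point-branch features, as in `out = out_tfield.F + interp_features` above.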

In [17]:
N = 100
C = 4 # the dimension of point-wise features
l = 1. # voxel size

points = torch.randn(N, 3)
features = torch.randn(N, C)

tfield = ME.TensorField(
    features=features,
    coordinates=ME.utils.batched_coordinates([points / l], dtype=torch.float32)
)
spv_conv = SPVConv(C, 2*C, kernel_size=3)
out_features = spv_conv(tfield)

print(out_features.shape)

torch.Size([100, 8])
