# Implementing VoxelNet Feature Encoder

For implementation details, I referenced the following:

1. https://github.com/baudm/VoxelNet-Keras/blob/master/model.py (MIT license)
2. https://github.com/steph1793/Voxelnet/blob/master/model.py (GPL license)
3. https://github.com/qianguih/voxelnet (No license)
4. The paper https://openaccess.thecvf.com/content_cvpr_2018/papers/Zhou_VoxelNet_End-to-End_Learning_CVPR_2018_paper.pdf

I am focusing on the feature encoding layer here specifically!

Also, we'll eventually need to look at the NuScenes data descriptor (https://www.nuscenes.org/nuscenes). For now, let's assume we are given some point clouds.

---



# 1. Feature learning network

The feature learning network performs the following steps:

## Partitioning, grouping

Given a point cloud, quantize the space into equally sized voxels. The space spans extents $D, H, W$ along the axes $Z, Y, X$, and each voxel has size $v_D, v_H, v_W$. The result is a 3D grid of size $D' = D / v_D$, $H' = H / v_H$, $W' = W / v_W$.

All points $p = (x, y, z)$ that fall into the same voxel $(i, j, k)$ of the grid are considered 'grouped'.
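
As a minimal illustration of grouping, the sketch below buckets a few made-up points by voxel index using a plain dictionary. The helper name `group_points`, the `(x, y, z)` tuple layout and the use of the car-detection spec are assumptions for illustration. It treats each range as half-open, $[a, b)$, so a point exactly on the upper bound is dropped rather than producing an index one past the last cell.

```python
import math
from collections import defaultdict

# Car-detection ranges and voxel sizes, ordered (Z, Y, X) to match the grid axes.
RANGES = ((-3.0, 1.0), (-40.0, 40.0), (0.0, 70.4))
VOXEL = (0.4, 0.2, 0.2)

def group_points(points):
    '''Group (x, y, z) points by voxel index (i, j, k) along (Z, Y, X).

    Hypothetical helper; ranges are treated as half-open [lo, hi) and
    points outside them are dropped.
    '''
    groups = defaultdict(list)
    for x, y, z in points:
        idx = []
        for value, (lo, hi), v in zip((z, y, x), RANGES, VOXEL):
            if not lo <= value < hi:
                break  # outside the grid along this axis: drop the point
            idx.append(math.floor((value - lo) / v))
        else:
            groups[tuple(idx)].append((x, y, z))
    return dict(groups)

pts = [(10.05, 0.05, -2.9), (10.1, 0.1, -2.8), (50.1, -39.9, 0.5), (80.0, 0.0, 0.0)]
print(group_points(pts))
```

Here the first two points share voxel `(0, 200, 50)`, the third lands in `(8, 0, 250)`, and the last one (with `x = 80`) lies outside $[0, 70.4)$ and is dropped.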

**Partitioning spec for Car Detection**

| Axis                           | $Z (D)$    | $Y (H)$     | $X (W)$    |
| ------------------------------ | ---------- | ----------- | ---------- |
| Range  $(D, H, W)$             | $[-3, 1]$  | $[-40, 40]$ | $[0, 70.4]$|
| Voxel sizes $(v_D, v_H, v_W)$  | $0.4$      | $0.2$       | $0.2$      |
| Grid size  $(D', H', W')$      | $10$       | $400$       | $352$      |


**Partitioning spec for Pedestrian and Cyclist detection**

| Axis                           | $Z (D)$    | $Y (H)$     | $X (W)$    |
| ------------------------------ | ---------- | ----------- | ---------- |
| Range  $(D, H, W)$             | $[-3, 1]$  | $[-20, 20]$ | $[0, 48]$  |
| Voxel sizes $(v_D, v_H, v_W)$  | $0.4$      | $0.2$       | $0.2$      |
| Grid size  $(D', H', W')$      | $10$       | $200$       | $240$      |
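
The grid sizes follow directly from $D' = D / v_D$, $H' = H / v_H$, $W' = W / v_W$. The short sketch below recomputes them for both specs. Note that $W' = 240$ at $v_W = 0.2$ implies an X extent of 48 m, which matches the paper's pedestrian/cyclist range of $[0, 48]$. The dictionary layout and the name `grid_size` are assumptions for illustration.

```python
# Partition specs: (low, high, voxel size) per axis, in metres.
SPECS = {
    "car": {"Z": (-3.0, 1.0, 0.4), "Y": (-40.0, 40.0, 0.2), "X": (0.0, 70.4, 0.2)},
    "pedestrian_cyclist": {"Z": (-3.0, 1.0, 0.4), "Y": (-20.0, 20.0, 0.2), "X": (0.0, 48.0, 0.2)},
}

def grid_size(spec):
    '''Return (D', H', W') = extent / voxel size along the Z, Y, X axes.'''
    # round() guards against small floating-point error before converting to int.
    return tuple(round((hi - lo) / v) for lo, hi, v in (spec["Z"], spec["Y"], spec["X"]))

for name, spec in SPECS.items():
    print(name, grid_size(spec))
# car (10, 400, 352)
# pedestrian_cyclist (10, 200, 240)
```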

For range $[a, b]$ and size $v$, the transform here is `(x - a) // v`, discarding `x` outside of the range `[a, b]` of course.

We can define this function as $$\mathrm{quantize}(x, a, b, v) = \left\lfloor \frac{x - a}{v} \right\rfloor$$
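
As a worked example for the Z axis of the car spec ($a = -3$, $b = 1$, $v = 0.4$, so $D' = 10$), here is a typical value followed by the upper-boundary edge case:

$$\mathrm{quantize}(0.3, -3, 1, 0.4) = \left\lfloor \frac{0.3 - (-3)}{0.4} \right\rfloor = \lfloor 8.25 \rfloor = 8$$

$$\mathrm{quantize}(1, -3, 1, 0.4) = \left\lfloor \frac{1 - (-3)}{0.4} \right\rfloor = \lfloor 10 \rfloor = 10 = D'$$

Mathematically, a point at $z = b$ lands one past the last valid index $D' - 1 = 9$. It therefore has to be clamped into the last cell, or excluded by treating the range as half-open, $[a, b)$. In Python, `4.0 // 0.4` happens to evaluate to `9.0`, because the double nearest to 0.4 is slightly larger than 0.4. Relying on that round-off is fragile.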

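```python
def quantize(x, a=-3.0, b=1.0, v=0.4):
    '''Given a floating value x, quantize its value within
    range [a, b] with spacing v.

    :param x: The value to quantize.
    :type x: float
    :param a: The left bound of the range.
    :type a: float
    :param b: The right bound of the range.
    :type b: float
    :param v: The size of each quantization cell (the voxel size).
    :type v: float

    :return: The quantized index of x, or None if x is outside [a, b].
    :rtype: int or None

    Examples:
    >>> quantize(x=-3.0, a=-3, b=1, v=0.4)
    0
    >>> quantize(x=1.0, a=-3, b=1, v=0.4)
    9
    >>> quantize(x=0.3, a=-3, b=1, v=0.4)
    8
    >>> quantize(x=1.5, a=-3, b=1, v=0.4) is None
    True
    '''
    # Values outside [a, b] are discarded.
    if not a <= x <= b:
        return None
    # Number of cells along this axis, e.g. (1 - (-3)) / 0.4 = 10.
    n_cells = int(round((b - a) / v))
    # Mathematically x == b maps to index n_cells (one past the last cell),
    # so clamp it, and any float round-off, into the last cell.
    return min(int((x - a) // v), n_cells - 1)
```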
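
The scalar `quantize` handles one coordinate at a time. For a whole point cloud, the same `(x - a) // v` rule can be applied to all three axes at once with NumPy. This version drops out-of-range points and clamps points that sit exactly on the upper bound into the last cell. The sketch assumes points arrive as an `(N, 4)` array of `(x, y, z, r)` rows and uses the car-detection spec. `quantize_points` is a hypothetical helper name.

```python
import numpy as np

# Car-detection spec, ordered (Z, Y, X) to match the grid axes (D', H', W').
LOW = np.array([-3.0, -40.0, 0.0])
HIGH = np.array([1.0, 40.0, 70.4])
VOXEL = np.array([0.4, 0.2, 0.2])
GRID = np.round((HIGH - LOW) / VOXEL).astype(int)  # [10, 400, 352]

def quantize_points(points):
    '''Vectorized quantize for an (N, 4) array of (x, y, z, r) rows.

    Returns (idx, mask): idx is an (M, 3) int array of (i, j, k) indices along
    (Z, Y, X) for the M in-range points, and mask is the (N,) boolean array
    that selects those points.
    '''
    zyx = points[:, [2, 1, 0]]                       # reorder columns to (z, y, x)
    mask = np.all((zyx >= LOW) & (zyx <= HIGH), axis=1)
    idx = np.floor((zyx[mask] - LOW) / VOXEL).astype(int)
    return np.minimum(idx, GRID - 1), mask           # clamp points on the upper bound

pts = np.array([[0.0, -40.0, -3.0, 0.1],    # lower corner of the grid
                [70.4, 40.0, 1.0, 0.2],     # upper corner of the grid
                [35.1, 0.1, -1.1, 0.3],
                [80.0, 0.0, 0.0, 0.4]])     # x outside [0, 70.4]
idx, mask = quantize_points(pts)
print(idx)
print(mask)
```

The first two rows map to the corner voxels `(0, 0, 0)` and `(9, 399, 351)`. The second reaches a valid index only because of the clamp. The third row maps to `(4, 200, 175)`, and the last row is masked out.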


## Sampling

After grouping, for each voxel that contains more than $T$ points ($T = 35$ for car detection and $T = 45$ for pedestrian and cyclist detection), randomly sample $T$ of its points.

Custom implementation: rather than collecting all the points and *then* sampling, we sample *as* we collect them. The idea is to keep track of how many points have been added to each voxel. Once a voxel already holds `T` points, each new point overwrites a randomly chosen slot in that voxel.

Because the points are randomly shuffled before this pass, their arrival order within each voxel is random. Every point in a voxel is therefore equally likely to end up among the `T` kept points, so there is no bias toward points based on their position in the input.
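
To sanity-check the no-bias claim, the sketch below applies the shuffle-then-overwrite rule to a single cell many times and measures how often each point survives. The function name `sample_cell`, the toy cell of 10 points and $T = 3$ are assumptions for illustration.

```python
import random
from collections import Counter

def sample_cell(points, T, rng):
    '''Keep at most T points: shuffle, then overwrite a random slot once full.'''
    points = list(points)
    rng.shuffle(points)                  # randomize the arrival order first
    kept = []
    for n, p in enumerate(points):       # n = points already seen in this cell
        if n < T:
            kept.append(p)               # cell not full yet: just append
        else:
            kept[rng.randrange(T)] = p   # cell full: overwrite a uniformly chosen slot
    return kept

rng = random.Random(0)
trials = 20000
counts = Counter()
for _ in range(trials):
    counts.update(sample_cell(range(10), T=3, rng=rng))

# Each of the 10 points should be kept in roughly T / n = 3 / 10 of the trials.
print({p: round(c / trials, 2) for p, c in sorted(counts.items())})
```

After the shuffle, the points are exchangeable, so each one survives with probability $T / n$. Without the shuffle, the last point to arrive would always be kept. For comparison, standard reservoir sampling (Algorithm R) reaches the same uniform result without a shuffle. It replaces a slot only with probability $T / (n + 1)$ for the $(n+1)$-th point.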

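1. Instantiate a float array `G` of shape `(D', H', W', T, 4)` to hold the sampled points.
2. Instantiate an int array `N` of shape `(D', H', W')` to count the points seen per voxel.
3. Randomly shuffle the order of the incoming points `P`.
4. For each point `p = (x, y, z, r)` in `P`, where `r` is the reflectance:
    1. Get the voxel indices `(i, j, k)` from `(z, y, x)` using the function `quantize` above, skipping `p` if it falls outside the range.
    2. Let `t = N[i, j, k]`, then increment `N[i, j, k] += 1`.
    3. If `t >= T`, the voxel is already full, so instead choose `t` uniformly at random from `{0, ..., T-1}`.
    4. Set `G[i, j, k, t, :] = (x, y, z, r)`.

After this pass, voxel `(i, j, k)` holds `min(N[i, j, k], T)` valid points.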
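
Putting steps 1 to 4 together, a direct (unoptimized) NumPy sketch of the grid-filling loop could look like the following. It assumes points arrive as a `(P, 4)` array of `(x, y, z, r)` rows, stores `(x, y, z, r)` in `G`, and uses `numpy.random.Generator` for the shuffle and the slot choice. `build_voxel_grid` is a hypothetical name. A dense `G` is only practical for small grids. For the car spec it would hold 10 × 400 × 352 × 35 × 4 float32 values, roughly 790 MB, which is why the paper instead keeps a buffer of only the non-empty voxels.

```python
import numpy as np

def build_voxel_grid(points, rng, low, high, voxel, grid, T):
    '''Fill G (D', H', W', T, 4) and N (D', H', W') from a (P, 4) array of (x, y, z, r).

    low, high, voxel and grid are 3-tuples ordered (Z, Y, X).
    '''
    G = np.zeros(tuple(grid) + (T, 4), dtype=np.float32)  # step 1
    N = np.zeros(tuple(grid), dtype=np.int32)              # step 2
    for p in rng.permutation(len(points)):                 # step 3: shuffled order
        x, y, z, r = points[p]
        ijk = []
        for value, lo, hi, v, n in zip((z, y, x), low, high, voxel, grid):
            if not lo <= value <= hi:
                break                                      # outside the range: skip p
            ijk.append(min(int((value - lo) // v), n - 1)) # step 4.1, clamped
        else:
            i, j, k = ijk
            t = N[i, j, k]                                 # step 4.2
            N[i, j, k] += 1
            if t >= T:
                t = rng.integers(0, T)                     # step 4.3: slot in [0, T)
            G[i, j, k, t] = (x, y, z, r)                   # step 4.4
    return G, N

# Toy example: a 2 x 2 x 2 grid over [0, 1]^3 with T = 2.
rng = np.random.default_rng(0)
pts = np.array([[0.1, 0.1, 0.1, 1.0],
                [0.2, 0.1, 0.1, 2.0],
                [0.3, 0.1, 0.1, 3.0],
                [0.4, 0.1, 0.1, 4.0],
                [0.9, 0.9, 0.9, 5.0],
                [2.0, 0.0, 0.0, 6.0]])   # x = 2.0 is out of range
G, N = build_voxel_grid(pts, rng, low=(0.0, 0.0, 0.0), high=(1.0, 1.0, 1.0),
                        voxel=(0.5, 0.5, 0.5), grid=(2, 2, 2), T=2)
print(N[0, 0, 0], N[1, 1, 1], N.sum())   # 4 1 5
```

Voxel `(0, 0, 0)` saw four points but keeps only two of them in `G`. The out-of-range point never touches `N`.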

