# Introduction to Point Cloud

A point cloud represents an object or scene in 3D space as a set of points, each defined by its X, Y, and Z coordinates. Each point may carry additional attributes such as color (RGB) or intensity. The data is typically generated by depth sensors or 3D LiDAR sensors. A point cloud can therefore be viewed as geometric data defined by an unordered set of points.
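
As a minimal sketch of this representation (not part of the original notebook), a point cloud can be held as an N x 3 NumPy array of coordinates, with optional per-point attributes stored as extra columns. The array names and the random values below are illustrative assumptions that stand in for real sensor data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# N points with X, Y, Z coordinates (random values stand in for sensor data)
num_points = 1024
xyz = rng.uniform(-1.0, 1.0, size=(num_points, 3))

# Optional per-point attributes: RGB color in [0, 1] and a scalar intensity
rgb = rng.uniform(0.0, 1.0, size=(num_points, 3))
intensity = rng.uniform(0.0, 1.0, size=(num_points, 1))

# Stack coordinates and attributes into one (N, 7) array
cloud = np.hstack([xyz, rgb, intensity])

# Simple geometric summaries of the cloud
centroid = xyz.mean(axis=0)
bbox_min, bbox_max = xyz.min(axis=0), xyz.max(axis=0)

print('cloud shape:', cloud.shape)
print('centroid:', centroid)
```

Because the cloud is a set, reordering the rows of `cloud` describes the same object.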

In the following, we visualize and explore two widely used public point cloud datasets:

(1) ModelNet Dataset: The ModelNet40 and ModelNet10 datasets contain objects (in CAD format) from 40 and 10 classes, respectively. The dataset also provides class labels and is used for classification. In our use case, we use the 40-class version. The data comes as a <modelnet40>.zip file.

The ModelNet dataset can be downloaded from https://modelnet.cs.princeton.edu/

(2) S3DIS Dataset: The Stanford 3D Indoor Scene Dataset contains 6 large-scale indoor areas divided into 271 rooms. The dataset covers 13 object categories and provides point-wise labels for semantic segmentation.

The S3DIS dataset can be downloaded from http://buildingparser.stanford.edu/dataset.html
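
The difference between the two label types can be sketched as follows. This is an illustration with randomly generated labels, not data from either dataset; the batch size and number of points are assumed values.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
batch_size, num_points = 4, 4096

# Classification (ModelNet40): one label per object, 40 possible classes
cls_labels = rng.integers(0, 40, size=(batch_size,))

# Semantic segmentation (S3DIS): one label per point, 13 possible classes
seg_labels = rng.integers(0, 13, size=(batch_size, num_points))

# Per-class point counts for the first cloud in the batch
points_per_class = np.bincount(seg_labels[0], minlength=13)

print('classification labels:', cls_labels.shape)
print('segmentation labels:', seg_labels.shape)
print('points per class:', points_per_class)
```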

## ModelNet Dataset 
#### ***How to download the dataset and prepare it accordingly?***

To download the dataset and prepare it for training, follow these instructions.

1. Go to the following link and download the data:
https://modelnet.cs.princeton.edu

2. Unzip the data and place the unzipped directory inside the 'data' directory.
(Note: the 'data' directory has already been created for you, so you do not need to create it again. You only need to place the unzipped folder inside it.)

Note: Check the name of the unzipped directory. It is 'modelnet40_normal_resampled' by default. If it differs, update 'data_path' in the cell that loads the dataset.
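
A quick way to confirm the folder is in place before loading is sketched below. The helper `check_data_dir` is a hypothetical name introduced here for illustration; it only checks that the directory exists and is not empty, and does not validate the dataset contents.

```python
from pathlib import Path

def check_data_dir(data_path):
    """Return True if data_path is an existing, non-empty directory."""
    path = Path(data_path)
    return path.is_dir() and any(path.iterdir())

# Default location used in this notebook; adjust if the folder was renamed
data_path = '../data/modelnet40_normal_resampled/'

if check_data_dir(data_path):
    print('Found dataset at', data_path)
else:
    print('Dataset not found at', data_path, '- check the folder name and location')
```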

In [1]:
### required modules
import torch
from dataloaders.ModelNetDataLoader import ModelNetDataLoader

In [2]:
### parameters to be used in Point Cloud Classification with ModelNet in 1-Classification.ipynb
class Args:
    use_cpu =False
    gpu='0'
    batch_size = 24
    model='pointnet_cls'
    num_category = 40
    epoch=200
    learning_rate=0.001
    num_point=1024
    optimizer='Adam'
    log_dir = 'runs'
    decay_rate=1e-4
    use_normals=False
    process_data=False
    use_uniform_sample=False
args = Args()

In [3]:
### ModelNet40 Dataset (classification) load from the disk

data_path = '../data/modelnet40_normal_resampled/'

train_dataset = ModelNetDataLoader(root=data_path, args=args,  split='train', process_data=args.process_data)
trainDataLoader = torch.utils.data.DataLoader(train_dataset, batch_size=args.batch_size, shuffle=True, num_workers=10, drop_last=True)

test_dataset = ModelNetDataLoader(root=data_path, args=args,  split='test', process_data=args.process_data)
testDataLoader = torch.utils.data.DataLoader(test_dataset, batch_size=args.batch_size, shuffle=True, num_workers=10, drop_last=True)

The size of train data is 9843
The size of test data is 2468


In [4]:
### Get data from the dataloader for one iteration 

dataloader_iterator = iter(trainDataLoader)
for i in range(1):
    try:
        data, target = next(dataloader_iterator)
    except StopIteration:
        dataloader_iterator = iter(trainDataLoader)
        data, target = next(dataloader_iterator)

In [6]:
### Get the data and visualize it
import pyvista as pv
pv.set_jupyter_backend('ipygany')
import numpy as np
from utilities.data_description import class_names, class_dist

points1 = data
points = data.numpy()
class_label = target
classes = target.numpy()

print('batch_size:', args.batch_size) ## check batch size: (default: 24)
### Choose i < batch_size 
i = 12
points = points[i]
class_label = class_label[i]

# color every point with the class label of the selected object
color = np.full((points.shape[0], 1), classes[i])

data_plt = pv.PolyData(points)
data_plt['color'] = color 
data_plt.plot()

cls_label = class_label.numpy()

print('The object belongs to class {} which has {} samples.'.format(class_names[cls_label], str(class_dist[cls_label])))

print('Point Cloud Data shape: {}.'.format(points1.shape))
print('Label shape: {}.'.format(target.shape))

batch_size: 24


AppLayout(children=(VBox(children=(HTML(value='<h3>color</h3>'), Dropdown(description='Colormap:', options={'B…

The object belongs to class table which has 492 samples.
Point Cloud Data shape: torch.Size([24, 1024, 3]).
Label shape: torch.Size([24]).


## S3DIS Dataset



In [2]:
### import required modules
import torch
from dataloaders.S3DISDataLoader import S3DISDataset
import time
import numpy as np
import os

In [3]:
### parameters to be used in Point Cloud Segmentation 
# with S3DIS in 2-Segmentation.ipynb
class Args:
    gpu='0'
    batch_size = 16
    model='pointnet_sem_seg'
    epoch=32
    learning_rate=0.001
    num_point=1024
    optimizer='Adam'
    log_dir = 'runs'
    decay_rate=1e-4
    npoint = 4096
    step_size =10
    lr_decay = 0.7
    test_area=5
args = Args()

In [4]:
### We require a GPU to train on this dataset
# make only the selected GPU visible to CUDA

os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu

In [5]:
### S3DIS Dataset (segmentation) load from the disk
root = '../../../Codespace/data/datasets/S3DIS/scenes/'
NUM_CLASSES = 13
NUM_POINT = args.npoint
BATCH_SIZE = args.batch_size

print("start loading training data ...")
TRAIN_DATASET = S3DISDataset(split='train', data_root=root, num_point=NUM_POINT, test_area=args.test_area, block_size=1.0, sample_rate=1.0, transform=None)

trainDataLoader = torch.utils.data.DataLoader(TRAIN_DATASET, batch_size=BATCH_SIZE, shuffle=True, num_workers=10,
                                                pin_memory=True, drop_last=True,
                                                worker_init_fn=lambda x: np.random.seed(x + int(time.time())))


TEST_DATASET = S3DISDataset(split='test', data_root=root, num_point=NUM_POINT, test_area=args.test_area, block_size=1.0, sample_rate=1.0, transform=None)

testDataLoader = torch.utils.data.DataLoader(TEST_DATASET, batch_size=BATCH_SIZE, shuffle=True, num_workers=10,
                                                pin_memory=True, drop_last=True,
                                                worker_init_fn=lambda x: np.random.seed(x + int(time.time())))

start loading training data ...


100%|██████████| 204/204 [00:16<00:00, 12.54it/s]


47576 samples in train set.


100%|██████████| 67/67 [00:07<00:00,  9.09it/s]

18822 samples in test set.





In [6]:
### Get data from the dataloader for one iteration 

dataloader_iterator = iter(trainDataLoader)
for i in range(1):
    try:
        data, target = next(dataloader_iterator)
    except StopIteration:
        dataloader_iterator = iter(trainDataLoader)
        data, target = next(dataloader_iterator)

In [7]:
### Get the data and visualize it
import pyvista as pv
pv.set_jupyter_backend('ipygany')
import numpy as np
from utilities.data_description import classes_s3dis

points1 = data
points = data.numpy()
points = points[:, :, 0:3]
class_labels = target
classes = class_labels.numpy()

print('batch_size:', args.batch_size)
### Visualize the data: Choose value of i < batch_size
i = 12  # check the batch_size before assigning a value to i (default: 16)
points = points[i]
color = classes[i]
data_plt = pv.PolyData(points)
data_plt.points *= 10
class_color = color.astype(int)
plotter = pv.Plotter()
plotter.add_mesh(data_plt, scalars=class_color)
plotter.show()

print('Point Cloud Data shape: {}.'.format(points1.shape))
print('Label shape: {}.'.format(target.shape))

cls_lbl = classes.flatten()
cls_lbl = set(cls_lbl)
cls_lbl = list(cls_lbl)
cls_lbl = [int(x) for x in cls_lbl]


batch_size: 16




AppLayout(children=(VBox(children=(HTML(value='<h3></h3>'), Dropdown(description='Colormap:', options={'BrBG':…

Point Cloud Data shape: torch.Size([16, 4096, 9]).
Label shape: torch.Size([16, 4096]).


In [11]:
### The classes which are present in the file
for i in range(len(cls_lbl)):
    cls_name = cls_lbl[i]
    cls_names = classes_s3dis[cls_name]
    print('The classes are {} : {}'.format(cls_name,cls_names))

The classes are 0:Ceiling
The classes are 1:Floor
The classes are 2:Wall
The classes are 3:Beam
The classes are 4:Column
The classes are 5:Window
The classes are 6:Door
The classes are 7:Table
The classes are 8:Chair
The classes are 9:Sofa
The classes are 12:Clutter


## Point Cloud and Machine Learning

Unlike images or text, point clouds are unordered. Convolutional Neural Networks that work on images or text cannot be applied directly to point cloud data. Many deep learning methods work around this by transforming the point cloud into another representation: 2D projections (renderings) for 2D CNNs or voxel grids for 3D CNNs. This conversion adds preprocessing time to every sample, which slows training, and the point cloud may lose some of its geometric detail in the process.
Charles Qi et al. proposed PointNet to address this by treating a point cloud as a set of points. The network consumes raw point clouds directly and is invariant to permutations of the points. The authors demonstrate this unified architecture on several vision tasks for point clouds: classification, part segmentation, and semantic segmentation. The same authors later introduced PointNet++, an extension of PointNet. Unlike PointNet, it also aggregates local features (features of neighboring points).
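
Permutation invariance can be illustrated with a toy example: apply the same function to every point, then combine the results with a symmetric operation such as a maximum over the points. The sketch below is not the PointNet implementation; a single random, untrained linear layer stands in for the shared per-point network, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

points = rng.normal(size=(1024, 3))   # one cloud with 1024 points
weights = rng.normal(size=(3, 64))    # shared per-point linear layer (random, untrained)

def global_feature(cloud):
    """Apply the same map to every point, then max-pool over the points."""
    per_point = np.maximum(cloud @ weights, 0.0)  # shared layer + ReLU, shape (N, 64)
    return per_point.max(axis=0)                  # symmetric function, shape (64,)

# Reorder the points randomly; the set of points is unchanged
shuffled = points[rng.permutation(len(points))]

same = np.allclose(global_feature(points), global_feature(shuffled))
print('order-independent:', same)
```

The maximum does not depend on the order of its inputs, so any reordering of the rows yields the same feature vector.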

In the following, we look at the PointNet architecture, the intuition behind it, and what each part does. The network can be used for feature extraction from raw point cloud data.


![image](../.description/pointnet.png)


Reference: Qi, C. R., Su, H., Mo, K., & Guibas, L. J. (2017). Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 652-660).

In [15]:
# import the PointNet semantic segmentation model
from models.pointnet_sem_seg import get_model, get_loss

In [14]:
NUM_CLASSES = 13

classifier = get_model(NUM_CLASSES).cuda()
print(classifier)

get_model(
  (feat): PointNetEncoder(
    (stn): STN3d(
      (conv1): Conv1d(9, 64, kernel_size=(1,), stride=(1,))
      (conv2): Conv1d(64, 128, kernel_size=(1,), stride=(1,))
      (conv3): Conv1d(128, 1024, kernel_size=(1,), stride=(1,))
      (fc1): Linear(in_features=1024, out_features=512, bias=True)
      (fc2): Linear(in_features=512, out_features=256, bias=True)
      (fc3): Linear(in_features=256, out_features=9, bias=True)
      (relu): ReLU()
      (bn1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (bn2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (bn3): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (bn4): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (bn5): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (conv1): Conv1d(9, 64, kernel_size=(1,), stride=(1,))
    (c

### Parameters in the PointNet Network
In the following, we will see the layers, output shape, and the number of parameters per layer. We will also see the total number of parameters contained in the network.
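
The per-layer counts reported by the summary tables in this notebook follow from simple arithmetic. The sketch below assumes kernel size 1 for every Conv1d layer and bias terms everywhere, as in the printed layer definitions; the helper names are hypothetical and introduced only for illustration.

```python
def conv1d_params(in_ch, out_ch, kernel_size=1):
    """Weights (in_ch * out_ch * kernel_size) plus one bias per output channel."""
    return in_ch * out_ch * kernel_size + out_ch

def linear_params(in_features, out_features):
    """Weight matrix plus one bias per output feature."""
    return in_features * out_features + out_features

def batchnorm_params(num_features):
    """One learnable scale and one learnable shift per feature."""
    return 2 * num_features

def tnet_params(in_ch, k_out):
    """Parameter count of a T-Net with the layer sizes printed in this notebook."""
    convs = conv1d_params(in_ch, 64) + conv1d_params(64, 128) + conv1d_params(128, 1024)
    linears = linear_params(1024, 512) + linear_params(512, 256) + linear_params(256, k_out)
    norms = sum(batchnorm_params(c) for c in (64, 128, 1024, 512, 256))
    return convs + linears + norms

print(conv1d_params(9, 64))      # first Conv1d with 9 input channels: 640
print(tnet_params(3, 9))         # T-Net with 3x3 output: 803081
print(tnet_params(64, 64 * 64))  # T-Net with 64x64 output: 1857344
```

The two totals match the "Total params" lines in the STN3d and STNkd summaries.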

In [15]:
### import the required modules
from torchsummary import summary
classifier = classifier.cuda()
summary(classifier, (9, 1024))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv1d-1             [-1, 64, 1024]             640
       BatchNorm1d-2             [-1, 64, 1024]             128
            Conv1d-3            [-1, 128, 1024]           8,320
       BatchNorm1d-4            [-1, 128, 1024]             256
            Conv1d-5           [-1, 1024, 1024]         132,096
       BatchNorm1d-6           [-1, 1024, 1024]           2,048
            Linear-7                  [-1, 512]         524,800
       BatchNorm1d-8                  [-1, 512]           1,024
            Linear-9                  [-1, 256]         131,328
      BatchNorm1d-10                  [-1, 256]             512
           Linear-11                    [-1, 9]           2,313
            STN3d-12                 [-1, 3, 3]               0
           Conv1d-13             [-1, 64, 1024]             640
      BatchNorm1d-14             [-1, 6

### The transform networks
PointNet uses two T-Nets (Transform Networks). The first T-Net is applied to the input data (the raw point cloud), while the second is applied to the point features. Thus, the first T-Net is the input transform network, and the second is the feature transform network. Each network predicts a transformation matrix: 3x3 for the input and 64x64 for the features. In the original paper, the feature transform is additionally regularized to stay close to an orthogonal matrix.
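
How a predicted matrix is applied can be sketched with NumPy. This is an illustration only: a small random offset from the identity stands in for the learned T-Net output, and `orthogonality_penalty` is a hypothetical helper written in the style of the regularizer described in the PointNet paper, not code from this repository.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
batch_size, num_points = 2, 1024

points = rng.normal(size=(batch_size, num_points, 3))

# Stand-in for the T-Net output: 9 values per cloud, reshaped to 3x3.
# A small random offset from the identity replaces the learned prediction.
raw = 0.01 * rng.normal(size=(batch_size, 9))
transform = raw.reshape(batch_size, 3, 3) + np.eye(3)

# Apply each cloud's matrix to all of its points: (B, N, 3) @ (B, 3, 3)
aligned = points @ transform

def orthogonality_penalty(mats):
    """Mean squared Frobenius norm of (A A^T - I); zero for orthogonal A."""
    k = mats.shape[-1]
    diff = mats @ np.transpose(mats, (0, 2, 1)) - np.eye(k)
    return float(np.mean(np.sum(diff ** 2, axis=(1, 2))))

print('aligned shape:', aligned.shape)
print('penalty:', orthogonality_penalty(transform))
```

The same pattern applies to the feature transform, with 64-dimensional point features and a 64x64 matrix.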

In the following, we will see the network architecture for T-Nets in PointNet. We will also see the number of layers in both the T-Nets with their respective parameters and output shapes.

#### ***T-Net on Inputs***

In [22]:
# import the T-Net input
from models.pointnet_utils import STN3d
inp_trans = STN3d(channel=3)
print(inp_trans)

STN3d(
  (conv1): Conv1d(3, 64, kernel_size=(1,), stride=(1,))
  (conv2): Conv1d(64, 128, kernel_size=(1,), stride=(1,))
  (conv3): Conv1d(128, 1024, kernel_size=(1,), stride=(1,))
  (fc1): Linear(in_features=1024, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=256, bias=True)
  (fc3): Linear(in_features=256, out_features=9, bias=True)
  (relu): ReLU()
  (bn1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (bn2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (bn3): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (bn4): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (bn5): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)


In [23]:
from torchsummary import summary
inp_trans_sum = inp_trans.cuda()
summary(inp_trans_sum, (3,1024))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv1d-1             [-1, 64, 1024]             256
       BatchNorm1d-2             [-1, 64, 1024]             128
            Conv1d-3            [-1, 128, 1024]           8,320
       BatchNorm1d-4            [-1, 128, 1024]             256
            Conv1d-5           [-1, 1024, 1024]         132,096
       BatchNorm1d-6           [-1, 1024, 1024]           2,048
            Linear-7                  [-1, 512]         524,800
       BatchNorm1d-8                  [-1, 512]           1,024
            Linear-9                  [-1, 256]         131,328
      BatchNorm1d-10                  [-1, 256]             512
           Linear-11                    [-1, 9]           2,313
Total params: 803,081
Trainable params: 803,081
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.01
Forward/

#### ***T-Net on Features***

In [32]:
from models.pointnet_utils import STNkd
feat_trans = STNkd(k=64)
print(feat_trans)


STNkd(
  (conv1): Conv1d(64, 64, kernel_size=(1,), stride=(1,))
  (conv2): Conv1d(64, 128, kernel_size=(1,), stride=(1,))
  (conv3): Conv1d(128, 1024, kernel_size=(1,), stride=(1,))
  (fc1): Linear(in_features=1024, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=256, bias=True)
  (fc3): Linear(in_features=256, out_features=4096, bias=True)
  (relu): ReLU()
  (bn1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (bn2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (bn3): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (bn4): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (bn5): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)


In [33]:
from torchsummary import summary
feat_trans_sum = feat_trans.cuda()
summary(feat_trans_sum, (64,1024))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv1d-1             [-1, 64, 1024]           4,160
       BatchNorm1d-2             [-1, 64, 1024]             128
            Conv1d-3            [-1, 128, 1024]           8,320
       BatchNorm1d-4            [-1, 128, 1024]             256
            Conv1d-5           [-1, 1024, 1024]         132,096
       BatchNorm1d-6           [-1, 1024, 1024]           2,048
            Linear-7                  [-1, 512]         524,800
       BatchNorm1d-8                  [-1, 512]           1,024
            Linear-9                  [-1, 256]         131,328
      BatchNorm1d-10                  [-1, 256]             512
           Linear-11                 [-1, 4096]       1,052,672
Total params: 1,857,344
Trainable params: 1,857,344
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.25
Forw

### The PointNet Encoder (provides the Global Features)

The PointNet encoder extracts global features from raw point cloud data, but it does not capture local structure. In the classification network, only these global features are used. In the semantic segmentation network, the global features from the encoder are concatenated with the point-wise features taken after the feature transform network, so each point has access to both local and global information.
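
The pooling and concatenation steps can be sketched as follows. Random arrays stand in for learned features; the sizes 64 and 1024 follow the layer widths printed in this notebook, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
num_points = 4096

# Stand-ins for learned features (random values, illustrative only)
point_feats = rng.normal(size=(num_points, 64))    # per-point (local) features
deep_feats = rng.normal(size=(num_points, 1024))   # per-point features before pooling

# Global feature: max over the point dimension, one 1024-vector per cloud
global_feat = deep_feats.max(axis=0)

# Classification would feed global_feat to fully connected layers.
# Segmentation repeats it for every point and concatenates it with the local features.
tiled = np.broadcast_to(global_feat, (num_points, 1024))
seg_input = np.concatenate([point_feats, tiled], axis=1)

print('global feature:', global_feat.shape)
print('segmentation input:', seg_input.shape)
```

Each row of `seg_input` holds 64 local values followed by the same 1024 global values, giving 1088 features per point.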

In [35]:
### import the required PointNet encoder module
from models.pointnet_utils import PointNetEncoder
encoder = PointNetEncoder(global_feat=True, feature_transform=False, channel=3)
print(encoder)

PointNetEncoder(
  (stn): STN3d(
    (conv1): Conv1d(3, 64, kernel_size=(1,), stride=(1,))
    (conv2): Conv1d(64, 128, kernel_size=(1,), stride=(1,))
    (conv3): Conv1d(128, 1024, kernel_size=(1,), stride=(1,))
    (fc1): Linear(in_features=1024, out_features=512, bias=True)
    (fc2): Linear(in_features=512, out_features=256, bias=True)
    (fc3): Linear(in_features=256, out_features=9, bias=True)
    (relu): ReLU()
    (bn1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (bn2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (bn3): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (bn4): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (bn5): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (conv1): Conv1d(3, 64, kernel_size=(1,), stride=(1,))
  (conv2): Conv1d(64, 128, kernel_size=(1,), stride=(1,))

In [29]:
from torchsummary import summary
encoder_sum = encoder.cuda()
summary(encoder_sum, (3,1024))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv1d-1             [-1, 64, 1024]             256
       BatchNorm1d-2             [-1, 64, 1024]             128
            Conv1d-3            [-1, 128, 1024]           8,320
       BatchNorm1d-4            [-1, 128, 1024]             256
            Conv1d-5           [-1, 1024, 1024]         132,096
       BatchNorm1d-6           [-1, 1024, 1024]           2,048
            Linear-7                  [-1, 512]         524,800
       BatchNorm1d-8                  [-1, 512]           1,024
            Linear-9                  [-1, 256]         131,328
      BatchNorm1d-10                  [-1, 256]             512
           Linear-11                    [-1, 9]           2,313
            STN3d-12                 [-1, 3, 3]               0
           Conv1d-13             [-1, 64, 1024]             256
      BatchNorm1d-14             [-1, 6